The widespread use of digital cameras and the increasing popularity of online photo sharing have led to the proliferation of networked photo collections. Handling such a huge amount of media without imposing complex and time-consuming archiving procedures is highly desirable, and poses a number of interesting research challenges to the media community. In particular, the definition of suitable content-based indexing and retrieval methodologies is attracting the effort of a large number of researchers worldwide, who have proposed various tools for automatic content organization, retrieval, search, annotation and summarization. In this thesis, we present and discuss three different approaches to content- and context-based retrieval. The main focus is on personal photo albums, which can be considered one of the most challenging application domains in this field, due to the largely unstructured and variable nature of the datasets.

The methodologies that we describe can be summarized in the following three points:

i. Stochastic approaches to exploit the user interaction in query-by-example photo retrieval.
Understanding the subjective meaning of a visual query, by converting it into numerical parameters that can be extracted and compared by a computer, is the paramount challenge in the field of intelligent image retrieval, also referred to as the “semantic gap” problem. We propose an innovative approach that combines a relevance feedback process with a stochastic optimization engine, as a way to grasp the user's semantics through optimized iterative learning. On the one hand, this provides a better exploration of the search space; on the other, it avoids stagnation in local minima during the retrieval.

ii. Unsupervised event collection, segmentation and summarization.
The need for automatic tools able to extract salient moments and provide an automatic summary of large photo galleries is becoming more and more important, due to the exponential growth in the use of digital media for recording personal, family or social life events. The proposed multi-modal event segmentation algorithm tackles the summarization problem in a holistic way, making it possible to exploit all the available information in a fully unsupervised manner. It aims at providing such a tool, with the specific goal of reducing the need for complex parameter settings and making the system useful in as many situations as possible.

iii. Content-based synchronization of multiple galleries related to the same event.
The wide diffusion of cameras makes it quite common for an event to be captured by different devices, conveying different subjects and perspectives of the same happening. Automatic tools are increasingly used to support users in organizing such archives, and it is widely accepted that time information is crucial for this purpose. Unfortunately, timestamps may be affected by an erroneous or imprecise setting of the camera clock. The synchronization algorithm presented is the first that uses the content of pictures to estimate the mutual delays among different cameras, thus achieving an a posteriori synchronization of various photo collections referring to the same event.