Personal Photo Retrieval - A Pilot Task

Overview

This pilot task aims to provide a test bed for QBE-based (query-by-example) retrieval scenarios in the scope of personal information retrieval. In contrast to other tasks relying on downloads from Flickr or the like, the underlying data set reflects an amalgamated personal image collection taken by 19 photographers (see Fig. 4 for the contributions). Hence, it is best used as a test set for layperson retrieval tasks carried out ad hoc on their own collections, such as: “find all images with a street scene”, “find a beach similar to this”, or more event-based tasks like “show me more pictures from the last U2 concert”. The aim of this pilot task is to retrieve relevant images based on typical layperson usage scenarios in their own collections, i.e. the search for similar images or images depicting a similar event, e.g. a rock concert. The retrieval of events has been encouraged by the results of the enclosed user study.

Figure 1: Samples of the Visual Concept "Asian Temple Interior"

Figure 2: Samples of the Event Class "Rock Concert"

Figure 3: Samples of the Specific Event "Party/Love parade"

The topics available in the data set are widely spread, ranging from general image qualities (e.g. blurred, backlit) and more traditional topics (e.g. beaches, animals, clouds, cars) to sceneries (e.g. street scene, city panorama) and events (e.g. concerts, parties, sports). Participants are free to solve the task by utilizing visual features, metadata, information about the photographer, or a combination thereof.

Task 1: Retrieval of Visual Concepts

All images have been assessed with respect to each of the following topics by at least 2 assessors (2.57 assessors on average as of 11 February 2012). The assessments are based on graded relevance.

1. Beach and Seaside
2. Street Scene
3. Statue and Figurine
4. Asian Temple & Palace
5. Landscape
6. Hotel Room
7. People*
8. Architecture (profane)
9. Animals
10. Asian Temple Interior
11. Flower / Botanic Details
12. Market Scene
13. Submarine Scene
14. Ceremony and Party
15. Theater / Performing Arts
16. Clouds
17. Still Life
18. Church (Christian)
19. Art Object
20. Cars
21. Ship / Maritime Vessel
22. Airplane
23. Temple (Ancient)
24. Squirrels
25. Sign
26. Mountains
27. Monkeys
28. Birds
29. Trees
30. Abstract Content
31. City Panorama
32. (Christ.) Church Interior

Table 1: Traditional Topics

*: Due to privacy concerns, some images have been anonymized, so typical face detection algorithms will probably not lead to the expected results. Instead, we provide a manually created list of images with people (see below).

Simulated Interaction (Experimental)

In order to compare different relevance feedback mechanisms and to assess how they adjust to different users, we provide data for user simulations, i.e. personas including their assessments. The personas can then be used to run automatic relevance feedback (RF) experiments without manual intervention. The actual personas (e.g. CBIR-savvy, layperson, or ones with specific demographics) will be released after the assessments have been analyzed completely - stay tuned.

More information about personas.

User-centered Initiative

To assess the usability of CBIR systems, we would like participants to filter/cluster their result sets in order to present users with visually different/non-duplicate images or images with good motif quality (see below) at the first ranks (even if the retrieval engine would rank the documents almost equally). This subtask reflects the assumption that a user-centered system should offer users good and varied retrieval results. Varied results are likely to compensate for the vagueness inherent in both retrieval and query formulation. Hence, an additional filtering or clustering of the result list can improve the effectiveness and efficiency (in terms of usability) of the retrieval process.

Participants are free to choose their method of filtering or clustering but should describe it afterwards. The accompanying IPTC data must not be used. Training data will not be provided - the task has to be solved ad hoc.

Query Data and Submission

The queries are available as a ZIP archive containing XML files in the following format.

<query id='Q6001' title='Beach and Seaside'>
<qbe docpath='090822_Ostsee_034_Heiligendamm.jpg'>270</qbe>
<qbe docpath='100_0455.jpg'>455</qbe>
<qbe docpath='100_1487.jpg'>746</qbe>
<qbe docpath='DSC00526.jpg'>2254</qbe>
<qbe docpath='img_95.jpg'>5167</qbe>
<browsing docpath='090822_Ostsee_087_Warnemu_nde.jpg'>323</browsing>
<browsing docpath='100_0984.jpg'>549</browsing>
<browsing docpath='100_1116.jpg'>604</browsing>
<browsing docpath='100_1418.jpg'>721</browsing>
<browsing docpath='100_1646.jpg'>805</browsing>
<browsing docpath='100_1757.jpg'>867</browsing>
<browsing docpath='100_1980.jpg'>883</browsing>
<browsing docpath='CIMG0080.jpg'>1174</browsing>
<browsing docpath='CIMG0178.jpg'>1245</browsing>
<browsing docpath='CIMG0209.jpg'>1266</browsing>
<browsing docpath='CIMG0345.jpg'>1338</browsing>
<browsing docpath='CIMG0511.jpg'>1419</browsing>
<browsing docpath='CIMG0893.jpg'>1633</browsing>
<browsing docpath='img_2_406.jpg'>5467</browsing>
</query>

Each of the 24 query files consists of two parts. First, there are 5 QBE documents for each query. The QBE documents are fully relevant according to our assessors. Second, the data is enriched with documents that have been inspected during browsing. These documents range from slightly to fully relevant; irrelevant documents are not included. We offer information about browsed documents to model a user who first browses a personal photo collection (e.g. by clicking documents in a thumbnail view) and then submits a directed search based on the provided QBE documents. The attribute 'docpath' contains the path to a document in the collection and is only provided for convenience. For submission, please use only the file IDs that range from 1 to 5,555.
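
For illustration, the query files can be parsed with a few lines of code. Below is a minimal sketch in Python using only the standard library; the file name 'Q6001.xml' is a hypothetical example.

# Minimal sketch: read one Task 1 query file and collect the QBE and browsing file IDs.
# The file name 'Q6001.xml' is hypothetical; adjust it to the files in the ZIP archive.
import xml.etree.ElementTree as ET

def read_query(path):
    root = ET.parse(path).getroot()                       # <query id='...' title='...'>
    qbe_ids = [int(e.text) for e in root.findall('qbe')]  # 5 fully relevant examples
    browsed_ids = [int(e.text) for e in root.findall('browsing')]
    return root.get('id'), root.get('title'), qbe_ids, browsed_ids

query_id, title, qbe_ids, browsed_ids = read_query('Q6001.xml')
print(query_id, title, qbe_ids, len(browsed_ids))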

Participants are therefore free to rely only on the provided QBE documents or to use data derived from the browsed documents as well. Please state on submission whether you have used the browsing data. Participants are free to experiment with whatever methods they wish for image retrieval, e.g. relevance feedback or the integration of other modalities such as spatial or temporal information. We ask participants to indicate which of the following applies to each of their runs, following the notation of last year's Wikipedia run.

Run type (MAN/AUTO):
We distinguish between manual (MAN) and automatic (AUTO) submissions. Automatic runs involve no user interaction, whereas manual runs are those in which a human has been involved in query construction or the iterative retrieval process, e.g. manual relevance feedback. A good description of the differences between these run types is provided by TRECVID here.

Relevance Feedback (BINARY/GRADED/NOFB):
If relevance feedback is used, please state whether your relevance feedback mechanism uses only positive or negative examples (BINARY) or whether it allows graded relevance as input (GRADED). If no feedback was used, mark these runs with NOFB.

Retrieval type (Modality):
This describes the use of visual (image) or other features in your submission. A purely visual run will have the modality visual ("IMG"). Runs using metadata will have the modality "MET". If you use the browsing data, add "BRO". Combined submissions (e.g. a purely visual search using the browsing data) will have the modality visual+browsing (IMGBRO), also referred to as "mixed". Please note that you must not use the IPTC metadata fields (see below). Thus, the following combinations are possible: IMG, IMGMET, IMGMETBRO, IMGBRO, MET, METBRO, and BRO.

A list that maps document paths to file IDs used for submission can be found here.

For the submission of your results, please use the official ImageCLEF submission system. To login you will have to use the username and password that you have received with your registration. When submitting a run (in total you can submit 5 runs), you will need to fill in the following form:

  1. Select the personal photo retrieval track.
  2. In the "method description" field, you can outline the method you have used for the current run.
  3. Please choose "not applicable" for "retrieval type" because you will have to use the "other information" field for providing this information.
  4. "Language" is also "not applicable".
  5. Choose the "run type" as described above.
  6. If this is your best run, indicate this by clicking the "primary run" checkbox.
  7. In the "other information" section, you must indicate how the run has been created. Please follow this scheme.
    1. The first line has to be BINARY,GRADED, or NOFB depending on the relevance feedback approach you have used.
    2. The second line should indicate the modalities you have used (see above), i.e. it should contain one of the following keywords: IMG, IMGMET, IMGMETBRO, IMGBRO, MET, METBRO, or BRO.
    3. If your run features a filtering or clustering approach, indicate this with FILTERING or CLUSTERING in line 3.
  8. If you have used additional resources (a web service from Bing or Google etc.), please state this.
  9. To finish your submission upload a runfile with the following format.

Runfile format:

The runfiles have to be in trec_eval format. Please use ".treceval" as the file extension. A line in trec_eval format looks like this:

Q6001    Q0    12345    1    0.996147    RunID

That is (from left to right): topic ID, "Q0" (ignored), file ID (1-5,555), rank, similarity, and run ID (an arbitrary string). Please use tabulators ("\t") as field delimiters. Please submit no more than 101 documents for each run, as we will only calculate MAP at cut-off 100 in addition to other metrics.
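
As an illustration, such a runfile can be written as follows. This is a minimal Python sketch; the 'results' dictionary, run ID, and file name are hypothetical and only serve as an example.

# Minimal sketch: write a run in trec_eval format with tab-separated fields.
# 'results' maps a topic ID to a list of (file_id, similarity) pairs sorted by
# descending similarity; all names and values are hypothetical.
def write_run(results, run_id, path):
    with open(path, 'w') as f:
        for topic_id, ranked in results.items():
            for rank, (file_id, sim) in enumerate(ranked[:100], start=1):
                f.write(f"{topic_id}\tQ0\t{file_id}\t{rank}\t{sim:.6f}\t{run_id}\n")

write_run({'Q6001': [(12345, 0.996147), (2254, 0.874210)]}, 'MyRun', 'myrun.treceval')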

If your run features a filtering/clustering method to provide varying results, add three more fields to each line.

Q6001    Q0    12345    1    0.996147    RunID    ClusterID    ClusterGradedRelevance    ClusterSimilarity

That is (from left to right): topic ID, "Q0" (ignored), file ID (1-5,555), rank, similarity, run ID (an arbitrary string), cluster ID (a unique ID for the cluster the document is associated with), the cluster center's graded relevance (0-3, where 0 = irrelevant and 3 = highly relevant), and the similarity to the cluster center (1.0 indicates the center itself).

Please do not provide more than 30 unique cluster centers. The clusters will be sorted according to their cluster center's graded relevance, i.e. clusters with grade 3 will be ranked above those with grade 2, and so on. Irrelevant clusters (grade 0) will be ignored. We will assume that only the cluster center documents are displayed to the user; these cluster centers will then be used to calculate additional performance metrics in comparison to the unmodified top-k metrics. In other words, these runfiles will first be evaluated traditionally as above (precision @ k, MAP, etc.) and then additionally based on their cluster centers or filtered documents. If you are only filtering to offer an alternative top-30 ranking, simply take 30 file IDs, associate each with a unique cluster ID, and give them a ClusterGradedRelevance > 0 and a ClusterSimilarity of 1.
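
The three additional fields can be appended to each line in the same way. Again a minimal Python sketch with hypothetical names and values:

# Minimal sketch: write the clustered variant; each entry additionally carries a
# cluster ID, the cluster center's graded relevance (0-3), and the similarity to
# the cluster center (1.0 for the center itself). All names and values are hypothetical.
def write_clustered_run(results, run_id, path):
    with open(path, 'w') as f:
        for topic_id, ranked in results.items():
            for rank, (fid, sim, cid, center_rel, center_sim) in enumerate(ranked[:100], start=1):
                f.write(f"{topic_id}\tQ0\t{fid}\t{rank}\t{sim:.6f}\t{run_id}"
                        f"\t{cid}\t{center_rel}\t{center_sim:.6f}\n")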

Important notes

Task 1: (Preliminary) Results

 

Group Run ID Run Type Relevance Feedback Retrieval Type P_5 P_10 P_15 P_20 P_30 P_100 ndcg_cut_5 ndcg_cut_10 ndcg_cut_15 ndcg_cut_20 ndcg_cut_30 ndcg_cut_100 map_cut_30 map_cut_100
KIDS IBMA0 Automatic NOFB IMGMET 0,8333 0,7833 0,7222 0,6896 0,6347 0,4379 0,6405 0,6017 0,5658 0,5459 0,5213 0,4436 0,1026 0,1777
KIDS OBOA0 Automatic NOFB MET 0,8000 0,7292 0,6667 0,6354 0,6083 0,4117 0,5858 0,5348 0,5028 0,4836 0,4728 0,4144 0,0952 0,1589
KIDS IOMA0 Automatic NOFB IMGMET 0,7667 0,6583 0,6222 0,6104 0,5639 0,3925 0,5800 0,5184 0,4951 0,4872 0,4615 0,3979 0,0906 0,1558
KIDS OBMA0 Automatic NOFB MET 0,6500 0,6500 0,6083 0,5771 0,5611 0,3925 0,4073 0,4268 0,4123 0,4066 0,4046 0,3717 0,0854 0,1518
REGIM run4 Automatic NOFB IMGBRO 0,9000 0,8375 0,7917 0,7333 0,6292 0,3992 0,4896 0,4703 0,4667 0,4563 0,4274 0,3572 0,0810 0,1224
REGIM run2 Automatic NOFB IMGBRO 0,9000 0,8417 0,7917 0,7292 0,6278 0,3975 0,4926 0,4722 0,4687 0,4561 0,4271 0,3566 0,0811 0,1224
REGIM run1 Automatic NOFB IMGBRO 0,9000 0,8417 0,7889 0,7292 0,6278 0,3967 0,4908 0,4730 0,4680 0,4560 0,4271 0,3561 0,0811 0,1222
REGIM run5 Automatic NOFB IMGBRO 0,9000 0,8458 0,7889 0,7292 0,6278 0,3971 0,4899 0,4742 0,4681 0,4551 0,4264 0,3563 0,0811 0,1222
REGIM run3 Automatic NOFB IMGBRO 0,9000 0,8458 0,7889 0,7292 0,6278 0,3975 0,4899 0,4742 0,4681 0,4551 0,4264 0,3564 0,0811 0,1222
Lpiras Run_1_2 Feedback and/or human assistance BINARY IMG 0,7917 0,7667 0,7361 0,6938 0,6083 0,3417 0,6122 0,5777 0,5656 0,5457 0,5017 0,3700 0,0991 0,1216
KIDS IOOA4 Automatic NOFB IMG 0,6750 0,6125 0,5778 0,5354 0,4486 0,3054 0,5701 0,5062 0,4798 0,4545 0,4016 0,3303 0,0632 0,0930
Lpiras Run_1_1 Feedback and/or human assistance BINARY IMG 0,8000 0,7083 0,6222 0,5646 0,4903 0,2825 0,6265 0,5642 0,5165 0,4835 0,4342 0,3087 0,0692 0,0852
Lpiras Run_3_2 Feedback and/or human assistance BINARY IMG 0,6583 0,5667 0,4667 0,3958 0,2972 0,1425 0,4857 0,4404 0,3917 0,3466 0,2853 0,1795 0,0319 0,0363

Bold entries denote the best values for the given metric. The evaluation has been carried out with trec_eval version 9.0.

Task 2: Retrieval of Events

The ground truth has been obtained from the original photographers because they are most likely to know what their pictures depict. Event information is saved as an IPTC keyword as follows: wnet:"holiday"<London>. For semantic clarification, the term following wnet: is a general term that can be found in WordNet, whereas the term within angle brackets contains optional information to specify a unique event. One might notice that holiday-related events dominate all others. This reflects the findings of the studied real-world personal photo collections and is not a freely chosen bias. In total, there are 61 unique events available.
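
For illustration, such a keyword can be split into the general WordNet term and the optional event specifier, e.g. with a regular expression. A minimal Python sketch (the keyword string is taken from the example above):

# Minimal sketch: split an IPTC event keyword of the form wnet:"holiday"<London>
# into the general WordNet term and the optional unique-event specifier.
import re

def parse_event_keyword(keyword):
    m = re.match(r'wnet:"([^"]+)"(?:<([^>]*)>)?', keyword)
    if not m:
        return None
    return m.group(1), m.group(2)  # the specifier may be None

print(parse_event_keyword('wnet:"holiday"<London>'))  # ('holiday', 'London')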

WordNet Event Type %
A. Conference 0.65 %
B. Event 0.36 %
C. Excursion 7.23 %
D. Flight 1.78 %
E. Holiday 77.86 %
F. Jubilation 0.49 %
G. Party 1.33 %
H. Rock Concert 8.70 %
I. Scuba Diving 1.03 %
J. Soccer 0.04 %
K. Visit 0.54 %

Table 2: Event Class Frequency

Simulated Interaction (Experimental)

The simulated interaction data (personas including their assessments) described for Task 1 applies to Task 2 as well and will be released after the assessments have been analyzed.

More information about personas.

User-centered Initiative

The user-centered initiative described for Task 1 also applies to Task 2: participants are encouraged to filter or cluster their result sets, the accompanying IPTC data must not be used, and training data will not be provided.

Query Data and Submission

The queries are available as a ZIP archive containing an XML file in the following format.

<eventqueries>
	<query id='Q6501' title='conference'>
		<qbe docpath='DSC00414.jpg'>2191</qbe>
		<qbe docpath='P1010871.jpg'>3613</qbe>
		<qbe docpath='DSC00418.jpg'>2195</qbe>
	</query>
	<query id='Q6502' title='fire'>
		<qbe docpath='PICT5067.jpg'>3948</qbe>
		<qbe docpath='PICT5103.jpg'>3962</qbe>
	</query>
...
</eventqueries>

There are at most 3 QBE documents for each query. The QBE documents should be used to retrieve all other documents from the same event. The QBE documents are fully relevant according to our assessors. The attribute 'docpath' contains the path to a document in the collection and is only provided for convenience. For submission, please use only the file IDs that range from 1 to 5,555.

Participants are free to experiment with whatever methods they wish for image retrieval, e.g. relevance feedback or the integration of other modalities such as spatial or temporal information. We ask participants to indicate which of the following applies to each of their runs using the notation described above.

Task 2: (Preliminary) Results

 

Group Run ID Run Type Relevance Feedback Retrieval Type P_5 P_10 P_15 P_20 P_30 P_100 ndcg_cut_5 ndcg_cut_10 ndcg_cut_15 ndcg_cut_20 ndcg_cut_30 ndcg_cut_100 map_cut_30 map_cut_100
KIDS OOMA0 Automatic NOFB MET 1,0000 1,0000 0,9644 0,9333 0,8889 0,6787 1,0000 1,0000 0,9837 0,9697 0,9586 0,9126 0,3305 0,5533
KIDS IOMA0 Automatic NOFB IMGMET 1,0000 1,0000 0,9644 0,9267 0,8756 0,6307 1,0000 1,0000 0,9841 0,9655 0,9489 0,8601 0,3225 0,4947
KIDS IOMA0-2 Automatic NOFB IMGMET 0,9333 0,9000 0,8533 0,8100 0,7622 0,5740 0,9417 0,9153 0,8884 0,8636 0,8458 0,8042 0,2800 0,4282
KIDS IOMA0-3 Automatic NOFB IMGMET 0,9200 0,8733 0,8400 0,7867 0,6956 0,4613 0,9201 0,8877 0,8681 0,8357 0,7854 0,6638 0,2287 0,3179
KIDS IOOA0 Automatic NOFB IMG 0,6533 0,5800 0,5156 0,4833 0,4467 0,2693 0,6904 0,6247 0,5727 0,5446 0,5186 0,4101 0,1100 0,1484
REGIM run8 Automatic NOFB IMGBRO 0,2400 0,2067 0,1911 0,1767 0,1844 0,1447 0,2408 0,2185 0,2058 0,1936 0,1953 0,1687 0,0153 0,0264
REGIM run7 Automatic NOFB IMGBRO 0,2400 0,2067 0,1867 0,1733 0,1844 0,1447 0,2424 0,2188 0,2030 0,1915 0,1953 0,1687 0,0152 0,0263
REGIM run9 Automatic NOFB IMGBRO 0,2400 0,2067 0,1867 0,1733 0,1800 0,1440 0,2424 0,2186 0,2028 0,1913 0,1922 0,1681 0,0151 0,0261

Bold entries denote the best values for the given metric. The evaluation has been carried out with trec_eval version 9.0.

Data set

The data set is described in detail in a technical report, which is available here. It consists of 5,555 images plus rich metadata as they were found on the hard disks of the 19 contributors, whose years of birth range from 1944 to 1985. Thus, the content of the collection can be interpreted as a mirror of a photographer's life span, with typical changes in usage behavior, cameras, topics, and places.

Figure 4: Contribution by each Photographer (left); Global Distribution of Photographs (right)

 

Motif Quality and Duplicates

During our study of personal photo collections, it became obvious that all of them contained a certain amount of duplicate images. A motif duplicate (MD), as defined for the scope of this task, is a photograph that has been taken two or more times in succession with the photographer's intention to depict the same motif. These MDs are characterized by the fact that the photographer mostly did not move but shot the same motif again, correcting the rotation, translation, shutter speed, or the like, because the composition did not look right. Hence, such MDs have a low visual variance. Fig. 5 illustrates a motif duplicate using the camera's zoom, whereas Fig. 6 shows a variance in rotation and zooming. Other reasons are unsharp images or the choice of a wrong picture format (format switch). In other words, MDs vary more in image quality than in motif.

Figure 5: Motif Duplicate with Zooming

Figure 6: Complex Motif Duplicate with Zooming and Rotation

Table 3 (see below) provides a full list of possible categories. Note that the category association is not exclusive, i.e., a motif duplicate can be both rotated and zoomed or the like.

Type                                  No. of Documents   %
1. Motif Duplicates                   379                6.82 %
    1.1 Unmodified                    15                 8.38 %
    1.2 Translated                    71                 39.66 %
    1.3 Format Switch                 21                 11.73 %
    1.4 Zoomed                        75                 41.90 %
    1.5 Rotated                       26                 14.53 %
    1.6 Sharpened                     18                 10.06 %
    1.7 Altered (Lighting or Effect)  12                 6.70 %
2. Blurred Images                     231                4.16 %
3. Backlit Images                     204                3.67 %
4. Silhouettes (Shadows etc.)         119                2.14 %
5. Altered Images (e.g. sepia)        54                 0.97 %
6. Rendered Date                      1106               19.91 %
7. Panorama Images                    14                 0.25 %

Table 3: Motif Duplicates and Motif Qualities

The full list of images and their categorization is available here.

Visual features

We provide the following low-level visual features for the data set in order to improve the comparability of the submitted runs. Participants may combine them freely or use their own implementations (a simple fusion sketch follows the list below).

  1. Auto Color Correlogram
  2. BIC
  3. BRIEF (5 variants)
  4. CEDD
  5. Color Histogram
  6. Color Layout (MPEG-7)
  7. Color Structure (MPEG-7)
  8. Contour Shape (MPEG-7)
  9. Dominant Color (MPEG-7)
  10. Edge Histogram (MPEG-7)
  11. FCTH
  12. Gabor
  13. Region Shape (MPEG-7)
  14. Scalable Color (MPEG-7)
  15. SURF (5 variants)
  16. SURF Hash (5 variants)
  17. Tamura
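
As an illustration of such a combination, the sketch below performs a simple weighted late fusion of per-feature distances. It assumes that the distances have already been computed and normalized to [0, 1]; the feature names and weights are hypothetical and not prescribed by the task.

# Minimal sketch: weighted late fusion of per-feature distances between a QBE
# document and a candidate document. 'distances' maps feature names to distances
# normalized to [0, 1]; all names and weights are hypothetical.
def fused_distance(distances, weights=None):
    weights = weights or {name: 1.0 for name in distances}
    total = sum(weights.values())
    return sum(weights[name] * dist for name, dist in distances.items()) / total

d = fused_distance({'CEDD': 0.21, 'FCTH': 0.34, 'Edge Histogram': 0.18},
                   weights={'CEDD': 2.0, 'FCTH': 1.0, 'Edge Histogram': 1.0})
print(d)  # lower values indicate higher similarity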

Using face detection features is not encouraged in this task because some faces had to be anonymized to preserve individual privacy rights. To compensate, a manually created Excel sheet with the number of depicted persons for each image is provided and can be used as face detection input. The intervals of depicted people and their frequency in the data set can be found in Fig. 7.

Figure 7: Number of People Depicted on Images

More information will be released soon.

Metadata

All images come with full EXIF data as written by the camera used (i.e. raw data); 81.85% of the images contain GPS data that has been added manually or set automatically by GPS-supporting camera models. The file names have not been changed (apart from the removal of special characters) and still reflect the names given by the original photographers.

Additionally, IPTC data - partially providing the ground truth - is available. This data also contains the photographer and must not be used to solve the tasks.

One of our cooperation partners from the field of marketing research is currently collecting sentiment information for the data set. If you are interested in obtaining the results, contact David (see below), as we cannot guarantee a release within the ImageCLEF deadlines.

Additional Information about Photographers and Assessors

Demographic information and survey results of both contributing photographers and assessors will be released soon.

The survey, including the encryption keys, can be found here.

Download

Download information will be provided in time via the registration system. Alternatively, it can be obtained from the organizers (see below).

Schedule

This schedule is preliminary and might be adjusted slightly.

License

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Germany License.

Please cite this report for proper attribution:

@inproceedings{Zel12,
author = {Zellh\"ofer, David},
title = {An Extensible Personal Photograph Collection for Graded Relevance Assessments and User Simulation},
pages = {to appear},
publisher = {ACM},
isbn = {978-1-4503-1329-2},
series = {ICMR '12},
booktitle = {{P}roceedings of the 2nd {A}{C}{M} {I}nternational {C}onference on {M}ultimedia {R}etrieval},
year = {2012}
}

A technical report with similar content is available.

"© ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
The definitive version was published in Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, {VOL#, ISS#, (DATE)} http://doi.acm.org/10.1145/nnnnnn.nnnnnn"


Organizers

This research was supported by a grant of the Federal Ministry of Education and Research (Grant Number 03FO3072).
