LREC 2016 - REAL Corpus


Our interest is in people’s capacity to efficiently and effectively describe geographic objects in urban scenes.
The broader ambition is to develop spatial models with equivalent functionality, able to construct such referring expressions.
To that end we present a newly crowd-sourced data set of natural language references to objects anchored in complex
urban scenes (In short: The REAL Corpus – Referring Expressions Anchored Language). The REAL corpus contains a collection
of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data
on how successfully other people were able to identify the same object based on these descriptions. In total, the corpus contains
32 images with on average 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated
with a variety of linguistically motivated features. The paper highlights issues posed by collecting data using crowd-sourcing with
an unrestricted input format, as well as using real-world urban scenes.


LREC2016 conference paper - REAL Corpus

REAL Corpus Dataset (60MB zip file)

Please cite if used:

Paper title: The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
Data title: The REAL Corpus
Authors: Phil Bartie, William Mackaness, Dimitra Gkatzia and Verena Rieser
Conference Name: LREC2016
Conference Date: 23-28 May 2016
Conference Location: Portorož (Slovenia)

Details of what is in the ZIP file:
The dataset includes a set of SOURCE images of features in typical urban scenes. A target was indicated in each image and participants were asked to describe that target (these words/phrases were typed by the participant). A validation process then asked other participants to read the description and tag (click) the object on the corresponding image. A set of validation images was generated to show whether the tagged location was correct.

+ Source Images – these are provided at two resolutions: a high-quality 3000 x 2000 pixel version and a lower-resolution 825 x 550 pixel version of the same image.
  - Source images are given the filename imgN.jpg, and a corresponding version of the image with the designated target indicated is saved as imgNt.jpg.
  - The participant saw the source version of the image but could briefly toggle to the target version to see which object in the scene to describe.

+ Validation Images – these images are 825 x 550 pixels and have superimposed GREEN (correct target) and RED (incorrect) dots showing where the validators clicked. This gives an indication of how well the description worked and of other features that were confused with the intended target.
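
The source/target naming convention described above can be sketched as a small helper. This is only an illustration: the function name `image_filenames` is hypothetical, and it assumes N is a plain non-negative integer with no zero padding. The folder layout for the two resolutions is not described here, so the helper returns bare filenames only.

```python
def image_filenames(n: int) -> tuple[str, str]:
    """Return (source, target-marked) filenames for image number n.

    Assumes the imgN.jpg / imgNt.jpg pattern with N written as a plain
    integer (no zero padding); adjust if the files in the ZIP differ.
    """
    if n < 0:
        raise ValueError("image number must be non-negative")
    return f"img{n}.jpg", f"img{n}t.jpg"
```

For example, `image_filenames(5)` gives the pair `img5.jpg` (the scene as the participant saw it) and `img5t.jpg` (the same scene with the target marked).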

The data collected from the web-based experiments are available in two formats (Excel XLSX and tab-delimited TXT).

+ ReferringExpressionsData_withValidationDetails.xlsx – contains the following fields:
  - userid (an integer)
  - age (a range class; see the supplied lookup table for details)
  - gender (male, female)
  - photoid (links to the source images)
  - x (x coordinate of the click)
  - y (y coordinate of the click)
  - annotation shown
  - status of validator (correct, incorrect, cantfind, ambiguous)
  - validator_userid
  - validator_age (see lookup table)
  - validator_gender (male, female)

+ ReferringExpressionsData_withValidationDetails-TAB_delimited.txt

+ Lookup table (age) – this gives the age ranges recorded in the results table (e.g. class 4 = 41yr-50yr).
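
As a rough starting point for working with the tab-delimited file, the sketch below computes, for each photo, the share of validations marked "correct". It makes several assumptions that should be checked against the actual file: that the first row is a header, that the relevant columns are literally named `photoid` and `status` (the column names are passed as parameters so they can be changed), and that the file is UTF-8 encoded. The function name `success_rate_by_photo` is hypothetical.

```python
import csv
from collections import Counter, defaultdict


def success_rate_by_photo(path, photo_col="photoid", status_col="status"):
    """Return {photoid: fraction of validations with status 'correct'}.

    Column names and encoding are assumptions; check the file's header row.
    """
    counts = defaultdict(Counter)
    with open(path, newline="", encoding="utf-8") as f:
        # The TXT export is described as tab-delimited.
        for row in csv.DictReader(f, delimiter="\t"):
            status = row[status_col].strip().lower()
            counts[row[photo_col]][status] += 1
    # Every validation row counts in the denominator, including
    # 'cantfind' and 'ambiguous' outcomes.
    return {photo: c["correct"] / sum(c.values()) for photo, c in counts.items()}
```

Calling `success_rate_by_photo("ReferringExpressionsData_withValidationDetails-TAB_delimited.txt")` would then return one success rate per photo. Treating "cantfind" and "ambiguous" as failures is a design choice of this sketch, not something prescribed by the corpus.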