SlideImages

Please note: this archive requires support for dangling symlinks, which excludes the Windows operating system.

To use this dataset, download the MS COCO 2017 detection images and extract them into a folder called coco17 in the train_val_combined directory. The download can be found here: https://cocodataset.org/#download

You will also need to download the AI2D image description dataset and extract it into a folder called ai2d in the train_val_combined directory. The download can be found here: https://prior.allenai.org/projects/diagram-understanding
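Because the archive relies on symlinks that stay dangling until coco17 and ai2d are in place, it can be useful to check that every link resolves after extraction. The following is a minimal sketch, not part of the dataset tooling; the function name find_dangling_symlinks is hypothetical, and it assumes the links live somewhere under the directory you pass in.

```python
import os


def find_dangling_symlinks(root):
    """Return the sorted paths of symlinks under root whose targets do not exist."""
    dangling = []
    # os.walk does not follow symlinks by default; broken links show up in filenames,
    # links to existing directories show up in dirnames.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() is true for any symlink; exists() is false if its target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)


if __name__ == "__main__":
    # Directory name taken from the instructions above; adjust the path as needed.
    missing = find_dangling_symlinks("train_val_combined")
    print(f"{len(missing)} dangling symlink(s)")
    for path in missing[:10]:
        print("  ", path)
```

An empty result suggests that both downloads were extracted to the expected folder names; a non-empty one lists the links that still point at missing files.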

License Notes for Train and Val: Since the images in this dataset come from different sources, they are bound by different licenses.

Images for bar charts, x-y plots, maps, pie charts, tables, and technical drawings were downloaded directly from Wikimedia Commons. License and authorship information for each image in these categories is stored in the wikimedia_commons_licenses.csv file. Each row (note: some rows are multi-line) is formatted as follows: ,,,;

Images in the slides category were taken from presentations which were downloaded from Wikimedia Commons. The names of the presentations on Wikimedia Commons omit the trailing underscore, number, and file extension, and end with .pdf instead. The source materials' licenses are shown in source_slices_licenses.csv.

The information page for each Wikimedia Commons photo can be found at "https://commons.wikimedia.org/wiki/File:".
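The naming rule for slide images and the information page prefix can be combined as in the sketch below. It rests on two assumptions that the text does not spell out: that a slide image name ends in an underscore, a number, and an extension (for example My_Talk_12.png), and that the Commons file name is appended directly to the "File:" prefix. The helper names presentation_name and commons_info_url are hypothetical.

```python
import re
from urllib.parse import quote

COMMONS_FILE_PREFIX = "https://commons.wikimedia.org/wiki/File:"


def presentation_name(slide_image_name):
    """Map a slide image name to its presentation name on Wikimedia Commons.

    Assumed pattern: <presentation>_<number>.<extension> becomes <presentation>.pdf
    """
    name, count = re.subn(r"_\d+\.[A-Za-z0-9]+$", ".pdf", slide_image_name)
    if count == 0:
        raise ValueError(f"unexpected slide image name: {slide_image_name!r}")
    return name


def commons_info_url(file_name):
    """Build the information page URL, assuming the file name follows the prefix."""
    # Commons page titles use underscores in place of spaces.
    return COMMONS_FILE_PREFIX + quote(file_name.replace(" ", "_"))


if __name__ == "__main__":
    pdf = presentation_name("My_Talk_12.png")  # hypothetical file name
    print(commons_info_url(pdf))
```

For images in the other Wikimedia Commons categories, commons_info_url can be called on the image file name directly.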

License Notes for Testing: The testing images have been uploaded to SlideWiki by SlideWiki users. The image authorship and copyright information is available in authors.csv.

Further information can be found for each image using the SlideWiki file service. Documentation is available at https://fileservice.slidewiki.org/documentation#/ and in particular: metadata is available at "https://fileservice.slidewiki.org/metadata/", and the image can be accessed at "https://fileservice.slidewiki.org/picture/".
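As a rough illustration, the two file service addresses could be assembled per image as sketched below. This assumes that the image file name is appended to each base URL; the notes above do not state the exact path parameter, so check it against the file service documentation before relying on it. The helper name slidewiki_urls and the example file name are hypothetical.

```python
from urllib.parse import quote

FILE_SERVICE = "https://fileservice.slidewiki.org"


def slidewiki_urls(image_name):
    """Return the assumed metadata and picture URLs for one test image."""
    # Assumption: the image file name is the final path segment of both URLs.
    name = quote(image_name)
    return {
        "metadata": f"{FILE_SERVICE}/metadata/{name}",
        "picture": f"{FILE_SERVICE}/picture/{name}",
    }


if __name__ == "__main__":
    for kind, url in slidewiki_urls("example.png").items():
        print(kind, url)
```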

This is the SlideImages dataset, which has been assembled for the SlideImages paper. If you find the dataset useful, please cite our paper: https://doi.org/10.1007/978-3-030-45442-5_36

Cite this as

David Morris, Eric Müller-Budack, Ralph Ewerth (2020). Dataset: SlideImages. https://doi.org/10.25835/0037153

DOI retrieved: 23:15 19 Jan 2021 (GMT)

Additional Info

Author: David Morris, Eric Müller-Budack, Ralph Ewerth
Last Updated: January 6, 2021, 17:05 (CET)
Created: December 17, 2020, 16:07 (CET)
License: Creative Commons Attribution Share-Alike 3.0