MM_Claims Dataset

doi:10.25835/2lg7peic

This dataset was introduced in the paper "MM-Claims: A Dataset for Multimodal Claim Detection in Social Media".

If you use this dataset in your work, please cite:

@inproceedings{cheema-etal-2022-mm,
    title = "{MM}-Claims: A Dataset for Multimodal Claim Detection in Social Media",
    author = {Cheema, Gullal Singh and
      Hakimov, Sherzod and
      Sittar, Abdul and
      M{\"u}ller-Budack, Eric and
      Otto, Christian and
      Ewerth, Ralph},
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.72",
    pages = "962--979"
}

Information about columns in the files:

claim_binary: {0: 'Not a claim', 1: 'claim'}
claim_three: {0: 'Not a claim', 1: 'claim but not check-worthy', 2: 'check-worthy claim'}
claim_vis: {0: 'Not a claim', 1: 'visually-irrelevant claim', 2: 'visually-relevant claim'}
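
The mappings above can be applied when reading any of the labeled CSV files. The following is a minimal sketch using only the Python standard library. It assumes the label columns hold integer codes; the `tweet_id` column name and the three sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Label mappings copied from the column description above.
LABELS = {
    "claim_binary": {0: "Not a claim", 1: "claim"},
    "claim_three": {0: "Not a claim", 1: "claim but not check-worthy", 2: "check-worthy claim"},
    "claim_vis": {0: "Not a claim", 1: "visually-irrelevant claim", 2: "visually-relevant claim"},
}

# Hypothetical stand-in for a labeled file; the real files may contain
# different or additional columns.
SAMPLE = (
    "tweet_id,claim_binary,claim_three,claim_vis\n"
    "111,0,0,0\n"
    "222,1,1,2\n"
    "333,1,2,1\n"
)

def decode_rows(handle):
    """Read CSV rows and add a '<column>_name' field for each label column."""
    rows = []
    for row in csv.DictReader(handle):
        decoded = dict(row)
        for column, names in LABELS.items():
            decoded[column + "_name"] = names[int(row[column])]
        rows.append(decoded)
    return rows

decoded = decode_rows(io.StringIO(SAMPLE))
for row in decoded:
    print(row["tweet_id"], row["claim_three_name"], "|", row["claim_vis_name"])
```

To use a downloaded file instead of the sample string, pass an open file handle, for example `decode_rows(open("train_with_resolved_conflicts.csv", newline="", encoding="utf-8"))`.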

Official code repository: https://github.com/TIBHannover/MM_Claims

All files were updated on 5th May 2023; some images were removed because they were obscene and had not been detected automatically in the first phase.

If you are interested in the binary task of check-worthiness estimation in multimodal claims, a refined dataset with new test data was released as part of the CLEF CheckThat! 2023 challenge: https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main

Data and Resources

train_with_resolved_conflicts.csv (CSV)
Training split with annotation conflicts resolved according to the strategy...
File size: 79.1 KByte

train_without_conflicts.csv (CSV)
Training split with annotation conflict samples removed.
File size: 71.8 KByte

val_with_resolved_conflicts.csv (CSV)
Validation split with annotation conflicts resolved according to the strategy...
File size: 8.9 KByte

val_without_conflicts.csv (CSV)
Validation split with annotation conflict samples removed.
File size: 8.0 KByte

test_with_resolved_conflicts.csv (CSV)
Testing split with annotation conflicts resolved according to the strategy...
File size: 18.3 KByte

test_without_conflicts.csv (CSV)
Testing split with annotation conflict samples removed.
File size: 16.4 KByte

unlabeled_tweet_ids.csv (CSV)
Unlabeled data tweet IDs.
File size: 1.9 MByte
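
The six labeled files follow a `<split>_<variant>.csv` naming pattern, so a small helper can select the right file for an experiment. This is a sketch only: the function name `split_path` and the `data_dir` parameter are hypothetical, and it assumes the files were downloaded under their listed names into a single directory.

```python
from pathlib import Path

# Split and variant names taken from the file listing above.
SPLITS = ("train", "val", "test")
VARIANTS = ("with_resolved_conflicts", "without_conflicts")

def split_path(split, variant, data_dir="."):
    """Build the path to one labeled CSV file (data_dir is an assumed download location)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return Path(data_dir) / f"{split}_{variant}.csv"

# Enumerate all six labeled files in listing order.
all_files = [split_path(s, v).name for s in SPLITS for v in VARIANTS]
for name in all_files:
    print(name)
```

Keeping the variant as an explicit argument makes it harder to mix, say, a training split with resolved conflicts and a test split with conflict samples removed by accident.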

Cite this as

Gullal S. Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, Ralph Ewerth (2022). MM_Claims Dataset [Data set]. LUIS. https://doi.org/10.25835/2lg7peic

Retrieved: 21:38 26 Jul 2026 (UTC)

Additional Info

Field	Value
Author	Gullal S. Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, Ralph Ewerth
Maintainer	Gullal S. Cheema
Version	1.1
Last Updated	July 10, 2023, 09:52 (UTC)
Created	May 2, 2022, 09:51 (UTC)
License	Creative Commons Attribution-NonCommercial 3.0
Dataset Size	2.1 MByte