MM_Claims Dataset

This dataset was introduced in the paper "MM-Claims: A Dataset for Multimodal Claim Detection in Social Media".

If you use this dataset in your work, please cite:

@inproceedings{cheema-etal-2022-mm,
    title = "{MM}-Claims: A Dataset for Multimodal Claim Detection in Social Media",
    author = {Cheema, Gullal Singh and Hakimov, Sherzod and Sittar, Abdul and M{\"u}ller-Budack, Eric and Otto, Christian and Ewerth, Ralph},
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.72",
    pages = "962--979"
}

Information about columns in the files:

  1. claim_binary: {0: 'Not a claim', 1: 'claim'}

  2. claim_three: {0: 'Not a claim', 1: 'claim but not check-worthy', 2: 'check-worthy claim'}

  3. claim_vis: {0: 'Not a claim', 1: 'visually-irrelevant claim', 2: 'visually-relevant claim'}
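
The column descriptions above can be turned into a small lookup for decoding label codes into names. The following is a minimal sketch, not part of the official repository: the helper `decode_labels` and the `tweet_id` field in the example record are hypothetical, and the file format and any other column names are not specified on this page. Codes stored as strings (for example '1') are converted to integers before lookup.

```python
# Label maps copied from the column descriptions above.
CLAIM_BINARY = {0: "Not a claim", 1: "claim"}
CLAIM_THREE = {
    0: "Not a claim",
    1: "claim but not check-worthy",
    2: "check-worthy claim",
}
CLAIM_VIS = {
    0: "Not a claim",
    1: "visually-irrelevant claim",
    2: "visually-relevant claim",
}

LABEL_MAPS = {
    "claim_binary": CLAIM_BINARY,
    "claim_three": CLAIM_THREE,
    "claim_vis": CLAIM_VIS,
}


def decode_labels(row):
    """Return a copy of `row` with label codes replaced by their names.

    Hypothetical helper: only the three documented label columns are
    touched; any other keys are passed through unchanged.
    """
    decoded = dict(row)
    for column, mapping in LABEL_MAPS.items():
        if column in row:
            # int() accepts both 1 and "1", so string-typed codes also work.
            decoded[column] = mapping[int(row[column])]
    return decoded


# Hypothetical record; "tweet_id" is an assumed field, not confirmed by this page.
example = {"tweet_id": "123", "claim_binary": 1, "claim_three": "2", "claim_vis": 1}
print(decode_labels(example))
```

An unknown code raises a `KeyError`, which makes unexpected label values visible instead of silently passing them through.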

Official code repository: https://github.com/TIBHannover/MM_Claims

All files were updated on 5 May 2023. Some images were removed because they contained obscene content that was not detected automatically in the first phase.

If you are interested in the binary task of check-worthiness estimation for multimodal claims, a refined version of the dataset with new test data was released as part of the CLEF CheckThat! 2023 challenge: https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main

Cite this as

Gullal S. Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, Ralph Ewerth (2022). MM_Claims Dataset [Data set]. LUIS. https://doi.org/10.25835/2lg7peic
Retrieved: 08:17 04 May 2026 (UTC)

Additional Info

Field          Value
Author         Gullal S. Cheema, Sherzod Hakimov, Abdul Sittar, Eric Müller-Budack, Christian Otto, Ralph Ewerth
Maintainer     Gullal S. Cheema
Version        1.1
Last Updated   July 10, 2023, 09:52 (UTC)
Created        May 2, 2022, 09:51 (UTC)
License        Creative Commons Attribution-NonCommercial 3.0
Dataset Size   2.1 MByte