NLPContributions Pilot Dataset

An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature

This dataset is the result of a pilot annotation exercise to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches to various information extraction tasks. The exercise was performed on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: (1) machine translation, (2) named entity recognition, (3) question answering, (4) relation classification, and (5) text classification.

The outcome of this pilot annotation exercise was two-fold: 1) a preliminary annotation methodology, and 2) the dataset released in this repository.

The resulting annotation scheme is called NLPContributions.

Supporting Publication

D'Souza, Jennifer, and Sören Auer. "NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature." arXiv preprint arXiv:2006.12870 (2020).

Cite this as

Jennifer D’Souza; Soeren Auer (2020). Dataset: NLPContributions Pilot Dataset. https://doi.org/10.25835/0019761

DOI retrieved: 17:03 01 Oct 2020 (GMT)

Additional Info

Source: https://github.com/jenlindadsouza/NLPContributions
Author: Jennifer D’Souza; Soeren Auer
Maintainer: Jennifer D'Souza
Last Updated: July 24, 2020, 14:08 (CEST)
Created: July 3, 2020, 14:08 (CEST)
License: Creative Commons Attribution Share-Alike 3.0