STEM-NER-60k

A Large-scale Dataset of STEM Science as PROCESS, METHOD, MATERIAL, and DATA Named Entities

This repository hosts data from a follow-up study to the following publications:

D'Souza, J., Hoppe, A., Brack, A., Jaradeh, M., Auer, S., & Ewerth, R. (2020). The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 2192–2203). European Language Resources Association.

Brack, A., D'Souza, J., Hoppe, A., Auer, S., & Ewerth, R. (2020). Domain-Independent Extraction of Scientific Concepts from Research Articles. In Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12035. Springer, Cham. https://doi.org/10.1007/978-3-030-45439-5_17

Supporting dataset: https://data.uni-hannover.de/dataset/stem-ecr-v1-0

Description

Roughly 60,000 titles and abstracts of scholarly articles published under the redistributable CC-BY license were downloaded from Elsevier. The articles span the 10 most prolific STEM domains on Elsevier, viz., Agriculture, Astronomy, Biology, Chemistry, Computer Science, Earth Science, Engineering, Material Science, and Mathematics. The STEM NER system reported in the publication above was applied to these articles, producing an automatically extracted dataset of four entity types: Process, Method, Material, and Data.

What does this repository contain?

Aggregated lists of Process, Method, Material, and Data entities, with their occurrence counts, extracted from 59,984 scholarly publications and organized by the 10 STEM domains considered.

Additionally, the list of Elsevier CC-BY articles used in this study is provided in the raw-data directory of the repository.
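
The aggregated entity lists described above pair each entity with an occurrence count per domain. As a minimal sketch of how such lists could be loaded and merged across domains, the Python snippet below assumes a tab-separated "entity<TAB>count" layout, one entity per line. That layout, the function name parse_entity_counts, and the sample rows are all assumptions made for illustration; check the actual files in the repository before relying on it.

```python
from collections import Counter
from typing import Iterable


def parse_entity_counts(lines: Iterable[str], sep: str = "\t") -> Counter:
    """Parse 'entity<sep>count' lines into a Counter (hypothetical helper).

    The separator and column order are assumptions, not the documented
    format of the repository files.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last separator so entities may contain the separator.
        entity, _, count = line.rpartition(sep)
        if not entity or not count.strip().isdigit():
            continue  # skip header rows or malformed rows
        counts[entity.strip()] += int(count)
    return counts


# Made-up illustrative rows, NOT values taken from the dataset.
SAMPLE_CS = "neural network\t12\nalgorithm\t30\n"
SAMPLE_MATH = "algorithm\t8\nproof\t5\n"

cs = parse_entity_counts(SAMPLE_CS.splitlines())
math_counts = parse_entity_counts(SAMPLE_MATH.splitlines())

# Adding two Counters sums the counts of entities shared across domains.
combined = cs + math_counts
print(combined.most_common(2))  # [('algorithm', 38), ('neural network', 12)]
```

To read a real file, the same function can be given an open file handle, since it accepts any iterable of lines.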

Cite this as

Jennifer D'Souza (2022). STEM-NER-60k [Data set]. LUIS. https://doi.org/10.25835/heyid7l7
Retrieved: 04:33 16 May 2026 (UTC)

Additional Information

Field: Value
Source: https://github.com/jd-coderepos/stem-ner-60k
Author: Jennifer D'Souza
Maintainer: Jennifer D'Souza
Last updated: May 24, 2022, 13:26 (UTC)
Created: May 24, 2022, 07:41 (UTC)
License: Creative Commons Attribution Share-Alike 3.0
Dataset size: 35.0 MB