ReadMe2KG: GitHub ReadMe to Knowledge Graph
The vision of NFDI4DataScience (NFDI4DS) is to support all steps of the complex and interdisciplinary research data lifecycle, including collecting/creating, processing, analyzing, publishing, archiving, and reusing resources in Data Science and Artificial Intelligence. GitHub is a popular platform for hosting and collaborating on software projects. In a research context, authors can use GitHub repositories to share the datasets, models, and source code of the experiments described in their papers. These repositories provide implementation details and facilitate the exploration and reproduction of research results. Each GitHub repository typically includes a README.md file, which serves as an introductory document for the project. READMEs are usually written in Markdown and provide key information such as the project’s purpose, setup instructions, usage examples, and often links to the original research paper. To enrich the NFDI4DS-KG [1] with information from GitHub README files, a fine-grained Named Entity Recognition task is proposed.
Participants will develop classifiers that take README files as input and output mentions of the following 10 entity types from the NFDI4DS Ontology (NFDI4DSO [2]): “Conference”, “Dataset”, “Evaluation Metric”, “License”, “Ontology”, “Programming Language”, “Project”, “Publication”, “Software” and “Workshop”. A dataset of approximately 160 README.md files will be made available to train the classifiers.
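To make the input and output of such a classifier concrete, the following is a minimal sketch, not the organisers' baseline or the official data format. It assumes mentions are represented as character-offset spans with a label, and it uses a tiny hand-written gazetteer (the `GAZETTEER` dictionary, the `Mention` class, and the `tag_readme` function are all hypothetical names introduced for illustration). A competitive system would instead learn from the released training data.

```python
import re
from dataclasses import dataclass
from typing import List

# The ten entity types named in the task description.
ENTITY_TYPES = [
    "Conference", "Dataset", "Evaluation Metric", "License", "Ontology",
    "Programming Language", "Project", "Publication", "Software", "Workshop",
]


@dataclass(frozen=True)
class Mention:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    label: str  # one of ENTITY_TYPES
    text: str   # surface form as it appears in the README


# Hypothetical toy gazetteer (an assumption for this sketch only).
GAZETTEER = {
    "License": ["MIT License", "Apache License 2.0"],
    "Programming Language": ["Python", "Java"],
    "Software": ["PyTorch"],
}


def tag_readme(readme: str) -> List[Mention]:
    """Return all gazetteer matches in the README, ordered by position."""
    mentions = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            # Require non-word characters on both sides so that
            # "Java" does not match inside "JavaScript".
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            for match in re.finditer(pattern, readme):
                mentions.append(
                    Mention(match.start(), match.end(), label, match.group())
                )
    return sorted(mentions, key=lambda m: (m.start, m.end))


if __name__ == "__main__":
    sample = (
        "# Demo\n"
        "Built with Python and PyTorch. Released under the MIT License.\n"
    )
    for mention in tag_readme(sample):
        print(mention.label, "->", mention.text)
```

Representing each mention as a span over the raw README text keeps the Markdown intact, which matters because headings, links, and code fences often carry the entity names. The actual submission format is defined on the submission website and may differ from this sketch.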
More details are available at the official submission website: https://www.codabench.org/competitions/4925/.
Organisers
- Genet Asefa Gesese (FIZ Karlsruhe, Germany)
- Zongxiong Chen (Fraunhofer FOKUS, Germany)
- Shufan Jiang (FIZ Karlsruhe, Germany)
- Mary Ann Tan (FIZ Karlsruhe, Germany)
- Sonja Schimmler (Fraunhofer FOKUS, Germany)
Contact
- Genet Asefa Gesese (FIZ Karlsruhe, Germany)
- Zongxiong Chen (Fraunhofer FOKUS, Germany)
- Shufan Jiang (FIZ Karlsruhe, Germany)
Important dates
- Release of training datasets: January 22, 2025
- Release of testing datasets: February 15, 2025
- Deadline for system submissions: February 22, 2025
- Announcement of results: February 27, 2025
- Paper submission deadline: March 6, 2025
- Notification of acceptance: April 3, 2025
- Camera-ready submission: April 17, 2025
- Workshop: June 1 or 2, 2025
All deadlines are 23:59 UTC-12:00 (“anywhere on Earth”).
References:
[1] NFDI4DS-KG https://nfdi.fiz-karlsruhe.de/4ds/sparql, https://nfdi.fiz-karlsruhe.de/4ds/shmarql
[2] Genet Asefa Gesese et al. “NFDI4DSO: Towards a BFO Compliant Ontology for Data Science”. In: arXiv preprint arXiv:2408.08698 (2024).