ReadMe2KG: GitHub ReadMe to Knowledge Graph

The vision of NFDI4DataScience (NFDI4DS) is to support all steps of the complex and interdisciplinary research data lifecycle, including collecting/creating, processing, analyzing, publishing, archiving, and reusing resources in Data Science and Artificial Intelligence. GitHub is a popular platform for hosting and collaborating on software projects. In a research context, authors can use GitHub repositories to share the datasets, models, and source code behind the experiments reported in their papers. These repositories provide implementation details and facilitate the exploration and reproduction of research results. Each GitHub repository typically includes a README.md file, which serves as an introductory document for the project. READMEs are usually written in Markdown and provide key information such as the project’s purpose, setup instructions, usage examples, and often links to the original research paper. To enrich the NFDI4DS-KG [1] with information from GitHub README files, a fine-grained Named Entity Recognition (NER) task is proposed.

Participants will develop classifiers that take README files as input and output mentions of the 10 entity types in the NFDI4DS Ontology (NFDI4DSO [2]): “Conference”, “Dataset”, “Evaluation Metric”, “License”, “Ontology”, “Programming Language”, “Project”, “Publication”, “Software” and “Workshop”. A dataset of approximately 160 README.md files will be made available to train the classifiers.
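To make the input/output shape of the task concrete, here is a minimal Python sketch. It is not the official baseline or submission format: the `Mention` record, the character-offset convention, the toy `GAZETTEER`, and the `tag_readme` function are all hypothetical names chosen for illustration. Only the ten entity type labels come from the task description above. A competitive system would learn from the training data rather than match a fixed word list.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The ten entity types named in the task description.
ENTITY_TYPES = (
    "Conference", "Dataset", "Evaluation Metric", "License", "Ontology",
    "Programming Language", "Project", "Publication", "Software", "Workshop",
)


@dataclass(frozen=True)
class Mention:
    """One entity mention (hypothetical output record, not the official format)."""
    start: int  # character offset into the README, inclusive
    end: int    # character offset into the README, exclusive
    label: str  # one of ENTITY_TYPES
    text: str   # the matched surface string


# Toy gazetteer, assumed purely for illustration.
GAZETTEER = {
    "License": ["MIT License", "Apache License 2.0"],
    "Programming Language": ["Python", "Java"],
    "Software": ["PyTorch"],
}


def tag_readme(readme: str) -> list[Mention]:
    """Return gazetteer matches in the README text, ordered by position."""
    mentions = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            # Word boundaries keep "Java" from matching inside "JavaScript".
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, readme):
                mentions.append(
                    Mention(match.start(), match.end(), label, match.group())
                )
    return sorted(mentions, key=lambda m: m.start)


readme = "# Demo\nTrained with PyTorch in Python. Released under the MIT License.\n"
for mention in tag_readme(readme):
    print(mention.label, repr(mention.text), mention.start, mention.end)
```

Storing character offsets alongside the label lets each mention be traced back to its exact position in the README, which is the usual basis for span-level NER evaluation.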

More details are available at the ReadMe2KG competition website.

Organisers

  • Genet Asefa Gesese (FIZ Karlsruhe, Germany)
  • Zongxiong Chen (Fraunhofer FOKUS, Germany)
  • Shufan Jiang (FIZ Karlsruhe, Germany)
  • Mary Ann Tan (FIZ Karlsruhe, Germany)
  • Sonja Schimmler (Fraunhofer FOKUS, Germany)

Contact

  • Genet Asefa Gesese (genet-asefa.gesese@fiz-karlsruhe.de)
  • Zongxiong Chen (zongxiong.chen@fokus.fraunhofer.de)
  • Shufan Jiang (shfuan.jiang@fiz-karlsruhe.de)

Important dates

  • Release of training datasets: January 25, 2025
  • Release of testing datasets: March 18, 2025 (extended from February 15, 2025)
  • Deadline for system submissions: March 25, 2025 (extended from February 22, 2025)
  • Paper submission deadline: March 27, 2025 (extended from March 6, 2025)
  • Notification of acceptance: April 10, 2025 (extended from April 3, 2025)
  • Camera-ready submission: April 17, 2025
  • Workshop: June 1 or 2, 2025

All deadlines are 23:59 UTC-12:00 (“anywhere on Earth”).
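Since “anywhere on Earth” is a fixed UTC-12:00 offset, a deadline can be converted to UTC by adding 12 hours. The short Python sketch below illustrates this with the camera-ready date. The helper name `aoe_deadline_utc` is introduced here for illustration only.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is a fixed offset of UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_utc(year: int, month: int, day: int) -> datetime:
    """Return 23:59 AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Camera-ready submission: April 17, 2025, 23:59 AoE.
camera_ready = aoe_deadline_utc(2025, 4, 17)
print(camera_ready.isoformat())  # 2025-04-18T11:59:00+00:00
```

In other words, each deadline falls at 11:59 UTC on the day after the listed date.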

References

[1] NFDI4DS-KG https://nfdi.fiz-karlsruhe.de/4ds/sparql, https://nfdi.fiz-karlsruhe.de/4ds/shmarql

[2] Genet Asefa Gesese et al. “NFDI4DSO: Towards a BFO Compliant Ontology for Data Science”. In: arXiv preprint arXiv:2408.08698 (2024).