ReadMe2KG: GitHub ReadMe to Knowledge Graph

The vision of NFDI4DataScience (NFDI4DS) is to support all steps of the complex and interdisciplinary research data lifecycle, including collecting/creating, processing, analyzing, publishing, archiving, and reusing resources in Data Science and Artificial Intelligence. GitHub is a popular platform for hosting and collaborating on software projects. In the context of research, authors can use GitHub repositories to share the datasets, models, and source code of the experiments described in their papers. These repositories can provide implementation details and facilitate the exploration and reproduction of research results. Each GitHub repository typically includes a README.md file, which serves as an introductory document for the project. READMEs are usually written in Markdown and provide key information such as the project’s purpose, setup instructions, usage examples, and often links to the original research paper. To enrich the NFDI4DS-KG with information from GitHub README files, we propose the following two subtasks.

Subtask I: Fine-grained Named Entity Recognition. Participants will develop classifiers that take README files as input and output mentions of entities belonging to the approximately 50 classes of the NFDI4DS Ontology (NFDI4DSO), e.g., “person”, “project”, “website”, “dataset”, “creative work”. A dataset of approximately 2000 README.md files, annotated in IOB2 format, will be made available to train the classifiers. The classifiers will be evaluated using micro- and macro-averaged precision, recall, and F1.
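To illustrate the data format and the metrics named in Subtask I, the following minimal sketch decodes IOB2 tag sequences into entity spans and computes micro- and macro-averaged precision, recall, and F1. It is not the official scorer. The tag spellings (`B-DATASET`, `B-PERSON`), the exact-span matching criterion, and the choice to macro-average over the classes observed in gold or predicted output are assumptions made for illustration only.

```python
from collections import defaultdict


def iob2_to_spans(tags):
    """Decode IOB2 tags into a set of (start, end_exclusive, label) spans.

    A stray I- tag that does not continue a span of the same label is ignored.
    """
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, pred):
    """Return (micro, macro) triples of (precision, recall, F1) over exact spans."""
    counts = defaultdict(lambda: [0, 0, 0])  # label -> [tp, fp, fn]
    for span in pred:
        counts[span[2]][0 if span in gold else 1] += 1
    for span in gold - pred:
        counts[span[2]][2] += 1
    # Micro: pool the counts of all classes, then score once.
    micro = prf(*(sum(c[k] for c in counts.values()) for k in range(3)))
    # Macro: score each class separately, then take the unweighted mean.
    per_class = [prf(*c) for c in counts.values()]
    if not per_class:
        return micro, (0.0, 0.0, 0.0)
    macro = tuple(sum(s[k] for s in per_class) / len(per_class) for k in range(3))
    return micro, macro


# Hypothetical toy example: the prediction misses the PERSON mention.
gold_spans = iob2_to_spans(["B-DATASET", "I-DATASET", "O", "B-PERSON"])
pred_spans = iob2_to_spans(["B-DATASET", "I-DATASET", "O", "O"])
micro, macro = micro_macro(gold_spans, pred_spans)
```

In this toy case the micro scores pool both classes, while the macro scores give the missed “person” class the same weight as the correctly found “dataset” class. With roughly 50 classes of very different frequency, this is why the two averages can diverge noticeably.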

Subtask II: Entity Linking. Participants will develop a method to link entity mentions in README files to the corresponding entities in the NFDI4DS-KG. A dataset of approximately 2000 README.md files for entity linking will be made available. The systems will be evaluated using micro- and macro-averaged precision, recall, and F1.
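As a rough illustration of what an entity linking system for Subtask II has to produce, the sketch below implements a simple exact-match baseline: mention strings are normalized and looked up in an index of entity labels. It is only a sketch under stated assumptions. The entity identifiers (`ex:E1`, `ex:E2`), the label lists, and the use of `None` for mentions without a match are hypothetical and do not reflect the actual NFDI4DS-KG schema or the task's submission format.

```python
import re


def normalize(text):
    """Lowercase and collapse non-alphanumeric runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def build_index(kg_labels):
    """Map each normalized label to an entity ID (first entity wins on clashes)."""
    index = {}
    for entity_id, labels in kg_labels.items():
        for label in labels:
            index.setdefault(normalize(label), entity_id)
    return index


def link(mentions, index):
    """Return {mention: entity ID or None} using exact match on normalized text."""
    return {mention: index.get(normalize(mention)) for mention in mentions}


# Hypothetical toy knowledge graph: entity ID -> known labels.
kg_labels = {
    "ex:E1": ["ImageNet", "Image-Net"],
    "ex:E2": ["BERT"],
}
index = build_index(kg_labels)
links = link(["imagenet", "Image Net", "bert", "FooBar"], index)
```

A baseline of this kind cannot resolve ambiguous labels or mentions that are phrased differently from any known label, which is where context-aware linking methods would be expected to improve on it.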

Organisers

  • Genet Asefa Gesese (FIZ Karlsruhe, Germany)
  • Zongxiong Chen (Fraunhofer FOKUS, Germany)
  • Shufan Jiang (FIZ Karlsruhe, Germany)
  • Sonja Schimmler (Fraunhofer FOKUS, Germany)

Contact

TBA

Important dates

TBA