AbderrahimAl / Show-US-the-Data_Coleridge-Initiative

Discover how data is used for the public good.


Coleridge Initiative - Show US the Data

We use natural language processing (NLP) to automate the discovery of how scientific data are referenced in publications. The input is the full text of scientific publications from numerous research areas, gathered from CHORUS publisher members and other sources.

📌 Goal: The objective of the competition is to identify mentions of datasets within scientific publications.

📚 You can find the dataset here.

Notebook Content

    1. Importing necessary packages and libraries 📚
    2. Loading the data ⌛
    3. Data Pre-Processing 🔧
    4. Matching 📑
    5. Masked Language Modeling using HF 🤗 transformers
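
As a rough illustration of the Data Pre-Processing and Matching steps, here is a minimal sketch of literal string matching against a list of known dataset labels. It is not taken from the notebook: the function names `clean_text` and `find_dataset_mentions`, the normalization rule, and the sample labels and text are all assumptions made for this example.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric characters into one space.

    Assumption: this normalization is illustrative, not the notebook's own.
    """
    return re.sub(r"[^a-z0-9]+", " ", str(text).lower()).strip()


def find_dataset_mentions(publication_text: str, known_labels: list[str]) -> list[str]:
    """Return the cleaned labels that occur as whole-word phrases in the text."""
    # Pad with spaces so a label only matches on word boundaries.
    cleaned_text = f" {clean_text(publication_text)} "
    matches = set()
    for label in known_labels:
        cleaned_label = clean_text(label)
        if cleaned_label and f" {cleaned_label} " in cleaned_text:
            matches.add(cleaned_label)
    return sorted(matches)


# Hypothetical labels and text, for illustration only.
labels = ["ADNI", "National Education Longitudinal Study"]
text = "We used data from the National Education Longitudinal Study (NELS:88)."
print(find_dataset_mentions(text, labels))
```

Literal matching of this kind can only recover labels that are already known, which is presumably why the notebook follows it with a masked language modeling step.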


Languages

Language:Jupyter Notebook 100.0%