AbderrahimAl/Show-US-the-Data_Coleridge-Initiative

Coleridge Initiative - Show US the Data

We will use natural language processing (NLP) to automate the discovery of how scientific data are referenced in publications, working from the full text of scientific publications from numerous research areas gathered from CHORUS publisher members and other sources.

📌 Goal: The objective of the competition is to identify mentions of datasets within scientific publications.

📚 You can find the dataset here.

Notebook Content

1. Importing necessary packages and libraries 📚
2. Loading the data ⌛
3. Data pre-processing 🔧
4. Matching 📑
5. Masked language modeling using HF 🤗 Transformers
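
The pre-processing and matching steps can be illustrated with a minimal sketch. It is not taken from the notebook: the function names `clean_text` and `find_dataset_mentions`, the normalization rule (lowercase, then replace every run of non-alphanumeric characters with a single space), and the example labels are all assumptions made for illustration.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text and collapse each run of non-alphanumeric characters to one space.

    Assumed normalization rule; the notebook may clean text differently.
    """
    return re.sub(r"[^a-z0-9]+", " ", str(text).lower()).strip()


def find_dataset_mentions(publication_text: str, known_labels: list[str]) -> list[str]:
    """Return the known labels whose normalized form appears in the normalized text.

    Padding both sides with spaces restricts matches to whole words, so
    "census" does not match inside "censuses".
    """
    padded_text = f" {clean_text(publication_text)} "
    matches = []
    for label in known_labels:
        normalized = clean_text(label)
        if normalized and f" {normalized} " in padded_text and label not in matches:
            matches.append(label)
    return matches


# Hypothetical labels and publication text, for illustration only.
labels = ["National Education Longitudinal Study", "Census of Agriculture"]
text = "We use data from the National Education Longitudinal Study (NELS:88)."

found = find_dataset_mentions(text, labels)
print(found)
```

Plain string matching of this kind can only recover dataset names that are already known, which is one reason to pair it with a learned model such as the masked language model in the final step.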
