We will use natural language processing (NLP) to automate the discovery of how scientific data are referenced in publications, working from the full text of scientific publications in numerous research areas, gathered from CHORUS publisher members and other sources.
📌 Goal: The objective of the competition is to identify mentions of datasets within scientific publications.
📚 You can find the dataset here.
- Importing necessary packages and libraries 📚
- Loading the data ⌛
- Data Pre-Processing 🔧
- Matching 📑
- Masked Language Modeling using HF 🤗 transformers