This repository contains code for topic modeling on the SaudiNewsNet dataset. The approach uses the BERTopic library and follows Abu Bakr Soliman's video tutorial and accompanying Colab notebook. I adapted the code to the SaudiNewsNet dataset after studying the tutorial and the library.
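BERTopic can be viewed as a sequence of steps that build its topic representations. There are five steps to this process: embedding the documents, reducing the dimensionality of the embeddings, clustering the reduced embeddings, tokenizing the documents in each cluster into a bag of words, and weighting the terms with class-based TF-IDF (c-TF-IDF).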
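To make the last of these steps more concrete, here is a minimal pure-Python sketch of class-based TF-IDF, the weighting BERTopic uses to pick keywords for each topic. It is an illustration, not the library's implementation: the function names `c_tf_idf` and `top_terms` and the toy English tokens are made up for this example, and it assumes the documents have already been clustered and tokenized.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Weight terms per topic with class-based TF-IDF.

    docs_per_topic maps a topic id to a list of tokenized documents.
    Each weight is tf(term, topic) * log(1 + A / f(term)), where A is the
    average number of words per topic and f(term) is the term's frequency
    across all topics.
    """
    # Treat all documents of a topic as one concatenated document.
    term_counts = {
        topic: Counter(token for doc in docs for token in doc)
        for topic, docs in docs_per_topic.items()
    }

    # Frequency of every term across all topics.
    total_freq = Counter()
    for counts in term_counts.values():
        total_freq.update(counts)

    # Average number of words per topic.
    avg_words = sum(total_freq.values()) / len(term_counts)

    weights = {}
    for topic, counts in term_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
    return weights


def top_terms(weights, topic, k=3):
    """Return the k highest-weighted terms of a topic (ties broken alphabetically)."""
    ranked = sorted(weights[topic].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy data: "saudi" appears in both topics, so it is down-weighted.
    toy = {
        0: [["saudi", "oil", "price"], ["saudi", "oil", "market"]],
        1: [["saudi", "football", "match"], ["saudi", "football", "league"]],
    }
    toy_weights = c_tf_idf(toy)
    for topic_id in toy:
        print(topic_id, top_terms(toy_weights, topic_id))
```

A term that is frequent inside one topic but rare elsewhere gets a high weight, while a term shared by every topic is pushed down, which is why the keywords tend to be distinctive for each topic.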
The SaudiNewsNet dataset consists of news articles from various sources related to Saudi Arabia. It covers a wide range of topics including politics, economy, culture, and more.
To use the code in this repository, follow these steps:
- Clone the repository to your local machine:
  `git clone https://github.com/your-username/saudinewsnet-topic-modeling.git`
- Download the SaudiNewsNet dataset and place it in the `data` directory.
- Run the `BERTopic-SaudiNewsNet.ipynb` notebook to perform topic modeling on the SaudiNewsNet dataset.
- Modify the notebook as needed for experimentation and analysis.
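If you want to check the downloaded data before opening the notebook, the sketch below collects the article texts from the `data` directory. It rests on assumptions that may not match your copy of the dataset: that the files have been extracted to `*.json`, that each file holds a JSON array of article objects, and that the article body is stored under a `content` key. The helper name `load_articles` is hypothetical and is not part of the notebook.

```python
import json
from pathlib import Path


def load_articles(data_dir="data", text_field="content"):
    """Collect non-empty article texts from every *.json file in data_dir.

    Assumes each file is a JSON array of objects and that the article
    body lives under text_field; adjust both if your files differ.
    """
    texts = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records = json.load(handle)
        for record in records:
            # Skip records with a missing or blank body.
            text = (record.get(text_field) or "").strip()
            if text:
                texts.append(text)
    return texts


if __name__ == "__main__":
    articles = load_articles()
    print(f"Loaded {len(articles)} articles")
```

The resulting list of strings is the kind of input BERTopic expects, for example `BERTopic(language="multilingual").fit_transform(articles)`; the notebook may prepare the data differently.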
The notebook displays the topic modeling results: the discovered topics and their respective keywords. Feel free to explore and interpret them to gain insight into the themes present in the SaudiNewsNet dataset.
Contributions are welcome! If you have any suggestions, improvements, or bug fixes, please open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Abu Bakr Soliman for the insightful tutorial on topic modeling with BERTopic.
- The creators of the SaudiNewsNet dataset for providing valuable data for research and analysis.