MahmoudHassanen99 / Topic-modeling-Bertopic

Topic-modeling-Bertopic

Introduction

This repository contains code for topic modeling on the SaudiNewsNet dataset. The approach is based on the BERTopic library, as demonstrated in Abu Bakr Soliman's video tutorial and accompanying Colab notebook. I adapted and modified that code to work with the SaudiNewsNet dataset.

BERTopic Steps

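BERTopic can be viewed as a sequence of steps to create its topic representations. There are five steps to this process: embedding the documents, reducing the dimensionality of the embeddings, clustering the reduced embeddings, tokenizing the documents in each cluster, and weighting the resulting terms with c-TF-IDF.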
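The step-by-step view can be made concrete by passing each component to BERTopic explicitly. The following is a minimal sketch, not the notebook's actual code: the function name `build_topic_model`, the multilingual embedding model, and every hyperparameter value are assumptions chosen for illustration.

```python
def build_topic_model(min_cluster_size=15):
    """Assemble a BERTopic model with each pipeline step made explicit.

    Hypothetical helper: the model name and hyperparameters are assumptions.
    """
    # Imports live inside the function so the heavy dependencies are only
    # required when a model is actually built.
    from bertopic import BERTopic
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Step 1: turn each document into a dense embedding
    # (a multilingual sentence-transformers model is assumed here).
    embedding_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    # Step 2: reduce the embedding dimensionality before clustering.
    umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                      metric="cosine", random_state=42)

    # Step 3: cluster the reduced embeddings; each cluster becomes a topic.
    hdbscan_model = HDBSCAN(min_cluster_size=min_cluster_size,
                            metric="euclidean",
                            cluster_selection_method="eom",
                            prediction_data=True)

    # Step 4: tokenize the documents of each cluster into term counts.
    vectorizer_model = CountVectorizer()

    # Step 5: weight the terms per cluster with c-TF-IDF.
    ctfidf_model = ClassTfidfTransformer()

    return BERTopic(embedding_model=embedding_model,
                    umap_model=umap_model,
                    hdbscan_model=hdbscan_model,
                    vectorizer_model=vectorizer_model,
                    ctfidf_model=ctfidf_model)
```

Calling `build_topic_model().fit_transform(docs)` on a list of article strings returns the assigned topic of each document together with the associated probabilities. Swapping a single component, for example a different `min_cluster_size`, is then a one-line change.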

Dataset

The SaudiNewsNet dataset consists of news articles from various sources related to Saudi Arabia. It covers a wide range of topics including politics, economy, culture, and more.

Getting Started

To use the code in this repository, follow these steps:

  1. Clone the repository to your local machine:
    git clone https://github.com/MahmoudHassanen99/Topic-modeling-Bertopic.git
    
  2. Download the SaudiNewsNet dataset and place it in the data directory.
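Once the dataset is in place, the articles need to be read into a list of strings before modeling. The sketch below assumes that the data directory contains JSON files, each holding a list of article records, and that the article body sits in a field named `content`. Both the layout and the helper name `load_articles` are assumptions, so adjust them to match the files you downloaded.

```python
import json
from pathlib import Path


def load_articles(data_dir="data", text_field="content"):
    """Collect non-empty article texts from every JSON file in data_dir.

    Hypothetical helper: assumes each file is a JSON list of dicts and
    that the article text is stored under `text_field`.
    """
    docs = []
    # Sort the paths so the document order is reproducible across runs.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records = json.load(f)
        for record in records:
            # Skip records whose text field is missing or blank.
            text = (record.get(text_field) or "").strip()
            if text:
                docs.append(text)
    return docs
```

The returned list can be passed directly to a topic model's `fit_transform` method.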

Usage

  1. Run the BERTopic-SaudiNewsNet.ipynb notebook to perform topic modeling on the SaudiNewsNet dataset.
  2. Modify the notebook as needed for experimentation and analysis.

Results

The notebook displays the topic modeling results: the discovered topics and their respective keywords. Explore and interpret them to gain insight into the themes present in the SaudiNewsNet dataset.
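To inspect topics and keywords outside the notebook's own output cells, a small helper along these lines may be useful. It is a sketch that relies only on BERTopic's `get_topics()` method, which maps each topic id to a list of `(word, score)` pairs. The helper name `summarize_topics` and the `top_n` parameter are hypothetical.

```python
def summarize_topics(topic_model, top_n=5):
    """Return {topic_id: [top keywords]} for a fitted topic model.

    Hypothetical helper: works with any object exposing get_topics()
    in the same shape as BERTopic.
    """
    summary = {}
    for topic_id, words in topic_model.get_topics().items():
        # BERTopic reserves topic -1 for outlier documents, so skip it.
        if topic_id == -1:
            continue
        summary[topic_id] = [word for word, _score in words[:top_n]]
    return summary
```

For a tabular overview that includes topic sizes, BERTopic's `get_topic_info()` method returns a DataFrame.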

Contributing

Contributions are welcome! If you have any suggestions, improvements, or bug fixes, please open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

  • Abu Bakr Soliman for the insightful tutorial on topic modeling with BERTopic.
  • The creators of the SaudiNewsNet dataset for providing valuable data for research and analysis.

Languages

Language:Jupyter Notebook 100.0%