amineHY / webdata_collector

The project is a web crawler and scraper that extracts data from marketplace posts. The backend is built using FastAPI, which receives requests from the frontend, collects data using the crawler and scraper, and saves it to a CSV file.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

🌟 AI Crawler and Scraper of Marketplace - CSS or LLM (OpenAI)

image 2024 05 17 13 59 45

πŸš€ Architecture

diagram

Backend

A FastAPI consist of accessing posts published in marketplace and return the data to the frontend FastAPI

Frontend

A simple streamlit dashboard that allows the user to interact with the backend API A simple query is sent to the backend then, the data is collected by the backend api, crawler & scraper. The collected data is saved in a CSV file

πŸ§‘β€πŸ”¬ Run the project

Launch the API server:

You need to run the FastAPI server using Uvicorn, a lightning-fast ASGI server. You can do this by executing the following command in your terminal:

uvicorn main:app --reload

This command will start the FastAPI application defined in the main.py file with hot reloading enabled, which is useful during development as it automatically reloads your application when code changes.

image 2024 05 02 23 17 59

Launch the dashboard server:

To run the Streamlit dashboard, use the following command:

streamlit run dashboard.py

This command starts the Streamlit server with the dashboard defined in dashboard.py, allowing you to interact with the backend API through a user-friendly interface.

image 2024 05 16 22 44 35

πŸ–₯️ Installation

Create and activate a virtual environnement

python3 -m venv myenv
source myenv/bin/activate

Install python libraries

pip install requirements.txt

Install playwright

playwright install

About

The project is a web crawler and scraper that extracts data from marketplace posts. The backend is built using FastAPI, which receives requests from the frontend, collects data using the crawler and scraper, and saves it to a CSV file.


Languages

Language:Python 100.0%