SyedMuqtasidAli / PDF-and-Web-Data-Scraping-for-Machine-Learning

Python repository for machine learning, featuring PDF data extraction using PyMuPDF and web scraping with BeautifulSoup, integrated with pandas, numpy, and MySQL for streamlined data processing.

📄 PDF and Web Data Scraping for Machine Learning

Welcome to the EZline Machine Learning Repository! This repository features PDF data extraction using PyMuPDF and web scraping with BeautifulSoup, integrated with pandas, numpy, and MySQL for streamlined data processing.
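The two extraction steps named above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the helper names `normalize_whitespace`, `pdf_page_texts`, and `page_links` are hypothetical, and the choice to collect per-page text and link pairs is an assumption about what a typical pipeline of this kind might do.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of whitespace so extracted text is easier to tabulate."""
    return re.sub(r"\s+", " ", text).strip()


def pdf_page_texts(pdf_path):
    """Return the plain text of each page of a PDF (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    with fitz.open(pdf_path) as doc:
        return [normalize_whitespace(page.get_text()) for page in doc]


def page_links(html):
    """Return (text, href) pairs for every link in an HTML string (requires bs4)."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [
        (normalize_whitespace(a.get_text()), a["href"])
        for a in soup.find_all("a", href=True)
    ]
```

Lists like these can be passed straight to `pandas.DataFrame(...)` for further processing before anything is written to MySQL.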

🛠️ Instructions for Running the Project

📥 Download and Extract EZline.zip:

  1. Download the ezline.zip file.
  2. Extract the archive into the directory Anaconda works from, which is usually your user home folder (for the author, C:/Users/syed-muqtasid-ali/).

🖥️ Start XAMPP Server:

  1. Open XAMPP and start both Apache and MySQL servers.
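Before moving on, it can help to confirm that both servers are actually listening. The following is a small sketch, assuming XAMPP's default ports (80 for Apache, 3306 for MySQL); `port_is_open` is a hypothetical helper, not part of the project.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed XAMPP defaults: Apache on port 80, MySQL on port 3306.
    for name, port in [("Apache", 80), ("MySQL", 3306)]:
        state = "up" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

If either port was changed in the XAMPP control panel, adjust the numbers accordingly.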

🚀 Open Anaconda Navigator:

  1. Open Anaconda Navigator and launch Jupyter Lab.

📒 Run EZline Test Notebook:

  1. Open the EZline_test.ipynb file in Jupyter Lab.
  2. Run the notebook cells in order, from first to last, pressing Shift + Enter on each.

🌐 Run Flask Web Application:

  1. Open Anaconda Prompt.
  2. Navigate to the directory where app.py is located.
  3. Run the script using the command: python app.py.

🌍 Access Web Application:

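  1. Open the URL that Flask prints in the Anaconda Prompt (typically http://127.0.0.1:5000 unless app.py sets a different host or port) in your browser.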
  2. Click on the "Download Data" button to fetch the data.

🎥 Watch the Instructional Video:

For detailed steps, refer to the instructional video provided in Instruction Video.mp4.

💻 Installation

To install the required dependencies, run the following command in Anaconda Prompt:

pip install -r requirements.txt
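To check that the installation worked, a short script like the one below can report which packages are still missing. It is only a sketch: `missing_modules` is a hypothetical helper, and the module list is an assumption based on the libraries this README mentions (PyMuPDF is imported as `fitz`, BeautifulSoup as `bs4`), not the actual contents of requirements.txt.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the libraries named in this README.
    expected = ["fitz", "bs4", "pandas", "numpy", "flask"]
    missing = missing_modules(expected)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules are importable.")
```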

Contact

Feel free to contact me via LinkedIn or email with any questions or collaboration ideas.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Languages

Jupyter Notebook 98.9%, Python 0.7%, HTML 0.5%