datadrivenconstruction / PDF-to-Excel

This repository contains a Python script that extracts tables from a given PDF and saves them to an Excel file.

📄 PDF Table Extraction to Excel 📊

Extract tables from PDFs and save them in Excel format. The tool is designed for Google Colab and uses tabula-py to read the tables from the PDF.

🌟 Features

  • Easy Extraction: Provide the PDF link; the notebook downloads the file and extracts its tables.
  • Google Colab Integration: Optimized for Google Colab to make use of its free resources.
  • Multi-Table Support: Extracts multiple tables and saves them in separate Excel sheets.

🚀 Getting Started

Prerequisites

  • Python 3.x
  • Google Colab environment
  • Libraries: tabula-py, pandas
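
The prerequisites above can be checked from Python before running the extraction. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and the checks for `openpyxl` (which pandas uses to write `.xlsx` files) and for a Java runtime (which tabula-py needs to run) are additions to the list above.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("tabula", "pandas", "openpyxl")):
    """Return the names of required modules and tools that cannot be found.

    `tabula` is the import name of the tabula-py package. The `openpyxl`
    and `java` checks are assumptions of this sketch, not README items.
    """
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # tabula-py runs a Java program, so a `java` executable must be on PATH.
    if shutil.which("java") is None:
        missing.append("java")
    return missing


if __name__ == "__main__":
    print("Missing:", missing_prerequisites() or "nothing")
```

In Google Colab, anything reported as missing can usually be installed in a notebook cell before the extraction step runs.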

Usage

  1. Clone the Repository:

    git clone https://github.com/yourusername/pdf-table-extraction.git
    cd pdf-table-extraction
  2. Google Colab:

    • Upload the notebook to Google Colab.
    • Follow the step-by-step instructions within the notebook.

🔍 How It Works

  1. Setup: Installs the required libraries in the Google Colab environment.
  2. Fetch PDF: Downloads the specified PDF.
  3. Extract: Uses tabula-py to extract the tables from the PDF.
  4. Save: Writes the extracted tables to an Excel file, one sheet per table.
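
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the function names, the default file names `input.pdf` and `tables.xlsx`, and the `Table_N` sheet naming are all hypothetical. The third-party imports sit inside the functions so the module loads even before the setup step has installed them.

```python
import urllib.request
from pathlib import Path


def sheet_names(count):
    """Return one sheet name per table: Table_1, Table_2, ... (hypothetical naming)."""
    return [f"Table_{i}" for i in range(1, count + 1)]


def download_pdf(url, dest="input.pdf"):
    """Step 2: fetch the PDF at `url` and store it locally."""
    urllib.request.urlretrieve(url, dest)
    return Path(dest)


def extract_tables(pdf_path):
    """Step 3: return a list of DataFrames, one per table tabula-py detects."""
    import tabula  # provided by the tabula-py package; requires Java

    return tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)


def save_tables(tables, xlsx_path="tables.xlsx"):
    """Step 4: write each table to its own sheet of a single workbook."""
    if not tables:
        # A workbook needs at least one sheet, so fail early with a clear message.
        raise ValueError("no tables were extracted from the PDF")
    import pandas as pd  # writing .xlsx also requires openpyxl

    with pd.ExcelWriter(xlsx_path) as writer:
        for name, table in zip(sheet_names(len(tables)), tables):
            table.to_excel(writer, sheet_name=name, index=False)
    return Path(xlsx_path)
```

With a PDF link of your own in a variable such as `pdf_url`, the whole pipeline would be `save_tables(extract_tables(download_pdf(pdf_url)))`. Step 1 (setup) is not shown; in Colab it typically amounts to a `pip install tabula-py` cell.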

🀝 Contributions

Want to improve the extraction or add more features? Contributions are welcome!

  • Create a pull request with enhancements.
  • Found a bug? Open an issue.

📃 License

This project is licensed under the MIT License - see the LICENSE.md file for details.

🙌 Acknowledgments

  • tabula-py for making PDF table extraction possible.
  • All contributors and testers of this project.
