Extract tables from PDFs and save them in Excel format. Designed for Google Colab, this tool uses tabula-py to turn the tables in a PDF into Excel data you can work with.
- Easy Extraction: Provide a link to the PDF, and the notebook does the rest.
- Google Colab Integration: Built to run in Google Colab and make use of its free resources.
- Multi-Table Support: Extracts multiple tables and saves each one to a separate Excel sheet.
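To illustrate the multi-table behavior, here is a minimal sketch (not the notebook's actual code) that writes a list of pandas DataFrames to a single workbook, one sheet per table. The helper name `save_tables_to_excel` and the `Table_1`, `Table_2`, ... sheet names are assumptions made for this example, and pandas needs an Excel engine such as openpyxl installed to write `.xlsx` files.

```python
import pandas as pd


def save_tables_to_excel(tables, path):
    """Hypothetical helper: write each DataFrame in `tables` to its own sheet.

    Sheets are named Table_1, Table_2, ... (a naming scheme assumed here).
    Returns the number of sheets written.
    """
    if not tables:
        # An .xlsx workbook needs at least one sheet, so skip writing entirely
        return 0
    # One ExcelWriter keeps the workbook open so every sheet lands in the same file
    with pd.ExcelWriter(path) as writer:
        for i, df in enumerate(tables, start=1):
            # index=False leaves out the pandas row index, which is not part of the source table
            df.to_excel(writer, sheet_name=f"Table_{i}", index=False)
    return len(tables)


# Usage with two small stand-in tables (in the notebook these would come from tabula-py)
tables = [
    pd.DataFrame({"Item": ["A", "B"], "Qty": [1, 2]}),
    pd.DataFrame({"Region": ["North"], "Sales": [100]}),
]
print(save_tables_to_excel(tables, "extracted_tables.xlsx"))
```

Using a single `ExcelWriter` for the whole loop is what keeps all tables in one file; calling `to_excel` with a file path for each table would overwrite the workbook each time.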
- Python 3.x
- Google Colab environment
- Libraries: tabula-py, pandas
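Before running the notebook, it can help to confirm the dependencies are available. The check below is a sketch using a hypothetical helper, `check_requirements`. It also looks for a `java` executable, which is not in the list above but is needed because tabula-py is a wrapper around the Java library tabula-java.

```python
import importlib.util
import shutil


def check_requirements() -> dict:
    """Hypothetical helper (not part of the notebook): report which dependencies are present."""
    return {
        # tabula-py is imported under the module name "tabula"
        "tabula-py": importlib.util.find_spec("tabula") is not None,
        "pandas": importlib.util.find_spec("pandas") is not None,
        # tabula-py calls tabula-java, so a `java` executable must be on PATH
        "java": shutil.which("java") is not None,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```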
- Clone the repository:
  git clone https://github.com/yourusername/pdf-table-extraction.git
  cd pdf-table-extraction
- Google Colab:
  - Upload the notebook to Google Colab.
  - Follow the step-by-step instructions in the notebook.
- Setup: Installs the required libraries in the Google Colab environment.
- Fetch PDF: Downloads the specified PDF.
- Extract: Uses tabula-py to extract the tables from the PDF.
- Save: Writes the extracted tables to an Excel file.
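The Fetch PDF and Extract steps might look roughly like the sketch below. It assumes the PDF is reachable through a direct URL and uses hypothetical helper names, `fetch_pdf` and `extract_tables`; the notebook's own code may differ. With `pages="all"` and `multiple_tables=True`, `tabula.read_pdf` returns a list of pandas DataFrames, one per detected table.

```python
import urllib.request
from pathlib import Path


def fetch_pdf(url: str, dest: str = "input.pdf") -> Path:
    """Hypothetical Fetch PDF step: download `url` to `dest` and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(local_path)


def extract_tables(pdf_path) -> list:
    """Hypothetical Extract step: return one DataFrame per table found by tabula-py."""
    import tabula  # imported lazily; tabula-py also needs a Java runtime to run
    # pages="all" scans every page; multiple_tables=True keeps each table separate
    tables = tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
    # Discard any empty DataFrames tabula returned
    return [df for df in tables if not df.empty]


# Example usage in a Colab cell (the URL is a placeholder):
# pdf = fetch_pdf("https://example.com/report.pdf")
# tables = extract_tables(pdf)
# print(f"Found {len(tables)} tables")
```

For the Save step, each DataFrame in the returned list can be written with `DataFrame.to_excel` through one shared `pd.ExcelWriter`, giving one sheet per table.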
Want to improve the extraction or add features? Contributions are welcome:
- Open a pull request with your enhancements.
- Found a bug? Open an issue.
This project is licensed under the MIT License; see the LICENSE.md file for details.
- tabula-py for making PDF table extraction possible.
- All contributors and testers of this project.