sai-tej31 / ETL-SEC-Edgar-10-K-Filings

This repository provides a crawler that compiles the paths of downloaded SEC filing HTML files and organizes them in a dictionary keyed by ticker. The resulting dictionary supports extraction, transformation, and loading (ETL) workflows and downstream data analysis.

ETL for SEC Edgar 10-K Filings Downloader

Table of Contents

1. Overview
2. Folder Structure
3. Module: TickerFilesCollector
4. Module: collect_ticker_files
5. Module: get_ticker_10k_filings
6. Usage Examples
7. License
8. Contributing
9. Acknowledgments

1. Overview

The 10k-download project provides utilities for downloading, collecting, and organizing 10-K filings for various companies from the SEC Edgar website. This open-source project consists of modules for file collection and 10-K download, plus a notebook of usage examples.


2. Folder Structure

10k-download/
|-- CODE_OF_CONDUCT.md
|-- README.md
|-- myenv/
|-- requirements.txt
|-- LICENSE
|-- data/
|-- playground.ipynb
|-- utils/
    |-- TickerFilesCollector.py
    |-- __init__.py
    |-- __pycache__/
    |-- collect_ticker_files.py
    |-- get_ticker_10k_filings.py

  • data/: Directory where downloaded filings are stored.
  • myenv/: Virtual environment directory for managing project dependencies.
  • playground.ipynb: A Jupyter notebook for testing and usage examples.
  • requirements.txt: A text file listing the required Python packages.
  • utils/: A directory containing utility modules.
    • TickerFilesCollector.py: Module for collecting ticker files.
    • collect_ticker_files.py: Module for collecting ticker files for all tickers.
    • get_ticker_10k_filings.py: Module for downloading 10-K filings from SEC Edgar.
    • __init__.py: Initialization file for the utils package.

3. Module: TickerFilesCollector

The TickerFilesCollector class in the TickerFilesCollector.py module is responsible for collecting and organizing ticker files from the specified data folder.

3.1. Methods

  • __init__(self, root_folder): Initializes a TickerFilesCollector object.
  • _collect_files(self, root_folder): Collects all TXT, HTML, and XML files inside the root_folder and its subfolders.
  • _get_ticker_files(self, root_folder, ticker): Collects all TXT, HTML, and XML files for a specific ticker and stores them in a dictionary.
  • get_all_ticker_files(self): Collects all TXT, HTML, and XML files for all tickers in the root_folder.
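
The class walks the data directory and groups files by the top-level ticker folder. The following is a minimal sketch, assuming a root_folder/<TICKER>/... layout; the repository's actual implementation may differ in details:

```python
import os


class TickerFilesCollector:
    """Collect TXT, HTML, and XML files under root_folder, grouped by ticker.

    Sketch only: assumes each immediate subdirectory of root_folder is named
    after a ticker, with filings nested anywhere below it.
    """

    EXTENSIONS = (".txt", ".html", ".xml")

    def __init__(self, root_folder):
        self.root_folder = root_folder

    def _collect_files(self, root_folder):
        """Return all matching file paths under root_folder, recursively."""
        paths = []
        for dirpath, _dirnames, filenames in os.walk(root_folder):
            for name in filenames:
                if name.lower().endswith(self.EXTENSIONS):
                    paths.append(os.path.join(dirpath, name))
        return paths

    def _get_ticker_files(self, root_folder, ticker):
        """Return a {ticker: [paths]} dictionary for one ticker."""
        return {ticker: self._collect_files(os.path.join(root_folder, ticker))}

    def get_all_ticker_files(self):
        """Merge the per-ticker dictionaries for every ticker directory."""
        all_files = {}
        for ticker in sorted(os.listdir(self.root_folder)):
            if os.path.isdir(os.path.join(self.root_folder, ticker)):
                all_files.update(self._get_ticker_files(self.root_folder, ticker))
        return all_files
```

With this layout, get_all_ticker_files() returns a dictionary such as {'AAPL': ['.../AAPL/10-K/filing.html', ...], 'MSFT': [...]}.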

4. Module: collect_ticker_files

The collect_ticker_files.py module provides the collect_ticker_files function, which is responsible for collecting and organizing ticker files from the specified data folder for all tickers.

4.1. Function

  • collect_ticker_files(data_folder='data/sec-edgar-filings'): Collects and organizes ticker files from the specified data folder for all tickers.
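
A standalone sketch of what this function plausibly does, assuming the same one-directory-per-ticker layout; in the repository it may simply delegate to TickerFilesCollector:

```python
import os


def collect_ticker_files(data_folder="data/sec-edgar-filings"):
    """Map each ticker directory under data_folder to its TXT/HTML/XML files.

    Illustrative sketch: walks every ticker subdirectory and gathers the
    matching file paths into one dictionary keyed by ticker.
    """
    ticker_files = {}
    for ticker in sorted(os.listdir(data_folder)):
        ticker_dir = os.path.join(data_folder, ticker)
        if not os.path.isdir(ticker_dir):
            continue  # skip stray files at the top level
        paths = []
        for dirpath, _dirnames, filenames in os.walk(ticker_dir):
            paths.extend(
                os.path.join(dirpath, name)
                for name in filenames
                if name.lower().endswith((".txt", ".html", ".xml"))
            )
        ticker_files[ticker] = paths
    return ticker_files
```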

5. Module: get_ticker_10k_filings

The get_ticker_10k_filings.py module provides the get_ticker_10k_filings function, which downloads all the 10-K filings for a given ticker from the SEC Edgar website.

5.1. Function

  • get_ticker_10k_filings(ticker): Downloads all the 10-K filings for a given ticker from the SEC Edgar website.
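
Downloading 10-K filings involves resolving the ticker to a CIK and then querying EDGAR. The sketch below calls SEC's public JSON endpoints directly with the standard library; the repository may instead rely on a helper package, so treat the function body as illustrative. The User-Agent value is a placeholder (SEC requires a descriptive contact string):

```python
import json
import os
from urllib.request import Request, urlopen

# Placeholder: SEC asks for a real name and email in the User-Agent header.
SEC_HEADERS = {"User-Agent": "Sample Name sample@example.com"}


def _get_json(url):
    """Fetch a JSON document from EDGAR with the required headers."""
    with urlopen(Request(url, headers=SEC_HEADERS)) as resp:
        return json.load(resp)


def cik_to_padded(cik):
    """EDGAR's submissions endpoint expects a zero-padded 10-digit CIK."""
    return str(int(cik)).zfill(10)


def get_ticker_10k_filings(ticker, dest="data/sec-edgar-filings"):
    """Download the index page of every 10-K filing for `ticker`."""
    # Resolve ticker -> CIK via SEC's public mapping file.
    mapping = _get_json("https://www.sec.gov/files/company_tickers.json")
    cik = next(row["cik_str"] for row in mapping.values()
               if row["ticker"].upper() == ticker.upper())
    # Fetch the filing history and keep only 10-K entries.
    subs = _get_json(
        f"https://data.sec.gov/submissions/CIK{cik_to_padded(cik)}.json")
    recent = subs["filings"]["recent"]
    accessions = [acc for acc, form in zip(recent["accessionNumber"],
                                           recent["form"])
                  if form == "10-K"]
    out_dir = os.path.join(dest, ticker.upper())
    os.makedirs(out_dir, exist_ok=True)
    for acc in accessions:
        url = (f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/"
               f"{acc.replace('-', '')}/{acc}-index.htm")
        with urlopen(Request(url, headers=SEC_HEADERS)) as resp, \
                open(os.path.join(out_dir, f"{acc}-index.htm"), "wb") as f:
            f.write(resp.read())
    return accessions
```

Saving only the index pages keeps the sketch short; a fuller version would parse each index to fetch the filing documents themselves.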

6. Usage Examples

For detailed usage examples and demonstrations of the project functionalities, refer to the playground.ipynb Jupyter notebook in the root directory.


7. License

This project is licensed under the MIT License. You are free to use, modify, and distribute the code.


8. Contributing

Contributions are welcome! Please follow the guidelines in CONTRIBUTING.md to get started.


9. Acknowledgments

Special thanks to everyone who has contributed to this project. Your efforts are greatly appreciated.

