File-based SEC EDGAR parser

Question

theOGognf opened this issue a year ago · comments

I currently implement the SEC EDGAR API, but the API is still relatively new and doesn't contain all the data that may be available through the SEC EDGAR historical data file archives. I think we'd want to use the file-based SEC EDGAR data as an alternative to the SEC EDGAR API in cases where a company's data can't be found through the API. Just glancing at how the data files are organized, I don't think it'd be too big of an effort to implement. A first implementation should probably have the following elements:

Methods for crawling through the index files for each year and quarter
Methods for parsing index files and storing them into SQL tables
Methods for getting filings based on an index entry
Methods for storing filings in SQL tables and querying them from SQL tables
Options for enabling the file-based methods as an alternative to the API methods in cases of errors
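
As a rough illustration of the first two elements, here is a minimal sketch of enumerating the quarterly index files and loading one into a SQL table. It assumes the pipe-delimited `master.idx` layout (CIK|Company Name|Form Type|Date Filed|Filename) under `/Archives/edgar/full-index/`; the function names, the `filing_index` table, and its schema are hypothetical, and downloading is left out (the SEC expects a descriptive User-Agent and rate limiting on requests).

```python
import sqlite3

# Assumed location of the quarterly index files on sec.gov.
INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/master.idx"


def index_urls(start_year, end_year):
    """Yield one master.idx URL per year and quarter (end year inclusive)."""
    for year in range(start_year, end_year + 1):
        for quarter in range(1, 5):
            yield INDEX_URL.format(year=year, quarter=quarter)


def parse_master_index(text):
    """Parse the pipe-delimited rows of a master.idx file into tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Data rows have five fields and a numeric CIK; the preamble,
        # the column header, and the dashed separator line do not.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, form_type, date_filed, filename = parts
        entries.append((int(cik), company, form_type, date_filed, filename))
    return entries


def store_entries(conn, entries):
    """Insert index entries into a (hypothetical) filing_index table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filing_index ("
        "cik INTEGER, company TEXT, form_type TEXT, date_filed TEXT, "
        "filename TEXT, "
        "PRIMARY KEY (cik, form_type, date_filed, filename))"
    )
    # OR IGNORE makes re-loading the same quarter a no-op.
    conn.executemany(
        "INSERT OR IGNORE INTO filing_index VALUES (?, ?, ?, ?, ?)", entries
    )
    conn.commit()
```

The `filename` column holds the path of the filing relative to `/Archives/`, so getting a filing from an index entry amounts to fetching that path. The composite primary key is a design choice for this sketch: the same filing can be listed under more than one CIK, so `filename` alone is not treated as unique.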

theOGognf · Answer 1 · Sun May 28 2023 06:30:03 GMT+0800 (China Standard Time)

I messed around with this a bit. I'm not sure if this feature is quite worth the effort. The general workflow is as follows:

Use bs4 to crawl through the /Archives/edgar/full-index/ URL links and download the index tables for each year-quarter pair to get the filing URLs for each company
Use bs4 to parse a filing and search through tags
Look through tags to get metadata about each XBRL tag
Store tags in their own table
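
The last three steps of the workflow above can be sketched roughly as follows. This is not the code I experimented with: it swaps bs4 for the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies, and it assumes the XBRL instance document has already been pulled out of the filing. The function names and the `xbrl_facts` table are hypothetical, and "fact" here just means any element that carries a `contextRef` attribute.

```python
import sqlite3
import xml.etree.ElementTree as ET


def extract_facts(xbrl_text):
    """Return (namespace, name, context_ref, unit_ref, value) per tagged fact."""
    root = ET.fromstring(xbrl_text)
    facts = []
    for elem in root.iter():
        context_ref = elem.get("contextRef")
        if context_ref is None:
            continue  # Not a fact, e.g. a context or unit definition.
        # ElementTree renders qualified names as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            namespace, _, name = elem.tag[1:].partition("}")
        else:
            namespace, name = "", elem.tag
        value = (elem.text or "").strip()
        facts.append((namespace, name, context_ref, elem.get("unitRef"), value))
    return facts


def store_facts(conn, accession, facts):
    """Store facts for one filing in a (hypothetical) xbrl_facts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xbrl_facts ("
        "accession TEXT, namespace TEXT, name TEXT, "
        "context_ref TEXT, unit_ref TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO xbrl_facts VALUES (?, ?, ?, ?, ?, ?)",
        [(accession, *fact) for fact in facts],
    )
    conn.commit()
```

Values are stored as text because facts can be numeric or textual; converting them by unit would be a later step.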

This is really straightforward, but I wonder if this is just replicating what the SEC EDGAR REST API is already doing behind the scenes. I'm going to pause development for now until I find that this is not the case.

theOGognf · Answer 2 · Fri Aug 11 2023 08:37:47 GMT+0800 (China Standard Time)

I've settled on this not being worth the effort and will close this issue. It can be reopened if necessary.