paulgb / sec-data-parser

Rust parser for SEC EDGAR .nc submission container files.

sec-data-parser

A parser for SEC EDGAR .nc files, each of which represents an SEC filing.

These are SGML files, but the format seems to have drifted from the latest publicly available DTD files I could find, so the parser implemented here is derived partly from the real-world data contained in the filings.
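To illustrate the kind of line-oriented SGML this involves, here is a minimal sketch of a line tokenizer. It is not this crate's implementation: the `Token` enum and `tokenize_line` function are hypothetical names, and the sketch assumes that a header field is written as an opening tag followed by its value on the same line (for example `<TYPE>10-K`) with no closing tag, while container elements are closed explicitly.

```rust
/// One line of input, classified. Hypothetical type, not part of this crate.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    /// `<TAG>` or `<TAG>value`; `value` is `None` when nothing follows the tag.
    Open { tag: &'a str, value: Option<&'a str> },
    /// `</TAG>`
    Close(&'a str),
    /// Anything that is not a well-formed tag line.
    Text(&'a str),
}

/// Classify a single line. Assumes at most one tag, at the start of the line.
fn tokenize_line(line: &str) -> Token<'_> {
    let line = line.trim_end();
    if let Some(rest) = line.strip_prefix("</") {
        if let Some(end) = rest.find('>') {
            return Token::Close(&rest[..end]);
        }
    }
    if let Some(rest) = line.strip_prefix('<') {
        if let Some(end) = rest.find('>') {
            let value = &rest[end + 1..];
            return Token::Open {
                tag: &rest[..end],
                value: if value.is_empty() { None } else { Some(value) },
            };
        }
    }
    Token::Text(line)
}

fn main() {
    // Illustrative input only; real filings contain many more fields.
    let sample = "<SUBMISSION>\n<TYPE>10-K\n</SUBMISSION>";
    for line in sample.lines() {
        println!("{:?}", tokenize_line(line));
    }
}
```

A tokenizer this tolerant accepts any tag name, which is one way to cope with a format that no longer matches its DTD; validation against known tags would then happen in a later pass.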

The parser attempts to provide a lossless Rust struct representation of each filing. Dates and datetimes are represented as chrono types.

This is currently a work in progress and is therefore not yet on crates.io, but it successfully parses all non-corrupt .nc filings I have fed into it, which range from 1995 to 2021.

Binary files are decoded when present. Embedded XBRL (enclosed in <XBRL></XBRL> tags) is extracted as a String, but the parser does not attempt to parse it, since XBRL is an entirely separate format.
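As a rough illustration of treating XBRL as an opaque string, the sketch below pulls out whatever sits between the `<XBRL>` and `</XBRL>` tags without interpreting it. The function name `extract_xbrl` is hypothetical and this is not the crate's actual code; it assumes the tags are uppercase and that a document contains at most one such block.

```rust
/// Return the raw text between the first `<XBRL>` and the next `</XBRL>`,
/// or `None` if either tag is missing. The content is not parsed or trimmed.
/// Hypothetical helper, not part of this crate's API.
fn extract_xbrl(document: &str) -> Option<String> {
    const OPEN: &str = "<XBRL>";
    const CLOSE: &str = "</XBRL>";
    // Byte offset just past the opening tag.
    let start = document.find(OPEN)? + OPEN.len();
    // Length of the payload, measured from `start` to the closing tag.
    let len = document[start..].find(CLOSE)?;
    Some(document[start..start + len].to_string())
}

fn main() {
    // Illustrative input only; the inner lowercase <xbrl> element is payload.
    let doc = "<TEXT>\n<XBRL>\n<xbrl>payload</xbrl>\n</XBRL>\n</TEXT>";
    match extract_xbrl(doc) {
        Some(xbrl) => println!("extracted {} bytes of XBRL", xbrl.len()),
        None => println!("no XBRL block found"),
    }
}
```

Because the match is case-sensitive, a lowercase `<xbrl>` element inside the payload is not mistaken for the enclosing tags, and leaving the payload untrimmed keeps the extraction lossless.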

