A parser for SEC EDGAR .nc files, each of which represents a single SEC filing.
These are SGML files, but the format seems to have drifted since the latest publicly available DTD files I could find, so the parser implemented here is derived partly from the real-world data contained in the filings.
Attempts to provide a lossless Rust struct representation of each filing. Dates and datetimes are represented as chrono objects.
This is currently a work in progress, and as such is not yet on crates.io, but it successfully parses all non-corrupt .nc filings I have fed into it, which range from 1995 to 2021.
Decodes binary files when provided. Extracts included XBRL (enclosed in <XBRL></XBRL> tags) as a String, but does not attempt to parse XBRL, which is an entirely separate format.
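
To illustrate what extracting XBRL as a String means in practice, here is a minimal sketch using only the standard library. The function name extract_xbrl is hypothetical and is not this crate's API; the sketch assumes the tags appear in upper case exactly as <XBRL> and </XBRL>, and that only the first such block in a document is of interest.

```rust
/// Returns the text between the first `<XBRL>` tag and the next `</XBRL>` tag,
/// or `None` if either tag is missing.
///
/// Hypothetical helper for illustration only; not part of this crate's API.
fn extract_xbrl(document: &str) -> Option<String> {
    const OPEN: &str = "<XBRL>";
    const CLOSE: &str = "</XBRL>";

    // Byte offset just past the opening tag.
    let start = document.find(OPEN)? + OPEN.len();
    // Look for the closing tag only in the text after the opening tag.
    let end = start + document[start..].find(CLOSE)?;

    // Hand back the payload untouched as an owned String;
    // no attempt is made to interpret the XBRL itself.
    Some(document[start..end].to_string())
}
```

The payload is returned verbatim, including any surrounding whitespace, which is in keeping with the lossless goal described above. Anything that wants to interpret the XBRL would pass this String to a separate XBRL parser.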