maxscott / sec_latest_filings_rss_feed_parser

Stupid Simple Elixir XML Parser for SEC "Recent Filings" RSS Feed

SecLatestFilingsRssFeedParser

This is a simple parser with one goal: to hit the SEC's Latest Filings RSS Feed and parse the XML into a workable, JSON-like format.

For example, the SEC's Latest Filings RSS Feed returns XML in the following format:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Latest Filings - Thu, 25 Feb 2016 18:54:17 EST</title>
<link rel="alternate" href="/cgi-bin/browse-edgar?action=getcurrent"/>
<link rel="self" href="/cgi-bin/browse-edgar?action=getcurrent"/>
<id>http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent</id>
<author><name>Webmaster</name><email>webmaster@sec.gov</email></author>
<updated>2016-02-25T18:54:17-05:00</updated>
<entry>
<title>10-K - TESORO CORP /NEW/ (0000050104) (Filer)</title>
<link rel="alternate" type="text/html" href="http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055-index.htm"/>
<summary type="html">
 &lt;b&gt;Filed:&lt;/b&gt; 2016-02-25 &lt;b&gt;AccNo:&lt;/b&gt; 0000050104-16-000055 &lt;b&gt;Size:&lt;/b&gt; 23 MB
</summary>
<updated>2016-02-25T17:29:49-05:00</updated>
<category scheme="http://www.sec.gov/" label="form type" term="10-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0000050104-16-000055</id>
</entry>
<entry>
<title>10-K - BB&amp;T CORP (0000092230) (Filer)</title>
<link rel="alternate" type="text/html" href="http://www.sec.gov/Archives/edgar/data/92230/000009223016000125/0000092230-16-000125-index.htm"/>
<summary type="html">
 &lt;b&gt;Filed:&lt;/b&gt; 2016-02-25 &lt;b&gt;AccNo:&lt;/b&gt; 0000092230-16-000125 &lt;b&gt;Size:&lt;/b&gt; 28 MB
</summary>
<updated>2016-02-25T17:25:57-05:00</updated>
<category scheme="http://www.sec.gov/" label="form type" term="10-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0000092230-16-000125</id>
</entry>
.
.
.
<entry>
<title>10-K - Benefitfocus,Inc. (0001576169) (Filer)</title>
<link rel="alternate" type="text/html" href="http://www.sec.gov/Archives/edgar/data/1576169/000119312516478532/0001193125-16-478532-index.htm"/>
<summary type="html">
 &lt;b&gt;Filed:&lt;/b&gt; 2016-02-25 &lt;b&gt;AccNo:&lt;/b&gt; 0001193125-16-478532 &lt;b&gt;Size:&lt;/b&gt; 7 MB
</summary>
<updated>2016-02-25T17:05:40-05:00</updated>
<category scheme="http://www.sec.gov/" label="form type" term="10-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0001193125-16-478532</id>
</entry>
</feed>

The XML has a feed, which has many entries. Parsing the feed with SecLatestFilingsRssFeedParser.parse(xml_document) returns an {:ok, feed} tuple whose second element is an Elixir map that looks like the following:

{:ok,
 %{entries: [%{cik: "0000050104",
      html_link: "http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055-index.htm",
      text_link: "http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055.txt",
      rss_feed_id: "urn:tag:sec.gov,2008:accession-number=0000050104-16-000055",
      summary: "Filed: 2016-02-25 AccNo: 0000050104-16-000055 Size: 23 MB",
      title: "10-K - TESORO CORP /NEW/ (0000050104) (Filer)",
      updated_date: "2016-02-25T17:29:49-05:00",
      category: "10-K"},
    %{cik: "0000092230",
      html_link: "http://www.sec.gov/Archives/edgar/data/92230/000009223016000125/0000092230-16-000125-index.htm",
      text_link: "http://www.sec.gov/Archives/edgar/data/92230/000009223016000125/0000092230-16-000125.txt",
      rss_feed_id: "urn:tag:sec.gov,2008:accession-number=0000092230-16-000125",
      summary: "Filed: 2016-02-25 AccNo: 0000092230-16-000125 Size: 28 MB",
      title: "10-K - BB&T CORP (0000092230) (Filer)",
      updated_date: "2016-02-25T17:25:57-05:00",
      category: "10-K"},
    .
    .
    .
    %{cik: "0001576169",
      html_link: "http://www.sec.gov/Archives/edgar/data/1576169/000119312516478532/0001193125-16-478532-index.htm",
      text_link: "http://www.sec.gov/Archives/edgar/data/1576169/000119312516478532/0001193125-16-478532.txt",
      rss_feed_id: "urn:tag:sec.gov,2008:accession-number=0001193125-16-478532",
      summary: "Filed: 2016-02-25 AccNo: 0001193125-16-478532 Size: 7 MB",
      title: "10-K - Benefitfocus,Inc. (0001576169) (Filer)",
      updated_date: "2016-02-25T17:05:40-05:00",
      category: "10-K"}],
   updated: "2016-02-25T18:54:17-05:00"}}
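
Comparing the XML with the map above, three fields are not copied verbatim from the feed: cik appears inside the entry title, text_link differs from html_link only in its suffix, and summary has its bold markup removed. The following is a minimal sketch of those relationships, written against the sample data only. The module EntryFieldsSketch and its function names are hypothetical and are not the library's API; how the library actually derives these values is an assumption here.

```elixir
defmodule EntryFieldsSketch do
  # Hypothetical helpers that reproduce the sample values shown above.
  # They are NOT part of SecLatestFilingsRssFeedParser.

  # Pulls the parenthesised 10-digit number out of an entry title.
  # Assumes the CIK is always written as 10 digits, as in the sample.
  def cik_from_title(title) do
    case Regex.run(~r/\((\d{10})\)/, title) do
      [_whole, cik] -> cik
      nil -> nil
    end
  end

  # Swaps the "-index.htm" suffix for ".txt", as the sample links suggest.
  # A link without that suffix is returned unchanged.
  def text_link_from_html_link(html_link) do
    String.replace_suffix(html_link, "-index.htm", ".txt")
  end

  # Removes tags from the entity-decoded summary and collapses whitespace.
  def plain_summary(summary_html) do
    summary_html
    |> String.replace(~r/<[^>]+>/, "")
    |> String.split()
    |> Enum.join(" ")
  end
end

EntryFieldsSketch.cik_from_title("10-K - BB&T CORP (0000092230) (Filer)")
# => "0000092230"
```

Keeping each derivation in its own small function makes it easy to check one field at a time against a real feed entry.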

An entry's map contains a cik (the Central Index Key, the identifier the SEC assigns to a filer), an html_link to the filing's index page, a text_link to the text version of the filing, a category naming the form type (10-K, 10-Q, 4, etc.), an rss_feed_id that uniquely identifies the entry, a summary giving the filing date, accession number and size, a title and an updated_date. The feed itself is a map holding the list of those entries under entries and the feed's own timestamp under updated.
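
Because the result is a tagged tuple around plain maps, ordinary pattern matching and Enum functions are enough to work with it. As an illustration, the sketch below filters entries by category. The module FilingsSketch is hypothetical, the parsed value is a hand-written stand-in trimmed from the sample above rather than real parser output, and the pass-through clause assumes that failures arrive as some other term, since this README does not document the error shape.

```elixir
defmodule FilingsSketch do
  # Hypothetical helper; NOT part of SecLatestFilingsRssFeedParser.

  # Keeps only the entries whose :category equals form_type.
  def by_category({:ok, %{entries: entries}}, form_type) do
    {:ok, Enum.filter(entries, fn entry -> entry.category == form_type end)}
  end

  # Anything that is not {:ok, feed} is passed through untouched
  # (assumption: the error shape is not documented in this README).
  def by_category(other, _form_type), do: other
end

# Hand-written stand-in for a parse result, limited to the fields used here.
parsed =
  {:ok,
   %{entries: [
       %{cik: "0000050104", category: "10-K"},
       %{cik: "0000092230", category: "10-K"}
     ],
     updated: "2016-02-25T18:54:17-05:00"}}

{:ok, annual_reports} = FilingsSketch.by_category(parsed, "10-K")
Enum.map(annual_reports, & &1.cik)
# => ["0000050104", "0000092230"]
```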

Be bold: use this tool to bring some sanity to parsing the SEC's XML feed, and feel free to contribute!

Installation

This project is available in Hex and the package can be installed as:

  1. Add sec_latest_filings_rss_feed_parser to your list of dependencies in mix.exs:

    def deps do
      [{:sec_latest_filings_rss_feed_parser, "~> 0.0.6"}]
    end

  2. Ensure sec_latest_filings_rss_feed_parser is started before your application:

    def application do
      [applications: [:sec_latest_filings_rss_feed_parser]]
    end

License

This library is under the MIT license.