mr-z-ro / msf-401k-hackathon-tools

A set of tools, data, and specifications for understanding what stocks I hold in my Mutual Funds, ETFs, etc as an individual.

MSF 401k Hackathon Tools!

Included in this repository is a set of tools and specifications for use at MSF's 401K Investor Resource Hackathon.

Please note that these scripts were run on 10 March 2017, and the resulting data is included in CSV, JSON, and MySQL formats in the respective directories of this repository.

Feedback Welcome!

For requests, tech questions, general comments, happy feedback, etc, feel free to use the wonderful GitHub tools provided here, or reach out via Twitter to @mr_z_ro!

Data!

MySQL Data

The data that's presented granularly below has also been collated into a MySQL data dump, which is included in the repository's mysql directory.

The data is also hosted live in a location that can be accessed with credentials that will be announced at the hackathon.

Sources

The following sources were referenced in aggregating this data:

  • MorningStar: Aggregated list of top 20 funds holding PFE and GSK
  • HoldingsChannel: Aggregated Lists of all Institutions (mislabeled on their site as “funds”) holding PFE and GSK, pulled from SEC’s EDGAR database
  • ETFdb: List of ETFs holding PFE and GSK
  • MutualFunds.com: List of all funds with corresponding abbreviations

Precooked Granular Data

At the time of writing, all data provided in this repository is for the PFE and GSK stocks, and each file is prefixed with its respective ticker.
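As a minimal sketch of how the precooked files could be picked up by ticker prefix: the snippet below assumes the CSVs sit in the repository's csv directory, follow a `TICKER_*.csv` naming pattern, and carry a header row. The helper names `find_ticker_files` and `load_rows` are illustrative only and are not part of the repository.

```python
import csv
from pathlib import Path


def find_ticker_files(data_dir, ticker):
    """Return the sorted CSV files in data_dir whose names start with '<ticker>_'."""
    return sorted(Path(data_dir).glob(f"{ticker}_*.csv"))


def load_rows(csv_path):
    """Read one CSV into a list of dicts keyed by its header row (assumes a header exists)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # "csv" is the assumed location of the precooked files; adjust as needed.
    for path in find_ticker_files("csv", "PFE"):
        rows = load_rows(path)
        print(f"{path.name}: {len(rows)} rows")
```

No column names are assumed here, so the same helpers should work for the mutual fund, ETF, and institution files alike.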

Prerequisites for Gathering Fresh Data

For use cases requiring fresh data beyond the hackathon, the scripts can be run on demand, after installing the following prerequisites.

The scripts require the BeautifulSoup and Selenium libraries, which can be installed using pip as follows:

pip install bs4
pip install selenium

Next, in order to actually walk through the pages, a browser that Selenium can drive is needed. PhantomJS is a great headless option that can be installed as follows:

Download phantomjs (for silent scraping):
http://phantomjs.org/download.html
[extract]
mv ~/Downloads/phantomjs-2.1.1-macosx/bin/phantomjs /usr/local/bin

Firefox (geckodriver) can also be helpful for debugging, and can be installed as follows:

Download geckodriver (for debugging):
https://github.com/mozilla/geckodriver/releases
[extract]
mv ~/Downloads/geckodriver /usr/local/bin

Note: please ensure that PATH includes the /usr/local/bin directory. An example of how to do this for a Unix-like system (e.g. Mac, Ubuntu, or Windows with Cygwin) can be found here
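Before running the scrapers, it can help to confirm that the prerequisites above are actually visible. The following is a small sketch, not part of the repository, that checks whether the `bs4` and `selenium` modules can be imported and whether the `phantomjs` and `geckodriver` executables are found on PATH. The function name `missing_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("bs4", "selenium"),
                          drivers=("phantomjs", "geckodriver")):
    """Return the names of Python modules and driver executables that cannot be found."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which searches the directories listed in PATH, as the shell would.
    missing += [name for name in drivers if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

If a driver is reported missing even though it was moved to /usr/local/bin, PATH most likely does not include that directory.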

### Using the Tools

#### scrape_ms.py

This script pulls data about the top mutual fund holders of a given stock (parameterized by TICKER) and dumps it to a file called TICKER_mfund_holder.csv. For instance, for Google (GOOG), this script can be run by calling:

python scrape_ms.py -t GOOG

Sample files for PFE and GSK have been provided as part of this repository.

#### scrape_edb.py

This script pulls data about the top exchange-traded funds (ETFs) that hold a given stock (parameterized by TICKER) and dumps it to a file called TICKER_etf_holder.csv. For instance, for Yahoo (YHOO), this script can be run by calling:

python scrape_edb.py -t YHOO

Sample files for PFE and GSK have been provided as part of this repository.

#### scrape_hc.py

This script pulls data about the top institutions that hold a given stock (parameterized by TICKER) and dumps it to a file called TICKER_inst_holder.csv. For instance, for Yahoo (YHOO), this script can be run by calling:

python scrape_hc.py -t YHOO

Sample files for PFE and GSK have been provided as part of this repository.
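To refresh the data for several tickers at once, the three ticker-parameterized scripts can be driven from a small wrapper. This is only a sketch: it assumes it is run from the repository root and that the scripts work under the same Python interpreter as the wrapper. The names `build_commands` and `run_all` are hypothetical. The script names and the `-t` flag come from the usage examples in this README.

```python
import subprocess
import sys

# Script names and the -t flag are taken from the usage examples in this README.
TICKER_SCRIPTS = ("scrape_ms.py", "scrape_edb.py", "scrape_hc.py")


def build_commands(tickers, scripts=TICKER_SCRIPTS):
    """Return one command line per (ticker, script) pair, using the current interpreter."""
    return [[sys.executable, script, "-t", ticker]
            for ticker in tickers
            for script in scripts]


def run_all(tickers):
    """Run each scraper in turn; check=True raises CalledProcessError on a non-zero exit."""
    for command in build_commands(tickers):
        subprocess.run(command, check=True)
```

Calling `run_all(["PFE", "GSK"])` would then invoke each scraper once per ticker, stopping at the first script that exits with an error.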

#### scrape_mf.py

This script pulls data about the ticker symbols of the top mutual funds and dumps it to a file called mfund_tickers.csv. It can be run by calling:

python scrape_mf.py

#### cleanup.sh

This script cleans up the logs and CSVs produced by running the scrape scripts. It can be executed by running:

./cleanup.sh
