scrapinghub / navscraper

Vanguard ETF NAV scraper

NAVScraper

This project aims to provide easy-to-use tools for extracting Net Asset Values (NAVs) of Exchange-Traded Funds (ETFs) from public websites.

NAVScraper is based on the Scrapy framework and is sponsored by Scrapinghub.

Requirements

Contributed scripts might require additional software, such as:

Usage

(Run these commands from within the project directory.)

Listing available spiders:

$ scrapy list

Vanguard spider

Scraping available funds:

$ scrapy crawl vanguard_funds

Scraping data for one fund for the current year (using a fund_id value obtained with the previous command):

$ scrapy crawl vanguard -a fund_id=0967

Scraping data for a specific date range (dates in MM/DD/YYYY format):

$ scrapy crawl vanguard -a fund_id=0967 -a date_start=01/01/2012 -a date_end=01/30/2012
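Scrapy passes each -a name=value option to the spider as a string keyword argument, so date_start and date_end arrive as text, and a comma-separated fund_id list (as in the next example) arrives as a single string. The following is a minimal sketch of how such arguments could be interpreted; it is not the project's spider code. The helpers parse_fund_ids and parse_date_range are hypothetical, the MM/DD/YYYY format is inferred from the example above, and the "current year" default (January 1 through today) is an assumption.

```python
from datetime import date, datetime


def parse_fund_ids(fund_id):
    """Split a comma-separated fund_id argument into a list of IDs."""
    # Keep IDs as strings so leading zeros (e.g. "0951") are preserved.
    return [part.strip() for part in fund_id.split(",") if part.strip()]


def parse_date_range(date_start=None, date_end=None, today=None):
    """Turn MM/DD/YYYY strings into dates, defaulting to the current year."""
    today = today or date.today()
    # Assumption: with no date_start, start on January 1 of the current year.
    start = (datetime.strptime(date_start, "%m/%d/%Y").date()
             if date_start else date(today.year, 1, 1))
    # Assumption: with no date_end, stop at today.
    end = (datetime.strptime(date_end, "%m/%d/%Y").date()
           if date_end else today)
    if start > end:
        raise ValueError("date_start must not be later than date_end")
    return start, end


print(parse_fund_ids("0951,0955,3184"))  # ['0951', '0955', '3184']
```

Keeping the IDs as strings rather than integers matters here: Vanguard IDs such as 0967 would otherwise lose their leading zero.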

Scraping data from multiple funds and storing the output in a file:

$ scrapy crawl vanguard -a fund_id=0951,0955,3184,0963,0936,0960 -o output.jl

Note

By convention, the .jl extension indicates a JSON Lines file, which contains one JSON object per line.
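Because each line of a .jl file is an independent JSON document, the output can be read back with the standard library alone. A minimal sketch (read_jsonlines is a hypothetical helper, not part of the project):

```python
import json


def read_jsonlines(path):
    """Yield one dict per non-empty line of a JSON Lines (.jl) file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, `for item in read_jsonlines("output.jl"): print(item["fund_id"], len(item["dates"]))` prints each fund ID with its number of data points. Reading line by line also means large outputs never have to be loaded into memory at once.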

WisdomTree spider

Scraping available funds:

$ scrapy crawl wisdomtree_funds

Scraping data for one or more funds:

$ scrapy crawl wisdomtree -a fund_id=40,42 -o output.jl

Note

This spider scrapes the full NAV history of each fund, because the site does not provide an option to filter by date range.
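Since the WisdomTree spider always returns the full history, a date range can be applied locally after the crawl. A minimal sketch, assuming each item follows the NAV format described under Data format, with dates as YYYY-MM-DD strings; filter_nav is a hypothetical helper:

```python
from datetime import date


def filter_nav(nav, start, end):
    """Return a copy of a NAV item keeping only dates in [start, end]."""
    pairs = [
        (d, v)
        for d, v in zip(nav["dates"], nav["values"])
        # date.fromisoformat parses "YYYY-MM-DD" strings (Python 3.7+).
        if start <= date.fromisoformat(d) <= end
    ]
    return {
        "fund_id": nav["fund_id"],
        "dates": [d for d, _ in pairs],
        "values": [v for _, v in pairs],
    }
```

Filtering dates and values together as pairs keeps the two arrays aligned, which the NAV format relies on.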

Plotting the output

The output can be used for analysis or plotting. The scripts/ directory contains plot.py, which plots the output of a spider:

$ python scripts/plot.py output.jl
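For custom analysis or plots, NAV items can first be merged into one date-sorted series per fund. This is a minimal sketch under the assumption that the input is a sequence of NAV dicts (for example, parsed from output.jl); build_series is hypothetical and is not how plot.py is implemented.

```python
from collections import defaultdict
from datetime import date


def build_series(navs):
    """Merge NAV items into {fund_id: [(date, value), ...]} sorted by date."""
    series = defaultdict(dict)
    for nav in navs:
        for d, v in zip(nav["dates"], nav["values"]):
            # Keying by date means a repeated date keeps the last value seen.
            series[nav["fund_id"]][date.fromisoformat(d)] = float(v)
    # Funds whose items contain no data points never get an entry.
    return {fid: sorted(points.items()) for fid, points in series.items()}
```

The result can be handed to a plotting library such as matplotlib, e.g. by unzipping one fund's pairs into x (dates) and y (values) sequences.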

(Sample plot: docs/sample-output.png)

Data format

The spiders extract two entities: Fund and NAV.

  • Fund fields:

    • id: Identifier (per-site value).
    • symbol: Fund ticker symbol.
    • name: Fund name.

    For example:

    {
      "id": "0938",
      "symbol": "VBK",
      "name": " Small-Cap Growth "
    }
  • NAV fields:

    • fund_id: Fund identifier (per-site value).
    • dates: Array of dates (YYYY-MM-DD).
    • values: Array of NAV values, one per entry in dates.

    For example:

    {
      "fund_id": "0938",
      "dates": ["2013-01-02", "2013-01-03", "2013-01-04"],
      "values": [76.73, 76.72, 77.15]
    }
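The two entities can be mirrored in consuming code with plain dataclasses. This is a stand-in sketch using only the standard library, not the project's own item definitions; the classes below and the points method are assumptions. Because the field names match the JSON keys, each parsed record can be passed directly as keyword arguments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Fund:
    id: str      # per-site identifier, kept as a string (e.g. "0938")
    symbol: str  # ticker symbol (e.g. "VBK")
    name: str

    def __post_init__(self):
        # The sample output above has padding around the name; trim it.
        self.name = self.name.strip()


@dataclass
class NAV:
    fund_id: str
    dates: List[str] = field(default_factory=list)     # e.g. "2013-01-02"
    values: List[float] = field(default_factory=list)  # one value per date

    def __post_init__(self):
        if len(self.dates) != len(self.values):
            raise ValueError("dates and values must have the same length")

    def points(self) -> List[Tuple[str, float]]:
        """Pair each date with its NAV value."""
        return list(zip(self.dates, self.values))


fund = Fund(**{"id": "0938", "symbol": "VBK", "name": " Small-Cap Growth "})
nav = NAV(**{"fund_id": "0938",
             "dates": ["2013-01-02", "2013-01-03", "2013-01-04"],
             "values": [76.73, 76.72, 77.15]})
print(fund.name)        # Small-Cap Growth
print(nav.points()[0])  # ('2013-01-02', 76.73)
```

Validating the array lengths at construction time catches a malformed record early, before it silently misaligns dates and values downstream.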

Changelog

  • 0.1-dev
    • Added spider to scrape funds and NAVs from vanguard.com.
