purohit / refranero-scraper

Scrapes the CVC Refranero Multingüe database for commonly used Spanish idioms

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This program crawls Spanish idioms from the Refranero Multingüe, maintained by the Centro Virtual Cervantes: http://cvc.cervantes.es/lengua/refranero/Default.aspx. The center reserves all rights (reservados todos los derechos), so you likely can only use the data non-commercially.

It outputs all idioms with Spanish definitions and their usage. If you just want the output, see the idioms.tsv file included here.

Usage:

0. Install dependencies:

    go get .

1. Build the refranero-scraper program:

    go build

2. Get link slugs for all idioms:

    refranero-scraper -print-slugs > slugs.txt

3. Parse link slugs, and print out the idioms and definitions. It could take a while to parse these ~2000 pages.

    < slugs.txt | refranero-scraper -read-slugs > idioms_and_definitions.txt

About

Scrapes the CVC Refranero Multingüe database for commonly used Spanish idioms


Languages

Language:Go 100.0%