dboyd42 / cantonese-dictionary

Writes a Cantonese dictionary into a csv file from a word list.

Cantonese Dictionary

Author: David Boyd
Date: Fall 2019
Type: Web Scraper

Description

This web scraper looks up Cantonese words on cantonese.org and collects their English definitions and jyutping.

How it Works

Input

The program begins by reading in a text file that contains a wordlist to be translated or defined.
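A minimal sketch of the input step, assuming the wordlist is a UTF-8 text file with one term per line (the README does not specify the file layout). The function name `read_wordlist` is hypothetical and not taken from the project's code.

```python
from pathlib import Path


def read_wordlist(path):
    """Return the non-empty lines of a UTF-8 wordlist file, stripped of whitespace.

    Assumes one term per line; blank lines are skipped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `read_wordlist("sample/sample2.txt")` would return the terms in that file as a list of strings.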

Process

The program then looks up each term on cantonese.org and scrapes the first definition and its jyutping (if available).

  1. Create a query web address from a word in the wordlist.
  2. The Requests library uses the HTTP GET method to download the corresponding response webpage.
  3. Load the response into a BeautifulSoup4 (HTML parser) object.
  4. Extract specific data based on the HTML tags and index.
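The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the search URL format and the `jyutping` / `definition` class names are guesses, since the README names only the site and says extraction relies on "HTML tags and index". The function names are hypothetical as well.

```python
from urllib.parse import urlencode

import requests
from bs4 import BeautifulSoup

# Assumed search endpoint; the README does not give the URL format.
SEARCH_URL = "https://cantonese.org/search.php"


def build_query_url(word):
    """Step 1: build the query address, percent-encoding the word."""
    return f"{SEARCH_URL}?{urlencode({'q': word})}"


def parse_first_entry(html):
    """Steps 3-4: parse the page and extract the first jyutping and definition.

    The tag and class names are hypothetical placeholders; inspect the real
    page markup and adjust them. Missing fields come back as empty strings.
    """
    soup = BeautifulSoup(html, "html.parser")
    jyutping = soup.find("span", class_="jyutping")
    definition = soup.find("span", class_="definition")
    return (
        jyutping.get_text(strip=True) if jyutping else "",
        definition.get_text(strip=True) if definition else "",
    )


def lookup(word):
    """Step 2: download the results page with an HTTP GET, then parse it."""
    response = requests.get(build_query_url(word), timeout=10)
    response.raise_for_status()
    return parse_first_entry(response.text)
```

Keeping the parsing separate from the download makes it possible to check the extraction logic against a saved HTML page without hitting the site.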

Output

The program creates a file called "canto-definitions.csv" in which the first column displays the jyutping and the second column displays the definition.
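A short sketch of the output step using Python's standard `csv` module, assuming each result is a `(jyutping, definition)` pair and that no header row is written (the README does not mention one). The function name `write_definitions` is hypothetical.

```python
import csv


def write_definitions(rows, path="canto-definitions.csv"):
    """Write (jyutping, definition) pairs to a CSV file, one pair per row.

    newline="" lets the csv module control line endings, and definitions
    that contain commas are quoted automatically.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Calling `write_definitions([("nei5 hou2", "hello")])` would create the file in the current working directory.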

That's it!

You can view your new "canto-definitions.csv" in the current working directory.

How to Run Program

Dependencies

  • BeautifulSoup4 (HTML parser)
  • Requests (HTTP library)

Run

# Run program
python3 main.py

# Input wordlist
sample/sample2.txt

# Review the saved file
vim canto-definitions.csv



Languages

Language: Python 100.0%