sdtblck / Opensubtitles_dataset

Downloads and parses the subtitle dataset from opensubtitles.org.

Opensubtitles_dataset

Usage

python3 parse_opensubtitle_xml.py

The command above downloads a zip archive containing the English OpenSubtitles corpus and extracts the text from every XML file in it, discarding the metadata.
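The extraction step can be sketched roughly as follows. This is not the code in parse_opensubtitle_xml.py, only a minimal illustration under stated assumptions: each subtitle line is assumed to live in an `<s>` element (optionally split into `<w>` word tokens), and the function names `extract_sentences` and `extract_zip` are hypothetical. The download step is left out because the archive URL is not given here.

```python
import xml.etree.ElementTree as ET
import zipfile


def extract_sentences(xml_bytes):
    """Return the text of each <s> element, dropping tags and attributes.

    Assumption: subtitle lines are stored in <s> elements; anything outside
    them (for example a <meta> block) is treated as metadata and ignored.
    """
    root = ET.fromstring(xml_bytes)
    sentences = []
    for s in root.iter("s"):
        # itertext() yields the text and tail text of the element and all of
        # its descendants; split/join collapses runs of whitespace.
        text = " ".join(" ".join(s.itertext()).split())
        if text:
            sentences.append(text)
    return sentences


def extract_zip(zip_file):
    """Yield (member name, sentences) for every .xml member of a zip archive.

    zip_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                yield name, extract_sentences(archive.read(name))
```

Calling `extract_zip` on a path to the downloaded archive would give one list of plain-text lines per XML file, which can then be written to disk or concatenated into a single corpus file.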

Languages

Language:Python 100.0%