Mimino666 / langdetect

Port of Google's language-detection library to Python.



Can I specify only the languages I want to detect, for example en, ja and zh-cn?

maliho0803 opened this issue.

Can I specify only the languages I want to detect, for example only en, ja and zh-cn?

You can do this by instantiating the detector yourself and giving it a prior map that lists only the languages you want to allow:

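import csv
import langdetect

# index of the CSV column that holds the text (adjust for your file)
column = 0

with open('rawdata.csv', newline='', encoding="UTF-8") as rawdata:
    rawreader = csv.reader(rawdata, delimiter=',', quotechar='"')

    # instantiate the DetectorFactory and load the bundled language profiles
    factory = langdetect.detector_factory.DetectorFactory()
    factory.load_profile(langdetect.detector_factory.PROFILES_DIRECTORY)
    # optional: fix the random seed so repeated runs give the same results
    factory.seed = 0

    for row in rawreader:
        # a Detector keeps the text appended to it, so create a fresh one per row
        detector = factory.create()
        # prior probabilities for the languages you expect; languages missing
        # from the map get a prior of 0, e.g. {"en": 1, "ja": 1, "zh-cn": 1}
        detector.set_prior_map({"en": 0.5, "de": 0.5})
        # give the detector the text to run on
        detector.append(row[column])
        # run detection and print the most likely language code
        print(detector.detect())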
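
Why a prior map restricts the output: in langdetect's detector code, `set_prior_map` spreads the map over all loaded languages, giving 0.0 to any language that is not listed, and each detection step multiplies a language's running probability by a small positive factor, so a zero prior stays zero. The sketch below is a simplified, standalone model of that arithmetic, not langdetect's actual implementation; the names `normalize_prior` and `update`, the smoothing `weight`, and the n-gram likelihoods are all hypothetical.

```python
from typing import Dict, List


def normalize_prior(prior_map: Dict[str, float], lang_list: List[str]) -> List[float]:
    """Expand a partial prior map into a probability vector over lang_list."""
    # Languages missing from prior_map get a prior of 0.0.
    probs = [prior_map.get(lang, 0.0) for lang in lang_list]
    if any(p < 0 for p in probs):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("at least one prior probability must be non-zero")
    # Rescale so the allowed languages sum to 1.
    return [p / total for p in probs]


def update(probs: List[float], ngram_probs: List[float], weight: float = 5e-5) -> List[float]:
    """One multiplicative update for a single n-gram, then renormalize.

    ngram_probs[i] is a hypothetical likelihood of the n-gram under language i.
    """
    # A prior of 0.0 stays 0.0 no matter how large ngram_probs[i] is.
    updated = [p * (weight + q) for p, q in zip(probs, ngram_probs)]
    total = sum(updated)
    return [p / total for p in updated]


if __name__ == "__main__":
    langs = ["de", "en", "ja", "zh-cn"]
    # Only allow en, ja and zh-cn, as asked in the issue.
    probs = normalize_prior({"en": 1, "ja": 1, "zh-cn": 1}, langs)
    # Made-up likelihoods for an n-gram that looks strongly German.
    probs = update(probs, [0.9, 0.05, 0.01, 0.01])
    for lang, p in zip(langs, probs):
        print(lang, round(p, 3))
```

Even though the made-up n-gram favours German, "de" stays at 0.0 and the best remaining language wins, which is the behaviour the prior map is being used for here.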

@Mimino666 Can we instead just ignore specified languages? And wouldn't it be nice to have that as a method?
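
Ignoring languages can be expressed with the same `set_prior_map` call used above: give every loaded language the same weight except the ones to ignore. `set_prior_map` normalizes the values, so equal weights of 1.0 are enough. This is a minimal sketch; `prior_excluding` is a hypothetical helper, not part of langdetect, and it assumes you can get the full list of loaded language codes (langdetect's `DetectorFactory` has a `get_lang_list()` method for this, but check your installed version).

```python
from typing import Dict, Iterable


def prior_excluding(all_langs: Iterable[str], ignored: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: equal prior for every language not in `ignored`."""
    ignored_set = set(ignored)
    allowed = [lang for lang in all_langs if lang not in ignored_set]
    if not allowed:
        # a prior map where every language is zero cannot be used for detection
        raise ValueError("cannot ignore every language")
    return {lang: 1.0 for lang in allowed}


# Usage with the factory from the snippet above (requires langdetect):
# detector = factory.create()
# detector.set_prior_map(prior_excluding(factory.get_lang_list(), ["de", "nl"]))
```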