MTA-PPKE Hungarian Language Technology Research Group's repositories
boilerplateResults
Results of boilerplate removal algorithms
whats-wrong-python
What's Wrong With My NLP? is a visualizer and graphical diff tool for Natural Language Processing problems. We are reimplementing this program in Python 3. For more information about the original program, see http://whatswrong.googlecode.com
purepos-python3
PurePOS rewritten in Python 3
AnaGramma-Parser
A psycholinguistically motivated parsing model
CleanPortalEval
Boilerplate removal test set for portals (multiple sites from the same domain)
commoncrawl-downloader
Simple Python command-line tools for retrieving a list of URLs and specific files in bulk
gut-besser-chunker
The program used in the paper 'Gut, Besser, Chunker – Selecting the best models for text chunking with voting' by Balázs Indig and István Endrédy
less-is-more
The program used in the paper 'Less is More, More or Less... – Finding the Optimal Threshold for Lexicalisation in Chunking' by Balázs Indig
nom-or-not
Algorithm for case disambiguation
NYTK-NerKor-Cars-OntoNotesPP
A 1M+-token Hungarian named entity dataset with ~30 entity types derived from NYTK-NerKor
nom-or-what
Nom-or-what algorithm, designed to disambiguate case endings on nouns, adjectives, numerals, etc. in Hungarian.