dumitrescustefan / dumitrescustefan

About me

I'm a Machine Learning Engineer, working on cool projects at the intersection of NLP and CV. I finished my PhD in 2011, worked as a Research Scientist at the Research Institute for AI (Romanian Academy) for 7 years, then switched to applied ML as an ML engineer at Sustainalytics (2017-2019) and now at Adobe.

I'm active in open source, especially on Romanian NLP. Over the years I've published, taught and coded, all while having fun. I like to build stuff.

Showcase on HuggingFace:


Projects I'm proud of

Under development:

  • Romanian Text Corpus (joint project with Mihai Ilie)
  • Word Sense Disambiguation Corpus & Models for Romanian (large-scale, long-running project)
  • NLI Corpus for Romanian
  • Sentence segmentation for Romanian (because current Romanian tools fail miserably for anything but clean text)

2023

  • May Appeared on live TV discussing AI (#1, #2)
  • Apr Participated in the WE Smart Diaspora conference in Timisoara, Romania, presenting "The Impact of Large Language Models"

2022

2021

2020

  • Aug Led the development of the first ML leaderboard, named LiRo Benchmark, together with Viorica Patraucean and other amazing RomaniaAI volunteers.
  • Jun Proposed and led the development of the Romanian Semantic Textual Similarity dataset. It's a 1:1 high-quality human translation of the English STS dataset.
  • Apr Trained and released the first monolingual Romanian BERT model, which became the most used BERT model in Romania, with thousands of monthly downloads.

2019 and before

  • RoWordNet pip package providing quick access to the Romanian WordNet. After all these years it's still the only Python plug-and-play package for Romanian - seems to be working well :)
  • Developed NLP-Cube with Tiberiu Boros (lead). It started as an entry in the 2018 CoNLL competition and evolved into a multilingual toolkit providing tokenization, sentence segmentation, lemmatization, POS tagging and dependency parsing, trained on the Universal Dependencies dataset.

Selected publications

Google Scholar profile, h-index: 9

