kotyukov / nlp-course-projects

Projects from students of NLP Course

nlp-course-projects

| Name | Description | Team | Repository |
| --- | --- | --- | --- |
| Movie Poster Caption Generation | | @kazzand | https://github.com/kazzand/huaweiproject |
| Chinese-Russian Machine Translation | | @RonanenkovN | https://github.com/RomanenkovN/HuaweiNLP |
| Aspect-Based Sentiment Analysis in German | Identify aspect-level and document-level polarity of messages in German. This matters to German service providers such as railways. | @DrFirestream | https://github.com/DrFirestream/NLP |
| Aspect Extraction with Capsule Networks | Topic modelling with CapsNet. Knowing what people are talking about and understanding their problems and opinions is highly valuable to businesses, administrators, and political campaigns, but manually reading through such large volumes of text and compiling the topics is hard. An automated algorithm is therefore required that reads the text documents and outputs the topics discussed. | @KirillKrasikov | https://github.com/KirillKrasikov/TopicModelingWithCapsNet |
| Text Summarization in Russian | The project's goal is text summarization for the Russian language. I think that one of the most valuable and expensive things in a person's life is their time. Extracting the main points of a text lets you avoid reading news articles in full and saves a lot of time. I planned to build a model that summarizes news about stock trading in Russian. To create my own dataset of texts and their summaries, I use short news posts in Telegram (as summaries) and full news articles about exchange trading (as texts) from the site https://quote.rbc.ru. | @medphisiker | https://github.com/medphisiker/Huawei-s-nlp-course-project |
| BERT-based Aspect Extraction | The goal of my project is to solve the problem of aspect extraction from text data. To solve it, one has to discover not only the author's opinion of an entity mentioned in the text but also opinions about specific properties of that entity, called aspects. Aspects are represented in texts via aspect terms. In practice, the developed models can be used to analyze social media in order to assess users' perception of products, manage brand reputation, conduct political and social research, and so on. | @ulaelfray | https://bitbucket.org/ulaelfray/huawei-nlp-course/ |
| Sentiment Analysis in Russian | | @alekxd | https://github.com/alekxd/project-NLP-sentiment-rus |
| Text Summarization Task in Russian | The problem I am going to solve is summarization in Russian. With so much information available, it is important to extract the main idea of a text; in my case, the model will help people generate headlines for news articles. | @alexvishnevskiy | https://github.com/alexvishnevskiy/Huawei-project |
| Generation of news headlines | Summarization task in Russian on a news dataset. | @germanjke, @kotyukov | https://github.com/germanjke/huaweiNLP |
| Russian aspect-based sentiment analysis | BERT-based techniques to identify the sentiment of a selected entity in the text. For example, in "In general I like the car but I hate its color", the sentiment of "color" is negative. The most relevant dataset is https://github.com/songyouwei/ABSA-PyTorch/tree/master/datasets/semeval14 | @preduct0r | https://github.com/preduct0r/huawei |
| Jigsaw Multilingual Toxic Comment Classification | "Jigsaw Multilingual Toxic Comment Classification" is a Kaggle competition: use TPUs to identify toxic comments across multiple languages. The task is to predict the probability that a comment is toxic. https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification | LeonidMorozov, Mteterin | https://github.com/LeonidMorozov/jigsaw_toxic_classification |
| Headlines generation from news articles in Russian | Reading full texts is time-consuming. If the headline reflects the main idea of the original text, reading it saves a lot of time. I will be working on the Rossiya Segodnya (RIA) corpus, which consists of long text-headline pairs. I am going to preprocess the data and then use pre-trained embeddings to build an attentive RNN model in PyTorch. | @vadimvvlasov | https://github.com/vadimvvlasov/nlp-project |
| Text summarization by using the topic (aspect) of the text | Our task is to hybridize topic modelling and summarization, in particular to use aspect vectors in the summary generation process and thereby steer the focus of the summary. A subtask is to check whether an aspect can influence the summarization result, e.g. to generate a different summary of a text by biasing it toward one or more of its topics: a summary of a text about a sports event, written with attention to politics, should (from our point of view) carry more information about the famous people who attended the event than about the event itself. | @dmitriy.valetov @RomanButov | https://github.com/DmitriyValetov/nlp_course_project |
| Authorship probability estimation | Assess the probability of authorship for doubtful documents attributed to an author; single out the characteristic features inherent in the author's works; develop an approach to classifying the periods of the author's creative work. | @dbadeev | https://github.com/dbadeev/nlp_huawei_project |
| Chinese to Russian machine translation | The zh-ru translation pair is still fairly weak, even in the Yandex and Google translation systems. The main goal of this project is to practice with attention models and build a machine translation system that achieves a decent BLEU score. There is also a competition hosted by ML Bootcamp. | @averkij | https://github.com/averkij/ml-bootcamp-zh-ru-translation |
| Search engine with topic document embeddings | Development of a search engine using a topic model built with the TopicNet library. The search corpus is based on the byweb-2007 open collection. | @To-olak | https://github.com/Evgeny-Egorov-Projects/ROMIP-search |
| Generate text with outside context changing | In this project I want to try generating the next word with external context. If time permits, I also want to try solving the reverse text summarisation task with reverse text generation (predicting the previous word). Metrics will be statistical, like those used in http://gltr.io/dist/index.html. The dataset will be collected from scratch in order to understand all aspects of such work. | @FrankShikhaliev | https://github.com/MindSetLib/MS-Education/tree/master/NLP/HuaweiProject |
| Math word problem solver/explainer | | @Max Plevako | https://github.com/mplevako/znaiqa |
| Social networks' posts classification | The problem I am trying to solve is the classification of social network posts. It matters because it helps detect inappropriate content and hide it from users who are under the age limit. It also helps administrators of social networks moderate their groups automatically. I will be working with data that I collect myself from public sources. | @BorodinDmitriy | https://github.com/BorodinDmitriy/huawei-nlp-course |
| Clustering learners’ essays on the basis of key words | | @Aniezka, @lkoteuka | https://github.com/Aniezka/huawei-nlp-project |
