paschok / Diploma

Bachelor Thesis: Classification of Advertisements by means of Supervised Learning Methods

Diploma

My bachelor thesis at Hochschule Merseburg, written in Python and using Natural Language Processing (NLP) and machine learning (ML).

The title of my bachelor thesis is: ***Classification of advertisements by means of supervised learning methods***
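To make the thesis topic concrete, here is a minimal sketch of supervised text classification applied to short advertisement texts. It uses a small multinomial Naive Bayes classifier written with the standard library only. This is an illustrative stand-in and not necessarily the method used in the thesis; the function names, the sample ads, and the category labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return {"labels": label_counts, "words": word_counts, "vocab": vocabulary}


def predict(model, text):
    """Return the label with the highest log-probability for the given text."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # class prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled ads, invented for illustration only.
SAMPLES = [
    ("used car with low mileage", "automotive"),
    ("car tires and engine oil", "automotive"),
    ("wireless phone charger", "electronics"),
    ("laptop and phone accessories", "electronics"),
]

model = train(SAMPLES)
print(predict(model, "cheap car engine"))
print(predict(model, "phone charger"))
```

In practice a library such as NLTK or spaCy would handle tokenization, and a real dataset would be far larger than these four invented examples.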

Work process:

  • Learn about NLP
  • Scrape data
  • Try NLTK / spaCy on datasets
  • Learn more about hierarchical clustering algorithms, neural networks, and other NLP methods such as topic modelling, Word2Vec and so on
  • Code the diploma project
  • Write the thesis itself

My bachelor thesis has two major branches:

  1. Data
    • Scraping data from the web using Scrapy, with either a Google user agent or proxies. I used to scrape Amazon through a proxy, but because the proxies lagged and kept switching off, I decided to use a user agent and time.sleep() instead.
  2. ML
    • Code implementation
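The throttled scraping approach described under Data (a custom user agent plus time.sleep() between requests) can be sketched roughly as follows. This is a standard-library illustration rather than the repository's actual spider code; the user agent string, the delay, and all function names are assumptions.

```python
import time
import urllib.request

# Hypothetical values, not taken from the repository.
USER_AGENT = "Mozilla/5.0 (compatible; thesis-scraper)"
DELAY_SECONDS = 2.0


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def polite_fetch(urls, fetch_page=fetch, delay=DELAY_SECONDS, sleep=time.sleep):
    """Fetch URLs one by one, pausing between consecutive requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # throttle so the site is not hit in a burst
        pages[url] = fetch_page(url)
    return pages
```

The `fetch_page` and `sleep` parameters are injectable so the throttling logic can be exercised without touching the network. In a Scrapy project the same effect is usually achieved through the `USER_AGENT` and `DOWNLOAD_DELAY` settings instead of manual sleeps.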

Commits

Commit messages have the form: one of the 2 branches above: subproject: message. Commits that only touch README.md do not follow this form.

Example:

Data: amazon: added new spider

README.md: update
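The commit convention above could be checked with a small helper like the one below. It is a sketch: the function name and the assumption that the two branch prefixes are spelled exactly `Data` and `ML` are mine, not something the repository states beyond the examples shown.

```python
import re

# Assumed spelling of the two branch prefixes.
BRANCHES = ("Data", "ML")
README_PREFIX = "README.md: "
PATTERN = re.compile(r"^(?P<branch>[^:]+): (?P<subproject>[^:]+): (?P<message>.+)$")


def parse_commit_message(line):
    """Split a commit message into branch, subproject and message parts."""
    if line.startswith(README_PREFIX):
        # README commits carry no branch prefix.
        return {"branch": None, "subproject": "README.md",
                "message": line[len(README_PREFIX):]}
    match = PATTERN.match(line)
    if match is None or match.group("branch") not in BRANCHES:
        raise ValueError(f"commit message does not follow the convention: {line!r}")
    return match.groupdict()


print(parse_commit_message("Data: amazon: added new spider"))
print(parse_commit_message("README.md: update"))
```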

Data comes from these websites:

  • obszone
    • had problems downloading American products for sale, so I had to use a little trick with the URL
  • geebo
  • adlandpro
  • pennysaverusa
  • hoobly
  • oodle
  • gumtree
  • letgo
  • salespider
  • ebay
  • amazon

Amazon data issues:

When entering a department on Amazon, you can scrape either 400 pages of general products from that department, or go into the Featured Categories and scrape more specific products.
For instance: 400 pages of the Automotive department, or Car Care, Car Electronics and so on.

Languages

Python 55.4%, Jupyter Notebook 42.9%, C 1.1%, C++ 0.3%, XSLT 0.2%, Objective-C 0.0%, GAP 0.0%, Roff 0.0%, HTML 0.0%, Fortran 0.0%, Smarty 0.0%, PowerShell 0.0%, Shell 0.0%, Batchfile 0.0%