eXascaleInfolab / fashionNLP

FashionNLP: a natural language processing tool for fashion textual datasets.

EU project 732328: "Fashion Brain".

D2.1: "Named entity recognition and linking methods".

Getting started

Installation:

git clone https://github.com/eXascaleInfolab/fashionNLP.git
cd fashionNLP/

Description of the "fashionNLP" package

The "fashionNLP" package contains the following files:

  • updateFBT.py: This code performs the following tasks:
    1. Take a concept present in WikiKB as input.
    2. Find the mentions of this concept and of similar concepts (using string matching and tree search) in Instagram posts.
    3. Check whether the WikiKB concept is present in the FB taxonomy. If not, update the FB taxonomy.
  • wikitaxonomy.py: This file generates the Wikipedia taxonomy from the Wikipedia categorisation at https://en.wikipedia.org/wiki/Category:Clothing_by_type
  • input folder: This folder contains the following input files:
    • FBTaxonomy.csv: The initial FashionBrain taxonomy in CSV format.
    • FBTaxonomy.json: The initial FashionBrain taxonomy in JSON format.
    • ner_posts.csv: This file contains the output of applying SENNA to the Instagram posts.
    • wikipediaKB.json: This file contains the Wikipedia knowledge base of fashion items in JSON format.
  • result folder: This folder contains the updated taxonomy.
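
The three tasks listed for updateFBT.py can be illustrated with a minimal sketch. It is not the code in updateFBT.py: the function names, the nested-dictionary taxonomy layout, the sample posts, and the edit-distance threshold are all assumptions made for illustration, and the edit distance is written in pure Python rather than taken from the python-levenshtein package so that the example runs without extra dependencies.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def find_mentions(concept, posts, max_distance=1):
    """Return indices of posts with a token within max_distance edits of concept.

    max_distance is a hypothetical threshold, not a value from the project.
    """
    concept = concept.lower()
    hits = []
    for idx, post in enumerate(posts):
        tokens = (tok.strip(".,!?#") for tok in post.lower().split())
        if any(edit_distance(concept, tok) <= max_distance for tok in tokens):
            hits.append(idx)
    return hits


def in_taxonomy(node, concept):
    """Depth-first search of a nested-dict taxonomy for a concept name."""
    for name, children in node.items():
        if name.lower() == concept.lower() or in_taxonomy(children, concept):
            return True
    return False


def update_taxonomy(taxonomy, parent, concept):
    """Add concept under parent if the concept is absent from the whole tree.

    Returns True when the concept was inserted, False otherwise.
    """
    if in_taxonomy(taxonomy, concept):
        return False

    def insert(node):
        for name, children in node.items():
            if name.lower() == parent.lower():
                children[concept] = {}
                return True
            if insert(children):
                return True
        return False

    return insert(taxonomy)
```

With a toy taxonomy such as {"clothing": {"tops": {"shirt": {}}}}, calling update_taxonomy(taxonomy, "tops", "kimono") adds "kimono" under "tops" the first time and leaves the tree unchanged on a second call, which mirrors the "if not, update" rule in step 3.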

Running the code

Download Stanford CoreNLP and place it in this directory:

https://stanfordnlp.github.io/CoreNLP/history.html

Experiments are run with the updateFBT.py file. Install the Python dependencies first, then run the script:

pip install stanfordcorenlp
pip install python-levenshtein
python updateFBT.py
