wuyi0614 / recipe170

The main code repository for '170' recipe analysis project.

Recipe170

"Recipe170" is the title of our Global Recipe Analysis Project, which analyses recipes from around the world using NLP-based methods.

1. Quick start

  • 1.1 Environment specification

    • Miniconda (conda 23.11.0)
    • Python 3.9
    • Pipenv 2023.11.17
  • 1.2 Preprocessing

    • Step 1: clean the texts
    • Step 2: unify quantity units
    • Step 3: get unique values and match all entries against them
    • Step 4: translate ingredients and quantities
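Steps 1 to 3 could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `UNIT_ALIASES` table, and the choice of NFKC normalisation are all assumptions made for the example. Step 4 (translation) is omitted because it depends on an external API.

```python
import re
import unicodedata

# Hypothetical alias table; the project's real unit list is not shown in this README.
UNIT_ALIASES = {"グラム": "g", "g": "g", "カップ": "cup", "個": "piece"}


def clean_text(text: str) -> str:
    """Step 1: fold full-width characters to half-width and tidy whitespace."""
    text = unicodedata.normalize("NFKC", text).strip()
    # Assumption: digits separated only by whitespace belong to one number,
    # e.g. "1 0 0 g" -> "100 g".
    text = re.sub(r"(?<=\d)\s+(?=\d)", "", text)
    return re.sub(r"\s+", " ", text)


def unify_unit(unit: str) -> str:
    """Step 2: map a unit spelling to a canonical name, leaving unknown units as-is."""
    return UNIT_ALIASES.get(unit, unit)


def unique_values(entries):
    """Step 3: return the ordered unique cleaned values and a raw -> cleaned mapping."""
    mapping = {raw: clean_text(raw) for raw in entries}
    uniques = list(dict.fromkeys(mapping.values()))
    return uniques, mapping
```

With this sketch, `unique_values(["１００ｇ", "100g", "小麦粉"])` collapses the full-width and half-width spellings into a single unique value, so every raw entry can be matched back to its cleaned form through the returned mapping.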

2. TODOs

The overall processing of recipe data consists of a few steps:

  • Materials
    • a good translator for Japanese: the GPT-3.5-turbo API
    • parse materials and their usage, and export a mapping dataframe
    • parse materials and their upper-level (parent) materials, and export a mapping dataframe
  • Units
    • numeric units, e.g. 1 0 0 g
    • textual units, e.g. 5、6個 (5 or 6 pieces)
    • enhanced units, e.g. 强弱 (strong/weak)
  • Procedure
    • longer-token translator
    • entity-recognition for cooking/timing/objects
  • Caveats
    • some recipes have no ingredient list
    • create an error table for manual annotation (~20k entries from the ingredient side)
    • extract ingredients from the procedure steps to supplement the ingredient list
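The three unit categories listed under "Units" could be told apart with a small classifier like the one below. It is only a sketch under stated assumptions: the regular expressions, the `INTENSITY_CHARS` set, and the `classify_quantity` name are hypothetical and cover just the example strings from the list.

```python
import re
import unicodedata

# A single number followed by a unit, e.g. "100 g".
NUMERIC = re.compile(r"^(\d+(?:\.\d+)?)\s*([^\d\s]+)$")
# Two numbers joined by a separator, then a unit, e.g. "5、6個".
TEXTUAL_RANGE = re.compile(r"^(\d+)[、,~〜\-](\d+)\s*([^\d\s]+)$")
# Hypothetical vocabulary for "enhanced" (intensity) values.
INTENSITY_CHARS = {"强", "強", "弱"}


def classify_quantity(raw: str) -> dict:
    """Classify a raw quantity string as numeric, textual, enhanced, or unknown."""
    text = unicodedata.normalize("NFKC", raw).strip()
    # Assumption: whitespace between digits is an artifact, "1 0 0" -> "100".
    text = re.sub(r"(?<=\d)\s+(?=\d)", "", text)

    match = TEXTUAL_RANGE.match(text)
    if match:
        low, high, unit = match.groups()
        return {"kind": "textual", "value": (float(low), float(high)), "unit": unit}

    match = NUMERIC.match(text)
    if match:
        value, unit = match.groups()
        return {"kind": "numeric", "value": float(value), "unit": unit}

    if text and set(text) <= INTENSITY_CHARS:
        return {"kind": "enhanced", "value": text, "unit": None}

    # Anything else is a candidate for the manual-annotation error table.
    return {"kind": "unknown", "value": text, "unit": None}
```

Strings that fall through to `"unknown"` are the natural input for the error table mentioned under "Caveats".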

3. Issue Log

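  • id=eb1d2e4604d93afd2753880c9f79b48e4d2fe582, issue=皮*小麦粉*3と3/4カップ (crust * wheat flour * 3 and 3/4 cups)
  • id=a00912dda86900e54e0df98fe658cb2f4686f23e, issue=皆さんのレシピ*みおりんさん*8801、MIRELLEさん*8785、カヨリーヌさん*9176、まるりんさん*8799 ("everyone's recipes" followed by user names and numbers, not an ingredient quantity)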
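Both logged issues can be reproduced with a small parser. The sketch below assumes, based only on the two samples above, that `*` separates a group, a material, and a quantity, and that `と` joins a whole number to a fraction. `parse_mixed_quantity` and `parse_entry` are hypothetical names, not functions from this repository.

```python
import re
from fractions import Fraction

# Optional whole part ("3と"), a fraction ("3/4"), then the unit text.
MIXED = re.compile(r"^(?:(\d+)と)?(\d+)/(\d+)(.*)$")


def parse_mixed_quantity(text: str):
    """Parse strings such as "3と3/4カップ" into (Fraction, unit), or None."""
    match = MIXED.match(text)
    if match is None:
        return None
    whole, numerator, denominator, unit = match.groups()
    if int(denominator) == 0:
        return None
    amount = Fraction(int(whole or 0)) + Fraction(int(numerator), int(denominator))
    return amount, unit.strip()


def parse_entry(entry: str):
    """Split "group*material*quantity"; return None for anything that does not fit."""
    fields = entry.split("*")
    if len(fields) != 3:
        return None  # e.g. the second logged issue, which has six fields
    group, material, quantity = fields
    parsed = parse_mixed_quantity(quantity)
    if parsed is None:
        return None
    amount, unit = parsed
    return {"group": group, "material": material, "amount": amount, "unit": unit}
```

Using `Fraction` keeps "3 and 3/4" exact instead of rounding it early; entries that return `None` can be routed to manual annotation rather than silently dropped.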

License:MIT License

