"Recipe170" is the title of our Global Recipe Analysis Project, which analyses recipes from around the world using NLP-based methods.
-
1.1 Environment specification
- Miniconda (conda 23.11.0)
- Python 3.9
- Pipenv 2023.11.17
-
1.2 Preprocessing
- Step 1: clean the texts
- Step 2: unify quantity units
- Step 3: collect the unique values and match all entries against them
- Step 4: translate ingredients and quantities
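Steps 1 to 3 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: Unicode NFKC normalisation (which turns full-width digits and letters into half-width ones) stands in for "cleaning", and the `UNIT_ALIASES` table is hypothetical. Step 4 (translation) is left out.

```python
import re
import unicodedata

# Hypothetical alias table; the project's real unit list is not shown here.
UNIT_ALIASES = {"グラム": "g", "cc": "ml", "CC": "ml", "ミリリットル": "ml"}


def clean_text(text: str) -> str:
    """Step 1: NFKC-normalise (full-width -> half-width) and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def unify_units(quantity: str) -> str:
    """Step 2: rewrite known unit aliases to one canonical spelling."""
    # Longer aliases first, so a long alias is not partly rewritten by a shorter one.
    for alias in sorted(UNIT_ALIASES, key=len, reverse=True):
        quantity = quantity.replace(alias, UNIT_ALIASES[alias])
    return quantity


def build_mapping(raw_values):
    """Step 3: normalise each unique value once and map raw entries to the result."""
    unique_values = sorted(set(raw_values))
    return {raw: unify_units(clean_text(raw)) for raw in unique_values}


raw = ["１００グラム", "200cc", "１００グラム", "大さじ　１"]
mapping = build_mapping(raw)
matched = [mapping[value] for value in raw]  # every entry matched via the unique values
print(matched)
```

Working on unique values means each distinct string is normalised (and later translated) only once, however many recipes repeat it.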
The overall processing of the recipe data is broken down by component:
- Materials
- a translator for Japanese: the GPT-3.5-turbo API
- parse materials and their usage, and export a mapping dataframe
- parse materials and their upper-level (parent) materials, and export a mapping dataframe
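The material mapping can be sketched as follows. Several parts are assumptions: usage is taken to be written in brackets after the material (e.g. `小麦粉(打ち粉用)`, "flour, for dusting"), the GPT-3.5-turbo call is replaced by a dictionary-backed stand-in passed as `translate`, and the `parents` table of upper-level materials is hypothetical. The rows are plain dicts; `pandas.DataFrame(rows)` would turn them into a dataframe.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed convention: "material(usage)", with half- or full-width brackets.
MATERIAL_USAGE = re.compile(r"^(?P<material>[^(（]+?)\s*[(（](?P<usage>[^)）]*)[)）]\s*$")


def split_material_usage(entry: str) -> Tuple[str, str]:
    """Split an ingredient entry into (material, usage); usage is "" if absent."""
    entry = entry.strip()
    match = MATERIAL_USAGE.match(entry)
    if match is None:
        return entry, ""
    return match.group("material"), match.group("usage")


def translate_unique(values: Iterable[str], translate: Callable[[str], str]) -> Dict[str, str]:
    """Call the (expensive) translator once per unique value and cache the results."""
    return {value: translate(value) for value in sorted(set(values))}


def mapping_rows(entries: List[str], translations: Dict[str, str],
                 parents: Dict[str, str]) -> List[Dict[str, str]]:
    """One mapping row per raw entry."""
    rows = []
    for entry in entries:
        material, usage = split_material_usage(entry)
        rows.append({
            "raw": entry,
            "material": material,
            "usage": usage,
            "material_en": translations.get(material, ""),
            # Materials without a known parent are treated as their own upper level.
            "parent": parents.get(material, material),
        })
    return rows


entries = ["小麦粉(打ち粉用)", "薄力粉", "小麦粉"]
# Stand-in for the GPT-3.5-turbo call; the real prompt and client code are not shown here.
fake_translate = {"小麦粉": "wheat flour", "薄力粉": "cake flour"}.get
translations = translate_unique((split_material_usage(e)[0] for e in entries), fake_translate)
parents = {"薄力粉": "小麦粉"}  # hypothetical upper-level material table
rows = mapping_rows(entries, translations, parents)
```

Translating only the unique material names keeps the number of API calls proportional to the vocabulary size rather than to the number of ingredient lines.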
- Units
- numeric units, e.g. 1 0 0 g
- textual units, e.g. 5、6個 ("5 or 6 pieces")
- enhanced units, e.g. 强弱 ("strong/weak")
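A sketch of quantity parsing for the numeric and textual cases, assuming number-first strings such as `1 0 0 g`, `5、6個` and the mixed number `3と3/4カップ` ("3 and 3/4 cups"). Unit-first forms like `大さじ1` and the intensity qualifiers are not handled; the `Quantity` type and the regular expressions are illustrative, not the project's.

```python
import re
import unicodedata
from fractions import Fraction
from typing import NamedTuple, Optional


class Quantity(NamedTuple):
    low: Fraction
    high: Fraction  # equal to low unless the text gives a range such as 5、6個
    unit: str


# "3と3/4", "3/4", "1.5" or "100"; a range is two numbers joined by 、 ~ 〜 or -.
NUMBER = r"\d+(?:と\d+/\d+|/\d+|\.\d+)?"
QUANTITY = re.compile(rf"^(?P<low>{NUMBER})(?:\s*[、~〜-]\s*(?P<high>{NUMBER}))?\s*(?P<unit>.*)$")


def to_fraction(text: str) -> Fraction:
    """Convert "100", "1.5", "3/4" or the Japanese mixed number "3と3/4" to a Fraction."""
    if "と" in text:  # 3と3/4 means 3 + 3/4
        whole, frac = text.split("と")
        return int(whole) + Fraction(frac)
    return Fraction(text)


def parse_quantity(text: str) -> Optional[Quantity]:
    """Parse a number-first quantity string; return None if no number leads it."""
    text = unicodedata.normalize("NFKC", text).strip()
    # Collapse spaced-out digits such as "1 0 0 g" into "100 g".
    text = re.sub(r"(?<=\d) (?=\d)", "", text)
    match = QUANTITY.match(text)
    if match is None:
        return None  # e.g. "少々" (a pinch): nothing numeric to parse
    low = to_fraction(match.group("low"))
    high = to_fraction(match.group("high")) if match.group("high") else low
    return Quantity(low, high, match.group("unit").strip())


print(parse_quantity("5、6個"))
```

`Fraction` keeps values like 3/4 exact, so unit conversion can happen later without rounding.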
- Procedure
- translator for longer token sequences
- entity recognition for cooking actions, timings, and objects
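For the procedure text, a rule-based sketch: long procedures are split at `。` into chunks under a character budget (a rough proxy for the translator's token limit), and each step is tagged with cooking actions, timings and known objects. The verb stems, the 500-character default and the object list are hypothetical; a trained NER model would replace these rules.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical verb stems, so that both 炒める and 炒めます match.
COOKING_STEMS = ("炒め", "茹で", "焼", "煮", "混ぜ")
TIME_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*(?:時間|分|秒)")  # hours, minutes, seconds


def chunk_steps(text: str, max_chars: int = 500) -> List[str]:
    """Split text after each 。 and pack sentences into chunks of at most max_chars.

    An over-long single sentence stays whole; a final fragment without 。 gains one.
    """
    sentences = [s + "。" for s in text.split("。") if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


def extract_entities(step: str, objects: Sequence[str]) -> Dict[str, List[str]]:
    """Tag cooking actions, timings and known objects in one procedure step."""
    return {
        "cooking": [stem for stem in COOKING_STEMS if stem in step],
        "timing": TIME_PATTERN.findall(step),
        "objects": [name for name in objects if name in step],
    }


step = "玉ねぎを弱火で5分炒め、塩を加えて1時間煮る。"
print(extract_entities(step, ["玉ねぎ", "塩", "鍋"]))
```

Splitting at sentence boundaries, rather than at a fixed offset, keeps each chunk grammatically complete for the translator.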
- Caveats
- some recipes have no ingredients at all
- create an error table for manual annotation (~20k entries on the ingredient side)
- use the cooking steps (via ingredient extraction) to supplement the ingredient lists
Example entries from the error table:
id=eb1d2e4604d93afd2753880c9f79b48e4d2fe582, issue=皮*小麦粉*3と3/4カップ (crust/wrapper * wheat flour * 3 and 3/4 cups)
id=a00912dda86900e54e0df98fe658cb2f4686f23e, issue=皆さんのレシピ*みおりんさん*8801、MIRELLEさん*8785、カヨリーヌさん*9176、まるりんさん*8799 ("everyone's recipes", then user names each followed by a number; not an ingredient)
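The error table and the step-based supplement can be sketched together. The recipe layout (keys `id`, `ingredients`, `steps`), the ingredient vocabulary and the validity rule `looks_like_ingredient` are all hypothetical; the output mimics the id/issue layout of the entries above, with an extra suggestion column for annotators.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Sequence


def find_ingredients_in_steps(steps: Iterable[str], vocabulary: Sequence[str]) -> List[str]:
    """Known ingredient names mentioned in the steps, without duplicates."""
    found: List[str] = []
    for step in steps:
        for name in vocabulary:
            if name in step and name not in found:
                found.append(name)
    return found


def build_error_table(recipes: List[Dict], vocabulary: Sequence[str],
                      is_valid: Callable[[str], bool]) -> List[Dict[str, str]]:
    """Collect rows for manual annotation: missing ingredient lists and invalid entries."""
    rows = []
    for recipe in recipes:
        ingredients = recipe.get("ingredients") or []
        if not ingredients:
            # No ingredient list: suggest names found in the cooking steps instead.
            recovered = find_ingredients_in_steps(recipe.get("steps", []), vocabulary)
            rows.append({"id": recipe["id"], "issue": "no ingredients",
                         "suggestion": "、".join(recovered)})
        for entry in ingredients:
            if not is_valid(entry):
                rows.append({"id": recipe["id"], "issue": entry, "suggestion": ""})
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise the error table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "issue", "suggestion"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


VOCABULARY = ["小麦粉", "卵", "砂糖"]  # hypothetical ingredient vocabulary


def looks_like_ingredient(entry: str) -> bool:
    """Hypothetical validity rule: the entry mentions a known ingredient."""
    return any(name in entry for name in VOCABULARY)


recipes = [
    {"id": "r1", "ingredients": [], "steps": ["卵と砂糖を混ぜる。", "小麦粉を加える。"]},
    {"id": "r2", "ingredients": ["卵*2個", "皆さんのレシピ*8801"], "steps": []},
]
rows = build_error_table(recipes, VOCABULARY, looks_like_ingredient)
print(to_csv(rows))
```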