microsoft / nlp-recipes

Natural Language Processing Best Practices & Examples

Duplicated notebooks at sentiment_analysis/absa/

loomlike opened this issue

Description

The two notebooks under sentiment_analysis/absa, absa.ipynb and absa_azureml.ipynb, have the same contents.

Expected behavior (i.e. solution)

No duplication

Other Comments

Is this a matter of just deleting one of the files?

Just wanted to follow up here:

I noticed two differences between the files in question. Neither is major, and each can be folded into the other notebook, so we can keep a single example file instead of two.

  1. The absa_azureml.ipynb file has the following extra lines, which pull in the datasets required for the example. They are missing from the Upload Data section of the absa.ipynb file.
!wget -O 'dataset/clothing_absa_train.csv' 'https://nlpbp.blob.core.windows.net/data/clothing_absa_train.csv'
!wget -O 'dataset/clothing-absa-validation.json' 'https://nlpbp.blob.core.windows.net/data/clothing-absa-validation.json'
!wget -O 'dataset/clothing_absa_train_small.csv' 'https://nlpbp.blob.core.windows.net/data/clothing_absa_train_small.csv'

One solution is to add these lines to the absa.ipynb file, so the dataset can be fetched from within the notebook rather than outside the notebook environment.
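If shelling out to wget is not desirable (for example on a machine without wget installed), the same fetch could be done in plain Python. This is only a sketch: the URLs and file names are copied from the wget lines above, while the helper names `plan_downloads` and `fetch_datasets` are made up for illustration and are not part of either notebook.

```python
import os
import urllib.request

# Base URL and file names copied from the wget lines in absa_azureml.ipynb.
BASE_URL = "https://nlpbp.blob.core.windows.net/data/"
DATASET_FILES = [
    "clothing_absa_train.csv",
    "clothing-absa-validation.json",
    "clothing_absa_train_small.csv",
]


def plan_downloads(dest_dir="dataset", filenames=DATASET_FILES, base_url=BASE_URL):
    """Return (url, local_path) pairs for files not yet present in dest_dir.

    Hypothetical helper: it only computes what would be downloaded.
    """
    pairs = []
    for name in filenames:
        local_path = os.path.join(dest_dir, name)
        if not os.path.exists(local_path):
            pairs.append((base_url + name, local_path))
    return pairs


def fetch_datasets(dest_dir="dataset"):
    """Download any missing dataset files into dest_dir (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    for url, local_path in plan_downloads(dest_dir):
        urllib.request.urlretrieve(url, local_path)
```

Calling `fetch_datasets()` in a notebook cell would then take the place of the three wget lines. Unlike `wget -O`, this sketch skips files that already exist, so re-running the cell does not download them again.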

  2. The value of the pip_packages argument passed to the Estimator under the Create An Experiment section is pulled out into a separate variable, PIP_PACKAGES, in the absa.ipynb file, which is not the case in the absa_azureml.ipynb file.

We can use either strategy to pass the argument. Nothing major here.
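To make the second difference concrete, here is a minimal sketch of the two styles. The package names are placeholders, not the actual list from the notebooks, and the dictionaries stand in for the keyword arguments that would be handed to the Estimator.

```python
# Placeholder package names (assumption); the real list is in the notebooks.
PIP_PACKAGES = ["example-package-a", "example-package-b"]

# Style in absa.ipynb: the list lives in a named variable.
estimator_args_named = {"pip_packages": PIP_PACKAGES}

# Style in absa_azureml.ipynb: the list is written inline at the call site.
estimator_args_inline = {"pip_packages": ["example-package-a", "example-package-b"]}

# Both styles pass the same value for pip_packages.
same_value = estimator_args_named == estimator_args_inline
```

The named variable is slightly easier to reuse or edit in one place, but as noted above, the resulting argument is identical.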