Patch for /trunk/demo-train-big-model-v1.sh

GoogleCodeExporter opened this issue 9 years ago

Google Code Exporter commented 9 years ago

This patch fixes two bugs:
1. A file name mismatch for the UMBC-webbase corpus
2. The missing download of the phrases dataset

Original issue reported on code.google.com by roys...@gmail.com on 15 Sep 2014 at 6:42

Attachments:

demo-train-big-model-v1.sh.patch

Google Code Exporter · Answer 1 · Sat Mar 21 2015 11:47:20 GMT+0800 (China Standard Time)

Thanks, I fixed the second part (the missing download of 
questions-phrases.txt). However, I don't know what the first problem is about - 
this part of the script runs OK for me.

Original comment by tmiko...@gmail.com on 15 Sep 2014 at 9:23

Google Code Exporter · Answer 2 · Sat Mar 21 2015 11:47:20 GMT+0800 (China Standard Time)

1. Is your shell case-insensitive? Also, does it implicitly add the .tar.gz 
suffix?
The script downloads UMBC-webbase-corpus but then extracts umbc_webbase_corpus.tar.gz.

2. The corpus contains two types of files - plain txt (.txt) and parsed files 
(.possf2). I assume you are only interested in the txt files, so you want to 
iterate over these files only.
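
A minimal sketch of restricting the loop to the plain-text files, assuming the archive has already been extracted into a single directory. The function name cat_txt_files and its directory argument are hypothetical and are not taken from the script:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print the contents
# of every .txt file in a corpus directory, skipping parsed .possf2 files.
cat_txt_files() {
    corpus_dir=$1
    for f in "$corpus_dir"/*.txt; do
        # If no .txt file exists, the glob pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        cat "$f"
    done
}
```

It could be called as cat_txt_files some_dir > corpus.txt, where some_dir stands for wherever the corpus was unpacked.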

Original comment by roys...@gmail.com on 16 Sep 2014 at 8:30

Google Code Exporter · Answer 3 · Sat Mar 21 2015 11:47:20 GMT+0800 (China Standard Time)

I just noticed that when I download 
http://ebiquity.umbc.edu/redirect/to/resource/id/351/UMBC-webbase-corpus 
through my browser, I also get umbc_webbase_corpus.tar.gz, as in the script. 
However, when I download it using wget, I get UMBC-webbase-corpus. This might 
explain the difference. I also noticed that you already handle only the txt 
files, so that's cool.

Original comment by roys...@gmail.com on 17 Sep 2014 at 8:25

Google Code Exporter · Answer 4 · Sat Mar 21 2015 11:47:20 GMT+0800 (China Standard Time)

I get umbc_webbase_corpus.tar.gz when using wget, so the issue must be 
something else. If more people have the same problem as you, I may have to 
update the script and give the output file an exact name.
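
One way to give the output file an exact name is wget's -O option, which writes the download to the given path regardless of what name the server or the URL suggests. The sketch below is only an illustration of that idea, not the fix that was applied to the script. The function fetch_corpus, the variable names, and the .part temporary suffix are assumptions:

```shell
#!/bin/sh
# URL and archive name as quoted in this thread.
CORPUS_URL="http://ebiquity.umbc.edu/redirect/to/resource/id/351/UMBC-webbase-corpus"
CORPUS_ARCHIVE="umbc_webbase_corpus.tar.gz"

# Hypothetical helper: download the corpus under a fixed file name.
fetch_corpus() {
    # Nothing to do if the archive is already present.
    [ -e "$CORPUS_ARCHIVE" ] && return 0
    # wget -O creates the output file even when the download fails, so write
    # to a temporary name first and rename only on success.
    wget -O "$CORPUS_ARCHIVE.part" "$CORPUS_URL" &&
        mv "$CORPUS_ARCHIVE.part" "$CORPUS_ARCHIVE"
}
```

With the name fixed this way, a later extraction step such as tar -zxvf "$CORPUS_ARCHIVE" no longer depends on which file name wget happens to choose.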

Original comment by tmiko...@gmail.com on 17 Sep 2014 at 5:48