Upload logs for Squad 2.0

Question

LearningPytorch opened this issue 6 years ago · comments

Hi
Thanks for the new code, it's great. Can you also upload the logs (san.log) for Squad 2.0? I want to make sure that I'm getting scores similar to yours. Thank you again.

NN_for_QA · Answer 1 · Fri Sep 21 2018 01:04:23 GMT+0800 (China Standard Time)

Specifically, I want to check the vocab size, since you changed prepro.py:
Raw vocab size vs vocab in glove: 106415/90949
OOV rate:1.2000=262509/21875454
final vocab size: 90953
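
The OOV rate in the log above reads as a percentage of all tokens rather than a fraction. Here is a minimal sketch that reproduces it, assuming the two counts on that line are OOV tokens and total tokens. The function name is hypothetical and is not taken from prepro.py.

```python
def oov_rate_percent(oov_tokens: int, total_tokens: int) -> float:
    """Return the out-of-vocabulary rate as a percentage of all tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * oov_tokens / total_tokens


# Counts copied from the "OOV rate" log line; their meaning is an assumption.
rate = oov_rate_percent(262509, 21875454)
print(f"OOV rate:{rate:.4f}")  # prints "OOV rate:1.2000"
```

If your own run prints different counts here, the preprocessing output differs, and the scores may not be directly comparable.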

Xiaodong · Answer 2 · Mon Sep 24 2018 23:42:47 GMT+0800 (China Standard Time)

Sure. I'll release it soon.

NN_for_QA · Answer 3 · Tue Sep 25 2018 00:41:03 GMT+0800 (China Standard Time)

Thanks a lot! I'll wait for it. It'll be very useful to compare with your performance on Squad 2.0.
E.g. What is your performance (EM, F1) on Squad dev 2.0?

Xiaodong · Answer 4 · Tue Sep 25 2018 01:31:34 GMT+0800 (China Standard Time)

We got 69.x/72.x EM/F1 on dev. We're writing a tech report about our model and experiments and will publish it soon.

NN_for_QA · Answer 5 · Tue Sep 25 2018 01:46:22 GMT+0800 (China Standard Time)

Wow! That's much higher than what I got when I ran this package: best EM: 62.x, F1: 66.x.
Did you see anything unusual in the vocab size I posted above? I'm not sure why my performance is ~6 points lower than yours.
I was able to get almost the same numbers as you reported on 1.1 by running your system.

NN_for_QA · Answer 6 · Tue Sep 25 2018 21:33:47 GMT+0800 (China Standard Time)

@namisan is there any way you can upload the updated code soon? Your code is good and I'm learning a lot about Squad 2 from it. I'm working on a course project with some of your code. I saw in the other open issue that you're going on vacation. I hope you can upload it before then. Thanks.

hackiey · Answer 7 · Fri Sep 28 2018 13:50:27 GMT+0800 (China Standard Time)

I got similar results, EM: 62.x and F1: 66.x, so maybe something is wrong.

NN_for_QA · Answer 8 · Fri Sep 28 2018 22:10:17 GMT+0800 (China Standard Time)

Hi @hackiey. Thanks for confirming that you got the same or similar results as me. The package reproduces the results reported in the readme for Squad 1.1 but not for 2.0. @namisan @kevinduh maybe we're doing something wrong?

Xiaodong · Answer 9 · Sat Sep 29 2018 22:44:09 GMT+0800 (China Standard Time)

The current config is for v1.1, not for 2.0. As described in the attached tech report, using a lower dropout rate (e.g., 0.1) and a larger hidden size (300) could lead to a better result. Hope this helps. I'm currently on vacation and will check in the logs or models once I'm back.

NN_for_QA · Answer 10 · Wed Oct 03 2018 09:13:07 GMT+0800 (China Standard Time)

I hope you will upload all the code that gives you the performance gains. That would be very useful, @namisan. Have a good vacation.

NN_for_QA · Answer 11 · Fri Oct 12 2018 20:56:46 GMT+0800 (China Standard Time)

Are you back, @namisan?

hackerwei · Answer 12 · Sat Oct 13 2018 15:20:30 GMT+0800 (China Standard Time)

@namisan could you update your hyperparameters for Squad 2.0? I have tried changing dropout and hidden size, and the highest F1 I reached was 69.3.

NN_for_QA · Answer 13 · Mon Oct 15 2018 21:07:05 GMT+0800 (China Standard Time)

@hackerwei could you please elaborate on which params you changed? There are so many dropout params and hidden-size variables. It would be great if you could upload your config.py file. Thank you.

@namisan we are waiting for you too, since your params will let us reproduce the numbers reported in the tech report.
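
To make it easier to say exactly which settings were changed, here is a rough sketch for listing every dropout- and hidden-size-related option in a parsed config. It assumes the config is available as an `argparse.Namespace`. The helper name and the example option names are hypothetical and are not taken from the repository's config.py.

```python
import argparse


def find_params(config: argparse.Namespace, keywords=("dropout", "hidden")):
    """Return {name: value} for every option whose name contains a keyword."""
    return {
        name: value
        for name, value in sorted(vars(config).items())
        if any(keyword in name.lower() for keyword in keywords)
    }


# Hypothetical stand-in config: these names and values are illustrative only.
example = argparse.Namespace(
    example_dropout_p=0.4,
    example_hidden_size=128,
    example_batch_size=32,
)

for name, value in find_params(example).items():
    print(name, value)
```

Posting this listing from two runs alongside the scores would show which values actually differ.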

Xiaodong · Answer 14 · Sat Nov 10 2018 03:04:25 GMT+0800 (China Standard Time)

I released the worksheets for the official submissions. I'm closing this.