Riccorl / transformer-srl

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. This model also implements predicate disambiguation.

AllenNLP Broken?

KTRosenberg opened this issue · comments

Hi. I came across your repository after having discovered that somewhere down the line, AllenNLP stopped returning reasonable results. Since you depend partly on AllenNLP, I am curious if you have encountered issues with simple sentences. For example, "This is a dog" returns no results whatsoever. "To be" and "To have" both seem bugged. Results also seem to be worse in general than what I remember. I was planning to try your implementation, but I don't see where to find a model to use for SRL/PropBank.

The issue I posted over on AllenNLP's repository is here: allenai/allennlp#4818 .

In all honesty, I have no idea why this is happening. I have tried on multiple machines, older and newer versions and models. It seems something has changed.

Would you have any insight into this? I am hoping this issue can be resolved soon, as I depend on BERT/AllenNLP for a major project.

I tried your example with my pretrained model (you can download it from the README) and I got this result:

is: [ARG1: This] [be.01: is] [ARG2: a dog]

I see. This is definitely better than not getting any results.

Is the VerbAtlas-based model available anywhere? The 2012 one is fairly old.

Is there a way to get this as a map between tags and entities, instead of one large string that I'd need to parse? I suppose I could edit the library myself to do this, but maybe there is a way.

Is there a reference for which tags are used? In allennlp (before the issue), “is” would’ve been marked as a “V”.

Also, why does the GitHub page say that the build is failing?

I had an answer for a case like yours; hope that can help you: here

About the tags in SRL, you can read about PropBank annotations. This document is quite clear: propbank_guidelines

Thanks.
Another test case that seemed to break involves something along the lines of "With its paws, the cat opens the door." The latest versions would just give me ARG1 and ARG0 tags, which is wrong.

How did you try that sentence?
Here is the result I got from the web demo:
[image]

> Is the VerbAtlas-based model available anywhere? The 2012 one is fairly old.

I don't know if I can upload it yet; I will let you know, if you are interested. The model linked in the README is fairly new, though: I trained it a month ago, using BERT and the PropBank inventory, which at the moment is still the most used.

> Is there a way to get this as a map between tags and entities, instead of one large string that I'd need to parse? I suppose I could edit the library myself to do this, but maybe there is a way.

The output format follows the AllenNLP one, which you can find here. The output is a dictionary with the following keys; under the verbs key you will probably find what you need.
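If it helps in the meantime, the tags and words lists in that dictionary can be collapsed into a tag → text map without touching the library. A minimal sketch (the BIO-decoding helper below is my own, not part of transformer-srl, and it keeps only the last span per label):

```python
def tags_to_spans(words, tags):
    """Collapse a BIO tag sequence into a {label: text} map."""
    spans = {}
    label, chunk = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):          # a new span starts
            if label:
                spans[label] = " ".join(chunk)
            label, chunk = tag[2:], [word]
        elif tag.startswith("I-") and label == tag[2:]:
            chunk.append(word)            # continue the current span
        else:                             # "O" closes any open span
            if label:
                spans[label] = " ".join(chunk)
            label, chunk = None, []
    if label:
        spans[label] = " ".join(chunk)
    return spans

words = ["This", "is", "a", "dog"]
tags = ["B-ARG1", "B-V", "B-ARG2", "I-ARG2"]
print(tags_to_spans(words, tags))  # {'ARG1': 'This', 'V': 'is', 'ARG2': 'a dog'}
```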

> Is there a reference for which tags are used? In allennlp (before the issue), "is" would've been marked as a "V".

@Hieudepchai is right, you can read about those tags there. TL;DR: the AllenNLP model doesn't disambiguate verbs, while this one does. Hence, the output also contains the verb sense (in place of the generic V label).

> Also, why does the GitHub page say that the build is failing?

Because I didn't change the version in the last push. It's just a visual error. The last build works.

> How did you try that sentence?
> Here is the result I got from the web demo:
> [image]

That works. If I change to “Using its paws,” the demo appears to hang, which is really strange.

@Riccorl I would absolutely be interested in trying the latest/most accurate systems. It seems like VerbAtlas has more information about relations between words. I'm currently working on a graduate research project that could benefit from extra semantic information, but it's true that the PropBank tags seem more ubiquitous. I also need real-time performance. This project will be open-sourced eventually, but not now, if that matters. VerbAtlas also appears to have a REST API, but I'm not sure how fast that is.

I’ve been very worried considering that things broke recently. As soon as I get to my development machine, I’ll be able to try these things.

Also, for that batch prediction, what is the format of the input list entries?


> That works. If I change to "Using its paws," the demo appears to hang, which is really strange.
>
> @Riccorl I would absolutely be interested in trying the latest/most accurate systems. It seems like VerbAtlas has more information about relations between words. I'm currently working on a graduate research project that could benefit from extra semantic information, but it's true that the PropBank tags seem more ubiquitous. I also need real-time performance. This project will be open-sourced eventually, but not now, if that matters. VerbAtlas also appears to have a REST API, but I'm not sure how fast that is.

It is kinda slow, I'm afraid. The research is fairly new; I don't know exactly the latest developments in that regard.

> Also, for that batch prediction, what is the format of the input list entries?

A dictionary with at least the sentence key.

batch_input = [
    {"sentence": ...},
    {"sentence": ...},
    {"sentence": ...},
    ...
]
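For instance, assuming `predictor` was loaded with `SrlTransformersPredictor.from_path` as in the README, the batch call might look like this (the small wrapper below is my own sketch, not library code):

```python
def make_batch_input(sentences):
    # AllenNLP batch predictors expect one {"sentence": ...} dict per input
    return [{"sentence": s} for s in sentences]

batch_input = make_batch_input([
    "This is a dog.",
    "With its paws, the cat opens the door.",
])
# results = predictor.predict_batch_json(batch_input)  # one output dict per sentence
print(batch_input[0])  # {'sentence': 'This is a dog.'}
```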

Ah, which is slow? This transformer-srl, or the VerbAtlas REST API? Both? AllenNLP seemed kind of fast on its own.

> Ah, which is slow? This transformer-srl, or the VerbAtlas REST API? Both? AllenNLP seemed kind of fast on its own.

The VerbAtlas REST API; transformer-srl should be fast. I usually tag 80k sentences in an hour and a half on Colab.

This seems to work great! Please keep a backup around, maybe a cached version of allennlp as it is now; for all we know, it might break. It would be great to have an archive including all dependencies together.

By the way, why is the verb sense not also included outside the description string for easy access? Should I use the "frame" entry?

> This seems to work great! Please keep a backup around, maybe a cached version of allennlp as it is now; for all we know, it might break. It would be great to have an archive including all dependencies together.

transformer-srl 2.4.6 will always have allennlp 1.2.0 as a requirement, so it should work :)

> By the way, why is the verb sense not also included outside the description string for easy access? Should I use the "frame" entry?

You can find it in the frame key and in the tags key. To be honest, I don't remember if in tags it is just V; I wrote the predictor parser a long time ago and probably didn't want to mess with Allen's code. Anyway, as of now, frame contains the predicate sense of the tag V.
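So, assuming the output shape shown elsewhere in this thread (a frame key in each verbs entry), the sense can be read off directly; a tiny sketch of my own:

```python
def frame_senses(output):
    """Map each predicate to its disambiguated frame, e.g. {'is': 'be.01'}."""
    return {v["verb"]: v["frame"] for v in output["verbs"]}

output = {
    "verbs": [{"verb": "is", "frame": "be.01", "tags": ["B-ARG1", "B-V", "B-ARG2", "I-ARG2"]}],
    "words": ["This", "is", "a", "dog"],
}
print(frame_senses(output))  # {'is': 'be.01'}
```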

Got it.

I really do wonder what happened with Allen's module, though. I know I used to use a different version of spaCy.

Last major question: can I use spaCy and other spaCy data independently from this (e.g. the large word data for dependency parsing)? How about other Allen modules, like coreference resolution? I want to make sure I don't introduce conflicts.

Thanks again. Somehow this saved my project, it seems. :)

> Last major question: can I use spaCy and other spaCy data independently from this (e.g. the large word data for dependency parsing)? How about other Allen modules, like coreference resolution? I want to make sure I don't introduce conflicts.

Do you mean whether you can use Allen and spaCy in your project together with transformer-srl? Sure, there is no problem.

> Thanks again. Somehow this saved my project, it seems. :)

I'm glad it can help 😄

By the way, I discovered that in some rare cases, words get the wrong sense: "ablated" gets info for "abdicate."

> By the way, I discovered that in some rare cases, words get the wrong sense: "ablated" gets info for "abdicate."

Do you have the full sentence?

It could be a spaCy issue. The predicate identification part is done using spaCy, by default with the small model (I don't know if you are familiar with spaCy, but there are three model sizes). It probably chose abdicate as the lemma of ablated. The constructor of the transformer-srl Predictor class has a parameter for the spaCy model; you can pass a bigger model to it.

Ah, I was wondering how to pass the larger model. That should do it.

By the way (also), it's a little unexpected that sentences like "I can dance" don't put "can dance" together. There are little issues like this, but I figure no model is perfect. I am not sure if this has anything to do with spaCy at that point.

> By the way (also), it's a little unexpected that sentences like "I can dance" don't put "can dance" together. There are little issues like this, but I figure no model is perfect. I am not sure if this has anything to do with spaCy at that point.

The problem here is the dataset: there is no training dataset with verb spans longer than one token.

You mean that no one has ever done it? Interesting.

Is this the correct way to pass the language model?

predictor = predictors.SrlTransformersPredictor.from_path(
    "./models/transformer_srl/srl_bert_base_conll2012.tar.gz",
    "transformer_srl",
    language="en_core_web_lg",
)

I still get incorrect results for that one word:

{'verbs': [{'verb': 'ablated', 'description': '[ARG0: I] [abdicate.01: ablated] [ARG1: it] .', 'tags': ['B-ARG0', 'B-V', 'B-ARG1', 'O'], 'frame': 'abdicate.01', 'frame_score': 0.07089297473430634, 'lemma': 'ablate'}], 'words': ['I', 'ablated', 'it', '.']}

Sorry for bumping, but is the above the correct way to specify the large spaCy dataset? I didn't notice a difference with the example word, but maybe it's just OOV. In that case, I would just set the sense to a "?" and use the word itself as the "lemma", or use some other method.

I'm really sorry about this late reply; I got confused with the other models I'm working on. There is actually no filter for predicate disambiguation: the model is free to choose any sense regardless of the actual lemma. Changing the spaCy model has no effect; it is just an OOV issue or a mistake by the model.

In my (little) experience, a filter on the output labels often doesn't improve the results. A naive (and simple) workaround is, for example, to filter out the predictions that don't match the lemma.
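That workaround could be sketched over the predictor output like this (my own code, following the "set the sense to a ?" idea from earlier in the thread, not part of the library):

```python
def discard_mismatched_frames(output):
    """Replace frames whose lemma part disagrees with the predicted lemma."""
    for verb in output["verbs"]:
        frame_lemma = verb["frame"].rsplit(".", 1)[0]
        if frame_lemma != verb["lemma"]:
            verb["frame"] = "?"  # unknown sense; keep the lemma for lookup
    return output

output = {
    "verbs": [{"verb": "ablated", "frame": "abdicate.01", "lemma": "ablate"}],
    "words": ["I", "ablated", "it", "."],
}
print(discard_mismatched_frames(output)["verbs"][0]["frame"])  # ?
```

Note that PropBank frame lemmas can legitimately differ from spaCy lemmas, which is presumably part of why such filters often don't help overall.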

That's alright. Thanks again for your help. They fixed the main AllenNLP model, but it still doesn't have senses/roles, so I'll stick with yours, assuming the performance continues to be alright. :)