Riccorl / transformer-srl

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. This model also implements predicate disambiguation.

AllenNLP Broken?

KTRosenberg opened this issue · comments

Hi. I came across your repository after having discovered that somewhere down the line, AllenNLP stopped returning reasonable results. Since you depend partly on AllenNLP, I am curious if you have encountered issues with simple sentences. For example, "This is a dog" returns no results whatsoever. "To be" and "To have" both seem bugged. Results also seem to be worse in general than what I remember. I was planning to try your implementation, but I don't see where to find a model to use for SRL/PropBank.

The issue I posted over on AllenNLP's repository is here: allenai/allennlp#4818 .

In all honesty, I have no idea why this is happening. I have tried on multiple machines, older and newer versions and models. It seems something has changed.

Would you have any insight into this? I am hoping this issue can be resolved soon, as I depend on BERT/AllenNLP for a major project.

I tried your example with my pretrained model (you can download it from the README) and I got this result:

is: [ARG1: This] [be.01: is] [ARG2: a dog]

I see. This is definitely better than not getting any results.

Is the VerbAtlas-based model available anywhere? The 2012 one is fairly old.

Is there a way to get this as a map between tags and entities, instead of one large string that I'd need to parse? I suppose I could edit the library myself to do this, but maybe there is a way.

Is there a reference for which tags are used? In allennlp (before the issue), “is” would’ve been marked as a “V”.

Also, why does the GitHub page say that the build is failing?

I had an answer for a case like yours; hope that can help you: here

About the tags in SRL, you can read about PropBank annotations. This document is quite clear: propbank_guidelines

Thanks.
Another test case that seemed to break involves something along the lines of "With its paws, the cat opens the door." The latest versions would just give me ARG1 and ARG0 tags, which is wrong.

How did you try that sentence?
Here is the result I got from the web demo:
[image]

> Is the VerbAtlas-based model available anywhere? The 2012 one is fairly old.

I don't know if I can upload it yet; I will let you know, if you are interested. The model linked in the README is fairly new, though: I trained it a month ago, using BERT and the PropBank inventory, which at the moment is still the most used.

> Is there a way to get this as a map between tags and entities, instead of one large string that I'd need to parse? I suppose I could edit the library myself to do this, but maybe there is a way.

The output format follows the AllenNLP one, which you can find here. The output is a dictionary with the following keys; under the verbs key you will probably find what you need.
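If it helps in the meantime, the tags and words lists in that dictionary can be collapsed into a tag → text map without touching the library. A minimal sketch (the BIO-decoding helper below is my own, not part of transformer-srl, and it keeps only the last span per label):

```python
def tags_to_spans(words, tags):
    """Collapse a BIO tag sequence into a {label: text} map."""
    spans = {}
    label, chunk = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):          # a new span starts
            if label:
                spans[label] = " ".join(chunk)
            label, chunk = tag[2:], [word]
        elif tag.startswith("I-") and label == tag[2:]:
            chunk.append(word)            # continue the current span
        else:                             # "O" closes any open span
            if label:
                spans[label] = " ".join(chunk)
            label, chunk = None, []
    if label:
        spans[label] = " ".join(chunk)
    return spans

words = ["This", "is", "a", "dog"]
tags = ["B-ARG1", "B-V", "B-ARG2", "I-ARG2"]
print(tags_to_spans(words, tags))  # {'ARG1': 'This', 'V': 'is', 'ARG2': 'a dog'}
```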

> Is there a reference for which tags are used? In allennlp (before the issue), "is" would've been marked as a "V".

@Hieudepchai is right, you can read about those tags there. TL;DR: the AllenNLP model doesn't disambiguate verbs, while this one does. Hence, the output also contains the verb sense (in place of the generic V label).

> Also, why does the GitHub page say that the build is failing?

Because I didn't change the version in the last push. It's just a visual error. The last build works.

> How did you try that sentence?
> Here is the result I got from the web demo:
> [image]

That works. If I change to “Using its paws,” the demo appears to hang, which is really strange.

@Riccorl I would absolutely be interested in trying the latest/most accurate systems. It seems like VerbAtlas has more information about relations between words. I'm currently working on a graduate research project that could benefit from extra semantic information, but it's true that the PropBank tags seem more ubiquitous. I also need real-time performance. This project will be open-sourced eventually, but not now, if that matters. VerbAtlas also appears to have a REST API, but I'm not sure how fast that is.

I’ve been very worried considering that things broke recently. As soon as I get to my development machine, I’ll be able to try these things.

Also, for that batch prediction, what is the format of the input list entries?


> That works. If I change to "Using its paws," the demo appears to hang, which is really strange.
>
> @Riccorl I would absolutely be interested in trying the latest/most accurate systems. It seems like VerbAtlas has more information about relations between words. I'm currently working on a graduate research project that could benefit from extra semantic information, but it's true that the PropBank tags seem more ubiquitous. I also need real-time performance. This project will be open-sourced eventually, but not now, if that matters. VerbAtlas also appears to have a REST API, but I'm not sure how fast that is.

It is kinda slow, I'm afraid. The research is fairly new; I don't know exactly the latest developments in that regard.

> Also, for that batch prediction, what is the format of the input list entries?

A dictionary with at least the sentence key.

batch_input = [
    {"sentence": ...},
    {"sentence": ...},
    {"sentence": ...},
    ...
]
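For instance, assuming `predictor` was loaded with `SrlTransformersPredictor.from_path` as in the README, the batch call might look like this (the small wrapper below is my own sketch, not library code):

```python
def make_batch_input(sentences):
    # AllenNLP batch predictors expect one {"sentence": ...} dict per input
    return [{"sentence": s} for s in sentences]

batch_input = make_batch_input([
    "This is a dog.",
    "With its paws, the cat opens the door.",
])
# results = predictor.predict_batch_json(batch_input)  # one output dict per sentence
print(batch_input[0])  # {'sentence': 'This is a dog.'}
```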

Ah, which is slow? This transformer-srl, or the VerbAtlas REST API? Both? AllenNLP seemed kind of fast on its own.

> Ah, which is slow? This transformer-srl, or the VerbAtlas REST API? Both? AllenNLP seemed kind of fast on its own.

The VerbAtlas REST API; transformer-srl should be fast. I usually tag 80k sentences in an hour and a half on Colab.

This seems to work great! Please keep a backup around, maybe a cached version of allennlp as it is now; for all we know, it might break. It would be great to have an archive including all dependencies together.

By the way, why is the verb sense not also included outside the description string for easy access? Should I use the "frame" entry?

> This seems to work great! Please keep a backup around, maybe a cached version of allennlp as it is now; for all we know, it might break. It would be great to have an archive including all dependencies together.

transformer-srl 2.4.6 will always have allennlp 1.2.0 as a requirement, so it should work :)

> By the way, why is the verb sense not also included outside the description string for easy access? Should I use the "frame" entry?

You can find it in the frame key and in the tags key. To be honest, I don't remember if in tags it is just V; I wrote the predictor parser a long time ago and probably didn't want to mess with Allen's code. Anyway, as of now, frame contains the predicate sense of the tag V.
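So, assuming the output shape shown elsewhere in this thread (a frame key in each verbs entry), the sense can be read off directly; a tiny sketch of my own:

```python
def frame_senses(output):
    """Map each predicate to its disambiguated frame, e.g. {'is': 'be.01'}."""
    return {v["verb"]: v["frame"] for v in output["verbs"]}

output = {
    "verbs": [{"verb": "is", "frame": "be.01", "tags": ["B-ARG1", "B-V", "B-ARG2", "I-ARG2"]}],
    "words": ["This", "is", "a", "dog"],
}
print(frame_senses(output))  # {'is': 'be.01'}
```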

Got it.

I really do wonder what happened with Allen's module, though. I know I used to use a different version of spaCy.

Last major question: can I use spaCy and other spaCy data independently from this (e.g. the large word data for dependency parsing)? How about other Allen modules, like coreference resolution? I want to make sure I don't introduce conflicts.

Thanks again. Somehow this saved my project, it seems. :)

> Last major question: can I use spaCy and other spaCy data independently from this (e.g. the large word data for dependency parsing)? How about other Allen modules, like coreference resolution? I want to make sure I don't introduce conflicts.

Do you mean whether you can use Allen and spaCy in your project together with transformer-srl? Sure, there is no problem.

> Thanks again. Somehow this saved my project, it seems. :)

I'm glad it can help 😄

By the way, I discovered that in some rare cases, words get the wrong sense: "ablated" gets info for "abdicate."

> By the way, I discovered that in some rare cases, words get the wrong sense: "ablated" gets info for "abdicate."

Do you have the full sentence?

It could be a spaCy issue. The predicate identification part is done using spaCy, by default with the small model (I don't know if you are familiar with spaCy, but there are three model sizes). It probably chose abdicate as the lemma of ablated. The constructor of the transformer-srl Predictor class has a parameter for the spaCy model; you can pass a bigger model to it.

Ah, I was wondering how to pass the larger model. That should do it.

By the way (also), it's a little unexpected that sentences like "I can dance" don't put "can dance" together. There are little issues like this, but I figure no model is perfect. I am not sure if this has anything to do with spaCy at that point.

> By the way (also), it's a little unexpected that sentences like "I can dance" don't put "can dance" together. There are little issues like this, but I figure no model is perfect. I am not sure if this has anything to do with spaCy at that point.

The problem here is the dataset: there is no training dataset with verb spans longer than one token.

You mean that no one has ever done it? Interesting.

Is this the correct way to pass the language model?

predictor = predictors.SrlTransformersPredictor.from_path(
    "./models/transformer_srl/srl_bert_base_conll2012.tar.gz",
    "transformer_srl",
    language="en_core_web_lg",
)

I still get incorrect results for that one word:

{'verbs': [{'verb': 'ablated', 'description': '[ARG0: I] [abdicate.01: ablated] [ARG1: it] .', 'tags': ['B-ARG0', 'B-V', 'B-ARG1', 'O'], 'frame': 'abdicate.01', 'frame_score': 0.07089297473430634, 'lemma': 'ablate'}], 'words': ['I', 'ablated', 'it', '.']}

Sorry for bumping, but is the above the correct way to specify the large spaCy dataset? I didn't notice a difference with the example word, but maybe it's just OOV. In that case, I would just set the sense to a "?" and use the word itself as the "lemma", or use some other method.

I'm really sorry about this late reply; I got confused with the other models I'm working on. There is actually no filter for predicate disambiguation: the model is free to choose any sense regardless of the actual lemma. Changing the spaCy model has no effect; it is just an OOV issue or a mistake by the model.

In my (little) experience, a filter on the output labels often doesn't improve the results. A naive (and simple) workaround is, for example, to filter out the predictions that don't match the lemma.
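That workaround could be sketched over the predictor output like this (my own code, following the "set the sense to a ?" idea from earlier in the thread, not part of the library):

```python
def discard_mismatched_frames(output):
    """Replace frames whose lemma part disagrees with the predicted lemma."""
    for verb in output["verbs"]:
        frame_lemma = verb["frame"].rsplit(".", 1)[0]
        if frame_lemma != verb["lemma"]:
            verb["frame"] = "?"  # unknown sense; keep the lemma for lookup
    return output

output = {
    "verbs": [{"verb": "ablated", "frame": "abdicate.01", "lemma": "ablate"}],
    "words": ["I", "ablated", "it", "."],
}
print(discard_mismatched_frames(output)["verbs"][0]["frame"])  # ?
```

Note that PropBank frame lemmas can legitimately differ from spaCy lemmas, which is presumably part of why such filters often don't help overall.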

That's alright. Thanks again for your help. They fixed the main AllenNLP model, but it still doesn't have senses/roles, so I'll stick with yours, assuming the performance continues to be alright. :)