delph-in / docs

DELPH-IN Documentation

Home Page: https://delph-in.github.io/docs/

Update links for Utool

arademaker opened this issue

We are using https://www.coli.uni-saarland.de/projects/chorus/utool/ to resolve all possible quantifier scopes of MRSs. The first question is: why doesn't the wiki have a single reference to this tool?

I know that the LKB also implements a quantifier scope solver, but for integration with our current pipeline it was easier to start by testing Utool in client/server mode. The problem is that we don't have much control over the process. We can't ask for the top N solutions, for example. Moreover, we can't get any estimate of the amount of memory that will be needed to process a given MRS. In one particular case:

law stating that when two elements can combine to form more than one compound the amounts of one of them that combines with a fixed amount of the other will exhibit a simple multiple relation

we have 18 HCONS in the MRS produced by the ERG, and Utool is asking for more than 8GB to solve the quantifier scopes.

So my second question is whether anyone here has already used Utool and has any suggestions...

The first question is: why doesn't the wiki have a single reference to this tool?

There are 7 references, but I'd agree they could be easy to miss: https://github.com/delph-in/docs/search?q=utool&type=wikis

We can't ask for the top N solutions, for example.

I don't recall it having a way to rank the solutions. If you just want the first N, maybe you could use the command-line mode and pipe the output to head -nN?
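The `head` idea above amounts to reading a stream lazily and stopping after N items. A minimal Python sketch of the same pattern follows; `fake_readings` is a hypothetical stand-in for a solver that prints one reading per line, not anything Utool provides.

```python
import itertools
from typing import Iterable, Iterator, List


def first_n(items: Iterable[str], n: int) -> List[str]:
    """Return at most n items, consuming no more of the input than needed."""
    return list(itertools.islice(items, n))


def fake_readings() -> Iterator[str]:
    """Hypothetical stand-in for a solver that emits readings one by one, forever."""
    i = 0
    while True:
        yield f"reading-{i}"
        i += 1


# Even though the source never ends, only three items are ever produced.
print(first_n(fake_readings(), 3))
```

Note that this only helps if the producer emits readings incrementally; if it builds everything in memory before printing, truncating the output does not reduce the memory needed.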

Moreover, we can't get any estimate of the amount of memory that will be needed to process a given MRS.

The benchmarks section in the manual (5.5) has a chart plotting the number of constraints with the (search) chart size:

[Figure: utool-size-chart]

It's not in terms of memory, but it might be useful. The manual also states this in Section 2.3:

If you want to give your computer a challenge, we encourage you to run solvable on
the input ex:rondane-650.mrs.pl, an MRS USR representing more than two trillion
readings. You will have to increase Java’s memory limit by passing it the option -Xmx512m
to avoid “out of memory” errors.

I think the 512M is just to hold the chart, as I think they only get the number of readings and not the readings themselves when computing the benchmarks. If you could cram each reading into a single byte, it would take about 2000GB to hold 2 trillion readings.
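The back-of-the-envelope figure above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the one-byte-per-reading size is the deliberately optimistic assumption from the comment, and the 100-byte figure is a hypothetical value added for comparison.

```python
def bytes_for_readings(num_readings: int, bytes_per_reading: int = 1) -> int:
    """Lower bound on storage if every reading is held in memory at once."""
    return num_readings * bytes_per_reading


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9


two_trillion = 2 * 10**12

# One byte per reading, as in the estimate above.
print(to_gb(bytes_for_readings(two_trillion)))  # 2000.0

# Hypothetical 100 bytes per reading, just to show how the bound scales.
print(to_gb(bytes_for_readings(two_trillion, 100)))  # 200000.0
```

Either way, enumerating all readings is out of reach, which is why counting readings from the chart is cheap while materializing them is not.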

Yes, crazy numbers. But I wrote to @alexanderkoller and he quickly replied to me with a solution:

great to see that Utool is still useful! I don’t think I have touched the code in the past ten years, except for very occasional updates to keep it working with modern Java versions. As you can see in the GUI, Utool can enumerate the solved forms one by one. I just added a quick hack to the Utool server so you can limit the number of solved forms that are computed. In your XML command, replace

<utool cmd="solve">

by

<utool cmd="solve" limit="10">

to receive only the first 10 solved forms.
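If the XML command is assembled programmatically, the `limit` attribute described above can be added with the standard library instead of string replacement. The sketch below is an illustration only: the helper name `with_limit` is hypothetical, and it covers nothing beyond the `cmd` and `limit` attributes mentioned in the message; any child elements of the command are passed through unchanged.

```python
import xml.etree.ElementTree as ET


def with_limit(command_xml: str, limit: int) -> str:
    """Return the solve command with a limit attribute set (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    root = ET.fromstring(command_xml)
    if root.tag != "utool" or root.get("cmd") != "solve":
        raise ValueError('expected a <utool cmd="solve"> command')
    # Attribute values are strings in XML, so the number is converted explicitly.
    root.set("limit", str(limit))
    return ET.tostring(root, encoding="unicode")


# The element body is left empty here; a real command would carry the MRS input.
limited = with_limit('<utool cmd="solve"></utool>', 10)
print(limited)
```

Parsing and re-serializing also guarantees plain ASCII quotes around the attribute values, which matters if the command was copied from an email or web page that substituted typographic quotes.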

I have just created an issue coli-saar/utool#5 but I hope it will not be complex to solve it.

For my project with the WN glosses I got

http://wn.mybluemix.net/synset?id=05882226-n

law stating that when two elements can combine to form more than one compound the amounts of one of them that combines with a fixed amount of the other will exhibit a simple multiple relation;

{'solvable': 'true', 'fragments': '30', 'count': '60044376', 'chartsize': '4536', 'time': '110', 'id': '1474'}

sixty million forty-four thousand three hundred seventy-six possible solutions! and

http://wn.mybluemix.net/synset?id=05887365-n

the basis of quantum theory; the energy of electromagnetic waves is contained in indivisible quanta that have to be radiated or absorbed as a whole; the magnitude is proportional to frequency where the constant of proportionality is given by Planck's constant;

{'solvable': 'true', 'fragments': '33', 'count': '19338641280', 'chartsize': '29597', 'time': '675', 'id': '1500'}

nineteen billion three hundred thirty-eight million six hundred forty-one thousand two hundred eighty possible solutions!
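Since the values in these result dictionaries arrive as strings, a small amount of parsing makes them easier to act on. The following sketch converts the fields shown above and decides whether a limit is worth requesting; the function names and the `max_readings` threshold of 1000 are hypothetical choices for illustration.

```python
def parse_stats(raw: dict) -> dict:
    """Convert the string-valued result fields into booleans and integers."""
    return {
        "solvable": raw["solvable"] == "true",
        "fragments": int(raw["fragments"]),
        "count": int(raw["count"]),
        "chartsize": int(raw["chartsize"]),
    }


def needs_limit(stats: dict, max_readings: int = 1000) -> bool:
    """True when enumerating every reading would exceed the chosen threshold."""
    return stats["solvable"] and stats["count"] > max_readings


results = [
    {"solvable": "true", "fragments": "30", "count": "60044376",
     "chartsize": "4536", "time": "110", "id": "1474"},
    {"solvable": "true", "fragments": "33", "count": "19338641280",
     "chartsize": "29597", "time": "675", "id": "1500"},
]

for raw in results:
    stats = parse_stats(raw)
    # The thousands separator makes the reading counts easier to compare.
    print(raw["id"], f"{stats['count']:,}", stats["chartsize"], needs_limit(stats))
```

In both cases the chart is small while the number of readings is enormous, so checking `count` first and only then requesting a bounded number of solved forms seems like the safer order of operations.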

There are 7 references, but I'd agree they could be easy to miss: https://github.com/delph-in/docs/search?q=utool&type=wikis

Yep, my mistake. But as you said, references inside discussions are easy to miss. We don't have a dedicated page about scope resolution and the approaches to it.

Great, I'm glad Alexander Koller was so responsive and helpful with the technical issue. For the wiki, I think we need either an overhaul of the *MRS-related wikis or the separate, more "published" kind of documentation for established technologies that we've been discussing.

I wasn't even aware that people were still using Utool - great to see it is still useful.

Utool now lives on Github. It would probably make sense to replace all references to Utool websites with this link. I'd be happy to include Delphin-specific information about Utool in the Github wiki if you think that would be a useful division of labor.

Thanks @alexanderkoller, that sounds good. I'll reopen and repurpose this issue for updating those links.

I'd be happy to include Delphin-specific information about Utool in the Github wiki if you think that would be a useful division of labor.

I'm not yet sure where the best place for that is, but it would be very welcome. In particular, I have the rough understanding that Utool works with dominance constraints instead of MRS's handle constraints, but it's not fully clear what practical differences in interpretation this entails. For instance, the manual states the following:

Utool doesn’t contain any MRS output codecs, because MRS makes some specific
assumptions about the underlying object language, and it is not clear that a useful class
of labelled dominance graphs can indeed be correctly translated into MRS.

I'd be happy to see these points expanded upon.

Utool doesn’t contain any MRS output codecs, because MRS makes some specific
assumptions about the underlying object language, and it is not clear that a useful class
of labelled dominance graphs can indeed be correctly translated into MRS.

I'd be happy to see these points expanded upon.

Yes, Utool works with dominance graphs internally because that's the only underspecification formalism that anyone knows how to solve efficiently. We showed how to convert MRS to dominance constraints (and thence, dominance graphs) in Fuchss et al. 2004.

As we show there, not every MRS can be converted to a dominance graph; but when an MRS can't be converted, it is probably not correct anyway (Flickinger et al., 2005). Conversely, we never specified a conversion from dominance graphs to MRS because MRS uses idiosyncratic tools to express dominance (variable binding, qeq), and you can easily write down a dominance graph for which it's not clear what the MRS would look like. For instance, the chain of length 3 in the manual.

In practice, this was never an issue; the MRS -> dominance graph conversion is much more important than the reverse direction because we can get MRSs at scale (through HPSG grammars) and we can solve dominance graphs efficiently. And the MRSs that are produced by the grammars can almost all be converted.

Thank you @alexanderkoller for the additional references and explanations. I can start drafting a page about Utool in the wiki, and later you and @goodmami can elaborate. Would that work?

Yes, many thanks @alexanderkoller!

@arademaker I don't think we need a wiki page specifically about Utool because we should instead just link to the repo proper, but this information would do well in a page describing underspecification and scope resolution with MRS, in which we can discuss the issue in general and also include mentions of other implementations, such as in the LKB. What do you think?