GEM-benchmark / NL-Augmenter

NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

Informal & Untested Suggestions for Possible Transformations

kaustubhdhole opened this issue

Here are some ideas, put together informally, which could be used for perturbations & augmentations. @vgtomahawk is making a formal list in this branch.

Meanwhile here is an informal list for the benefit of the participants.

  1. Interchange positions of SRL AM arguments for non-overlapping AM arguments (see the first sketch after this list):

    • Alex left for Delhi with his wife at 5 pm. --> Alex left for Delhi at 5 pm with his wife.
    • "at 5 pm" (AM-TMP) and "with his wife" (AM-COM) can be exchanged. This is safe to do only with non-core, non-overlapping arguments. Check what SRL is here.
  2. The ButterFingersPerturbation could be implemented for keyboard layouts other than English - like Devanagari (Hindi, Marathi, Nepali), Shahmukhi (Urdu, Persian), South Indian languages (Tamil, Telugu, Kannada, Malayalam) or Chinese, etc. (see the second sketch after this list).

  3. Style transfer approaches could be interesting to look at - changing formal to informal and vice versa. Check this model.

  • What the heck is going on? --> What is going on?
  • What you upto? --> What are you doing?
  4. Word Order Changes: Active to Passive & vice versa, Topicalisation, Extraposition, Wh-fronting (& vice versa), & others used in constituency tests.
    Scrambling (for German, Turkic languages)
    John went to the store to buy bread. --> To buy bread, John went to the store.
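
For idea 1, here is a minimal sketch of swapping two non-core (ARGM-*) arguments, assuming an AllenNLP-style SRL predictor that returns tokenized `words` plus per-verb BIO `tags` (labels such as `B-ARGM-TMP`). The model archive URL and the whitespace detokenization are assumptions, and which spans get swapped depends on the labels the model actually assigns.

```python
# Sketch for idea 1: swap two non-overlapping ARGM-* (non-core) arguments.
# Assumes an AllenNLP-style SRL predictor returning {"words": [...],
# "verbs": [{"tags": [BIO labels]}]}; the model URL below is an assumption.
from allennlp.predictors.predictor import Predictor

SRL_MODEL = (
    "https://storage.googleapis.com/allennlp-public-models/"
    "structured-prediction-srl-bert.2020.12.15.tar.gz"
)

def argm_spans(tags):
    """Collect (label, start, end) token spans for ARGM-* arguments."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and not tag.startswith("I-"):
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-ARGM"):
            start, label = i, tag[2:]
    return spans

def swap_two_argm(sentence, predictor):
    """Swap the first two ARGM spans of the first predicate that has two."""
    out = predictor.predict(sentence=sentence)
    words = out["words"]
    for verb in out["verbs"]:
        spans = argm_spans(verb["tags"])
        if len(spans) >= 2:
            (_, s1, e1), (_, s2, e2) = spans[0], spans[1]  # already in token order
            return " ".join(
                words[:s1] + words[s2:e2] + words[e1:s2] + words[s1:e1] + words[e2:]
            )
    return sentence  # nothing safe to swap

# predictor = Predictor.from_path(SRL_MODEL)
# swap_two_argm("Alex left for Delhi with his wife at 5 pm .", predictor)
# may yield "Alex left for Delhi at 5 pm with his wife ." depending on labels.
```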
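
And for idea 2, a small self-contained sketch of butter-finger noise over a Devanagari keyboard. The neighbour map is a tiny illustrative subset, not a faithful InScript layout, and the replacement probability is a placeholder.

```python
# Sketch for idea 2: ButterFingers-style noise for a non-Latin keyboard.
# The adjacency map is an illustrative subset, not a real keyboard model.
import random

DEVANAGARI_NEIGHBOURS = {
    "क": ["ख", "त"],
    "म": ["न", "भ"],
    "र": ["ल", "य"],
    "स": ["श", "ष"],
    "ा": ["ि", "ी"],
}

def butter_finger_devanagari(text, prob=0.05, seed=0):
    """Randomly replace characters with a neighbouring key, ButterFingers-style."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        neighbours = DEVANAGARI_NEIGHBOURS.get(ch)
        if neighbours and rng.random() < prob:
            out.append(rng.choice(neighbours))
        else:
            out.append(ch)
    return "".join(out)

# butter_finger_devanagari("मेरा नाम राम है", prob=0.3)
```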

The above are only related to SentenceOperation. There are other transformation types too which could be looked at.

Adversarial SQuAD adds wrong but similar facts at the end of the context in a question-answering setting, which does not affect the QA pair.
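
A toy version of that AddSent-style idea is below: it only ever appends a distractor sentence, so the gold answer span in the original context is untouched. The fixed distractor string is a placeholder (the real method derives the distractor from a mutated copy of the question), and the (context, question, answers) signature is just for illustration rather than the repository's exact interface.

```python
# Toy version of the Adversarial SQuAD "AddSent" idea: append a fluent but
# irrelevant sentence so the gold answers remain recoverable unchanged.
# The fixed distractor below is a placeholder.
def add_distractor(context, question, answers,
                   distractor="The annual kite festival was held in Quebec in 1923."):
    assert all(ans in context for ans in answers), "gold answers must stay intact"
    return context.rstrip() + " " + distractor, question, answers

# add_distractor(
#     "Delhi became the capital of British India in 1911.",
#     "When did Delhi become the capital of British India?",
#     ["1911"],
# )
```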

These two surveys provide a great overview of previous approaches and are a great place to look for ideas:
https://github.com/AgaMiko/data-augmentation-review
https://arxiv.org/pdf/2105.03075.pdf

Another excellent set of paraphrases can be checked here: http://cognet.mit.edu/pdfviewer/journal/coli_a_00166

In particular, from the lists in this paper, "Converse Substitution", "Manipulator-Device Substitution" and "Metaphor Substitution" are three which I have seldom seen implemented properly in code anywhere.
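
A rough, pattern-based sketch of converse substitution might look like the following; the converse lexicon and the single "X VERB Y to Z" template are purely illustrative, and a robust version would work off a dependency or SRL parse rather than a regular expression.

```python
# Rough sketch of "Converse Substitution": replace a predicate with its
# converse and swap the arguments.  Lexicon and pattern are illustrative only.
import re

CONVERSES = {"sold": ("bought", "from"), "lent": ("borrowed", "from"),
             "gave": ("received", "from")}

def converse_substitute(sentence):
    for verb, (converse, prep) in CONVERSES.items():
        # "<giver> <verb> <thing> to <receiver>."
        m = re.match(rf"^(\w+) {verb} (.+) to (\w+)\.$", sentence)
        if m:
            giver, thing, receiver = m.groups()
            return f"{receiver} {converse} {thing} {prep} {giver}."
    return sentence

# converse_substitute("John sold the car to Mary.")
# --> "Mary bought the car from John."
```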

There is interesting work on gapping worth looking at: https://arxiv.org/pdf/1804.06922.pdf
Paul likes coffee and Mary tea. (gapped sentence)
Paul likes coffee and Mary likes tea. (ungapped sentence)
It would be interesting to build rules to convert back and forth between the above two forms.
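
A toy rule for the ungapped-to-gapped direction could be as simple as the string pattern below (drop the second occurrence of a repeated verb); it is English-only and purely illustrative, and the reverse direction really needs a parser to recover the elided verb.

```python
# Toy gapping rule: if the two conjuncts share the same verb, drop its
# second occurrence.  Purely string-based; a serious version would check
# the parse (e.g. that the two subjects really are distinct conjuncts).
import re

def gap(sentence):
    # "<subj1> <verb> <obj1> and <subj2> <verb> <obj2>." -> drop the 2nd verb
    m = re.match(r"^(\w+) (\w+) (.+) and (\w+) \2 (.+)$", sentence)
    if m:
        s1, verb, o1, s2, o2 = m.groups()
        return f"{s1} {verb} {o1} and {s2} {o2}"
    return sentence

# gap("Paul likes coffee and Mary likes tea.")
# --> "Paul likes coffee and Mary tea."
```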

This semi-syntactic paraphrasing algorithm by Tanya Goyal et al., based on reordering source word positions [part of the stream of work following up on SCPNs, a.k.a. Syntactically Controlled Paraphrase Networks (Wieting et al.)], is a really interesting augmentation, particularly due to its reduced sensitivity to constituency parses.