transform functionalities

LinguList opened this issue 4 years ago

Johann-Mattis List commented 4 years ago

A transform (or manipulate) function makes another sequence out of a given sequence. Candidates:

lingpy.sequence.soundclasses.syllabify (infers syllable boundaries and inserts them in form of +)
lingpy.sequence.soundclasses.get_all_ngrams (a quite useful NLP function and a classic example of sequence manipulation; it also occurs in sequence.ngrams, so it is duplicated (!))
lingpy.sequence.soundclasses.tokens2morphemes

And maybe some of the ngram functions, but they are also rather specific, I think.
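To make "another sequence out of a given sequence" concrete, here is a minimal sketch of a morpheme-splitting transform. It assumes the input is a list of segments in which morpheme boundaries are marked by a `+` token; the function name `split_morphemes` and the `boundary` parameter are hypothetical, and this is not meant to reproduce what `tokens2morphemes` in lingpy actually does.

```python
def split_morphemes(tokens, boundary="+"):
    """Split a list of segments into morphemes at each boundary marker.

    Hypothetical helper for illustration only; empty morphemes
    (e.g. from a leading or doubled marker) are skipped.
    """
    morphemes, current = [], []
    for token in tokens:
        if token == boundary:
            # Close the current morpheme, if it has any segments.
            if current:
                morphemes.append(current)
                current = []
        else:
            current.append(token)
    # Do not forget the last morpheme after the final marker.
    if current:
        morphemes.append(current)
    return morphemes
```

Called on `"t a + k o".split()`, this returns `[['t', 'a'], ['k', 'o']]`.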

Robert Forkel · Answer 1 · Fri Apr 24 2020 23:48:02 GMT+0800 (China Standard Time)

Regarding ngrams, I'm not sure this is needed considering that it's rather short to implement:

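def ngrams(seq):
    # Yield every contiguous sub-sequence of seq, longest first.
    for i in reversed(range(len(seq))):
        # i + 1 is the length of the n-grams yielded in this pass.
        for j in range(len(seq) - i):
            yield seq[j:j+i+1]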
             
> list(ngrams(list('abcdefg')))
[['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f'], ['b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e'], ['d', 'e', 'f'], ['e', 'f', 'g'], ['a', 'b'], ['b', 'c'], ['c', 'd'], ['d', 'e'], ['e', 'f'], ['f', 'g'], ['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g']]

get_all_posngrams seems a lot more powerful. So I'd rather just not add such a function here.

Johann-Mattis List · Answer 2 · Tue May 12 2020 16:03:09 GMT+0800 (China Standard Time)

I just thought about the ngram functions. They are basically all easy to implement, including bigrams, trigrams, and the like. They are not strictly needed right now; it would rather be handy to have them in one place for developing new experiments and algorithms. If needed, one could add ngram functions in a dedicated ngram module of linse, I think, since they are a distinct kind of manipulation that one recognizes as such.
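As a rough illustration of how short the fixed-size variants are, the sketch below yields all n-grams of one given length. The names `ngrams_of_size`, `bigrams`, and `trigrams` are hypothetical and do not refer to existing functions in linse or lingpy.

```python
def ngrams_of_size(seq, n):
    """Yield every contiguous slice of length n from seq, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # If seq is shorter than n, the range is empty and nothing is yielded.
    for i in range(len(seq) - n + 1):
        yield seq[i:i + n]


def bigrams(seq):
    """Convenience wrapper for n = 2."""
    return ngrams_of_size(seq, 2)


def trigrams(seq):
    """Convenience wrapper for n = 3."""
    return ngrams_of_size(seq, 3)
```

For example, `list(bigrams(list('abcd')))` gives `[['a', 'b'], ['b', 'c'], ['c', 'd']]`. Because the function only slices, it works on any sequence type that supports slicing, including strings and tuples.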

Johann-Mattis List · Answer 3 · Tue May 12 2020 16:03:26 GMT+0800 (China Standard Time)

So in my opinion, we can drop this for the time being and mark this closed.