transform functionalities
LinguList opened this issue · comments
A transform (or manipulate) function makes another sequence out of a given sequence:
- lingpy.sequence.soundclasses.syllabify (infers syllable boundaries and inserts them in the form of +)
- lingpy.sequence.soundclasses.get_all_ngrams (a quite useful NLP function and a classic example of sequence manipulation, but this function also occurs in sequence.ngrams, so it is duplicated (!))
- lingpy.sequence.soundclasses.tokens2morphemes
And maybe some of the ngram functions, but they are also rather specific, I think.
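To make the idea of a transform concrete, here is a minimal sketch of one sequence being turned into another by splitting on the + boundary marker mentioned above. The function name split_on_marker and its marker parameter are hypothetical, chosen only for illustration. This is not lingpy's implementation of tokens2morphemes, which may handle further separators and options.

```python
def split_on_marker(tokens, marker="+"):
    """Split a flat token list into sub-lists at each boundary marker.

    Hypothetical helper for illustration; not part of lingpy or linse.
    """
    parts = [[]]
    for token in tokens:
        if token == marker:
            # a marker closes the current group and opens a new one
            parts.append([])
        else:
            parts[-1].append(token)
    # drop empty groups caused by leading, trailing, or doubled markers
    return [part for part in parts if part]


print(split_on_marker(["t", "a", "+", "k", "o"]))  # [['t', 'a'], ['k', 'o']]
```

The input and output are both plain Python lists, so a function like this can be chained with other sequence operations.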
Regarding ngrams, I'm not sure this is needed, considering that it's rather short to implement:
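def ngrams(l):
    # yield every contiguous sub-sequence of l, longest first
    for i in reversed(range(len(l))):
        # i + 1 is the n-gram length, j is its start index
        for j in range(len(l) - i):
            yield l[j:j+i+1]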
> list(ngrams(list('abcdefg')))
[['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f'], ['b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e'], ['b', 'c', 'd', 'e', 'f'], ['c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e'], ['c', 'd', 'e', 'f'], ['d', 'e', 'f', 'g'], ['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e'], ['d', 'e', 'f'], ['e', 'f', 'g'], ['a', 'b'], ['b', 'c'], ['c', 'd'], ['d', 'e'], ['e', 'f'], ['f', 'g'], ['a'], ['b'], ['c'], ['d'], ['e'], ['f'], ['g']]
get_all_posngrams seems a lot more powerful. So I'd rather just not add such a function here.
Just thought about the ngram functions. They are basically all easy to implement, including bigrams, trigrams, and the like. They are not strictly needed right now; it would rather be handy to have them in one place for developing new experiments and algorithms. If needed, one could add ngram functions in a dedicated ngram module of linse, I think, since they are a kind of manipulation that one recognizes as something specific.
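As a rough illustration of how short the fixed-order variants are, here is a sketch of bigram and trigram helpers built on one generic function. The names ngrams_of_order, bigrams, and trigrams are assumptions made for this example and do not refer to existing functions in lingpy or linse.

```python
def ngrams_of_order(sequence, n):
    """Yield every contiguous n-gram of `sequence` as a tuple.

    Hypothetical helper for illustration; not an existing lingpy/linse API.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # a sequence shorter than n yields nothing, since the range is empty
    for start in range(len(sequence) - n + 1):
        yield tuple(sequence[start:start + n])


def bigrams(sequence):
    return ngrams_of_order(sequence, 2)


def trigrams(sequence):
    return ngrams_of_order(sequence, 3)


print(list(bigrams("abcd")))   # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(list(trigrams("abcd")))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```

Yielding tuples rather than list slices keeps the n-grams hashable, which is convenient when counting them, for example with collections.Counter.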
So in my opinion, we can drop this for the time being and mark this closed.