difference with just plain regex?
randomgambit opened this issue · comments
Hello Jay,
Thanks for this nice package. Sorry, but I am a bit inexperienced in NLP, and I do not quite get the difference between running part-of-speech parsing with udpipe plus some regex and using your function corpuslingr::clr_search_gramx.
Is there something more that I am missing here?
Thanks!
Hi @randomgambit. Thanks for the interest, and sorry for the slow response. The clr_search_gramx function streamlines searching for lexical/grammatical patterns that span multiple annotation features (e.g., lemma, token, and part-of-speech). It is regex-based, but is supplemented with a simple "corpus querying language" to make searching easier. Before using clr_search_gramx, an annotated corpus (e.g., one produced by udpipe) needs to be amended slightly using the clr_set_corpus function.
Some_annotated_corpus %>% clr_set_corpus() %>% clr_search_gramx(search = "ADJ (like a)? NOUN")
Additional search examples can be viewed here. Let me know.
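
To make the "regex plus a small query language" idea concrete, here is a rough base-R sketch of what such a search could look like if done by hand. This is not how corpuslingr is implemented internally; the `<token~lemma~TAG>` encoding, the toy data frame, and the helper `query_to_regex` are all hypothetical and exist only for illustration. The assumption is that uppercase query words stand for part-of-speech tags and lowercase words stand for token forms.

```r
# Toy annotation table, shaped like token-level annotator output
# (hypothetical data; columns follow the usual token/lemma/upos layout).
ann <- data.frame(
  token = c("A", "big", "dog", "has", "eyes", "bright", "like", "a", "star"),
  lemma = c("a", "big", "dog", "have", "eye", "bright", "like", "a", "star"),
  upos  = c("DET", "ADJ", "NOUN", "VERB", "NOUN", "ADJ", "ADP", "DET", "NOUN"),
  stringsAsFactors = FALSE
)

# Flatten every row into one "<token~lemma~TAG>" tuple and concatenate,
# so a single regex can see all three annotation features at once.
encoded <- paste0("<", ann$token, "~", ann$lemma, "~", ann$upos, ">", collapse = "")

# Matches one field of a tuple (anything except the delimiters).
any_field <- "[^~<>]+"

# Hypothetical helper: translate a small query into a regex over the tuples.
# Uppercase words match the TAG field, lowercase words match the token field;
# grouping and quantifiers such as "( ... )?" pass through unchanged.
query_to_regex <- function(query) {
  rx <- gsub("\\b([A-Z]+)\\b",
             paste0("<", any_field, "~", any_field, "~\\1>"),
             query, perl = TRUE)
  rx <- gsub("\\b([a-z]+)\\b",
             paste0("<\\1~", any_field, "~", any_field, ">"),
             rx, perl = TRUE)
  gsub(" ", "", rx, fixed = TRUE)  # tuples are adjacent, so drop query spaces
}

rx   <- query_to_regex("ADJ (like a)? NOUN")
raw  <- regmatches(encoded, gregexpr(rx, encoded, perl = TRUE))[[1]]

# Reduce each matched run of tuples back to its plain token sequence.
hits <- trimws(gsub(paste0("<(", any_field, ")~", any_field, "~", any_field, ">"),
                    "\\1 ", raw, perl = TRUE))
print(hits)
```

Writing the expanded regex by hand for every query gets tedious quickly, which is the part a query shorthand like "ADJ (like a)? NOUN" is meant to hide.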