covars_make_all returns NAs for baselines
kmunger opened this issue · comments
Kevin Munger commented
When I run the function covars_make_all on Hansard speeches, 29 of the 33 measures are returned correctly, but the 4 measures related to word rarity come back as NA.
However, when I run covars_make_baselines on the same corpus, these 4 measures are computed correctly.
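library(dplyr)           # filter()
library(quanteda)        # corpus(), ntoken()
library(sophistication)  # covars_make_all()
setwd("C:/Users/kevin/Dropbox/Benoit_Spirling_Readability/hansard_data/")
files <- list.files()
## initialize: read the second file in the directory
all_files <- read.csv(files[2], stringsAsFactors = FALSE)
## keep only the two main parties
restricted <- filter(all_files, party == "Conservative" | party == "Labour")
## keep speakers with more than 10 speeches (counted over all parties)
speakers <- all_files$speaker
tab <- table(speakers)
speakers_morethan10 <- names(tab[tab > 10])
restricted <- filter(restricted, speaker %in% speakers_morethan10)
## drop speeches of 10 tokens or fewer
restricted <- restricted[which(ntoken(restricted$text) > 10), ]
data_corpus_speeches66 <- corpus(restricted)
pos <- covars_make_all(data_corpus_speeches66, dependency = FALSE)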
> pos$google_min_2000[100]
[1] NA
> pos$brown_mean[1000]
[1] NA
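The two spot checks above only look at single rows. A minimal base-R sketch for listing every column that is NA throughout might look like the following. It assumes the object returned by covars_make_all behaves like an ordinary data frame; pos_toy and its other_measure column are hypothetical stand-ins with invented values, and only the names google_min_2000 and brown_mean come from the output above.

```r
# Hypothetical stand-in for the data frame returned by covars_make_all();
# the values are invented, only two column names come from this thread.
pos_toy <- data.frame(
  other_measure   = c(4.2, 5.1, 3.9),                 # hypothetical column
  google_min_2000 = c(NA_real_, NA_real_, NA_real_),
  brown_mean      = c(NA_real_, NA_real_, NA_real_)
)

# Share of missing values per column
na_share <- colMeans(is.na(pos_toy))

# Names of the columns that are NA in every row
all_na_cols <- names(na_share)[na_share == 1]
print(all_na_cols)
```

Running the same two lines on pos instead of pos_toy would show whether the word-rarity measures are missing for every document or only for some of them.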
Kenneth Benoit commented
@kmunger is this still a concern, or just an issue to fix (eventually) in the software?
Kevin Munger commented
@kbenoit Not an immediate concern; there's an easy workaround. It's just something to fix at some point.