juliasilge / tidytext

Text mining using tidy tools

Home Page: https://juliasilge.github.io/tidytext/

Working out the Pearson product-moment correlation among the Brontë sisters

TerencePatrick opened this issue

Hi Julia,
This code was working until a few days ago, but now it fails with a persistent set of error messages. I am trying to find the Pearson correlation among the Brontë sisters themselves, rather than between Jane Austen and the Brontë sisters as a collective. It seems that the Gutenberg download step is failing, and that has a cascading effect on every subsequent command.

library(dplyr)
library(stringr)
library(gutenbergr)
library(ggplot2)
library(tidytext)
library(tidyr)
library(scales)

# Charlotte Brontë: Gutenberg IDs 1260 and 9182
cbronte <- gutenberg_download(c(1260, 9182))
tidy_cbronte <- cbronte %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

tidy_cbronte %>%
  count(word, sort = TRUE)

# Emily Brontë: Gutenberg ID 768
ebronte <- gutenberg_download(c(768))
tidy_ebronte <- ebronte %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

tidy_ebronte %>%
  count(word, sort = TRUE)

# Anne Brontë: Gutenberg IDs 969 and 767
abronte <- gutenberg_download(c(969, 767))
tidy_abronte <- abronte %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

tidy_abronte %>%
  count(word, sort = TRUE)

frequency <- bind_rows(mutate(tidy_cbronte, author = "Charlotte Brontë"),
                       mutate(tidy_ebronte, author = "Emily Brontë"),
                       mutate(tidy_abronte, author = "Anne Brontë")) %>%
  mutate(word = str_extract(word, "[a-z']+")) %>%
  count(author, word) %>%
  group_by(author) %>%
  mutate(proportion = n / sum(n)) %>%
  select(-n) %>%
  pivot_wider(names_from = author, values_from = proportion) %>%
  # Column names that contain spaces must be wrapped in backticks
  pivot_longer(`Charlotte Brontë`:`Emily Brontë`,
               names_to = "author", values_to = "proportion")

frequency

# `Anne Brontë` needs backticks in the formula because the name contains a space
cor.test(data = frequency[frequency$author == "Charlotte Brontë",],
         ~ proportion + `Anne Brontë`)

cor.test(data = frequency[frequency$author == "Emily Brontë",],
         ~ proportion + `Anne Brontë`)
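
The two tests above compare Charlotte with Anne and Emily with Anne; the Charlotte and Emily pair is not covered. One possible way to see all three pairs at once is to keep the proportions in wide form and build a correlation matrix. The following is only a minimal sketch in base R: the numbers are made up for illustration (they are not real Brontë word frequencies), and the column names `charlotte`, `emily` and `anne` are hypothetical stand-ins for the backticked author columns.

```r
# Toy word proportions, invented for illustration only.
# The NA mimics a word that one author never uses.
wide <- data.frame(
  word      = c("heart", "door", "night", "eyes", "house"),
  charlotte = c(0.0040, 0.0031, 0.0025, 0.0038, 0.0020),
  emily     = c(0.0035, 0.0040, 0.0030, 0.0029, 0.0033),
  anne      = c(0.0042, 0.0028, 0.0021, 0.0030, NA)
)

# Pearson correlation for every pair of authors.
# "pairwise.complete.obs" drops a word only for the pairs where it is missing.
pairwise <- cor(wide[, c("charlotte", "emily", "anne")],
                use = "pairwise.complete.obs",
                method = "pearson")
print(pairwise)

# A full test (estimate, confidence interval, p-value) for one pair
ct <- cor.test(~ charlotte + emily, data = wide)
print(ct)
```

With the real data, the same idea would presumably apply to the result of `pivot_wider()` before it is made long again.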

Are you having trouble using gutenbergr? Is it the same problem as ropensci/gutenbergr#28?

Let me know if you are having trouble with tidytext functions!
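
If the download is the culprit, it may help to fail right at that step rather than several commands later. The following is a minimal sketch, not part of gutenbergr: `check_download()` and `download_checked()` are hypothetical helpers, and the sketch assumes that a failed download comes back as `NULL` or as a data frame with no rows. `gutenberg_download()` does accept a `mirror` argument, but which mirror URL works for you is something to verify yourself.

```r
# Hypothetical helper (not part of gutenbergr): stop early when a download
# produced no usable text, instead of letting unnest_tokens() fail later.
check_download <- function(books, label) {
  if (is.null(books) || !is.data.frame(books) || nrow(books) == 0) {
    stop("No text downloaded for ", label, call. = FALSE)
  }
  if (!"text" %in% names(books)) {
    stop("Downloaded data for ", label, " has no `text` column", call. = FALSE)
  }
  invisible(books)
}

# Hypothetical wrapper: download, then check.
# `mirror = NULL` lets gutenbergr pick its default mirror.
download_checked <- function(ids, label, mirror = NULL) {
  books <- gutenbergr::gutenberg_download(ids, mirror = mirror)
  check_download(books, label)
}

# Usage (needs network access, so it is left commented out):
# cbronte <- download_checked(c(1260, 9182), "Charlotte Brontë")
```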

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.