An R package for scraping discussion threads from patient health forums.
You can install the released version of healthforum from CRAN with:
install.packages("healthforum")
And the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("LingshuHu/healthforum")
The following basic example shows how to scrape a discussion thread about gastritis from patient.info.
## load healthforum
library(healthforum)
## scrape pages 1-2 of the gastritis thread
gas <- scrape_one_post(
  url = "https://patient.info/forums/discuss/can-gastritis-be-cured--613999",
  From = 1, To = 2)
#> Warning in FUN(X[[i]], ...): NAs introduced by coercion
Preview the returned data frame as a tibble:
tibble::as_tibble(gas)
#> # A tibble: 346 x 13
#> posts_id post_time types user_names reply_names likes replies text
#> * <chr> <dttm> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 613999 2017-09-30 10:38:00 main… TheWolver… <NA> 4 343 I ha…
#> 2 2858159 2017-09-30 14:37:00 reply pippa58442 TheWolveri… 1 332 Gast…
#> 3 2858195 2017-09-30 15:42:00 nest… suzanne_6… pippa58442 0 0 Yes …
#> 4 2858274 2017-09-30 17:56:00 nest… TheWolver… pippa58442 0 0 Will…
#> 5 2858298 2017-09-30 18:27:00 nest… pippa58442 TheWolveri… 1 0 To b…
#> 6 2858300 2017-09-30 18:31:00 nest… TheWolver… pippa58442 0 0 Dont…
#> 7 2858367 2017-09-30 20:22:00 nest… pippa58442 TheWolveri… 0 0 The …
#> 8 2858405 2017-09-30 21:17:00 nest… TheWolver… pippa58442 0 0 HOW …
#> 9 2858502 2017-09-30 23:04:00 nest… pippa58442 TheWolveri… 0 0 I ha…
#> 10 2858730 2017-10-01 08:34:00 nest… TheWolver… <NA> 0 0 I ha…
#> # ... with 336 more rows, and 5 more variables: post_title <chr>, join_date <dttm>,
#> # posts_num <dbl>, profile_text <chr>, group_names <chr>
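As one illustration of working with the result, the following is a minimal sketch that counts posts and sums likes per user. It runs offline on a small hypothetical stand-in data frame with invented values; only the column names `posts_id`, `user_names`, and `likes` are taken from the preview above. The helper `summarise_users()` is hypothetical and not part of healthforum.

```r
# Hypothetical stand-in for the data frame returned by scrape_one_post().
# The values are invented; only the column names follow the preview above.
posts <- data.frame(
  posts_id   = c("1", "2", "3", "4", "5"),
  user_names = c("user_a", "user_b", "user_a", "user_c", "user_b"),
  likes      = c(4, 1, 0, 1, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of posts and total likes per user,
# sorted by total likes (descending), then by user name.
summarise_users <- function(df) {
  # Assumption: a missing like count is treated as zero likes.
  likes <- ifelse(is.na(df$likes), 0, df$likes)
  n_posts <- tapply(df$posts_id, df$user_names, length)
  total_likes <- tapply(likes, df$user_names, sum)
  out <- data.frame(
    user_names  = names(n_posts),
    n_posts     = as.integer(n_posts),
    total_likes = as.numeric(total_likes[names(n_posts)]),
    stringsAsFactors = FALSE
  )
  out <- out[order(-out$total_likes, out$user_names), ]
  rownames(out) <- NULL
  out
}

summarise_users(posts)
```

To apply it to scraped data, call `summarise_users(gas)` instead. Base R (`tapply()`, `order()`) is used here so the sketch needs no extra packages; dplyr's `group_by()` and `summarise()` would be an equally reasonable choice.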
healthforum was developed to collect publicly available data from the website patient.info. Its purpose is to facilitate academic research. End users are responsible for storing the data securely and for complying with all applicable local, state, and federal laws.