EleutherAI / the-pile

NIH Abstract text for awarded grants

thoppe opened this issue · comments

The NIH (National Institutes of Health) provides a record of the abstracts of all publicly funded grants on ExPORTER. There are two main URLs:

https://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=0&index=1
https://exporter.nih.gov/CRISP_Catalog.aspx?sid=0&index=1

The latter contains some overlapping legacy data. The text needs some minimal preprocessing, but is otherwise in good shape. Example:

DESCRIPTION (provided by applicant): Promising results from prophylactic HPV vaccine trials support using these vaccines in cervical cancer prevention programs in the-future. Since vaccine coverage rarely if ever reaches 100%, population-level effectiveness of a prophylactic vaccine designed to prevent a sexually transmitted infection, such as an HPV vaccine, depends not only on the efficacy of the vaccine, but also on the incidence and duration of infection in both men and women. Although much has been learned about the epidemiology of human papillomavirus (HPV) infections in women, little is known about the incidence, determinants, and natural history of HPV infections in men. Research in men has been hampered, in part, by an inability to obtain adequate genital samples for HPV DNA testing. As discussed in this proposal, we developed a sensitive and acceptable method for sample collection and now propose to use this method in a prospective natural history study with the following aims. Among young men, (1) determine the incidence of infection with any type of HPV, oncogenic HPV, specific HPV types including HPV 16 and HPV6/11, and HPV 16 variants; (2) define risk 'factors for incident HPV infection, including lifetime and recent number of sex partners, circumcision status, condom use, frequency of vaginal intercourse, and courtship behavior; and (3) describe the natural history of HPV infection in men as measured by duration and levels of HPV DNA, HPV type-specific seroconversion, duration of antibodies, and development of genital warts. Our long-term goal is development of cost-effective approaches to the prevention of HPV-related cancers.
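As a rough illustration of the "minimal preprocessing" mentioned above, here is a sketch that strips the leading `DESCRIPTION (provided by applicant):` header and collapses whitespace. It is an assumption based only on the single example shown here, not the code that ended up in the repo; the function name `clean_abstract` and the header pattern are hypothetical, and real ExPORTER records may well need more handling than this.

```python
import re

# Assumed header pattern: the only boilerplate visible in the example
# abstract is "DESCRIPTION (provided by applicant):". Other records may
# use different headers that this sketch does not cover.
_HEADER = re.compile(
    r"^\s*DESCRIPTION\s*\(provided by applicant\)\s*:\s*", re.IGNORECASE
)
_WHITESPACE = re.compile(r"\s+")


def clean_abstract(text: str) -> str:
    """Drop the leading boilerplate header and collapse runs of whitespace."""
    text = _HEADER.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    raw = (
        "DESCRIPTION (provided by applicant):  Promising results from\n"
        "prophylactic HPV vaccine trials."
    )
    print(clean_abstract(raw))
```

Stray characters inside the text itself (for example `the-future` or `risk 'factors` in the abstract above) are deliberately left alone here, since guessing at those fixes risks damaging legitimate hyphens and quotes.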

It's not the largest dataset (an estimated 2 GB compressed?), but it's easy to get and the text is high quality.

This is done, writing up now. Please add me as the assignee for it :)

Code is complete and is currently up at https://github.com/thoppe/The-Pile-NIH-ExPORTER. Working on importing it into the main repo now.

So you're able to close issues but not assign yourself to them? Can you change the label or where it is in the Kanban?

That's good info to have. If you get annoyed with your current permission level we can kick it up but we've been putting off handing them out because we are still feeling out how organization permissions + teams work.

Assignments are designed to auto-move to "done" when you close the issue or merge the PR. That's probably what you've noticed happening.

Bump it up only if you get bothered by my requests. I'm just trying to find the organizational structure so I don't mess up the flow that's already there.

This looks like it's finished and merged, @thoppe. Should it be closed?

It is! Closing.