ContentMine / CambridgeChemistryWorkshopSep2015

Automatic validation and extraction of data from publications in Chemical and Materials Sciences

Workshop at the Department of Chemistry, University of Cambridge

Register [here](http://www.eventbrite.co.uk/e/contentmine-chemistry-hack-tickets-18534620549) (registration is FREE; places are limited to 25).

==============

Location: U202, Department of Chemistry, Lensfield Road CB2 1EW

Dates: 18-19 September 2015

| | 18 September 2015 | 19 September 2015 |
|---|---|---|
| Session | Training Workshop & Publisher Panel Session | Hackday |
| Time | 9:00 - 18:00 | 10:00 - 17:00 |

Contact us via [@TheContentMine](https://twitter.com/TheContentMine) or contact@contentmine.org.

Trainers:

Please read the [Pre-workshop Installation Instructions](https://github.com/ContentMine/vms/blob/master/installation_intructions.md).

We would also appreciate your feedback.

Workshop Purpose

Ever found that the key data you want is published only as text in a PDF journal article?

  • ...found yourself manually downloading 100 papers click-by-click?
  • ...redrawing structures/spectra/graphs so you can recompute/analyze them?
  • ...retyping data from tables?
  • ...wished that a computer could do the really boring work of discovering and retrieving data from the literature?

We all have. But new approaches are solving this problem. That's why content mining (also known as text and data mining, TDM) is one of the most exciting areas in scientific data. It has even been debated intensively in the European Parliament and the European Commission. The UK is leading the way with new copyright exceptions for content mining, which make universities like Cambridge ideal places to learn and develop the new techniques.

The workshop will bring together:

  • scientists who need to discover data, especially in chemistry, materials and molecular bioscience (both experimental and computational)
  • scientific publishers
  • library staff
  • technology developers

We'll show how Open software can be used to:

  • crawl the literature effectively using search APIs
  • scrape all the content from publisher web pages (supplemental data, structures)
  • normalize PDFs into semantic HTML
  • run search plugins to discover particular.

The first day will include overviews, installation of the technology [1], a hands-on introduction, and a panel of experts drawn from the participants discussing policy and practice. The second day will be a project-based hack in which small groups tackle problems they share. The event is sponsored by the EPSRC-IAA Knowledge Transfer Fund of the Chemistry Department. Facilitators are from the Departments of Chemistry and Plant Sciences. Coffee, lunches and a Friday dinner are provided.

[1] All essential technology is Open and comes from contentmine.org, an Open project funded by the Shuttleworth Foundation.

Training Workshop and Publisher Panel Session Agenda

| Time | Session |
|---|---|
| 9:00 | Introductions |
| 9:15 | What is content mining? Overview presentation from ContentMine staff |
| 9:30 | Think like a content miner: hands-on activity facilitated by ContentMine staff, introducing entity extraction techniques, precision and recall |
| | Scraping and the anatomy of scrapers |
| 11:00 | Preparations for the panel discussion with publishers |
| 12:30 | Lunch |
| 13:30 | Publishers Q&A |
| 15:30 | Tea time |
| 16:00 | Entity recognition using AMI: hands-on activity facilitated by ContentMine staff, including extracting species names from open access (OA) papers using AMI-species |
| 18:00 onwards | Informal social event (dinner): move as a group to a nearby pub or late-opening cafe to discuss hackday projects. Reservation at Browns from 18:00 onwards to be confirmed. |
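The "Think like a content miner" and AMI-species sessions introduce entity extraction, precision and recall. As a minimal sketch of how those two scores are computed, the Python below uses a deliberately naive regular expression for Latin binomials and a hand-labelled answer set. Both the pattern and the example text are hypothetical illustrations chosen for this README; this is not how AMI-species recognises species names.

```python
import re

# Hypothetical, deliberately naive pattern for Latin binomials such as
# "Escherichia coli": a capitalised genus followed by a lower-case epithet
# of at least three letters. NOT the method used by AMI-species.
BINOMIAL = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")


def extract_candidates(text: str) -> set[str]:
    """Return the distinct strings in text that match the naive pattern."""
    return set(BINOMIAL.findall(text))


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sentence and hand-labelled answers (hypothetical example data).
    text = ("Growth of Escherichia coli was compared with Bacillus subtilis. "
            "The strains were grown overnight; E. coli grew faster.")
    gold = {"Escherichia coli", "Bacillus subtilis", "E. coli"}
    found = extract_candidates(text)
    p, r = precision_recall(found, gold)
    print(sorted(found))
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running it prints `['Bacillus subtilis', 'Escherichia coli', 'The strains']` and `precision=0.67 recall=0.67`. The false positive "The strains" lowers precision, and the missed abbreviation "E. coli" lowers recall. Tightening the pattern, for example by checking candidates against a list of known genera, trades one score against the other.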

Workshop Hackday Agenda

| Time | Session |
|---|---|
| 10:00 | **Hacking in teams working on AMICHEM, ChemicalTagger, ...** |
| 12:30 | Lunch |
| 13:30 | **Hacking in teams working on AMICHEM, ChemicalTagger, ...** |
| 15:30 | Coffee break |
| 16:00 | Presentation of hackday projects: presentations delivered by participants, including the future scope for development of their projects |
| 16:30 | Panel discussion on accelerating the uptake of content mining: panel and Q&A with the audience, including workshop participants |
| 17:00 | Event close |

Intended Audience

This two-day event is intended for researchers and research-related staff who are not currently heavily involved in text and data mining but who have at least some computational skills. At a minimum, we expect familiarity with a command-line interface and basic coding ability in at least one programming language.

Click here to be advised of future ContentMine Workshops

About

License: Creative Commons Zero v1.0 Universal