spiros / ukbiobank-resources

A curated list for preprocessing, cleaning, mapping and analyzing UK Biobank data.



UK Biobank Resources


Data schema

  • Tofu - A Python tool for generating synthetic UK Biobank data.
  • ukbschemas - Use R to generate a database containing the UK Biobank data schemas.
  • Biobank Read - Python-based tools for the extraction, cleaning and pre-processing of UK Biobank data.
  • FUNPACK - Python library for pre-processing of UK Biobank data.
  • phemap - Python functions to map between ICD-10 terms and PheCodes.
  • ukb_download_and_prep_template - Template for common processing operations.
  • ukbiobank-loaders - A collection of Python tools, developed by BenevolentAI, to load data into Apache Parquet files.

Analytical tools

Information resources


  • Primary care EHR biomarkers - Machine-readable versions (CSV files) of electronic health record phenotyping algorithms for 31 commonly measured biomarkers.


  • Docker utils - Docker image for the UK Biobank utilities ukbunpack, ukbfetch, ukblink, ukbgene, ukbmd5 and ukbconv.

Other resources

  • WikiMedMap - Mapping of phenotype strings from UK Biobank questionnaires and from Mendelian diseases in the Online Mendelian Inheritance in Man (OMIM) database to eight vocabularies: International Classification of Diseases, Ninth Revision (ICD-9), ICD-10, ICD-O, Medical Subject Headings (MeSH), OMIM, Disease Database, and MedlinePlus.

  • ukbb-srmed - Mapping of self-reported medication entries to the Anatomical Therapeutic Chemical (ATC) classification system and the British National Formulary (BNF) coding system.


Your contributions are always welcome!

Please submit a pull request or create an issue to add a new package, library or piece of software to the list.



To the extent possible under law, Spiros Denaxas has waived all copyright and related or neighboring rights to this work.

