NIHR-BI / 2022_23_Flu_data_upload

Saving external data from NHS England and NHS Wales

Flu_data_upload

Repeated action to clean and save flu data for Wales and England

Data sources

England
  • Name: UEC Daily SitRep – Web File Timeseries
  • Website link: https://www.england.nhs.uk/statistics/statistical-work-areas/uec-sitrep/urgent-and-emergency-care-daily-situation-reports-2022-23/
  • Public dashboard link: None
  • Direct data link: The link changes; see the "UEC Daily SitRep – Web File Timeseries" link on the website.

Wales
  • Name: Weekly ARI Hospital Dashboard Data
  • Website link: https://www2.nphs.wales.nhs.uk/CommunitySurveillanceDocs.nsf
  • Public dashboard link: Tableau
  • Direct data link: Direct download
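
Because the England direct data link changes, one way to locate it is to read the website page and pick out the link by its text. The sketch below is not taken from this repository: the names LinkFinder and find_data_link are hypothetical, and it assumes the link text on the page still contains "UEC Daily SitRep – Web File Timeseries" with the same dash character. It uses only the Python standard library and works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: the link text on the England page contains this exact phrase.
LINK_TEXT = "UEC Daily SitRep – Web File Timeseries"


class LinkFinder(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so line breaks inside the link do not matter.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def find_data_link(html, base_url, link_text=LINK_TEXT):
    """Return the absolute URL of the first link whose text contains link_text, else None."""
    finder = LinkFinder()
    finder.feed(html)
    for text, href in finder.links:
        if link_text in text:
            return urljoin(base_url, href)
    return None
```

Calling find_data_link(page_html, page_url) with the downloaded page and its address returns the current file URL, or None if the link text has changed, in which case the phrase in LINK_TEXT needs updating.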

What does the cleaned data look like?

How does the data get cleaned and saved in GitHub?

  • A recurring trigger in GitHub Actions makes this happen automatically. There is one trigger for England and one for Wales.
  • Each trigger runs the code in its respective yml file, stored in the .github/workflows folder. This yml code is where any changes to the data sets get saved in GitHub.
  • As part of the yml code, each workflow runs its respective .py file. The England .py file contains Python code which extracts the data from the link, cleans it and then saves it as a csv. The Wales .py file contains Python code which takes the "Weekly ARI hospital dashboard data - last 90 days.xlsx" file in the GitHub repository, cleans it and then saves it as a csv. The Wales data must be manually saved into the GitHub repository for it to be updated, as this data can only be accessed from the UK.
  • The yml files contain the code below, which sets how frequently the code is triggered and run. This example runs the schedule every day at 10:00am (GitHub Actions evaluates cron schedules in UTC). The five fields are minute, hour, day of month, month and day of week. Search cron to understand more about this syntax.
schedule:
  - cron: "0 10 * * *"
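
To make the schedule above concrete, here is a minimal Python sketch (not part of this repository) that works out when a daily schedule such as "0 10 * * *" would next fire. The helper name next_daily_run is hypothetical, and it only handles the simple "minute hour * * *" case; GitHub Actions may also start scheduled runs somewhat later than the nominal time.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now, hour=10, minute=0):
    """Next firing time (UTC) of the cron expression "<minute> <hour> * * *" after `now`.

    `now` must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), time(hour, minute), tzinfo=timezone.utc)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, next_daily_run(datetime.now(timezone.utc)) gives the next 10:00 UTC slot, which falls shortly after the 09:30 Thursday publication time mentioned for England during winter, when UK time matches UTC.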

How often does the data refresh?

  • Data will only be saved in GitHub if it has changed.
  • The England website says that the data refreshes every Thursday at 9:30am. The trigger in GitHub is therefore set to 10:00am every day, in case there are delays.
  • The Wales website says that the data is refreshed weekly, but it is not clear exactly when this happens. The trigger in GitHub is therefore set to 10:00am every day. Remember that the source is the file manually saved in the repo, so this needs to be updated first.
  • As of 24/04/2023, all of the actions to refresh data have been turned off.
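
The first point in the list above (data is only saved if it has changed) can be sketched in Python as follows. This is an illustration under assumptions, not the repository's actual mechanism: the function name save_if_changed is hypothetical, and the workflows may instead rely on git finding no difference to commit.

```python
from pathlib import Path


def save_if_changed(csv_text, path):
    """Write csv_text to path only when it differs from the existing file.

    Returns True if the file was written, False if it was already up to date.
    """
    path = Path(path)
    new_bytes = csv_text.encode("utf-8")
    if path.exists() and path.read_bytes() == new_bytes:
        return False  # identical content, nothing to save
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(new_bytes)
    return True
```

Running a daily trigger against a source that updates weekly is cheap with this pattern: on the six days when the cleaned csv is unchanged, the function returns False and the file is left untouched.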

2022/23

  • England: "Weekly updates will commence on Thursday 24 November 2022 and will continue every Thursday at 09:30 through the winter, with a final publication on Thursday 6 April 2023 covering the week ending Sunday 2 April 2023."
  • Wales: As of 27/04/2023, the data will no longer be scraped from this repo.
