A Covid-19 database of cases and test numbers
Source: Department of Health (DOH) Data Drop
Date | Updates |
---|---|
Feb 12 | Updated data as of Feb 4, 2023 |
Jan 30 | Updated data as of Jan 16, 2023 |
Using pandas, I processed data regularly released by the health department. There are two types of data here: case numbers and test numbers. I aim to update this regularly as well.
The code is meant to make the large Covid-19 datasets released daily by the health department easier to interpret. It is shared with everyone, but the primary goal is to assist researchers and journalists in reporting on the ongoing pandemic.
The code is meant to make it easy to navigate through the case files and analyze them. Information such as which province has the most cases, how much testing is being conducted, and which age group is most infected can be easily determined through the database. Programmers and other researchers can likewise adapt the code for their own analysis using pandas.
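As a rough illustration of the questions above, the sketch below counts cases per province and per age group with pandas. It assumes the case CSV has already been loaded into a DataFrame with the ProvRes and AgeGroup columns defined below, and that each row is one confirmed case. The function name summarize_cases and the sample rows are made up for illustration and are not real DOH data.

```python
import pandas as pd


def summarize_cases(cases: pd.DataFrame) -> dict:
    """Answer a few common questions from the case file.

    Assumes each row is one confirmed case and that the ProvRes and
    AgeGroup columns are present. NaN values are left out of the counts.
    """
    # value_counts() drops NaN by default and sorts from most to fewest cases.
    by_province = cases["ProvRes"].value_counts()
    by_age_group = cases["AgeGroup"].value_counts()
    return {
        "top_province": by_province.idxmax(),
        "top_age_group": by_age_group.idxmax(),
        "cases_by_province": by_province,
        "cases_by_age_group": by_age_group,
    }


if __name__ == "__main__":
    # Made-up rows for illustration only; not real DOH data.
    sample = pd.DataFrame({
        "ProvRes": ["Cebu", "Cebu", "Laguna", None],
        "AgeGroup": ["25 to 29", "30 to 34", "25 to 29", "25 to 29"],
    })
    summary = summarize_cases(sample)
    print(summary["top_province"])   # Cebu
    print(summary["top_age_group"])  # 25 to 29
```

If two provinces are tied, idxmax simply returns whichever appears first in the counts, so ties are worth checking by looking at cases_by_province directly.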
Since the data can be retrieved and analyzed just by changing the code, it is also easier to create visualizations from them. Here is an interactive map that contains some Covid-19 information.
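The code behind the interactive map is not included here, so the following is only a sketch of one way to prepare case data for a map: count cases per city or municipality, keyed on CityMuniPSGC so the result can be joined to a boundary file that carries PSGC codes. The function name cases_per_city is hypothetical, and using the PSGC code rather than the city name as the join key is a design assumption, not something the DOH files prescribe.

```python
import pandas as pd


def cases_per_city(cases: pd.DataFrame) -> pd.DataFrame:
    """Count cases per city/municipality, keyed by its PSGC code.

    Rows without a CityMuniPSGC are dropped because they cannot be placed on a map.
    """
    grouped = cases.dropna(subset=["CityMuniPSGC"]).groupby("CityMuniPSGC")
    counts = grouped.agg(
        # First non-null name seen for each code, kept only as a readable label.
        CityMunRes=("CityMunRes", "first"),
        # "size" counts every row in the group, including rows with a blank name.
        cases=("CityMunRes", "size"),
    )
    return counts.reset_index().sort_values("cases", ascending=False, ignore_index=True)
```

If CityMuniPSGC has missing values, pandas may read it as floats (for example 137404000.0); passing dtype={"CityMuniPSGC": str} to pd.read_csv keeps the codes as text, which tends to make joins less error-prone.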
The following definitions are mostly from the Department of Health. In all cases, NaN means there is no data for that particular column. The first table describes the case data; the second describes the testing data.
column name | definition |
---|---|
CaseCode | random code assigned for labelling cases; does not equate to the unique case number assigned by DOH |
Age | age |
AgeGroup | five-year age group |
Sex | confirmed case's sex |
DateSpecimen | date when specimen was collected |
DateResultRelease | date of release of result |
DateRepConf | date publicly announced as confirmed case |
DateDied | date patient died. Not used to tabulate number of deaths as publicly reported |
DateRecover | date patient recovered. Not used to tabulate number of recoveries as publicly reported |
RemovalType | type of removal, whether by recovery or death, that happens to a patient. Probably important for tallying active cases, which are not included in this report |
Admitted | binary variable indicating whether the patient has been admitted to a hospital |
RegionRes | region of residence |
ProvRes | province of residence |
CityMunRes | city or municipality of residence |
CityMuniPSGC | Philippine Standard Geographic Code of Municipality or City of residence |
BarangayRes | barangay of residence |
HealthStatus | known current health status of patient (asymptomatic, mild, severe, critical, died, recovered) |
DateOnset | date of onset of symptoms |
Pregnanttab | binary variable (Yes/No) indicating whether the patient was pregnant at any point during Covid-19 infection |
Quarantined | whether the patient has ever been home quarantined; not necessarily currently in home quarantine |
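The sketch below shows a few ways the case columns might be put to work: parsing the date columns, measuring how much of each column is NaN, computing the delay between specimen collection and public confirmation, and approximating active cases. The active-case count rests on an assumption, not a DOH definition: a case with no RemovalType is treated as not yet recovered or dead. Per the definitions above, DateDied and DateRecover should not be used to reproduce the official death and recovery tallies. All function names here are hypothetical.

```python
import pandas as pd

# Date columns from the case table; blank entries are common.
DATE_COLUMNS = ["DateSpecimen", "DateResultRelease", "DateRepConf",
                "DateDied", "DateRecover", "DateOnset"]


def prepare_cases(cases: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with date columns parsed; blanks and bad values become NaT."""
    cases = cases.copy()
    for col in DATE_COLUMNS:
        if col in cases.columns:
            cases[col] = pd.to_datetime(cases[col], errors="coerce")
    return cases


def missing_share(cases: pd.DataFrame) -> pd.Series:
    """Fraction of rows with no data (NaN/NaT) in each column, highest first."""
    return cases.isna().mean().sort_values(ascending=False)


def reporting_delay_days(cases: pd.DataFrame) -> pd.Series:
    """Days from specimen collection to public confirmation; NaN if either date is missing."""
    return (cases["DateRepConf"] - cases["DateSpecimen"]).dt.days


def approx_active_cases(cases: pd.DataFrame) -> int:
    """ASSUMPTION: a case with no RemovalType has neither recovered nor died yet."""
    return int(cases["RemovalType"].isna().sum())
```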
column name | definition |
---|---|
facility_name | name of the institution certified by the Department of Health to perform COVID-19 testing |
daily_output_positive_individuals | actual number of all unique individuals with positive results released from 6pm the previous day to 6pm of the reporting date |
daily_output_negative_individuals | actual number of all unique individuals with negative results released from 6pm the previous day to 6pm of the reporting date |
daily_output_unique_individuals | sum of all unique individuals tested (positive+negative) with results that are released from 6pm the previous day to 6pm of the reporting date |
daily_output_invalid | number of all specimens with invalid results that are released from 6pm the previous day to 6pm of the reporting date. Reasons for invalidity were not indicated |
daily_output_samples_tested | total specimens processed with results (positive, negative, equivocal or invalid) released from 6pm the previous day to 6pm of the reporting date |
cumulative_unique_individuals | number of unique individuals who underwent COVID-19 testing, regardless of result, accumulated since the start of operations in the laboratory. An individual with two or more specimen results is counted only once |
cumulative_positive_individuals | number of unique individuals with a positive result after COVID-19 testing using the appropriate confirmatory test (ex. RT-PCR) |
cumulative_negative_individuals | total number of unique individuals with a negative result after COVID-19 testing |
cumulative_samples_tested | sum of all specimens tested with validated results from the start of laboratory operation up to the reporting date |
pct_positive_cumulative | total number of cumulative positive individuals as a percent of cumulative unique individuals per day |
pct_negative_cumulative | total number of cumulative negative individuals as a percent of cumulative unique individuals per day |
remaining_available_tests | remaining COVID-19 tests that can be conducted by the health facility or laboratory based on the PCR testing kits they currently have. For GeneXpert labs, this refers to the remaining number of cartridges on hand |
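Because several testing columns are derived from others, a quick consistency check can catch problems in a given data drop. This sketch recomputes two of them from the definitions above. The function name check_testing_rows and the added column names are hypothetical, and whether DOH stores pct_positive_cumulative as a percent or a fraction should be confirmed against the actual file before comparing it with the recomputed value.

```python
import pandas as pd


def check_testing_rows(tests: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the testing data with two recomputed check columns."""
    out = tests.copy()
    # Definition: unique individuals = positive + negative for the same reporting window.
    # Rows with a missing count compare as False and should be inspected by hand.
    out["unique_matches"] = (
        out["daily_output_positive_individuals"]
        + out["daily_output_negative_individuals"]
    ) == out["daily_output_unique_individuals"]
    # Cumulative positivity as a percent; a zero denominator becomes NaN instead of inf.
    denominator = out["cumulative_unique_individuals"].replace(0, float("nan"))
    out["pct_positive_recomputed"] = (
        out["cumulative_positive_individuals"] / denominator * 100
    )
    return out
```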
Information in the database was collected from the DOH Covid-19 Tracker Data Drop. The CSV files were downloaded from the cloud drive that the agency maintains and updates daily. To use this database, download the latest CSV files, open the notebook in Jupyter Notebook, and update the file name in the pd.read_csv call (most likely you will only have to change the date in the file name).
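As an alternative to editing the file name by hand each time, a small helper can pick the most recently modified matching CSV in a folder. This is a sketch, not the notebook's actual code; the function name load_latest and the example glob pattern are hypothetical and should be adjusted to match the names of the files you download.

```python
from pathlib import Path

import pandas as pd


def load_latest(folder: str, pattern: str) -> pd.DataFrame:
    """Read the most recently modified CSV in `folder` whose name matches `pattern`.

    Example (hypothetical pattern): load_latest("data", "*Case Information*.csv")
    """
    matches = sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_mtime)
    if not matches:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder}")
    latest = matches[-1]
    # low_memory=False parses the file in one pass, avoiding mixed-type
    # columns that can come from chunked type inference on large files.
    return pd.read_csv(latest, low_memory=False)
```

Picking by modification time assumes the newest download is the one you want; if you keep older data drops in the same folder and re-download them, a narrower pattern is safer.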
After making these updates, run all the cells in the notebook to get the outputs.
You will need Python and Jupyter Notebook installed on your computer to run this. Installation instructions for your operating system are easy to find online, such as this one.
After installation, open your terminal and run pip install pandas. Then download the CSV files from the DOH Data Drop, launch Jupyter Notebook, and enjoy!
Prinz Magtulis, ppm2130@columbia.edu
Comments and suggestions are always welcome! All rights reserved.