This script takes NAMCS data (in the raw folder) and uses it to generate a representative set of EHR patient data. The config section of the script lets you change the number of patient records to generate, add randomization, change the sampling weight methodology, and randomly remove data (to simulate EHR data more realistically).
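To give a feel for what these config options do, here is a minimal sketch of weighted resampling followed by random removal of values. It is not the code in generate.R: the `visits` data frame, its `weight` column, and the `n_patients` and `missing_rate` settings are all hypothetical names chosen for illustration.

```r
# Hypothetical stand-in for the NAMCS visit records (the real file has many more columns).
set.seed(42)  # fixed seed so the sketch is reproducible
visits <- data.frame(
  id     = 1:5,
  age    = c(34, 71, 8, 52, 45),
  weight = c(1200, 300, 2500, 800, 1500)  # assumed survey sampling weight per record
)

n_patients   <- 1000  # hypothetical config value: number of records to generate
missing_rate <- 0.1   # hypothetical config value: share of values to blank out

# Draw rows with replacement, with probability proportional to the sampling weight.
idx <- sample(nrow(visits), size = n_patients, replace = TRUE, prob = visits$weight)
patients <- visits[idx, ]
rownames(patients) <- NULL

# Randomly set a fraction of the age values to NA to mimic incomplete EHR data.
drop <- runif(n_patients) < missing_rate
patients$age[drop] <- NA
```

Sampling in proportion to the survey weight keeps the generated patients roughly representative of the population the survey describes, while the random NA values imitate the gaps typical of real EHR records.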
To run, first unzip raw/namcs2012-stata.dta.zip, then run generate.R:
R CMD BATCH generate.R
The following tables will be generated:
- patient: contains information about the patient
- diagnosis: contains information about diagnoses the patient may have
- prescription: contains information about prescriptions the patient is currently on
- encounter: contains information about the patient's visits or encounters (note: currently there's a 1:1 relationship between patients and encounters)
- encountermeasure: contains measurements taken during the encounter
- labresult: contains lab results for the patient
The data is coded using the following systems:
- measurement/lab: LOINC
- prescription: Multum Lexicon Plus
- diagnosis: ICD-9