synthetichealth / synthea

Synthetic Patient Population Simulator

Home Page: https://synthetichealth.github.io/synthea

Patient height and weight values extremely low

malbertson3 opened this issue

What happened?

I extracted patient age, height, and weight from my Synthea-generated Observation.ndjson file. All of the heights are < 60 cm and all of the weights are < 6 kg. I'm using LOINC code 8302-2 for body height and code 29463-7 for body weight. I also printed out patient ages, which range from about 10 to 70 years old. I'm using ./run_synthea -p 1000 to generate the data. Attached is a zip containing the Observation.ndjson and Patient.ndjson files.
Observation.zip
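
For anyone reproducing the check, here is a rough sketch of the kind of extraction described above. It assumes the bulk-data export writes one Observation per line and that the measurement sits in valueQuantity.value. The class and method names (ObservationValues, valuesForCode) are made up for illustration, and the regex scan is a shortcut: a real JSON or FHIR parser would be the safer choice for anything beyond a quick look.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Crude NDJSON scan for illustration only; this is not a real FHIR/JSON parser. */
public class ObservationValues {

    // Captures the numeric "value" inside the first "valueQuantity" object on a line.
    private static final Pattern VALUE_QUANTITY = Pattern.compile(
        "\"valueQuantity\"\\s*:\\s*\\{[^}]*?\"value\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)");

    /** Returns valueQuantity.value for every line that mentions the given LOINC code. */
    public static List<Double> valuesForCode(List<String> ndjsonLines, String loincCode) {
        Pattern codePattern =
            Pattern.compile("\"code\"\\s*:\\s*\"" + Pattern.quote(loincCode) + "\"");
        List<Double> values = new ArrayList<>();
        for (String line : ndjsonLines) {
            if (!codePattern.matcher(line).find()) {
                continue; // a different kind of observation
            }
            Matcher m = VALUE_QUANTITY.matcher(line);
            if (m.find()) {
                values.add(Double.parseDouble(m.group(1)));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to Observation.ndjson, or run without arguments for a tiny sample.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : List.of(
                "{\"resourceType\":\"Observation\",\"code\":{\"coding\":[{\"code\":\"8302-2\"}]},"
                    + "\"valueQuantity\":{\"value\":53.2,\"unit\":\"cm\"}}");
        System.out.println("Body height values: " + valuesForCode(lines, "8302-2"));
        System.out.println("Body weight values: " + valuesForCode(lines, "29463-7"));
    }
}
```

Printing the minimum and maximum of each list is a quick way to confirm whether every value really falls in the infant range.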

I only modified the synthea.properties file and Module.java. Is there something I need to tweak in my properties file to bring the heights and weights back up to normal range?

Thanks in advance.

Contents of my synthea.properties file:

exporter.baseDirectory = ./output/
exporter.use_uuid_filenames = false
exporter.subfolders_by_id_substring = false
# exporters that use XML or JSON can enable or disable 'pretty printing'
exporter.pretty_print = true
# number of years of history to keep in exported records, anything older than this may be filtered out
# set years_of_history = 0 to skip filtering altogether and keep the entire history
exporter.years_of_history = 3
# split records allows patients to have one record per provider organization
exporter.split_records = false
exporter.split_records.duplicate_data = false
exporter.metadata.export = true
exporter.ccda.export = false
exporter.fhir.export = true
exporter.fhir_stu3.export = false
exporter.fhir_dstu2.export = false
exporter.fhir.use_shr_extensions = false
exporter.fhir.use_us_core_ig = true
exporter.fhir.us_core_version = 5.0.1
exporter.fhir.transaction_bundle = true
# using bulk_data=true will ignore exporter.pretty_print
exporter.fhir.bulk_data = true
# included_ and excluded_resources list out the resource types to include/exclude in the csv exporters.
# only one of these may be set at a time, if both are set then both will be ignored.
# if neither is set, then all resource types will be included.
# note the Patient and Encounter resources will always be included, even if specifically listed as excluded here
exporter.fhir.included_resources =
exporter.fhir.excluded_resources =
exporter.groups.fhir.export = false
exporter.hospital.fhir.export = false
exporter.hospital.fhir_stu3.export = false
exporter.hospital.fhir_dstu2.export = false
exporter.practitioner.fhir.export = false
exporter.practitioner.fhir_stu3.export = false
exporter.practitioner.fhir_dstu2.export = false
exporter.encoding = UTF-8
exporter.json.export = false
exporter.json.include_module_history = false
exporter.csv.export = false
# if exporter.csv.append_mode = true, then each run will add new data to any existing CSVs. if false, each run will clear out the files and start fresh
exporter.csv.append_mode = true
# if exporter.csv.folder_per_run = true, then each run will have CSVs placed into a unique subfolder. if false, each run will only use the top-level csv folder
exporter.csv.folder_per_run = false
# included_files and excluded_files list out the files to include/exclude in the csv exporter
# only one of these may be set at a time, if both are set then both will be ignored
# if neither is set, then all files will be included
# see list of files at: https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary
# include filenames separated with a comma, ex: patients.csv,procedures.csv,medications.csv
# NOTE: the csv exporter does not actively delete files, so if Run 1 you included a file, then Run 2 you exclude that file, the version from Run 1 will still be present
exporter.csv.included_files =
exporter.csv.excluded_files = patient_expenses.csv

exporter.cpcds.export = false
exporter.cpcds.append_mode = false
exporter.cpcds.folder_per_run = false
exporter.cpcds.single_payer = false

exporter.bfd.export = false
exporter.bfd.require_code_maps = true
exporter.bfd.export_missing_codes = true
exporter.bfd.bene_id_start = -1000000
exporter.bfd.clm_id_start = -100000000
exporter.bfd.clm_grp_id_start = -100000000
exporter.bfd.pde_id_start = -100000000
exporter.bfd.fi_doc_cntl_num_start = -100000000
exporter.bfd.carr_clm_cntl_num_start = -100000000
exporter.bfd.mbi_start = 1S00-E00-AA00
exporter.bfd.hicn_start = T01000000A
exporter.bfd.partc_contract_start = Y0001
exporter.bfd.partc_contract_count = 10
exporter.bfd.plan_benefit_package_start = 800
exporter.bfd.plan_benefit_package_count = 5
exporter.bfd.partd_contract_start = Z0001
exporter.bfd.partd_contract_count = 10
exporter.bfd.clia_labs_start = 00A0000000
exporter.bfd.clia_labs_count = 10
exporter.bfd.cutoff_date=20140529

exporter.cdw.export = false
exporter.text.export = false
exporter.text.per_encounter_export = false
exporter.clinical_note.export = false

# parameters for symptoms export
exporter.symptoms.csv.export = false
# selection mode of conditions or symptom export: 0 = conditions according to  exporter.years_of_history. other values = all conditions (entire history)
exporter.symptoms.mode = 0
# if exporter.symptoms.csv.append_mode = true, then each run will add new data to any existing CSVs. if false, each run will clear out the files and start fresh
exporter.symptoms.csv.append_mode = false
# if exporter.symptoms.csv.folder_per_run = true, then each run will have CSVs placed into a unique subfolder. if false, each run will only use the top-level csv folder
exporter.symptoms.csv.folder_per_run = false
exporter.symptoms.text.export = false

# enable searching for custom exporter implementations
exporter.enable_custom_exporters = true

# the number of patients to generate, by default
# this can be overridden by passing a different value to the Generator constructor
generate.default_population = 1

# the number of threads to use for the generator, set the value to -1 to match the number of
# available processors (as per Runtime.getRuntime().availableProcessors())
# defaults to -1 if not specified
generate.thread_pool_size = -1

generate.log_patients.detail = simple
# options are "none", "simple", or "detailed" (without quotes). defaults to simple if another value is used
# none = print nothing to the console during generation
# simple = print patient names once they are generated.
# detailed = print patient names, attributes, vital signs, etc. May slow down processing

generate.timestep = 604800000
# time is in ms
# 1000 * 60 * 60 * 24 * 7 = 604800000

# default demographics is every city in the US
generate.demographics.default_file = geography/demographics.csv
generate.geography.zipcodes.default_file = geography/zipcodes.csv
generate.geography.country_code = US
generate.geography.timezones.default_file = geography/timezones.csv
generate.geography.foreign.birthplace.default_file = geography/foreign_birthplace.json
generate.geography.sdoh.default_file = geography/sdoh.csv

# Lookup Table Folder location
generate.lookup_tables = modules/lookup_tables/

# Set to true if you want every patient to be dead.
generate.only_dead_patients = false
# Set to true if you want every patient to be alive.
generate.only_alive_patients = false
# If both only_dead_patients and only_alive_patients are set to true,
# they will both default back to false

# if criteria are provided, (for example, only_dead_patients, only_alive_patients, or a "patient keep module" with -k flag)
# this is the maximum number of times synthea will loop over a single slot attempting to produce a matching patient.
# after this many failed attempts, it will throw an exception.
# set this to 0 to allow for unlimited attempts (but be aware of the possibility that it will never complete!)
generate.max_attempts_to_keep_patient = 1000

# if true, tracks and prints out details of transition tables for each module upon completion
# note that this may significantly slow down processing, and is intended primarily for debugging
generate.track_detailed_transition_metrics = false

# If true, person names have numbers appended to them to make them more obviously fake
generate.append_numbers_to_person_names = false

# Probability of each person having a middle name. 0 is zero, 1.0 is 100% chance.
generate.middle_names = 0.80

# if true, the entire population will use veteran prevalence data
generate.veteran_population_override = false

# these should add up to 1.0
# weighting and categories are inspired by the following but there are no specific hard numbers to point to
# http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1694190/pdf/amjph00543-0042.pdf
# http://www.ncbi.nlm.nih.gov/pubmed/8122813
generate.demographics.socioeconomic.weights.income = 0.2
generate.demographics.socioeconomic.weights.education = 0.7
generate.demographics.socioeconomic.weights.occupation = 0.1

generate.demographics.socioeconomic.score.low = 0.0
generate.demographics.socioeconomic.score.middle = 0.25
generate.demographics.socioeconomic.score.high = 0.66

generate.demographics.socioeconomic.education.less_than_hs.min = 0.0
generate.demographics.socioeconomic.education.less_than_hs.max = 0.5
generate.demographics.socioeconomic.education.hs_degree.min = 0.1
generate.demographics.socioeconomic.education.hs_degree.max = 0.75
generate.demographics.socioeconomic.education.some_college.min = 0.3
generate.demographics.socioeconomic.education.some_college.max = 0.85
generate.demographics.socioeconomic.education.bs_degree.min = 0.5
generate.demographics.socioeconomic.education.bs_degree.max = 1.0

# The average family size in the US is 3.13. The 2010 FPL for a 3-person household is $18310. Tuned it to $17550 for realistic medicaid/ACA enrollments.
generate.demographics.socioeconomic.income.poverty = 17550
generate.demographics.socioeconomic.income.high = 75000

generate.birthweights.default_file = birthweights.csv
generate.birthweights.logging = false

# in Massachusetts, the individual insurance mandate became law in 2006
# in the US, the Affordable Care Act became law in 2010,
# and individual and employer mandates took effect in 2014.
# mandate.year will determine when individuals with an occupation score above mandate.occupation
# receive employer mandated insurance (aka "private" insurance).
# prior to mandate.year, anyone with income greater than the annual cost of an insurance plan
# will purchase the insurance.
generate.insurance.mandate.year = 2006
generate.insurance.mandate.occupation = 0.2

# Defines what percent of insurance premiums are covered by employers, when employer-covered.
# According to [https://www.kff.org/report-section/ehbs-2021-summary-of-findings/],
# the average employee premium contribution is 0.17 and employers pay 0.83.
generate.insurance.employer_coverage = 0.83

# Default Costs, to be used for pricing something that we don't have a specific price for
# -- $500 for procedures is completely invented
generate.costs.default_procedure_cost = 500.00
# -- $255 for medications - also invented
generate.costs.default_medication_cost = 255.00
# -- Encounters billed using avg prices from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096340/
# -- Adjustments for initial or subsequent hospital visit and level/complexity/time of encounter
# -- not included. Assume initial, low complexity encounter (Tables 4 & 6)
generate.costs.default_encounter_cost = 125.00
# -- https://www.nytimes.com/2014/07/03/health/Vaccine-Costs-Soaring-Paying-Till-It-Hurts.html
# -- currently all vaccines cost $136.
generate.costs.default_immunization_cost = 136.00
generate.costs.default_lab_cost = 100.00
# -- assumes device costs are included in procedure cost, if not add to costs/devices.csv
generate.costs.default_device_cost = 0.00
# -- assumes supply costs are included in procedure cost, if not add to costs/supplies.csv
generate.costs.default_supply_cost = 0.00

# Providers
generate.providers.hospitals.default_file = providers/hospitals.csv
generate.providers.longterm.default_file = providers/longterm.csv
generate.providers.nursing.default_file = providers/nursing.csv
generate.providers.rehab.default_file = providers/rehab.csv
generate.providers.hospice.default_file = providers/hospice.csv
generate.providers.dialysis.default_file = providers/dialysis.csv
generate.providers.homehealth.default_file = providers/home_health_agencies.csv
generate.providers.veterans.default_file = providers/va_facilities.csv
generate.providers.urgentcare.default_file = providers/urgent_care_facilities.csv
generate.providers.primarycare.default_file = providers/primary_care_facilities.csv
generate.providers.ihs.hospitals.default_file = providers/ihs_facilities.csv
generate.providers.ihs.primarycare.default_file = providers/ihs_centers.csv

# Provider selection behavior
# How patients select a provider organization:
#  nearest - select the closest provider. See generate.providers.maximum_search_distance
#  random  - select randomly.
#  network - select a random provider in your insurance network. same as random except it changes every time the patient switches insurance provider.
#  medicare - select the nearest provider that can bill Medicare. If no Medicare provider is found, it defaults back to "nearest".
generate.providers.selection_behavior = nearest

# if a provider cannot be found for a certain type of service,
# this will default to the nearest hospital.
generate.providers.default_to_hospital_on_failure = true

# minimum number of providers linked per patient
# if this number is not met it re-runs the simulation
generate.providers.minimum = 1

# maximum distance to look for a provider for a given patient, in km
# set to 10 degrees lat/lon to support the model that veterans only seek care at VA facilities
generate.providers.maximum_search_distance = 1000

# Payers
generate.payers.insurance_companies.default_file = payers/insurance_companies.csv
generate.payers.insurance_plans.default_file = payers/insurance_plans.csv
generate.payers.insurance_plans.eligibilities_file = payers/insurance_eligibilities.csv
generate.payers.insurance_companies.medicare = Medicare
generate.payers.insurance_companies.medicaid = Medicaid
generate.payers.insurance_companies.dual_eligible = Dual Eligible
# The percentage of a person's income that they are willing to spend on health insurance premiums.
generate.payers.insurance_plans.income_premium_ratio = 0.034
# The chance of rejection
# Plan selection behavior
# How patients select a plan:
#  best_rates - select plans with best rates for person's existing conditions and medical needs
#  random  - select plans randomly.
#  priority  - select plans based on the priority level defined in the insurance plans file.
generate.payers.selection_behavior = priority

# Payer adjustment behavior
# How payers adjust claims:
#  none - the payer reimburses each claim by the full amount.
#  fixed - the payer adjusts each claim by a fixed rate (set by adjustment_rate)
#  random  - the payer adjusts each claim by a random rate (between zero and adjustment_rate).
generate.payers.adjustment_behavior = none
# Payer adjustment rate should be between zero and one (0.00 - 1.00), where 0.05 is 5%.
generate.payers.adjustment_rate = 0.10

# Experimental feature. Patients will miss care if true, but side-effects of missing that care
# are not handled. Additionally, the path the disease module might take may no longer make sense.
# It might assume things occurred that haven't actually happened. Use with care.
generate.payers.loss_of_care = false

# Add a FHIR terminology service URL to enable the use of ValueSet URIs within code definitions.
# generate.terminology_service_url = https://r4.ontoserver.csiro.au/fhir

# Quit Smoking
lifecycle.quit_smoking.baseline = 0.01
lifecycle.quit_smoking.timestep_delta = -0.01
lifecycle.quit_smoking.smoking_duration_factor_per_year = 1.0

# Quit Alcoholism
lifecycle.quit_alcoholism.baseline = 0.001
lifecycle.quit_alcoholism.timestep_delta = -0.001
lifecycle.quit_alcoholism.alcoholism_duration_factor_per_year = 1.0

# Adherence
lifecycle.adherence.baseline = 0.05

# set this to true to enable randomized "death by natural causes"
# highly recommended if "only_dead_patients" is true
lifecycle.death_by_natural_causes = false

# set this to enable "death by loss of care" or missed care,
# e.g. not covered by insurance or otherwise unaffordable.
# only functional if "generate.payers.loss_of_care" is also true.
lifecycle.death_by_loss_of_care = false

# Use physiology simulations to generate some VitalSigns
physiology.generators.enabled = false

# Allow physiology module states to be executed
# If false, all Physiology state objects will immediately redirect to the state defined in
# the alt_direct_transition field
physiology.state.enabled = false

# set to true to introduce errors in height, weight and BMI observations for people
# under 20 years old
growtherrors = false

I also commented out these modules in lines 71-76 of synthea\src\main\java\org\mitre\synthea\engine\Module.java:
```java
//retVal.put("Lifecycle", new ModuleSupplier(new LifecycleModule()));
//retVal.put("Health Insurance", new ModuleSupplier(new HealthInsuranceModule()));
retVal.put("Cardiovascular Disease", new ModuleSupplier(new CardiovascularDiseaseModule()));
retVal.put("Quality Of Life", new ModuleSupplier(new QualityOfLifeModule()));
//retVal.put("Weight Loss", new ModuleSupplier(new WeightLossModule()));
//retVal.put("COVID-19 Immunization Module", new ModuleSupplier(new C19ImmunizationModule()));
```

Environment

- OS:
- Java:

Relevant log output

No response

In general I would recommend not commenting out any of the Java modules, because they can be very intertwined (I'm actually a little surprised nothing crashed). To address the specific case here: growth happens in the LifecycleModule. Uncomment that one and patients should get realistic heights and weights.

Because you commented out the LifecycleModule, not only are they not growing, they aren't even aging. You have basically a population of unaging babies.
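
To make the effect concrete, here is a toy sketch of a module registry in which one entry is responsible for both aging and growth. It is not Synthea's code: the Person class, the simulate and modules helpers, the birth length, and the linear growth rate are all invented for illustration. The only point is that when the lifecycle entry is left out, nothing else ever advances age or height.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model only: none of these names or numbers come from Synthea. */
public class UnagingDemo {

    /** Hypothetical stand-in for a simulated patient. */
    public static class Person {
        public double ageYears = 0.0;
        public double heightCm = 50.0; // illustrative birth length, not Synthea's value
    }

    /** Builds the module registry, optionally leaving the lifecycle entry out. */
    public static Map<String, Consumer<Person>> modules(boolean includeLifecycle) {
        Map<String, Consumer<Person>> registry = new LinkedHashMap<>();
        if (includeLifecycle) {
            registry.put("Lifecycle", p -> {
                p.ageYears += 1.0;
                if (p.ageYears <= 18) {
                    p.heightCm += 6.5; // made-up linear growth, for illustration only
                }
            });
        }
        // Stand-in for any other module: it never changes age or height.
        registry.put("Quality Of Life", p -> { });
        return registry;
    }

    /** Runs every registered module once per simulated year. */
    public static Person simulate(Map<String, Consumer<Person>> registry, int years) {
        Person p = new Person();
        for (int year = 0; year < years; year++) {
            for (Consumer<Person> module : registry.values()) {
                module.accept(p);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        Person grown = simulate(modules(true), 40);
        Person frozen = simulate(modules(false), 40);
        System.out.printf("with lifecycle:    age=%.0f height=%.1f cm%n",
            grown.ageYears, grown.heightCm);
        System.out.printf("without lifecycle: age=%.0f height=%.1f cm%n",
            frozen.ageYears, frozen.heightCm);
    }
}
```

In the second run the person ends the simulation with the same age and height they started with, which matches the infant-sized heights and weights reported above.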

Thank you. That fixed it. I knew I was doing something stupid!