AI-SDC / ACRO

Tools for the Automatic Checking of Research Outputs. These are the tools for researchers to use as drop-in replacements for commands that produce outputs in Stata, Python and R.

Create Stata front end

jim-smith opened this issue · comments

license details redacted

Is it ok to post this here -- the repository is set to public?

roadmap is:

  1. show we can call a python script from stata (by writing a stata script that does this)
  2. show we can pass a string (e.g. command) from stata to python as the arguments to a programme to run.
    • for this purpose the python could just echo the script,
    • build in parsing later
  3. Figure out how to get the data from stata to python. Save to file first? then send. filename as well?
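
Steps 1 to 3 could start from a Python script along these lines. This is a minimal sketch, not the project's code: the function name `echo_call` and the `--data` flag are made up for illustration, and it assumes Stata hands over the command as one quoted string (for example through `shell`, or the `python script` command in recent Stata versions).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Arguments the (hypothetical) echo script accepts from Stata."""
    parser = argparse.ArgumentParser(description="Echo a command passed from Stata.")
    parser.add_argument("command", help="the Stata command line, as one quoted string")
    parser.add_argument("--data", default=None,
                        help="optional path to a data file that Stata saved first")
    return parser


def echo_call(argv):
    """Return a description of what was received; no parsing is done yet."""
    args = build_parser().parse_args(argv)
    lines = [f"command received: {args.command}"]
    if args.data is not None:
        lines.append(f"data file: {args.data}")
    return "\n".join(lines)


# When saved as a script, the call would be echo_call(sys.argv[1:]).
print(echo_call(["table survivor grant_type", "--data", "temp.csv"]))
```

Passing the file name as a separate argument keeps the command string untouched, so the parsing planned for later does not have to strip it out again.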

This assumes the added burden of time to read/write data is not an issue

  • further down the road we can pass the data via memory like we do for R
    -Q: is saving data to file problematic for TREs? do we need to delete it afterwards? or put it in a directory called "intermediate files - do not release"
  1. Write command parser to translate from stata commands to acro ones
  • get Lizzie to come up with priority order of what to implement
  1. Stata docs say how to call a python script from stata
  2. Check how stata holds the name of the current command - I think it is always $1 or $2 (FR/LG can confirm)
  • and then whether we can pass that as an argument to the python script
  1. How do we get the data across??
    • pretty sure it can be passed in memory (check docs)
    • but for proof of concept could dump to file first and pass name of file/first?
  2. Write the main python script that sits there waiting to be called then does something on demand.
  • probably each command as a one-off session?
  • unless we spawn a separate python 'listener' and talk to that somehow - is that covered in the stata documentation?
  • needs to interpret the command
  • read in the data
  • start an acro session
  • do the command
  • run acro.finalise()
  • send output back to stata
  1. What about how user interacts with their outputs to add messages, delete things etc??
    -- let's get the first bits working first
  • might need to think about refactoring acro to work on previous outputs as well as current session
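
The one-off session described in the list above (interpret the command, read the data, start a session, run the command, finalise, send output back) might look roughly like this. It is only a sketch under stated assumptions: `StubSession` and `run_once` are hypothetical stand-ins rather than the real acro API, the data arrives as CSV text instead of a file path, and only a two-variable `table` command is understood.

```python
import csv
import io
from collections import Counter


class StubSession:
    """Hypothetical stand-in for an acro session; not the real acro API."""

    def __init__(self):
        self.outputs = []

    def table(self, rows, row_var, col_var):
        # Two-way frequency count, one "row,col,count" line per cell.
        counts = Counter((r[row_var], r[col_var]) for r in rows)
        text = "\n".join(f"{a},{b},{n}" for (a, b), n in sorted(counts.items()))
        self.outputs.append(text)
        return text

    def finalise(self):
        # The real tool would write outputs for checking; here we just count them.
        return len(self.outputs)


def run_once(command, data_text):
    """Handle one command in a fresh session and return text to send back to Stata."""
    verb, *varlist = command.split()                       # interpret the command
    rows = list(csv.DictReader(io.StringIO(data_text)))    # read in the data
    session = StubSession()                                # start a session
    if verb == "table" and len(varlist) == 2:              # do the command
        result = session.table(rows, varlist[0], varlist[1])
    else:
        result = f"unsupported command: {command}"
    session.finalise()                                     # finalise
    return result                                          # send output back


sample = "survivor,grant_type\n1,A\n1,A\n0,B\n"
print(run_once("table survivor grant_type", sample))
```

Because every call builds and finalises its own session, nothing survives between commands, which is why the question about working on previous outputs matters for this design.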

Basic version working

  • first call to stata is to 'do setup_py'
    • TODO this needs adapting generically -is hard coded at present
  • passes data and returns from python dynamically
  • log messages from acro get put on screen
  • works for sequence
    • acro init
    • acro table survivor grant_type
    • acro print_outputs
    • acro finalise
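
For the sequence above to work, some state has to survive between calls. A toy dispatcher shows the idea; it assumes the four verbs listed, and the class name, return messages and output naming are invented here, not taken from acro.

```python
class SessionDispatcher:
    """Hypothetical dispatcher that keeps outputs between successive commands."""

    def __init__(self):
        self.outputs = None  # None until "init" has been called

    def handle(self, command):
        verb, _, rest = command.strip().partition(" ")
        if verb == "init":
            self.outputs = []
            return "session started"
        if self.outputs is None:
            return "error: call 'acro init' first"
        if verb == "table":
            # A real implementation would build and check the table here.
            self.outputs.append(f"table {rest}")
            return f"recorded output_{len(self.outputs) - 1}"
        if verb == "print_outputs":
            return "\n".join(self.outputs) or "no outputs"
        if verb == "finalise":
            count = len(self.outputs)
            self.outputs = None  # session is closed after finalise
            return f"finalised {count} output(s)"
        return f"unsupported command: {verb}"


dispatcher = SessionDispatcher()
for line in ["init", "table survivor grant_type", "print_outputs", "finalise"]:
    print(dispatcher.handle(line))
```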

UPDATE 27-6-23

  • Parsing moved to stata inside acro.ado
  • Data subsetting via if and in now supported
  • Some things move to other issues
  • Some tasks flagged as needing doing in a do/ado file rather than python

TO-DOs

  • adapt setup_py.ado dynamically IN ADO. WON'T BE IN NEXT RELEASE
  • support for stata syntax changes IN ADO. WON'T BE IN NEXT RELEASE
    e.g. statistic() not contents() in table
  • [-] get acro.print_outputs() to return the string as well as print it MOVED TO ISSUE #90
  • [-] put something in front of logger messages e.g. "SDC analysis:" MOVED TO ISSUE
  • add support for if.
  • Use Stata's built-in parsing
    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
  • demonstrate use of adding comments and descriptions IN STATA_ACRO_TEST.DO
  • support for contents e.g.
    acro table year survivor contents(freq mean inc_activity sd inc_activity)
  • support for tab[ulate] (1/2-way table of frequencies with measures of association) e.g.
    "acro tab year survivor , chi2 expected"
  • support for by in command e.g.
    "acro table year survivor , by(grant_type)"
  • support for linear regression e.g.
    "acro regress inc_activity inc_grants inc_donations total_costs"
  • support for probit e.g.
    "acro probit survivor inc_activity inc_grants inc_donations total_costs"
  • support for xtreg e.g.
    "acro xtreg inc_activity inc_grants inc_donations total_costs , re"
  • support for plot e.g.
    sacro twoway (scatter inc_grants inc_activity)
  • support for failure example. IN STATA_ACRO_TEST.DO
    sort grant_type
    by grant_type: safe regress inc_activity inc_grants inc_donations total_costs if year>2013
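
To show how the example commands above map onto the pieces of Stata's syntax diagram, here is a rough Python illustration. The update above says the actual parsing now lives in acro.ado, so this is not that code: `parse_stata_command` is a hypothetical helper, and it deliberately ignores weights, `=exp`, commas inside expressions, and parenthesised graph commands such as `twoway (...)`. It also assumes `if` comes before `in`.

```python
import re


def parse_stata_command(line):
    """Split a simple Stata command line into by / command / varlist / if / in / options."""
    parts = {"by": "", "command": "", "varlist": [], "if": "", "in": "", "options": ""}
    text = line.strip()

    # Optional "by varlist:" prefix.
    prefix = re.match(r"by\s+([^:]+):\s*", text)
    if prefix:
        parts["by"] = prefix.group(1).strip()
        text = text[prefix.end():]

    # Everything after the first comma is treated as options.
    body, _, options = text.partition(",")
    parts["options"] = options.strip()
    body = body.strip()

    # Trailing "in range", then "if exp".
    in_match = re.search(r"\s+in\s+(\S+)$", body)
    if in_match:
        parts["in"] = in_match.group(1)
        body = body[:in_match.start()]
    if_match = re.search(r"\s+if\s+(.+)$", body)
    if if_match:
        parts["if"] = if_match.group(1).strip()
        body = body[:if_match.start()]

    words = body.split()
    if words:
        parts["command"], parts["varlist"] = words[0], words[1:]
    return parts


print(parse_stata_command("table year survivor , by(grant_type)"))
print(parse_stata_command("by grant_type: regress inc_activity inc_grants if year>2013"))
```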