clarify the role of conventions so that users know whether spandex requires them to use exec_sql to edit databases in all cases

Question

clarify the role of conventions so that users know whether spandex requires them to use exec_sql to edit databases in all cases

tbuckl opened this issue 9 years ago · comments

Its unclear to me as a user whether I can edit a database that Spandex is using/managing outside of the Spandex ORM without breaking the Spandex database class' understanding of the schema.

In short, the Spandex database class does not seem to show any existing tables on my public schema after dropping and then re-adding a table in psql and then inspecting the database with Spandex database class.

The long version:
This is a pretty simple and use-case-specific SQL query that takes a few minutes:

https://github.com/synthicity/spandex/blob/master/spandex/spatialtoolz.py#L495-L513

It works fine when you run this script from start to finish:

https://github.com/synthicity/bayarea_urbansim/blob/master/data_regeneration/run.py

Unfortunately, running that script from start to finish takes more than 12 hours on a well-provisioned (and tuned) machine.

As a user, it would be nice to be able to just call that specific SQL query on an arbitrary table. However, it seems that there may be some conventions or dependencies that I am not following in calling it.

In particular, I suspect that I am getting an error because I am not calling that function on a table that was specifically created or registered with one of the several ORM's (2 if you count Spandex as an ORM?) that seem to be in use in this repository.

The error is below. As a user, this means I will probably re-write the query from the ORM language into SQL in order to accomplish my larger goal of reducing the run-time of data regenerations.

---geom_aggregation1 took 7.37857508659 seconds ---
/home/vagrant/anaconda/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2079: SAWarning: Did not recognize type 'bpchar' of column 'county_id'
  name, format_type, default, notnull, domains, enums, schema)
/home/vagrant/anaconda/lib/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:2079: SAWarning: Did not recognize type 'unknown' of column 'imputation_flag'
  name, format_type, default, notnull, domains, enums, schema)
PARCEL AGGREGATION:  Merge geometries (and aggregate attributes) based on within-interior-ring status
Traceback (most recent call last):
  File "geom_aggregation2.py", line 16, in <module>
    df = geom_unfilled(t.public.parcels, 'unfilled')
AttributeError: type object 'public' has no attribute 'parcels'
Traceback (most recent call last):
  File "geom_test.py", line 30, in <module>
    check_run('geom_aggregation2.py')
  File "geom_test.py", line 22, in check_run
    return subprocess.check_call([python, path])
  File "/home/vagrant/anaconda/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/vagrant/anaconda/bin/python', 'geom_aggregation2.py']' returned non-zero exit status 1

Dara Adib · Answer 1 · Fri Jun 26 2015 14:53:16 GMT+0800 (China Standard Time)

Hi Tom. Assuming spandex understands the data types (to map them to Python types), you can edit the database outside of spandex. If you changed table schema while running spandex, you need run database.refresh().

I see two errors in the traceback:

Did not recognize type 'bpchar' of column 'county_id'
Did not recognize type 'unknown' of column 'imputation_flag'

bpchar (blank padded char) is an internal PostgreSQL type. Can you check on the data type of those columns? You could try explicitly casting to int or char as appropriate.

It might be helpful to uncomment the logger.warn call in spandex.database.

FYI -

spandex recreates its knowledge of the schema each time the database.refresh() method is called. That knowledge is only kept in memory and doesn't persist if you restart Python/spandex.

When spandex modifies table schema (for example to add a new column), it calls database.refresh(). database.refesh() iterates over every schema, table, column using SQLAlchemy/GeoAlchemy introspection/reflection.

For more information about reflection, see the SQLAlchemy documentation on Reflecting Database Objects and Using Reflection with Declarative.

Tom Buckley · Answer 2 · Sat Jun 27 2015 02:27:21 GMT+0800 (China Standard Time)

Thanks @daradib! I appreciate the feedback and I'll try this out when I get a moment.

Tom Buckley · Answer 3 · Sat Jun 27 2015 03:55:24 GMT+0800 (China Standard Time)

By uncommenting that logger.warn I was able to discover that the tables that were not showing up in the load.tables/database class were not being mapped because they did not have a primary key. Is this a GeoAlchemy convention (or requirement) for reflection or a Spandex one? SQLAlchemy does not require this. The QGIS database adapter also has this requirement/convention. Perhaps its necessary? However, it clearly confused me. Anyway, thanks @daradib!

Dara Adib · Answer 4 · Sat Jun 27 2015 11:24:31 GMT+0800 (China Standard Time)

No problem! Hope you find spandex useful despite its quirks and limitations. The primary key constraint is a SQLAlchemy ORM requirement: http://docs.sqlalchemy.org/en/latest/faq/ormconfiguration.html#how-do-i-map-a-table-that-has-no-primary-key