pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.

Home Page: https://dataset.readthedocs.org/

issue with upsert_many()

tony-pythony opened this issue

Hi,

I'm on the latest version of dataset, 1.4.1 (updated recently). Is there a way to see and verify the installed version directly in my files?
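On the version question: one way to check which release is installed, assuming Python 3.8 or newer, is the standard library's `importlib.metadata`. Running `pip show dataset` in a shell reports the same information. The helper name `installed_version` is only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        # Reads the metadata of the installed distribution (e.g. "1.4.1").
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("dataset", installed_version("dataset"))
```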
I'm seeing unexpected behavior with upsert_many(). As I understand it from reading the source, upsert_many() takes a list of dicts, like:

import datetime

sample_list = [{'date': datetime.date(2020, 6, 16), 'sample_col_1': 12345.678},
               {'date': datetime.date(2020, 6, 17), 'sample_col_2': 12345.789}]

I call upsert_many() with the sample_list above as the argument:

db['calculations'].upsert_many(sample_list, ['date'])

This yields an error and a warning:
db['calculations'].upsert_many(sample_list, 'date')
Traceback (most recent call last):

  File "<ipython-input-12-bcc35e9d30ac>", line 1, in <module>
    db['calculations'].upsert_many(sample_list, 'date')
  File "C:\Python\lib\site-packages\dataset\table.py", line 290, in upsert_many
    self.update_many(to_update, keys, chunk_size, ensure, types)
  File "C:\Python\lib\site-packages\dataset\table.py", line 243, in update_many
    print(stmt)   # <- line inserted by me for inspection
  File "C:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 491, in __str__
    return str(self.compile())
  File "<string>", line 1, in <lambda>
  File "C:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 481, in compile
    return self._compiler(dialect, bind=bind, **kw)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\elements.py", line 487, in _compiler
    return dialect.statement_compiler(dialect, self, **kw)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\compiler.py", line 592, in __init__
    Compiled.__init__(self, dialect, statement, **kwargs)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\compiler.py", line 322, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\compiler.py", line 352, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\visitors.py", line 96, in _compiler_dispatch
    return meth(self, **kw)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\compiler.py", line 2593, in visit_update
    crud_params = crud._setup_crud_params(
  File "C:\Python\lib\site-packages\sqlalchemy\sql\crud.py", line 64, in _setup_crud_params
    return _get_crud_params(compiler, stmt, **kw)
  File "C:\Python\lib\site-packages\sqlalchemy\sql\crud.py", line 177, in _get_crud_params
    raise exc.CompileError(
CompileError: Unconsumed column names: sample_col_2, sample_col_1

My SQLite table has a column "date" (primary key, type DATE). Both unconsumed columns should be created automatically, right? That does not happen. Setting ensure=True explicitly still gives the same behavior.
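The reporter expects ensure=True to create every column that appears in the rows but not in the table before the update runs. The following is a minimal sketch of that step using only the standard-library `sqlite3` module. It is not dataset's implementation. The helper name `add_missing_columns` and the choice of REAL as the type for new columns are assumptions made for illustration.

```python
import datetime
import sqlite3
from typing import Dict, Iterable, List


def add_missing_columns(conn: sqlite3.Connection, table: str,
                        rows: Iterable[Dict[str, object]]) -> List[str]:
    """Add a column for every key that occurs in any row but not in `table`.

    Returns the names of the columns that were added, in first-seen order.
    New columns get type REAL purely for illustration (an assumption).
    Identifiers are quoted but not escaped, so only use trusted names.
    """
    # PRAGMA table_info yields one tuple per column; index 1 is the name.
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    added: List[str] = []
    for row in rows:
        for key in row:
            if key not in existing:
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{key}" REAL')
                existing.add(key)
                added.append(key)
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE calculations ("date" DATE PRIMARY KEY)')
    sample_list = [{"date": datetime.date(2020, 6, 16), "sample_col_1": 12345.678},
                   {"date": datetime.date(2020, 6, 17), "sample_col_2": 12345.789}]
    print(add_missing_columns(conn, "calculations", sample_list))
    # ['sample_col_1', 'sample_col_2']
```

Because the two sample rows carry different keys, the table needs the union of both key sets before any batched update can bind all the parameters.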
Also, the key given in the keys argument (['date']) ends up stored in the original dict as '_date'; the update method does this. Is the method intended to modify the original dict rather than work on a temporary copy?
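The '_date' renaming described above can be reproduced without a database. The function below is a hypothetical, simplified stand-in for the renaming step the reporter observed; it is not dataset's code. It shows that renaming keys in place mutates the caller's dicts, and that passing shallow copies keeps the originals intact.

```python
import datetime
from typing import Dict, List, Sequence


def rename_keys_in_place(rows: List[Dict[str, object]],
                         keys: Sequence[str]) -> List[Dict[str, object]]:
    """Hypothetical stand-in: move each key `k` to `_k` inside every row dict."""
    for row in rows:
        for key in keys:
            row["_%s" % key] = row.pop(key)
    return rows


sample_list = [{"date": datetime.date(2020, 6, 16), "sample_col_1": 12345.678},
               {"date": datetime.date(2020, 6, 17), "sample_col_2": 12345.789}]

# Passing shallow copies leaves the caller's dicts untouched.
rename_keys_in_place([dict(row) for row in sample_list], ["date"])
print(sorted(sample_list[0]))  # ['date', 'sample_col_1']

# Passing the list itself renames 'date' to '_date' in the caller's dicts.
rename_keys_in_place(sample_list, ["date"])
print(sorted(sample_list[0]))  # ['_date', 'sample_col_1']
```

On the caller's side, a workaround is to pass `[dict(row) for row in rows]` whenever the original dicts are still needed afterwards.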

Edit: found a reason (explained below).
I can't reproduce another behavior at the moment, because I've restructured and rewritten the code in the meantime. I think it occurred with upsert() or with update():
Given a table with columns 'date' (PK, type DATE), 'sample1' (type FLOAT), and 'sample2' (type FLOAT),
and a list list_A of dicts [{'date': ..., 'sample3': ...}, {'date': ..., 'sample3': ...}, ...],
calling upsert(list_A, ['date']) produces the right output, but deletes the columns 'sample1' and 'sample2' from the table.
If necessary, I can try to recreate the original code. I think this behavior is somehow connected to the problem described above.

The behavior above is caused by setting ensure=False, in which case _sync_columns() does what its docstring says:
"If automatic schema generation is disabled (ensure is False), this will remove any keys from the row for which there is no matching column."
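Read literally, the quoted docstring describes filtering the row, not the table. Keys without a matching column are dropped from the dict that gets written, and the schema is not touched. Below is a minimal sketch of that filtering step. The helper name `filter_row` is hypothetical, and an explicit list of column names stands in for the table's real columns.

```python
from typing import Dict, Iterable


def filter_row(row: Dict[str, object], columns: Iterable[str]) -> Dict[str, object]:
    """Return a new dict holding only the keys of `row` that name a known column."""
    known = set(columns)
    return {key: value for key, value in row.items() if key in known}


table_columns = ["date", "sample1", "sample2"]
row = {"date": "2020-06-16", "sample3": 1.5}
print(filter_row(row, table_columns))  # {'date': '2020-06-16'}
```

Under this reading, 'sample3' would be silently dropped from the written row. Nothing in this filtering step removes 'sample1' or 'sample2' from the table itself.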

But still, is _sync_columns() meant to remove entire column entries from the table?

Am I doing something wrong?

Thank you guys for dataset!

This should be somewhat slower now, but work correctly. Can you verify and re-open this issue if you still find the behaviour inconsistent?
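The reply does not say what changed. As a general illustration, and not the project's actual fix, upserting rows one at a time lets each row carry its own set of columns. The cost is one statement per row instead of one batched statement. The sketch below uses the standard-library `sqlite3` module and assumes SQLite 3.24 or newer for the `ON CONFLICT ... DO UPDATE` clause. It also assumes all columns already exist. `upsert_rows` is a hypothetical helper.

```python
import sqlite3
from typing import Dict, Iterable


def upsert_rows(conn: sqlite3.Connection, table: str, key: str,
                rows: Iterable[Dict[str, object]]) -> None:
    """Upsert each row individually so rows may carry different column sets.

    Assumes every column already exists and `key` has a PRIMARY KEY or UNIQUE
    constraint. Identifiers are quoted but not escaped: trusted names only.
    """
    for row in rows:
        cols = list(row)
        col_sql = ", ".join(f'"{c}"' for c in cols)
        placeholders = ", ".join("?" for _ in cols)
        # Only the columns present in this row are overwritten on conflict.
        updates = ", ".join(f'"{c}" = excluded."{c}"' for c in cols if c != key)
        sql = f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})'
        if updates:
            sql += f' ON CONFLICT("{key}") DO UPDATE SET {updates}'
        else:
            sql += f' ON CONFLICT("{key}") DO NOTHING'
        conn.execute(sql, [row[c] for c in cols])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE calculations '
                 '("date" TEXT PRIMARY KEY, sample_col_1 REAL, sample_col_2 REAL)')
    upsert_rows(conn, "calculations", "date", [
        {"date": "2020-06-16", "sample_col_1": 12345.678},
        {"date": "2020-06-17", "sample_col_2": 12345.789},
    ])
    upsert_rows(conn, "calculations", "date", [{"date": "2020-06-16", "sample_col_2": 1.0}])
    print(conn.execute('SELECT * FROM calculations ORDER BY "date"').fetchall())
    # [('2020-06-16', 12345.678, 1.0), ('2020-06-17', None, 12345.789)]
```

Because each statement names only the columns of its own row, a value missing from one row never overwrites a value already stored for that key.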