simonw / sqlite-utils

Python CLI utility and library for manipulating SQLite databases

Home Page:https://sqlite-utils.datasette.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

`table.upsert/upsert_all` fails to write row when `not_null` is absent and the schema definition includes `not_null`

alexwlchan opened this issue · comments

I found a bug where calls to upsert() and upsert_all() don't write rows if:

  1. There's a not_null kwarg on the table definition, and
  2. You don't pass not_null in the upsert() call

This doesn't affect insert() or insert_all().

Repro example

from sqlite_utils import Database

db = Database(":memory:")

db["birds"].create(
    {"id": int, "name": str},
    pk="id",
    not_null={"name"},
)

db["birds"].insert({"id": 1, "name": "flamingo"})
print("insert:               ", len(list(db["birds"].rows)))

db["birds"].insert({"id": 2, "name": "goldfinch"}, not_null={"name"})
print("insert, not_null:     ", len(list(db["birds"].rows)))

db["birds"].upsert({"id": 3, "name": "loon"}, pk="id")
print("upsert:               ", len(list(db["birds"].rows)))

db["birds"].upsert({"id": 4, "name": "stork"}, pk="id", not_null={"name"})
print("upsert, not_null:     ", len(list(db["birds"].rows)))

db["birds"].insert_all([{"id": 5, "name": "magpie"}])
print("insert_all:           ", len(list(db["birds"].rows)))

db["birds"].insert_all([{"id": 6, "name": "gull"}], not_null={"name"})
print("insert_all, not_null: ", len(list(db["birds"].rows)))

db["birds"].upsert_all([{"id": 7, "name": "dodo"}], pk="id")
print("upsert_all:           ", len(list(db["birds"].rows)))

db["birds"].upsert_all([{"id": 8, "name": "crow"}], pk="id", not_null={"name"})
print("upsert_all, not_null: ", len(list(db["birds"].rows)))

Here's the output (with my comments on the right)

sqlite-utils, version 3.36
insert:                1
insert, not_null:      2
upsert:                2    <-- no write
upsert, not_null:      3
insert_all:            4
insert_all, not_null:  5
upsert_all:            5    <-- no write
upsert_all, not_null:  6

So calling upsert() and upsert_all() without passing not_null isn't actually inserting a record.

This problem goes away if:

  • I add the not_null argument to the upsert() calls, or
  • Remove the not_null from the table definition

Version info

  • Python: 3.12.0
  • sqlite-utils: 3.36
  • sqlite: 3.39.5 2022-10-14

Related issues

It seems very likely that this is related to #538, which was also about upsert() and the not_null flag.

I don't know if this is a clue, but while testing this I noticed that insert() seems to be completely ignoring the not_null parameter if it doesn't have to create the table, whereas upsert() doesn't. Here's another example:

from sqlite_utils import Database

db = Database(":memory:")

db["birds"].create(
    {"id": int, "name": str},
    pk="id",
    not_null={"name"},
)

db["birds"].insert({"id": 1, "name": "flamingo"}, not_null={"wingspan"})
print("insert complete!")

db["birds"].upsert({"id": 2, "name": "goldfinch"}, pk="id", not_null={"wingspan"})
print("upsert complete!")

In this case the insert() completes, but the upsert() fails with sqlite3.OperationalError: table birds has no column named wingspan.

The only way to make the insert() fail is to let it auto-create the table:

from sqlite_utils import Database

db = Database(":memory:")

db["birds"].insert({"id": 1, "name": "flamingo"}, not_null={"wingspan"})
print("insert complete!")

db["birds"].upsert({"id": 2, "name": "goldfinch"}, pk="id", not_null={"wingspan"})
print("upsert complete!")

at which point I get an error AssertionError: not_null set {'wingspan'} includes items not in columns {'id', 'name'}.

A further, potentially interesting observation: this only seems to apply when you're inserting new rows. If your upsert is modifying an existing row, it works without supplying the not_null list.

Another example:

from sqlite_utils import Database

db = Database(":memory:")

db["birds"].create(
    {"id": int, "name": str, "color": str},
    pk="id",
    not_null={"name"},
)

db["birds"].insert({"id": 1, "name": "flamingo"})
print(next(db["birds"].rows))
# initial insert
# {'id': 1, 'name': 'flamingo', 'color': None}

db["birds"].upsert({"id": 1, "name": "goldfinch"}, pk="id")
print(next(db["birds"].rows))
# modifying the existing row
# {'id': 1, 'name': 'goldfinch', 'color': None}

db["birds"].upsert({"id": 1, "color": "blue"}, pk="id")
print(next(db["birds"].rows))
# modifying a column which is allowed to be non-null
# {'id': 1, 'name': 'goldfinch', 'color': 'blue'}

(That second case is how I spotted it – I was upserting into a table with not-null columns, but modifying a nullable column on an existing row.)

i can confirm all of the above.

upserts do not work on tables where there are not-null columns, but only the inserts do not work - updates work fine.

i am not sure if there would be a better fix, but one possible fix would be to query sqlite_master if not_null is DEFAULT, and if the table already exists, use its own reckoning of which columns are not_null. I am going to implement this fix in our fork.

One other thing worth noting - the documentation asserts that the pk argument is not necessary for upserts if you are certain that the table has already been created. However, that is not the case - upsert will raise an exception if you do not provide pk. My proposed fix of using sqlite_master to get the actual 'values' for not_null would also work as a way of filling in pk for existing tables, to match the documentation.