Buffered Transactional Editing
Date 2020/12/03
Author Matthias Kuhn
Contact matthias@opengis.ch
maintainer @m-kuhn
Version QGIS 3.18 or 3.20
Summary
A new "buffered transactions" editing mode for QGIS is added.
With this edit mode, all editable layers are toggled synchronously and all edits are saved in a local edit buffer.
Saving changes is executed within a single transaction on all layers (per provider).
Reasoning / limitations of the status quo
QGIS currently supports two types of editing.
They both have their advantages and disadvantages.
Local edit buffer
Edits are buffered locally before being sent to the data provider.
The user saves each layer individually by toggling the edit mode.
For flat layers (e.g. simple shapefiles) this works very well.
When providers with foreign keys (parent/child relationships, etc.) come into play,
things become more complex because layers can no longer be treated independently.
- A user has to know very well in which order layers have to be saved.
- For editing data on multiple layers (commonly parent/child relationships), many layers have to be put into edit mode (more clicking).
- Even on a single layer there is currently no transaction safety. E.g. a PostgreSQL table with a line geometry and a constraint on a field: if a line is split and the newly created part does not fulfill the constraint, the existing line will be shortened but the new one will not be added (there was an issue I can't find right now).
Transaction groups
Edits are sent to the data provider immediately while editing.
Multiple layers on the same provider are put into a transaction group and can be committed / rolled back
synchronously.
This approach helps with foreign keys and transaction safety. There are a couple of caveats still.
- The transaction is kept open for a long time. This introduces table locks and therefore prevents users from working on the database in parallel, even on different areas of the data (PostgreSQL).
- It is hard, if not impossible, to fix up data. Common use cases are
  - adding a new row through the attribute table
  - copy/pasting features, worked around through
- This has performance impacts because
  - while editing we have constant I/O going on
  - while an r/w connection is open, we cannot use any other connection, hence no parallel rendering
- This can completely freeze GeoPackages due to internal locks (e.g. when a "default value" is based on the sqlite_fetch_and_increment expression function).
Proposed Solution
A new buffered transaction mode is introduced as a project configuration.
As in the current transaction mode, multiple layers are put into edit mode as a group.
All editable layers are put into edit mode and committed in parallel (in contrast to the transactional editing, where they need to be on the same provider).
All editing is done locally, no writes to the provider occur during editing.
When the user commits the changes
- resolve layer dependencies through project relations
- start a new transaction on each involved provider (provider in this context means they share the connection string as in QgsTransaction::connectionString())
- change fields (add new fields, delete fields)
- delete all features, in reverse dependency order (children first)
- add all features, in forward dependency order (parents first)
- change all attributes in reverse dependency order (children first)
- change all geometries in reverse dependency order (children first, ideally this and the step before are merged, out of scope for this discussion)
- if everything went well, commit
  - if commits went well discard edit buffer
- if there was a problem, rollback
  - the edit buffer is unchanged
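The commit steps above can be modelled outside QGIS. The following is a minimal Python sketch (Python 3.9+ for graphlib), not the proposed implementation: layers are plain strings, relations are (child, parent) pairs, and the names group_by_connection, commit_plan and run_commit are hypothetical. The proposal does not say in which order fields are changed, so the sketch assumes parents first.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def group_by_connection(connection_strings):
    """Group layer names by connection string (one transaction per group)."""
    groups = defaultdict(list)
    for layer, connection in connection_strings.items():
        groups[connection].append(layer)
    return dict(groups)


def commit_plan(relations, layers):
    """Return the ordered (operation, layer) steps for one commit.

    relations: (child, parent) pairs, a stand-in for project relations.
    layers: names of all layers taking part in the commit.
    """
    sorter = TopologicalSorter({layer: set() for layer in layers})
    for child, parent in relations:
        sorter.add(child, parent)  # the parent has to come before the child
    parents_first = list(sorter.static_order())
    children_first = list(reversed(parents_first))

    plan = [("change_fields", layer) for layer in parents_first]  # assumed order
    plan += [("delete_features", layer) for layer in children_first]
    plan += [("add_features", layer) for layer in parents_first]
    plan += [("change_attributes", layer) for layer in children_first]
    plan += [("change_geometries", layer) for layer in children_first]
    return plan


def run_commit(plan, write_step, commit, rollback):
    """Apply every step, then commit; roll back on the first failure."""
    try:
        for operation, layer in plan:
            write_step(operation, layer)
    except Exception:
        rollback()  # the edit buffers are left unchanged by the caller
        return False
    commit()  # the caller may discard the edit buffers afterwards
    return True
```

In this model a failing write_step leaves nothing committed and returns False, which corresponds to the "rollback, edit buffer unchanged" branch of the list.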
API additions
class QgsEditBufferGroup
A new class that keeps a list of edit buffers that it manages and commits or rolls back together.
QgsVectorLayerEditBuffer::setEditBufferGroup() and QgsVectorLayerEditBuffer::editBufferGroup()
If an editBuffer is part of an editBufferGroup it will forward commit and rollback commands to this one which invokes individual addFeature, deleteFeature, ... in the correct order across all contained editBuffers.
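To illustrate the forwarding described here, the following toy Python model shows a buffer handing its commit over to its group, which then writes deletes and adds across all member buffers in order. The classes below are hypothetical stand-ins and not the QGIS classes; the sketch assumes the group keeps its buffers in parents-first order and only covers deletes and adds.

```python
class EditBuffer:
    """Toy stand-in for an edit buffer; not the QGIS class."""

    def __init__(self, layer, log):
        self.layer = layer
        self.log = log  # shared list that records provider writes
        self.group = None
        self.added = []
        self.deleted = []

    def commit(self):
        # A grouped buffer forwards the whole commit to its group.
        if self.group is not None:
            return self.group.commit()
        self.write_deletes()
        self.write_adds()
        return True

    def write_deletes(self):
        self.log.extend(("delete", self.layer, fid) for fid in self.deleted)
        self.deleted = []

    def write_adds(self):
        self.log.extend(("add", self.layer, feature) for feature in self.added)
        self.added = []


class EditBufferGroup:
    """Toy stand-in for the proposed group; commits all members together."""

    def __init__(self):
        self.buffers = []  # assumed to be kept in parents-first order

    def add(self, buffer):
        buffer.group = self
        self.buffers.append(buffer)

    def commit(self):
        for buffer in reversed(self.buffers):  # children first
            buffer.write_deletes()
        for buffer in self.buffers:  # parents first
            buffer.write_adds()
        return True
```

Calling commit() on any member buffer produces the same sequence of writes, because the group, not the individual buffer, decides the order.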
QgsMapLayer::setProject() and QgsMapLayer::project()
QgsMapLayer will receive knowledge of the project it is in. When a layer is registered in a project, the parent project will be set on the layer.
Will be used in QgsVectorLayer::startEditing(), QgsVectorLayer::commitChanges(), QgsVectorLayer::rollbackChanges() to forward edit requests to their QgsProject equivalent. They will be recursion guarded (since they will be called by QgsProject::... as well).
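One way to picture the recursion guard is the toy Python model below. It is only a sketch under assumptions: Project, Layer, start_editing and the dispatching flag are hypothetical names, and the proposal does not state whether the guard lives on the layer or on the project (here it is on the project).

```python
class Project:
    """Toy stand-in for the project; not the QGIS class."""

    def __init__(self):
        self.layers = []
        self.dispatching = False  # recursion guard (assumed location)

    def add_layer(self, layer):
        layer.project = self  # the layer learns which project it is in
        self.layers.append(layer)

    def start_editing(self, layer=None):
        # Put every layer into edit mode; the flag stops the layers
        # from forwarding the request back to the project again.
        self.dispatching = True
        try:
            for member in self.layers:
                member.start_editing()
        finally:
            self.dispatching = False
        return True


class Layer:
    """Toy stand-in for a vector layer; not the QGIS class."""

    def __init__(self, name):
        self.name = name
        self.project = None
        self.editing = False
        self.toggles = 0  # how often edit mode was actually switched on

    def start_editing(self):
        if self.project is not None and not self.project.dispatching:
            # Entry point used by callers: forward to the project.
            return self.project.start_editing(self)
        # Called by the project (or layer without project): do the work.
        self.editing = True
        self.toggles += 1
        return True
```

Starting an edit session on one layer of a project toggles every layer exactly once, and a layer without a project still behaves as before.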
QgsProject::startEditing( layer ), QgsProject::commitChanges( layer ), QgsProject::rollbackChanges( layer )
Will start editing (or commit/rollback) either a single layer or all editable layers, depending on QgsProject::transactionMode().
Should be used in the future as the main entry point for editing layers in a project. The current way will keep working though.
Will create a new QgsEditBufferGroup if appropriate and add to it any editBuffer from layers that have been put in edit mode.
Limitations
- When an involved provider does not support transactions (shapefiles, excel files, etc) it is not possible to rollback if committing fails on another layer. In this case the layer might end up with stored data after an incomplete commit.
- When a project has circular dependencies through foreign keys, we are not able to completely resolve the layer save order into a "correct" order. In this case the provider is required to be tolerant (e.g. deferred constraint checks). It would be possible to track dependencies down to individual features, but even there could potentially be remaining circular dependencies. In the end we try our best to handle the trivial cases and have to rely on the user for the complex scenarios.
- In contrast to the existing transaction mode, side effects introduced on the provider through triggers etc. are not immediately visible.
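The circular dependency limitation can be illustrated with a small Python sketch (Python 3.9+). It is a model under assumptions, not part of the proposal: relations are (child, parent) pairs of layer names and save_order is a hypothetical helper. It returns a parents-first order when one exists and otherwise reports the cycle that prevents it.

```python
from graphlib import CycleError, TopologicalSorter


def save_order(relations):
    """Return (order, cycle) for (child, parent) relation pairs.

    order is a parents-first list of layers, or None if the relations
    are circular; cycle is the offending chain of layers, or [].
    """
    sorter = TopologicalSorter()
    for child, parent in relations:
        sorter.add(child, parent)
    try:
        return list(sorter.static_order()), []
    except CycleError as error:
        # graphlib stores the detected cycle as the second argument.
        return None, list(error.args[1])
```

When a cycle is reported, no save order satisfies every foreign key at each step, so the provider has to tolerate temporarily inconsistent data, e.g. through deferred constraint checks.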
Performance Implications
During editing, performance is equal to that of the current edit buffer.
Performance is better than with transaction groups during editing since nothing needs to be stored on the data provider. Also, parallel rendering is still enabled.
Backwards Compatibility
This is a new mode which is opt-in.
Issue Tracking ID(s)
(optional)
Votes
(required)
Hi Matthias, great idea!
Can you elaborate why QgsMapLayer::setProject() is necessary?
If mapLayer->startEditing() is called, it needs to forward this to the project so it can put all other editable layers into edit mode. Same for commitChanges and rollbackChanges. I think having not only a link from project to "child" layers but also from layers to "parent" project has its advantages for context.
There would be other approaches too which can be discussed if this one has unforeseen drawbacks. Do you see any specific drawbacks @elpaso ?
Sidenote: I also could imagine it will be handy to compile get_feature, aggregate and other join-based functionality in the future which could give a tremendous speed improvement in some cases.
An alternative approach would be to use the existing QObject parent's (QgsMapLayerStore) parent (QgsProject). That could live without a new member and would work "reasonably well" if protected by a solid set of unit tests.
@m-kuhn I don't think it's a good idea to have a cyclic dependency between project and map layers.
I'm -1 to add a project member to map layers.
Cannot you just handle this at the application level?
@elpaso the pull request qgis/QGIS#40745 avoids the member while keeping the logic contained, does that work for you?
I am worried that keeping this on the app level will make it a fragile construct with connections between signals and slots that will add public api for internal reasons, it will be hard to debug and will be hard to maintain. If somehow possible, I'd like to avoid this unless there is a very good reason.
> @elpaso the pull request qgis/QGIS#40745 avoids the member while keeping the logic contained, does that work for you?
Yes! Sorry, I didn't look at the code and I thought it was a member.
Thanks. Your support is appreciated.
can we call for a vote on this?
LGTM in general.
I'd like to know how you plan to handle the UX in case some layers belong to providers that do not support rollback. I think the user should be warned about the potential data corruption in that case.
This is a good idea, +1 for me