dolthub / dolt

Dolt – Git for Data

`dolt diff` ... that only shows the tables changed in a simpler format

verdverm opened this issue

I'm looking to feed the dolt diff into some automated processes, but only want to know what tables have changed

I'm using dolt diff --summary but it contains a bunch of formatting text

$ dolt diff b6u0b8g9crutgpaummtla6rbft91o2ue --summary -r json        
+---------------------------+-----------+-------------+---------------+
| Table name                | Diff type | Data change | Schema change |
+---------------------------+-----------+-------------+---------------+
| cbp_apprehensions_monthly | modified  | true        | false         |
+---------------------------+-----------+-------------+---------------+
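Until a machine-readable option exists, the table above can be scraped for names. This is only a rough sketch: the function name `tables_from_summary` is made up for illustration, and it assumes the `--summary` layout stays exactly as shown (border lines starting with `+`, a header row whose first cell is `Table name`, and the table name in the first column).

```python
def tables_from_summary(text: str) -> list[str]:
    """Pull table names out of the ASCII table printed by `dolt diff --summary`.

    Assumes the layout shown in this issue; any change to the CLI output
    format would break this parser.
    """
    names = []
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with "+", so only "|" rows carry data.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row.
        if not cells or cells[0] == "Table name":
            continue
        names.append(cells[0])
    return names


sample = """\
+---------------------------+-----------+-------------+---------------+
| Table name                | Diff type | Data change | Schema change |
+---------------------------+-----------+-------------+---------------+
| cbp_apprehensions_monthly | modified  | true        | false         |
+---------------------------+-----------+-------------+---------------+
"""

print(tables_from_summary(sample))
```

Scraping display output like this is brittle, which is why a dedicated flag would be the better long-term answer.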

dolt diff --stat -r json produces invalid JSON (schema_diff has invalid content), but could otherwise work

$ dolt diff b6u0b8g9crutgpaummtla6rbft91o2ue --stat -r json 
{"tables":[{"name":"cbp_apprehensions_monthly","schema_diff":[prev size: 29676, new size: 29676, adds: 0, deletes: 0, modifications: 24990
4,686 Rows Unmodified (15.79%)
0 Rows Added (0.00%)
0 Rows Deleted (0.00%)
24,990 Rows Modified (84.21%)
0 Cells Added (0.00%)
0 Cells Deleted (0.00%)
49,980 Cells Modified (24.06%)
(29,676 Row Entries vs 29,676 Row Entries)

}]}

Ideally I could have something like git diff --name-only

$ dolt diff b6u0b8g9crutgpaummtla6rbft91o2ue --table-only
cbp_apprehensions_monthly

A workaround here would be dolt sql -q "select table_name from dolt_diff where commit_hash='WORKING'".

Good feature request for the CLI though.

I'll make a separate bug for the invalid JSON.

Is there a workaround for a diff between HEAD and the previous commit?

(i.e. the two fields that come in a webhook payload)

Something like:

dolt sql -q "select table_name from dolt_diff where commit_hash ='STAGED' or commit_hash ='WORKING' or commit_hash='<CURRENT COMMIT>'"

I don't think HEAD will work.

Edit: I had STAGED twice; I meant WORKING.

I don't think there would be anything in STAGED or WORKING, as it would be a fresh clone after a push

We have a hashof() that can take HEAD or HEAD~1 as an argument, so maybe something like this?

tmp/main> select * from dolt_diff where commit_hash=hashof('HEAD') or commit_hash=hashof('HEAD~1');
+----------------------------------+------------+-----------+-------------------+---------------------+---------+-------------+---------------+
| commit_hash                      | table_name | committer | email             | date                | message | data_change | schema_change |
+----------------------------------+------------+-----------+-------------------+---------------------+---------+-------------+---------------+
| 616qa6ngvisk6notemsafdij9huqtiod | t          | James     | james@dolthub.com | 2024-04-30 18:27:12 | asdf    | false       | true          |
| bjghbfbsbns8t8kh7ku7a3qm0ghbqrfi | t1         | root      | root@localhost    | 2024-04-30 18:28:08 | fdas    | false       | true          |
+----------------------------------+------------+-----------+-------------------+---------------------+---------+-------------+---------------+
2 rows in set (0.00 sec)

Is there a way to say "give me the diff between these two specific commits"?

I think it's dolt_diff_stat(). So select table_name from dolt_diff_stat(<from_revision>, <to_revision>). If you want the rows in a table, use the dolt_diff() table function and pass in the table name.

https://docs.dolthub.com/sql-reference/version-control/dolt-sql-functions#dolt_diff_stat
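For the webhook case, the dolt_diff_stat() suggestion could be wrapped in a small script. This is a sketch under assumptions, not a tested integration: the helper names (`changed_tables_query`, `table_names`, `changed_tables`) are hypothetical, it assumes `dolt sql -r json` prints an object with a top-level "rows" array, and it assumes the two revisions are the before and after commits from the webhook payload.

```python
import json
import re
import subprocess

# Conservative whitelist for revision strings (hashes, branch names, HEAD~1),
# so values taken from a webhook payload cannot inject SQL.
_REVISION = re.compile(r"[A-Za-z0-9_./~^-]+")


def changed_tables_query(from_rev: str, to_rev: str) -> str:
    """Build the dolt_diff_stat() query suggested in this thread."""
    for rev in (from_rev, to_rev):
        if not _REVISION.fullmatch(rev):
            raise ValueError(f"unexpected characters in revision: {rev!r}")
    return f"select table_name from dolt_diff_stat('{from_rev}', '{to_rev}')"


def table_names(result_json: str) -> list[str]:
    """Read table names from JSON output, assumed to look like {"rows": [...]}."""
    rows = json.loads(result_json).get("rows", [])
    return [row["table_name"] for row in rows]


def changed_tables(repo_dir: str, from_rev: str, to_rev: str) -> list[str]:
    """Run the query with the dolt CLI inside repo_dir (requires dolt on PATH)."""
    result = subprocess.run(
        ["dolt", "sql", "-r", "json", "-q", changed_tables_query(from_rev, to_rev)],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return table_names(result.stdout)
```

A call such as changed_tables(".", "HEAD~1", "HEAD") would then cover the fresh-clone case, where nothing is in STAGED or WORKING.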

Hey @verdverm,

Not sure if you were able to find a viable workaround, but we've implemented --name-only for you.
It's currently in dolt main, and we'll cut a release with the feature soon.

Let us know how it works for you.