cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.

Home Page:https://www.cockroachlabs.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

sql: support expression-based index columns

knz opened this issue · comments

Computed indexes are indexes on the result of an expression (as opposed to the direct value of a column). For example: CREATE INDEX a ON customer( LOWER(firstname) || ' ' || LOWER(lastname))

Suggested by @petermattis

Would help with #7186 too.

This is one of the top unimplemented errors encountered by users

Also requested by @rkruze in #30690:

CREATE UNIQUE INDEX IF NOT EXISTS table_unique_idx
ON test_table
USING btree
((data->>'code'), (data->>'name'), (data->>'email_address')) where data->>'is_active'= 'true';
commented

petition

We are hitting this also, some advance on this would be appreciated.

I'm trying out Craft CMS, a popular PHP CMS built with the Yii framework, and am hitting this.

CREATE INDEX "elements_sites_uri_siteId_idx" ON "elements_sites" (lower("uri"), "siteId")

Moving to Optimizer backlog, cc @RaduBerinde @mgartner

This is tracked by #24455.

I'm sorry you can't close an issue that's X-anchored-telemetry until the feature is complete and references to the issues in the code have been removed.

Alternatively you can send a PR that changes the references in the code from #9682 to #24455 then close this. (But even then it feels icky given the refernces in previous versions)

Copying over comment some recent info from #24455 below.


I did a bit of research of what work would be necessary to support expression-based indexes:

  • opt/cat interface and sql descriptor changes - need a way to refer to expressions as index "columns"
  • represent scans of expression-based indexes in opt (need "fake" columnids for constraints)
  • opt operator statistics
  • exploration rules for scans - need to recognize instances of the index expressions in filters
  • mutation paths - need to plumb values for the expressions when indexes are updated
  • schema change / backfiller support

Some aspects would be simplified if we treated the expressions as virtual columns (computed but not stored) throughout the stack, but other aspects would become more complicated (it would be the first non-scannable table column - and the code around different kinds of table columns is already very complex).

Overall the amount of work is on the same order of magnitude as partial indexes.

@RaduBerinde any updates on this?

We need to apply an expression-based unique index to ensure unicity over a JSON field like, in PostgreSQL you could do:

CREATE UNIQUE INDEX test_data_foo_uq ON test ((data->>'foo'))

+info: https://dba.stackexchange.com/questions/161313/creating-a-unique-constraint-from-a-json-object/161345#161345

But in CRDB, until this is solved, we are planning to replicate the columns in the parent tablet by using a computed field:

CREATE TABLE test (
    id INT, 
    data JSONB, 
    foo STRING AS ((data->>'foo')) STORED
    );

Then apply the UQ over it:

CREATE UNIQUE INDEX test__foo__uq ON test(foo);

@lopezator We've begun implementing expression-based indexes, but the feature won't be released until the 21.1 release at the earliest (Spring 2021). Until then, I think the workaround you mentioned is your best option.

Checking in on this - looks like #57608 shipped late last year. Any plans to move forward with this? I think this is the only remaining issue prevent the Livepeer API server from using Cockroach.

Yes, we have plans to support the syntax in 21.2. Note that in 21.1 (which is close to be released) we support virtual columns which allow you to do the same thing essentially.

Expression indexes have landed on master and they will be released in version 21.2. I'll close the issue.

Heads up, the following does not seem to work yet

 CREATE INDEX cache_02_17_2022_06_20_title_lower_idx ON cache_02_17_2022_06_20 (lower(title));

returns:

ERROR: version ExpressionIndexes must be finalized to use expression indexes

Seems it's related to 2bb519e#diff-606ca467041f6ba1f19e8ef8530d2976bf9b0db98477940ae9deb7b2fd5d1795L363 which would land on v22