Roadmap 2023

BohuTANG opened this issue 2 years ago · comments

After a full year of research and development in 2022, the functionality and stability of Databend were significantly enhanced, and several users began using it in production. Databend has helped them greatly reduce costs and operational complexity.

This is the Databend roadmap for 2023 (discussion).

Main tasks

v1.3

v1.2 (Prepare for release on May 15th)

v1.1 (Prepare for release on April 5th)

v1.0 (Prepare for release on March 5th)

Features

Task	Status	Comments
Update#9261	DONE	needs optimization (release in v1.0)
Privileges	DONE
Alter table	DONE	high-priority (release in v1.0)
Window function#6342	DONE
Lambda function and high-order functions	DONE
Materialized view (aggregating index)	DONE
Support SET_VAR hints#8833	DONE
Parquet reader	DONE
DataFrame	DONE
Data Sharing (community version)	DONE
Concurrent query enhance	IN PROGRESS
Distributed COPY#8594	DONE
Support Decimal data type#2931	DONE	high-priority (release in v1.0)
Add Column-Level dynamic data masking support	PLAN

Improvements

Task	Status	Comments
New expression#9411	DONE
Error message	PLAN

Planner

Task	Status	Comments
Scalar expression normalization	DONE
Column constraint framework	DONE
Functional dependency framework#7438	DONE
Join reorder	DONE
CBO	DONE	high-priority (release in v1.0)
Support TPC-DS	DONE
Support optimization tracing	PLAN	Makes the optimizer easier to debug and study.

Cache

Task	Status	Comments
Unified cache layer	DONE
Meta data cache	DONE
Index data cache	DONE
Block data cache	DONE	high-priority (release in v1.0)

Data Storage

Task	Status	Comments
Fuse engine re-clustering	DONE	high-priority (release in v1.1)
Fuse engine orphan data cleanup	DONE	high-priority (release in v1.0)

Distributed Query Execution

Task	Status	Comments
Visualized profiling	IN PROGRESS
Aggregation spilling	DONE	high-priority (release in v1.1)

Resource Quota

Task	Status	Comments
Session-level quota control (CPU/Memory)	DONE

Schema-Less Search

Task	Status	Comments
JSON indexing	DONE	high-priority
Fulltext index#3915	IN PROGRESS	high-priority
Array functions#7931	DONE	high-priority
Faiss index#9699	PLAN

LakeHouse

Task	Status	Comments
Apache Hive	DONE
Apache Iceberg	DONE
Delta Lake	PLAN
Querying external storage (Parquet)	DONE

Integrations

Task	Status	Comments
Dbt integration	DONE
Airbyte integration	DONE
Datadog Vector integration with Rust driver	DONE
DataX integration with Java driver	DONE
CDC with Flink	DONE
CDC with Kafka	DONE

Testing

Task	Status	Comments
SQLlogic Test	DONE	Supports more test cases
SQLancer Test	DONE	Supports more types and more cases
Fuzzer Test	IN PROGRESS

Releases

flaneur · Answer 1 · Tue Jan 03 2023 15:53:15 GMT+0800 (China Standard Time)

Any plan to improve concurrency capabilities, so that developers can rely on Databend to build data exploration platforms (like Google Analytics) on the web?

flaneur · Answer 2 · Tue Jan 03 2023 15:57:56 GMT+0800 (China Standard Time)

Any plan to tune metasrv's memory usage? I hit an OOM last week; IMHO it could store most of the data on disk.

Bohu · Answer 3 · Tue Jan 03 2023 15:59:22 GMT+0800 (China Standard Time)

Any plan to improve concurrency capabilities, so that developers can rely on Databend to build data exploration platforms (like Google Analytics) on the web?

Added: Concurrent query enhance

Bohu · Answer 4 · Tue Jan 03 2023 16:00:30 GMT+0800 (China Standard Time)

Any plan to tune metasrv's memory usage? I hit an OOM last week; IMHO it could store most of the data on disk.

@drmingdrmer will fill in the Meta section; I think he will cover it.

wangyufan · Answer 5 · Thu Jan 12 2023 15:31:25 GMT+0800 (China Standard Time)

Any plan to support the decimal data type? This is essential if we want to use Databend in finance-related fields. Will we see it in the first half of the year?

Bohu · Answer 6 · Thu Jan 12 2023 17:02:14 GMT+0800 (China Standard Time)

Any plan to support the decimal data type? This is essential if we want to use Databend in finance-related fields. Will we see it in the first half of the year?

Added to the main task, thanks.

flaneur · Answer 7 · Wed Feb 08 2023 17:54:53 GMT+0800 (China Standard Time)

Will fault tolerance for query processing be planned in 2023?

For example, I have some spot instances; the cluster should handle a shut-down instance gracefully without affecting running queries.

Bohu · Answer 8 · Wed Feb 08 2023 18:12:51 GMT+0800 (China Standard Time)

Will fault tolerance for query processing be planned in 2023?

We will do it, but it is hard, so the priority is low.

For example, I have some spot instances; the cluster should handle a shut-down instance gracefully without affecting running queries.

Please file an issue for that.

Brian Cort · Answer 9 · Tue Aug 08 2023 01:43:57 GMT+0800 (China Standard Time)

Is there a plan for when the vector index feature will be added? It is part of #10689 but doesn't seem to have an associated ticket.

Meta

Task	Status	Comments
Jepsen test	DONE
Store membership in raft	DONE
Nonblocking snapshot building	DONE
Snapshot file format impl	DONE
Upgrade on-disk store format	DONE
