questdb / questdb

QuestDB is an open source time-series database for fast ingest and SQL queries

Home Page: https://questdb.io

Sometimes it hangs and refuses to process more queries.

alexey-milovidov opened this issue

Describe the bug

I've set up QuestDB on a fresh VM with Ubuntu 22.04 and am trying to run some queries:

#!/bin/bash

# Install

wget https://github.com/questdb/questdb/releases/download/6.4.1/questdb-6.4.1-rt-linux-amd64.tar.gz
tar xf questdb*.tar.gz
questdb-6.4.1-rt-linux-amd64/bin/questdb.sh start

# Import the data

wget 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
gzip -d hits.csv.gz

curl -G --data-urlencode "query=$(cat create.sql)" 'http://localhost:9000/exec?timings=true'
time curl -F data=@hits.csv 'http://localhost:9000/imp?name=hits'

# 27m 47.546s

sed -i 's/query.timeout.sec=60/query.timeout.sec=6000/' .questdb/conf/server.conf
questdb-6.4.1-rt-linux-amd64/bin/questdb.sh stop
questdb-6.4.1-rt-linux-amd64/bin/questdb.sh start

./run.sh 2>&1 | tee log.txt

create.sql:

CREATE TABLE hits
(
    WatchID long,
    JavaEnable int,
    Title string,
    GoodEvent int,
    EventTime timestamp,
    Eventdate date,
    CounterID int,
    ClientIP int,
    RegionID int,
    UserID long,
    CounterClass int,
    OS int,
    UserAgent int,
    URL string,
    Referer string,
    IsRefresh int,
    RefererCategoryID int,
    RefererRegionID int,
    URLCategoryID int,
    URLRegionID int,
    ResolutionWidth int,
    ResolutionHeight int,
    ResolutionDepth int,
    FlashMajor int,
    FlashMinor int,
    FlashMinor2 string,
    NetMajor int,
    NetMinor int,
    UserAgentMajor int,
    UserAgentMinor string,
    CookieEnable int,
    JavascriptEnable int,
    IsMobile int,
    MobilePhone int,
    MobilePhoneModel string,
    Params string,
    IPNetworkID int,
    TraficSourceID int,
    SearchEngineID int,
    SearchPhrase string,
    AdvEngineID int,
    IsArtifical int,
    WindowClientWidth int,
    WindowClientHeight int,
    ClientTimeZone int,
    ClientEventTime timestamp,
    SilverlightVersion1 int,
    SilverlightVersion2 int,
    SilverlightVersion3 int,
    SilverlightVersion4 int,
    PageCharset string,
    CodeVersion int,
    IsLink int,
    IsDownload int,
    IsNotBounce int,
    FUniqID long,
    OriginalURL string,
    HID int,
    IsOldCounter int,
    IsEvent int,
    IsParameter int,
    DontCountHits int,
    WithHash int,
    HitColor string,
    LocalEventTime timestamp,
    Age int,
    Sex int,
    Income int,
    Interests int,
    Robotness int,
    RemoteIP int,
    WindowName int,
    OpenerName int,
    HistoryLength int,
    BrowserLanguage string,
    BrowserCountry string,
    SocialNetwork string,
    SocialAction string,
    HTTPError int,
    SendTiming int,
    DNSTiming int,
    ConnectTiming int,
    ResponseStartTiming int,
    ResponseEndTiming int,
    FetchTiming int,
    SocialSourceNetworkID int,
    SocialSourcePage string,
    ParamPrice long,
    ParamOrderID string,
    ParamCurrency string,
    ParamCurrencyID int,
    OpenstatServiceName string,
    OpenstatCampaignID string,
    OpenstatAdID string,
    OpenstatSourceID string,
    UTMSource string,
    UTMMedium string,
    UTMCampaign string,
    UTMContent string,
    UTMTerm string,
    FromTag string,
    HasGCLID int,
    RefererHash long,
    URLHash long,
    CLID int
);

queries.sql:

SELECT COUNT(*) FROM hits;
SELECT COUNT(*) FROM hits WHERE AdvEngineID != 0;
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
SELECT AVG(UserID) FROM hits;
SELECT count_distinct(CAST(UserID AS string)) FROM hits;
SELECT count_distinct(SearchPhrase) FROM hits;
SELECT MIN(EventDate), MAX(EventDate) FROM hits;
SELECT AdvEngineID, COUNT(*) AS c FROM hits WHERE AdvEngineID != 0 GROUP BY AdvEngineID ORDER BY c DESC;
SELECT RegionID, count_distinct(CAST(UserID AS string)) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10;
SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), count_distinct(CAST(UserID AS string)) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10;
SELECT MobilePhoneModel, count_distinct(CAST(UserID AS string)) AS u FROM hits WHERE MobilePhoneModel != '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT MobilePhone, MobilePhoneModel, count_distinct(CAST(UserID AS string)) AS u FROM hits WHERE MobilePhoneModel != '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase != '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT SearchPhrase, count_distinct(CAST(UserID AS string)) AS u FROM hits WHERE SearchPhrase != '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase != '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, COUNT(*) AS c FROM hits GROUP BY UserID ORDER BY c DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) AS c FROM hits GROUP BY UserID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase LIMIT 10;
SELECT UserID, extract(minute FROM EventTime) AS m, SearchPhrase, COUNT(*) AS c FROM hits GROUP BY UserID, m, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID FROM hits WHERE UserID = 435090932899640449;
SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase != '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, count_distinct(CAST(UserID AS string)) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase != '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase != '' ORDER BY EventTime LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase != '' ORDER BY SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase != '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL != '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS key, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer != '' GROUP BY key HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT SUM(ResolutionWidth), SUM(ResolutionWidth + 1), SUM(ResolutionWidth + 2), SUM(ResolutionWidth + 3), SUM(ResolutionWidth + 4), SUM(ResolutionWidth + 5), SUM(ResolutionWidth + 6), SUM(ResolutionWidth + 7), SUM(ResolutionWidth + 8), SUM(ResolutionWidth + 9), SUM(ResolutionWidth + 10), SUM(ResolutionWidth + 11), SUM(ResolutionWidth + 12), SUM(ResolutionWidth + 13), SUM(ResolutionWidth + 14), SUM(ResolutionWidth + 15), SUM(ResolutionWidth + 16), SUM(ResolutionWidth + 17), SUM(ResolutionWidth + 18), SUM(ResolutionWidth + 19), SUM(ResolutionWidth + 20), SUM(ResolutionWidth + 21), SUM(ResolutionWidth + 22), SUM(ResolutionWidth + 23), SUM(ResolutionWidth + 24), SUM(ResolutionWidth + 25), SUM(ResolutionWidth + 26), SUM(ResolutionWidth + 27), SUM(ResolutionWidth + 28), SUM(ResolutionWidth + 29), SUM(ResolutionWidth + 30), SUM(ResolutionWidth + 31), SUM(ResolutionWidth + 32), SUM(ResolutionWidth + 33), SUM(ResolutionWidth + 34), SUM(ResolutionWidth + 35), SUM(ResolutionWidth + 36), SUM(ResolutionWidth + 37), SUM(ResolutionWidth + 38), SUM(ResolutionWidth + 39), SUM(ResolutionWidth + 40), SUM(ResolutionWidth + 41), SUM(ResolutionWidth + 42), SUM(ResolutionWidth + 43), SUM(ResolutionWidth + 44), SUM(ResolutionWidth + 45), SUM(ResolutionWidth + 46), SUM(ResolutionWidth + 47), SUM(ResolutionWidth + 48), SUM(ResolutionWidth + 49), SUM(ResolutionWidth + 50), SUM(ResolutionWidth + 51), SUM(ResolutionWidth + 52), SUM(ResolutionWidth + 53), SUM(ResolutionWidth + 54), SUM(ResolutionWidth + 55), SUM(ResolutionWidth + 56), SUM(ResolutionWidth + 57), SUM(ResolutionWidth + 58), SUM(ResolutionWidth + 59), SUM(ResolutionWidth + 60), SUM(ResolutionWidth + 61), SUM(ResolutionWidth + 62), SUM(ResolutionWidth + 63), SUM(ResolutionWidth + 64), SUM(ResolutionWidth + 65), SUM(ResolutionWidth + 66), SUM(ResolutionWidth + 67), SUM(ResolutionWidth + 68), SUM(ResolutionWidth + 69), SUM(ResolutionWidth + 70), SUM(ResolutionWidth + 71), SUM(ResolutionWidth + 72), SUM(ResolutionWidth + 73), SUM(ResolutionWidth + 74), SUM(ResolutionWidth + 75), SUM(ResolutionWidth + 76), SUM(ResolutionWidth + 77), SUM(ResolutionWidth + 78), SUM(ResolutionWidth + 79), SUM(ResolutionWidth + 80), SUM(ResolutionWidth + 81), SUM(ResolutionWidth + 82), SUM(ResolutionWidth + 83), SUM(ResolutionWidth + 84), SUM(ResolutionWidth + 85), SUM(ResolutionWidth + 86), SUM(ResolutionWidth + 87), SUM(ResolutionWidth + 88), SUM(ResolutionWidth + 89) FROM hits;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase != '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase != '' GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS c FROM hits GROUP BY URL ORDER BY c DESC LIMIT 10;
SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, URL ORDER BY c DESC LIMIT 10;
SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM hits GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-01T00:00:00Z' AND EventTime <= '2013-07-31T23:59:59Z' AND DontCountHits = 0 AND IsRefresh = 0 AND URL != '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT Title, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-01T00:00:00Z' AND EventTime <= '2013-07-31T23:59:59Z' AND DontCountHits = 0 AND IsRefresh = 0 AND Title != '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-01T00:00:00Z' AND EventTime <= '2013-07-31T23:59:59Z' AND IsRefresh = 0 AND IsLink != 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 1000, 10;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-01T00:00:00Z' AND EventTime <= '2013-07-31T23:59:59Z' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 1000, 10;
SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-01T00:00:00Z' AND EventTime <= '2013-07-31T23:59:59Z' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 100, 10;
SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-01T00:00:00Z' AND EventTime <= '2013-07-31T23:59:59Z' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 100000, 10;
SELECT DATE_TRUNC('minute', EventTime) AS M, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventTime >= '2013-07-14T00:00:00Z' AND EventTime <= '2013-07-15T23:59:59Z' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_TRUNC('minute', EventTime) ORDER BY M LIMIT 1000, 10;

run.sh:

#!/bin/bash

TRIES=3

cat queries.sql | while read -r query; do
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches

    echo "$query";
    for i in $(seq 1 $TRIES); do
        curl -sS --max-time 6000 -G --data-urlencode "query=${query}" 'http://localhost:9000/exec?timings=true' ||
            (questdb-6.4.1-rt-linux-amd64/bin/questdb.sh stop && questdb-6.4.1-rt-linux-amd64/bin/questdb.sh start && sleep 5)
        echo
    done;
done;
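
The restart branch in run.sh waits a fixed `sleep 5` before the next attempt. As a hedged alternative, the sketch below polls until the server answers. `wait_until` and `probe_server` are hypothetical helper names, not part of QuestDB or of the original script, and the `SELECT 1` probe assumes the server accepts that query on the same `/exec` endpoint used above.

```shell
#!/bin/bash

# wait_until ATTEMPTS DELAY CMD...
# Run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds after each failed try.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
    local attempts=$1 delay=$2 i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Hypothetical probe: a trivial query against the /exec endpoint from run.sh.
# Assumption: the server answers "SELECT 1" once it is ready.
probe_server() {
    curl -sS --max-time 5 -G --data-urlencode 'query=SELECT 1' \
        'http://localhost:9000/exec'
}
```

In the restart branch this could be called as `wait_until 30 1 probe_server || echo "server did not come back" >&2`, which gives up after roughly 30 seconds instead of assuming 5 seconds is always enough.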

Initially, I ran into many problems with the queries, including:

  • HAVING is not supported;
  • COUNT(DISTINCT) is not supported;
  • count_distinct does not work for long;
  • if I set up a designated timestamp column, it skips almost all data during loading;
  • comparison of date with 'YYYY-MM-DD' does not work;
  • comparison of string = NULL, string != NULL works incorrectly;
  • LIMIT OFFSET does not work;
  • ORDER BY expr(...) does not work;

But these are not the only problems.

Sometimes I only get this:
curl: (52) Empty reply from server

And sometimes it hangs until I get:

curl: (52) Empty reply from server

  ___                  _   ____  ____
 / _ \ _   _  ___  ___| |_|  _ \| __ )
| | | | | | |/ _ \/ __| __| | | |  _ \
| |_| | |_| |  __/\__ \ |_| |_| | |_) |
 \__\_\\__,_|\___||___/\__|____/|____/
                        www.questdb.io

Something is wrong. Process does not stop. Killing..
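
To tell these failure modes apart in log.txt, the curl exit status can be recorded next to each query. The sketch below maps a few documented curl exit codes to short labels. `describe_curl_exit` is a hypothetical helper name, and the choice of which codes to single out is an assumption about what matters for this benchmark.

```shell
#!/bin/bash

# describe_curl_exit CODE
# Print a short label for a curl exit status.
# 7, 28 and 52 are standard curl exit codes; everything else is lumped together.
describe_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "could not connect to server" ;;
        28) echo "timed out (--max-time reached)" ;;
        52) echo "empty reply from server" ;;
        *)  echo "other curl error ($1)" ;;
    esac
}
```

Inside the loop in run.sh, one could capture `rc=$?` immediately after the curl call and print `describe_curl_exit "$rc"`, so that a hang ending in a timeout (28) is distinguishable from a dropped connection (52).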

To reproduce

Run the steps above.

Expected Behavior

No response

Environment

- **QuestDB version**: QuestDB server 6.4.1
- **OS**: Ubuntu 22.04
- **Browser**: -

Additional context

No response

Hi Alexey, thank you for your continued interest and the comprehensive benchmark! We appreciate it a lot, and our goal is to make QuestDB better. We're already working to address some of the benchmark woes:

  • HAVING syntax is indeed not yet supported; functionally, however, HAVING can be replaced with a sub-query, e.g.
(SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL != '' GROUP BY CounterID) WHERE c > 100000 ORDER BY l DESC LIMIT 25;

It might be a good idea, though, for the SQL optimiser to support HAVING syntax and rewrite it to a sub-query.

  • work is underway to optimise import, which includes importing out-of-order data really fast #2155
  • work is underway to support order by expr() #2210
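
The HAVING workaround mentioned in this reply is mechanical: wrap the grouped query in parentheses and move the HAVING condition into an outer WHERE that refers to the aggregate by its alias. A minimal sketch of that rewrite as a string helper follows. `having_as_subquery` is a hypothetical name; it only assembles text and does not parse SQL.

```shell
#!/bin/bash

# having_as_subquery INNER FILTER [TAIL]
# INNER  : the grouped SELECT without its HAVING clause
# FILTER : the former HAVING condition, written against column aliases
# TAIL   : optional ORDER BY / LIMIT applied to the outer query
having_as_subquery() {
    local inner=$1 filter=$2 tail=${3:-}
    printf '(%s) WHERE %s%s;\n' "$inner" "$filter" "${tail:+ $tail}"
}

# Example: the CounterID query from queries.sql.
having_as_subquery \
    "SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL != '' GROUP BY CounterID" \
    "c > 100000" \
    "ORDER BY l DESC LIMIT 25"
```

The printed statement matches the sub-query form quoted in the reply, and the same call pattern could be applied to the other HAVING query in queries.sql.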

We will review and address other issues asap!

Thanks for creating the issue @alexey-milovidov

LIMIT OFFSET has a different syntax. Use LIMIT lowerBound, upperBound instead:
https://questdb.io/docs/reference/sql/limit/
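
A small sketch of how an offset/count pair might be translated into the two-argument form. It assumes, and this should be checked against the linked page, that the second number is an upper bound rather than a row count, so skipping `offset` rows and returning `count` rows becomes `LIMIT offset, offset + count`. `limit_bounds` is a hypothetical helper name.

```shell
#!/bin/bash

# limit_bounds OFFSET COUNT
# Print a two-argument LIMIT clause.
# Assumption: the second argument of LIMIT is an upper bound, not a row count.
limit_bounds() {
    local offset=$1 count=$2
    echo "LIMIT ${offset}, $((offset + count))"
}
```

Under that assumption, `limit_bounds 1000 10` prints `LIMIT 1000, 1010`.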

Could you elaborate on which string != null or string = null query produces a wrong result? I don't see the word null in the queries list.

> Could you elaborate on which string != null or string = null query produces a wrong result? I don't see the word null in the queries list.

If I remember correctly, there is the following problem:

  1. Empty strings are parsed as NULLs from the source data.
  2. Then the queries with WHERE s != '' return a wrong result.
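
One way to check whether a column is affected is to compare three counts: all rows, rows where the column is not the empty string, and rows where it is. If the last two do not add up to the first, the remainder was presumably stored as NULL. The sketch below only generates the queries; `null_probe_queries` is a hypothetical helper name, and no particular result is claimed here.

```shell
#!/bin/bash

# null_probe_queries TABLE COLUMN
# Print three COUNT(*) queries whose results can be compared by hand.
null_probe_queries() {
    local table=$1 column=$2
    printf 'SELECT COUNT(*) FROM %s;\n' "$table"
    printf "SELECT COUNT(*) FROM %s WHERE %s != '';\n" "$table" "$column"
    printf "SELECT COUNT(*) FROM %s WHERE %s = '';\n" "$table" "$column"
}
```

Each printed line can be sent with the same `curl -G --data-urlencode "query=..."` call that run.sh uses, for example after `null_probe_queries hits SearchPhrase`.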

I've included QuestDB in the benchmark: https://benchmark.clickhouse.com/

Closing since all ClickBench issues were fixed. See ClickHouse/ClickBench#25 for more details.