harelba / q

q - Run SQL directly on delimited files and multi-file sqlite databases

Home Page: http://harelba.github.io/q/


How does the speed depend on the environment?

sanasar-dev opened this issue

First of all, I want to say that I really like this tool. It is amazing.
I have one question: why is q about two times slower in production?
I know this may not be the right way to put it, but I will try to explain a little.
Everything works perfectly on my laptop (12th-gen i7 / 4 cores / 2 threads / 32 GB RAM), while the server is a DigitalOcean droplet (8 CPUs / 16 GB RAM).
Could you please give me some idea of how I can speed up q in production?

Hi, the speed mostly depends on the machine's memory and the size of the data at hand.

From the specs you've sent, the difference might be related to the memory size (32 GB vs 16 GB). However, q has one feature that might help with that - caching.

When caching is activated (-C readwrite), any file being accessed is processed in the regular manner, but a companion file with a .qsql suffix is written alongside it. This file allows subsequent executions to be much faster and to use much less memory.

The qsql file can be used in two ways (see the short example after the list):

  • Running again with -C read (or -C readwrite) - q will autodetect the existing qsql file and use it. Note that running without -C means the qsql file will be ignored.
  • Running the query directly on the .qsql file (e.g. q 'select ... from myfile.csv.qsql'). In that form, there is no need for the original csv/tsv file.
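
A minimal sketch of the full flow, assuming a hypothetical file mydata.csv (the count(*) query is just a placeholder):

  # First run: mydata.csv is processed normally and mydata.csv.qsql is written
  q -C readwrite "select count(*) from mydata.csv"

  # Later runs: q autodetects mydata.csv.qsql and uses it instead of re-parsing the csv
  q -C read "select count(*) from mydata.csv"

  # Or query the cache file directly; the original csv is not needed for this
  q "select count(*) from mydata.csv.qsql"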

I hope this helps you speed things up. It would be great if you could update here with the results.

Harel

Thanks for your reply.
Caching is enabled both on my local machine and in production. The query runs on the same file, but it is twice as slow in production, which is why I wanted to know if there are any other configurations that should be checked.

This is my query:

q -H -d ";" -e UTF-8 -Q UTF-8 -C readwrite "select *, iif(cs.adset_status = 'ARCHIVED' or cs.campaign_status = 'ARCHIVED', 'ARCHIVED', iif(cs.adset_status = 'DELETED' or cs.campaign_status = 'DELETED', 'DELETED', iif(cs.adset_status = 'PAUSED' or cs.campaign_status = 'PAUSED' , 'PAUSED', 'ACTIVE'))) as status, domain || '_' || lang || '_' || slug as url, sum(clicks) as r_clicks, ROUND(sum(spend), 1) as r_spend, ROUND(sum(ay_revenue), 1) as r_ay_revenue, ROUND(sum(ay_revenue) - sum(spend), 1) as r_profit, ROUND(avg(cpc), 3) as r_cpc, ROUND(avg(roas), 1) as r_roas, ROUND(avg(cpr), 2) as r_cpr, sum(impressions) as r_impressions, sum(ay_impressions) as r_ay_impressions, sum(ay_sessions) as r_ay_sessions, ROUND(COALESCE(sum(ay_impressions) / sum(ay_sessions), 0), 1) as r_ads_per_session, ROUND((COALESCE((sum(clicks) * 1.0) / sum(impressions), 0) * 100), 1) as r_ctr, ROUND((COALESCE((sum(ay_revenue) * 1.0) / sum(spend), 0) * 1000), 1) as r_ay_roas, ROUND((COALESCE((sum(ay_revenue) * 1.0) / sum(clicks), 0)), 3) as r_rpc, ROUND((COALESCE((sum(ay_revenue)-sum(spend) * 1.0)/sum(ay_revenue), 0) * 100), 1) as r_profit_margin from /var/www/fb-tool/public/storage/reports/campaigns/2023_09.csv as cr left join /var/www/fb-tool/public/storage/reports/adset-statuses/adset-statuses.csv as cs on cr.campaign_id = cs.c_id where date >= '2023-09-01' and date <= '2023-09-30' group by campaign_id order by r_profit desc limit 30 offset 0" -E UTF-8 -O