Explore different data visualization solutions for business intelligence (BI):
- Redash: acquired by Databricks in 2020
- Apache Superset
- Metabase
In this repo, we use Kubernetes to deploy the components where possible; Superset and Metabase currently run via docker-compose.
- Rancher Desktop: 1.7.0
- Kubernetes: v1.25.4
- kubectl: v1.26.0
- Helm: v3.10.2
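To compare a local toolchain against the versions listed above, a small helper can be handy. The sketch below is not part of this repo: `version_at_least` is a hypothetical name, and it assumes a `sort` that supports `-V` (version sort), as in GNU coreutils.

```shell
# Hypothetical helper (not part of this repo): succeeds when version $1 >= version $2.
# A leading "v" is stripped so that "v3.10.2" and "3.10.2" compare as equal.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  have=${1#v}
  want=${2#v}
  # The smaller version sorts first; if that is $want, then $have >= $want.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}
```

For example, `version_at_least "$(helm version --template '{{.Version}}')" v3.10.2 || echo "Helm is older than the tested version"` would flag an older Helm.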
tl;dr: ./scripts/up.sh
kubectl create namespace bi --dry-run=client -o yaml | kubectl apply -f -
Follow the Bitnami PostgreSQL chart to install PostgreSQL:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm upgrade --install bi-postgresql bitnami/postgresql -n bi -f postgresql/values.yaml
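Helm returns before the database pod is ready, so it may help to wait for the rollout before verifying. This is a minimal sketch, assuming the chart names its StatefulSet after the release (`bi-postgresql`, the same name as the service used in this README); `wait_for_postgresql` is a hypothetical helper name.

```shell
# Sketch: block until the PostgreSQL pod created by the Helm release is ready.
# Assumption: the StatefulSet is called bi-postgresql, like the service.
# $1 is an optional timeout (defaults to 180s).
wait_for_postgresql() {
  kubectl rollout status statefulset/bi-postgresql -n bi --timeout="${1:-180s}"
}
```

Call `wait_for_postgresql` (or `wait_for_postgresql 60s`) right after the `helm upgrade --install` command.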
Verify the installation:
kubectl run bi-postgresql-client --rm --tty -i --restart='Never' --namespace bi --image docker.io/bitnami/postgresql:15.0.0-debian-11-r3 --env="PGPASSWORD=bi_password" -- psql --host bi-postgresql -U bi_user -d bi -p 5432 -c "\l"
                                                 List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider |   Access privileges
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
 bi        | bi_user  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =Tc/bi_user          +
           |          |          |             |             |            |                 | bi_user=CTc/bi_user
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +
           |          |          |             |             |            |                 | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |            | libc            | =c/postgres          +
           |          |          |             |             |            |                 | postgres=CTc/postgres
(4 rows)
Load the SQLite data into PostgreSQL with pgloader:
kubectl run bi-pgloader -n bi -ti --rm --restart=Never --image=ghcr.io/dimitri/pgloader --overrides='
{
  "spec": {
    "containers": [{
      "name": "main",
      "image": "ghcr.io/dimitri/pgloader",
      "imagePullPolicy": "IfNotPresent",
      "command": ["pgloader", "/var/lib/pgloader/sqlite/stats_can.db", "postgresql://bi_user:bi_password@bi-postgresql/bi"],
      "stdin": true,
      "tty": true,
      "volumeMounts": [{"mountPath": "/var/lib/pgloader/sqlite", "name": "store"}]
    }],
    "volumes": [{"name": "store", "hostPath": {"path": "'"$PWD/sqlite"'", "type": "Directory"}}]
  }
}'
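After the load, it can be useful to confirm that the tables arrived. The sketch below wraps the throwaway client pod from the verification step in a function; `bi_psql` is a hypothetical helper name, and the image, credentials, and host are the ones already used in this README.

```shell
# Sketch: run one SQL statement or psql meta-command against the bi database,
# reusing the temporary client pod pattern from the verification step.
# bi_psql is a hypothetical helper name, not something shipped in this repo.
bi_psql() {
  kubectl run bi-postgresql-client --rm --tty -i --restart='Never' --namespace bi \
    --image docker.io/bitnami/postgresql:15.0.0-debian-11-r3 \
    --env="PGPASSWORD=bi_password" \
    -- psql --host bi-postgresql -U bi_user -d bi -p 5432 -c "$1"
}
```

For example, `bi_psql '\dt'` lists the tables, and `bi_psql 'SELECT count(*) FROM new_housing_price_index'` counts the rows of the table queried later in this README.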
Follow the community-maintained Helm chart to deploy Redash:
helm repo add redash https://getredash.github.io/contrib-helm-chart/
helm upgrade --install bi-redash redash/redash -f redash/values.yaml -n bi
Use docker-compose to deploy Superset, because of a technical issue with the official Superset Helm chart:
git clone --depth=1 https://github.com/apache/superset.git
cd superset
nerdctl compose -f docker-compose-non-dev.yml up -d
FIXME: deploy Superset with the official Helm chart instead. Current blocker:
- the connection is lost after port-forwarding:
E1225 18:33:45.746629 82460 portforward.go:406] an error occurred forwarding 8088 -> 8088: error forwarding port 8088 to pod 78d8fb8629c8ab4dc2baa54d14207d413094f45c96de2378811cf54862124671, uid : failed to execute portforward in network namespace "/var/run/netns/cni-3e90f492-496d-9f08-9e0f-6b0a5d1e1008": readfrom tcp4 127.0.0.1:59662->127.0.0.1:8088: write tcp4 127.0.0.1:59662->127.0.0.1:8088: write: broken pipe
Use docker-compose to deploy Metabase, because an official Helm chart is not available yet:
nerdctl compose -f metabase/docker-compose.yml up -d
TODO: there is no official arm64 image of Metabase
kubectl port-forward svc/bi-redash -n bi 8080:80
Create the admin user and organization:
- create them from the web UI
- FIXME: create the user using manage.py (currently fails with the error shown after the command)
./manage.py users create_root admin@example.com admin --password=admin_password --org default
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: No such file or directory
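psycopg2 typically reports "No such file or directory" when it falls back to a local Unix socket, which suggests that no database host was configured in the environment where manage.py ran. One way to narrow this down is to check whether `REDASH_DATABASE_URL` is visible in the pod's plain environment. This is a diagnostic sketch, not a fix; it assumes the Deployment shares the service name `bi-redash`, and `redash_has_database_url` is a hypothetical helper name.

```shell
# Sketch: succeed if REDASH_DATABASE_URL is set in the Redash server pod's environment.
# Assumption: the Deployment is called bi-redash, like the service.
redash_has_database_url() {
  kubectl exec -n bi deploy/bi-redash -- env | grep -q '^REDASH_DATABASE_URL='
}
```

If `redash_has_database_url || echo "REDASH_DATABASE_URL is not set"` prints the message, manage.py would need to be started with the same environment as the server process.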
SELECT
  REF_DATE,
  GEO AS "geo::multi-filter",
  INDEX_TYPE AS "index_type::filter",
  VALUE
FROM new_housing_price_index
WHERE TRUE
  AND REF_DATE BETWEEN '{{ REF_DATE.start }}' AND '{{ REF_DATE.end }}'
Visit http://localhost:8088 for the Superset web UI.
Log in with the default credentials:
user: admin
password: admin
To connect to the PostgreSQL instance on Kubernetes, forward its port to the host:
kubectl port-forward svc/bi-postgresql -n bi 5432:5432
From inside the Superset container, use 192.168.5.2 as the database host: it is the equivalent of host.docker.internal, based on this discussion.
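Superset takes a SQLAlchemy URI when adding a database. The sketch below assembles one from the credentials and database name used earlier in this README; `bi_sqlalchemy_uri` is a hypothetical helper name, and the host defaults to the 192.168.5.2 address mentioned above.

```shell
# Sketch: print a SQLAlchemy URI for the bi database.
# $1 is an optional host (default 192.168.5.2), $2 an optional port (default 5432).
# User, password, and database name are the ones used elsewhere in this README.
bi_sqlalchemy_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    bi_user bi_password "${1:-192.168.5.2}" "${2:-5432}" bi
}
```

Running `bi_sqlalchemy_uri` prints `postgresql://bi_user:bi_password@192.168.5.2:5432/bi`, which can be pasted into Superset while the port-forward is running.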
Visit http://localhost:3000 for the Metabase web UI.
Create the admin user on first launch.
To connect to the PostgreSQL instance on Kubernetes, forward its port to the host:
kubectl port-forward svc/bi-postgresql -n bi 5432:5432
From inside the Metabase container, use 192.168.5.2 as the database host: it is the equivalent of host.docker.internal, based on this discussion.
SELECT
  REF_DATE,
  GEO,
  INDEX_TYPE,
  MAX(VALUE) AS VALUE -- dummy aggregate func
FROM new_housing_price_index
WHERE TRUE
  AND REF_DATE >= {{ref_date_start}} -- variable type date
  AND REF_DATE <= {{ref_date_end}} -- variable type date
  AND {{index_type}} -- variable type field filter
  AND {{geos}} -- variable type field filter
GROUP BY 1, 2, 3
Caveats:
- hard-coded row limit: 2,000 rows for unaggregated queries, 10,000 for aggregated ones
- that's why the SQL above uses a dummy aggregate function
- at the time of writing this is not configurable; you have to build your own binary to change it
tl;dr: ./scripts/down.sh
helm uninstall bi-redash -n bi
helm uninstall bi-postgresql -n bi
kubectl delete pvc --all -n bi
kubectl delete namespace bi
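The commands above remove only the Kubernetes resources. Superset and Metabase were started with docker-compose, so their stacks need to be stopped separately. The sketch below mirrors the `up -d` commands from earlier in this README; `compose_down_all` is a hypothetical helper name, and it assumes you run it from the repo root with Superset cloned into ./superset.

```shell
# Sketch: stop the docker-compose stacks started earlier in this README.
# Assumptions: current directory is the repo root, and Superset was cloned into ./superset.
compose_down_all() {
  nerdctl compose -f metabase/docker-compose.yml down
  # Run in a subshell so the caller's working directory is left unchanged.
  (cd superset && nerdctl compose -f docker-compose-non-dev.yml down)
}
```

Call `compose_down_all` after the Helm and kubectl cleanup to stop the remaining containers.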