- Install Docker Desktop.
- Create a `.env` file in the repo root by copying `.env.template`.
- Fill in the desired `POSTGRES_PASSWORD` value in the `.env` file.
- Build and start the containers: `docker compose up -d --build`
- Check out the `jupyterlab` container logs and click on the link that looks like `http://127.0.0.1:8089/lab?token=...`
Connect to the Trino CLI and explore the `db` catalog:

```bash
docker exec -it trino trino
```

```sql
SHOW SCHEMAS FROM db;
USE db.public;
SHOW TABLES FROM public;
```
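You can also query Trino programmatically, for example from the JupyterLab container. Here is a minimal sketch using the `trino` Python client (`pip install trino`); the host, port, and user below are assumptions, so match them to the ports published in the compose file:

```python
from trino.dbapi import connect

# Assumed coordinates: Trino's HTTP port published on localhost:8080,
# any username accepted (no auth in a local sandbox).
conn = connect(host="localhost", port=8080, user="admin", catalog="db", schema="public")

cur = conn.cursor()
cur.execute("SHOW TABLES")
for row in cur.fetchall():
    print(row)
```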
Run the bundled SparkPi example against the cluster:

```bash
docker exec -it spark-master /bin/bash
cd /opt/spark/bin
./spark-submit --master spark://0.0.0.0:7077 \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar 100
```
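The same job can be expressed in PySpark, e.g. from a JupyterLab notebook. This is a rough sketch, assuming `pyspark` is installed there and the master is reachable as `spark-master` on the compose network (adjust the URL if your service is named differently):

```python
from operator import add
from random import random

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")  # assumed service name on the compose network
    .appName("spark-pi-py")
    .getOrCreate()
)

n = 100_000  # total number of Monte Carlo samples

def inside(_):
    # Sample a point in the [-1, 1] square; count it if it falls in the unit circle.
    x, y = random() * 2 - 1, random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

count = spark.sparkContext.parallelize(range(n), 100).map(inside).reduce(add)
print(f"Pi is roughly {4.0 * count / n}")
spark.stop()
```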
Connect to the Hive server with Beeline and run a few statements:

```bash
docker exec -it spark-master /bin/bash
./bin/beeline
```

Inside Beeline:

```sql
!connect jdbc:hive2://localhost:10000 scott tiger
show databases;
create table hive_example(a string, b int) partitioned by (c int);
alter table hive_example add partition(c=1);
insert into hive_example partition(c=1) values ('a', 1), ('a', 2), ('b', 3);
select count(distinct a) from hive_example;
select sum(b) from hive_example;
```
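The table can then be read back through Spark SQL. A minimal sketch, assuming the Spark session is wired to the same Hive metastore (via `enableHiveSupport()`); if it isn't, the table won't be visible:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")  # assumed service name, as above
    .appName("hive-example")
    .enableHiveSupport()  # attach to the Hive metastore so hive_example resolves
    .getOrCreate()
)

# Aggregate the rows inserted through Beeline, grouped by partition.
spark.sql("SELECT c, SUM(b) AS total FROM hive_example GROUP BY c").show()
spark.stop()
```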
Open a CQL shell on one of the ScyllaDB nodes and create some sample data:

```bash
docker exec -it scylla-1 cqlsh
```

```sql
CREATE KEYSPACE data
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE data;

CREATE TABLE data.users (
    user_id uuid PRIMARY KEY,
    first_name text,
    last_name text,
    age int
);

INSERT INTO data.users (user_id, first_name, last_name, age)
VALUES (123e4567-e89b-12d3-a456-426655440000, 'Polly', 'Partition', 77);
```
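The same keyspace can be queried from Python with the DataStax driver (`pip install cassandra-driver`), which also speaks to Scylla. A minimal sketch, assuming the CQL port 9042 is published on localhost; adjust the contact points to your compose setup:

```python
from cassandra.cluster import Cluster

# Assumed contact point: one Scylla node with port 9042 mapped to the host.
cluster = Cluster(["127.0.0.1"], port=9042)
session = cluster.connect("data")  # use the keyspace created above

for row in session.execute("SELECT user_id, first_name, last_name, age FROM users"):
    print(row.user_id, row.first_name, row.last_name, row.age)

cluster.shutdown()
```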
Create a test Kafka topic:

```bash
docker exec -it kafka kafka-topics.sh --create --topic test --bootstrap-server 127.0.0.1:9092
```
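To exercise the topic end to end, here is a minimal produce/consume round trip with `kafka-python` (`pip install kafka-python`). It assumes the broker is reachable on `localhost:9092` from the host; if the listener is only exposed inside the compose network, run this from a container attached to it:

```python
from kafka import KafkaConsumer, KafkaProducer

# Produce one message to the `test` topic created above.
producer = KafkaProducer(bootstrap_servers="127.0.0.1:9092")
producer.send("test", b"hello from the sandbox")
producer.flush()

# Read it back from the beginning of the topic.
consumer = KafkaConsumer(
    "test",
    bootstrap_servers="127.0.0.1:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating after 5s with no new messages
)
for message in consumer:
    print(message.value)
```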
Check out the `.env.template` file: copy the Airflow-related variables into your `.env` file and update their values where necessary. To enable the Slack integration, create a Slack app and set the `AIRFLOW_CONN_SLACK_API_DEFAULT` environment variable with your Slack API key. If you don't want to use this integration, remove the `AIRFLOW_CONN_SLACK_API_DEFAULT` variable from your `.env` file.
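For reference, a minimal DAG sketch that posts to Slack through that connection, assuming Airflow 2.4+ with the `apache-airflow-providers-slack` package installed. The connection id `slack_api_default` is the one populated by `AIRFLOW_CONN_SLACK_API_DEFAULT`; the DAG id and channel name are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.slack.operators.slack import SlackAPIPostOperator

with DAG(
    dag_id="slack_notify_example",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule=None,   # trigger manually
    catchup=False,
):
    SlackAPIPostOperator(
        task_id="notify",
        slack_conn_id="slack_api_default",  # backed by AIRFLOW_CONN_SLACK_API_DEFAULT
        channel="#random",                  # placeholder channel
        text="Hello from the sandbox Airflow",
    )
```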