Luca Canali's repositories
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Miscellaneous
Includes notes on using Apache Spark, Spark for Physics, a tool for running TPCDS on PySpark, a tool for performance testing CPUs, Jupyter notebook examples for Spark, Oracle and other DB systems.
Linux_tracing_scripts
Scripts and tools for troubleshooting and performance analysis in Linux. This includes dynamic tracing scripts with SystemTap both for system calls and for userspace function tracing.
Oracle_DBA_scripts
A collection of old-school CLI scripts for Oracle RDBMS monitoring and performance troubleshooting.
PyLatencyMap
PyLatencyMap is a tool for heat map visualization on the CLI. It is integrated with scripts to collect and visualize I/O latency heat maps from various sources, including SystemTap, DTrace, Oracle wait events, NetApp filers, and trace files.
PerfSheet4
PerfSheet4 is a tool for performance troubleshooting of Oracle databases. Query and visualize Oracle AWR data using pivot charts.
Stack_Profiling
Tools and scripts for stack profiling: Userspace, Kernel, OS state and optionally Oracle wait
PerfSheet.js
PerfSheet.js is a tool for Oracle RDBMS performance troubleshooting. Use it to extract and visualize Oracle AWR time series data in the browser using JavaScript and dynamic pivot charts.
ipython-sql
%%sql magic for IPython, hopefully evolving into a full SQL client
OraLatencyMap
OraLatencyMap is a performance widget running on SQL*plus (Oracle's CLI) to collect and visualize latency histograms for Oracle wait events using heat maps.
dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
hbase-connectors
Apache HBase Connectors
jupyter-extensions
Jupyter extensions for SWAN
jupyterhub-extensions
Customized components of the JupyterHub server in SWAN (handlers, spawners, templates).
oci-hdfs-connector
HDFS Connector for Oracle Cloud Infrastructure
SLOB_2.5.4
Official SLOB distribution for version 2.5.4.0
SLOB_distribution
A Git repository used only for distributing the official SLOB release.
spark-root
Apache Spark Data Source for ROOT File Format
SparkDLTrigger
Notebooks with code and sample data for the blog article: "Machine Learning Pipelines for High Energy Physics Using Apache Spark with BigDL and Analytics Zoo"
sparkmonitor
Monitor Apache Spark from Jupyter Notebook
tf-spawner
Spawn workers for TensorFlow MultiWorkerMirroredStrategy
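
For context on the tf-spawner entry: TensorFlow's MultiWorkerMirroredStrategy expects each worker process to find a TF_CONFIG environment variable describing the cluster and that worker's own role in it. The sketch below only illustrates the shape of that variable. It is not taken from tf-spawner, and the helper name build_tf_config and the host names are hypothetical.

```python
import json


def build_tf_config(worker_hosts, task_index):
    """Return the TF_CONFIG JSON string for one worker (hypothetical helper)."""
    if not 0 <= task_index < len(worker_hosts):
        raise ValueError("task_index must refer to an entry in worker_hosts")
    return json.dumps({
        # Every worker receives the same cluster description...
        "cluster": {"worker": list(worker_hosts)},
        # ...and a task entry that identifies its own position in the list.
        "task": {"type": "worker", "index": task_index},
    })


# Hypothetical addresses; a real spawner would obtain these from the scheduler.
workers = ["worker-0.example:12345", "worker-1.example:12345"]
configs = [build_tf_config(workers, i) for i in range(len(workers))]

for cfg in configs:
    print(cfg)
```

Each worker process would export its own string as TF_CONFIG before constructing the strategy, so that all workers agree on the cluster layout while differing only in the task index.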