There are 0 repositories under the dask-distributed topic.
A full pipeline AutoML tool for tabular data
Unified Distributed Execution
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Parallel LAMMPS Python interface - control an mpi4py parallel LAMMPS instance from a serial Python process or a Jupyter notebook
Perform I/O intensive workloads on high-volume data sparsely located across multiple AWS regions through the use of Dask.
Test LightGBM's Dask integration on different cluster types
Code for "Training models when data doesn't fit in memory" post
Scalable Cytometry Image Processing (SCIP) is an open-source tool that implements an image processing pipeline on top of Dask, a distributed computing framework written in Python. SCIP performs projection, illumination correction, image segmentation and masking, and feature extraction.
Open Data Profiling, Quality and Analysis on NYC OpenData dataset with semantic profiling using fuzzy ratio, Levenshtein distance and regex
Python library to query and transform genomic data from indexed files
Procurement: Dask Cluster as a Process.
HPC cluster deployment and management for the Hetzner Cloud
Magic commands to support running MPI Python code as well as multi-node Dask workloads in Jupyter notebooks.
Dask tutorial; Chinese-localized Dask tutorial
Fraud detection ML pipeline and serving POC using Dask and hopeit.engine. Project created with nbdev: https://www.fast.ai/2019/12/02/nbdev/
Efficiently read climate/meteorology data into Xarray using Dask for parallelization. Transform the data for your modelling needs.
Scale up concurrent requests to Earth Engine interactive endpoints with Dask
Python 3 tools for distributed analysis and visualisation of big climate data on HPC systems.
Preserve all necessary runtime data of a Dask client in order to "replay" and analyze the performance and behavior of the client after the fact
A custom dask remote jobqueue for HTCondor.
Testing access performance of Sentinel-1 RTC metadata catalogs
Code for fetching, sampling, and analysis of NYC taxi data from TLC and Uber for 2009-2018
Script for configuring and installing the requirements of a Dask Distributed worker
Testing PyCaret, Fugue, and Dask
Distributed solution for Traveling Salesman Problem using Dask.distributed and OR-Tools
dask-ecs-lib is a Python library that effortlessly spins up a Dask cluster on AWS ECS using Fargate, allowing you to seamlessly execute and parallelize your functions.
User documentation website for the Sulis tier 2 HPC service. Built using Jekyll.
In this repo, I build a LogisticRegression prediction model with Dask and PySpark and initialize an AWS EMR cluster to run the entire pipeline.
A demo of the Dask library for big data processing in Python
Asynchronous API using Dask and AWS Fargate