dharmeshkakadia / github-bookmarks-stars

All the repos I have starred on GitHub

Awesome Stars

A curated list of my GitHub stars! Generated by starred.



  • bixo/bixo - Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading pipe assembly, you can quickly create specialized web mining app




  • microsoft/Forge - A Generic Low-Code Framework Built on a Config-Driven Tree Walker
  • microsoft/synckusto - Synchronize database schemas between a Kusto cluster and the local file system
  • Unity-Technologies/ml-agents - The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement lea
  • microsoft/Trill - Trill is a single-node query processor for temporal or streaming data.
  • TheLastSliceGame/TheLastSliceGame - UPDATE: This challenge has ended. INSTRUCTIONS: Beat challenge 1 (download the game, change the code) here. Then beat challenge 2. First five to beat challenge 3 win $10,000 USD each. No joke.
  • BrianPeek/OctoBot - Manage GitHub from a chat window with Bot Framework and LUIS.ai
  • neo-project/neo - NEO Smart Economy
  • lambci/docker-lambda - Docker images and test runners that replicate the live AWS Lambda environment
  • scriptcs/scriptcs - Write C# apps with a text editor, nuget and the power of Roslyn!
  • OptiKey/OptiKey - OptiKey - Full computer control and speech with your eyes
  • PowerShell/PowerShell - PowerShell for every system!
  • microsoft/dotnet-computevirtualization - Sample class library for interfacing with Windows host compute service.
  • p-org/PSharp - A framework for rapid development of reliable asynchronous software.
  • microsoft/Mobius - C# and F# language binding and extensions to Apache Spark
  • dotnet/orleans - Cloud Native application framework for .NET
  • microsoft/referencesource - Source from the Microsoft .NET Reference Source that represent a subset of the .NET Framework


  • bulletphysics/bullet3 - Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc.
  • taichi-dev/taichi - Productive, portable, and performant GPU programming in Python.
  • google/sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.
  • NVIDIA/MatX - An efficient C++17 GPU numerical computing library with Python-like syntax
  • NVIDIA/DALI - A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
  • openxla/xla - A machine learning compiler for GPUs, CPUs, and ML accelerators
  • bark-simulator/bark - Open-Source Framework for Development, Simulation and Benchmarking of Behavior Planning Algorithms for Autonomous Driving
  • ApolloAuto/apollo - An open autonomous driving platform
  • jiazhihao/TASO - The Tensor Algebra SuperOptimizer for Deep Learning
  • oneapi-src/oneDNN - oneAPI Deep Neural Network Library (oneDNN)
  • pytorch/glow - Compiler for Neural Network hardware accelerators
  • llvm/torch-mlir - The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
  • facebookresearch/moolib - A library for distributed ML training with PyTorch
  • typesense/typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
  • uNetworking/uWebSockets - Simple, secure & standards compliant web server for the most demanding of applications
  • mc2-project/mc2 - A Platform for Secure Analytics and Machine Learning
  • flexflow/FlexFlow - FlexFlow Serve: Low-Latency, High-Performance LLM Serving
  • Tencent/TurboTransformers - a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.
  • facebookresearch/faiss - A library for efficient similarity search and clustering of dense vectors.
  • KDE/ghostwriter - Text editor for Markdown
  • pqrs-org/Karabiner-archived - Karabiner (KeyRemap4MacBook) is a powerful utility for keyboard customization.
  • microsoft/onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
  • microsoft/ALEX - A library for building an in-memory, Adaptive Learned indEX
  • BYU-PCCL/holodeck-engine - High Fidelity Simulator for Reinforcement Learning and Robotics Research.
  • iree-org/iree - A retargetable MLIR-based machine learning compiler and runtime toolkit.
  • facebookresearch/rela - Reinforcement Learning Assembly
  • BlazingDB/blazingsql - BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
  • StanfordSNR/gg - The Stanford Builder
  • google-ai-edge/mediapipe - Cross-platform, customizable ML solutions for live and streaming media.
  • trustwallet/wallet-core - Cross-platform, cross-blockchain wallet library.
  • interpretml/interpret - Fit interpretable models. Explain blackbox machine learning.
  • carla-simulator/carla - Open-source simulator for autonomous driving research.
  • microsoft/SEAL - Microsoft SEAL is an easy-to-use and powerful homomorphic encryption library.
  • pdlfs/deltafs - Transient file system service featuring highly paralleled indexing on both file data and file system metadata
  • mawww/kakoune - mawww's experiment for a better code editor
  • rapidsai/cudf - cuDF - GPU DataFrame Library
  • rapidsai/cuml - cuML - RAPIDS Machine Learning Library
  • hydro-project/fluent - A data-driven compute platform
  • apple/turicreate - Turi Create simplifies the development of custom machine learning models.
  • src-d/minhashcuda - Weighted MinHash implementation on CUDA (multi-gpu).
  • CMU-Perceptual-Computing-Lab/openpose - OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
  • google/asylo - An open and flexible framework for developing enclave applications
  • microsoft/service-fabric - Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
  • googlecreativelab/open-nsynth-super - Open NSynth Super is an experimental physical interface for the NSynth algorithm
  • tensorflow/minigo - An open-source implementation of the AlphaGoZero algorithm
  • mrsmkl/truebit-plasma - Template for implementing Plasma child chains with Truebit
  • mozilla/DeepSpeech - DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
  • dmlc/xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
  • microsoft/ELL - Embedded Learning Library
  • heavyai/heavydb - HeavyDB (formerly OmniSciDB)
  • apache/incubator-retired-quickstep - Apache Quickstep Incubator - This project is retired
  • anse1/sqlsmith - A random SQL query generator
  • CD3/gsc - Run guided scripts for command line demos.
  • cmu-db/peloton - The Self-Driving Database Management System
  • PolySync/oscc - Open Source Car Control 💻🚗🙌
  • facebookarchive/beringei - Beringei is a high performance, in-memory storage engine for time series data.
  • rescrv/Consus - Consus is a geo-replicated transactional key-value store.
  • facebookarchive/bistro - Bistro is a flexible distributed scheduler, a high-performance framework supporting multiple paradigms while retaining ease of configuration, management, and monitoring.
  • eventql/eventql - Distributed "massively parallel" SQL query engine
  • scylladb/scylladb - NoSQL data store using the seastar framework, compatible with Apache Cassandra
  • mldbai/mldb - MLDB is the Machine Learning Database
  • amazon-archives/amazon-dsstne - Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models
  • microsoft/rDSN - Robust Distributed System Nucleus (rDSN) is an open framework for quickly building and managing high performance and robust distributed systems.
  • includeos/IncludeOS - A minimal, resource efficient unikernel for cloud services
  • microsoft/CNTK - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
  • breach/thrust - Chromium-based cross-platform / cross-language application framework
  • runtimejs/runtime - [not maintained] Lightweight JavaScript library operating system for the cloud
  • Samsung/veles - Distributed machine learning platform
  • tensorflow/tensorflow - An Open Source Machine Learning Framework for Everyone
  • crosswalk-project/crosswalk - A web runtime built on Chrome. This project is currently unmaintained.
  • CharithYMendis/Helium - Helium: Lifting High-Performance Stencil Kernels from Stripped x86 Binaries to Halide DSL Code
  • scylladb/seastar - High performance server-side application framework
  • apache/mxnet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
  • dmlc/wormhole - Deprecated
  • mesosphere/serenity - Intel:Mesosphere oversubscription technologies for Apache Mesos
  • cocaine/cocaine-core - An open platform to build your own PaaS clouds.
  • facebook/folly - An open-source C++ library developed and used at Facebook.
  • capnproto/ekam - Ekam Build System
  • TU-Berlin-DIMA/myriad-toolkit - Myriad Parallel Data Generator Toolkit
  • elodina/alligator - Custom allocator modules for Apache Mesos
  • owasp-modsecurity/ModSecurity - ModSecurity is an open source, cross platform web application firewall (WAF) engine for Apache, IIS and Nginx. It has a robust event-based programming language which provides protection from a range o
  • camsas/Musketeer - The Musketeer workflow manager.
  • tstack/lnav - Log file navigator
  • tomahawk-player/tomahawk - Tomahawk, the multi-source music player
  • cmderdev/cmder - Lovely console emulator package for Windows
  • mesos/modules - Mesos modules examples and open source modules outside of the Apache Mesos source tree.
  • dmlc/minerva - Minerva: a fast and flexible tool for deep learning on multi-GPU. It provides ndarray programming interface, just like Numpy. Python bindings and C++ bindings are both available. The resulting code ca
  • cdapio/tigon - High Throughput Real-time Stream Processing Framework
  • yahoo/mdbm - MDBM a very fast memory-mapped key/value store.
  • capnproto/capnproto - Cap'n Proto serialization/RPC system - core tools and C++ library
  • papyros/papyros-shell - 🐚 The desktop shell for Papyros, built using QtQuick and QtCompositor as a compositor for Wayland.
  • asmuth/clip - Create charts from the command line
  • rr-debugger/rr - Record and Replay Framework
  • osquery/osquery - SQL powered operating system instrumentation, monitoring, and analytics.
  • dedis/Dissent - Provably Anonymous Overlay
  • icecc/icecream - Distributed compiler with a central scheduler to share build load
  • facebook/mcrouter - Mcrouter is a memcached protocol router for scaling memcached deployments.
  • facebookarchive/scribe - Scribe is a server for aggregating log data streamed in real time from a large number of servers.
  • zerotier/ZeroTierOne - A Smart Ethernet Switch for Earth
  • maidsafe-archive/MaidSafe - This is the super-project in which each MaidSafe library resides. Some information is common to all libraries, and is detailed here. Library-specific information can be found in each library's wiki.
  • Yelp/MOE - A global, black box optimization engine for real world metric optimization.
  • uwsampa/grappa - Grappa: scaling irregular applications on commodity clusters
  • logcabin/logcabin - LogCabin is a distributed storage system built on Raft that provides a small amount of highly replicated, consistent storage. It is a reliable place for other distributed systems to store their core m
  • draios/sysdig - Linux system exploration and troubleshooting tool with first class support for containers
  • primecoin/primecoin - Primecoin - Cryptocurrency with Useful PoW Consensus
  • Studio3T/robomongo - Native cross-platform MongoDB management tool
  • rethinkdb/rethinkdb - The open-source database for the realtime web.
  • quantcast/qfs - Quantcast File System
  • ydmao/Metis - MapReduce for multi-core
  • facebook/rocksdb - A library that provides an embeddable, persistent key-value store for fast storage.
  • sit/dht - MIT Chord/DHash
  • rescrv/HyperDex - HyperDex is a scalable, searchable key-value store
  • google/lmctfy - lmctfy is the open source version of Google's container stack, which provides Linux application containers.
  • camsas/firmament - The Firmament cluster scheduling platform




Common Lisp


  • mit-pdos/fscq - FSCQ is a certified file system written and proven in Coq
  • uwplse/verdi - A framework for formally verifying distributed systems implementations in Coq





  • UrbanOS-Public/kdp - Kubernetes deployment of PrestoDB, Hive Metastore, and Minio S3-standard object store
  • jpetazzo/critmux - Docker + CRIU + tmux = magic!
  • portworx/px-dev - PX-Developer is scale-out storage for containers. Run Cassandra, Jenkins, or any application in Docker, with enterprise storage functionality on commodity servers



Emacs Lisp


Git Attributes



  • milliondreams/Dancing-Elephant - Write your MR Jobs in Groovy from a web console and run them on a hadoop cluster
  • rundeck/rundeck - Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts


  • awslabs/data-on-eks - DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
  • coreos/tectonic-installer - Install a Kubernetes cluster the CoreOS Tectonic Way: HA, self-hosted, RBAC, etcd Operator, and more







  • FluxML/Flux.jl - Relax! Flux is the ML library that doesn't make you tensor

Jupyter Notebook


  • MLReef/mlreef - The collaboration workspace for Machine Learning
  • KronicDeth/intellij-elixir - Elixir plugin for JetBrains' IntelliJ Platform (including RubyMine)
  • orbit/orbit - Orbit - Virtual actor framework for building distributed systems
  • square/okhttp - Square's meticulous HTTP client for the JVM, Android, and GraalVM.







  • bmatzelle/gow - Unix command line utilities installer for Windows.



  • semgrep/semgrep - Lightweight static analysis for many languages. Find bug variants with patterns that look like source code.
  • o1-labs/snarkette - Pure OCaml implementation of the Groth-Maller SNARK verifier (and associated crypto)
  • mirleft/btc-pinata - If you smash it, you get to keep the pieces.
  • reasonml/reason - Simple, fast & type safe code that leverages the JavaScript & OCaml ecosystems
  • batsh-dev-team/Batsh - A language that compiles to Bash and Windows Batch
  • facebook/infer - A static analyzer for Java, C, C++, and Objective-C
  • mirage/irmin - Irmin is a distributed database that follows the same design principles as Git
  • mirage/mirage - MirageOS is a library operating system that constructs unikernels



  • Qiskit/qiskit-metapackage - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.




  • zombodb/zombodb - Making Postgres and Elasticsearch work together like it's 2023





Protocol Buffer



  • stanfordnlp/pyreft - ReFT: Representation Finetuning for Language Models
  • TencentARC/InstantMesh - InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
  • OpenRouterTeam/openrouter-runner - Inference engine powering open source models on OpenRouter
  • kscalelabs/sim - Training in simulation
  • jiaaro/pydub - Manipulate audio with a simple and easy high level interface
  • thuml/depyf - depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
  • huggingface/lerobot - 🤗 LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch
  • pydantic/logfire - Uncomplicated Observability for Python and beyond! 🪵🔥
  • myshell-ai/OpenVoice - Instant voice cloning by MyShell.
  • apple/corenet - CoreNet: A library for training deep neural networks
  • TransformerLensOrg/TransformerLens - A library for mechanistic interpretability of GPT-style language models
  • google-deepmind/penzai - A JAX research toolkit for building, editing, and visualizing neural networks.
  • run-house/runhouse - Write local debuggable Python which traverses your powerful remote infra. Deploy as-is. Unobtrusive, unopinionated, PyTorch-like APIs.
  • PKU-YuanGroup/Open-Sora-Plan - This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to this project.
  • facebookresearch/schedule_free - Schedule-Free Optimization in PyTorch
  • qnguyen3/chat-with-mlx - Chat with your data natively on Apple Silicon using MLX Framework.
  • OpenInterpreter/01 - The open-source language model computer
  • apple/axlearn - An Extensible Deep Learning Library
  • pydantic/FastUI - Build better UIs faster.
  • marimo-team/marimo - A reactive notebook for Python: run reproducible experiments, execute as a script, deploy as an app, and version with git.
  • speechbrain/speechbrain - A PyTorch-based Speech Toolkit
  • NVlabs/trajdata - A unified interface to many trajectory forecasting datasets.
  • apple/ml-mgie
  • wayveai/Driving-with-LLMs - PyTorch implementation for the paper "Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving"
  • google/orbax - Orbax provides common utility libraries for JAX users.
  • facebookresearch/ImageBind - ImageBind One Embedding Space to Bind Them All
  • google-research/t5x
  • carson-katri/dream-textures - Stable Diffusion built-in to Blender
  • NeilGirdhar/tjax - Tools for JAX
  • google/paxml - Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry leading
  • google/maxtext - A simple, performant and scalable Jax LLM!
  • patrick-kidger/diffrax - Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable. https://docs.kidger.site/diffrax/
  • facebookresearch/AnimatedDrawings - Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
  • cvg/LightGlue - LightGlue: Local Feature Matching at Light Speed (ICCV 2023)
  • google-deepmind/tree - tree is a library for working with nested data structures
  • stanford-crfm/haliax - Named Tensors for Legible Deep Learning in JAX
  • stanford-crfm/levanter - Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
  • google/ml_collections - ML Collections is a library of Python Collections designed for ML use cases.
  • google/learned_optimization
  • Nuitka/Nuitka - Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, and 3.11. You feed it your Python app, it does a lot of clever things, a
  • Dao-AILab/flash-attention - Fast and memory-efficient exact attention
  • danswer-ai/danswer - Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
  • morph-labs/rift - Rift: an AI-native language server for your personal AI software engineer
  • facebookresearch/hiera - Hiera: A fast, powerful, and simple hierarchical vision transformer.
  • Transpile-AI/ivy - The Unified AI Framework
  • replit/ReplitLM - Inference code and configs for the ReplitLM model family
  • mlc-ai/mlc-llm - Universal LLM Deployment Engine with ML Compilation
  • facebookresearch/nocturne - A data-driven, fast driving simulator for multi-agent coordination under partial observability.
  • NVlabs/traffic-behavior-simulation
  • NVIDIA/framework-reproducibility - Providing reproducibility in deep learning frameworks
  • Significant-Gravitas/AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
  • microsoft/JARVIS - JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
  • lm-sys/FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
  • apple/ml-ane-transformers - Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
  • hyperonym/basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
  • microsoft/LMOps - General technology for enabling AI capabilities w/ LLMs and MLLMs
  • FMInference/FlexGen - Running large language models on a single GPU for throughput-oriented scenarios.
  • awslabs/slapo - A schedule language for large model training
  • huggingface/trl - Train transformer language models with reinforcement learning.
  • hpcaitech/ColossalAI - Making large AI models cheaper, faster and more accessible
  • AminHP/gym-anytrading - The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
  • opendilab/DI-drive - Decision Intelligence Platform for Autonomous Driving simulation.
  • ianb/infinite-ai-array - Do you worry that you'll get to the end of a good list and have nothing more, leaving you sad and starved of data? Worry no more!
  • LAION-AI/Open-Assistant - OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
  • commaai/comma10k - 10k crowdsourced images for training segnets
  • waymo-research/waymo-open-dataset - Waymo Open Dataset
  • google-research/cascades - Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference, and more.
  • brentyi/tyro - Zero-effort CLI interfaces & config objects, from types
  • tinygrad/tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️
  • langchain-ai/langchain - 🦜🔗 Build context-aware reasoning applications
  • blackjax-devs/blackjax - BlackJAX is a Bayesian Inference library designed for ease of use, speed and modularity.
  • sematic-ai/sematic - An open-source ML pipeline development platform
  • Sea-Snell/JAXSeq - Train very large language models in Jax.
  • facebookincubator/AITemplate - AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
  • grantjenks/python-sortedcontainers - Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
  • mindsdb/mindsdb - The platform for customizing AI from enterprise data
  • tensorflow/transform - Input pipeline framework
  • NVIDIA/Megatron-LM - Ongoing research training transformer models at scale
  • google-research/python-graphs - A static analysis library for computing graph representations of Python programs suitable for use with graph neural networks.
  • facebookresearch/xformers - Hackable and optimized Transformers building blocks, supporting a composable construction.
  • kernc/backtesting.py - 🔎 📈 🐍 💰 Backtest trading strategies in Python.
  • twopirllc/pandas-ta - Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
  • substrait-io/substrait - A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
  • nod-ai/SHARK - SHARK - High Performance Machine Learning Distribution
  • leondgarse/keras_cv_attention_models - Keras beit,caformer,CMT,CoAtNet,convnext,davit,dino,efficientdet,edgenext,efficientformer,efficientnet,eva,fasternet,fastervit,fastvit,flexivit,gcvit,ghostnet,gpvit,hornet,hiera,iformer,inceptionnext,
  • facebookresearch/mobile-vision - Mobile vision models and code
  • microsoft/Swin-Transformer - This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
  • visual-layer/fastdup - fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. Assisting you to increase your dataset images & labels quality and reduce your data oper
  • alpa-projects/alpa - Training and serving large-scale neural networks with auto parallelization.
  • mdbloice/Augmentor - Image augmentation library in Python for machine learning.
  • bytedance/byteps - A high performance and generic framework for distributed DNN training
  • patrick-kidger/equinox - Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/
  • facebookresearch/metaseq - Repo for external large-scale work
  • mosaicml/composer - Supercharge Your Model Training
  • ELS-RD/transformer-deploy - Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
  • cleanlab/cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
  • neuralmagic/sparseml - Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
  • neuralmagic/deepsparse - Sparsity-aware deep learning inference runtime for CPUs
  • microsoft/msrflute - Federated Learning Utilities and Tools for Experimentation
  • pytorch/captum - Model interpretability and understanding for PyTorch
  • OpenBB-finance/OpenBBTerminal - Investment Research for Everyone, Everywhere.
  • minimaxir/imgbeddings - Python package to generate image embeddings with CLIP without PyTorch/TensorFlow
  • adbar/trafilatura - Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
  • libffcv/ffcv - FFCV: Fast Forward Computer Vision (and other ML workloads!)
  • CZ-NIC/pz - Easily handle day to day CLI operation via Python instead of regular Bash programs. 🇺🇦 #supporting
  • jacopotagliabue/you-dont-need-a-bigger-boat - An end-to-end implementation of intent prediction with Metaflow and other cool tools
  • jeffshek/open - The most boring open source you've ever seen ....
  • fal-ai/dbt-fal - do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
  • alvarobartt/investpy - Financial Data Extraction from Investing.com with Python
  • borisdayma/dalle-mini - DALL·E Mini - Generate images from a text prompt
  • NVIDIA-Merlin/Transformers4Rec - Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
  • magenta/mt3 - MT3: Multi-Task Multitrack Music Transcription
  • NVIDIA-Merlin/NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
  • ethpm/ethpm-spec - Ethereum Package Manager http://ethpm.github.io/ethpm-spec/
  • daviskirk/climatecontrol - Python library for loading settings and config data from files and environment variables
  • marshmallow-code/marshmallow - A lightweight library for converting complex objects to and from simple Python datatypes.
  • fugue-project/fugue - A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
  • microsoft/qlib - Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to
  • cupy/cupy - NumPy & SciPy for GPU
  • betodealmeida/gsheets-db-api - A Python DB-API and SQLAlchemy dialect to Google Spreadsheets
  • a-rahimi/python-checkpointing2 - Checkpoint the state of Python programs using Pythonic setjmp and longjmp
  • google/python-fire - Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
  • tiangolo/sqlmodel - SQL databases in Python, designed for simplicity, compatibility, and robustness.
  • oap-project/raydp - RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
  • triton-inference-server/server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
  • odoo/odoo - Odoo. Open Source Apps To Grow Your Business.
  • obsei/obsei - Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more.
  • MrPowers/chispa - PySpark test helper methods with beautiful error messages
  • microsoft/unilm - Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
  • facebookresearch/Kats - Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics,
  • yearn/brownie-strategy-mix
  • bentoml/BentoML - The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
  • dask-contrib/dask-sql - Distributed SQL Engine in Python using Dask
  • asyml/forte - Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
  • pyodide/pyodide - Pyodide is a Python distribution for the browser and Node.js based on WebAssembly
  • ApeWorX/ape - The smart contract development tool for Pythonistas, Data Scientists, and Security Professionals
  • Textualize/rich - Rich is a Python library for rich text and beautiful formatting in the terminal.
  • gruns/icecream - 🍦 Never use print() to debug again.
  • PrefectHQ/prefect - Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
  • vaexio/vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
  • RUCAIBox/RecBole - A unified, comprehensive and efficient recommendation library
  • bram2w/baserow - The official repository is hosted on https://gitlab.com/bramw/baserow. Baserow is an open source no-code database tool and Airtable alternative.
  • logicalclocks/maggy - Distribution transparent Machine Learning experiments on Apache Spark
  • apache/incubator-liminal - Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validati
  • world-federation-of-advertisers/cardinality_estimation_evaluation_framework - Evaluation framework and methods for estimating cardinalities of groups of sets
  • huggingface/accelerate - 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
  • coqui-ai/TTS - 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
  • facebookresearch/madgrad - MADGRAD Optimization Method
  • gradio-app/gradio - Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
  • apache/tvm - Open deep learning compiler stack for cpu, gpu and specialized accelerators
  • jmfernandes/robin_stocks - This is a library to use with Robinhood Financial App. It currently supports trading crypto-currencies, options, and stocks. In addition, it can be used to get real time ticker information, assess the
  • Overv/outrun - Execute a local command using the processing power of another Linux machine.
  • xhluca/dl-translate - Library for translating between 200 languages. Built on 🤗 transformers.
  • replicate/keepsake - Version control for machine learning
  • chubin/cheat.sh - the only cheat sheet you need
  • asyml/texar - Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
  • activeloopai/deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.
  • samuelcolvin/notbook - An argument that Jupyter Notebooks are flawed and the world needs a successor.
  • samuelcolvin/python-devtools - Dev tools for python
  • sodadata/soda-core - ⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
  • zenml-io/zenml - ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.
  • Lightning-Universe/lightning-flash - Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
  • sktime/sktime - A unified framework for machine learning with time series
  • mwouts/jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
  • agermanidis/pigeon - 🐦 Quickly annotate data from the comfort of your Jupyter notebook
  • microsoft/hummingbird - Hummingbird compiles trained ML models into tensor computation for faster inference.
  • elblogbruno/NotionAI-MyMind - This repo uses AI and the wonderful Notion to enable you to add anything on the web to your "Mind" and forget about everything else.
  • Instagram/MonkeyType - A Python library that generates static type annotations by collecting runtime types
  • EleutherAI/gpt-neo - An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
  • EleutherAI/gpt-neox - An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
  • danielegrattarola/spektral - Graph Neural Networks with Keras and Tensorflow 2.
  • pyro-ppl/numpyro - Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation to GPU/TPU/CPU.
  • brettkromkamp/contextualise - Contextualise is an effective tool particularly suited for organising information-heavy projects and activities consisting of unstructured and widely diverse data and information resources
  • karlicoss/HPI - Human Programming Interface 🧑👽🤖
  • hackalog/easydata - A flexible template for doing reproducible data science in Python.
  • unitaryai/detoxify - Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.
  • google-deepmind/dm-haiku - JAX-based neural network library
  • google-deepmind/sonnet - TensorFlow-based neural network library
  • awwong1/torchprof - PyTorch layer-by-layer model profiler
  • voila-dashboards/voila - Voilà turns Jupyter notebooks into standalone web applications
  • interpretml/interpret-text - A library that incorporates state-of-the-art explainers for text-based machine learning models and visualizes the result with a built-in dashboard.
  • CharlieDinh/FEDL_pytorch - This repository implements FEDL using pytorch
  • modin-project/modin - Modin: Scale your Pandas workflows by changing a single line of code
  • google/gin-config - Gin provides a lightweight configuration framework for Python
  • facebookresearch/fairscale - PyTorch extensions for high performance and large scale training.
  • arogozhnikov/einops - Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
  • Rudrabha/Wav2Lip - This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs
  • lucidrains/vit-pytorch - Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
  • airbytehq/airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
  • Neuraxio/Neuraxle - The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spa
  • microsoft/DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • SeldonIO/alibi - Algorithms for explaining machine learning models
  • vaikkunth/PrivacyFL - A Simulator for Privacy Preserving Federated Learning
  • woven-planet/l5kit - L5Kit - https://woven.toyota
  • jbesomi/texthero - Text preprocessing, representation and visualization from zero to hero.
  • pcyin/tranX - A general-purpose neural semantic parser for mapping natural language queries into machine executable code
  • paulfitz/mlsql - inferring sql queries from plain-text questions about tables
  • ActivityWatch/activitywatch - The best free and open-source automated time tracker. Cross-platform, extensible, privacy-focused.
  • naiveHobo/InvoiceNet - Deep neural network to extract intelligent information from invoice documents.
  • linkedin/detext - DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
  • karpathy/minGPT - A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
  • alpacahq/alpaca-backtrader-api - Alpaca Trading API integrated with backtrader
  • microsoft/Olive - Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
  • great-expectations/great_expectations - Always know what to expect from your data.
  • huggingface/hmtl - 🌊 HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP
  • facebookresearch/pytext - A natural language modeling framework based on PyTorch
  • holoviz/holoviews - With Holoviews, your data visualizes itself.
  • uber/fiber - Distributed Computing for AI Made Simple
  • google/TensorNetwork - A library for easy and efficient manipulation of tensor networks.
  • UKPLab/sentence-transformers - Multilingual Sentence & Image Embeddings with BERT
  • hypothesis/h - Annotate with anyone, anywhere.
  • mit-han-lab/hardware-aware-transformers - [ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
  • pymc-devs/pymc - Bayesian Modeling and Probabilistic Programming in Python
  • kserve/kserve - Standardized Serverless ML Inference Platform on Kubernetes
  • google-parfait/tensorflow-federated - An open-source framework for machine learning and other computations on decentralized data.
  • facebookresearch/detr - End-to-End Object Detection with Transformers
  • aleju/imgaug - Image augmentation for machine learning experiments.
  • libindic/indic-trans - The project aims to add a state-of-the-art transliteration module for cross transliterations among all Indian languages including English.
  • plotly/jupyter-dash - OBSOLETE - Dash v2.11+ has Jupyter support built in!
  • kotartemiy/newscatcher - Programmatically collect normalized news from (almost) any website.
  • ThilinaRajapakse/simpletransformers - Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
  • mingrammer/diagrams - 🎨 Diagram as Code for prototyping cloud system architectures
  • resemble-ai/Resemblyzer - A python package to analyze and compare voices with deep learning
  • openai/jukebox - Code for the paper "Jukebox: A Generative Model for Music"
  • ml-tooling/ml-hub - 🧰 Multi-user development platform for machine learning teams. Simple to setup within minutes.
  • google-research/albert - ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  • d2l-ai/d2l-en - Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
  • pytorch/elastic - PyTorch elastic training
  • fepegar/torchio - Medical imaging toolkit for deep learning
  • codelucas/newspaper - newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
  • jupyter-widgets-contrib/ipysheet - Jupyter handsontable integration
  • nextstrain/ncov - Nextstrain build for novel coronavirus SARS-CoV-2
  • marvinbuss/MLDevOps - ML DevOps using GitHub Actions and Azure Machine Learning
  • reiinakano/scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects.
  • kubeflow-kale/kale - Kubeflow’s superfood for Data Scientists
  • Lightning-AI/pytorch-lightning - Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
  • google/trax - Trax: Deep Learning with Clear Code and Speed
  • nalepae/pandarallel - A simple and efficient tool to parallelize Pandas operations on all available CPUs
  • timkpaine/pyEX - Python interface to IEX and IEX cloud APIs
  • google/flax - Flax is a neural network library for JAX that is designed for flexibility.
  • Netflix/metaflow - 🚀 Build and manage real-life ML, AI, and data science projects with ease!
  • apple/coremltools - Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
  • google-research/text-to-text-transfer-transformer - Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
  • google-deepmind/dm_memorytasks - A set of 13 diverse machine-learning tasks that require memory to solve.
  • tensorflow/lingvo - Lingvo
  • recommenders-team/recommenders - Best Practices on Recommendation Systems
  • amundsen-io/amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
  • eriklindernoren/ML-From-Scratch - Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learnin
  • tiangolo/fastapi - FastAPI framework, high performance, easy to learn, fast to code, ready for production
  • deepdrive/deepdrive - Deepdrive is a simulator that allows anyone with a PC to push the state-of-the-art in self-driving
  • goru001/inltk - Natural Language Toolkit for Indic Languages aims to provide out of the box support for various NLP tasks that an application developer might need
  • openai/gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners"
  • deezer/spleeter - Deezer source separation library including pretrained models.
  • vinayak-mehta/nbcommands - Unix commands for Jupyter notebooks.
  • timkpaine/paperboy - A web frontend for scheduling Jupyter notebook reports
  • gnes-ai/gnes - GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network.
  • sebastianruder/NLP-progress - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
  • feast-dev/feast - The Open Source Feature Store for Machine Learning
  • CorentinJ/Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
  • SPFlow/SPFlow - Sum Product Flow: An Easy and Extensible Library for Sum-Product Networks
  • pronobis/libspn - Library for learning and inference with Sum-product Networks
  • huggingface/transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
  • google-deepmind/graph_nets - Build Graph Nets in Tensorflow
  • streamlit/streamlit - Streamlit: A faster way to build and share data apps.
  • kubeflow/pipelines - Machine Learning Pipelines for Kubeflow
  • deepfakes/faceswap - Deepfakes Software For All
  • hukkelas/DeepPrivacy - DeepPrivacy: A Generative Adversarial Network for Face Anonymization
  • iperov/DeepFaceLab - DeepFaceLab is the leading software for creating deepfakes.
  • microsoft/icecaps - Intelligent Conversation Engine: Code and Pre-trained Systems. Version 0.2.0.
  • microsoft/presidio - Context aware, pluggable and customizable data protection and de-identification SDK for text and images
  • jmcarpenter2/swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
  • google-deepmind/bsuite - bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent
  • TarrySingh/ERNIE - An Implementation of ERNIE For Language Understanding (including Pre-training models and Fine-tuning tools)
  • online-ml/river - 🌊 Online machine learning in Python
  • nteract/bookstore - 📚 Notebook storage and publishing workflows for the masses
  • tf-encrypted/tf-encrypted - A Framework for Encrypted Machine Learning in TensorFlow
  • jina-ai/clip-as-service - 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
  • facebookresearch/fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
  • keithito/tacotron - A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
  • codesociety/friartuck - Live Quant Trading Framework for Robinhood, using IEX Trading and AlphaVantage for Free Prices.
  • kedro-org/kedro - Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, an
  • lyft/cartography - Cartography is a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database.
  • src-d/ml-core - source{d} MLonCode foundation - core algorithms and models.
  • github/argo-ml - Controllers, wrappers and miscellaneous utils to make it easier for Argo to be used in ML scenarios
  • facebookresearch/mmf - A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
  • zipline-live/zipline - Zipline-Live, a Pythonic Algorithmic Trading Library
  • amzn/metalearn-leap - Original PyTorch implementation of the Leap meta-learner (https://arxiv.org/abs/1812.01054) along with code for running the Omniglot experiment presented in the paper.
  • ydataai/ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
  • thinkingmachines/geomancer - Automated feature engineering for geospatial data
  • NVlabs/SPADE - Semantic Image Synthesis with SPADE
  • kapicorp/kapitan - Generic templated configuration management for Kubernetes, Terraform and other things
  • hummingbot/hummingbot - Open source software that helps you create and deploy high-frequency crypto trading bots
  • ludwig-ai/ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models
  • google-deepmind/mathematics_dataset - This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
  • databricks/koalas - Koalas: pandas API on Apache Spark
  • 3dperceptionlab/therobotrix -
  • EthicalML/xai - XAI - An eXplainability toolbox for machine learning
  • ArchiveBox/ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
  • openai/neural-mmo - Code for the paper "Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents"
  • tensorflow/tfx - TFX is an end-to-end platform for deploying production ML pipelines
  • BYU-PCCL/holodeck - High Fidelity Simulator for Reinforcement Learning and Robotics Research.
  • VPanjeta/ModiScript - Acche din aa gaye ("Good days have come")
  • iterative/dvc - 🦉 ML Experiments and Data Management with Git
  • piskvorky/smart_open - Utils for streaming large files (S3, HDFS, gzip, bz2...)
  • flairNLP/flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
  • dagster-io/dagster - An orchestration platform for the development, production, and observation of data assets.
  • tensorlayer/TensorLayer - Deep Learning and Reinforcement Learning Library for Scientists and Engineers
  • eriklindernoren/Keras-GAN - Keras implementations of Generative Adversarial Networks.
  • hindupuravinash/the-gan-zoo - A list of all named GANs!
  • yfeng95/GAN - Resources and Implementations of Generative Adversarial Nets: GAN, DCGAN, WGAN, CGAN, InfoGAN
  • openai/spinningup - An educational resource to help anyone learn deep reinforcement learning.
  • Wikia/discreETLy - ETLy is an add-on dashboard service on top of Apache Airflow.
  • openphilanthropy/unrestricted-adversarial-examples - Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge
  • tensorflow/privacy - Library for training machine learning models with privacy for training data
  • google/jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
  • bamos/zsh-history-analysis - Plot your .zsh_history.
  • likedan/Awesome-CoreML-Models - Largest list of models for Core ML (for iOS 11+)
  • robinhood/faust - Python Stream Processing
  • IntelLabs/coach - Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
  • cleardusk/3DDFA - The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.
  • alteryx/featuretools - An open source python library for automated feature engineering
  • google/kasane - A simple kubernetes deployment manager
  • hi-primus/optimus - 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
  • facebookresearch/DME - Dynamic Meta-Embeddings for Improved Sentence Representations
  • lspvic/jupyter_tensorboard - Start Tensorboard in Jupyter Notebook
  • anilshanbhag/RobinhoodShell - A command line shell for trading stocks using Robinhood
  • tensorflow/compression - Data compression in TensorFlow
  • chiphuyen/sotawhat - Returns latest research results by crawling arxiv papers and summarizing abstracts. Helps you stay afloat with so many new papers every day.
  • ramtinms/ethereum-log - A native light weight implementation of log parser for Ethereum event logs
  • uber/petastorm - Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, a
  • nteract/papermill - 📚 Parameterize, execute, and analyze notebooks
  • StevenBlack/hosts - 🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
  • madhavanmalolan/awesome-reactnative-ui - Awesome React Native UI components updated weekly
  • src-d/ml - sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees
  • techbanca/coinai - Seed applications based on AI for digital currency quantitative analysis, medium-term forecast and asset allocation for the secondary market of the BANCA community
  • openai/glow - Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
  • mapbox/robosat - Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
  • bigchaindb/bigchaindb - Meet BigchainDB. The blockchain database.
  • algorithmiaio/danku - Exchange ML models in a trustless manner!
  • fossasia/visdom - A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.
  • tensorflow/tensor2tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
  • mlflow/mlflow - Open source platform for the machine learning lifecycle
  • achembarpu/pockyt - Automate & manage your Pocket.com collection.
  • carlomazzaferro/kryptoflow - Real-time analysis of bitcoin markets with Kafka and Tensorflow Serving
  • openuado/niet - Parse/Read yaml or json files directly in your shell (sh, bash, ksh, ...)
  • cloudevents/spec - CloudEvents Specification
  • cchen156/Learning-to-See-in-the-Dark - Learning to See in the Dark. CVPR 2018
  • Hexadite/acs-keyvault-agent - An Azure Key Vault agent container that grabs secrets from Azure Key Vault securely and passes them to other containers in its pod
  • mlcommons/training - Reference implementations of MLPerf™ training benchmarks
  • zhoubolei/moments_models - The pretrained models trained on Moments in Time Dataset
  • tonybeltramelli/pix2code - pix2code: Generating Code from a Graphical User Interface Screenshot
  • ucbdrive/skipnet - Code for SkipNet: Learning Dynamic Routing in Convolutional Networks (ECCV 2018)
  • kipoi/models - Model zoo for genomics
  • chainer/chainer - A flexible framework of neural networks for deep learning
  • google-research/batch-ppo - Efficient Batched Reinforcement Learning in TensorFlow
  • NVIDIA/FastPhotoStyle - Style transfer, deep learning, feature transform
  • airbnb/omniduct - A toolkit providing a uniform interface for connecting to and extracting data from a wide variety of (potentially remote) data stores (including HDFS, Hive, Presto, MySQL, etc).
  • omgnetwork/plasma-mvp - OmiseGO's research implementation of Minimal Viable Plasma
  • OpenMined/PySyft - Perform data science on data that remains in someone else's server
  • Cloud-CV/visual-chatbot - ☁️ 👀 💬 Visual Chatbot
  • Cloud-CV/Fabrik - 🏭 Collaboratively build, visualize, and design neural nets in browser
  • freqtrade/freqtrade - Free, open source crypto trading bot
  • yahoo/TensorFlowOnSpark - TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
  • zhoubear/open-paperless - Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
  • brndnmtthws/optimal-buy-cbpro - Scheduled buying of BTC, ETH, and LTC from Coinbase Pro, optimally!
  • yunjey/stargan - StarGAN - Official PyTorch Implementation (CVPR 2018)
  • onnx/onnx - Open standard for machine learning interoperability
  • alexellis/repaint-the-past - Full instructions for repainting the past
  • fiunchinho/dockerize-me - This tool lets you Dockerize your applications using best practices to define your Dockerfile and Docker entry point files.
  • mesosphere/marathon-autoscale - Simple Proof-of-Concept for Scaling Application running on Marathon based on Utilization
  • snorkel-team/snorkel - A system for quickly generating training data with weak supervision
  • Azure/simdem - Tool for simulating demos, delivering tutorials, and using documentation as tests.
  • horovod/horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • jorgebastida/awslogs - AWS CloudWatch logs for Humans™
  • xoreaxeaxeax/sandsifter - The x86 processor fuzzer
  • ShopRunner/jupyter-notify - A Jupyter Notebook magic for browser notifications of cell completion
  • machinalis/quepy - A python framework to transform natural language questions to queries in a database query language.
  • openai/InfoGAN - Code for reproducing key results in the paper "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
  • hjacobs/kubernetes-on-aws-users - List of companies/organizations running Kubernetes on AWS
  • MycroftAI/mycroft-core - Mycroft Core, the Mycroft Artificial Intelligence platform.
  • facebookresearch/ParlAI - A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
  • emissary-ingress/emissary - open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
  • localstack/localstack - 💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline
  • ray-project/ray - Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
  • google/seq2seq - A general-purpose encoder-decoder framework for Tensorflow
  • yahoo/lopq - Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
  • dstat-real/dstat - Versatile resource statistics tool (the real one, not the Red Hat clone)
  • airbnb/streamalert - StreamAlert is a serverless, realtime data analysis framework which empowers you to ingest, analyze, and alert on data from any environment, using datasources and alerting logic you define.
  • deepgram/kur - Descriptive Deep Learning
  • commaai/openpilot - openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for 250+ supported car makes and models.
  • Yelp/elastalert - Easy & Flexible Alerting With ElasticSearch
  • airbnb/knowledge-repo - A next-generation curated knowledge sharing platform for data scientists and other technical professions.
  • douban/tfmesos - Tensorflow in Docker on Mesos #tfmesos #tensorflow #mesos
  • openai/kubernetes-ec2-autoscaler - A batch-optimized scaling manager for Kubernetes
  • ecprice/newsdiffs - Automatic scraper that tracks changes in news articles over time.
  • b12io/orchestra - Orchestra is a human-in-the-loop AI system for orchestrating project teams of experts and machines.
  • microservices-demo/microservices-demo - Deployment scripts & config for Sock Shop
  • netbox-community/netbox - The premier source of truth powering network automation. Open source under Apache 2. Public demo: https://demo.netbox.dev
  • magenta/magenta - Magenta: Music and Art Generation with Machine Intelligence
  • openai/gym - A toolkit for developing and comparing reinforcement learning algorithms.
  • jisungk/deepjazz - Deep learning driven jazz generation using Keras & Theano!
  • nucypher/zerodb - This project is no longer actively maintained. If you'd like to become the maintainer, please let us know. ZeroDB is an end-to-end encrypted database. Data can be stored and queried on untrusted dat
  • p-e-w/maybe - 📂 🐇 🎩 See what a program does before deciding whether you really want it to happen (NO LONGER MAINTAINED)
  • probcomp/crosscat - A domain-general, Bayesian method for analyzing high-dimensional data tables
  • awslabs/aws-shell - An integrated shell for working with the AWS CLI.
  • hatching/vmcloak - Automated Virtual Machine Generation and Cloaking for Cuckoo Sandbox.
  • CounterpartyXCP/counterparty-core - Counterparty Protocol Reference Implementation
  • projectatomic/container-best-practices - Container Best Practices
  • rcaloras/bashhub-client - ☁️ Bash history in the cloud. Indexed and searchable.
  • kubernauts/kploy - An opinionated Kubernetes deployment system for appops
  • Yelp/paasta - An open, distributed platform as a service
  • getredash/redash - Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
  • DistributedSystemsGroup/zoe - Zoe: Container Analytics as a Service -- mirror of https://gitlab.eurecom.fr/zoe/main/
  • probcomp/bayeslite - BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.
  • NixOS/nixops - NixOps is a tool for deploying to NixOS machines in a network or cloud.
  • AppScale/gts - AppScale is an easy-to-manage serverless platform for building and running scalable web and mobile applications on any infrastructure.
  • donnemartin/awesome-aws - A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.
  • nicolargo/glances - Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
  • VIDA-NYU/reprozip - ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
  • akshshar/QoSon - Mesos Framework to tackle network degradation through Distributed Telemetry
  • madjar/nox - Tools to make nix nicer to use
  • StackStorm/st2 - StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 i
  • bolt-project/bolt - Unified interface for local and distributed ndarrays
  • freeman-lab/spark-ml-streaming - Visualize streaming machine learning in Spark
  • avinassh/rockstar - Makes you a Rockstar C++ Programmer in 2 minutes
  • autodesk-cloud/ochonetes - K8S/Ochopod web-shell + toolkit + CLI !
  • XiaoMi/minos - Minos is beyond a hadoop deployment system.
  • facebook/PathPicker - PathPicker accepts a wide range of input -- output from git commands, grep results, searches -- pretty much anything. After parsing the input, PathPicker presents you with a nice UI to select which fi
  • Capgemini/Apollo - 🚀 An open-source platform for cloud native applications based on Apache Mesos and Docker.
  • nvbn/thefuck - Magnificent app which corrects your previous console command.
  • vmware/photon - Minimal Linux container host
  • mesosphere/ANAGRAMMER - An anagram finder for Apache Mesos
  • mantl/mantl - Mantl is a modern platform for rapidly deploying globally distributed services
  • GoogleCloudPlatform/PerfKitBenchmarker - PerfKit Benchmarker (PKB) contains a set of benchmarks to measure and compare cloud offerings. The benchmarks use default settings to reflect what most users will see. PerfKit Benchmarker is licensed
  • amscanne/huptime - Utility for zero downtime restarts of unmodified programs.
  • thefactory/autoscale-python - Python library to manage autoscaling logic and actions
  • 0xAX/linux-insides - A little bit about a linux kernel
  • dirkneumann/deepdist - Distributed Deep Learning on Spark
  • lra/mackup - Keep your application settings in sync (OS X/Linux)
  • ceteri/jem-video - A companion wiki + code repository for the O'Reilly Media video "Just Enough Math". This site provides additional links, sample code, and other addenda.
  • pybrain/pybrain -
  • karpathy/neuraltalk - NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
  • nate-parrott/Flashlight - The missing Spotlight plugin system
  • ucb-sts/sts - SDN Troubleshooting System
  • svenkreiss/unicodeit - Converts LaTeX tags to unicode: \mathcal{H} → ℋ. Available on the web or as Automator script for the Mac.
  • google/grr - GRR Rapid Response: remote live forensics for incident response
  • jhorey/ferry - Ferry lets you define, run, and deploy big data applications on AWS, OpenStack, and your local machine using Docker
  • kayousterhout/trace-analysis - Scripts to analyze Spark's performance
  • danilop/yas3fs - YAS3FS (Yet Another S3-backed File System) is a Filesystem in Userspace (FUSE) interface to Amazon S3. It was inspired by s3fs but rewritten from scratch to implement a distributed cache synchronized
  • mesosphere-backup/mesos-cli - This project has been deprecated. Please use the DC/OS CLI.
  • vim-awesome/vim-awesome - Awesome Vim plugins from across the universe
  • duedil-ltd/portainer - Apache Mesos framework for building Docker images on a cluster of machines
  • ciudadanointeligente/write-it - App to create and send messages to public persons. It's a component of POPLUS project.
  • worldveil/dejavu - Audio fingerprinting and recognition in Python
  • duedil-ltd/mesos-docker-containerizer - Docker containerizer for Mesos
  • chriskiehl/Gooey - Turn (almost) any Python command line program into a full GUI application with one line
  • Dobiasd/programming-language-subreddits-and-their-choice-of-words - How do the different communities talk?
  • facebookarchive/hblog - A log parser for clusters
  • wickman/pesos - pesos is a pure python implementation of the mesos framework api
  • mesosphere/RENDLER - A rendering web crawler for Apache Mesos.
  • bspaans/improviser - Musical content generation software in Python
  • ClusterHQ/flocker - Container data volume manager for your Dockerized application
  • numenta/nupic-legacy - Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
  • t3rmin4t0r/tez-swimlanes - Swimlane graphs from apache-tez AM logs
  • wickman/compactor - pure python libprocess implementation
  • ceteri/exelixi - Exelixi is a distributed framework based on Apache Mesos, mostly implemented in Python using gevent for high-performance concurrency. It is intended to run cluster computing jobs (partitioned batch jo
  • ericmoritz/crdt - CRDT toolbox
  • Yelp/mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
  • whym/wikihadoop - Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop
  • softlayer/jumpgate - A simple library to make more clouds compatible with OpenStack.
  • tomakehurst/saboteur - Causing deliberate network mayhem for better resilience
  • gunnery/gunnery - Remote task execution tool
  • worstcase/blockade - Docker-based utility for testing network failures and partitions in distributed applications
  • mesosphere-backup/mesos-hydra - MPICH2 Hydra scheduler for Apache Mesos.
  • bitly/data_hacks - Command line utilities for data analysis
  • thumbor/thumbor - thumbor is an open-source photo thumbnail service by globo.com
  • postmanlabs/httpbin - HTTP Request & Response Service, written in Python + Flask.
  • douban/dpark - Python clone of Spark, a MapReduce alike framework in Python
  • spotify/snakebite - A pure python HDFS client
  • spotify/luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
  • savoirfairelinux/num2words - Modules to convert numbers to words. 42 --> forty-two
  • ourresearch/total-impact-core - An api and backend code to gather the impacts of diverse scholarly products online.
  • tuskar/tuskar - Tuskar is a service for managing OpenStack deployments.
  • deis/deis - Deis v1, the CoreOS and Docker PaaS: Your PaaS. Your Rules.
  • luispedro/BuildingMachineLearningSystemsWithPython - Source Code for the book Building Machine Learning Systems with Python
  • klbostee/dumbo - Python module that allows one to easily write and run Hadoop programs.
  • jeffknupp/sandman - Sandman "makes things REST".
  • ansible/ansible - Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud
  • sloria/TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
  • clips/pattern - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
  • progrium/skypipe - A magic pipe in the sky for the command line
  • ibmcb/cbtool - Cloud Rapid Experimentation and Analysis Toolkit
  • dfm/osrc - The Open Source Report Card
  • toddlipcon/haatkit - Toolkit of simple scripts useful for managing Hadoop
  • Yelp/EMRio - Elastic MapReduce instance optimizer
  • selfspy/selfspy - Log everything you do on the computer, for statistics, future reference and all-around fun!
  • bup/bup - Very efficient backup system based on the git packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images). Please post problem
  • beloglazov/openstack-neat - OpenStack Neat: A Framework for Dynamic Consolidation of Virtual Machines in OpenStack Clouds
  • redhat-openstack/packstack - Install utility to deploy OpenStack on multiple hosts. This is the GitHub mirror for https://opendev.org/x/packstack.
  • saltstack/salt - Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
  • ceph/ceph-deploy - Deploy Ceph with minimal infrastructure, using just SSH access
  • packetloop/packetpig - Packetpig - Open Source Big Data Security Analytics



  • xflows/clowdflows - A web based data mining workflow platform with real-time analysis capabilities
  • LuRsT/hr - A horizontal 📏 for your terminal



  • HigherOrderCO/Bend - A massively parallel, high-level programming language
  • dora-rs/dora - Dataflow-Oriented Robotic Application is middleware that streamlines and simplifies the creation of AI-based robotic applications with low latency, composable, and distributed dataflow.
  • astral-sh/rye - a Hassle-Free Python Experience
  • astral-sh/uv - An extremely fast Python package installer and resolver, written in Rust.
  • lancedb/lance - Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckD
  • Eventual-Inc/Daft - Distributed DataFrame for Python designed for the cloud, powered by Rust
  • BloopAI/bloop - bloop is a fast code search engine written in Rust.
  • spyglass-search/spyglass - A personal search engine: Create a searchable library from your personal documents, interests, and more!
  • neondatabase/neon - Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, branching, and bottomless storage.
  • atuinsh/atuin - ✨ Magical shell history
  • roapi/roapi - Create full-fledged APIs for slowly moving datasets without writing a single line of code.
  • DioxusLabs/dioxus - Fullstack GUI library for web, desktop, mobile, and more.
  • pola-rs/polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
  • foundry-rs/foundry - Foundry is a blazing fast, portable and modular toolkit for Ethereum application development written in Rust.
  • parallel-finance/parallel - A decentralized lending & staking protocol built on top of the Polkadot ecosystem.
  • coral-xyz/anchor - ⚓ Solana Sealevel Framework
  • tauri-apps/tauri - Build smaller, faster, and more secure desktop applications with a web frontend.
  • vectordotdev/vector - A high-performance observability data pipeline.
  • tokio-rs/tokio - A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
  • bbodi/notecalc3 - NoteCalc is a handy calculator trying to bring the advantages of Soulver to the web.
  • dandavison/delta - A syntax-highlighting pager for git, diff, grep, and blame output
  • sekey/sekey - Use Touch ID / Secure Enclave for SSH Authentication!
  • sbstp/kubie - A more powerful alternative to kubectx and kubens
  • a-b-street/abstreet - Transportation planning and traffic simulation software for creating cities friendlier to walking, biking, and public transit
  • cube-js/cube - 📊 Cube: The Semantic Layer for Building Data Applications
  • solana-labs/solana - Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
  • MaterializeInc/materialize - The data warehouse for operational workloads.
  • tomaka/redshirt - 🧑‍🔬 Operating system
  • meilisearch/meilisearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
  • Trow-Registry/trow - Container Registry and Image Management for Kubernetes Clusters
  • diem/diem - Diem’s mission is to build a trusted and innovative financial network that empowers people and businesses around the world.
  • mike-engel/now-importer - Easily import your static websites into ZEIT's now platform
  • comnik/declarative-dataflow - A reactive query engine built on differential dataflow.
  • swc-project/swc - Rust-based platform for the Web
  • cantino/mcfly - Fly through your shell history. Great Scott!
  • firecracker-microvm/firecracker - Secure and fast microVMs for serverless computing.
  • sharkdp/bat - A cat(1) clone with wings.
  • habitat-sh/habitat - Modern applications with built-in automation
  • weld-project/weld - High-performance runtime for data analytics applications
  • autumnai/leaf - Open Machine Intelligence Framework for Hackers. (GPU/CPU)





  • webyrd/Barliman - Prototype smart text editor
  • opencog/opencog - A framework for integrated Artificial Intelligence & Artificial General Intelligence (AGI)





  • windmill-labs/windmill - Open-source developer platform to turn scripts into workflows and UIs. Fastest workflow engine (5x vs Airflow). Open-source alternative to Airplane and Retool.


  • Toxblh/MTMR - 🌟 [My TouchBar My rules]. The Touch Bar Customisation App for your MacBook Pro
  • lwouis/alt-tab-macos - Windows alt-tab on macOS
  • maxgoedjen/secretive - Store SSH keys in the Secure Enclave
  • lannister-capital/lannister-ios - 👑 iOS personal wealth manager with secure and decentralized storage
  • Lona/Lona - A tool for defining design systems and using them to generate cross-platform UI code, Sketch files, and other artifacts.
  • indragiek/InAppViewDebugger - A UIView debugger (like Reveal or Xcode) that can be embedded in an app for on-device view debugging
  • pedrommcarrasco/Brooklyn - 🍎 Screensaver inspired by Apple's Event on October 30, 2018
  • mxcl/Workbench - Seamless, automatic, “dotfile” sync to iCloud.
  • agens-no/swiff - Human readable time diffs on lines of output when running e.g. build commands like fastlane
  • pixelspark/catena - Catena is a distributed database based on a blockchain, accessible using SQL.
  • mas-cli/mas - 📦 Mac App Store command line interface





  • vlang/v - Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io

Vim Script





  • tigerbeetle/tigerbeetle - The distributed financial transactions database designed for mission critical safety and performance.
  • oven-sh/bun - Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one



To the extent possible under law, dharmeshkakadia has waived all copyright and related or neighboring rights to this work.