giordano-lucas / awesome-machine-learning-engineer

🤓 A curated awesome list of Machine Learning Engineering resources. Feel free to contribute!

Home Page: https://radix.ai

Awesome Machine Learning Engineer

For more awesomeness, check out Awesome.

What is this and how do I use it?

  • This is a curated list of delightful resources for everything you need to develop Machine Learning solutions.
  • Each item in this list will teach you at least one distinct and significant skill or piece of information.
  • There are three content levels:
    1. 🐥 Essential reading for all ML engineers
    2. 🐍 Advanced reading for professional ML engineers
    3. 🦄 Expert material for expert ML engineers
  • Descriptions are written to complete the sentence "After reading this article you will have learned ...".

Contents

Communication

Software Engineering

API design

Workflow

Python patterns

Typing

  • 🐍 The Comprehensive Guide to mypy - How to write type annotations in Python (1 hour)
  • 🐍 Pydantic overview - How to write type annotations for complex types instead of a meaningless Dict[str, Any] (1 hour)
  • 🐍 Magic number - Why magic values are an anti-pattern (15 min)
  • 🐍 Enums - How to write Enums in Python instead of type-unsafe magic values (15 min)
  • 🦄 Mypy generics - How to use TypeVars to write generic types such as List[T] (30 min)
  • 🦄 Mypy protocols - How to use Protocols to define interfaces such as Iterable (30 min)
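
As a rough illustration of the ideas these Typing resources cover, the sketch below replaces magic strings with an Enum, uses a TypeVar to write a generic function, and defines an interface with a Protocol. It uses only the standard library (Python 3.8+). The names Split, first_n, Predictor, MeanPredictor and evaluate are hypothetical and do not come from any of the linked articles.

```python
from enum import Enum
from typing import Iterable, List, Protocol, TypeVar


class Split(Enum):
    """Named dataset splits, used instead of magic strings such as "train"."""

    TRAIN = "train"
    VALIDATION = "validation"
    TEST = "test"


# T stands for "whatever element type the caller passes in".
T = TypeVar("T")


def first_n(items: Iterable[T], n: int) -> List[T]:
    """Return at most the first n items; the element type T is preserved."""
    result: List[T] = []
    for item in items:
        if len(result) >= n:
            break
        result.append(item)
    return result


class Predictor(Protocol):
    """Structural interface: any object with a matching predict() conforms."""

    def predict(self, features: List[float]) -> float:
        ...


class MeanPredictor:
    """Conforms to Predictor without inheriting from it."""

    def predict(self, features: List[float]) -> float:
        return sum(features) / len(features)


def evaluate(model: Predictor, rows: List[List[float]]) -> List[float]:
    """Apply the model to every row of features."""
    return [model.predict(row) for row in rows]
```

A type checker such as mypy would accept evaluate(MeanPredictor(), rows) because MeanPredictor has the right method signature, and it would flag a call like first_n(5, 2) because an int is not an Iterable. At runtime, Split("train") returns Split.TRAIN, while an unknown value such as Split("trian") raises a ValueError instead of silently passing through.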

Curated Python packages

Workflow

Code quality

  • 🐥 black - Automatically format your code
  • 🐥 isort - Automatically sort your import statements
  • 🐍 pre-commit - Automatically run code quality checks on commit
  • 🐍 bandit - Find common security issues
  • 🐍 darglint - Check that your docstrings match your function signature
  • 🐍 flake8 - Check your code for bugs and that your code style is PEP8-compliant
  • 🐍 flake8 extensions - An awesome list of Flake8 extensions
  • 🐍 mypy - Check the type-correctness of your code
  • 🐍 pre-commit hooks - A collection of pre-commit hooks that check file quality
  • 🐍 pydocstyle - Check that your code is documented
  • 🐍 pygrep hooks - A collection of pre-commit hooks that check for common Python code smells
  • 🐍 pytest-recording - Record and play back HTTP requests in your pytest tests
  • 🐍 pyupgrade - Check that your code is written using the latest Python language features
  • 🐍 safety - Check that your dependencies don't have any known security vulnerabilities
  • 🐍 shellcheck - Check the quality of your shell scripts
  • 🐍 coverage.py - Check your code's test coverage
  • 🦄 hypothesis - Write tests that automatically look for edge cases that break your code
  • 🦄 hypothesis-auto - Automatically generate Hypothesis tests based on your code's type annotations

Application development

  • 🐍 fastapi - Create RESTful APIs based on type annotations
  • 🐍 typer - Create CLIs based on type annotations
  • 🐍 streamlit - Create web apps with a single Python file

Utilities

  • 🐍 bump2version - Release a new version of your package
  • 🐍 coloredlogs - Increase your logs' readability with colour
  • 🐍 hvplot - Create interactive plots from pandas dataframes
  • 🐍 mkdocs - Create developer documentation for your project
  • 🐍 pdoc - Generate API documentation for your code
  • 🐍 birdseye - Graphically debug your Python code
  • 🐍 scalene - Profile your code's CPU and memory usage by line
  • 🐍 viztracer - Visualize your code's performance with a flamegraph
  • 🐍 tqdm - Easily add progress bars to long-running jobs

Machine Learning

Practical theory

Explainability

Unsupervised

Classification

Regression

Computer Vision

Natural Language Processing

Time Series Analysis

Recommender Systems

Tensor computation libraries

Pandas

Scikit-learn

Labelling

DevOps

CI/CD

  • 🐍 invoke - How to implement common tasks you run on your project as a CLI (30 min)
  • 🐍 poe - How to implement common tasks you run on your project as a CLI (30 min)

Environment and dependency management

Docker

Data pipelines

  • 🐍 Great Expectations - How to test and document your data and data pipelines (30 min)

Shell

Terraform

Infrastructure

Curated by Radix

Radix is a Belgium-based Machine Learning company.

We invent, design and develop AI-powered software. Together with our clients, we identify which problems within organizations can be solved with AI, demonstrating the value of Artificial Intelligence for each problem.

Our team is constantly looking for novel and better-performing solutions and we challenge each other to come up with the best ideas for our clients and our company.

Here are some examples of what we do with Machine Learning, the technology behind AI:

  • Help job seekers find great jobs that match their expectations. On the Belgian Public Employment Service website, you can find our job recommendations based on your CV alone.
  • Help hospitals save time. We extract diagnoses from patient discharge letters.
  • Help publishers estimate their impact by detecting copycat articles.

We work hard and we have fun together. We foster a culture of collaboration, where each team member feels supported when taking on a challenge, and trusted when taking on responsibility.
