Target audience: Infrastructure engineers building ML / DS infrastructure who have a conceptual understanding of the domain they are building systems for, but who are not primarily data science specialists.
A curated list of machine learning / data science frameworks, libraries, software and resources for building ML (or data science) platforms or infrastructure.
This list is not for ML / DS domain-specific technologies or projects. Thanks to @josephmisiti, who maintains an 'awesome' list covering those.
If you want to contribute to this list (please do), send me a pull request or contact me @stevencasey.
Also, a listed repository should be deprecated if:
- The repository's owner explicitly says that "this library is not maintained".
- It has had no commits for a long time (2~3 years).
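
As a rough illustration, the two deprecation criteria above could be checked with a small helper like the one below. This is only a sketch: the function name `should_deprecate`, its parameters, and the choice of 2 years as the default threshold (the lower bound of the "2~3 years" range) are assumptions made for this example, not part of any existing tooling for this list.

```python
from datetime import date


def should_deprecate(
    last_commit: date,
    marked_unmaintained: bool,
    today: date,
    stale_years: int = 2,  # assumed default: lower bound of the "2~3 years" rule
) -> bool:
    """Return True if a listed repository meets either deprecation criterion."""
    # Criterion 1: the owner explicitly says the project is not maintained.
    if marked_unmaintained:
        return True
    # Criterion 2: no commits for at least `stale_years` (approximated as 365-day years).
    return (today - last_commit).days >= stale_years * 365
```

For example, a repository last committed to on 2020-01-01 would be flagged when checked on 2023-01-01, while one last committed to on 2022-06-01 would not, unless its owner had marked it as unmaintained.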
- Awesome Data Engineering - 'Awesome' list of data platform technologies (most of the data engineering content here comes from that list).
- The Data Engineering Ecosystem: An Interactive Map - Great interactive map of data engineering systems.
- Apache Atlas - Data governance, metadata management, and data lineage.
- Kubeflow - Makes deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Deploys best-of-breed open-source systems for ML to diverse infrastructures.
- SageMaker
- Guidelines for ethical modeling - A collection of resources and tools designed to provide guidelines for ethical modeling.
- DataEngConf - The first technical conference that bridges the gap between data scientists, data engineers and data analysts.
- AI NEXTCon - Relatively small conference. Some good talks on building ML infrastructure, mixed in with tracks on DS specialties. Decent hallway track due to the small size. Light on sponsors and exhibitors.
- Data Engineering Podcast - The show about modern data infrastructure.
- Data Engineering Weekly news - Delivers the latest data engineering news to your inbox each week.