applied-ml
Curated papers, articles, and blogs on data science & machine learning in production.
Figuring out how to implement your ML project? Learn how other organizations did it:
- How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
- Why it works, the science behind it with research, literature, and references 📂
- What real-world results were achieved (so you can better assess ROI ⏰ 💰 📈)
P.S. Want a summary of ML advancements? ml-surveys
P.P.S. Looking for guides and interviews on applying ML? applyingML
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team Structure
- Fails
Data Quality
- Monitoring Data Quality at Scale with Statistical Modeling
Uber
- An Approach to Data Quality for Netflix Personalization Systems
Netflix
- Automating Large-Scale Data Quality Verification (Paper)
Amazon
- Meet Hodor — Gojek’s Upstream Data Quality Tool
Gojek
- Reliable and Scalable Data Ingestion at Airbnb
Airbnb
- Data Management Challenges in Production Machine Learning (Paper)
Google
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper)
Facebook
Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
Airbnb
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions
Netflix
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand
DoorDash
- Revolutionizing Money Movements at Scale with Strong Data Consistency
Uber
- Zipline - A Declarative Feature Engineering Framework
Airbnb
- Automating Data Protection at Scale, Part 1 (Part 2)
Airbnb
- Real-time Data Infrastructure at Uber
Uber
Data Discovery
- Amundsen — Lyft’s Data Discovery & Metadata Engine
Lyft
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code)
Lyft
- Amundsen: One Year Later
Lyft
- Using Amundsen to Support User Privacy via Metadata Collection at Square
Square
- Discovery and Consumption of Analytics Data at Twitter
Twitter
- Democratizing Data at Airbnb
Airbnb
- Databook: Turning Big Data into Knowledge with Metadata at Uber
Uber
- Turning Metadata Into Insights with Databook
Uber
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code)
Netflix
- Exploring Data @ Netflix (Code)
Netflix
- DataHub: A Generalized Metadata Search & Discovery Tool (Code)
LinkedIn
- DataHub: Popular Metadata Architectures Explained
LinkedIn
- How We Improved Data Discovery for Data Scientists at Spotify
Spotify
- How We’re Solving Data Discovery Challenges at Shopify
Shopify
- Nemo: Data discovery at Facebook
Facebook
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code)
Apache
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code)
WeWork
Feature Stores
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code)
Gojek
- Feast: Bridging ML Models and Data
Gojek
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression
DoorDash
- Building Riviera: A Declarative Real-Time Feature Engineering Framework
DoorDash
- Michelangelo Palette: A Feature Engineering Platform at Uber
Uber
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory
Uber
- Distributed Time Travel for Feature Generation
Netflix
- Fact Store at Scale for Netflix Recommendations
Netflix
- The Architecture That Powers Twitter's Feature Store
Twitter
- Building the Activity Graph, Part 2 (Feature Storage Section)
LinkedIn
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed
LinkedIn
- Accelerating Machine Learning with the Feature Store Service
Condé Nast
- Building a Feature Store
Monzo Bank
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb
- ML Feature Serving Infrastructure at Lyft
Lyft
- Butterfree: A Spark-based Framework for Feature Store Building (Code)
QuintoAndar
Classification
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper)
LinkedIn
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper)
Walmart
- Deep Learning: Product Categorization and Shelving
Walmart
- Large-scale Item Categorization for e-Commerce (Paper)
DianPing, eBay
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper)
NAVER
- Categorizing Products at Scale
Shopify
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google
- Discovering and Classifying In-app Message Intent at Airbnb
Airbnb
- How We Built the Good First Issues Feature
GitHub
- Teaching Machines to Triage Firefox Bugs
Mozilla
- Testing Firefox More Efficiently with Machine Learning
Mozilla
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper)
Microsoft
- Prediction of Advertiser Churn for Google AdWords (Paper)
Google
- Scalable Data Classification for Security and Privacy (Paper)
Facebook
- Uncovering Online Delivery Menu Best Practices with Machine Learning
DoorDash
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging
DoorDash
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb
Airbnb
- Using Machine Learning to Predict the Value of Ad Requests
Twitter
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code)
Netflix
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment
DoorDash
Forecasting
- Forecasting at Uber: An Introduction
Uber
- Engineering Extreme Event Forecasting at Uber with RNN
Uber
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber
Uber
- Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code)
Uber
- Under the Hood of Gojek’s Automated Forecasting Tool
Gojek
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video)
Google
- Retraining Machine Learning Models in the Wake of COVID-19
DoorDash
- Managing Supply and Demand Balance Through Machine Learning
DoorDash
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code)
Atlassian
- Greykite: A flexible, intuitive, and fast forecasting library
LinkedIn
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper)
Amazon
- Temporal-Contextual Recommendation in Real-Time (Paper)
Amazon
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper)
Amazon
- Recommending Complementary Products in E-Commerce Push Notifications (Paper)
Alibaba
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper)
Alibaba
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper)
Alibaba
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper)
Alibaba
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper)
Alibaba
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper)
Alibaba
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper)
Alibaba
- Controllable Multi-Interest Framework for Recommendation (Paper)
Alibaba
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper)
Alibaba
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper)
Alibaba
- Session-based Recommendations with Recurrent Neural Networks (Paper)
Telefonica
- How 20th Century Fox uses ML to predict a movie audience (Paper)
20th Century Fox
- Deep Neural Networks for YouTube Recommendations
YouTube
- Personalized Recommendations for Experiences Using Deep Learning
TripAdvisor
- E-commerce in Your Inbox: Product Recommendations at Scale (Paper)
Yahoo
- Powered by AI: Instagram’s Explore recommender system
Facebook
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2)
Netflix
- Learning a Personalized Homepage
Netflix
- Artwork Personalization at Netflix
Netflix
- To Be Continued: Helping you find shows to continue watching on Netflix
Netflix
- Calibrated Recommendations (Paper)
Netflix
- Marginal Posterior Sampling for Slate Bandits (Paper)
Netflix
- Food Discovery with Uber Eats: Recommending for the Marketplace
Uber
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
- How Music Recommendation Works — And Doesn’t Work
Spotify
- Music recommendation at Spotify
Spotify
- Recommending Music on Spotify with Deep Learning
Spotify
- For Your Ears Only: Personalizing Spotify Home with Machine Learning
Spotify
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months
Spotify
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper)
Spotify
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper)
Spotify
- The Evolution of Kit: Automating Marketing Using Machine Learning
Shopify
- Using Machine Learning to Predict what File you Need Next (Part 1)
Dropbox
- Using Machine Learning to Predict what File you Need Next (Part 2)
Dropbox
- Personalized Recommendations in LinkedIn Learning
LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)
LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)
LinkedIn
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED)
LinkedIn
- Building a Heterogeneous Social Network Recommendation System
LinkedIn
- How TikTok recommends videos #ForYou
ByteDance
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper)
Twitter
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper)
Twitter
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper)
Google
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper)
Google
- Self-supervised Learning for Large-scale Item Recommendations (Paper)
Google
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper)
Google
- Personalized Channel Recommendations in Slack
Slack
- Learning to Rank Recommendations with the k-Order Statistic Loss (Paper)
Google
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper)
ByteDance
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper)
Tencent
- Using AI to Help Health Experts Address the COVID-19 Pandemic
Facebook
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper)
Home Depot
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper)
IKEA
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper)
Pinterest
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
Pinterest
- Multi-task Learning for Related Products Recommendations at Pinterest
Pinterest
- Improving the Quality of Recommended Pins with Lightweight Ranking
Pinterest
- Advertiser Recommendation Systems at Pinterest
Pinterest
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity
DoorDash
- How We Built a Matchmaking Algorithm to Cross-Sell Products
Gojek
- On YouTube's Recommendation System
YouTube
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code)
Amazon
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper)
Amazon
- Semantic Product Search (Paper)
Amazon
- QUEEN: Neural query rewriting in e-commerce (Paper)
Amazon
- Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper)
Amazon
- Seasonal relevance in e-commerce search (Paper)
Amazon
- How Lazada Ranks Products to Improve Customer Experience and Conversion
Lazada
- Using Deep Learning at Scale in Twitter’s Timelines
Twitter
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Airbnb
- Applying Deep Learning To Airbnb Search (Paper)
Airbnb
- Managing Diversity in Airbnb Search (Paper)
Airbnb
- Improving Deep Learning for Airbnb Search (Paper)
Airbnb
- Ranking Relevance in Yahoo Search (Paper)
Yahoo
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper)
Etsy
- Learning to Rank Personalized Search Results in Professional Networks (Paper)
LinkedIn
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper)
LinkedIn
- In-session Personalization for Talent Search (Paper)
LinkedIn
- The AI Behind LinkedIn Recruiter Search and recommendation systems
LinkedIn
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs
LinkedIn
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences
LinkedIn
- Understanding Dwell Time to Improve LinkedIn Feed Ranking
LinkedIn
- Ads Allocation in Feed via Constrained Optimization (Paper, Video)
LinkedIn
- Talent Search and Recommendation Systems at LinkedIn (Paper)
LinkedIn
- AI at Scale in Bing
Microsoft
- Query Understanding Engine in Traveloka Universal Search
Traveloka
- The Secret Sauce Behind Search Personalisation
Gojek
- Food Discovery with Uber Eats: Building a Query Understanding Engine
Uber
- Neural Code Search: ML-based Code Search Using Natural Language Queries
Facebook
- Bayesian Product Ranking at Wayfair
Wayfair
- COLD: Towards the Next Generation of Pre-Ranking System (Paper)
Alibaba
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper)
Alibaba
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper)
Alibaba
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search
Alibaba
- Understanding Searches Better Than Ever Before (Paper)
Google
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
- Driving Shopping Upsells from Pinterest Search
Pinterest
- GDMix: A Deep Ranking Personalization Framework (Code)
LinkedIn
- Bringing Personalized Search to Etsy
Etsy
- Building a Better Search Engine for Semantic Scholar
Allen Institute for AI
- Query Understanding for Natural Language Enterprise Search (Paper)
Salesforce
- How We Used Semantic Search to Make Our Search 10x Smarter
Tokopedia
- Powering Search & Recommendations at DoorDash
DoorDash
- Things Not Strings: Understanding Search Intent with Better Recall
DoorDash
- Query Understanding for Surfacing Under-served Music Content (Paper)
Spotify
- How We Built A Context-Specific Bidding System for Etsy Ads
Etsy
- Query2vec: Search query expansion with query embeddings
GrubHub
- Embedding-based Retrieval in Facebook Search (Paper)
Facebook
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper)
JD
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search
Baidu
- Pre-trained Language Model based Ranking in Baidu Search (Paper)
Baidu
- Stitching together spaces for query-based recommendations
Stitch Fix
Embeddings
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper)
Alibaba
- Embeddings@Twitter
Twitter
- Listing Embeddings in Search Ranking (Paper)
Airbnb
- Understanding Latent Style
Stitch Fix
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper)
LinkedIn
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper)
Moshbit
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper)
Sears
- Machine Learning for a Better Developer Experience
Netflix
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code)
Google
- Personalized Store Feed with Vector Embeddings
DoorDash
- Embedding-based Retrieval at Scribd
Scribd
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper)
Yahoo
- How Natural Language Processing Helps LinkedIn Members Get Support Easily
LinkedIn
- Building Smart Replies for Member Messages
LinkedIn
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code)
LinkedIn
- Smart Reply: Automated Response Suggestion for Email (Paper)
Google
- Gmail Smart Compose: Real-Time Assisted Writing (Paper)
Google
- SmartReply for YouTube Creators
Google
- Using Neural Networks to Find Answers in Tables (Paper)
Google
- A Scalable Approach to Reducing Gender Bias in Google Translate
Google
- Assistive AI Makes Replying Easier
Microsoft
- AI Advances to Better Detect Hate Speech
Facebook
- A State-of-the-Art Open Source Chatbot (Paper)
Facebook
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs
Facebook
- Deep Learning to Translate Between Programming Languages (Paper, Code)
Facebook
- Deploying Lifelong Open-Domain Dialogue Learning (Paper)
Facebook
- Introducing Dynabench: Rethinking the way we benchmark AI
Facebook
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code)
Facebook
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper)
Amazon
- How Gojek Uses NLP to Name Pickup Locations at Scale
Gojek
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want
Stitch Fix
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper)
Baidu
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code)
Google
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo)
Salesforce
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code)
Salesforce
- Applying Topic Modeling to Improve Call Center Operations
RICOH
- WIDeText: A Multimodal Deep Learning Framework
Airbnb
- How we reduced our text similarity runtime by 99.96%
Microsoft
- Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models)
Facebook
Sequence Modelling
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper)
Alibaba
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper)
Alibaba
- Deep Learning for Electronic Health Records (Paper)
Google
- Deep Learning for Understanding Consumer Histories (Paper)
Zalando
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper)
Telefonica
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper)
Sutter Health
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper)
Sutter Health
- How Duolingo uses AI in every part of its app
Duolingo
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video)
Facebook
Computer Vision
- Categorizing Listing Photos at Airbnb
Airbnb
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb
Airbnb
- Powered by AI: Advancing product understanding and building new shopping experiences
Facebook
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model)
Facebook
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
Dropbox
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors
Deepomatic
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper)
Google
- Machine Learning-based Damage Assessment for Disaster Relief (Paper)
Google
- RepNet: Counting Repetitions in Videos (Paper)
Google
- Converting Text to Images for Product Discovery (Paper)
Amazon
- How Disney Uses PyTorch for Animated Character Recognition
Disney
- Image Captioning as an Assistive Technology (Video)
IBM
- AI for AG: Production machine learning for agriculture
Blue River
- AI for Full-Self Driving at Tesla
Tesla
- On-device Supermarket Product Recognition
Google
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper)
Google
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper)
Google
- Vision-based Price Suggestion for Online Second-hand Items (Paper)
Alibaba
- Making machines recognize and transcribe conversations in meetings using audio and video
Microsoft
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper)
Alibaba
- Identifying Document Types at Scribd
Scribd
- Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper)
Walmart
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper)
Alibaba
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper)
Alibaba
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper)
Alibaba
- Productionizing Deep Reinforcement Learning with Spark and MLflow
Zynga
- Deep Reinforcement Learning in Production (Part 1) (Part 2)
Zynga
- Building AI Trading Systems
Denny Britz
- Reinforcement Learning for On-Demand Logistics
DoorDash
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments
Netflix
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code)
LinkedIn
- Preventing Abuse Using Unsupervised Learning
LinkedIn
- The Technology Behind Fighting Harassment on LinkedIn
LinkedIn
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper)
Ant Financial
- How Does Spam Protection Work on Stack Exchange?
Stack Exchange
- Auto Content Moderation in C2C e-Commerce
Mercari
- Blocking Slack Invite Spam With Machine Learning
Slack
- Cloudflare Bot Management: Machine Learning and More
Cloudflare
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine
SENER
- Using Anomaly Detection to Monitor Low-Risk Bank Customers
Rabobank
- Fighting fraud with Triplet Loss
OLX Group
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative)
Facebook
- How AI is getting better at detecting hate speech (Part 1) (Part 2) (Part 3) (Part 4)
Facebook
- Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video)
Swedbank, Hopsworks
Graph
- Building The LinkedIn Knowledge Graph
LinkedIn
- Retail Graph — Walmart’s Product Knowledge Graph
Walmart
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper)
Alibaba
- Scaling Knowledge Access and Retrieval at Airbnb
Airbnb
- Contextualizing Airbnb by Building Knowledge Graph
Airbnb
- Traffic Prediction with Advanced Graph Neural Networks
DeepMind
- SimClusters: Community-Based Representations for Recommendations (Paper, Video)
Twitter
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper)
Alibaba
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper)
JPMorgan Chase
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper)
Pinterest
Optimization
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
Uber
- Next-Generation Optimization for Dasher Dispatch at DoorDash
DoorDash
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3)
Lyft
- The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED)
Grab
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning
Thyssen Krupp AG
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper)
Amazon
- Optimizing DoorDash’s Marketing Spend with Machine Learning
DoorDash
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper)
Rakuten
- Information Extraction from Receipts with Graph Convolutional Networks
Nanonets
- Using Machine Learning to Index Text from Billions of Images
Dropbox
- Extracting Structured Data from Templatic Documents (Paper)
Google
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video)
Amazon
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper)
Alibaba
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
- Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM
Generation
- Better Language Models and Their Implications (Paper)
OpenAI
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI
- Image GPT (Paper, Code)
OpenAI
- Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar
- Unit Test Case Generation with Transformers
Microsoft
Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google
- The Machine Learning Behind Hum to Search
Google
Validation and A/B Testing
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google
- Twitter Experimentation: Technical Overview
Twitter
- Experimenting to Solve Cramming
Twitter
- Building an Intelligent Experimentation Platform with Uber Engineering
Uber
- Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber
- Under the Hood of Uber’s Experimentation Platform
Uber
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber
- Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka
- Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix
- Experimentation with Resource Constraints
Stitch Fix
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better
- It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix
- Computational Causal Inference at Netflix (Paper)
Netflix
- Key Challenges with Quasi Experiments at Netflix
Netflix
- Interpreting A/B Test Results: False Positives and Statistical Significance
Netflix
- Interpreting A/B Test Results: False Negatives and Power
Netflix
- Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook
- Detecting Interference: An A/B Test of A/B Tests
LinkedIn
- Making the LinkedIn experimentation engine 20x faster
LinkedIn
- Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn
- How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify
- Improving Experimental Power through Control Using Predictions as Covariate
DoorDash
- Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash
- Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash
- Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash
- Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash
- Running Experiments with Google Adwords for Campaign Optimization
DoorDash
- The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
DoorDash
- Spotify’s New Experimentation Platform (Part 1) (Part 2)
Spotify
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google
- Experimentation Platform at Zalando: Part 1 - Evolution
Zalando
- Scaling Airbnb’s Experimentation Platform
Airbnb
- Designing Experimentation Guardrails
Airbnb
- Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab
- Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit
- Building Pinterest’s A/B Testing Platform
Pinterest
- Network Experimentation at Scale (Paper)
Facebook
- Universal Holdout Groups at Disney Streaming
Disney
Model Management
- Runway - Model Lifecycle Management at Netflix
Netflix
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
- ML Model Monitoring - 9 Tips From the Trenches
Nubank
Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber
- How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox
Ethics
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn
Infra
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook
- Elastic Distributed Training with XGBoost on Ray
Uber
MLOps Platforms
- Managing ML Models @ Scale - Intuit’s ML Platform
Intuit
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast
- Big Data Machine Learning Platform at Pinterest
Pinterest
- Real-time Machine Learning Inference Platform at Zomato
Zomato
- Meet Michelangelo: Uber’s Machine Learning Platform
Uber
- Building Flexible Ensemble ML Models with a Computational Graph
DoorDash
- LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft
- "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper)
Coveo
- Core Modeling at Instagram
Instagram
- Open-Sourcing Metaflow - a Human-Centric Framework for Data Science
Netflix
- MLOps at GreenSteam: Shipping Machine Learning
GreenSteam
- Evolving Reddit’s ML Model Deployment and Serving Architecture
Reddit
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google
- Rules of Machine Learning: Best Practices for ML Engineering
Google
- On Challenges in Machine Learning Model Management
Amazon
- Machine Learning in Production: The Booking.com Approach
Booking
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking
- Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank
- Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge
- Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber
- Tuning Model Performance
Uber
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook
- The problem with AI developer tools for enterprises
Databricks
- Maintaining Machine Learning Model Accuracy Through Monitoring
DoorDash
- Building Scalable and Performant Marketing ML Systems at Wayfair
Wayfair
- Our approach to building transparent and explainable AI systems
LinkedIn
- 5 Steps for Building Machine Learning Models for Business
Shopify
Team Structure
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix
- Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix
- Analytics at Netflix: Who We Are and What We Do
Netflix
- Building a Data Team at a Mid-stage Startup: A Short Story
Erik Bernhardsson
- Building The Analytics Team At Wish
Wish
Fails
- 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate
- When It Comes to Gorillas, Google Photos Remains Blind
Google
- An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor
Harrisburg University
- It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI
- A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom
- More in awful-ai