coolbeans201 / GreatTechBlogPosts

A collection of my favorite tech-related blog posts.

Great Tech Blog Posts From Companies

A collection of my favorite tech-related blog posts from companies.

Inspiration for this README comes from @kilimchoi's repo

Note: I am a Data Engineer, so the majority of these links will be data-related. However, there are plenty of general ones in here as well.

Links

Adobe

Taking Query Optimizations to the Next Level with Iceberg - As someone looking to learn more about Iceberg, I found this a good primer on how to optimize queries

Iceberg at Adobe - A good overview on how Iceberg is used at a huge company

What Is CI/CD² (CI/CD Squared)? Continuous Integration and Continuous Delivery - Continuous destruction isn't a concept we think about often, but it's definitely a useful point to consider

Adobe Customer Journey Management’s Journey into the World of GitOps - It's great to see more and more companies embracing GitOps

Airbnb

Data Quality at Airbnb - The gold standard for how data quality should be approached.

Visualizing Data Timeliness at Airbnb - Having the insight to properly track SLAs is helpful for the operational side of things.

Achieving Insights and Savings with Cost Data - Cost dashboards are a great way to identify a team's biggest pain points.

How Airbnb Built “Wall” to prevent data bugs - A great framework for upholding data quality.

Metis: Building Airbnb’s Next Generation Data Management Platform - A very good implementation of a data management platform

Riverbed: Optimizing Data Access at Airbnb’s Scale - A very sensible way to combine Lambda and Kappa architectures.

Data Quality Score: The next chapter of data quality at Airbnb - A great way to surface the need for data quality

Personal Data Classification - A good shift-left approach for data governance

Sandcastle: data/AI apps for everyone - A very good approach for bringing life to data applications

Atomic Object

The Benefits of a Spiking Phase in Agile Development - How spiking supports proper planning in Agile.

BBC

Quality engineering for a shared codebase - Quality engineering is always an important topic.

Babbel

Should you move into management as a software engineer? - Crossing over into management from the engineering world is a difficult choice but usually has to be made at some point.

AWS Fargate for Data Engineering - Fargate is still a relevant compute option for big data jobs.

How to Fight Retrospective Fatigue - Retrospectives are an important part of every sprint and should be done properly.

Evolution of Babbel’s data pipeline on AWS: from SQS to Kinesis - A good look at the evolution of a data platform.

Benchling

Evolving to Enterprise-Grade Permissions - The principle of least privilege can never be mentioned enough.

Bigeng

Why the way we look at technical debt is wrong - Tech debt needs to be accepted rather than viewed purely as a negative.

BlackRock

Developer Engagement One Code Review at a Time - A good overview of how to really handle code review in an efficient manner

Domain-Driven Asset Management - Asset management is a discussion that doesn't get a lot of hype but is very relevant in the context of systems like the one at BlackRock

Citizen Developer Cookbook: Python Multiprocessing - I've always wanted to know more about multiprocessing in Python, so this was a helpful tutorial

Telemetry and Observability at BlackRock - A great primer for those wanting to understand more about telemetry and observability

Booking

How Reliability and Product Teams Collaborate at Booking.com - Collaborating with the product team needs to become more commonplace for development teams.

What can Dungeons & Dragons teach us about User Experience? - A very good analogy about UX.

Empowering Data with Design - A post stressing the importance of visualization.

Bumble

Hourglass into Pyramid: how you can improve the structure of your tests - E2E testing needs to stop being an afterthought and become part of the actual development process.

Canva

Service-aligned Data Platform Architecture - A good overview of Canva's data platform and CDC.

Capital One

The evolution of issue tracking - How issue tracking, a key part of staying on top of everything, has evolved at Capital One

Data Profiler: Data Drift Model Monitoring Tool - Data drift is an underappreciated topic sometimes, so this was a good overview of how it can be properly monitored

Serverless Computing Reduces Collaboration Costs - Capital One is really all-in on the serverless revolution, and this is a good explanation of why

5 reasons to use ML for better data quality - A good post on how ML can be used to enhance data quality

Serverless architecture technology - A good overview on different serverless technologies in AWS

So You Want to Be a Techie? - Tech is not only for people from a CS background. Anyone can thrive

How Machine Learning Can Help Fight Money Laundering - Applications of ML in the banking industry

Giving a Fast-Changing Data Ecosystem Room to Grow - A good post on how an always-evolving domain can be given room to grow properly

The “Why” of Inner Source - As someone who's big on inner source, I found this a good explanation of why it's useful

From the CIO’s View: Building a Nimble Learning Organization - Learning done right within an organization

Batch and Streaming in the World of Data Science and Data Engineering - A good overview of both batch and stream processing as it relates to DS/DE

The Journey from Batch to Real-time with Change Data Capture - An explanation of the different technologies that can be used in CDC

CICD and Data - An older post that helps give rise to DataOps

Doing The Hard Things First — Lessons From Our Cloud Journey - The financial industry is usually the last to take on bigger migration efforts, so this was a good overview of Capital One's cloud migration

Scaling to Billions of Requests-The Serverless Way at Capital One - How Capital One properly scales to support all the transactions it handles

Guardrails for AWS Event-Driven Serverless Architectures - Best practices for using serverless technologies in AWS

The 3 R’s of SREs: Resiliency, Recovery & Reliability - Even though this post is about SRE, many of the same principles apply to DRE as well

3 Considerations for Containers & Serverless Compute Options - Things to consider when making the migration to serverless

Serverless Stream Consumers — Common Pitfalls and Best Practices - Best practices for properly ingesting stream data

4 Serverless Myths to Understand Before Getting Started with AWS - A good overview on misconceptions to ignore before getting started with serverless

Embrace the Chaos … Engineering - A good overview on how to properly do chaos engineering

6 Principles of a Well Managed Change - Good principles to consider when it comes to bigger changes

Governance in a DevOps Environment - How to properly integrate governance into DevOps

4 Steps for Pairing the Cloud and DevOps to Improve Resiliency - A very good post on how DevOps can be used to improve overall resiliency of an architecture

https://medium.com/capital-one-tech/focusing-on-the-devops-pipeline-topo-pal-833d15edf0bd - The intersection of Agile and DevOps

Continuous Chaos — Introducing Chaos Engineering into DevOps Practices - How chaos engineering and DevOps can feed off of one another

Continuous Delivery and What the Heck Happened to QA? - Why CD works and the importance of having a QA environment

DevOps is a State of Mind, Not Just a Role - I 100% agree with the premise of this post, as DevOps is a bigger adjustment than just learning a set of principles

No Testing Strategy, No DevOps - A great overview on why proper testing is needed to successfully pull off DevOps

The Mon-ifesto Part 2: Alerting and Graphing - Proper alerting/graphing principles in application monitoring

The Mon-ifesto Part 3: Alert Response and Post-Mortem - Postmortems are an underappreciated aspect of monitoring and incident management, but they're very relevant to helping ensure that issues don't recur in the future

Chick-fil-A

Enterprise Architecture at Chick-fil-A - A great view of how important architecture can be in the restaurant industry.

Site Reliability Engineering at Chick-fil-A - An interesting overview of SRE with CFA.

Decentralized Model Ops Platform w/ Apache Airflow - A good overview of how Airflow is used to power MLOps at CFA.

Clever

Defining Clever’s Engineering Culture - The principles mentioned in this engineering culture should be the standard.

Cloudbees

DevOps Best Practices: Opinionated Software That Drives a Successful DevOps Culture - A solid collection of best practices when it comes to DevOps.

DevOps Has Evolved Beyond Shift Left - Shift left isn't enough when it comes to DevOps and other key practices. Time to think bigger.

CockroachDB

What is data partitioning, and how to do it right - Partitioning can make or break data applications, so it's important to know how to set it up properly.

Codelitt

Establishing Communication - A good post on the importance of communication in effective graphic design.

Coinbase

Databricks cost management at Coinbase - A good overview on various Databricks cost-savings practices, many of which I've implemented with successful outcomes.

Commercetools

Product & Tech — Better Together! - Another post stressing the importance of the interaction between tech and product teams.

Staff and Principal Engineers: why do we need them now? (Part 2) - Principal/staff engineers are paramount to larger organizations, so this was a good summary of how they made a difference at Commercetools.

How we Roadmap in 2021 - Effective roadmapping makes the Agile process a lot easier than it tends to be.

It’s done! Or is it? - A proper definition of done removes the guesswork from sprint planning.

Compass

Building Great Products at Compass IDC - A good guide on what it takes to build "great" products.

Repositories — One or Many - This is worth more of a discussion than you think. The monorepo vs. multiple repo debate can have cascading effects depending on which approach you choose.

The Engineering Manager Guide: Spinning Up a Results Oriented Team - Teams need to be focused on impact/results. This was a great overview on the practices needed to get there.

Writing Good Commit Messages - Leaving this as more of a reminder to myself as my commit messages are lazy (and occasionally involve curse words)

Confluent

Is Apache Kafka a Database? With ksqlDB, Most Definitely - A good overview of ksqlDB and how it compares to traditional databases.

Credit Karma

Effectively attending a tech conference - Making the most out of a tech conference is harder than it seems.

Credit Karma Data Explorer - A good overview of how Credit Karma is making data discovery easier.

How Engineering Rotation Programs Can Help Teams Scale - As someone who started out in a rotational program, I can say these can be really effective for developing the future of your workforce.

Criteo

How Much Can Bad Data Cost Us? - Bad data has many side effects, and that is why data quality is a fight you must always take on.

DataDoc — The Criteo Data Observability Platform - A wonderful overview of an effective data observability platform and how it met all of its use cases.

Technical Data Roadmap: Why and how to build it using a maturity matrix? - An effective technique for successful roadmapping.

Big Data Quality at Criteo - You can never mention data quality enough.

Data Governance at Criteo - A good overview of how an effective data governance process can be set in place.

dbt

The Analytics Development Lifecycle (ADLC) - A good summary of how SDLC can be applied to analytics

D2iQ

All Together Now: FinOps, Kubernetes, and Platform Engineering - Applying FinOps to Kubernetes savings

The Best Way to Control Kubernetes and Cloud Costs - More FinOps with Kubernetes

Dagster

Dagster at all 5 Steps of the Development Lifecycle - Having an orchestrator involved at all steps of the development lifecycle is really helpful for data engineers. This is how Dagster tries to accomplish just that.

Declarative Scheduling for Data Assets - Declarative scheduling just makes sense when it comes to managing data assets.

Partitions in Data Pipelines - Thinking in partitions helps set Dagster apart from its competitors

What Dagster Believes About Data Platforms - "Data engineering is software engineering". Preach!

Balancing the Data Scales: Centralization vs. Decentralization - Centralization vs. decentralization is a never-ending topic, but this is a good primer on the pros and cons of each.

How to Make Data a Team Sport - Data democratization can be a big challenge at organizations, but when it's done well, it helps everyone.

Data Visibility -- A Primer - Data quality, lineage, and more are all key to proper data visibility.

DataKitchen

Data Observability and Monitoring with DataOps - A very useful overview of DataOps and a platform that supports it

Use DataOps With Your Data Mesh to Prevent Data Mush - A good post on how DataOps and data mesh are intertwined

Deliveroo

The Emergence and Evolution of Analytics Engineering at Deliveroo - Analytics engineering is definitely on the rise as of late, so this is a good introduction to what exactly that is

CloudFormation To Terraform - I've always been a proponent of Terraform, and it was good to see that Deliveroo agrees with me

Data Sink - An application of data sinks

Discord

How Discord Stores Billions of Messages - Discord is one of the fastest growing communities out there, so it's interesting to see how they manage to hold onto all those messages

How Discord Creates Insights From Trillions Of Data Points - Dealing with a lot of data isn't easy, so we can all take a page from Discord

How Data Science Informs Strategy Innovation At Discord - A good post about how relevant Data Science needs to be in the grand scheme of things

How Discord Stores Trillions of Messages - An impressive view of how Discord has continued to grow and managed to sustain that growth

How Discord Uses Open-Source Tools for Scalable Data Orchestration & Transformation - A good overview of how Discord moved their orchestration platform to Dagster and dbt

DoorDash

Building a Source of Truth for an Inventory with Disparate Data Sources - Bringing together a single source of truth in a massive organization is definitely a challenge

Meet Sibyl – DoorDash’s New Prediction Service – Learn about its Ideation, Implementation and Rollout - I've always been impressed by the role ML plays in the food service industry, so this was a cool implementation from DoorDash

Lifecycle of a Successful ML Product: Reducing Dasher Wait Times - Another good overview of the role that ML plays in the food service industry

Ship to Production, Darkly: Moving Fast, Staying Safe with ML Deployments - DevOps meets ML

Organizing Machine Learning: Every Flavor Welcome! - A solid set of principles for growing ML

Using Metrics Layer to Standardize and Scale Experimentation at DoorDash - As someone wanting to know more about the metrics layer, I found this a great post with a very detailed overview

How DoorDash Defines Great Engineering Management - I love the transparency behind how DoorDash wants to deliver on their management practices. This is a good template to follow.

How DoorDash Fosters Meaningful Engineering Career Development - A great model to follow for engineering development.

Five Common Data Quality Gotchas in Machine Learning and How to Detect Them Quickly - A good primer on a proper data quality framework

Doximity

Finding Joy in Git Conflict Resolution - A cool way to make merge conflicts a lot easier

Data Science & Analytics: Practitioner Insights - A good set of principles for bringing the most out of data

Stars and Dimensions - For those who want to learn more about data modeling, this post has a nice refresher

Dropbox

Balancing quality and coverage with our data validation framework - Data validation frameworks are very helpful, and I like the way Dropbox has implemented theirs

Why we chose Apache Superset as our data exploration platform - Superset seems to fly under the radar from time to time but is always a dependable choice

Lessons learned in incident management - On-call can be a messy process sometimes, but Dropbox has made it much more streamlined

Entelo

Why You Should Make Everyone a Project Lead - Leading a project is a great way to further your leadership skills

Etsy

Towards Machine Learning Observability at Etsy - A good overview on how Etsy is keeping all their ML models within scope

Expedia

Software Architectural Patterns in Data Engineering - A helpful explanation of the software patterns underlying all our popular DE tools

Rethinking Data Visualization - Product thinking applied to data visualization

Unified Machine Learning Platforms At Expedia Group - Awesome overview of Expedia's ML journey

The Importance of Being a Code Reviewer - A good set of practices to follow when it comes to code review.

Facebook/Meta

Enabling static analysis of SQL queries at Meta - A really neat overview of how FB handles SQL linting, amongst other things

Move faster, wait less: Improving code review time at Meta - FB's code review, especially considering it's a monorepo, is extremely well done

Tulip: Schematizing Meta’s data platform - Logging is very important to FB, so this is good insight into how that performance is maintained

Scaling data ingestion for machine learning training at Meta - I didn't necessarily understand everything here, but this was an interesting read nonetheless

Improving Meta’s SLO workflows with data annotations - Annotations can certainly give more insight into observability

Introducing Zelos: A ZooKeeper API leveraging Delos - Interesting overview of how FB plans on moving from ZooKeeper to something better suited to their scale

BellJar: A new framework for testing system recoverability at scale - Recovering from an outage can't be easy for something the scale of FB, so this was a good overview of how they accomplish it

SLICK: Adopting SLOs for improved reliability - FB's monitoring is top-notch, and it's overviews like these that show why

Nemo: Data discovery at Facebook - FB's data discovery, speaking from personal experience, is immensely impressive

Aria Presto: Making table scan more efficient - Table scans are a painful activity, so making them more efficient carries a lot of weight in SQL engines

Getafix: How Facebook tools learn to fix bugs automatically - Obviously treading into dangerous territory, but automating bug squashing could be very useful for many places

Migrating Messenger storage to optimize performance - How a service the size of Messenger is able to stay afloat

Rapid release at massive scale - DevOps applied to FB

Facebook Chef cookbooks - An older post, but a look at how Facebook puts CI/CD to use

Engineering Culture: Code ownership - Code ownership is certainly a debatable topic

Scaling Mercurial at Facebook - The Mercurial monorepo at FB is gigantic, so this is an interesting insight into how it's actually serving the thousands of engineers who work on it.

Presto: Interacting with petabytes of data at Facebook - Presto laid the foundation for what's Trino now, so understanding how Presto is as efficient as it is will help explain Starburst Galaxy and the like

Join Optimization in Apache Hive - Older article, but join optimizations in Hive are still a relevant topic

Scaling Out - An earlier post before FB was the FB we know today, but still a good lesson to be learned

Scheduling Jupyter Notebooks at Meta - A bit specific to Meta due to Bento not being open-source, but good principles nonetheless

Data engineering at Meta: High-Level Overview of the internal tech stack - Best thing you'll ever read on comparing the Meta DE tech stack to an open-source one

The future of the data engineer — Part I - A great read on the future of DE

Four Analytics Best Practices We Adopted — and Why You should Too - Good practices to follow for a successful analytics implementation

Analytics Career Development at Meta - What career advancement at Meta looks like

Automating data removal - A good system to remove data with reduced risk

What it takes to be a Senior IC at Meta - A good breakdown of senior vs. other levels of IC

Composable data management at Meta - A good introduction to setting up a composable data stack, which is becoming more and more relevant

Funding Circle

How we manage documentation at Funding Circle for our Data Platform - A great guide on properly handling documentation

Future Processing

Data science and data analytics – know the difference - These terms sometimes are used interchangeably but have their differences, so it's important to distinguish them

7 common Big Data security issues - Security is sometimes an afterthought when it comes to big data, so it's important to be aware of the various issues you may encounter while setting up these applications

Fynd

Introducing Developer-less Data Workbench — Making business analysts, Masters of the data! - Considering this was written in 2015, this is an impressive overview of data enablement with automation

Gamechanger

Apache Airflow on AWS ECS - Many different implementations of Airflow are available, but I haven't seen too many leveraging ECS before

Let me automate that for you - Obviously, we want automation wherever we can have it, so this was a simple walkthrough of how it's done at Gamechanger

Data Interruption Process - This is an unusual name for on-call, but it's an effective (albeit older) approach nonetheless

What Good Engineers Do - A solid set of principles for what makes a good engineer

Grab

How we store and process millions of orders daily - For those who want to know more about DynamoDB, this is helpful

Embracing a Docs-as-Code approach - Documentation is an often overlooked area, but this is a good approach to making sure it remains a chief priority

Real-time data ingestion in Grab - How food service handles real-time data ingestion

Trident - Real-time Event Processing at Scale - I was not too familiar with IFTTT (if this, then that) design before, so this was an interesting read

Gusto

The Accidental Tech Lead - Growing into being a tech lead, which can sometimes happen by accident just on account of experience

Cultivating Engineering Growth - Good tips on how to enable engineers for success through mentorship

Haptik

Kubernetes Production Best Practices - Part I - For those using Kubernetes in their workflows, a solid set of best practices

Hashnode

How to Build Event-Driven Architecture on AWS - A good tutorial on setting up event-driven architectures in AWS, including the different routes that can be taken

Heap

How I Learned to Stop Worrying and Love Tech Debt - Framing tech debt items as "papercuts" is a reasonable way to pull them into planning

HelloFresh

How HelloTech’s working and knowledge sharing culture supports a company on scale - Companies with good knowledge sharing cultures are the ones whose employees succeed the most IMO

SLOs for everyone with Sloth - A very well-detailed explanation of how HelloFresh has full-scale monitoring for their SLOs in place

How HelloFresh establishes Data Quality with an in-house tool - A very nice implementation of data quality and attempting to shift left with it as well

Data driven Snowflake optimisation at HelloFresh - It's no secret in the DE world that Snowflake can be expensive. A good guide on how to tune down those costs.

Helpshift

Building a Data warehouse with Hive at Helpshift — Part 1 - A little more outdated, but still a useful overview of how you can build a warehouse with Hive as your backbone

Instacart

Building for Balance - A very thorough overview of how Instacart finds the balance between fast deliveries and high-earning opportunities for their drivers

The Next Era of Data at Instacart - Good post on the future of the data org at Instacart

Adopting dbt as the Data Transformation Tool at Instacart - Good to see bigger companies starting to adopt dbt

Intuit

Democratizing AI to Accelerate ML Model Development in Weeks vs. Months - A good overview on how the ML development process has been sped up at Intuit

How to Ensure Release Candidates are Good2Go? Automated Performance Pipelines. - Proper performance testing as a part of the CI/CD process is not something that's done enough, but this is a good set of principles to employ to accomplish that

LINE

A story of introducing data lineage into LINE's large-scale data platform - A good implementation of lineage in many different capacities at LINE

LinkedIn

Super Tables: The road to building reliable and discoverable data products - LinkedIn's overview of "super tables" helps bring out the best in their data products

Open Sourcing Venice – LinkedIn’s Derived Data Platform - An impressive data platform implementation from LinkedIn

Scalable Automated Config-Driven Data Validation with ValiData - A nice way to automate data validation

LakeChime: A Data Trigger Service for Modern Data Lakes - A great idea of how to ingest data as soon as it's available

Lyft

Securing Apache Airflow UI With DAG Level Access - DAG-level access may be the next evolution of Airflow UI access combined with RBAC

Open Sourcing Amundsen: A Data Discovery And Metadata Platform - Amundsen is becoming a very popular data discoverability platform, and for good reason

Running Apache Airflow At Lyft - Lyft is one of the big "power" users of Airflow, and their model can serve as a template for many

Big Savings On Big Data - A nice overview of how Lyft managed to bring down their data processing costs

Gotchas of Streaming Pipelines: Profiling & Performance Improvements - Good tips on optimizing streaming pipelines

From Big Data to Better Data: Ensuring Data Quality with Verity - A very thorough overview of a great data quality platform

ETA (Estimated Time of Arrival) Reliability at Lyft - A thorough overview of how Lyft tries to calculate ETA

McDonald's

Searching for quality and speed? Observability can help - How observability is helping McDonald's development move quickly

A single source of truth: Building a design system library - This provides a good template for those who want to ensure they provide a consistent user experience

Proactive monitoring: The why, what and how - Proactive monitoring helps prevent bigger incidents from ever arising. It's the best way to pull off proper monitoring.

Miro

Data Products Reliability: The Power of Metadata - A good overview of how Miro is implementing data contracts

Netflix

Navigating the Netflix Data Deluge: The Imperative of Effective Data Management - A great post on how Netflix manages storage costs at scale

ETL development life-cycle with Dataflow - A very good overview of the E2E ETL process with Dataflow at Netflix

New York Times

Congrats, You’re On Call! Now What? - How to effectively handle an on-call rotation

Nextdoor

Engineering Principles (v1) at Nextdoor - A gold standard for engineering principles

NextRoll

Coordinated Cost Savings - Cost savings is a team effort and takes a village, as this post details

PayPal

The next generation of Data Platforms is the Data Mesh - A very solid explanation of why data mesh is needed in data platforms

Next-Gen Data Movement Platform at PayPal - Lots of parts in play, but detailed insight into everything that drives PayPal

The Journey of Metadata at PayPal - Bringing data ownership and discoverability to the masses at PayPal

Gimel: PayPal’s Analytics Data Processing Platform - The coolest part of this blog was realizing Romit now works in a related team at Disney :). But this platform is certainly impressive nonetheless.

Pinterest

Improving efficiency and reducing runtime using S3 read optimization - Reducing runtime with S3 reads is every data engineer's dream

How Pinterest runs Kafka at scale - A good overview on how Kafka can be effectively scaled within an organization

Postman

How (and Why) Postman Created a Data-Driven Hiring Process - I have never really liked the interviewing process, on both sides. Postman has a good model in place here.

The Postman Data Team’s Hub-and-Spoke Model - A good explanation of how the hub-and-spoke model works for Postman and its data teams

How Postman Does Data Democratization - A very thorough overview of how Postman enhances their data with proper democratization

Quora

Trino at Quora Scale: Cost, Speed, and Reliability - For those using Trino/Presto, a good overview on how it's done in a larger environment

REA Group

Accelerating experimentation with MLOps - A great resource for those who want to know more about best practices in MLOps

Data Science: Principles for Success - A solid set of principles for enabling success in a Data Science team

Data Discovery - A sensible implementation of Amundsen

Reflections On Designing An Enterprise Data Warehouse - Tips on how to design an effective data warehouse

The Ops Dojo - I'm all for the term "dojo" to better describe more of what we need to be doing

Shopify

The 25 Percent Rule for Tackling Technical Debt - 25% allotment for tackling technical debt would be a dream, but Shopify raises a very valid point on why it's necessary

The Hardest Part of Writing Tests is Getting Started - A very truthful title. TDD is needed but actually getting to that initial state can be a challenge.

How Good Documentation Can Improve Productivity - As a strong believer in good documentation, I couldn't agree more

Three Essential Remote Work Practices for Engineering Teams - Some of these are easier said than done, but very much true for remote work these days

Reducing BigQuery Costs: How We Fixed A $1 Million Query - Good tips on how to keep your costs low

A Software Engineer's Guide to Working Across Time Zones - As someone who works on a team with teammates halfway around the world, very relatable points

How to Structure Your Data Team for Maximum Influence - The "Diamond Defense" is not one I've ever seen before, but it makes sense as a team structure

On the Importance of Pull Request Discipline - Good practices to follow for raising PRs

When Culture and Code Reviews Collide, Communication is Key - More relevant points than you might think

Six Tips for Staying Technical as a CTO - My fear when getting into management is not being technical, so it's cool to see this advice on how to "stay in the game"

5 Steps to Bounce Back from a Negative Performance Review - A bad performance review isn't the end of the world. It provides an opportunity to really grow.

Lessons Learned From Running Apache Airflow at Scale - Shopify has a good model in place for running Airflow

Asynchronous Communication is the Great Leveler in Engineering - Asynchronous communication is absolutely necessary in our current state of work

Data Is An Art, Not Just A Science—And Storytelling Is The Key - Absolutely agree with the title here. Telling a story with data is critical.

The Magic of Merlin: Shopify's New Machine Learning Platform - Merlin is a very cool implementation of ML

A Data Scientist’s Guide To Measuring Product Success - Good tips on how to better enable product success

Using Terraform to Manage Infrastructure - As a big Terraform proponent, this is a good overview on how Shopify is using it

Shopify's Playbook for Scaling Machine Learning - A good model to follow for ML

Search at Shopify—Range in Data and Engineering is the Future - A great post on why range is necessary for future development

Shopify’s Unique Data Science Hierarchy Of Needs - Shopify has a good model in place here for Data Science

Five Tips for Growing Your Engineering Career - A good set of tips for elevating your career

The AWARE Development Plan - A very good acronym to follow for a successful career

5 Steps for Building Machine Learning Models for Business - Good tips on getting ML into the picture

Modelling Developer Infrastructure Teams - A good explanation of the difference between horizontal and vertical teams

Bridging the Gap Between Developers and End Users - Very good tips on how to bring product and tech closer together

A Guide to Running an Engineering Program - Not sure if I'll ever get to this stage, but this seems like a very sensible guide if that day ever comes

Other Driven Developments - Developments we'd never think about, but they're totally out there

How I Define My Boundaries to Prevent Burnout - Good tips here, including some I honestly need to follow more

4 Tips for Shipping Data Products Fast - As someone who works with data products, I can attest that following these will make things go much smoother

How to Make Dashboards Using a Product Thinking Approach - Good principles to follow for getting the most out of dashboards

How to Reliably Scale Your Data Platform for High Volumes - I feel like this isn't used as often as it should be, but it totally makes sense for making sure platforms scale

Software Release Culture at Shopify - This should set a standard for proper release culture

Great Code Reviews—The Superpower Your Team Needs - Good practices to follow for successful code reviews

Successfully Merging the Work of 1000+ Developers - A good set of proper CI standards

How Shopify Scales Up Its Development Teams - I very much agree with the points listed here on upleveling your team

Five Common Data Stores and When to Use Them - For those who need to evaluate which type of data store to go with, this is a good reference

Implementing ChatOps into our Incident Management Procedure - I very much agree with the role of ChatOps in incident management

Code Style Consistency for Shopify’s Decade-Old Codebase - Code style is something that I try to preach and uphold on our team

Why Shopify Moved to The Production Engineering Model - Having a model in place like this makes everyone's lives easier

Developer Onboarding at Shopify - Proper onboarding can make a world of difference for engineers, and it seems like Shopify has it down pat

Unlocking Real-time Predictions with Shopify's Machine Learning Platform - Very well-done explanation of how Merlin is being used at scale today

What Being a Staff Developer Means at Shopify - Being a staff developer is considered the ultimate rank, but what does it take to get there? This is a helpful guide to getting to that point.

Sky Betting and Gambling

Team Size and Why It Matters - A good breakdown of how smaller vs. bigger teams differ

Skyscanner

Automating cloud governance at scale - For those who work in governance, this is a good way to keep guardrails on resource provisioning

Using engineering principles to create autonomous teams at scale - A good set of principles for ensuring teams are successful

Monoliths and Microservices - How to move away from monoliths to microservices

Slack

BuildRock: A Build Platform at Slack - Proper CI/CD platforms help unblock many teams, so it's imperative to do it right

Infrastructure Observability for Changing the Spend Curve - Generally, it's not CI infrastructure that hogs costs, but it's always good to be aware of everything

Data Lineage at Slack - Effective implementation of data lineage, especially with Slack notifications involved

How We Design Our APIs at Slack - For those interested in API design, this is a good set of principles to follow

Starting an Initiative - Finding impact can be difficult at first, but persistence is key

How Big Technical Changes Happen at Slack - Good discussion on telling when the hype is real and when to join the trend

Deploys at Slack - A very solid CI/CD implementation

Disasterpiece Theater: Slack’s process for approachable Chaos Engineering - Chaos engineering helps keep websites like Slack up around the clock

Data Wrangling at Slack - An older article, but an effective implementation for data wrangling

Data Consistency Checks - An older article, but still covers valuable points related to data quality

Service Delivery Index: A Driver for Reliability - For those in SRE, this is a good primer.

Executing Cron Scripts Reliably At Scale - A bit strange not to see Slack using a service like Airflow to handle all of this, but a good overview nonetheless.

Unlocking Efficiency and Performance: Navigating the Spark 3 and EMR 6 Upgrade Journey at Slack - A walkthrough of how Slack upgraded all of their processes to use more recent versions of EMR/Spark.

Slalom

Cloud Trends: A Mainstream Evolution to DataOps - A good overview on the relevance of DataOps in this current era

The Building Blocks of Success: Is Data Mesh Right for My Organization? - Data mesh is (rightfully) a buzzword right now, but that doesn't mean it's for everyone. This is a good guide on when data mesh makes sense.

Data Is Everywhere. Is Yours Under Control? - A good post on the relevance of data governance

Data Modelling is More than Documentation - A good explanation of the different types of data models

Deconstructing Data Mesh Principles - A good overview on the different key principles of a data mesh

Data Mesh: is the argument a strawman? - A post battling the hype of data meshes

Building a Culture of Data and Insights - A nice overview of how to enable a data-driven culture

SoundCloud

Building a Healthy On-Call Culture - Tips for helping ensure a smooth on-call process

How (Not) to Build Datasets and Consume Data at Your Company - An effective approach towards ensuring healthy data usage

Getting a Team Back on Track - This is an underdiscussed topic that should be mentioned more. A helpful set of tips for helping keep teams afloat amidst change.

A Better Model of Data Ownership - A helpful definition of what exactly ownership means in relation to data

Spotify

Why We Switched Our Data Orchestration Service - Flyte isn't necessarily at the level of Airflow or Prefect yet, but Spotify's explanation of why they're switching makes total sense

Achieving Team Purpose and Pride with Scrum - Getting the most out of scrum, done the right way

Managing Clouds from the Ground Up: Cost Engineering at Spotify - We all could benefit from a dashboard tool like this (and many companies are now realizing how relevant it is)

How We Improved Data Discovery for Data Scientists at Spotify - A very thorough overview of how Spotify has implemented data discovery

TC4D: Data Quality By Engineers, For Engineers - A fun initiative for bringing out the best in testing

Qualities of Quality - A very solid set of principles for holding up quality

Analytics at Spotify - An old post, but that only goes to show how much Spotify embraces data

Agile à la Spotify - You don't see many places rewriting the Agile manifesto, but the principles Spotify's outlining make sense

Fleet Management at Spotify (Part 1): Spotify’s Shift to a Fleet-First Mindset - Maintaining a lot of components is extremely difficult, but Spotify makes it look easy with this approach.

Getting More from Your Team Health Checks - How to get the most out of your team health/pulse checks, something that's not done enough.

Data Platform Explained - A data platform at a company that handles data like Spotify is bound to be interesting. I look forward to the continuation of this series.

Data Platform Explained Part II - A continuation of the data platform series

Unlocking Insights with High-Quality Dashboards at Scale - A good set of criteria for high-quality dashboards

Are You a Dalia? How We Created Data Science Personas for Spotify’s Analytics Platform - Persona usage for making sure a platform is built appropriately is a smart model

Squarespace

Creating a Code Review Culture, Part 1: Organizations and Authors - Good tips on how to more effectively put code together for review

Creating a Code Review Culture, Part 2: Code Reviewers - Good tips on how to be an effective code reviewer

Data Traceability and Lineage - A bit older, but it sets the foundations for effective data lineage

Stack Overflow

Why Devs (Should) Like Estimates - Good tips on how to more effectively estimate when it comes to planning

A Culture of Trust - Trust is one of the most important things you need to have within a team, and I totally agree with Stack Overflow's discussion on it

Developer Turned Manager - A good retrospective on transitioning from development to the management side of things

Stitch Fix

Migrating Spark from EMR on EC2 to EMR on EKS - EKS is the "new" standard for Spark processing, so this is a helpful tutorial on moving Spark from EC2 to EKS

Aggressively Helpful Platform Teams - "Aggressively helpful" is exactly what platform teams need to be in order to better enable success within an organization

Stride

What is DevOps? - A well done primer on DevOps

Creating Core Values that Actually Stick - Core values are often brushed away, but the organizations that really put time and effort into them are the ones that stand out amongst the crowd

Target

Chaos Leads to Resilience - Chaos engineering can better protect your system in the long run, so it's cool to see how Target is preparing themselves for those scenarios

Review Scrutiny - Code review etiquette is an underappreciated topic but a good one to go back to from time to time

Thoughtworks

Making the data dream a reality - The origins of data mesh and how it can better enable data-driven thinking

Timescale

Database Management: Behind-the-Scenes Lessons From a Data Architect - For those who want to learn more about data centers and the ins and outs of big data, this is definitely a good post

Toptal

Big Data Architecture for the Masses: A ksqlDB and Kubernetes Tutorial - A good overview of ksqlDB

Trivago

SRE: On-Call Procedure at trivago - On-call procedures would be a lot better for everyone if they followed how Trivago is doing it

Remastering Guilds After Five Years - Guilds are a great way to bring out more collaboration within an organization

Creating a Culture of Quality - A good post on proper quality when it comes to CI/CD

Technical Decision-Making - A good guide to help standardize the technical decision-making process

What Have I Even Been Doing Today? - How to come to terms with moving from an IC into a management role

Twitch

Twitch Engineering: An Introduction and Overview - Older post, but still a cool overview of how Twitch is set up

Twitter

Data Quality Automation at Twitter - For those using Great Expectations, this is an effective look at how Twitter is doing it

Powering real-time data analytics with Druid at Twitter - Druid may not be the most relevant platform anymore, but it's cool to see how Twitter is using it to power their use cases

Next generation data insights using natural language queries - This implementation of Qurious looks really, really cool

Advancing Jupyter Notebooks at Twitter - Part 1 - How Twitter leverages Jupyter notebooks for true data-driven analysis

Processing billions of events in real time at Twitter - 400 billion events per day is insane, so seeing how Twitter is able to do it under the hood is very interesting

Kafka as a storage system - You don't really think of Kafka being used for storage, but Twitter seems to have done it effectively

Building Twitter’s ad platform architecture for the future - An AdServer per product is a lot, but it definitely enables better scaling

Democratizing data analysis with Google BigQuery - A very sensible approach to proper data democratization at Twitter

Interactive Analytics at MoPub: Querying Terabytes of Data in Seconds - An effective use of Druid and microservices to power interactive analytics

ZooKeeper at Twitter - Similar to FB, a detailed breakdown on how a big platform is using ZooKeeper to stay afloat

Productionizing ML with workflows at Twitter - How Twitter uses Airflow to solve their ML use cases

Using Deep Learning at Scale in Twitter’s Timelines - This is a really cool overview of how deep learning is used to power what we see on our Twitter timelines

The Infrastructure Behind Twitter: Scale - This is a lot of context on how Twitter manages to scale, and you know it's only gotten more complex since then

Discovery and Consumption of Analytics Data at Twitter - A pretty detailed discussion on data discovery, especially given that this was in 2016

Uber

Introducing WorkflowGuard: The Workflow Governance and Observability System That Oversees over 120,000 Data Workflows - Automated tools like these will become more of a reality, especially in the larger organizations

Crane: Uber’s Next-Gen Infrastructure Stack - The future of big data processing at Uber

Cost Efficiency @ Scale in Big Data File Format - A bit advanced, but a nice overview on how Uber keeps their costs in check

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework - A good implementation of real-time analytics

How Data Shapes the Uber Rider App - A good overview on what role data plays in the Uber app

How Uber Achieves Operational Excellence in the Data Quality Experience - +1 for operational excellence and proper data quality

Continuous Integration and Deployment for Machine Learning Online Serving and Models - How Uber tackles some of their MLOps challenges

Uber’s Journey Toward Better Data Culture From First Principles - I'm a big fan of the principles mentioned in this post

Turning Metadata Into Insights with Databook - A data discovery/observability platform that can be the gold standard for others

Monitoring Data Quality at Scale with Statistical Modeling - Very useful applications of modeling for proper DQM

Uber’s Data Platform in 2019: Transforming Information to Intelligence - A bit outdated by DE standards, but valuable insight into how Uber manages to continue to perform at scale

Solving Big Data Challenges with Data Science at Uber - Fun applications of Data Science within Uber

Managing Uber’s Data Workflows at Scale - Eliminating single points of failure and converging to unified products when possible are very solid principles to consider for larger platforms

Databook: Turning Big Data into Knowledge with Metadata at Uber - Cool overview of how Uber approaches data discovery

Turbocharging Analytics at Uber with our Data Science Workbench - Self-serve analytics platforms like what Uber has built are the backbone of larger organizations

Engineering Data Analytics with Presto and Apache Parquet at Uber - How Uber uses Presto and Parquet for an efficient SQL engine

ETA Phone Home: How Uber Engineers an Efficient Route - An interesting read on how Uber puts routes together

Identifying Outages with Argos, Uber Engineering’s Real-Time Monitoring and Root-Cause Exploration Tool - An earlier but still extremely relevant post on anomaly detection and the role it plays in monitoring

The Pulse of a City: How People Move Using Uber Engineering - For those into data visualization, a nice view into Uber transport in big cities

Evolution of Data Lifecycle Management at Uber - DLM is a very relevant topic these days, especially with an increased focus on costs. How Uber handles it is a good standard to follow.

Dynamic Executor Core Resizing in Spark - OOM errors in Spark are the worst. This is a good method to make that issue easier to deal with.

Attribute-Based Access Control at Uber - Proper access control is tricky when it comes to tables, so this is a good foundation for others to follow.

Announcing Cadence 1.0: The Powerful Workflow Platform Built for Scale and Reliability - There's always more room for workflow engines, so it's cool to see what Cadence can bring to the table.

Sparkle: Standardizing Modular ETL at Uber - I am all for standardizing ETL development wherever it can be. Sparkle seems like a very smart approach.

Preon: Presto Query Analysis for Intelligent and Efficient Analytics - Excellent approach to query optimization/analysis

VTS

Designing for Data - Telling a story with data is underrated

Walmart Global Tech

Rapid & Reliable ML Experiments using MLOps Best Practices. - A good application of MLOps principles

element: Walmart’s Machine Learning Platform - A very good overview of the ML platform that's in place at Walmart

Unsung Saga of MLOps - This is a good set of principles to really kick ML engineering into high gear

MLOps — Is it a Buzzword??? Part -1 - MLOps is more than just a buzzword or a trend. It's a cultural change.

The Importance of Good Data - For those who like to sleep on data quality, this one's for you

Pillars of Walmart’s Demand Forecasting - The pillars used to accomplish proper demand forecasting make sense for any company in the same line of work

DataBathing — A Framework for Transferring the Query to Spark Code - We actually use a similar process to simplify SparkSQL queries. It's good to see others do the same.

Engineering Acceleration with InnerSource Culture - Inner-source culture is a big deal in many companies.

Unified Monitoring of ETL Performance with BumbleBee - A good overview of how to do effective ETL monitoring

Resiliency Through Message-Driven Architecture - A message/event-driven architecture is definitely a sensible choice, depending on the internals of your application

Cloud Native Architecture Fundamentals - A good overview of what it really means to be Cloud-native

Data as a Service - A lot of these same concepts are a part of data products as we know them now

Auditing Airflow Job Runs - Auditing Airflow job runs is crucial as a part of proper observability

The Keystone of Happy Teams - Psychological safety is a great term to use when distinguishing the average team from happy teams

Building a Platform Team — Laying the Foundations - A great overview on how to really set up a proper platform team

Product Management 101: 8 Steps to Design Better Products - Even as engineers, we should be familiar with many of these concepts so we can help our product stakeholders accordingly

The Power of an Invisible Leader - I've never heard the term "invisible leader" before, but it's a sensible one based on the description

Work Got You Stressed? Here Is My Secret To Controlling The Chaos. - A very applicable guide for me, as I struggle with work stress all the time

5 Principles Guaranteed to Help Build a Strong Team Culture - As someone who is big on team culture, I thought this was a great read.

Wayfair

Introducing our Machine Learning and Data Platforms Team - Platform teams enable a lot of success within an organization

Enabling Supplier Sales Through Real-time Data - How real-time data unlocks more potential for Wayfair

Wealthfront

Rolling Back an Airflow Upgrade - Things are never perfect, so this is a good post on how to recover from a failed Airflow upgrade.

WePay

Effective Software Design Documents - Effective design documents are super helpful when iterating on a product

Improving Airflow UI Security - A good model for ensuring proper security within the Airflow UI

Xandr

Knowledge Transfer in Engineering: How to make it go smoothly - Effective KTs are a game changer for engineers, so this is a solid set of principles for better enabling them

Yelp

Spark Data Lineage - I've never really seen Spark being used for lineage purposes, but this is a cool implementation from Yelp on how it can be accomplished

Engineering Career Series: How we onboard engineers across the world at Yelp - Effective onboarding programs make the process so much easier for engineers as they start their journey

Zalando

Growth Engineering at Zalando - Mentoring and role frameworks help enable growth and success for engineers

Accelerate testing in Apache Airflow through DAG versioning - DAG versioning makes complete sense for dealing with testing alongside ongoing production processes leveraging the same DAGs

Principal Engineering at Zalando - A good primer on what principal engineers mean to an organization

A Systematic Approach to Reducing Technical Debt - The concept of a tech debt rotation isn't a bad idea to help keep that area in check

The Product Playbook - The 4 D's is a sensible approach for product design

Four Pillars Of Leading People - Good principles to follow for those in leadership roles.

The Democratization of ‘Data Science As A Service’ - I'm always a fan of promoting data science/engineering as a service

Discovering Design Sprints - Sometimes the war room doesn't have to be such a bad thing

Data Analysis with Spark - For those newer to DE, a basic overview of Spark.

Dedicated Ownership for Teams at Zalon - A good model on team structure

Zapier

Thinking Fast and Estimating Wrong - Estimation never seems to be right when it comes to planning in software development, so I totally agree with the message of this post.

Zillow

Building a Data Streaming Platform - How Zillow Sends Data to its Data Lake - An interesting look at how Zillow combines all its data sources into the lake

Airflow at Zillow: Easily Authoring and Managing ETL Pipelines - Zillow has always had a strong Airflow presence, and this article from 2017 still holds up.

Building a strong foundation to accelerate StreetEasy’s data science efforts - A great post on what it takes to build a data foundation

Zomato

The Deep Tech Behind Estimating Food Preparation Time - An interesting look into the logic behind food service.

Zumba

Learning to Be a Tech Lead - Becoming a tech lead is not a simple change and requires shifting your priorities/frame of reference.
