jdiez / awesome-data-leadership

A curated list of awesome posts, videos, and articles on leading a data team (small and large)

awesome-data-leadership

A curated list of awesome and useful posts, videos, and articles on leading a data team. This includes leadership at the middle-management, Director/VP, or C-suite level, for organizations both big and small. A few relevant engineering management articles are sprinkled in.

Please contribute by opening PRs! ⚡️

Topics (with # of links)

Hiring

Author Title One-sentence summary Year
Eli Goldberg Hire better data scientists: A field guide for hiring managers new to data science. Part 1. Creating better job descriptions brings in better talent. When hiring, highlight the "why you", describe opportunities instead of responsibilities, describe key actions and background experience needed, not technologies, and proofread! 2020
Eli Goldberg Hire better data scientists: A field guide for hiring managers new to data science Part 2. Create a clear interviewing process. Make time for hiring and use your shift in priorities to your advantage, don't "wing it", write your process down and engineer it to be data driven, and modify the process not your adherence to it. 2020
Gergely Orosz Hiring (and Retaining) a Diverse Engineering Team Stories from six engineering leaders who succeeded in building and growing diverse teams 2021
Reddit “Are we being too harsh on junior candidates?” Reddit thread discussing expectations of junior ML job candidates 2022
Hacker News “When did 7 interviews become normal” An “Ask HN” forum question around the topic of over-interviewing 2022
Farhan Thawar VP of Engineering hiring cheatsheet A guide for assessing a candidate for an engineering or data leadership role: provides good and bad responses to questions. 2022
Freaking Rectangle Blog How to Freaking Find Great Developers By Having Them Read Code When hiring for data engineering, analytics, data science, or ML Engineering roles, it would be better to have candidates try to read code instead of writing it (it can be neutral interview-only code). 2022
Emily Thompson Hiring Data Scientists With Intention Gives guidance on: writing a focused job description, being strategic in sourcing, and designing a structured interview process so that you can be consistent in evaluating candidates. 2022
Nate Rosidi 15 Python Coding Interview Questions You Must Know For Data Science Provides 15 examples of testing basic Python data manipulation skills for interviews. 2022
Jike Chong, Ben Lorica, Yue Cathy Chang Top Places to Work for Data Scientists: We identify U.S. organizations that will help you develop your career in data science Looks at factors that make a data science org attractive to an IC, but this provides some insights for hiring managers trying to get in the heads of talent. 2022
Randy Au Let's talk a bit about giving interviews Gives thoughts on planning and carrying out a technical data science interview. 2022
Michelangelo D'Agostino, Katie Malone The Care and Feeding of Data Scientists, Chapters 2 and 3 "How to Win Friends and Recruit Data Scientists" and "Interview with the Data Scientist" have tips on recruiting and interviewing. 2019
Dip Ranjan Chatterjee The Data Science Interview Book A very comprehensive set of topics to interview data science candidates with (spans statistics, ML, NLP, etc). 2022
Tristan Handy When to hire a data engineer? Article makes the claim that increasingly data analysts and scientists are working on ETL pipelines themselves (with the help of Stitch, Fivetran, dbt, etc.) but data engineers are still essential for: managing core data infrastructure, building and maintaining custom ingestion pipelines, supporting data team resources with design and performance optimization, and building non-SQL transformation pipelines. 2022
Jacob Kaplan-Moss My questions for prospective employers (Director/VP roles) This post discusses the other side of the hiring table, and gives great questions a candidate for a Director or VP-level engineering leadership role should be asking (though this post could also be helpful to a hiring team thinking through the scope of a Director or VP-level role). 2019
Chip Huyen What we look for in a resume Outlines the resume evaluation process for a small startup looking for data talent and includes topics like looking for examples of persistence, looking for unique perspectives, and looking for metrics around business impact. 2023

Culture

Author Title One-sentence summary Year
Emily Thompson Growing Data Teams from Reactive to Influential Reactive data teams lead to low impact and attrition, so instead acknowledge if your team is reactive, assess reactivity quantitatively, focus on near-term wins for cultural change, and build longer-term foundational work into the team’s capacity 2022
Prukalpa Sankar It’s Time for the Modern Data Culture Stack We need a modern data culture stack: best practices, values, and cultural rituals that will help data people come together and collaborate effectively. 2021
Kuba Niechcial How to set goals for engineers? Provides some examples of good engineer personnel goals and things to keep in mind (e.g. KPIs should not be personal goals). 2021
Jacob Kaplan-Moss “Exit Interviews Are a Trap” Rethinking the exit interview: there is very little upside (unlikely things will change) and potentially significant downside (bad blood, retracted references, malicious actions by employer, etc.). 2022
Christoph Neijenhuis How to stop shrinkage in engineering teams The journey to stopping shrinkage in engineering teams is long and rarely straightforward, but there are practical things leaders can do to take control of the chaos, from taking steps to get out of survival mode and tackling problems around culture to involving teams in the development of a solid technical strategy. 2022
Caitlin Moorman Proficiency v. Creativity It is critical to find a balance between open-endedness/opportunities for creativity and standardized rigor when leading a data function. 2020
Shimin Zhang Why a Meeting Costs More than a MacBook Pro – the Business Case for Fewer Developers in Meetings Describes the opportunity cost of having all developers or data engineers attending meetings and describes ways to recoup this. 2022
David Waller 10 Steps to Creating a Data-Driven Culture Details some steps for working towards a data-driven culture, from taking care in choosing metrics to quantifying uncertainty. 2020
Michael Kaminsky A Culture of Partnership Building a culture of partnership on your analytics team is crucial to maximizing the impact your team can have. 2019
Benn Stancil Do data-driven companies actually win? Article discusses how much a data-driven culture actually contributes to a company's success through a handful of hypothetical fashion companies. 2022
Michelangelo D'Agostino, Katie Malone The Care and Feeding of Data Scientists, Chapter 4 "Fear and Loathing in Data Science" offers concrete tips on culture that help to retain your best people. 2019
Benjamin Rogojan Onboarding For Data Teams The costs (both opportunity costs and retention problems) of poor onboarding are great; to help with this, the author writes about 'Onboarding For Context', 'Environment Set-Up', and the concept of 'Commit Something Day One'. 2022
Prukalpa Sankar The “knowledge-creating” company, a big announcement and other takeaways from dbt Coalesce Prukalpa provides thoughts around this great quote from an early 90's HBR article: "...markets shift, technologies proliferate, competitors multiply, and products become obsolete almost overnight, successful companies are those that consistently create new knowledge, disseminate it widely throughout the organization, and quickly embody it in new technologies and products; these activities define the ‘knowledge-creating’ company, whose sole business is continuous innovation." 2022
Christine Garcia The secrets of a modern data leader: The first 365 days inside a data team Fantastic video covering how leaders should nurture their data teams, build the right team values, establish governance inside the team, create cadences and rituals, etc. 2022
Claire Carroll Data education is broken The post explores the disconnect between data education and real data practice in industry (e.g. analyzing static flat files in R, Pandas, or SPSS compared with using SQL along with tools like git, dbt, Airflow, VSCode, etc), why this occurs, and the effects it has on the data industry. 2021

Impact

Author Title One-sentence summary Year
McKinsey Ten red flags signaling your analytics program will fail. A list ranging from the executive team doesn't have a clear vision for its analytics program to nobody knows the quantitative impact that analytics is providing 2018
Erik Bernhardsson Building a data team at a mid-stage startup: a short story A story about a fictional company that became more data-driven and how it was done. 2021
Abinaya Sundarraj Data Management: How to Stay on Top of Your Customer’s Mind? Describes the virtues and challenges around achieving a customer-centric, data perspective in a business. 2022
Mikkel Dengsøe How to measure data quality: Practical guidelines for how to measure quality, engagement and productivity in a data team Provides some thoughts around how to evaluate your data team and suggests three categories of metrics: quality, productivity, and engagement. 2022
Sarah Krasnik Choosing a Data Catalog Although not technically on management, this tackles the critical topic of documentation, dictionaries, knowledge repos and such, which are critically important for a data org. 2022
Chad Sanderson The Existential Threat of Data Quality: and Why the Modern Data Stack Can't Solve It Despite the rapidly-evolving/growing data stack, poor data quality remains an enormous problem; the article breaks it down into "downstream" and "upstream" categories. 2022
Anna Geller Should You Measure the Value of a Data Team? What to measure and whether you should Wonderful discussion of the challenges of measuring a data team's impact, and provides clear examples of good, so-so, and poor metrics for measuring this performance. 2023
Benn Stancil A method for measuring analytical work: Our only job should be to make people more decisive. Argues that much of the value of an analytics org is difficult to quantify, but perhaps these orgs should be valued (and measured on) their ability to reduce the time it takes to make decisions. 2021

Strategy

Author Title One-sentence summary Year
Prukalpa Sankar Data Advantage Matrix: A New Way to Think About Data Strategy Break down your data advantage into four categories (e.g. operational, strategic, product, and business opportunity) and then assess what stage each of these is at (e.g. basic, intermediate, advanced) 2021
Ilan Man Creating a Data Road Map Provides suggestions for what factors to consider when thinking about a data roadmap or data strategy (e.g. identifying the audience, setting up the scaffolding, etc.). 2019
Chris Brown Executing a Data Strategy with OKRs Outlines how OKRs (Objectives and Key Results) can help with executing on data strategy and provides some examples. 2022
Yali Sassoon Organizations need to deliberately create data Organizations spend an incredible amount of time and resources extracting data from various sources, but rarely consider making their own data to generate inputs for the ML systems. 2022
Leo Polovets The Value of Data, Part 1: Using Data as a Competitive Advantage Software and hardware infrastructure are becoming commoditized, so data you generate gives you the advantage; data helps you make good content recommendations, helps with ad targeting, gives you actionable insights, makes operations more efficient, and more. 2015
Leo Polovets The Value of Data, Part 2: Building Valuable Datasets Describes the attributes of high-value datasets, common approaches for capturing this data, and common pitfalls people fall into during this process (e.g. consider the law of diminishing returns, how clean is your data, etc.) 2015
Leo Polovets The Value of Data, Part 3: Data Business Models Final post in this series describes the concept of a "Data Business Model", the reality of how data can be monetized with examples of companies in each scenario. 2015
Emilie Schario and Taylor A Murphy Run Your Data Team Like A Product Team Service-oriented data teams aren’t effective, and the authors suggest running the data team like a product team is ideal, where you take a more active role in defining your org's success metrics and push the business forward in a more active way. 2021
Jeremy Salfen Building a Data Practice from Scratch Provides a series of suggestions for first data hires at an early stage startup, including the following principles: "don’t worry about making things fancy", "keep an eye on how things will scale, but rein in your impulses to optimize them", and "documentation, transparency, and reproducibility are interrelated and fundamental". 2021
Brittany Bennett Roadmapping as a Tool for Data Leaders Author describes how to create a roadmap with their data team and how to use it to push for more team resources (includes ideation sessions with sticky notes, voting, generating a timeline, and then ultimately packaging this for the leadership to get the resources). 2023
Raymond See Tools and Techniques to Establish Your Data Team Early Provides some tips for early-stage start-ups hoping to develop a data function (e.g. hire a few generalists, bring in the right tools, etc) 2023
Prukalpa Sankar A Behind-the-Scenes Look at How Postman’s Data Team Works: How Postman’s data team set up better onboarding, infrastructure, and processes while growing 4–5x in one year Describes Postman's data team structure (contains central, embedded, and distributed members), how they handle prioritization, sprints, and the like. 2021

Diversity Equity and Inclusion

Author Title One-sentence summary Year
Sophia (Saeyoon) Baik Building a Diverse Engineering Team in 2022: The Beginner’s Guide Provides a great summary and many links describing the state of DEI in tech engineering, along with why diversity helps boost productivity, and a number of suggestions on how to reduce hiring biases. 2022
Sergio Morales Future-proof your Analytics Efforts in 2020: Hire Diverse Teams Post describes how data team diversity deters bias and encourages curiosity, skepticism and analytical thinking: attributes any analytics enterprise will highly value. 2020
Swathi Young How To Make Sure That Diversity In AI Works Post provides guidance on how management teams can build diverse AI teams, including suggestions like restructuring talent acquisition, thinking through pay parity, and more. 2021
Gergely Orosz Hiring (and Retaining) a Diverse Engineering Team Stories from six engineering leaders who succeeded in building and growing diverse teams 2021

Project Management

Author Title One-sentence summary Year
Erik Bernhardsson “Why software projects take longer than you think: a statistical model” Adding up time estimates for many subtasks isn't advised; instead, figure out which tasks have the highest uncertainty – those tasks are basically going to dominate the time to completion. 2019
Erik Bernhardsson “σ-driven project management: when is the optimal time to give up?” The post describes an abstract measure “alpha” that captures the risk of a project and based on that risk the post describes a statistical model that shows when one ought to give up on a project. 2022
Michael Kaminsky Agile Analytics, Part 1: The Good Stuff When it comes to data science and analytics, these aspects of the scrum workflow work well: acceptance criteria, pointing, two-week chunks (sprints), and explicit prioritization. 2018
Michael Kaminsky Agile Analytics, Part 2: The Bad Stuff Some aspects of agile don't work so well with data teams; these include: "The fortuitous finding", exploratory data analysis needs, product ownership / story-writing, and business-as-usual support. 2018
Michael Kaminsky Agile Analytics, Part 3: The Adjustments Adjustments are suggested for agile to work well on a data team: time-bound spikes for research, build in slack time for exploration, acceptance criteria includes “write the next story”, peer-review instead of sprint-review. 2018
Michelangelo D'Agostino, Katie Malone The Care and Feeding of Data Scientists, Chapter 5 "To Agile or Not to Agile". 2019
Oscar Baruffa Dealing with difficult stakeholders Presents some approaches for handling difficult stakeholders that you need buy-in from, including things like take the path of least resistance, work towards getting stakeholders to think it's their idea, have lots of private conversations beforehand, and more. 2022
Lucas F Costa Useful engineering metrics and why velocity is not one of them Covers four useful metrics that are easily attainable from JIRA that aren't easily gameable and can help you debug process problems: arrival rate, work in progress, throughput, and cycle time. 2022
Leandro Carvalho Data Product Canvas — A practical framework for building high-performance data products Outlines the "Data Canvas" framework for building new data products, which is divided into 10 blocks (problem, solution, data, hypotheses, actors, actions, KPIs, values, risks and performance/impact), and separated by 3 domain areas: the product vision, the vision of the strategy, and the business vision. 2022

Code Review

Author Title One-sentence summary Year
Gunnar Morling The Code Review Pyramid There should be a hierarchy of effort in reviewing code, where more effort is spent on core concepts, how performant code is, and documentation, with less effort on test quality (though of course tests are important) and syntax. 2022
Tim Hopper Code Review Guidelines for Data Science Teams In the context of a data team, describes what a code review should achieve, bullets to carry out pull requests, and some links to additional reading. 2020
Eric Ma Practicing Code Review In the context of data science the essay briefly describes the purpose of code review, what it should not be, and the value of it in data work. 2021

Organization Structure and Job Titles

Author Title One-sentence summary Year
Rob Dearborn Organizing and scaling an effective data team General guidelines on what a properly-structured data team should look like, with descriptions ranging from a 1-person data team to a 32+ person team. 2022
Brittany Bennett Building Powerful Data Teams: On Investing in Junior Talent Provides suggestions on developing junior talent: blocking off time for personal development, celebrating this blocked off time, hiring tutors, and more. 2021
Eric Colson "Beware the data science pin factory: The power of the full-stack data science generalist and the perils of division of labor through function" Beware specialization in data science (the goal of data science is not to execute; rather, the goal is to learn and develop profound new business capabilities), as there are costs to specialization. 2019
Chuong Do "What is the most effective way to structure a data science team?" Covers how data scientist roles should be defined (analysis vs building), where data scientists should report (centralized vs decentralized), where the data science function should live (engineering org vs product org vs independent consultancy), and what an organization should do to set up data science for success. 2017
Mikkel Dengsøe "Data team structure: embedded or centralised?" There are three common models of how data teams are structured, each with their drawbacks and advantages: centralized, embedded, and hybrid. 2022
Randy Bean Chief Data Officers Struggle To Make A Business Impact There is widespread disparity of opinion on what defines a successful Chief Data Officer, so it makes sense that only CDOs are poised for success according to a recent Gartner report. 2019
Matthew Mayo Data Scientist, Data Engineer & Other Data Careers, Explained Explanations of various titles such as Data Architect, Data Engineer, Analyst, ML Engineer, and Data Scientist 2022
Gergely Orosz What Silicon Valley "Gets" about Software Engineers that Traditional Companies Do Not Silicon Valley treats engineers as autonomous adults who are smart people because that’s who they hire because that’s who can do the work they need done, while traditional companies tend to keep developers in pure execution roles. 2021
Rifat Majumder The Data Product Manager Describes the emerging role of "Data Product Manager", and the benefits they provide an org: better business impact, a deep understanding of customer problems, and more clarity on priorities. 2021
Benn Stancil The technical pay gap: The culture we build is the culture we buy Describes the current state of confusion around data titles (using the "analytics engineer" as an example), and describes how the tech industry overvalues technical skills at times. 2022
Ben Darfler Engineering Levels at Honeycomb: Avoiding the Scope Trap Describes a nice framework for thinking about job levels, based on scope and level of project complexity. 2022
Mikkel Dengsøe Data teams are getting larger, faster There are many problems you can encounter when your data team grows beyond a handful of people; the article provides some tips on working through these problems. 2022
Jorge Fioranelli A framework for Engineering Managers Although not directly about data this is relevant: a framework for engineering managers to think through titles and expectations (including domains of technology, systems, people, process, and influence). 2022
Pardis Noorzad Models for integrating data science teams within companies: A comparative analysis Compares different models for situating DS teams including the "center-of-excellence model", the "Accounting model", the "consultant model", the "embedded model", and more, and considers factors like "Coordination efficiency", "Employee happiness", and others. 2019
Kurt Cagle Why You Don’t Need Data Scientists Early in an organization's data maturity stage, you don't need "data scientists" and machine learning people, you instead need to focus on data quality and ontological engineering problems. 2018
Michelangelo D'Agostino, Katie Malone The Care and Feeding of Data Scientists, Chapter 6 "Chutes and Career Ladders" discusses how to write a great career ladder for your team. 2019
Benjamin Rogojan Different Types Of "Data Engineering" Teams Post gives nice overview of the various flavors of data engineering roles in organizations (including software engineers, data platform engineers, etc). 2022
Morgan Krey Storytellers and System Builders: A New Way to Think About Data Roles There has been a proliferation of "data X" roles (e.g. data engineer, data scientist, data analyst, etc) but the author argues that there are really just two kinds of data practitioners: system builders (your engineers that build pipelines, schedule jobs, stand up APIs, etc.) and storytellers (looking for actionable insights, visualizing data on dashboards, etc). 2022
Mikkel Dengsøe Data team as % of workforce: A deep dive into 100 tech scaleups Author analyzed 100 known startups and notes that data team members comprise 1-5% of the company headcount, and this varies industry to industry (details included) 2023

ML and AI Within an Organization

Author Title One-sentence summary Year
Monica Rogati The AI Hierarchy of Needs Before you can fully get value out of ML/AI in an organization, it is critical to have foundational data needs met (i.e. good data collection processes, checks, and analytics). 2017
Mario Perrakis “The “0 / 1 / Done” Strategy for Data Science” A description for what a DS org should aspire to: 0-day handovers facilitated by great documentation and code, 1-day prototypes enabled by good tooling and good knowledge, and a clear definition of “done”. 2022
Thomas Redman “Your Data Initiatives Can’t Just Be for Data Scientists” Describes the role and importance of non-data experts in DS projects: collaborators, customers, and as creators of the data. 2022
Natassha Selvaraj “Why Are So Many Data Scientists Quitting Their Jobs?” Two primary factors drive a number of new data scientists out of the profession: a mismatch between employer and employee expectations around data science work and the general difficulty of ML to add clear business value. 2022
Pete Warden How Should you Protect your Machine Learning Models and IP? Some thoughts on the importance of protecting IP in a ML org. 2022
Jeff Saltz Managing Machine Learning Projects Touches on difficulties of managing ML projects and how the management process differs from standard software development. 2021
Alfred Spector, Peter Norvig, Chris Wiggins, and Jeannette M. Wing Data Science in Context: Foundations, Challenges, Opportunities A pre-release of a book that gives a thorough accounting of the history of Data Science, a high-level understanding of its applications, and the ethical and social concerns associated with it. 2022
Brooke Carter, Melissa Barr, and Michael Mui ML Education at Uber: Frameworks Inspired by Engineering Principles Provides an overview of the philosophy behind Uber's ML education program. 2022
Eyal Trabelsi How to build TRUST in Machine Learning, the sane way Provides suggestions on how teams can improve trust in ML in their org, including defining metrics up front, following some best practices when developing the model, A/B testing the model upon deployment, and more. 2022
Andrew Lukyanenko Lessons learned after 10 years in IT: What I have learned from my mistakes and successes A senior data scientist gives general DS career advice (some of which is worth noting as a leader) including topics around interviewing, productivity, communication, time estimation, and more. 2022
Shreya Shankar et al. Operationalizing Machine Learning: An Interview Study From the abstract: They conducted interviews with 18 MLEs working across many applications, touching on how Velocity, Validation, and Versioning govern project success (in terms of deployment and long-term maintenance), and they also discuss interviewees’ pain points and anti-patterns. 2022
Eugene Yan Mechanisms for Effective Machine Learning Projects The author describes a few process-based techniques for increasing ML project success (e.g. establishing project pilots and copilots, literature reviews, methods reviews, etc). 2023
Arthur Turrell Data science maturity and the cloud Describes conditions and infrastructure needed for data scientists to thrive in an organization, and puts it in the context of data maturity. 2023

BI and Analytics Within an Organization

Author Title One-sentence summary Year
Lenny Rachitsky Choosing Your North Star Metric Proposes metrics based on your type of business, recommends having a singular north star metric, and advises against using revenue as your metric. 2021
Ron Berman “The Value of Descriptive Analytics: Evidence from Online Retailers” The authors estimate an increase of 4%–10% in average weekly revenues post-adoption associated with the adoption of descriptive analytics among online retailers. 2020
Roger M. Stein "Why Managing Data Scientists Is Different" Two challenges in managing data scientists: (1) managing a data research effort tends to be a dynamic and self-correcting process in which it is difficult to plan either a project’s timing or final outcomes, and (2) analytics is highly sensitive to time, cost, and quality tradeoffs. 2015
Eric Colson "The Sobering Truth about the Impact of your Business Ideas" The vast majority of business ideas fail to generate a positive impact, and this underscores the value of measuring impact, collecting data, and testing. 2021
Joe McFarren 5 Tips for Managing a Successful Analytics Project In the context of analytics consulting it is important to: clearly establish project scope, be in constant communication, determine a line of escalation, monitor work with tracking apps, and track finances. 2022
Erik Balodis A Framework for Embedding Decision Intelligence into your Organization Provides a high-level overview of how to infuse decision-intelligence into an organization, along with some additional reading sources. 2022
Nelson Auner Building an Analytics Stack in 2020 Gives an overview of the modern analytics stack via three buckets: a data-moving tool (ETL), a data warehouse to store the data, and a BI layer to analyze the data. 2020
Mode The Data Team’s Guide for Marketing Metrics Good overview of the landscape of metrics used in data marketing work (as well as information on the technical side of it). 2022
SeattleDataGuy Why Are We Still Struggling To Answer How Many Active Customers We Have? Surprisingly, metrics are still hard to calculate and this is at least partly because of turnover of developers, ERP and CRM migrations, producers of data constantly changing what data they provide, mergers and acquisitions, and other reasons. 2022
Randy Au We take our units of analysis for granted Understanding what the "unit of analysis" is, is critical to answer a research question, and yet in industry it's something we often poorly handle. 2022
Marie Lefevre Not All Data Requests Are Urgent, So Start by Asking These 5 Questions Details five questions the author typically asks of those that request analyses: Why? Why again? Who is it for? When is it due? Is it more of a priority than that other request? 2022
Amplitude The North Star Playbook: The guide to discovering your product’s North Star A short book intended for product managers and product designers that describes the value of North Star metrics and how to identify them. 2018
Gergely Orosz Checklist used at Uber to determine if something is urgent 1. What is the impact? 2. Do you have a signed spec answering the why and the what? 3. Do you have your estimate of the cost? 4. Make the cost of dropping what you're doing very clear. 2022
Dan Frank Experimentation Platform in a Day A short, technical (but very accessible) guide to setting up a simple experimentation "platform" with elements of logging, measurement, assignment, and analysis. 2022
Ron Kohavi, Diane Tang, and Ya Xu Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing A fantastic introductory book on A/B testing for program and feature evaluation; covers methods, interpretation, biases that can arise, and culture around experimentation. 2020
W.D. (ryxcommar@gmail.com) Caveats and Limitations of A/B Testing at Growth Tech Companies Highlights an issue of A/B tests where over time effect sizes tend to shrink, and growth companies can find themselves in a situation where the statistical power benefits of a growing user base are outweighed by this diminishing returns effect. 2022
Tristan Handy The Startup Founder’s Guide to Analytics Although written in 2017, this article gives a still relevant high-level overview on creating the analytics competency at your org, at different levels of company size. 2017
Rembrand Koning and Aaron Chatterji Experimentation and Startup Performance: Evidence from A/B Testing This academic paper provides the first evidence of how digital experimentation affects the performance of a large sample of high-technology startups using data that tracks their growth, technology use, and product launches (they find increased performance on several critical dimensions, including page views and new product features). 2022
Sarah Krasnik The Analytics Requirements Document: Launch and pray doesn't work when it comes to data Makes the case that an ARD should be generated by the analytics team in parallel to a Product Requirements Document early in the product evolution lifecycle, outlining metric tracking expectations, data design, data criteria, and more.
Erin Gustafson Meaningful metrics: How data sharpened the focus of product teams Outlines a thorough growth model that is broadly applicable to most B2C organizations where users subscribe to a service and includes discussion on how various "levers" of this growth model were tested. 2023
Kasia Rachuta Why You Need an Experimentation Template Author shares a generic version of their company's A/B testing doc and argues that it's helpful for structuring tests and ensuring stakeholders have thought about the right business questions prior to asking for something to be launched. 2023

Management Skills

Author Title One-sentence summary Year
David Loftesness The Engineer to Manager Transition, by Former Twitter Director of Engineering Talks about an engineering management "event loop", where you touch base on people, projects, process, and self on a daily, weekly, and monthly basis. 2015
The Institute of Leadership & Management "Spotlight on Leadership Styles" Describes a set of leadership/management styles including pace-setting, democratic, laissez-faire, and more. 2018
Andy Johns How to know when to stop: A guide to avoiding burnout and establishing balance in your life—by guest author Andy Johns A framework for thinking through burnout, including: 1) Define your personal range of tolerance, 2) Pick your career progression, 3) Pick your life progression. 2022
Alan Johnson 11 Principles of Engineering Management A brief, digestible list of management principles for new engineering managers. 2022
GitLab Preventing burnout: A manager's toolkit Provides 12 strategies managers can utilize to support their team and prevent burnout 2022
Tanya Reilly Being glue Describes the importance of "glue work" (e.g. noticing when other people in the team are blocked and helping them out, reviewing design documents and noticing what's inconsistent, onboarding the new people and making them productive faster, or improving processes to make customers happy). 2019
Lindy Greer, Francesca Gino, and Robert I. Sutton You Need Two Leadership Gears: Know when to take charge and when to get out of the way Describes how leaders who know when, where, and how to shift gears between a top-down, take-charge persona (“exercise authority” mode) and a more “flat” mode (in which the leader levels the hierarchy and shares power) tend to be more successful, according to research. 2023
Sarah Drasner Engineering Management for the Rest of Us Fantastic general engineering management book covering topics such as career laddering, giving and receiving feedback, setting team culture, and more. 2022

Data Platforms

Author Title One-sentence summary Year
Elijah Ben Izzy and Stefan Krawczyk Deployment for Free -- A Machine Learning Platform for Stitch Fix's Data Scientists The authors describe, at a high-level, the initial design considerations for Stitch Fix's ML platform, present the API data scientists use to interact with it, and detail its capabilities. 2022
Barr Moses and Lior Gavish What is a Data Platform? And How to Build One While every organization’s data platform approach will vary based on the industry and the size of their company, this quick and dirty guide lays out a blueprint for a modern data platform. 2022
Jordan Volz The Modern Data Stack Ecosystem: Spring 2022 Edition Article maps out the various pieces of the modern data stack, including event tracking, a data warehouse, data governance, and more. 2022
Krzysztof Szafranek Zalando's Machine Learning Platform: Architecture and tooling behind machine learning at Zalando Provides an overview of Zalando's ML platform (AWS-powered) from the perspective of a machine learning practitioner. 2022
Jean-Georges Perrin The next generation of Data Platforms is the Data Mesh The post summarizes Zhamak Dehghani's proposal for transitioning from current breadth-first data platforms (end-to-end data lifecycle) into vertical/depth-first architectures (one business domain at a time). 2022
Gabrielle Davelaar and Jordan Edwards DevOps for AI - Microsoft A great talk that outlines how DevOps principles can be applied to AI, and then shows in detail how CI/CD, version control, model storage, and more fit into a great MLOps process. 2018
Kevin Hu The Four Pillars of Data Observability Provides a definition of data observability and how in the context of a data platform this includes the following facets: metrics, lineage, metadata, and logs. 2022
Stefan Krawczyk What I Learned Building Platforms at Stitch Fix: Five lessons learned while building platforms for Data Scientists. The author describes five lessons learned in building a data science platform, such as not building for all possible users and abstracting away underlying APIs to simplify things for end users. 2022
Lak Lakshmanan No, you don’t need MLOps: Keep It Simple: the complexity of full MLOps is rarely needed In counterpoint to all the buzz, the author warns that MLOps is no panacea, and can often automate away important detail or cause a large amount of technical debt that ultimately doesn't save time. 2022
Nishith Agarwal The Build vs. Buy Guide for the Modern Data Stack The author claims that the decision to build vs buy comes down to five main considerations: cost, complexity, expertise, time to value, and competitive advantage. 2022
Dominik Kreuzberger, Niklas Kühl, and Sebastian Hirschl Machine Learning Operations: Overview, Definition, and Architecture The authors conducted a literature review and interviews with experts to create an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows surrounding "MLOps" 2022
Indika Kumara et al. Requirements and Reference Architecture for MLOps: Insights from Industry The authors conducted a qualitative analysis of the MLOps field from literature, and bucket their findings into categories like "Infrastructure", "Model Deployment and Serving", "Monitoring and Feedback Loops", and more. 2022
Charlie Summers Demystifying event streams: Transforming events into tables with dbt Provides an overview on how to convert events from an event-driven microservice architecture into relational tables in a warehouse like Snowflake, the advantages of this architecture, and how you might want to structure your event messages. 2022
Dmitry Kruglov The Architecture of a Modern Startup: Hype wave, pragmatic evidence vs the need to move fast Probably more relevant for CTO roles, but with interesting nuggets for Heads of Data, this post gives an overview of the various infrastructure and tools used in the modern startup (languages, infrastructure as code, secrets management, databases, etc). 2022
Sam Lafontaine How to Build a Modern Data Stack – The Comprehensive Guide A light overview of the several components that constitute the modern data stack: a data source, data ingestion tools, data storage, data transformations and modeling, data analytics, and data activation (what used to be called "reverse ETL"). 2021
Jordan Tigani Big data is dead Provocative piece that argues that despite the hype of the last 10 years around the coming "big data" wave and the need for big data tooling and infrastructure, only the smallest of fractions of organizations need to concern themselves with this. 2023
Benjamin Rogojan Why You Should Upgrade Your Data Infrastructure Gives a high-level summary of the several phases of data infrastructure that organizations mature through (from a tiny start-up looking at manually generated spreadsheets to more mature organizations with complex ETL DAGs). 2023

Data Governance

Author Title One-sentence summary Year
Sanjana Sen and Stephen Bailey Locally Optimistic Meetup - Governance and Compliance A conversation among many data practitioners about how their organizations handle data access control, data tagging, anonymization, and other key compliance activities, and what frameworks they have found helpful. 2020
Bryan Petzold, Matthias Roggendorf, Kayvaun Rowshankish, and Christoph Sporleder Designing data governance that delivers value Briefly surveys the problem of poor data governance, describes an ideal data governance model, and provides six ways to drive data-governance excellence. 2020
Ilan Man People-first Data stacks Proposes switching from tech- to user-centric data management by i) integrating data into company culture (raising awareness, tracking adoption); ii) making data governance options actionable for stakeholders outside of the data platform; and iii) introducing ownership of tests on data quality. 2022
Yali Sassoon Why Data Contracts are Obviously a Good Idea. And why there is so much resistance to this idea from the community around the Modern Data Stack Briefly describes the importance of data contracts, provides an example of a complaint against contracts, and then explains how such complaints arise because practitioners are stuck in the “data is oil” paradigm, i.e. they assume that data is extracted rather than deliberately created. 2022
Crystal Lewis Using a data dictionary as your roadmap to quality data A bit more for an academic or research audience, provides a style guide and suggestions on making an effective data dictionary and nomenclature for data models. 2022
Maggie Hays Data Governance, but Make It a Team Sport Outlines an iterative framework (with examples) for introducing data governance within an organization: identify the chief data problem(s) to solve, set clear goals to resolve these problems, start small before you go big, drive incremental action, and then measure progress and iterate. 2023
