Hujun / blog

Review of Designing Data-Intensive Applications (DDIA)

This book has become a must-read since it was first published in 2017, and it is well worth reading. I would give it five stars and recommend it to every programmer, whether a beginner or a senior.

DDIA is an easy book to read, thanks to its detailed explanations of fundamental concepts. It is still hard to understand in depth, though, because following the author's practical, problem-solving line of thought takes rich engineering experience. There are therefore two quite different ways to read it.

Beginners should simply go through all the chapters, since it is important to know the basics of databases, distributed systems, and related concepts. Personally, I like the second part (Distributed Data) a lot: how to partition big data and how to keep data consistent across partitions are a big part of my daily work. Surprisingly, I keep meeting programmers (even very senior ones) who don't really know basic concepts such as transactions, isolation levels, two-phase locking (2PL), and multi-version concurrency control (MVCC). So I highly recommend taking the time to read this part.

For senior and experienced developers, DDIA is a very good reference. You can start from the last chapter (The Future of Data Systems) and jump into the details whenever an idea feels unfamiliar. The references after each chapter's summary are also very useful if you need to go deeper into a particular topic.
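Since MVCC is one of the concepts singled out above as often misunderstood, here is a minimal Python sketch of its core idea, written under simplifying assumptions and not taken from the book: writes never overwrite a value but append a new version tagged with the writing transaction, and each transaction reads from a snapshot of the transactions that had already committed when it began. The names `MVCCStore`, `begin`, `write`, `commit`, and `read` are hypothetical, and the sketch deliberately leaves out aborts, cleanup of old versions, and write-conflict detection, all of which a real snapshot-isolation engine needs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: object
    created_by: int  # id of the transaction that wrote this version


class MVCCStore:
    """Toy multi-version store: writes append versions, reads use a snapshot."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Version]] = {}
        self._committed: set[int] = set()
        self._next_txid = 1

    def begin(self) -> tuple[int, frozenset[int]]:
        txid = self._next_txid
        self._next_txid += 1
        # Snapshot = transactions already committed when this one starts.
        return txid, frozenset(self._committed)

    def write(self, txid: int, key: str, value: object) -> None:
        # Never update in place: append a new version tagged with its writer.
        self._versions.setdefault(key, []).append(Version(value, txid))

    def commit(self, txid: int) -> None:
        self._committed.add(txid)

    def read(self, txid: int, snapshot: frozenset[int], key: str) -> object | None:
        # Newest version written by this transaction itself or by one in its snapshot.
        for version in reversed(self._versions.get(key, [])):
            if version.created_by == txid or version.created_by in snapshot:
                return version.value
        return None


store = MVCCStore()
t1, _ = store.begin()
store.write(t1, "balance", 100)
store.commit(t1)

reader, reader_snapshot = store.begin()  # snapshot includes t1
t2, _ = store.begin()
store.write(t2, "balance", 50)
print(store.read(reader, reader_snapshot, "balance"))  # 100
store.commit(t2)
print(store.read(reader, reader_snapshot, "balance"))  # 100: t2 is not in reader's snapshot

late, late_snapshot = store.begin()
print(store.read(late, late_snapshot, "balance"))  # 50
```

The reader keeps seeing a consistent value even after a concurrent writer commits, which is exactly what lets MVCC serve reads without taking the locks that 2PL would require.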

Martin Kleppmann strongly advocates an architecture based on derived data, message passing, and stateless processors. It is also a de facto best practice found almost everywhere in medium- and large-scale backend applications. The reason is simple and is mentioned dozens of times in the book: it is very hard, and unpredictable, to keep a system efficient and reliable once it is inevitably split into small pieces (microservices). Similar ideas are implemented in many different solutions and architectures. From my point of view, the most important thing to learn from the book, and to remember, is to always keep in mind the fundamental data-processing concepts and the trade-offs between different technologies.
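To make the derived-data idea a little more concrete, the following is a minimal Python sketch under stated assumptions, not an example from the book: an append-only list stands in for a message log (in production this would be something like a log-based message broker), the pure function `to_delta` plays the stateless processor, and `FollowerCountView` is a derived view that is only ever updated by consuming the log. All names and the "follow"/"unfollow" event types are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    action: str  # hypothetical event types: "follow" or "unfollow"
    target: str


def to_delta(event: Event) -> tuple[str, int]:
    """Stateless processor: the output depends only on the input event."""
    return event.target, (1 if event.action == "follow" else -1)


class FollowerCountView:
    """Derived data: follower counts maintained purely by consuming the log."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()
        self.offset = 0  # how far into the log this view has applied events

    def catch_up(self, log: list[Event]) -> None:
        for event in log[self.offset:]:
            target, delta = to_delta(event)
            self.counts[target] += delta
        self.offset = len(log)


# The append-only list stands in for a message log / broker.
log = [
    Event("alice", "follow", "bob"),
    Event("carol", "follow", "bob"),
    Event("alice", "unfollow", "bob"),
]
view = FollowerCountView()
view.catch_up(log)
log.append(Event("dave", "follow", "carol"))
view.catch_up(log)  # applies only the one new event

rebuilt = FollowerCountView()
rebuilt.catch_up(log)  # replays the whole log from offset 0
print(view.counts["bob"], view.counts["carol"])  # 1 1
print(rebuilt.counts == view.counts)  # True
```

Because the processor keeps no state of its own, the view can be thrown away and rebuilt by replaying the log, and processors can be restarted or scaled out without coordinating with each other; the only state lives in the log and in the view with its offset.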