cswaroop / mobile-attribution-analysis

https://www.youtube.com/watch?v=VBIlk3GMmNc&list=PL_arKaS8JOXHgSDJXtAVxhtDBJ6WL3uaS&index=16

Lessons in Building a System that Processes More than 70 Billion Events Daily - Morri Feldman

AppsFlyer’s mobile attribution and analysis platform is used by the biggest and most popular applications on Earth, generating a constant “storm” of 70B+ events (HTTP Requests) on their microservices, cloud based platform daily. In this talk, Morri will share their technological choices which include Clojure as their leading backend language - and the decisions to migrate from Python for improved multi-threading and concurrency.

The backend was built to be a robust system based on a diverse set of open source tooling, such as Kafka, RabbitMQ, Aerospike and Redis, plus a host of proprietary in-house tools and services that enable the testing and adoption of new data technologies, continuous deployment, and large-scale monitoring of the system. This includes open-sourcing production libraries for interoperability with core technologies.

This talk will also dive into AppsFlyer’s real-time back-end architecture and functional programming philosophy, what it is like to be a developer at AppsFlyer, and the overall attitude towards performance, redundancy and resiliency for processing 35 million events/minute at an average latency of hundreds of milliseconds per event.

Mobile/Web hybrid Attribution and analysis

Tech Stack

2-person team

  • Python
  • Redis
  • MongoDB
  • CouchDB (master database)

Over time

  • Clojure (primary language): the principles of the language permeated the architecture (cf. Rich Hickey's talk "The Language of the System")
  • Kafka
  • Docker, Consul, Spark
  • Google BigQuery
  • Aerospike
  • Druid
  • Go
  • ClickHouse

Preference for clustered, masterless technologies

  • Kafka, Aerospike, ClickHouse
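
To make "masterless" concrete, here is a minimal sketch of rendezvous (highest-random-weight) hashing, one common way for every node to compute a key's owners locally, with no coordinator to ask. It is an illustration only: the notes do not say how Kafka, Aerospike or ClickHouse place data, and the node names, key and `owners` helper are hypothetical.

```python
import hashlib


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owners(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the `replicas` nodes with the highest score for this key.

    Every node can run this locally and reach the same answer, so no
    master is needed to look up where a key lives.
    """
    ranked = sorted(nodes, key=lambda node: score(node, key), reverse=True)
    return ranked[:replicas]


# Hypothetical cluster membership, for illustration only.
NODES = ["node-a", "node-b", "node-c", "node-d"]

if __name__ == "__main__":
    primary, backup = owners("device-123", NODES)
    print("primary:", primary, "backup:", backup)
    # If the primary disappears, the old backup becomes the new primary.
    survivors = [n for n in NODES if n != primary]
    print("after failure:", owners("device-123", survivors))
```

Because each node's score is independent of the others, removing a node only moves the keys that node owned; everything else stays put.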

System Architecture

More a data platform with a few microservices (attribution, user, postbacks, push) than an SOA. Mobile phones hit the web servers, and the web servers put the events into Kafka.

  • Microservices pull from Kafka
    • Attribution Engine + Aerospike
    • User Activity + Aerospike
    • Postbacks to Partners
    • Push API to Clients
    • Batch Analytics + Druid + ClickHouse
  • Also S3 for retention beyond Kafka
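
The flow above (web servers publish, each microservice pulls at its own pace) can be sketched with an in-memory stand-in for a Kafka topic. This is a toy model under stated assumptions, not AppsFlyer's code: `EventLog`, `publish`, the group names and the event fields are all hypothetical, and a real deployment would use a Kafka client with partitions and persisted offsets.

```python
from collections import defaultdict


class EventLog:
    """Append-only, in-memory stand-in for a single-partition Kafka topic."""

    def __init__(self):
        self._events = []
        # Each consumer group tracks its own read position.
        self._offsets = defaultdict(int)

    def append(self, event: dict) -> int:
        """Store the event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group: str, max_events: int = 100) -> list:
        """Return the next unread events for `group` and advance its offset."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


def publish(log: EventLog, request: dict) -> int:
    """Web tier: do no business logic, just put the event on the log."""
    event = {
        "type": request["type"],
        "app_id": request["app_id"],
        "device_id": request["device_id"],
    }
    return log.append(event)


if __name__ == "__main__":
    log = EventLog()
    publish(log, {"type": "install", "app_id": "com.example", "device_id": "d1"})
    publish(log, {"type": "launch", "app_id": "com.example", "device_id": "d1"})
    # Two services read the same events independently of each other.
    print(log.poll("attribution"))
    print(log.poll("user-activity"))
```

Since each group keeps its own offset, a slow or restarted service does not hold back the others; it simply resumes from where it left off.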

Architecture Patterns

  • Event-driven architecture and also a mini-CQRS (both have Kafka as the central DB)
  • JSON/Protobuf on the wire; Parquet for S3. Parquet is column-oriented, so it gives rigidity
  • No rename of data fields is the
  • The pains of downstream people (data teams) are not heard upstream (app team)
  • DB is an implementation detail
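
One way to read "mini-CQRS" together with "DB is an implementation detail": writes only append events to the log, and each read model is a projection that can be thrown away and rebuilt by replaying that log. The sketch below assumes a plain Python list as the log and a hypothetical `InstallCountView` as the read side; it does not reflect the actual services.

```python
from collections import Counter


class InstallCountView:
    """Read side: a per-app install count derived purely from the event log."""

    def __init__(self):
        self.counts = Counter()
        self.offset = 0  # position of the next log entry to apply

    def catch_up(self, log: list) -> None:
        """Apply every event appended since the last call."""
        for event in log[self.offset:]:
            if event["type"] == "install":
                self.counts[event["app_id"]] += 1
        self.offset = len(log)


def record(log: list, event_type: str, app_id: str) -> None:
    """Write side: the only mutation is an append to the log."""
    log.append({"type": event_type, "app_id": app_id})


if __name__ == "__main__":
    log = []
    live = InstallCountView()

    record(log, "install", "com.example")
    record(log, "launch", "com.example")
    live.catch_up(log)
    record(log, "install", "com.other")
    live.catch_up(log)

    # A brand-new view replaying the whole log reaches the same state,
    # so the store behind the view can be swapped without losing data.
    rebuilt = InstallCountView()
    rebuilt.catch_up(log)
    print(live.counts == rebuilt.counts)
```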

AWS

  • spot instances
  • Kafka - 280 nodes, 15 clusters
  • Aerospike - 220 nodes, 17 clusters
  • Apache Spark - 2600 nodes, 18 clusters
  • Amazon S3 - 12 PB
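
For a rough sense of scale, the node and cluster counts above can be turned into average cluster sizes. This is back-of-the-envelope arithmetic on the listed figures only; the notes do not say how nodes are actually distributed across clusters.

```python
# (nodes, clusters) as listed in the notes above.
FLEET = {
    "Kafka": (280, 15),
    "Aerospike": (220, 17),
    "Apache Spark": (2600, 18),
}


def avg_nodes_per_cluster(nodes: int, clusters: int) -> float:
    """Mean cluster size; real clusters are unlikely to be uniform."""
    return nodes / clusters


if __name__ == "__main__":
    for name, (nodes, clusters) in FLEET.items():
        print(f"{name}: ~{avg_nodes_per_cluster(nodes, clusters):.1f} nodes per cluster")
```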

Books

  • Release It! - Michael T. Nygard

About

License:Eclipse Public License 2.0


Languages

Language:Clojure 100.0%