Computer-Science-Papers

Storage Systems

Finding a needle in Haystack: Facebook's photo storage (https://lnkd.in/gSZYcmmB)
f4: Facebook’s Warm BLOB Storage System (https://lnkd.in/gMEfTpAh)
The Hadoop Distributed File System (https://lnkd.in/gSUqafDg)
The Google File System (https://lnkd.in/giUResea)
Facebook's Tectonic Filesystem: Efficiency from Exascale (https://lnkd.in/geg7-ub9)
Pelican: A Building Block for Exascale Cold Data Storage (https://lnkd.in/gSse26YK)
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data (https://lnkd.in/gUbnK4rH)
RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters (https://lnkd.in/gKwbmzTx)
Megastore: Providing Scalable, Highly Available Storage for Interactive Services (https://lnkd.in/gT7mSDQN)
The Design and Implementation of a Log-Structured File System (https://lnkd.in/gVuka_Ym)
The RAMCloud Storage System (https://lnkd.in/gC3SQccF)

Analytics

Monarch: Google's Planet-Scale In-Memory Time Series Database (https://lnkd.in/gbqa7HNa)
Gorilla: A Fast, Scalable, In-Memory Time Series Database (https://lnkd.in/gd_nUJbu)
Scuba: Diving into Data at Facebook (https://lnkd.in/gfBrJcge)
The Unified Logging Infrastructure for Data Analytics at Twitter (https://lnkd.in/gwhNUMnF)
Cubrick: Indexing Millions of Records per Second for Interactive Analytics (https://lnkd.in/g-n9GUMD)
Shark: SQL and Rich Analytics at Scale (https://lnkd.in/gqXHq5BG)
Realtime Data Processing at Facebook (https://lnkd.in/gQdMN4kP)

Cluster Management and Scheduling

Large-scale cluster management at Google with Borg (https://lnkd.in/gT7bG2SF)
Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing (https://lnkd.in/gEEdRmcD)
Apache Hadoop YARN: Yet Another Resource Negotiator (https://lnkd.in/g9SVx_Ft)
Twine: A Unified Cluster Management System for Shared Infrastructure (https://lnkd.in/gbnuqutm)

Stream Processing

MillWheel: Fault-Tolerant Stream Processing at Internet Scale (https://lnkd.in/gC7VjCfG)
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing (https://lnkd.in/g-PyJUPa)
Apache Flink™: Stream and Batch Processing in a Single Engine (https://lnkd.in/gpzRA6v3)
Drizzle: Fast and Adaptable Stream Processing at Scale (https://lnkd.in/g9Hbnvp7)
Kafka, Samza and the Unix Philosophy of Distributed Data (https://lnkd.in/grtHkFWN)
Discretized Streams: Fault-Tolerant Streaming Computation at Scale (https://lnkd.in/gbzc3_Ke)
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark (https://lnkd.in/gnQQP2UY)
Noria: dynamic, partially-stateful data-flow for high-performance web applications (https://lnkd.in/gYtpef34)

Pub/Sub

Kafka: a Distributed Messaging System for Log Processing (https://lnkd.in/dkfPsFwH)
Scribe: Transporting petabytes per hour via a distributed, buffered queueing system (https://lnkd.in/dTyTBE_t)
LogDevice: a distributed data store for logs (https://lnkd.in/dvVTBz46)
Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log (https://lnkd.in/d7xmexrQ)
CORFU: A Shared Log Design for Flash Clusters (https://lnkd.in/dxiquk5h)
The FuzzyLog: A Partially Ordered Shared Log (https://lnkd.in/da4ikmEa)
Ubiq: A Scalable and Fault-tolerant Log Processing Infrastructure (https://lnkd.in/dQTfCDwH)

Graph Processing in Distributed Settings

Pregel: A System for Large-Scale Graph Processing (https://lnkd.in/ggpew7yq)
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs (https://lnkd.in/g6f9Mjzk)
GraphX: Graph Processing in a Distributed Dataflow Framework (https://lnkd.in/gixUZP46)
Gemini: A Computation-Centric Distributed Graph Processing System (https://lnkd.in/gCs2R5EJ)
TAO: Facebook’s Distributed Data Store for the Social Graph (https://lnkd.in/gfesm_Hn)

Consensus and Replicated State Machines

Paxos Made Simple (https://lnkd.in/gk6nxyVj)
Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial (https://lnkd.in/gPwNde-i)
The Chubby lock service for loosely-coupled distributed systems (https://lnkd.in/gFXKTrXR)
ZooKeeper: Wait-free coordination for Internet-scale systems (https://lnkd.in/gWTYBxQN)
In Search of an Understandable Consensus Algorithm (https://lnkd.in/gqrKhvsK)
Virtual Consensus in Delos (https://lnkd.in/g5bitkdM)

Peer-to-Peer Systems and Information Dissemination

Gossip-Based Broadcast (https://lnkd.in/gT74Zb8Z)
Gossiping in Distributed Systems (https://lnkd.in/g55DFbuP)
Peer-to-peer membership management for gossip-based protocols (https://lnkd.in/g_XE4TiE)
Gossip-based Peer Sampling (https://lnkd.in/gSPwEkaW)
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol (https://lnkd.in/gxZtR3Nh)
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems (https://lnkd.in/gyURBizm)
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications (https://lnkd.in/grVF9crk)

Additional papers (some may repeat entries above; to be categorized later)

#	Short Name	Title	Link	Extra links
1	Apache Kafka	Kafka: A Distributed Messaging System for Log Processing	https://notes.stephenholiday.com/Kafka.pdf
2	Apache Cassandra	Cassandra - A Decentralized Structured Storage System	https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
3	Apache Flink	Apache Flink: Stream and Batch Processing in a Single Engine	https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf
4	Apache Spark	Spark: Cluster Computing with Working Sets	https://www.usenix.org/legacy/event/hotcloud10/tech/full_papers/Zaharia.pdf
5	Apache Zookeeper	ZooKeeper: Wait-free coordination for Internet-scale systems	https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf
6	BigTable	Bigtable: A Distributed Storage System for Structured Data	https://research.google.com/archive/bigtable-osdi06.pdf
8	Apache Impala	Apache Impala: A Modern, Open-Source SQL Engine for Hadoop	https://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
9	Apache Druid	Druid: A Real-time Analytical Data Store	http://static.druid.io/docs/druid.pdf
10	Timer Wheel	Hashed and Hierarchical Timing Wheels	http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf
11	MillWheel	MillWheel: Fault-Tolerant Stream Processing at Internet Scale	https://research.google.com/pubs/archive/41378.pdf
12	Dynamo	Dynamo: Amazon’s Highly Available Key-value Store	https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
13	Google File System	The Google File System	https://research.google.com/archive/gfs-sosp2003.pdf
14	MapReduce	MapReduce: Simplified Data Processing on Large Clusters	https://research.google.com/archive/mapreduce-osdi04.pdf
15	Spanner	Spanner: Google’s Globally-Distributed Database	https://research.google.com/archive/spanner-osdi2012.pdf
16	Zab	Zab: High-performance broadcast for primary-backup systems	http://www.cs.cornell.edu/courses/cs6452/2012sp/papers/zab-ieee.pdf
17	Paxos	Paxos Made Simple	https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
18	Chubby	The Chubby lock service for loosely-coupled distributed systems	https://research.google.com/archive/chubby-osdi06.pdf
19	Dremel	Dremel: Interactive Analysis of Web-Scale Datasets	https://research.google/pubs/pub36632/
20	Megastore	Megastore: Providing Scalable, Highly Available Storage for Interactive Services	https://research.google/pubs/pub36971.pdf
21	Raft	In Search of an Understandable Consensus Algorithm (Extended Version)	https://raft.github.io/raft.pdf
22	Flexible Paxos	Flexible Paxos: Quorum Intersection Revisited	https://arxiv.org/abs/1608.06696
23	Thrift	Thrift: Scalable Cross-Language Services Implementation	https://thrift.apache.org/static/files/thrift-20070401.pdf
24	Maglev	Maglev: A Fast and Reliable Software Network Load Balancer	https://research.google.com/pubs/archive/44824.pdf
25	LSM	The Log-Structured Merge-Tree (LSM-Tree)	https://www.cs.umb.edu/~poneil/lsmtree.pdf
26	Chord	Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications	https://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
27	Kademlia	Kademlia: A Peer-to-peer Information System Based on the XOR Metric	https://www.scs.stanford.edu/~dm/home/papers/kpos.pdf
28	Mesa	Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing	https://research.google/pubs/pub42851/
29	SCRIBE	SCRIBE: A large-scale and decentralized application-level multicast infrastructure	https://rowstron.azurewebsites.net/PAST/jsac.pdf
30	PAST	Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility	https://people.mpi-sws.org/~druschel/publications/PAST-hotos.pdf
31	Pastry	Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems	https://www.cs.cornell.edu/people/egs/615/pastry.pdf
32	Linearizability	Linearizability: A Correctness Condition for Concurrent Objects	http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
33	Time and Clocks	Time, Clocks, and the Ordering of Events in a Distributed System	http://lamport.azurewebsites.net/pubs/time-clocks.pdf
34	CRDTs	CRDTs: Consistency without concurrency control	http://hal.archives-ouvertes.fr/docs/00/39/79/81/PDF/RR-6956.pdf
35	Photon	Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams	https://research.google/pubs/pub41318/
36	TAO	TAO: Facebook’s Distributed Data Store for the Social Graph	https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf
37	Pregel	Pregel: A System for Large-Scale Graph Processing	https://15799.courses.cs.cmu.edu/fall2013/static/papers/p135-malewicz.pdf
38	Dapper	Dapper, a Large-Scale Distributed Systems Tracing Infrastructure	https://research.google/pubs/pub36356.pdf
39	Raft Refloated	Raft Refloated: Do We Have Consensus?	https://www.cl.cam.ac.uk/~ms705/pub/papers/2015-osr-raft.pdf
40	Percolator	Large-scale Incremental Processing Using Distributed Transactions and Notifications	https://research.google/pubs/pub36726.pdf
41	Monarch	Monarch: Google’s Planet-Scale In-Memory Time Series Database	https://research.google/pubs/pub50652/
42	Borg	Large-scale cluster management at Google with Borg	https://research.google/pubs/pub43438.pdf
43	Borg - Next	Borg: the Next Generation	https://research.google/pubs/pub49065.pdf
44	Amazon Aurora	Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases	https://web.stanford.edu/class/cs245/readings/aurora.pdf
45	Gorilla	Gorilla: A Fast, Scalable, In-Memory Time Series Database	http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
46	HDFS	The Hadoop Distributed File System	https://storageconference.us/2010/Papers/MSST/Shvachko.pdf
47	Autopilot	Autopilot: workload autoscaling at Google	https://dl.acm.org/doi/10.1145/3342195.3387524
48	Consistent hashing	Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web	https://dl.acm.org/doi/pdf/10.1145/258533.258660
49	SEDA	SEDA: An Architecture for Well-Conditioned, Scalable Internet Services	http://www.sosp.org/2001/papers/welsh.pdf
50	Bitcask	Bitcask: A Log-Structured Hash Table for Fast Key/Value Data	https://riak.com/assets/bitcask-intro.pdf
51	DynamoDB	Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service	https://www.usenix.org/system/files/atc22-elhemali.pdf
52	Isolation levels	A critique of ANSI SQL isolation levels	https://dl.acm.org/doi/pdf/10.1145/223784.223785
54	Deletable Bloom Filter	The deletable bloom filter	https://arxiv.org/pdf/1005.0352
55	Hash Coding	Space/Time Trade-offs in Hash Coding with Allowable Errors	https://dl.acm.org/doi/pdf/10.1145/362686.362692
56	Expedite Byzantine	Shifting Gears: Changing Algorithms on the Fly to Expedite Byzantine Agreement	https://www.sciencedirect.com/science/article/pii/089054019290035E
57	Scalability cost	Scalability! But at what COST?	https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf
58	Foundation DB	FoundationDB: A Distributed Unbundled Transactional Key Value Store	https://www.foundationdb.org/files/fdb-paper.pdf
59	Monolith	Monolith: Real Time Recommendation System With Collisionless Embedding Table	https://arxiv.org/pdf/2209.07663
60	Memcache at Facebook	Scaling Memcache at Facebook	https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf
61	MilliSampler	A microscopic view of bursts, buffer contention, and loss in data centers	https://dl.acm.org/doi/pdf/10.1145/3517745.3561430	https://engineering.fb.com/2023/04/17/networking-traffic/millisampler-network-traffic-analysis/
62	FlexiRaft	FlexiRaft: Flexible Quorums with Raft	https://www.cidrdb.org/cidr2023/papers/p83-yadav.pdf
63	Minesweeper	Scalable Statistical Root Cause Analysis on App Telemetry	https://arxiv.org/abs/2010.09974
64	Shard Manager	Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications
65	FlumeJava	FlumeJava: Easy, Efficient Data-Parallel Pipelines	https://research.google/pubs/pub35650.pdf
66	Heron	Twitter Heron: Stream Processing at Scale	https://dl.acm.org/doi/pdf/10.1145/2723372.2742788
67	Dataflow	The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing	https://research.google/pubs/pub43864.pdf
68	Flink	State Management in Apache Flink	http://www.vldb.org/pvldb/vol10/p1718-carbone.pdf
69	Dgraph	Dgraph: Synchronously Replicated, Transactional and Distributed Graph Database

vineethNaroju / Computer-Science-Papers-For-System-Design
