fruffy / nyu-systems-seminar

The NYU Systems Seminar

NYU Systems Reading Group

This term, the seminar meets in person every Wednesday from 12:30 PM to 1:30 PM.

Topics

The seminar discusses a broad range of recent systems papers. Papers are selected from the usual systems-related conferences, including, but not limited to, the following:

General Systems: OSDI, SOSP, NSDI, ATC, EuroSys

Security: USENIX Security, CCS, Oakland, NDSS

Networking: SIGCOMM, INFOCOM, IMC

Architecture: ASPLOS, ISCA, MICRO

Distributed Systems: PODC, ICDCS

Storage: FAST

Some academic terms may have a specific theme, in which case all chosen papers relate to that theme.

Schedule

Each reading group presenter should:

  • Send a reminder with the paper details to the reading group email, and update the URL in the repository, at least two days before the group meeting.

Spring 2024

| Date | Discussion Lead | Paper Title and Link | Conference |
|---|---|---|---|
| March 1, 2024 | Daniel | Practical Byzantine Fault Tolerance (PBFT) / HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance | OSDI'99 / OSDI'06 |

Fall 2023 (System Challenges Posed By LLMs)

| Date | Discussion Lead | Paper Title and Link | Conference |
|---|---|---|---|
| September 25, 2023 | Jinkun | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism / Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization | ArXiv / OSDI'22 |
| October 2, 2023 | Lingfan | Orca: A Distributed Serving System for Transformer-Based Generative Models / Efficient Memory Management for Large Language Model Serving with PagedAttention | OSDI'22 / SOSP'23 |
| October 9, 2023 | Reading Week | Reading Week | Reading Week |
| October 16, 2023 | Jinkun | AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures / Welder: Scheduling Deep Learning Memory Access via Tile-graph | ASPLOS'22 / OSDI'23 |
| October 23, 2023 | Haitian | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness / FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | NeurIPS'22 / ArXiv |

Spring 2023 (Cloud Scale Databases)

The design and implementation of cloud-scale databases affect how many of us build and optimize our systems. For those working in the lower layers (e.g., on offloads), databases are a common application that might influence designs. For those working on tracing, these systems are often what is used to store and query data, which affects what is stored and why. In general, many of these systems have intricate algorithms, since they are inherently distributed. Reasoning about what properties they provide, and why, is a fun puzzle.

| Date | Discussion Lead | Paper Title and Link | Conference |
|---|---|---|---|
| Feb 22, 2023 | Panda | Bigtable: A Distributed Storage System for Structured Data / Dremel: Interactive Analysis of Web-Scale Datasets | OSDI'06 / VLDB'10 |
| Mar 1, 2023 | Jinyang | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases / The Snowflake Elastic Data Warehouse | SIGMOD'17 / SIGMOD'16 |
| Mar 8, 2023 | Panda | Photon: A Fast Query Engine for Lakehouse Systems / Rethinking SIMD Vectorization for In-Memory Databases | SIGMOD'22 / SIGMOD'15 |
| Mar 15, 2023 | Reading Week | Reading Week | Reading Week |
| Mar 23, 2023 | Panda | Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 / MODIST: Transparent Model Checking of Unmodified Distributed Systems | SOSP'21 / NSDI'09 |
| Mar 29, 2023 | Elaine | IronFleet: Proving Safety and Liveness of Practical Distributed Systems / Ivy: Safety Verification by Interactive Generalization | Communications of the ACM / PLDI'16 |
| Apr 5, 2023 | Zhangan | D3S: Debugging Deployed Distributed Systems / CrystalBall: Predicting and Preventing Inconsistencies in Deployed Distributed Systems | NSDI'08 / NSDI'09 |
| Apr 12, 2023 | Haseeb | Raksha: a flexible information flow architecture for software security / Securing Distributed Systems with Information Flow Control | ISCA'07 / NSDI'08 |
| Apr 19, 2023 | Jinkun | Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | SC'21 |
| Apr 26, 2023 | Anqi | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | SC'20 |
| May 3, 2023 | Jinkun | Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs | NSDI'23 |

Spring 2021

| Date | Discussion Lead | Paper Title and Link | Conference |
|---|---|---|---|
| March 9, 2021 | John | HovercRaft: achieving scalability and fault-tolerance for microsecond-scale datacenter services | EuroSys'20 |
| March 16, 2021 | Tao | Architectural Considerations for a New Generation of Protocols | CCR'90 |
| March 23, 2021 | Lingfan | Cortex: A Compiler for Recursive Deep Learning Models | MLSys'21 |
| March 30, 2021 | Fabian | Noria: dynamic, partially-stateful data-flow for high-performance web applications | OSDI'18 |
| April 6, 2021 | Jinkun | Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control | SOSP'17 |
| April 13, 2021 | Changgeng | TAO: Facebook’s Distributed Data Store for the Social Graph | ATC'13 |
| April 20, 2021 | Eric | On the Use of ML for Blackbox System Performance Prediction | NSDI'21 |
| April 27, 2021 | Jessica | Tesseract: Distributed, General Graph Pattern Mining on Evolving Graphs | EuroSys'21 |
| May 4, 2021 | Xiangyu | Lyra: A Cross-Platform Language and Compiler for Data Plane Programming on Heterogeneous ASICs | SIGCOMM'20 |
| May 11, 2021 | Anqi | sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data | MLSys'21 |
| May 18, 2021 | Taegyun | Ethanos: Efficient Bootstrapping for Full Nodes on Account-based Blockchain | EuroSys'21 |

Fall 2020

| Date | Discussion Lead | Paper Title and Link | Conference |
|---|---|---|---|
| October 27, 2020 | John | RedLeaf: Isolation and Communication in a Safe Operating System | OSDI'20 |
| November 3, 2020 | Xiangyu | Swift: Delay is Simple and Effective for Congestion Control in the Datacenter | SIGCOMM'20 |
| November 10, 2020 | Jessica | Come as You Are: Helping Unmodified Clients Bypass Censorship with Server-side Evasion | SIGCOMM'20 |
| November 17, 2020 | Ding Ding | Microsecond Consensus for Microsecond Applications | OSDI'20 |
| November 24, 2020 | Thanksgiving | - | - |
| December 1, 2020 | Taegyun | Blockene: A High-throughput Blockchain Over Mobile Devices | OSDI'20 |
| December 8, 2020 | Eric | A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters | OSDI'20 |
| December 15, 2020 | Changgeng | Tolerating Slowdowns in Replicated State Machines using Copilots | OSDI'20 |
| December 22, 2020 | Anqi | Retiarii: A Deep Learning Exploratory-Training Framework | OSDI'20 |

Spring 2020

| Date | Discussion Lead | Paper Title and Link | Conference |
|---|---|---|---|
| February 10, 2020 | Kickoff | - | - |
| February 17, 2020 | Fabian | SplitFS: Reducing Software Overhead in File Systems for Persistent Memory | SOSP'19 |
| February 24, 2020 | Panda | Helen: Maliciously Secure Coopetitive Learning for Linear Models | S&P'19 |
| March 2, 2020 | John | Learning to Reconstruct: Statistical Learning Theory and Encrypted Database Attacks | S&P'19 |
| March 9, 2020 | Jinkun | Pretend Synchrony - Synchronous Verification of Asynchronous Distributed Programs | POPL'19 |
| March 16, 2020 | Spring Break | Corona Virus | |
| March 23, 2020 | Xiangyu | Corona Virus | |
| March 30, 2020 | Anirudh | Corona Virus | |
| April 6, 2020 | Project Review | Corona Virus | |
| April 13, 2020 | Jinyang | Corona Virus | |
| April 20, 2020 | Taegyun | Corona Virus | |
| April 27, 2020 | Eric | Corona Virus | |
| May 4, 2020 | Changgeng | Corona Virus | |
| May 11, 2020 | Anqi | Corona Virus | |
| May 18, 2020 | Tao | Corona Virus | |

License: The Unlicense