biplobmanna / ujrpc

Up to 100x Faster FastAPI. JSON-RPC with io_uring, SIMDJSON, and pure CPython bindings

Home Page: https://github.com/unum-cloud/ujrpc

Uninterrupted JSON RPC

Remote Procedure Calls
Up to 100x Faster than FastAPI


Most modern networking is built either on slow and ambiguous REST APIs or on unnecessarily complex gRPC. FastAPI, for example, looks very approachable. We aim to be equally simple, or even simpler, to use.

FastAPI:

pip install fastapi uvicorn

from fastapi import FastAPI
import uvicorn

server = FastAPI()

@server.get('/sum')
def sum(a: int, b: int):
    return a + b

uvicorn.run(...)

UJRPC:

pip install ujrpc

from ujrpc.posix import Server
# from ujrpc.uring import Server on 5.19+

server = Server()

@server
def sum(a: int, b: int):
    return a + b

server.run()

It takes over a millisecond to handle a trivial FastAPI call on a recent 8-core CPU. In that time, light could have traveled roughly 300 km in a vacuum, or about 200 km through optical fiber: enough to reach a neighboring city or, in my case, a neighboring country. How does UJRPC compare to FastAPI and gRPC?

| Setup | 🔁 | Server | Latency with 1 client | Throughput with 32 clients |
| --- | --- | --- | --- | --- |
| FastAPI over REST | ❌ | 🐍 | 1'203 μs | 3'184 rps |
| FastAPI over WebSocket | ✅ | 🐍 | 86 μs | 11'356 rps ¹ |
| gRPC ² | ✅ | 🐍 | 164 μs | 9'849 rps |
| UJRPC with POSIX | ❌ | C | 62 μs | 79'000 rps |
| UJRPC with io_uring | ✅ | 🐍 | 23 μs | 43'000 rps |
| UJRPC with io_uring | ✅ | C | 22 μs | 231'000 rps |
Table legend

All benchmarks were conducted on AWS on general purpose instances with the Ubuntu 22.10 AMI. It is the first major AMI to come with Linux kernel 5.19, which features much wider io_uring support for networking operations. These specific numbers were obtained on beefy c7g.metal instances with Graviton 3 chips.

  • The 🔁 column marks whether the TCP/IP connection is reused across subsequent requests.
  • The "server" column names the programming language in which the server was implemented: 🐍 for Python, C for C.
  • The "latency" column reports the amount of time between sending a request and receiving a response. μ stands for micro, so μs means microseconds.
  • The "throughput" column reports the number of requests per second (rps) when querying the same server application from multiple client processes running on the same machine.

¹ FastAPI couldn't process concurrent requests with WebSockets.

² We tried generating C++ backends with gRPC, but its numbers, suspiciously, weren't better. There is also an async gRPC option, which wasn't tried.
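To put the table in perspective, the ratios against the FastAPI-over-REST baseline can be computed directly from the reported figures. The sketch below only divides numbers copied from the table above; the dictionary layout and the `speedups` helper are illustrative names, not part of UJRPC.

```python
# Figures copied from the benchmark table above: (latency in μs, throughput in rps).
results = {
    "FastAPI over REST": (1203, 3184),
    "FastAPI over WebSocket": (86, 11356),
    "gRPC": (164, 9849),
    "UJRPC with POSIX (C)": (62, 79000),
    "UJRPC with io_uring (Python)": (23, 43000),
    "UJRPC with io_uring (C)": (22, 231000),
}

baseline_latency, baseline_rps = results["FastAPI over REST"]

def speedups(name):
    """Return (latency ratio, throughput ratio) relative to FastAPI over REST."""
    latency, rps = results[name]
    return baseline_latency / latency, rps / baseline_rps

for name in results:
    latency_x, rps_x = speedups(name)
    print(f"{name:30s} latency {latency_x:5.1f}x  throughput {rps_x:5.1f}x")
```

On these figures, the C server with io_uring comes out at roughly 55x lower latency and 73x higher throughput than FastAPI over REST.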

How is that possible?!

How can a tiny pet project with just a couple thousand lines of code compete with two of the most established networking libraries? UJRPC stands on the shoulders of giants:

  • io_uring for interrupt-less IO.

    • io_uring_prep_read_fixed on 5.1+.
    • io_uring_prep_accept_direct on 5.19+.
    • io_uring_register_files_sparse on 5.19+.
    • IORING_SETUP_COOP_TASKRUN optional on 5.19+.
    • IORING_SETUP_SINGLE_ISSUER optional on 6.0+.
  • SIMD-accelerated parsers with manual memory control.

You have already seen the round-trip latency and the throughput in requests per second. Want to see the bandwidth? Try it yourself!

@server
def echo(data: bytes):
    return data

Free Tier Throughput

We will leave bandwidth measurements to enthusiasts, but will share some more numbers. The general logic is that you can't squeeze high performance from Free-Tier machines. Currently, AWS provides the following options: t2.micro and t4g.small, on older Intel and newer Graviton 2 chips, respectively. This library is so fast that it doesn't need more than 1 core, so you can run a fast server even on a tiny Free-Tier machine!

| Setup | 🔁 | Server | Clients | t2.micro | t4g.small |
| --- | --- | --- | --- | --- | --- |
| FastAPI over REST | ❌ | 🐍 | 1 | 328 rps | 424 rps |
| FastAPI over WebSocket | ✅ | 🐍 | 1 | 1'504 rps | 3'051 rps |
| gRPC | ✅ | 🐍 | 1 | 1'169 rps | 1'974 rps |
| UJRPC with POSIX | ❌ | C | 1 | 1'082 rps | 2'438 rps |
| UJRPC with io_uring | ✅ | C | 1 | - | 5'864 rps |
| UJRPC with POSIX | ❌ | C | 32 | 3'399 rps | 39'877 rps |
| UJRPC with io_uring | ✅ | C | 32 | - | 88'455 rps |

In this case, every server was bombarded by requests from either 1 instance or a fleet of 32 other instances in the same availability zone. If you want to reproduce those benchmarks, check out the sum examples on GitHub.
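For readers who want a feel for how such measurements work before running the real examples, here is a minimal, self-contained sketch. It is not the project's benchmark: it times a stand-in line-echo server built on Python's `socketserver` (an assumption made so the snippet runs without UJRPC installed), and it compares opening a new TCP connection per request with reusing one connection, which is what the 🔁 column distinguishes. The absolute numbers it prints are not comparable to the tables above.

```python
import socket
import socketserver
import threading
import time

class LineEchoHandler(socketserver.StreamRequestHandler):
    """Stand-in server: echoes each newline-terminated line back to the client."""
    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)

class ThreadedServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

def round_trip(sock, payload=b'{"ping":1}\n'):
    """Send one line and block until the echoed line comes back."""
    sock.sendall(payload)
    received = b""
    while not received.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        received += chunk
    return received

def measure(address, requests, reuse):
    """Return (mean latency in μs, requests per second) for sequential calls."""
    shared = socket.create_connection(address) if reuse else None
    start = time.perf_counter()
    for _ in range(requests):
        sock = shared if reuse else socket.create_connection(address)
        round_trip(sock)
        if not reuse:
            sock.close()
    elapsed = time.perf_counter() - start
    if shared is not None:
        shared.close()
    return elapsed / requests * 1e6, requests / elapsed

server = ThreadedServer(("127.0.0.1", 0), LineEchoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

for reuse in (False, True):
    latency_us, rps = measure(server.server_address, requests=200, reuse=reuse)
    print(f"reuse={reuse}: {latency_us:.0f} μs per call, {rps:.0f} rps")

server.shutdown()
server.server_close()
```

Because the calls are issued sequentially from a single client, the reported rps is simply the inverse of the mean latency; the multi-client throughput columns above require several client processes running in parallel.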

Quick Start

For Python:

pip install ujrpc

For CMake projects:

include(FetchContent)
FetchContent_Declare(
    ujrpc
    GIT_REPOSITORY https://github.com/unum-cloud/ujrpc
    GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(ujrpc)
include_directories(${ujrpc_SOURCE_DIR}/include)

The C API usage example is a mouthful compared to Python. We wanted to make it as lightweight as possible and to allow optional arguments without dynamic allocations and named lookups. So, unlike the Python layer, we expect the user to manually extract the arguments from the call context with ujrpc_param_named_i64() and its siblings.

#include <cstdint>
#include <cstdio>
#include <ujrpc/ujrpc.h>

static void sum(ujrpc_call_t call, ujrpc_callback_tag_t) {
    int64_t a{}, b{};
    char printed_sum[256]{};
    // Extract the named integer arguments from the call context.
    bool got_a = ujrpc_param_named_i64(call, "a", 0, &a);
    bool got_b = ujrpc_param_named_i64(call, "b", 0, &b);
    if (!got_a || !got_b)
        return ujrpc_call_reply_error_invalid_params(call);

    // Serialize the result; "%lld" expects a `long long` argument.
    int len = snprintf(printed_sum, sizeof(printed_sum), "%lld", static_cast<long long>(a + b));
    ujrpc_call_reply_content(call, printed_sum, len);
}

int main(int argc, char** argv) {

    ujrpc_server_t server{};
    ujrpc_config_t config{};

    ujrpc_init(&config, &server);
    ujrpc_add_procedure(server, "sum", &sum, NULL);
    ujrpc_take_calls(server, 0);
    ujrpc_free(server);
    return 0;
}
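To make the control flow of that handler easier to follow, here is the same "extract by name, reject on failure" pattern sketched in plain Python. It does not use UJRPC: `param_named_i64` and `handle_sum` are hypothetical stand-ins written for this illustration, and the error code -32602 is the "Invalid params" code defined by the JSON-RPC 2.0 specification, which may or may not match what the library sends byte for byte.

```python
import json

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
INVALID_PARAMS = -32602  # "Invalid params" in the JSON-RPC 2.0 specification

def param_named_i64(params, name):
    """Hypothetical helper: return (found, value) for a named 64-bit integer."""
    value = params.get(name) if isinstance(params, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if is_integer and INT64_MIN <= value <= INT64_MAX:
        return True, value
    return False, 0

def handle_sum(raw_request):
    """Parse one JSON-RPC request for `sum` and return the serialized reply."""
    request = json.loads(raw_request)
    params = request.get("params")
    got_a, a = param_named_i64(params, "a")
    got_b, b = param_named_i64(params, "b")
    reply = {"jsonrpc": "2.0", "id": request.get("id")}
    if not (got_a and got_b):
        reply["error"] = {"code": INVALID_PARAMS, "message": "Invalid params"}
    else:
        reply["result"] = a + b
    return json.dumps(reply)

# One well-formed call and one with a missing argument.
print(handle_sum('{"jsonrpc":"2.0","method":"sum","params":{"a":2,"b":3},"id":1}'))
print(handle_sum('{"jsonrpc":"2.0","method":"sum","params":{"a":2},"id":2}'))
```

The Python version allocates dictionaries and strings freely; the point of the C interface described above is to perform the same lookups without those allocations.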

Roadmap

  • Batch Requests
  • JSON-RPC over raw TCP sockets
  • JSON-RPC over TCP with HTTP
  • Concurrent sessions
  • Numpy array serialization
  • HTTPS support
  • Batch-capable endpoints for ML
  • Zero-ETL relay calls
  • Integrating with UKV
  • WebSockets for web interfaces
  • AF_XDP and UDP-based analogs on Linux

Want to affect the roadmap and request a feature? Join the discussions on Discord.

Why JSON-RPC?

  • Transport independent: UDP, TCP, bring what you want.
  • Application layer is optional: use HTTP or not.
  • Unlike REST APIs, there is just one way to pass arguments.
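The first two points can be illustrated with a short sketch using only the Python standard library. It builds a JSON-RPC 2.0 request for the `sum` procedure with by-name parameters, then optionally frames the very same bytes as an HTTP/1.1 POST. The helper names `make_request` and `wrap_in_http`, the host, and the path are illustrative assumptions, not part of UJRPC.

```python
import json

def make_request(method, params, request_id):
    """Serialize a JSON-RPC 2.0 request with by-name parameters."""
    message = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

def wrap_in_http(body, host="localhost", path="/"):
    """Optionally frame the same payload as an HTTP/1.1 POST request."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body

payload = make_request("sum", {"a": 2, "b": 3}, request_id=1)
print(payload.decode())                # bytes that could travel over TCP or UDP as they are
print(wrap_in_http(payload).decode())  # the same bytes carried inside an HTTP request
```

The request body is identical in both cases; only the framing around it changes, which is why the application layer can be treated as optional.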

About

License: Apache License 2.0


Languages

C++ 71.1%, C 19.8%, CMake 5.1%, Python 3.3%, Dockerfile 0.7%