zooldk / large-scale-xmpp

Document / Whitepaper on how to build large scale XMPP services. What are the hurdles and problems, and how do you solve them? Work in Progress!

Introduction / Warning :-)

This paper is a work in progress! These are my own personal notes, which will hopefully turn into some kind of white paper on how you can most likely scale your XMPP applications, depending on your needs, context, etc. Right now they are loose notes and thoughts that should NOT be used for scaling your environment.

TODO

  • Put initial paper on git for versioning.
  • Create table of contents.
  • Define large scale operations.
  • Table of XMPP server features
  • ...
  • ...
  • Proofread
  • "Release" first draft

Authors

  • Steffen Larsen,

Scaling XMPP

Intro...

First of all, there are NO turnkey solutions that scale linearly! Each project and application will have its own optimization and solution space. Scaling XMPP takes a lot of skill.

My view is that to get the best performance, you have to know and observe how the system behaves in real-world situations, for your specific use case. XMPP is a large protocol, especially with the many XMPP Extension Protocols (XEPs) that have been added over time (at present more than 340). If you want to scale, you need thorough knowledge of your XMPP server, inside and out, and equally thorough knowledge of the XMPP protocol itself. Some requirements or suggested approaches in the protocol do not scale when implemented exactly as specified, and you have to consider the full solution design, from client behaviour to cluster architecture and code optimizations.

Definition of Large scale

Large scale in terms of:

  • Registered users
  • Simultaneous users
  • Throughput of messages

Having many users means the jobs that maintain the server need to be automated:

  • uptime!
  • scale
  • deployable and easy to manage and upgrade (live)!

Restarting a server causes a storm of reconnects. Make sure that the server can throttle the different types of connections.

  • besides hitting your own servers, the reconnects will also generate presence traffic over S2S connections!
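One common way to throttle a reconnect storm is a token bucket per connection type. The sketch below is only an illustration: the class name, the `accept_connection` helper and all the rates are hypothetical and not taken from any particular XMPP server.

```python
import time


class TokenBucket:
    """Allow `rate` new connections per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill according to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits; real values have to come from load tests.
LIMITS = {
    "c2s": TokenBucket(rate=500, burst=1000),
    "s2s": TokenBucket(rate=50, burst=100),
    "bosh": TokenBucket(rate=200, burst=400),
}


def accept_connection(kind):
    """Return True if a new connection of this type may be accepted now."""
    return LIMITS[kind].allow()
```

Rejected clients should retry later, ideally with a randomized delay, so that they do not all come back at the same moment.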

Knowing your application

  • what stanzas will your app send the most?

    • presence
    • iq (buddy list): the server should support roster versioning (XEP-0237)
    • message
  • which transport will the client use? (TCP, BOSH, WebSockets)

  • client mobility? (is it a mobile app? network changes will affect the connection)

  • components for business logic.. remember to scale those as well

  • muc

  • pub/sub

  • offline storage (limit the size; it could grow enormously!)

  • is your XMPP service open for public registration? (makes abuse/attacks easier)

  • is your XMPP service public with anonymous logins?

  • is your XMPP service public but not open for registration?

  • is your XMPP service private, in a silo and in a controlled environment?
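On the offline storage point in the list above: a per-user cap can be as simple as a bounded queue that drops the oldest stanza when full. This is a minimal in-memory sketch; `OfflineStore`, the cap of 100 and the drop-oldest policy are assumptions for illustration, and a real server would persist the queue and might reject new messages instead.

```python
from collections import defaultdict, deque

MAX_OFFLINE_PER_USER = 100  # hypothetical cap, pick one that fits your app


class OfflineStore:
    """Keeps at most `max_per_user` offline stanzas per JID (oldest dropped)."""

    def __init__(self, max_per_user=MAX_OFFLINE_PER_USER):
        self.queues = defaultdict(lambda: deque(maxlen=max_per_user))
        self.dropped = 0  # worth exporting as a metric

    def store(self, jid, stanza):
        queue = self.queues[jid]
        if len(queue) == queue.maxlen:
            self.dropped += 1  # the append below evicts the oldest stanza
        queue.append(stanza)

    def drain(self, jid):
        """Return and forget everything stored for this JID."""
        queue = self.queues.pop(jid, None)
        return list(queue) if queue else []
```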

Knowing your XMPP Server

It is a good idea to choose a server written in a language that you or your development team understand. Getting a stack trace from Erlang when you do not understand Erlang can be a pain. The same goes for Java and its stack traces.

Tune your XMPP server

  • many servers support turning off features such as ...

  • JVM tuning if your server runs on a JVM

  • Erlang VM (BEAM) tuning if your server runs on Erlang

  • Physical vs Hosted / cloud solutions

  • RAM in the host: depends on the number of connections and on what is held in memory for each (buddy lists etc.)
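A back-of-the-envelope estimate for the RAM point above can be written down as simple arithmetic. The function name and every number below are placeholders, not measurements. The per-connection and per-roster-item costs differ a lot between servers and must be measured on your own setup.

```python
def estimate_ram_bytes(connections, bytes_per_connection,
                       avg_roster_items, bytes_per_roster_item):
    """Rough RAM needed for sessions that keep their roster in memory."""
    per_session = bytes_per_connection + avg_roster_items * bytes_per_roster_item
    return connections * per_session


if __name__ == "__main__":
    # Placeholder figures only: 500k sessions, 30 kB per connection,
    # 100 roster items of 300 bytes each.
    total = estimate_ram_bytes(500_000, 30_000, 100, 300)
    print(f"{total / 2**30:.1f} GiB before OS, VM and database overhead")
```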

XEP-0198, section 5: Resumption

It can happen that an XML stream is terminated unexpectedly (e.g., because of network outages). In this case, it is desirable to quickly resume the former stream rather than complete the tedious process of stream establishment, roster retrieval, and presence broadcast.

Applies to both c2s and s2s streams.
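The bookkeeping behind resumption is small: the sender keeps every stanza until the peer reports it as handled through the `h` counter, and after a resume it retransmits whatever is still unacknowledged. The sketch below shows only that counter logic, with no sockets or XML. The class and method names are hypothetical.

```python
from collections import deque

H_MODULUS = 2**32  # XEP-0198: the h counter wraps around at 2^32


class OutboundStream:
    """Tracks sent stanzas until the peer acknowledges handling them."""

    def __init__(self):
        self.unacked = deque()  # stanzas sent but not yet acknowledged
        self.acked = 0          # last h value reported by the peer

    def send(self, stanza):
        # Real code would also write the stanza to the socket here.
        self.unacked.append(stanza)

    def on_ack(self, h):
        """Handle <a h='...'/>: drop everything the peer has handled."""
        newly_handled = (h - self.acked) % H_MODULUS
        if newly_handled > len(self.unacked):
            raise ValueError("peer acknowledged more stanzas than were sent")
        for _ in range(newly_handled):
            self.unacked.popleft()
        self.acked = h

    def on_resumed(self, h):
        """Handle <resumed h='...'/>: return the stanzas to retransmit."""
        self.on_ack(h)
        return list(self.unacked)
```

The unacked queue costs memory for every session, so it is one more item to count when sizing the host.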

Tune your Operating System (linux)

  • memory
  • sockets per process (file descriptor limits)
  • TCP/IP stack tuning
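For the sockets-per-process item, a quick check is to compare the process's open-file limit with the number of connections you expect. This sketch uses Python's standard `resource` module (Unix only). The function names and the reserve of 1024 descriptors for logs, database connections and so on are assumptions.

```python
import resource


def fd_shortfall(soft_limit, expected_connections, reserve=1024):
    """Return how many descriptors are missing (0 if the limit is enough)."""
    if soft_limit == resource.RLIM_INFINITY:
        return 0
    needed = expected_connections + reserve
    return max(0, needed - soft_limit)


def current_shortfall(expected_connections, reserve=1024):
    """Same check against the soft RLIMIT_NOFILE of the running process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return fd_shortfall(soft, expected_connections, reserve)


if __name__ == "__main__":
    missing = current_shortfall(expected_connections=200_000)
    if missing:
        print(f"raise the open-file limit by at least {missing}")
```

Each TCP connection uses one descriptor, so a default soft limit of a few thousand is far too low for a large XMPP server.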

Network

  • Load balancer in front (if using multiple endpoints, i.e. connection managers)
  • split up the XMPP server and the BOSH/WebSocket frontend (separate connection managers)
  • stream compression (zlib/EXI) for constrained networks (might take a lot of CPU on the client)

Clustering XMPP servers

Not all servers support clustering.

Scalable backend (database)

Mobile constraints and solutions

BOSH vs WebSockets

latency: While the BOSH draft document claims very low latency, it will be difficult for BOSH to compete with WebSockets. Unless you have ideal conditions where HTTP/1.1 is supported all the way through all intermediaries and by the target server, the BOSH client and connection manager will need to re-establish connections after every packet and every request timeout. This will significantly increase latency and latency jitter. Low jitter is often more important for real-time applications than average latency. WebSocket connections will be very similar in latency and jitter to raw TCP connections.

small-packet overhead: In WebSockets there are two bytes of framing overhead for small messages (six on client-to-server frames, which also carry a four-byte masking key). In BOSH, every message has HTTP request and response headers (easily 180+ bytes for each round-trip). In addition, each message is wrapped in XML (supposedly optional but the spec doesn't define how) with several session related attributes.
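To put numbers on the overhead, the sketch below computes the WebSocket frame header size from the RFC 6455 framing rules and compares it with the 180-byte HTTP header figure quoted above. That figure is only the estimate from the text, and the BOSH XML wrapper is ignored, so treat the result as a rough lower bound for BOSH. The function names are hypothetical.

```python
def websocket_frame_overhead(payload_len, masked=False):
    """Header bytes of one unfragmented WebSocket frame (RFC 6455)."""
    if payload_len < 126:
        header = 2
    elif payload_len <= 0xFFFF:
        header = 4   # 2 bytes + 16-bit extended length
    else:
        header = 10  # 2 bytes + 64-bit extended length
    # Client-to-server frames are masked and carry a 4-byte masking key.
    return header + (4 if masked else 0)


BOSH_HEADER_BYTES = 180  # figure from the text; measure your own traffic


def wire_efficiency(payload_len, overhead):
    """Fraction of the bytes on the wire that are actual payload."""
    return payload_len / (payload_len + overhead)


if __name__ == "__main__":
    stanza = 60  # bytes, e.g. a short presence stanza
    ws = wire_efficiency(stanza, websocket_frame_overhead(stanza, masked=True))
    bosh = wire_efficiency(stanza, BOSH_HEADER_BYTES)
    print(f"WebSocket: {ws:.0%} payload, BOSH: {bosh:.0%} payload")
```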

complexity: while BOSH uses existing mechanisms in the browser, it requires a moderately complex JavaScript library to implement the BOSH semantics. Managing this in JavaScript will also increase latency and jitter compared to a native/browser (or even Flash) implementation.

traction: BOSH started life as a way to make XMPP more efficient. It grew up out of the XMPP community and from what I can tell has gotten very little traction outside of that community. The draft documents for BOSH and XMPP are split apart, but there seems to be very little real world use of BOSH without XMPP.

problem on server: BOSH typically holds two connections per client on the server; WebSockets only one.

Fallback mechanisms.. when and where?

Overall: http://xmpp.org/extensions/xep-0286.html

robustness:

  • stream management (XEP-0198)
  • message carbons (XEP-0280)

optimization:

Define Performance Metrics for your service

These cover a variety of internal XMPP server measures, including times to perform key functions, message counts, queue sizes, memory consumption, etc.

Load testing

Tsung: load test what your app actually does. Proxy a real session and replay it multiple times.

Monitoring

Monitor to detect performance problems and possible attacks.

  • Large bandwidth consumption
  • Many stanzas of one particular kind, which could be an attack or an error
  • Patterns of traffic ... limit those through throttling
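Spotting "many stanzas of one kind" can start with a sliding-window counter per stanza type. This is a minimal sketch: `StanzaRateMonitor`, the window length and the limits are all hypothetical, and in practice you would keep one monitor per JID or per source IP and feed the result into your alerting or throttling.

```python
from collections import Counter, deque


class StanzaRateMonitor:
    """Counts stanzas per kind over a sliding time window."""

    def __init__(self, window_seconds, limits):
        self.window = window_seconds
        self.limits = limits    # e.g. {"presence": 1000}
        self.events = deque()   # (timestamp, kind), oldest first
        self.counts = Counter()

    def record(self, kind, now):
        """Record one stanza; return True if its kind is over the limit."""
        self.events.append((now, kind))
        self.counts[kind] += 1
        # Forget events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_kind = self.events.popleft()
            self.counts[old_kind] -= 1
        limit = self.limits.get(kind)
        return limit is not None and self.counts[kind] > limit
```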

Attacks and maintenance

  • attacks might slow down your server and stop new users from connecting
  • abuse

Appendix

XMPP Server Matrix

Comparisons are made with the latest versions of the servers.

Server WebSocket XEP-0198 XEP-0273 Roster Versioning Clustering Compress BOSH Language Maturity
Prosody X X X - X Lua / C
Tigase X (X) X X X Java
ejabberd - ? X X X Erlang
MongooseIM X ? X X X Erlang
jabberd - ? ? - ? C
Openfire X ? X X (Hazelcast) X Java

Links
