Websocket Performance Comparison
This is a TL;DR version of a study I performed on various websocket server implementations. The full-length report can be found here.
Find the source code for each server and the client on GitHub
Ready-made environments are available on Docker Hub
View the raw data from each benchmark test here
The purpose of this benchmark is to determine which programming language/library offers the best performance in a websocket application.
To measure how each websocket server’s responsiveness and reliability change under load, the benchmarking client puts each server under an increasing amount of stress over multiple rounds of testing. Each round consists of a collection of websocket clients sending requests to the server, with the number of clients increasing by a fixed amount each round. The exact configuration is 100 rounds of testing, adding 100 clients each round for a total of 10,000 clients, with each client sending 100 requests per round.
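As a rough illustration of that ramp, the sketch below tabulates how many clients are connected and how many requests are sent in each round. It is not taken from the benchmarking client; `round_schedule` and its parameter names are hypothetical, and it assumes that every connected client (new and existing) sends its 100 requests in every round.

```python
def round_schedule(rounds=100, clients_per_round=100, requests_per_client=100):
    """Yield (round_number, connected_clients, requests_this_round).

    Assumption: every connected client sends `requests_per_client`
    requests in every round it takes part in.
    """
    for round_number in range(1, rounds + 1):
        clients = round_number * clients_per_round
        yield round_number, clients, clients * requests_per_client


schedule = list(round_schedule())
final_round, final_clients, final_requests = schedule[-1]
total_requests = sum(requests for _, _, requests in schedule)

print(final_round, final_clients, final_requests)  # 100 10000 1000000
print(total_requests)                              # 50500000
```

Read literally, this configuration adds up to 50.5 million requests over the whole run, which differs from the 5.5 million quoted under Request Time, so the per-client request count should be treated as an assumption here; the full report is the authority on the exact numbers.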
The languages and libraries which are evaluated in this benchmark are as follows:
- C / Libwebsockets
- C++ / uWebSockets
- C# / Fleck
- Go / Gorilla
- Java / Java-WebSocket
- NodeJS / uWebSockets
- PHP / Ratchet
- Python / websockets
- Rust / rust-websocket
For the sake of readability, going forward the websocket servers will be referred to solely by the language in which they are written. It should be noted that the results for a given language are not a representation of the language as a whole, and alternate libraries for the same language may yield different results.
The benchmarking client is written in NodeJS, which was specifically chosen because of its non-blocking nature. This is a desirable trait, as I want all of the websocket clients to make their requests to the server at the same time, rather than one connection having to finish sending all of its requests before the next connection can start.
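The real client is written in NodeJS; the sketch below uses Python's `asyncio` only to illustrate the same non-blocking idea. The names `fake_client` and `run_round` are hypothetical, no network is involved, and `asyncio.sleep(0)` simply stands in for the moment a client would wait on the server.

```python
import asyncio


async def fake_client(client_id, n_requests, log):
    """Pretend to send n_requests, yielding to other clients after each one."""
    for request_no in range(n_requests):
        log.append((client_id, request_no))
        await asyncio.sleep(0)  # stand-in for waiting on the server's reply


async def run_round(n_clients, n_requests):
    log = []
    # Start every client at once instead of finishing one before the next.
    await asyncio.gather(
        *(fake_client(c, n_requests, log) for c in range(n_clients))
    )
    return log


log = asyncio.run(run_round(n_clients=3, n_requests=2))
print(log)
```

The log shows every client sending its first request before any client sends its second, which is the interleaving the benchmark relies on.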
Results
Overall
It was unexpected to see that C and Python are unable to complete the benchmark test. Python consistently makes it to round 32, and then drops all the websocket connections. The benchmarking client eventually throws a “heap out of memory” error from trying to recursively connect back to the server. It is also extremely concerning that Python’s elapsed time for each round seems to increase exponentially as the number of connections/requests increases linearly.
C, on the other hand, is a bit more unpredictable, making it to somewhere between rounds 70 and 80. Again, it drops the connections until the client runs out of heap space. I suspect that multithreading these websocket servers, or running them on a more powerful machine, would improve their results. However, that point is moot, since doing so would improve the results for all of the websocket servers.
Another surprise is that, out of all the websocket servers that are able to complete the benchmark, Go performs the worst. Go’s performance does not just lag behind by a little; it takes over twice as long to complete the benchmark as the next slowest server, which is Rust.
Something that does not come as too much of a surprise is the fastest websocket. Although my initial prediction was that C would perform the best, it is not unexpected that NodeJS takes the crown. Node’s asynchronous nature allows for greater throughput of requests coming into the server.
Java and C# follow closely behind NodeJS, with PHP, C++, and Rust completing the benchmark a little slower.
Connection Time
Despite being the slowest at responding to requests, Go displays the best connection times, connecting the cumulative 10,000 clients in 8.8 seconds. Java, which is one of the best when it comes to request times, is the slowest to connect, taking 205 seconds for the 10,000 clients. C++ takes 60 seconds, with the other servers performing the connections in 10–20 seconds.
Request Time
The amount of time it takes for the websocket servers to respond to requests increases linearly as the total number of requests also increases linearly. Go is the biggest loser in this category, taking 100 minutes to complete the cumulative 5.5 million requests. NodeJS performs the best, taking under 12 minutes to complete all requests. Rust takes 42 minutes, C++ takes 37 minutes, PHP takes 32 minutes, C# 20 minutes, and Java 16 minutes.
Analysis
While the results of the benchmark tests are not what I was expecting, they do make sense once further understood. Let us start by looking at C, or more specifically LWS (Libwebsockets). As it turns out, LWS is in fact an extremely inefficient websocket library. The entire websocket server runs in a blocking event loop on a single thread. To put it simply, the server operates something like this:
- Accept incoming message
- Read incoming message
- Generate response
- Send response
- Repeat
It does this for each incoming request, one request at a time, and all in a single thread. I proposed earlier that multithreading the application could improve performance, only to find out that this is highly discouraged. Libwebsockets’ own website states “Directly performing websocket actions from other threads is not allowed. Aside from the internal data being inconsistent in forked() processes, the scope of a wsi (struct websocket) can end at any time during service with the socket closing and the wsi freed.” The wsi variable referenced in that quote is the pointer to the client connection. Plainly speaking, if one tries to multithread LWS, one could end up with incorrect data in the forked thread, or lose the connection to the client altogether. Therefore, the poor performance experienced is a consequence of a design decision in the LWS library. This suggests that while our tests show poor performance for the C websocket server, there may be other C libraries that would offer better speed and reliability.
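The cost of handling one request at a time can be sketched with a toy model. This is not LWS code; `serve_blocking` is a hypothetical helper, the handling costs are made-up numbers, and the clock is simulated so nothing actually sleeps.

```python
def serve_blocking(requests):
    """Handle (name, cost_ms) pairs strictly one at a time.

    `cost_ms` is a simulated handling time; the clock is simulated too.
    Returns the simulated time at which each response is sent.
    """
    clock_ms = 0
    finished_at = {}
    for name, cost_ms in requests:    # accept and read the next message
        clock_ms += cost_ms           # generate the response (blocks the loop)
        finished_at[name] = clock_ms  # send the response, then repeat
    return finished_at


finished = serve_blocking([("slow", 50), ("fast-1", 1), ("fast-2", 1)])
print(finished)  # {'slow': 50, 'fast-1': 51, 'fast-2': 52}
```

Both cheap requests finish only after the expensive one, even though each needs a single millisecond of work. With thousands of clients queued behind one another, that waiting adds up quickly.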
The idea of this single-threaded design also explains Go’s poor performance, which does not meet my expectations either. The explanation is quite simple. Go is designed to take advantage of concurrent processing; in other words, Go achieves its renowned performance by completing tasks in parallel. Therefore, running everything from a single goroutine (effectively a single thread of execution) substantially hampers the performance of the websocket server, as it was never designed to be used in such a bare-bones setup. The good news is that, unlike the C server, the Go server is able to use multiple goroutines, and can therefore be multithreaded for better performance.
So how do C++, PHP, and Rust achieve better performance than their C and Go counterparts? To put it simply, while the C and Go servers are subject to blocking code, the C++, PHP, and Rust servers are not. In other words, C and Go complete each task one at a time, in order, one after another. Meanwhile, C++, PHP, and Rust can complete their tasks asynchronously, out of order, in whatever sequence will get the job done the fastest. The advantage here is that these asynchronous servers can work on other tasks while waiting for an entirely different task to complete. This leads to huge performance improvements, as seen in the results of the benchmark.
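The difference between the two styles can be sketched in a few lines. This is a generic illustration rather than code from any of the benchmarked servers: `handle`, `serve_sequential`, and `serve_concurrent` are hypothetical names, and `asyncio.sleep` stands in for time spent waiting on I/O.

```python
import asyncio


async def handle(name, wait_s, done):
    await asyncio.sleep(wait_s)  # simulated I/O wait; other tasks may run meanwhile
    done.append(name)


async def serve_sequential(jobs):
    done = []
    for name, wait_s in jobs:
        await handle(name, wait_s, done)  # next job starts only after this one ends
    return done


async def serve_concurrent(jobs):
    done = []
    # All jobs are started together, so each finishes as soon as its own wait is over.
    await asyncio.gather(*(handle(name, wait_s, done) for name, wait_s in jobs))
    return done


jobs = [("a", 0.09), ("b", 0.03), ("c", 0.06)]
sequential_order = asyncio.run(serve_sequential(jobs))
concurrent_order = asyncio.run(serve_concurrent(jobs))
print(sequential_order)  # ['a', 'b', 'c']
print(concurrent_order)  # ['b', 'c', 'a']
```

The sequential version finishes the jobs in arrival order and takes roughly the sum of the waits. The concurrent version finishes them out of order and takes roughly as long as the longest single wait.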
So what tricks do Java, C#, and NodeJS use to improve performance even further? Not only do these servers perform their tasks asynchronously, but they also automatically spawn dozens of threads to perform them in parallel. This gives Java, C#, and NodeJS an even greater advantage over the aforementioned servers, which are limited to a single thread. It should be noted that, if one desires to do so, C++, PHP, and Rust can also be multithreaded to achieve similar performance improvements. This just has to be implemented manually by the developer.
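To give a feel for what handing requests to a pool of worker threads looks like when done by hand, here is a minimal sketch. It does not reflect how the Java, C#, or NodeJS libraries are implemented; `handle_request` and `serve_with_pool` are hypothetical names, the pool size of 4 is arbitrary, and `time.sleep` stands in for blocking work such as a socket write.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload):
    time.sleep(0.01)  # simulated blocking work, e.g. a socket write
    return payload.upper()


def serve_with_pool(payloads, workers=4):
    # map() hands payloads to worker threads and returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, payloads))


replies = serve_with_pool(["ping", "hello", "bye"])
print(replies)  # ['PING', 'HELLO', 'BYE']
```

The three simulated requests are handled on separate worker threads, so their waits overlap instead of queuing behind one another.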
Lastly, that brings us to Python. Why does Python perform so poorly? Is it also a victim of blocking code? Actually, no. While the Python server does run on a single thread, the code is written to be asynchronous. In part, the reason Python performs so terribly is that the websocket library being used is horribly unoptimized. While I use reputable websocket libraries for the other websocket servers, like the established Ratchet for PHP and uWebSockets for NodeJS, Python is different. For the Python websocket server I use a generic websocket module simply named “websockets”. The documentation for this module is limited, and custom configuration is non-existent. This is most likely a module that offers the simplest of websocket functionality.
Now, I did say that this only partly explains the poor performance. As I am writing this report, I want to give Python a fighting chance, so I have rebuilt the websocket server with the more trusted Autobahn library and rerun the benchmark. Better results were achieved, though I use the word “better” loosely. With this new server, the time to complete each round increases linearly rather than exponentially, which is a promising sign. Even so, the performance is still worse than that of Go’s websocket server. Additionally, I am still unable to finish the benchmark test even with this more optimized websocket library. It gets to round 98, and then the server drops the websocket connections, with many dropped messages throughout the benchmarking process.
Nevertheless, I am determined to complete the benchmark with a Python websocket server, so I try one more time with a library by the name of “aiohttp”. At last, all 100 rounds of the benchmark complete, but not very well. Aiohttp still takes longer than Go, and becomes substantially unreliable after round 50, dropping anywhere from 30–50% of the messages. I believe the reason for this dreadful performance is Python itself. Python, which is interpreted at run time rather than compiled, suffers from slow execution times. Even when compared to other interpreted languages, like PHP, Python’s performance still lags behind.
Conclusion
As we can see, all my predictions turned out to be entirely incorrect, but this is not a bad thing. Ending up with data that goes against our initial expectations shows that there is information to be taken away from this benchmark. From the knowledge that has been uncovered in this report, I propose the following four guidelines when selecting a websocket library:
- Ensure the websocket library is asynchronous. This may also be expressed as “non-blocking”.
- Ensure the library allows the websocket to be multithreaded, either done automatically or with additional configuration by the developer.
- Greater performance may be achieved by using a compiled language over an interpreted one.
- Do not use Python.
All in all, the winner here is clearly NodeJS. From its amazing performance to its code complexity (or rather lack of complexity), NodeJS is an optimal choice for a websocket project. I thought I would be able to give a similar compliment to Python. Unfortunately, after this test, I will be steering clear of any Python-based websockets. For a more business-oriented application, one cannot go wrong with the enterprise favorites of Java or C#. That being said, if one is to take a single piece of knowledge away from this study, it would be to always use an asynchronous websocket.