@shaggie76 wrote:
We’ve been testing a haproxy configuration to perform SSL termination for an IRC server. InspIRCd can do its own SSL termination for us, but we also wanted to add Proxy Protocol support, and InspIRCd can’t do both at the same time.
The suggested solution was to run haproxy for termination and then connect to InspIRCd in plaintext over a Unix domain socket – in theory you might burn a bit more CPU time in total, but you’d get better multi-core scaling since the crypto could run on a different core.
We were very surprised to find that haproxy used much, much more CPU than expected – in fact it offloaded nearly no CPU load at all and ended up adding as much load as the whole IRC server itself!
Our initial attempt was single-process but was quickly swamped. Since our InspIRCd server is already a cluster of 5 spoke processes, running one haproxy process per back-end process seemed like a natural fit (we don’t need load balancing or anything fancy – just termination).
It’s very difficult for us to get consistent live-fire testing because our load fluctuates quite a bit throughout the day – but here are two snapshots where the number of concurrent SSL connections is probably around 8000 per process.
These measurements were on the same machine (a bare-metal hex-core Xeon), with the same ciphers.
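For reference, a 10-second averaged snapshot like the ones below can be captured with procps `top` in batch mode – the second iteration shows averages over the delay interval. This is a sketch of how such a snapshot might be taken, not necessarily the exact invocation used; the `ircd` user name is taken from the snapshots in the post.

```shell
# Two batch-mode iterations 10 seconds apart; the second iteration's %CPU
# figures are averaged over that 10-second window. "ircd" is the service
# user shown in the snapshots below.
top -b -d 10 -n 2 -u ircd
```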
Without haproxy we’re looking at about 10% CPU per spoke (pid 25023 is the hub the spokes communicate through) – here’s a typical 10-second average top snapshot:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25125 ircd      20   0 4559864 416028  12520 S   9.7  0.3 239:40.11 inspircd
25193 ircd      20   0 4233396 417604  12532 S   9.6  0.3 241:10.11 inspircd
25091 ircd      20   0 4041320 422916  12516 S   9.3  0.3 244:37.56 inspircd
25159 ircd      20   0 4035264 415608  12368 S   9.2  0.3 238:41.26 inspircd
25057 ircd      20   0 3315684 415732  12628 S   8.9  0.3 239:06.11 inspircd
25023 ircd      20   0  632500 125948  12112 S   3.3  0.1  89:43.57 inspircd
With 5 haproxy workers added it looks like this:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23879 ircd      20   0  380308 313148   8752 S  12.2  0.2  74:28.61 haproxy
23880 ircd      20   0  378728 311508   8752 R  12.2  0.2  75:31.97 haproxy
23881 ircd      20   0  378352 310852   8752 S  12.2  0.2  75:02.69 haproxy
23883 ircd      20   0  381000 313292   8748 S  12.1  0.2  70:48.66 haproxy
23882 ircd      20   0  380848 312804   8748 S  11.9  0.2  73:18.27 haproxy
25125 ircd      20   0 4297356 183900  12520 S  11.4  0.1  68:34.24 inspircd
25091 ircd      20   0 3772620 183128  12516 S  11.2  0.1  69:15.45 inspircd
25159 ircd      20   0 3772768 183120  12368 S  11.2  0.1  68:18.97 inspircd
25193 ircd      20   0 3970408 184744  12532 S  11.1  0.1  68:17.81 inspircd
25057 ircd      20   0 2986488 182640  12628 S  10.9  0.1  68:48.68 inspircd
25023 ircd      20   0  630388 123796  12112 S   4.5  0.1  26:35.82 inspircd
In this snapshot there were 5-10% more users so the per-spoke load is a bit higher but it’s close enough to make my point: in this configuration haproxy spends more time doing SSL termination than InspIRCd does doing everything else.
That is: instead of 5-10% more load we’re seeing > 100% more load – and this isn’t an isolated case; I’ve been trying to improve this for a while now.
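Summing the %CPU columns from the two snapshots above makes the comparison concrete (the figures below are copied by hand from the top output):

```shell
#!/bin/sh
# Sum a whitespace-separated list of %CPU values from the snapshots above.
total() { awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.1f\n", s }'; }

# Without haproxy: 5 spokes + hub.
echo "9.7 9.6 9.3 9.2 8.9 3.3" | total
# -> 50.0

# With haproxy: 5 workers + 5 spokes + hub.
echo "12.2 12.2 12.2 12.1 11.9 11.4 11.2 11.2 11.1 10.9 4.5" | total
# -> 120.9
```

So total CPU goes from about 50% to about 121% – roughly 140% more – and even allowing for the 5-10% extra users in the second snapshot, that is well beyond the hoped-for 5-10% overhead.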
I’m assuming that both pieces of software are using the same OpenSSL library under the hood, so there’s no obvious reason for the difference to be this large.
We’re on Ubuntu 18, with HA-Proxy version 1.8.8-1ubuntu0.6 2019/10/23 – I know newer versions exist, but this is what Ubuntu offered.
An abbreviated config is attached below (in reality there are 5 FE and 5 BE servers). It’s all pretty standard stuff, and I’ve skimmed the manual a few times – it just doesn’t make any sense.
I’m open to suggestions but live-fire tests may be difficult to arrange promptly because we can only do these tests at low-tide – I’m gravely concerned that this scaling would swamp our server if we had too many users at once.
global
    log /dev/log local0
    log /dev/log local1 notice
    # necessary to access the UDS
    user ircd
    group ircd
    daemon
    maxconn 200000
    nbproc 5
    cpu-map 1 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3
    cpu-map 5 4
    ssl-default-bind-ciphers AES128-SHA256:ECDH+AESGCM:DH+AESGCM:...
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode tcp
    option dontlognull
    option tcplog
    maxconn 200000
    timeout connect 10s
    timeout client 4m
    timeout server 4m

frontend cloudflare_frontend5
    bind *:6675 ssl crt /etc/ssl/private/site.pem accept-proxy
    default_backend inspircd_backend5
    bind-process 1

frontend cloudflare_frontend6
    bind *:6676 ssl crt /etc/ssl/private/site.pem accept-proxy
    default_backend inspircd_backend6
    bind-process 2

...

backend inspircd_backend5
    log global
    server local_irc5 /home/ircd/InspIRCd1/run/proxy.sock send-proxy-v2

backend inspircd_backend6
    log global
    server local_irc6 /home/ircd/InspIRCd2/run/proxy.sock send-proxy-v2
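Since the per-instance stanzas only differ in the frontend/backend number, listen port, process number, and socket path, the repeated sections could be generated with a small loop. This is only a sketch: it assumes the elided entries continue the numbering pattern of the two stanzas shown (frontend N on port 6670+N, process i bound to /home/ircd/InspIRCdi/run/proxy.sock), which is my guess at the full config, not something stated in the post.

```shell
#!/bin/sh
# Emit the 5 frontend/backend stanza pairs, assuming the elided entries
# follow the same numbering pattern as the two stanzas shown above.
for i in 1 2 3 4 5; do
    n=$((i + 4))            # frontend/backend suffix: 5..9
    port=$((6670 + n))      # listen port: 6675..6679
    cat <<EOF
frontend cloudflare_frontend$n
    bind *:$port ssl crt /etc/ssl/private/site.pem accept-proxy
    default_backend inspircd_backend$n
    bind-process $i

backend inspircd_backend$n
    log global
    server local_irc$n /home/ircd/InspIRCd$i/run/proxy.sock send-proxy-v2

EOF
done
```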
Posts: 1
Participants: 1