Channel: HAProxy community - Latest topics

Backend connect latency spikes on backend queue


@mburisa wrote:

Hi, we’ve observed an increase in backend connect time on all backends under the same frontend as soon as one of them starts to queue.

Just wondering: is this expected behaviour, or would our configuration benefit from some meaningful tuning?
In this example, bck1 queues up to 10k+ requests during traffic spikes, which in itself is not an issue. But we can see that, at the same time, other backends such as bck2 (no queue, same load as before) start to show larger backend connect times, which in turn means they process less traffic (waiting for requests). Latency can spike from a nominal 10ms up to 3s…

Some examples of the current configuration:
We are using HAProxy version 1.8.12-8a200c7, released 2018/06/27, running with 1 process and 1 thread.

pid = 14332 (process #1, nbproc = 1, nbthread = 1)
system limits: memmax = unlimited; ulimit-n = 200039
maxsock = 200039; maxconn = 100000; maxpipes = 0

The total number of established connections stays well under the configured limits (25k requests at peak), while the queue reaches 10–15k.

defaults
mode http
timeout client 10s
timeout server 1800s
timeout connect 10s
timeout queue 60s
timeout check 1s
load-server-state-from-file global
option forwardfor
no option forceclose
no option tcp-smart-accept
option http-no-delay
log global
log 127.0.0.1 len 5120 local0
option dontlognull
balance leastconn

frontend api
bind :80
bind :443 ssl crt ./
maxconn 100000
option http-buffer-request

backend bck1
fullconn 30000
option httpchk HEAD /rmi/status
http-check disable-on-404
server : check slowstart 20s maxconn 20

backend bck2
fullconn 30000
option httpchk HEAD /rmi/status
http-check disable-on-404
server : check slowstart 20s maxconn 500
server : check slowstart 20s maxconn 500
server : check slowstart 20s maxconn 500
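One thing worth noting about the configuration above: bck1 has a single server capped at maxconn 20, so any spike beyond 20 concurrent requests piles up in the backend queue (up to timeout queue 60s). A minimal tuning sketch along those lines — server names and addresses here are placeholders, and the maxconn value assumes the upstream application can actually sustain more concurrency:

```haproxy
# Hypothetical sketch only — placeholder server name/address, and the
# assumption that the app behind bck1 can handle ~200 concurrent requests.
backend bck1
    fullconn 30000
    timeout queue 5s                  # fail queued requests fast instead of
                                      # letting them wait up to 60s
    option httpchk HEAD /rmi/status
    http-check disable-on-404
    server app1 192.0.2.10:8080 check slowstart 20s maxconn 200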

Example metrics from stress tests (the spike is not as large as under production traffic, but it is still notable):


[charts: latency_50k_pause_150k, queue_50k_pause_150k]

CPU usage hits 100% on the HAProxy servers themselves under queue load, and memory usage jumps by ~400MB, but there is still headroom to spare on the instances.
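Since the process sits at 100% CPU with nbproc = 1 and nbthread = 1, a single core is the bottleneck during queue storms. One option is to enable threading, which the 1.8 branch supports (though it was still marked experimental there). A sketch of the global section, assuming a build with thread support and at least 4 cores available:

```haproxy
# Hypothetical sketch — assumes HAProxy 1.8 built with USE_THREAD
# and a machine with 4+ cores; threading is experimental in 1.8.
global
    nbthread 4    # spread the single process's event loops over 4 threads
```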

Here are the metrics from production traffic, where you can clearly see (in red) that when queuing occurs, backend connect latency spikes on all backends. So is this expected behavior, or can we tune the configuration further to mitigate it? All of this traffic is HTTP/HTTPS; the HAProxy instances sit behind keepalived using a least-connection algorithm. We haven’t seen any difference using the round-robin algorithm.
[chart: prod_traffic]

Thanks and any help appreciated!

Posts: 1

Participants: 1


