@Andreas wrote:
Hello everyone
Lately we had a problem which caused our both HAProxies to consume 100% cpu time and stop responding to new frontend connections. While this problem occurs the HAProxy seems to keep all existing backend connections and not closing them over a longer period of time (this information is based one the grafana graphs). The haproxy stops creating log entries and there are no error or warning messages in other logs.
Reloading the haproxy service in this situation allows new frontend connections to the haproxy for a short period of time and our site is served until the first described situation occurs again (30-60min). Restarting the haproxy service solves the problem.
Our haproxies running the same setup:
- Centos 7 virtual machine on VMWare ESX
- 4 CPUs and 4GB RAM
- Kernel 3.10.0-862.3.2.el7.x86_64 #1 SMP
- HAProxy Version 1.8.14-52e4d43 (also occurred with HAProxy 1.8.9)
Both systems perform ssl offloading and load balance traffic to four backend http server which are on the same network.
Our normal load behaviour on a Saturday looks like this. The archived data is only in 30min intervals, therefore the cpu utilisation is not accurate with 20%.
When the described problem occurs our monitoring records the following data. Both HAProxies stop working at the exact same moment. Therefore, I think that the HAProxy is not the cause of the problem. But the effects lead to a problem on both systems. Also after the peak around 20:30 (08:30 pm) our layer 4 load balancer records a decreasing amount of connections as expected, but the HAProxies keep a high number of connections.
In total this problem occurred three times in the last 1 1/2 month only on a Saturday while higher load situations. Before the first occurrence this setup was running for about 2 1/2 month without any problems.
Unfortunately I’m unable to reproduce this behaviour.
The haproxy.cfg has the following content. I remove names and acls.
global maxconn 40000 nbproc 1 nbthread 4 cpu-map auto:1/1-4 0-3 log 127.0.0.1 local0 notice log 127.0.0.1 local0 info #temp chroot /var/lib/haproxy stats timeout 30s user haproxy group haproxy daemon tune.ssl.default-dh-param 2048 ssl-default-bind-options no-sslv3 ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK defaults maxconn 40000 log global mode http option httplog option dontlognull timeout http-request 10s timeout http-keep-alive 3s timeout connect 12s timeout queue 60s timeout client 60s timeout server 60s timeout check 30s userlist ... frontend stats_frontend bind *:8080 mode http option dontlog-normal default_backend stats_backend frontend http_frontend bind *:80 option http-buffer-request declare capture request len 20000 http-request capture req.body id 0 capture request header Host len 200 log-format "%ci:%cp [%tr] %ft %b/%s %Th/%Ti/%TR/%Tq/%Tw/%Tc/%Tr/%Ta/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r" acl is_crossdomain path -i /crossdomain.xml acl is_well_known path_beg -i /.well-known acl is_... # Redirect acls for old urls http-request redirect code 301 location #... (about 30 redirect rules) redirect scheme https if ... use_backend haproxy_backend if is_well_known use_backend http_server_backend if ... # Default backend default_backend http_server_backend frontend https_frontend bind *:443 ssl alpn h2,http/1.1 crt ... option http-buffer-request declare capture request len 20000 capture request header Host len 200 log-format "%ci:%cp [%tr] %ft %b/%s %Th/%Ti/%TR/%Tq/%Tw/%Tc/%Tr/%Ta/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r" acl is_crossdomain path -i /crossdomain.xml acl is_well_known path_beg -i /.well-known acl ... use_backend haproxy_backend if is_well_known use_backend http_server_backend if ... # Default backend default_backend http_server_backend backend http_server_backend balance leastconn option http-keep-alive option forwardfor option httpchk HEAD / HTTP/1.1\r\nHost:\ ... acl is_crossdomain capture.req.uri -m str /crossdomain.xml acl ... http-request deny if ... # Request authorization for sites http-request auth realm ... http-request set-header ... # Rewrite request urls reqirep ^([^\ :]*)\ ... # default-server changes the default settings for backend servers default-server inter 2s downinter 5s rise 3 fall 2 server httpserver1 10.x.y.z1:30080 check server httpserver2 10.x.y.z2:30080 check server httpserver3 10.x.y.z3:30080 check server httpserver4 10.x.y.z4:30080 check backend stats_backend mode http acl is_auth ... acl is_admin ... stats ...
Does anyone know a similar situation or has an idea what can cause or solve this problem?
Thanks for your help.
Posts: 2
Participants: 2