Quantcast
Viewing all articles
Browse latest Browse all 4733

HAProxy stops serving frontend requests while not closing backend connections at 100% cpu utilisation

@Andreas wrote:

Hello everyone

Lately we had a problem which caused our both HAProxies to consume 100% cpu time and stop responding to new frontend connections. While this problem occurs the HAProxy seems to keep all existing backend connections and not closing them over a longer period of time (this information is based one the grafana graphs). The haproxy stops creating log entries and there are no error or warning messages in other logs.

Reloading the haproxy service in this situation allows new frontend connections to the haproxy for a short period of time and our site is served until the first described situation occurs again (30-60min). Restarting the haproxy service solves the problem.

Our haproxies running the same setup:

  • Centos 7 virtual machine on VMWare ESX
  • 4 CPUs and 4GB RAM
  • Kernel 3.10.0-862.3.2.el7.x86_64 #1 SMP
  • HAProxy Version 1.8.14-52e4d43 (also occurred with HAProxy 1.8.9)

Both systems perform ssl offloading and load balance traffic to four backend http server which are on the same network.

Our normal load behaviour on a Saturday looks like this. The archived data is only in 30min intervals, therefore the cpu utilisation is not accurate with 20%.

When the described problem occurs our monitoring records the following data. Both HAProxies stop working at the exact same moment. Therefore, I think that the HAProxy is not the cause of the problem. But the effects lead to a problem on both systems. Also after the peak around 20:30 (08:30 pm) our layer 4 load balancer records a decreasing amount of connections as expected, but the HAProxies keep a high number of connections.


In total this problem occurred three times in the last 1 1/2 month only on a Saturday while higher load situations. Before the first occurrence this setup was running for about 2 1/2 month without any problems.

Unfortunately I’m unable to reproduce this behaviour.

The haproxy.cfg has the following content. I remove names and acls.

global
	maxconn 40000
        nbproc 1
        nbthread 4
        cpu-map auto:1/1-4 0-3

	log 127.0.0.1 local0 notice
	log 127.0.0.1 local0 info #temp

	chroot /var/lib/haproxy
	stats timeout 30s
	user haproxy
	group haproxy
	daemon

	tune.ssl.default-dh-param 2048
	ssl-default-bind-options no-sslv3
        ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK

defaults
	maxconn 40000
	log global
	mode http
	option httplog
	option dontlognull
	timeout http-request 10s
	timeout http-keep-alive 3s
	timeout connect 12s
	timeout queue 60s
	timeout client 60s
	timeout server 60s
	timeout	check 30s

userlist ...

frontend stats_frontend
	bind *:8080
	mode http
	option dontlog-normal
	default_backend stats_backend

frontend http_frontend
	bind *:80

        option http-buffer-request
        declare capture request len 20000
        http-request capture req.body id 0
	capture request header Host len 200
	log-format "%ci:%cp [%tr] %ft %b/%s %Th/%Ti/%TR/%Tq/%Tw/%Tc/%Tr/%Ta/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"

	acl is_crossdomain path -i /crossdomain.xml
	acl is_well_known path_beg -i /.well-known
	acl is_...

	# Redirect acls for old urls
        http-request redirect code 301 location #... (about 30 redirect rules)
	
        redirect scheme https if ...

	use_backend haproxy_backend if is_well_known
	use_backend http_server_backend if ...
	
	# Default backend
	default_backend http_server_backend

frontend https_frontend
	bind *:443 ssl alpn h2,http/1.1 crt ...

        option http-buffer-request
        declare capture request len 20000
        capture request header Host len 200
	log-format "%ci:%cp [%tr] %ft %b/%s %Th/%Ti/%TR/%Tq/%Tw/%Tc/%Tr/%Ta/%Tt %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"

	acl is_crossdomain path -i /crossdomain.xml
	acl is_well_known path_beg -i /.well-known
	acl ...

	use_backend haproxy_backend if is_well_known
	use_backend http_server_backend if ...

	# Default backend
	default_backend http_server_backend

backend http_server_backend
	balance leastconn
	option http-keep-alive
	option forwardfor
	option httpchk HEAD / HTTP/1.1\r\nHost:\ ...

	acl is_crossdomain capture.req.uri -m str /crossdomain.xml
	acl ...

	http-request deny if ...

	# Request authorization for sites
	http-request auth realm ...

	http-request set-header ...

	# Rewrite request urls 
	reqirep ^([^\ :]*)\ ...

	# default-server changes the default settings for backend servers
        default-server inter 2s downinter 5s rise 3 fall 2
	server httpserver1 10.x.y.z1:30080 check
	server httpserver2 10.x.y.z2:30080 check
	server httpserver3 10.x.y.z3:30080 check
	server httpserver4 10.x.y.z4:30080 check

backend stats_backend
	mode http
	acl is_auth ...
	acl is_admin ...
	stats ...

Does anyone know a similar situation or has an idea what can cause or solve this problem?

Thanks for your help.

Posts: 2

Participants: 2

Read full topic


Viewing all articles
Browse latest Browse all 4733

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>