Hi guys,
I'd appreciate it if anyone could give me a couple of suggestions for an issue I have with SSL.
I know it sounds like a certificate issue, but it happens only when I get a big spike of new connections.
I am running HAProxy 1.5.14 on Azure with SSL termination.
HAProxy works perfectly well when the load rises gradually, but everything goes bad when the load arrives all at once.
In a normal situation, qmax goes up to 3000 per thread and each CPU core is loaded no higher than 75%.
But if I restart HAProxy during the normal daily load, it can drive CPU usage up to 100% and is then unable to handle more than 700-800 requests per thread.
When it reaches that limit, the rate of new requests drops to 2-5, and the HAProxy log fills up mostly with errors like:
tls/1: SSL handshake failure
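To put a number on how fast these failures pile up, something like the following could be used. It is only a rough sketch: it assumes the log lines carry the traditional syslog prefix ("Mon DD HH:MM:SS host ..."), and the function name handshake_failures_per_minute is made up for illustration.

```shell
# Count "SSL handshake failure" lines per minute.
# Reads syslog-style lines on stdin; assumes the prefix "Mon DD HH:MM:SS host ...".
handshake_failures_per_minute() {
    awk '/SSL handshake failure/ { n[$1 " " $2 " " substr($3, 1, 5)]++ }
         END { for (m in n) print m, n[m] }' | sort
}
```

It would be fed the HAProxy log, for example handshake_failures_per_minute < /var/log/haproxy.log (the log path is an assumption and depends on the syslog setup).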
If I add more HAProxy instances to the load balancing pool, things return to normal.
I don't have issues with entropy:
cat /proc/sys/kernel/random/entropy_avail
885
I tried adding connection rate limits:
maxsessrate 100
maxsslrate 100
maxconnrate 100
That had no effect. Everything stops at about 800 connections, and then the whole log fills with SSL handshake failures.
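One way to confirm that the limits were actually in force is to read them back from each process over the stats sockets. This is a sketch, not something verified on this setup: it assumes socat is installed, uses the socket paths from the config below, and assumes the "show info" output contains ConnRateLimit, SessRateLimit and SslRateLimit lines. The function names are made up for illustration. As far as I understand the documentation, these global limits apply per process, so with nbproc 4 a value of 100 would allow up to 400 per second in total.

```shell
# Keep only the configured rate limits from "show info" output (read on stdin).
rate_limits() {
    grep -E '^(ConnRateLimit|SessRateLimit|SslRateLimit):'
}

# Print the limits each of the four processes is running with.
# Socket paths are taken from the global section of the config.
show_rate_limits() {
    for n in 1 2 3 4; do
        echo "== process $n =="
        echo "show info" | socat stdio "/var/run/haproxy.p$n.sock" | rate_limits
    done
}
```

Running show_rate_limits as a user with access to the sockets should print three lines per process; a value of 0 would mean no limit is set.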
I also tried playing around with timeouts, setting timeout connect to:
- 500
- 50000
- 30s
No effect either.
Can anyone suggest anything here? I have no idea how to debug this.
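If it helps, I can collect the SSL counters from each process while the problem is happening. A minimal sketch of how that might be done, assuming socat is available and that "show info" on 1.5 exposes the field names filtered for below (the function names ssl_counters and poll_ssl_counters are hypothetical):

```shell
# Keep only the SSL- and rate-related counters from "show info" output (read on stdin).
ssl_counters() {
    grep -E '^(CurrConns|CurrSslConns|ConnRate|SessRate|SslRate|SslFrontendKeyRate|SslFrontendSessionReuse_pct|SslCacheLookups|SslCacheMisses):'
}

# Query every per-process stats socket once.
# Socket paths are taken from the global section of the config.
poll_ssl_counters() {
    for n in 1 2 3 4; do
        echo "== process $n =="
        echo "show info" | socat stdio "/var/run/haproxy.p$n.sock" | ssl_counters
    done
}
```

Calling poll_ssl_counters every few seconds right after a restart should show whether the key rate and the session reuse percentage look different from the gradual-load case.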
Here is the config file I use:
global
log /dev/log local0
log /dev/log local1 notice
stats socket /var/run/haproxy.p1.sock mode 660 group nagios level admin process 1
stats socket /var/run/haproxy.p2.sock mode 600 level admin process 2
stats socket /var/run/haproxy.p3.sock mode 600 level admin process 3
stats socket /var/run/haproxy.p4.sock mode 600 level admin process 4
stats timeout 2m #Wait up to 2 minutes for input
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
nbproc 4
cpu-map 1 0 # first arg is process number (1-based); second arg is cpu number (0-based)
cpu-map 2 1
cpu-map 3 2
cpu-map 4 3
# SSL/TLS settings
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
tune.ssl.default-dh-param 2048
tune.ssl.cachesize 10000000
tune.ssl.lifetime 86400
#tune.ssl.maxrecord 2859
tune.ssl.maxrecord 1400 # TCP window size
ssl-default-bind-options no-sslv3 no-tls-tickets
ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK
maxconn 60000
maxsslconn 60000
# maxsessrate 100
# maxsslrate 100
# maxconnrate 100
defaults
log global
option dontlognull
option dontlog-normal
timeout connect 5000
timeout client 50000
timeout server 50000
bind-process all # not needed, but worthwhile being explicit
listen stats
bind :2100 process 1
bind :2101 process 2
bind :2102 process 3
bind :2103 process 4
mode http
log global
stats enable
stats realm stats_process
stats uri /
stats refresh 15s
stats show-legends
stats show-node
stats auth xxxxxxxxxxxxx
frontend tls
mode tcp
maxconn 60000
option tcplog
bind *:443 ssl crt-list /etc/ssl/private/certificates.txt npn http/1.1
default_backend frontend_service
backend frontend_service
mode tcp
option tcplog
option httpchk GET /status
fullconn 60000
# 8 second 'inter'val between health checks. 2 failures to remove a server. 2 successes to add it back
default-server inter 8s fall 2 rise 2
timeout check 8s
balance leastconn
server SRV1 SRV1:80 maxconn 2000 check port 3000
....
server SRV60 SRV1:80 maxconn 2000 check port 3000
Thank you!
Pavel