@crisrodrigues wrote:
Hi,
We’ve been using haproxy-1.8.18 happily since it was released.
It sits in front of our app servers and routes requests to our backend farm depending on the desired app service.
We use:
- nbproc 5;
- 1 unix frontend attached to each process;
- communication with a few hundred (~300) backends. For a few of them we use server-templates with DNS resolution; the rest use static IP addresses (IPv4 and IPv6). (A minimal sketch follows below.)
All frontends are non-encrypted HTTP/1.1, and the backends vary between TLS and plain HTTP.
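For illustration, here is a minimal sketch of the two server declaration styles, with hypothetical backend names, hostnames, and addresses; the "dnsserver" resolvers section is the one from the config further down:

backend app_dns
    # hypothetical: 10 server slots filled via DNS through the dnsserver resolvers section
    server-template app 10 app.example.internal:8080 resolvers dnsserver init-addr none check

backend app_static
    # hypothetical: statically addressed servers, one IPv4 and one IPv6, the latter over TLS
    server s1 192.0.2.10:8080 check
    server s2 [2001:db8::10]:8443 check ssl verify none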
This is the very same (boring but long) config file we’ve always used.
And this weekend a few haproxy processes started dying. “show errors” doesn’t show anything (and can’t be used after the process dies), so we have no clue besides these dmesg messages:
[12313007.354629] haproxy[1010574]: segfault at 58 ip 000000000048de73 sp 00007ffd7e218950 error 4 in haproxy[400000+146000]
[12313007.355575] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12316538.140456] haproxy[1013602]: segfault at 58 ip 000000000048de73 sp 00007ffe09fbe5e0 error 4 in haproxy[400000+146000]
[12316538.141250] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12424725.217771] haproxy[1112079]: segfault at 58 ip 000000000048de73 sp 00007fff1d1e87c0 error 4 in haproxy[400000+146000]
[12424725.218582] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12444059.954893] haproxy[1112083]: segfault at 58 ip 000000000048de73 sp 00007fff1d1e87c0 error 4 in haproxy[400000+146000]
[12444059.955708] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12473582.800870] haproxy[1162962]: segfault at 58 ip 000000000048de73 sp 00007fff3d196f00 error 4 in haproxy[400000+146000]
[12473582.801908] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12489985.349159] haproxy[1162959]: segfault at 58 ip 000000000048de73 sp 00007fff3d196f00 error 4 in haproxy[400000+146000]
[12489985.350112] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
I’m not sure what info I can provide since this is live traffic, but obviously I’ll try to get as much as possible. I have since updated to 1.8.19 with the latest patches on top of it (as of the March 23rd git tree), but I can’t spot a single change there that looks related. And a segfault is…weird!
Anyway, the common part of our config is:
global
    nbproc 5
    maxconn 900000
    ulimit-n 2701398
    user haproxy
    group haproxy
    daemon
    ssl-engine rdrand
    ssl-mode-async
    tune.ssl.default-dh-param 2048
    tune.ssl.maxrecord 1419
    unix-bind user haproxy mode 777
    hard-stop-after 1m
    tune.idletimer 1000
    tune.bufsize 131072

resolvers dnsserver
    nameserver cloudflare 1.1.1.1:53
    resolve_retries 3
    hold valid 3s
    hold timeout 1s
    hold refused 1s
    accepted_payload_size 1024

defaults
    mode http
    retries 1
    maxconn 900000
    timeout connect 10s
    timeout server 100s
    timeout server-fin 3s
    timeout check 10s
    timeout client 100s
    timeout client-fin 3s
    timeout http-request 3s
    timeout http-keep-alive 5s
    timeout tunnel 300s
    option http-no-delay
    default-server init-addr none
    option accept-invalid-http-response
    option tcp-check

frontend front
    bind-process 1-5
    bind /var/run/backend1.sock process 1
    bind /var/run/backend2.sock process 2
    bind /var/run/backend3.sock process 3
    bind /var/run/backend4.sock process 4
    bind /var/run/backend5.sock process 5
For backend selection, we use a few tricks:
- All servers in each backend use a tcp-check (layer 4) to see whether the desired port is available;
- We store two possible backend names in HTTP headers: a first (preferred) one and a second (slower, but workable) one;
- We use a variable (set-var in the req scope) to retrieve the desired backend name from an HTTP header;
- We set an ACL to check how many servers are alive in the first (preferred) backend, such as:
acl avail var(req.back_first),nbsrv ge 1
- We use the backends as follows (a fuller sketch follows this list):
use_backend %[var(req.back_first)] if avail
use_backend %[var(req.back_second)]
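Putting those pieces together, here is a minimal sketch of the selection logic, assuming hypothetical header names (X-Backend-First, X-Backend-Second) and backend names; the real config uses its own:

frontend front
    # hypothetical header names carrying the two candidate backend names
    http-request set-var(req.back_first) req.hdr(X-Backend-First)
    http-request set-var(req.back_second) req.hdr(X-Backend-Second)
    # true if the preferred backend has at least one live server
    acl avail var(req.back_first),nbsrv ge 1
    use_backend %[var(req.back_first)] if avail
    use_backend %[var(req.back_second)]

backend app_first
    # layer-4 check: only verifies that the port accepts connections
    option tcp-check
    server s1 192.0.2.10:8080 check

If neither use_backend rule matches (e.g. the header names a backend that doesn't exist), the frontend's default_backend, if any, applies.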
Any info you need to help figure this out would be greatly appreciated.