@crisrodrigues wrote:
Hi,
We’ve been using haproxy-1.8.18 happily since it was released.
It sits in front of our app servers and routes requests to our backend farm depending on the desired app service.
We use:
- nbproc 5;
- 1 unix frontend attached to each process;
- communication with a few hundred (~300) backends. For a few of them we use server-templates with DNS resolution; the rest use static IP addresses (IPv4 and IPv6). (A minimal sketch follows below.)
All frontends are non-encrypted HTTP/1.1, and the backends vary between TLS and plain HTTP.
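For illustration, here is a minimal sketch of the two server declaration styles, with hypothetical backend names, hostnames, and addresses; the "dnsserver" resolvers section is the one from the config further down:

backend app_dns
    # hypothetical: 10 server slots filled via DNS through the dnsserver resolvers section
    server-template app 10 app.example.internal:8080 resolvers dnsserver init-addr none check

backend app_static
    # hypothetical: statically addressed servers, one IPv4 and one IPv6, the latter over TLS
    server s1 192.0.2.10:8080 check
    server s2 [2001:db8::10]:8443 check ssl verify none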
This is the very same (boring but long) config file we’ve always used.
And this weekend a few haproxy processes started dying. “show errors” doesn’t show anything (and can’t be used after the process dies), so we have no clue besides these dmesg messages:
[12313007.354629] haproxy[1010574]: segfault at 58 ip 000000000048de73 sp 00007ffd7e218950 error 4 in haproxy[400000+146000]
[12313007.355575] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12316538.140456] haproxy[1013602]: segfault at 58 ip 000000000048de73 sp 00007ffe09fbe5e0 error 4 in haproxy[400000+146000]
[12316538.141250] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12424725.217771] haproxy[1112079]: segfault at 58 ip 000000000048de73 sp 00007fff1d1e87c0 error 4 in haproxy[400000+146000]
[12424725.218582] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12444059.954893] haproxy[1112083]: segfault at 58 ip 000000000048de73 sp 00007fff1d1e87c0 error 4 in haproxy[400000+146000]
[12444059.955708] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12473582.800870] haproxy[1162962]: segfault at 58 ip 000000000048de73 sp 00007fff3d196f00 error 4 in haproxy[400000+146000]
[12473582.801908] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
[12489985.349159] haproxy[1162959]: segfault at 58 ip 000000000048de73 sp 00007fff3d196f00 error 4 in haproxy[400000+146000]
[12489985.350112] Code: 44 24 18 48 8b 00 48 85 c0 74 1f 89 4c 24 28 48 89 54 24 20 4c 89 e7 4c 89 04 24 ff d0 8b 4c 24 28 48 8b 54 24 20 4c 8b 04 24 <48> 8b 42 58 48 85 c0 0f 84 4c 08 00 00 4d 85 c0 74 0d 41 83 78 10
I’m not sure what info I can provide since this is live traffic, but obviously I’ll try to get as much as possible. I have since updated to 1.8.19 with the latest patches on top of it (as of the March 23rd git tree), but I can’t spot a single change there that looks related. And a segfault is…weird!
Anyway, the common part of our config is:
global
    nbproc 5
    maxconn 900000
    ulimit-n 2701398
    user haproxy
    group haproxy
    daemon
    ssl-engine rdrand
    ssl-mode-async
    tune.ssl.default-dh-param 2048
    tune.ssl.maxrecord 1419
    unix-bind user haproxy mode 777
    hard-stop-after 1m
    tune.idletimer 1000
    tune.bufsize 131072

resolvers dnsserver
    nameserver cloudflare 1.1.1.1:53
    resolve_retries 3
    hold valid 3s
    hold timeout 1s
    hold refused 1s
    accepted_payload_size 1024

defaults
    mode http
    retries 1
    maxconn 900000
    timeout connect 10s
    timeout server 100s
    timeout server-fin 3s
    timeout check 10s
    timeout client 100s
    timeout client-fin 3s
    timeout http-request 3s
    timeout http-keep-alive 5s
    timeout tunnel 300s
    option http-no-delay
    default-server init-addr none
    option accept-invalid-http-response
    option tcp-check

frontend front
    bind-process 1-5
    bind /var/run/backend1.sock process 1
    bind /var/run/backend2.sock process 2
    bind /var/run/backend3.sock process 3
    bind /var/run/backend4.sock process 4
    bind /var/run/backend5.sock process 5
For backend selection, we use a few tricks:
- All servers in each backend use a tcp-check (layer 4) to see whether the desired port is available;
- We store two possible backend names in HTTP headers: a first (preferred) one and a second (slower, but workable) one;
- We use a variable (set-var in the req scope) to retrieve the desired backend name from an HTTP header;
- We set an ACL to check how many servers are alive in the first (preferred) backend, such as:
acl avail var(req.back_first),nbsrv ge 1
- We use the backends as follows (a fuller sketch follows this list):
use_backend %[var(req.back_first)] if avail
use_backend %[var(req.back_second)]
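Putting those pieces together, here is a minimal sketch of the selection logic, assuming hypothetical header names (X-Backend-First, X-Backend-Second) and backend names; the real config uses its own:

frontend front
    # hypothetical header names carrying the two candidate backend names
    http-request set-var(req.back_first) req.hdr(X-Backend-First)
    http-request set-var(req.back_second) req.hdr(X-Backend-Second)
    # true if the preferred backend has at least one live server
    acl avail var(req.back_first),nbsrv ge 1
    use_backend %[var(req.back_first)] if avail
    use_backend %[var(req.back_second)]

backend app_first
    # layer-4 check: only verifies that the port accepts connections
    option tcp-check
    server s1 192.0.2.10:8080 check

If neither use_backend rule matches (e.g. the header names a backend that doesn't exist), the frontend's default_backend, if any, applies.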
Any info you need to help figure this out would be greatly appreciated.