@eedwards-sk wrote:
I’m attempting to use HAProxy Resolvers along with SRV Records and server-template to allow services on dynamic ports to register with HAProxy.
I’m using AWS Service Discovery (with Route53, TTL: 10s) and ECS.
It works successfully, given enough time, and any services in the DNS record eventually become available backends.
If I have 2 containers running for a service and 4 slots defined with server-template, then the first 2 slots show up “green” and the remaining 2 show up “red”.
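As I understand it from the server-template docs (so treat this as a sketch, not a verified description), a 4-slot template expands into four numbered server entries that the resolver fills from whatever SRV answers it currently has; with only 2 answers, the remaining slots are left without an address. Using the example record name from further down:

    backend EXAMPLE_SKETCH
        # sketch only: roughly what
        #   server-template web 4 _foo_.my.service resolvers aws-sd check init-addr none
        # turns into when the SRV record has two answers (A and B)
        server web1 _foo_.my.service resolvers aws-sd check init-addr none   # filled from answer A -> "green"
        server web2 _foo_.my.service resolvers aws-sd check init-addr none   # filled from answer B -> "green"
        server web3 _foo_.my.service resolvers aws-sd check init-addr none   # no SRV answer -> "red"
        server web4 _foo_.my.service resolvers aws-sd check init-addr none   # no SRV answer -> "red"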
During an HA deployment, where the 2 containers are replaced 1 by 1, HAProxy fails to register the updated records in time to prevent an outage.
So e.g. during a deployment, you might have an SRV record with 2 results:

_foo_.my.service:
- A._foo.my.service
- B._foo.my.service
As the first container (A) is stopped, the SRV record only returns 1 result:

_foo_.my.service:
- B._foo.my.service
At this point, I would expect HAProxy to remove the server from the server list, so it would appear “red”, similar to the other servers that were missing when the service started.

However, instead the server ends up marked as “MAINT” (orange) due to “resolution”, and it can sit “stuck” that way for 5+ minutes, failing to acquire the new IP information.
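(For reference, the per-server state can also be dumped over the admin socket from the config below; “show servers state” is a standard runtime-API command, and DEV_HOME is one of my backends:)

    $ echo "show servers state DEV_HOME" | socat stdio /run/haproxy/admin.sock
    # dumps per-server state for the backend (name, address, operational state,
    # admin state), including slots held in maintenance by DNS resolution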
Meanwhile, the SRV record is updated again as the services are replaced/updated:

_foo_.my.service:
- B._foo.my.service
- C._foo.my.service
Then again as B is removed:

_foo_.my.service:
- C._foo.my.service
And finally, D is added:

_foo_.my.service:
- C._foo.my.service
- D._foo.my.service
This whole time, running

dig SRV _foo_.my.service @{DNS_IP}

on the HAProxy host immediately returns the correct service IPs and ports at each of the deployment steps above, so the problem is not stale upstream DNS.

This makes the SRV system basically useless to me currently, because even with a rolling deployment of HA services, I end up with an outage.
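For example, at the C+D step the answer section looks something like this (ports and exact target names here are only illustrative of what the AWS Cloud Map records contain; the TTL is the 10s I configured):

    $ dig SRV _foo_.my.service @{DNS_IP} +noall +answer
    _foo_.my.service. 10 IN SRV 1 1 32768 C._foo.my.service.
    _foo_.my.service. 10 IN SRV 1 1 32770 D._foo.my.service.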
I have 2 HAProxy servers and the behavior is not identical between them, either (even though they’re identically configured).
Whether one of the server entries stays in “MAINT” for long seems to vary between them.
Eventually it does resolve, but waiting 5+ minutes with the services completely unavailable (even though they’re up, DNS is updated, and they’re ready to receive traffic) is not adequate for production usage.
Here’s a sanitized and trimmed config excerpt:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

    spread-checks 5

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    option httpclose
    monitor-uri /elb-check
    maxconn 60000
    rate-limit sessions 100
    backlog 60000

resolvers aws-sd
    accepted_payload_size 8192
    hold valid 5s # keep valid answer for up to 5s
    nameserver aws-sd1 169.254.169.253:53

listen stats
    bind 0.0.0.0:9000
    mode http
    balance
    stats enable
    stats uri /stats
    stats realm HAProxy\ Statistics

frontend HTTP_IN
    bind 0.0.0.0:80

    capture request header User-Agent len 200
    capture request header Host len 54
    capture request header Origin len 54
    capture request header X-Forwarded-For len 35
    capture request header X-Forwarded-Proto len 5
    capture response header status len 3

    option http-server-close
    option forwardfor except #sanitized#
    option forwardfor except #sanitized#

    # environments
    acl dev hdr_beg(host) #sanitized#. #sanitized#.

    # web-services routes
    acl locations path_beg /locations

    # dev backend
    use_backend DEV_HOME if dev !locations
    use_backend DEV_LOCATIONS if dev locations

backend DEV_HOME
    balance roundrobin
    option httpchk GET /healthcheck
    http-check expect status 200
    default-server inter 10s downinter 2s fastinter 2s rise 5 fall 2
    server-template web 4 _http._tcp.web-service-home-dev-web.my.service resolvers aws-sd check init-addr none resolve-opts allow-dup-ip resolve-prefer ipv4

backend DEV_LOCATIONS
    balance roundrobin
    option httpchk GET /locations/healthcheck
    http-check expect status 200
    default-server inter 10s downinter 2s fastinter 2s rise 5 fall 2
    server-template web 4 _http._tcp.web-service-locations-dev-web.my.service resolvers aws-sd check init-addr none resolve-opts allow-dup-ip resolve-prefer ipv4
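For completeness, these are the resolvers-section timing parameters that, as far as I can tell from the docs, control how long a template slot can stay in that resolution-triggered MAINT state; the values below are only illustrative, not something I have verified as a fix:

    resolvers aws-sd
        nameserver aws-sd1 169.254.169.253:53
        accepted_payload_size 8192
        resolve_retries 3        # attempts before a resolution is considered failed
        timeout resolve 1s       # interval at which resolutions are triggered
        timeout retry   1s       # delay between retries of a failed resolution
        hold valid    5s         # how long the last valid answer is kept
        hold obsolete 30s        # how long to keep a server whose record disappeared from the answer
        hold timeout  30s        # how long to keep the last answer after a query timeout
        hold refused  30s        # ... after a REFUSED response
        hold nx       30s        # ... after an NXDOMAIN response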