Channel: HAProxy community - Latest topics

Second Back-End Server Always Times Out - but fine when accessing on the LAN


Hello,

I am having an issue that I can’t seem to figure out. I’m not convinced it is HAProxy - but I need to eliminate all possibilities.

I’ve used this configuration for more than 4 years now. In the past month, I have changed data centers, but the hardware configuration is mostly identical. I’ll try to walk through everything in as much detail as possible.

Configuration:

global
        log /dev/log    local0 debug
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user ion
        group ion
        daemon

        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3

frontend http-in
        bind *:80
        mode http
        acl http ssl_fc,not
        http-request redirect scheme https if http
        log global

frontend app
        bind *:443 ssl crt <REDACTED>
        option                  forwardfor
        log global
        option http-keep-alive
        timeout http-keep-alive 1000
        mode http
        acl                     app3    var(txn.txnhost) -m str -i app3.ion-k12.com
        acl                     aclcrt_APPFrontEnd      var(txn.txnhost) -m reg -i ^([^\.]*)\.website\.com(:([0-9]){1,5})?$
        acl                     aclcrt_APPFrontEnd      var(txn.txnhost) -m reg -i ^website\.com(:([0-9]){1,5})?$
        acl                     api     var(txn.txnhost) -m str -i api.ion-k12.com
        acl                     aclcrt_APIFrontEnd      var(txn.txnhost) -m reg -i ^api\.website\.com(:([0-9]){1,5})?$
        acl                     public  var(txn.txnhost) -m beg -i public.website.com
        acl                     app     var(txn.txnhost) -m str -i testing.website.com
        acl                     aclcrt_TestingFrontEnd  var(txn.txnhost) -m reg -i ^([^\.]*)\.website\.com(:([0-9]){1,5})?$
        acl                     aclcrt_TestingFrontEnd  var(txn.txnhost) -m reg -i ^website\.com(:([0-9]){1,5})?$
        http-request            set-var(txn.txnhost) hdr(host)
        default_backend         appservers
        option httplog
        option logasap
        http-request capture req.hdr(Content-Length) len 15

backend appservers
        mode                    http
        log /dev/log local0 debug
        balance                 roundrobin
        timeout connect         300s
        timeout server          300s
        retries                 3
        server                  APP01 192.168.10.50:80 id 10101 check port 80 inter 1000
        server                  APP02 192.168.10.51:80 id 10102 check port 80 inter 1000

defaults
        log     global
        mode    http
        option  tcplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http

APP01:
IIS
Server 2022
128 GB Memory
Intel X550 (Dual) NIC

APP02:
IIS Configured Identically
Server 2022
128 GB Memory
Mellanox ConnectX-4 LX NIC

When APP01 and APP02 are both active, we get intermittent 504 errors. When I isolate the servers, APP02 turns out to be the source of the trouble. However, when I browse to the site on APP02 itself, it performs normally. When I browse to it from APP01, it also performs normally. It’s only when APP02 is in the LB cluster that it returns 504s.

I’m wondering if there is something about the Mellanox NIC that HAProxy doesn’t like. I can’t find any notes or documentation about this, so I have no way of confirming. It should just work, right? A NIC is a NIC.

Here are a few of the log lines. Each request times out after 30 seconds with termination state sH, but it shouldn’t, especially since the server is responsive locally.

Nov 14 14:01:42 k17-ru21 haproxy[506882]: 69.135.X.X:28175 [14/Nov/2023:14:01:12.825] app~ appservers/APP02 0/1/30029 198 sH 24/22/3/3/0 0/0
Nov 14 14:01:42 k17-ru21 haproxy[506882]: 69.135.X.X:16805 [14/Nov/2023:14:01:12.928] app~ appservers/APP02 0/0/30040 198 sH 24/22/2/2/0 0/0
Nov 14 14:01:43 k17-ru21 haproxy[506882]: 69.135.X.X:37404 [14/Nov/2023:14:01:12.865] app~ appservers/APP02 0/1/30163 198 sH 24/22/1/1/0 0/0
Nov 14 14:01:43 k17-ru21 haproxy[506882]: 69.135.X.X:59170 [14/Nov/2023:14:01:13.211] app~ appservers/APP02 0/0/30098 198 sH 24/22/0/0/0 0/0
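
For anyone reading these lines, here is a minimal Python sketch that splits one of them into named fields. It assumes the lines follow HAProxy's TCP log layout (Tw/Tc/Tt timers, bytes read, termination state, connection counts, queue sizes), which the field counts above appear to match. The function name `parse_tcplog` and the regex are illustrative, not part of HAProxy. In the documented termination codes, `s` means a server-side timeout expired and `H` means the proxy was still waiting for response headers.

```python
import re

# Assumed layout (HAProxy TCP log format), after the syslog prefix:
# client [accept_date] frontend backend/server Tw/Tc/Tt bytes state
# actconn/feconn/beconn/srv_conn/retries srv_queue/backend_queue
LINE_RE = re.compile(
    r"(?P<client>\S+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) "
    r"(?P<term_state>\S{2}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/"
    r"(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)

INT_FIELDS = (
    "tw", "tc", "tt", "bytes_read", "actconn", "feconn", "beconn",
    "srv_conn", "retries", "srv_queue", "backend_queue",
)


def parse_tcplog(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for name in INT_FIELDS:
        # int() accepts the leading "+" that "option logasap" can add.
        fields[name] = int(fields[name])
    return fields


sample = (
    "Nov 14 14:01:42 k17-ru21 haproxy[506882]: 69.135.X.X:28175 "
    "[14/Nov/2023:14:01:12.825] app~ appservers/APP02 0/1/30029 198 sH "
    "24/22/3/3/0 0/0"
)
entry = parse_tcplog(sample)
print(entry["server"], entry["tc"], entry["tt"], entry["term_state"])
```

Read this way, the first line shows a 1 ms connect time to APP02 and a total session time of 30029 ms, so the TCP connection itself was established quickly and the wait happened after that.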

Is it possible that some kind of routing or network issue between my HAProxy host and APP02 is causing these 504s? With a direct connection to APP02, most requests are served in milliseconds. But put HAProxy in front, and they take 30+ seconds.
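
One way to separate the network path from HAProxy itself is to time plain HTTP requests sent from the HAProxy host straight to the backend address. The sketch below is only an illustration under assumptions: `time_request` is a hypothetical helper, and the `Host` header value and the 35-second timeout are guesses that would need to match the real site.

```python
import http.client
import time


def time_request(host, port, host_header, path="/", timeout=35.0):
    """Send one GET directly to a backend and return (status, seconds).

    status is None when the request fails or times out.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Supplying Host explicitly lets the backend pick the right site.
        conn.request("GET", path, headers={"Host": host_header})
        response = conn.getresponse()
        response.read()  # include body transfer in the measurement
        return response.status, time.monotonic() - start
    except (OSError, http.client.HTTPException):
        return None, time.monotonic() - start
    finally:
        conn.close()
```

Run on the HAProxy machine, a call such as `time_request("192.168.10.51", 80, "testing.website.com")` repeated a few dozen times, and then again against `192.168.10.50`, would show whether the slow responses also occur without HAProxy in the path.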

I’m at my wits’ end here…

Thanks for any help or info anyone can provide.
