
2.2.3 multiprocess -> 3.0.8 multithreaded TLS performance degradation

Hi all, I’m trying to port existing HAProxy instances from 2.2.3 using nbproc to 3.0.8 using nbthread, and I’m hitting a TLS performance wall.

The hardware and underlying OS are identical; I’m just trying to update to a newer version and use a supported config:

  • AMD EPYC 9654P 96-core processor (hyperthreaded, no NUMA), 3.7GHz, 768GB RAM
  • Mellanox MT28908 ConnectX-6 NIC, 100Gb DAC, FEC enabled
  • Ubuntu 22.04 (Linux version 5.15.0-60-generic)

The 2.2.3 config simply sets nbproc to 64 with no other CPU settings:

global
    user haproxy
    group haproxy
    daemon
    maxconn 450000
    nbproc 64
    server-state-file /tmp/haproxybackendstate
    set-dumpable
    ssl-dh-param-file /etc/ssl/ttd/dh_params
    ssl-mode-async
    stats maxconn 200
    tune.bufsize 32768
    tune.comp.maxlevel 2
    tune.ssl.cachesize 1000000
    tune.ssl.default-dh-param 2048
defaults
    load-server-state-from-file global
    maxconn 450000
    mode tcp
    retries 3
    timeout check 11s
    timeout client 16s
    timeout connect 10s
    timeout server 16s

and the 3.0.8 config I’ve ended up with for testing effectively mirrors it:

global
    user haproxy
    group haproxy
    daemon
    nbthread 64
    thread-groups 1
    cpu-map auto:1/1-64 0-63
    maxconn 450000
    server-state-file /tmp/haproxybackendstate
    set-dumpable
    ssl-dh-param-file /etc/ssl/ttd/dh_params
    ssl-mode-async
    stats maxconn 200
    stats socket /var/run/haproxystats.sock mode 600 level admin expose-fd listeners
    tune.bufsize 32768
    tune.comp.maxlevel 2
    tune.ssl.cachesize 1000000
    tune.ssl.default-dh-param 2048
    tune.listener.default-shards by-thread # https://docs.haproxy.org/3.0/configuration.html#3.2-tune.listener.default-shards
    tune.listener.multi-queue fair # https://docs.haproxy.org/3.0/configuration.html#3.2-tune.listener.multi-queue

defaults
    load-server-state-from-file global
    maxconn 450000
    mode tcp
    retries 3
    timeout check 11s
    timeout client 16s
    timeout connect 10s
    timeout server 16s

The frontend/backend configs are the same for both:

frontend myservice443
    bind x.x.x.x:443 ssl crt /etc/ssl/mycompany/star_global_pub_priv
    default_backend myservice443
    mode http
    option forwardfor if-none

frontend myservice80
    bind x.x.x.x:80
    default_backend myservice80
    mode http
    option forwardfor if-none

backend myservice443
    balance random
    http-check expect status 200
    http-check send-state
    mode http
    option httpchk GET /service/health?from=lb
    server backendpod1 172.18.95.118:443 ca-file /etc/ssl/mycompany/star_global_cert_ca check fall 4 inter 5s rise 3 slowstart 15s ssl verify none weight 100
    server backendpod2 172.18.89.81:443 ca-file /etc/ssl/mycompany/star_global_cert_ca check fall 4 inter 5s rise 3 slowstart 15s ssl verify none weight 100
    server backendpod3 172.18.85.74:443 ca-file /etc/ssl/mycompany/star_global_cert_ca check fall 4 inter 5s rise 3 slowstart 15s ssl verify none weight 100
    server backendpod4 172.18.93.160:443 ca-file /etc/ssl/mycompany/star_global_cert_ca check fall 4 inter 5s rise 3 slowstart 15s ssl verify none weight 100

backend myservice80
    balance random
    http-check expect status 200
    http-check send-state
    mode http
    option httpchk GET /service/health?from=lb
    server backendpod1 172.18.95.118:80 check fall 4 inter 5s rise 3 slowstart 15s weight 100
    server backendpod2 172.18.89.81:80 check fall 4 inter 5s rise 3 slowstart 15s weight 100
    server backendpod3 172.18.85.74:80 check fall 4 inter 5s rise 3 slowstart 15s weight 100
    server backendpod4 172.18.93.160:80 check fall 4 inter 5s rise 3 slowstart 15s weight 100

(although I’ve also tried 3 thread groups with 64 threads in each, roughly as sketched below)
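
That 3-thread-group variant was along these lines (a sketch only: 192 total threads split 64 per group, with each group pinned to a contiguous block of hardware threads; the exact cpu-map ranges are illustrative):

global
    nbthread 192
    thread-groups 3
    # illustrative mapping: one 64-thread group per block of 64 hardware threads
    cpu-map auto:1/1-64 0-63
    cpu-map auto:2/1-64 64-127
    cpu-map auto:3/1-64 128-191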

And finally, these are the kernel tuning parameters for both systems:

linux_sysctl:
  net.ipv4.conf.all.proxy_arp: 0
  net.ipv4.tcp_window_scaling: 1
  net.ipv4.tcp_fin_timeout: 10
  net.ipv4.ip_forward: 1
  net.ipv4.conf.all.rp_filter: 2
  net.ipv4.conf.default.rp_filter: 2
  fs.file-max: 5000000
  fs.nr_open: 5000000
  net.ipv4.tcp_max_syn_backlog: 3240000
  net.core.somaxconn: 100000
  net.core.netdev_max_backlog: 100000
  net.ipv4.ip_local_port_range: "{{ 1024 + ansible_processor_vcpus }} 65535"
  net.netfilter.nf_conntrack_buckets: 425440
  net.netfilter.nf_conntrack_max: 10035200
  net.netfilter.nf_conntrack_tcp_timeout_close_wait: 20
  net.netfilter.nf_conntrack_tcp_timeout_fin_wait: 20
  net.netfilter.nf_conntrack_tcp_timeout_time_wait: 20
  net.ipv4.tcp_max_orphans: 5000000
  net.ipv4.conf.all.arp_ignore: 1
  net.ipv4.conf.default.arp_ignore: 1
  net.ipv4.conf.all.arp_announce: 2
  net.ipv4.conf.default.arp_announce: 2

The behavior is that within seconds of turning traffic on to the instance (via a rack-level routing change):

  • 4xx and 5xx errors start climbing
  • CPU utilization climbs until it hits 100% on all assigned processors
  • If I disable the TLS frontend, the non-TLS proxy works fine with no performance issues
  • If I enable the TLS frontend, neither proxy is healthy (the 4xx and 5xx errors don’t spike as high on the non-TLS service, but it never reaches healthy traffic levels)

I have what is essentially an ideal production testing environment: these SLBs peer via BGP with the top-of-rack switch, and effectively equal amounts of traffic go to each. In this rack there are two SLBs total, one running the legacy 2.2.3 version with nbproc and the other running 3.0.8 with the nbthread config. The cert being used is a 2048-bit wildcard.

Any initial thoughts? My investigation so far seems to point to receive queues being overloaded. I’ve tried setting maxconn to high values (e.g. 2M) and tune.ssl.cachesize likewise (e.g. 64M), since those are shared across all threads; see the sketch below.
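
For reference, those experiments amounted to something like the following in the global section (a sketch; note that tune.ssl.cachesize counts cached TLS sessions, not bytes):

global
    # oversized values tried while chasing the receive-queue theory:
    # 2M concurrent connections, 64M cached TLS sessions
    maxconn 2000000
    tune.ssl.cachesize 64000000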
