Hi all,
Is there any extra setting that needs to be enabled when load balancing gRPC services running in Docker?
We are trying to load balance our backend, which in this case is a set of Tesla GPUs running the same TensorFlow models on different ports, all inside Docker. The client makes gRPC calls that send multiple images for the models running on the GPUs to consume. We have HAProxy between the client and the GPUs, running on a separate VM, and we expected it to load balance requests across these GPUs.
Unfortunately, we are unable to make it work. We keep getting one or the other of the errors below. On the GPU console we can see that, out of 8 images for instance, only one image gets processed, and then we end up with this exception:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1603277976.225947468","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1603277976.225941688","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}"

and sometimes this:

status = StatusCode.CANCELLED
details = "Received RST_STREAM with error code 8"
debug_error_string = "{"created":"@1603183282.697214100","description":"Error received from peer ipv4:IPaddress:8700","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Received RST_STREAM with error code 8","grpc_status":1}"
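For anyone comparing the two traces above: the payload inside each debug_error_string looks like JSON, so the nested errors can be flattened for easier reading. The following is only a sketch using the Python standard library. The helper name `summarize_grpc_debug` is hypothetical, and it assumes the payload has already been extracted from between the outer quotes. For context, gRPC status 14 is UNAVAILABLE, status 1 is CANCELLED, and HTTP/2 RST_STREAM error code 8 is CANCEL.

```python
import json


def summarize_grpc_debug(debug_json: str):
    """Flatten a debug_error_string JSON payload into (description, grpc_status) pairs.

    Nested entries under "referenced_errors" are visited depth-first.
    A missing "grpc_status" field is reported as None.
    """
    def walk(node):
        yield node.get("description"), node.get("grpc_status")
        for child in node.get("referenced_errors", []):
            yield from walk(child)

    return list(walk(json.loads(debug_json)))


# Payload of the first trace above, with plain ASCII quotes.
sample = (
    '{"created":"@1603277976.225947468",'
    '"description":"Failed to pick subchannel",'
    '"file":"src/core/ext/filters/client_channel/client_channel.cc",'
    '"file_line":3941,'
    '"referenced_errors":[{"created":"@1603277976.225941688",'
    '"description":"failed to connect to all addresses",'
    '"file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc",'
    '"file_line":393,"grpc_status":14}]}'
)

for description, status in summarize_grpc_debug(sample):
    print(description, status)
```

Note that only the innermost error in the first trace carries a grpc_status field; the outer "Failed to pick subchannel" entry does not.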
Below is the relevant part of the HAProxy configuration (here 'loadbalancerIP' and 'IPaddress' are placeholders for the respective IP addresses).
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 50000
    # only for debugging
    debug
    # Default SSL material locations
    #ca-base /etc/ssl/certs
    #crt-base /etc/ssl/private
    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    #ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    #ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    #ssl-default-bind-options no-sslv3 no-tls-tickets
    #ssl-default-server-options no-sslv3 no-tls-tickets

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    option http-use-htx
    #option logasap
    maxconn 3000

frontend loadbalancernode
    bind 'loadbalancerIP':8700 proto h2
    bind 'loadbalancerIP':8600 proto h2
    bind 'loadbalancerIP':8601 proto h2
    bind 'loadbalancerIP':8605 proto h2
    default_backend gpu_servers

backend gpu_servers
    balance leastconn
    mode http
    server server1_8700 'IPaddress':8700 proto h2 check
    server server1_8600 'IPaddress':8600 proto h2 check
    server server1_8601 'IPaddress':8601 proto h2 check
    server server1_8605 'IPaddress':8605 proto h2 check
    server server2_8700 'IPaddress':8700 proto h2 check
    server server2_8600 'IPaddress':8600 proto h2 check
    server server2_8601 'IPaddress':8601 proto h2 check
    server server2_8605 'IPaddress':8605 proto h2 check
    server server3_8700 'IPaddress':8700 proto h2 check
    server server3_8600 'IPaddress':8600 proto h2 check
    server server3_8601 'IPaddress':8601 proto h2 check
    server server3_8605 'IPaddress':8605 proto h2 check

listen stats
    bind :30000
    mode http
    stats enable
    stats uri /haproxy?stats
    stats hide-version
    stats refresh 60
    stats realm Haproxy-Statistics
    stats auth admin:password
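
Since the first error is "failed to connect to all addresses", one thing that may be worth ruling out is plain TCP reachability of every host:port pair in the configuration above, both from the client to the frontend ports and from the HAProxy VM to the backend ports. The following is only a rough sketch using the Python standard library. The helper names `port_is_open` and `check_endpoints` and the `127.0.0.1` host are hypothetical placeholders, not anything taken from the setup described here. A successful TCP connect says nothing about whether HTTP/2 or gRPC works on that port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(hosts, ports, timeout: float = 2.0):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {
        (host, port): port_is_open(host, port, timeout)
        for host in hosts
        for port in ports
    }


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the real backend or frontend addresses.
    hosts = ["127.0.0.1"]
    ports = [8700, 8600, 8601, 8605]  # the ports used in the configuration above
    for (host, port), ok in check_endpoints(hosts, ports).items():
        print(f"{host}:{port} {'reachable' if ok else 'unreachable'}")
```

Running this once from the client machine against the load balancer address and once from the HAProxy VM against each GPU host would show which hop, if any, refuses connections.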
Without HAProxy, the calls work perfectly fine, so we are not sure what HAProxy is adding to or removing from the client calls that it forwards to the backend servers.
Any help would be much appreciated.
Thanks,
Ranjan