Hello, I am trying to dig into routing failures caused by timeout connect (under the hood it appears to be timeout queue), but the queue time I see in the logs only ever reaches the limit for the first attempt, even though there is a retry/redispatch. Can you clarify the expected time values in the following configuration/scenario:
timeout connect 600ms
timeout client 11s
timeout server 900ms
balance roundrobin
option http-keep-alive
retries 1
option redispatch 1
retry-on empty-response 0rtt-rejected conn-failure response-timeout 503 500
http-reuse always
We are seeing some failures logged in the following format:
Nov 14 12:14:41 localhost haproxy[27714]: <client_ip> [14/Nov/2024:12:14:40.518] frontend_srv~ backend_srv/srv_name 0 601 -1 -1 +1200 503 +217 - - sC-- 2368/2368/2363/6/+1 0/0 "GET /ws HTTP/1.1" <backend_source_ip> <backend_source_port> <server_ip> <port> <stickiness_id> - - - - <request_id> - -
Based on the log format described in the HAProxy version 2.8.12-12 Configuration Manual, this suggests that the request had a queue time of 600ms (601 in the log line above) and a total active request time of 1200ms, and that it failed due to a timeout. This makes sense in theory if we had 2 failed attempts, each timing out at 600ms.
But shouldn’t the total queue time also be 1200ms? Or does queue time only track the time of the request that’s actually logged in the failure? If it’s total queue time, where else could the extra time be spent?
When we see timeout failures, they always follow this pattern: the queue_time is 600ms (equal to our timeout connect) but the active_req_time is 1200ms.
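To make the arithmetic behind this reading explicit, here is a minimal Python sketch that pulls the five timer values out of the log line above and compares them with the configured limits. It rests on assumptions: the names `parse_timers`, `TIMEOUT_CONNECT_MS` and `RETRIES` are made up for illustration, and the five values are assumed to be TR, Tw, Tc, Tr, Ta in that order, which matches how the fields are read in this post but is not confirmed by the (unshown) log-format directive.

```python
import re

# Values taken from the configuration in this post.
TIMEOUT_CONNECT_MS = 600
RETRIES = 1

# Assumed field order for the five timers that precede the status code.
TIMER_NAMES = ("TR", "Tw", "Tc", "Tr", "Ta")

def parse_timers(log_line):
    """Return the five timers (in ms) that precede the 3-digit status code."""
    match = re.search(
        r" (-?\d+) (-?\d+) (-?\d+) (-?\d+) (\+?-?\d+) \d{3} ", log_line
    )
    if match is None:
        raise ValueError("no timer fields found in log line")
    # int() accepts the leading '+' that marks a value logged early.
    return dict(zip(TIMER_NAMES, (int(v) for v in match.groups())))

line = (
    'frontend_srv~ backend_srv/srv_name 0 601 -1 -1 +1200 503 +217 '
    '- - sC-- 2368/2368/2363/6/+1 0/0 "GET /ws HTTP/1.1"'
)

timers = parse_timers(line)
attempts = RETRIES + 1                        # first try plus one retry
expected_total_ms = attempts * TIMEOUT_CONNECT_MS

print("queue time (Tw):", timers["Tw"])
print("total active time (Ta):", timers["Ta"])
print("attempts * timeout connect:", expected_total_ms)
```

With these numbers the total active time equals two attempts at 600ms, while the logged queue time is roughly one attempt, which is exactly the discrepancy I am asking about.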