I have a Python/Flask app that listens for requests from a Grafana dashboard and processes/parses data from a backend in response.
If a user ‘spams’ requests to the app via Grafana, too many parallel threads can currently be spun up, crashing the app with out-of-memory errors (in particular on memory-restricted instances such as AWS EC2).
A simplistic solution would be to cap the number of threads or sessions outright, e.g. via waitress. However, some of the requests are ‘light’ and purely fetch metadata used to generate other requests - these we would prefer the app to keep processing quickly and in parallel.
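For context, the simplistic approach would look roughly like the sketch below (the module name and thread count are placeholders, not our real values) - the drawback being that a single thread cap throttles the light metadata requests just as much as the heavy ones:

```python
# Minimal sketch: serve the Flask app behind waitress with a hard cap
# on worker threads. "myapp" and threads=4 are placeholders.
from waitress import serve

from myapp import app  # hypothetical module exposing the Flask app object

if __name__ == "__main__":
    # All request types share this one thread pool, so heavy parsing
    # requests and light metadata requests get throttled identically.
    serve(app, host="0.0.0.0", port=8080, threads=4)
```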
It has been proposed that we use HAProxy to ‘restrict’ the threads/sessions sent to the app so that it does not run out of memory. I was hoping somebody here might have a suggestion for how to set this up in our project - or would be interested in helping with this as a small freelance project.
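To make the question more concrete, below is a rough, untested sketch of the kind of HAProxy setup we imagine (the path prefix, port and maxconn values are made up for illustration): the light metadata endpoints would be routed to a backend with a generous limit, while the heavy requests hit a backend whose maxconn queues excess requests inside HAProxy instead of spawning more threads in the app.

```
# Untested sketch - path prefix, port and limits are placeholders.
frontend grafana_in
    bind *:80
    # Hypothetical path prefix for the light metadata endpoints
    acl is_meta path_beg /api/meta
    use_backend light_backend if is_meta
    default_backend heavy_backend

backend light_backend
    # Light metadata requests: allow plenty of parallelism
    server app1 127.0.0.1:8080 maxconn 50

backend heavy_backend
    # Heavy requests: only a few reach the app at once, the rest
    # queue inside HAProxy rather than spawning threads in Flask
    server app1 127.0.0.1:8080 maxconn 4
```

Does something along these lines make sense, or is there a better mechanism in HAProxy for this?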