Flask-SQLAlchemy consuming 100% CPU with multi-thread WSGI server.

These days I has been taking my hairs with a critical issue on my web app, when it eats up to 100% CPU of my VPS.

My web app plays role as an authentication server for splash-protected Wifi network. When people want to use the network, the router (with WifiDog installed) will drive them to a login page first, requiring them to authenticate to get Internet access. With this usage, I don't expect the server to get too many guests. That's why I chose to build it with Flask. For database utilization, I pick SQLAlchemy for safety and because at that time, none of others supports Python3.

The website runs smoothly until one recent day we realized that it took 100% CPU and made the server irresponsive. I tried many ways, including rollbacking the code to old revisions, downgrading the libraries but none helps. At first I ran with CherryPy as WSGI server, then I tried switching to Waitress. No help. When I ran with Flask's built-in Flask server, I noticed that the CPU usage no longer increased to 100%, but in other hand, the web kept hanging. The browser could connect and make HTTP request, but got no response.

After many days trying and testing as well as doing optimize code which exploits database, I try the last option: switching to Gunicorn. Surprisingly, this helps! Though I don't know how Gunicorn fix the problem, or where is the defect in my code, I can assert that it is not because "Gunicorn is superior to others". The difference between Gunicorn and the other forementioned is that one is multi-process type and one is multi-thread (CherryPy, Waitress). My use, my setup of SQLAlchemy/PostgreSQL is somehow compatible with multi-process server rather than multi-thread. But by which manner, I don't know. There are many of setup of Flask+SQLAlchemy+CherryPy out there but no one complaints about 100% CPU issue. It proves that the issue is not CherryPy/Waitress's fault.

My lack of knowledge about database system in low-level prevents me from understanding where the point of failure is. So let temporarily pleasant with the current solution and hope one day I can answer the question "Why?".