FastCGI occasionally slow

#1
I recently migrated a small web app to OLS. There are a couple of html files and two primary Perl/FastCGI scripts as external apps using UDS, directly called by the command parameter. Everything works, is operating in FastCGI mode, and is generally lightning fast. However, I have a situation I haven't been able to figure out. A common action in the app fires off four async xhr requests to one external app (this is the one with the problem), and one xhr to the other external app. One can do this many times, over and over, and it all works as expected. I've put logging into the external apps so I can see the time actual request processing starts and finishes. All four requests arrive at the same time, the processing start times match up with the times of request arrival in the access log, and the responses occurs milliseconds later. But every now and then, the fourth request doesn't start processing until ~7 seconds after the access log time-- I am assuming it wasn't 'assigned' until then. Processing takes the normal amount of time after that. It's always the fourth request of the group, and is usually the same 'call', but not always.

I've fiddled with the OLS server and vhost (template) parameters to no avail. The virtual server it runs on is 4cpu/8G running Ubuntu 24 wth Mariadb as a backend. Both OLS admin and top show the server very lightly loaded: 0.1 or less usually with no observable spikes. I did load up the server to 9+ once and it still worked. Though I've allowed many more, the pool for the 4-hit app is rarely over 11-13, and one rarely sees more than 1 or 2 in use in the dashboard. Response data is not enormous.

It's been a while since I've run a web server, and OLS is new to me, so I expect I've messed something up, but I cannot figure out what. Anyone had a similar experience, or have any ideas?
 
#2
Interestingly, the problem is worse under https than http. It happens almost reliably with my primary test case, and often involves two of the four calls being delayed by 7 sec, instead of one.

#access log extract, last # is response time in seonds
w.x.y.z [08/Aug/2024:07:26:50 -0700] "GET /pda/notes?xxxxxxxxxx HTTP/2" 200 8572 0
w.x.y.z [08/Aug/2024:07:26:50 -0700] "GET /pda/demo?xxxxxxxxxx HTTP/2" 200 444 0
w.x.y.z [08/Aug/2024:07:26:50 -0700] "GET /petinfo?ad=A824933 HTTP/2" 200 1744 0
w.x.y.z [08/Aug/2024:07:26:50 -0700] "GET /pda/tasks?xxxxxxxxxx HTTP/2" 200 790 7
w.x.y.z [08/Aug/2024:07:26:50 -0700] "GET /pda/treatment?xxxxxxxxxx HTTP/2" 200 1449 7
#logs from fcgi scripts
2024/08/08 07:26:50 ---- GET /demo xxxxxxxxxx
2024/08/08 07:26:50 ---- GET /notes xxxxxxxxxx
2024/08/08 07:26:50 6 GET /notes xxxxxxxxxx
2024/08/08 07:26:50 15 GET /demo xxxxxxxxxx
2024/08/08 07:26:57 ---- GET /tasks xxxxxxxxxx
2024/08/08 07:26:57 ---- GET /treatment xxxxxxxxxx
2024/08/08 07:26:57 2 GET /tasks xxxxxxxxxx
2024/08/08 07:26:57 3 GET /treatment xxxxxxxxxx
2024/08/08 07:26:50 5 GET ad=A824933

w.x.y.z [08/Aug/2024:07:27:05 -0700] "GET /pda/notes?xxxxxxxxxx HTTP/2" 200 8572 0
w.x.y.z [08/Aug/2024:07:27:05 -0700] "GET /pda/demo?xxxxxxxxxx HTTP/2" 200 444 0
w.x.y.z [08/Aug/2024:07:27:05 -0700] "GET /petinfo?ad=A824933 HTTP/2" 200 1738 0
w.x.y.z [08/Aug/2024:07:27:05 -0700] "GET /pda/tasks?xxxxxxxxxx HTTP/2" 200 786 7
w.x.y.z [08/Aug/2024:07:27:05 -0700] "GET /pda/treatment?xxxxxxxxxx HTTP/2" 200 1446 7
2024/08/08 07:27:05 ---- GET /demo xxxxxxxxxx
2024/08/08 07:27:05 ---- GET /notes xxxxxxxxxx
2024/08/08 07:27:05 5 GET /notes xxxxxxxxxx
2024/08/08 07:27:05 14 GET /demo xxxxxxxxxx
2024/08/08 07:27:12 ---- GET /tasks xxxxxxxxxx
2024/08/08 07:27:12 ---- GET /treatment xxxxxxxxxx
2024/08/08 07:27:12 2 GET /tasks xxxxxxxxxx
2024/08/08 07:27:12 2 GET /treatment xxxxxxxxxx
2024/08/08 07:27:05 6 GET ad=A824933

w.x.y.z [08/Aug/2024:07:27:54 -0700] "GET /pda/treatment?xxxxxxxxxx HTTP/2" 200 1446 0
w.x.y.z [08/Aug/2024:07:27:54 -0700] "GET /petinfo?ad=A824933 HTTP/2" 200 1738 0
w.x.y.z [08/Aug/2024:07:27:54 -0700] "GET /pda/demo?xxxxxxxxxx HTTP/2" 200 444 0
w.x.y.z [08/Aug/2024:07:27:54 -0700] "GET /pda/tasks?xxxxxxxxxx HTTP/2" 200 793 7
w.x.y.z [08/Aug/2024:07:27:54 -0700] "GET /pda/notes?xxxxxxxxxx HTTP/2" 200 8575 7
2024/08/08 07:27:54 ---- GET /demo xxxxxxxxxx
2024/08/08 07:27:54 ---- GET /treatment xxxxxxxxxx
2024/08/08 07:27:54 2 GET /treatment xxxxxxxxxx
2024/08/08 07:27:54 10 GET /demo xxxxxxxxxx
2024/08/08 07:28:01 ---- GET /tasks xxxxxxxxxx
2024/08/08 07:28:01 ---- GET /notes xxxxxxxxxx
2024/08/08 07:28:01 2 GET /tasks xxxxxxxxxx
2024/08/08 07:28:01 3 GET /notes xxxxxxxxxx
2024/08/08 07:27:54 5 GET ad=A824933

I suppose this is good from a debugging perspective.
 
#3
Further debugging showed (if I was interpreting the log correctly) that the occasional request was getting assigned but then timing out. So at that point I decided to make the problem go away by abandoning FCGI and instead proxying to the critical part of the app reworked in Mojolicious. Problem removed.
 

Cold-Egg

Administrator
#4
I have no idea about the FCGI delay part unless you set a very low value on the Max Connections, Environment, Memory Limit, or Process Limit. Also, what's the value you set in the Instances?

Yes, the reverse proxy works too.
 
#5
I have no idea about the FCGI delay part unless you set a very low value on the Max Connections, Environment, Memory Limit, or Process Limit. Also, what's the value you set in the Instances?

Yes, the reverse proxy works too.
Thanks for responding.

Most of the parameters were default so I didn't expect them to be an issue. Connections varied but minimum was in the hundreds for a modest number of users. (Today for example my highest simultaneous http+https conn count was 20 according to dashboard.) It didn't seem that I should have been hitting any limits.

I'm not accusing OLS of having any issues. I fully expect the problem was on my side in a library somewhere, or at the interface. I was migrating a legacy CGI system and initially chose FastCGI as the simplest upgrade in efficiency. I chose not to use lsperld because I had more trouble getting it to work than pointing external apps directly at my CGI scripts wrapped in FCGI and CGI::Fast. Worked great, except for the occasional slowness.

Once the problem was revealed to be somewhere extremely difficult to debug, it made more sense to move the code into the modern era. It wasn't that difficult, and I'm glad now that I did. It was even kind of fun. So that's the story.
 
Top