Possible dead lock [LSAPI application] - Help

#42
This might (or might not) be relevant as well. The server has been running on apache all week without problems. Yesterday though there was a sudden unexplained spike in disk activity, which seems to have been due to the server running out of memory and spending all its time swapping to virtual memory. It was so slow I couldn't do any diagnostics so I rebooted.

Then around 11pm I switched the server back to OLS. As you can see, disk activity steadily increased, which surprises me somewhat.
 


Cold-Egg

Administrator
#43
It seems like your CPU is a little bit busy, which might have caused some processes, such as PHP, to fail to respond in time and log this error. As you mentioned, the OOM issue also happens with Apache, which suggests the server might need an upgrade.
I can't tell why disk activity increased while using OLS. To check further, you can monitor iowait (for example with top or iostat). Hope it helps.
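For anyone who wants to watch iowait without extra tools, here is a minimal Python sketch. It assumes a Linux server, where the first line of /proc/stat holds the aggregate CPU counters (user, nice, system, idle, iowait, ...). The function names are my own and purely illustrative.

```python
import time


def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Only the first eight counters (user through steal) are used here.
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is iowait: user, nice, system, idle, iowait, ...
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return the iowait percentage."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.readline())

    first = read()
    time.sleep(interval)
    return iowait_percent(first, read())
```

Calling sample_iowait() in a loop and logging the result with a timestamp would give something to compare against the times the "possible dead lock" messages appear.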
 
#44
Same issue here. WordPress installed on a fresh dedicated server, and a lot of "possible dead lock" messages. The error log is not helping, there are no errors on the WordPress side, nothing. CPU usage is very low.
Completely unable to find the root cause of these dead locks, which are hanging the page response.
 
#45
My CPU usage generally looks OK to me. It's easy to blame server resources, but is adding (expensive) capacity really necessary? I'd like to see some evidence before doing that. In particular, if it's due to an attack I'd rather block the attack than spend money helping the attackers, therefore I keep a fairly close eye on server load (alarms on long page load times).
 

Cold-Egg

Administrator
#46
@philmck
That was just my opinion based on the top output screenshot you shared; whether to upgrade is up to you, of course. To check whether there is an attack, you might also want to monitor the network traffic.

@tonnick
Please check whether there is anything in the PHP error log. If nothing is found and the "possible dead lock" message keeps showing, please submit the issue to support@litespeedtech.com for a further look.
 
#47
There's nothing relevant in the PHP error logs that I can see. I've been doing some testing and it appears the "possible dead lock" message may be a bit misleading, in that it appears at times of high load and disappears at quiet times without any interaction from me. Web sites continue to run, but "like treacle". That, plus the fact that the server can run the same sites on Apache without problems, suggests that more "tuning" is needed. I will submit the issue to support as suggested, thanks.
 
#48
I have no time to investigate further for now. I rebuilt the server using nginx, even though performance is a bit slower, but at least there are no more dead locks.
My 2 cents: my feeling is that OLS is killing "long" queries, which then remain stuck between PHP and MySQL and end in a dead lock. But as no relevant logs are found, this is hard to check.
 
#49
I've just witnessed an episode of "dead lock" messages (after a couple of days of normal page loads). It's a Sunday evening, there's hardly any traffic in the logs. Network traffic is tiny (both directions). CPU usage and disk activity are normal. The problem has gone away now so I can't reproduce it.

Where do I even start with this?
 
#50
I am facing the same problem with OLS 1.7.18. Already have zlib.output_compression=On, tried to raise initTimeout, maxConns and env:

Code:
maxConns                120
env                     PHP_LSAPI_CHILDREN=120
initTimeout             6000
in /usr/local/lsws/conf/httpd-phplimits.conf, but the problem keeps coming back. The server load is always below 1 on a dedicated 6-core CPU, with 10GB of the 16GB total RAM available. So system resources do not seem to be the cause.

Where else can I look for possible causes?
Thanks
 
#51
I have some news on this (in my case) that may or may not help someone.

There is a "stderr" log file at /usr/local/lsws/logs/stderr.log that I didn't know existed. It's enabled by default in Webadmin > Server Configuration > Log but not as far as I can tell accessible from anywhere (except using SSH, of course). In my case I saw several log entries along the lines of "Reached max children process limit ... please increase LSAPI_CHILDREN" that roughly correlated with the times I saw "possible dead lock" messages. I had already load-tested the server and set LSAPI_CHILDREN to a safe maximum, but I tried increasing it anyway. The result was a reduction in "possible dead lock" messages - but the server eventually crashed, apparently out of memory.

It's difficult to be sure because the whole issue is so intermittent, but in my situation it seems there was no possible setting of LSAPI_CHILDREN that simultaneously avoided "possible dead lock" messages without crashing the server. I reluctantly concluded that the server was overloaded and moved everything to a larger VPS. I am still testing that and haven't completely eliminated dead lock messages but it's looking promising.
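To make the trade-off between LSAPI_CHILDREN and memory a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every PHP child uses about the same amount of resident memory and that a fixed amount of RAM has to stay free for the OS, the database and caches. The function name and all figures are hypothetical, not measurements from this server.

```python
def max_safe_children(total_ram_mb, reserved_mb, per_child_mb):
    """Estimate how many PHP children fit in RAM without swapping.

    total_ram_mb: RAM on the machine (hypothetical input)
    reserved_mb:  RAM kept free for the OS, database, caches (assumption)
    per_child_mb: typical resident size of one PHP child (assumption)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable = total_ram_mb - reserved_mb
    if usable <= 0:
        return 0
    return usable // per_child_mb


# Hypothetical figures only: a 4 GB VPS, 1.5 GB reserved, 80 MB per child.
print(max_safe_children(4096, 1536, 80))  # 32
```

If the number this gives is lower than the value needed to stop the "Reached max children process limit" messages, that would be consistent with the server simply being too small for the load.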

That may not be the whole story, because the crashes happened at times like 4 am when there were hardly any visitors. Or were there? Everyone is asleep in the UK but they're still awake in the US at that time. Even so, traffic appeared light except for a few search engine bots (I already block malicious bots that cause 404 errors using fail2ban/ipset but you can't block them all). Utilities like "top" reported only moderate CPU, I/O, network and memory usage but they can be misleading.

The other interesting thing in my case was that switching back to Apache2 actually improved the situation (fewer slowdowns or crashes, less disk activity). The new VPS is expensive so I might end up going back to that.
 
#52
Thanks for sharing. However, that might not be the reason in my case. I do see some "Reached max children process limit..." messages, but they are from a week ago. And although PHP_LSAPI_CHILDREN was 60, my error message was:
Code:
Reached max children process limit: 35, extra: 11, current: 46, busy: 45, please increase LSAPI_CHILDREN.
So I don't know how those numbers relate to the limit of 60.
And today, when the dead lock message appeared, the monitor reported a timeout when trying to connect to the server, and stderr.log contained only these:

Code:
2023-11-20 00:07:32.282 [STDERR] 2023-11-20 00:07:32.282432 Cgroups returning success file: /sys/fs/cgroup/systemd/user.slice/user-995.slice/litespeed-exec.scope/cgroup.procs, pid: 83384
2023-11-20 00:07:32.282 [STDERR] 2023-11-20 00:07:32.282818 Cgroups returning success file: /sys/fs/cgroup/systemd/user.slice/user-1000.slice/litespeed-exec.scope/cgroup.procs, pid: 83385
2023-11-20 00:07:32.283 [STDERR] 2023-11-20 00:07:32.283018 Cgroups returning success file: /sys/fs/cgroup/systemd/user.slice/user-995.slice/litespeed-exec.scope/cgroup.procs, pid: 83386
2023-11-20 00:07:32.283 [STDERR] 2023-11-20 00:07:32.283394 Cgroups returning success file: /sys/fs/cgroup/systemd/user.slice/user-1000.slice/litespeed-exec.scope/cgroup.procs, pid: 83387
2023-11-20 00:07:32.284 [STDERR] 2023-11-20 00:07:32.283882 Cgroups returning success file: /sys/fs/cgroup/systemd/user.slice/user-1007.slice/litespeed-exec.scope/cgroup.procs, pid: 83388
2023-11-20 00:07:32.284 [STDERR] 2023-11-20 00:07:32.284226 Cgroups returning success file: /sys/fs/cgroup/systemd/user.slice/user-1002.slice/litespeed-exec.scope/cgroup.procs, pid: 83389
 

Cold-Egg

Administrator
#53
It means you have PHP_LSAPI_CHILDREN set to 35 on either the virtual host External App or Server-level External App, and you hit the limit.
For the "Cgroups returning success file" error, do you have cgroups enabled?
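To find out which limit is actually in effect, it may help to pull the numbers out of every "Reached max children process limit" line in stderr.log. Below is a minimal Python sketch that does this. The regular expression is based only on the single sample line quoted earlier in the thread, so other OLS versions may format the message differently, and the function names are my own.

```python
import re

# Pattern derived from the one sample line posted in this thread (assumption).
LIMIT_RE = re.compile(
    r"Reached max children process limit: (?P<limit>\d+), "
    r"extra: (?P<extra>\d+), current: (?P<current>\d+), busy: (?P<busy>\d+)"
)


def parse_limit_line(line):
    """Return the four counters as a dict of ints, or None if the line does not match."""
    m = LIMIT_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def distinct_limits(lines):
    """Return the sorted set of 'limit' values seen across all matching lines."""
    limits = set()
    for line in lines:
        info = parse_limit_line(line)
        if info is not None:
            limits.add(info["limit"])
    return sorted(limits)
```

If distinct_limits() reports a value such as 35 while the setting you edited says 60, there is probably another External App definition (virtual host or server level) that is the one being applied.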
 
#54
I am also having a similar issue. It begins like this:

Code:
2023-11-20 23:18:08.033639 [NOTICE] [6500] LiteSpeed/1.7.18 Open starts successfully!
2023-11-20 23:18:13.032658 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:18:13.032694 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:18:13.041473 [NOTICE] [6500] [AdminPHP] add child process pid: 6505
2023-11-20 23:18:14.392788 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:18:14.392828 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:18:14.406068 [NOTICE] [6500] [AdminPHP] add child process pid: 6507
2023-11-20 23:18:45.031730 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6505:-3:0'.
2023-11-20 23:18:45.401582 [NOTICE] [6498] Cmd from child: [extappkill:6505:-3:0]
2023-11-20 23:19:41.081585 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6507:-3:0'.
2023-11-20 23:19:41.408524 [NOTICE] [6498] Cmd from child: [extappkill:6507:-3:0]
2023-11-20 23:20:11.944192 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:20:11.944248 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:20:11.980735 [NOTICE] [6500] [AdminPHP] add child process pid: 6544
2023-11-20 23:20:42.093955 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6544:-3:0'.
2023-11-20 23:20:42.408386 [NOTICE] [6498] Cmd from child: [extappkill:6544:-3:0]
2023-11-20 23:21:12.950101 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:21:12.950143 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:21:12.950722 [NOTICE] [6500] [AdminPHP] add child process pid: 6594
2023-11-20 23:21:43.081148 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6594:-3:0'.
2023-11-20 23:21:43.413209 [NOTICE] [6498] Cmd from child: [extappkill:6594:-3:0]
2023-11-20 23:22:13.946677 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:22:13.946750 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:22:13.947092 [NOTICE] [6500] [AdminPHP] add child process pid: 6621
2023-11-20 23:22:44.088428 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6621:-3:0'.
2023-11-20 23:22:44.400371 [NOTICE] [6498] Cmd from child: [extappkill:6621:-3:0]
2023-11-20 23:23:14.949571 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:23:14.949614 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:23:14.950062 [NOTICE] [6500] [AdminPHP] add child process pid: 6635
2023-11-20 23:23:45.002843 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6635:-3:0'.
2023-11-20 23:23:45.400398 [NOTICE] [6498] Cmd from child: [extappkill:6635:-3:0]
2023-11-20 23:24:00.137873 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:24:00.137919 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:24:00.138347 [NOTICE] [6500] [AdminPHP] add child process pid: 6641
2023-11-20 23:25:15.592111 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:25:15.592151 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:25:15.593503 [NOTICE] [6500] [AdminPHP] add child process pid: 6651
2023-11-20 23:25:46.096357 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6641:-3:0'.
2023-11-20 23:25:46.400391 [NOTICE] [6498] Cmd from child: [extappkill:6641:-3:0]
2023-11-20 23:26:15.056929 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6651:-3:0'.
2023-11-20 23:26:15.400743 [NOTICE] [6498] Cmd from child: [extappkill:6651:-3:0]
2023-11-20 23:26:17.111964 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:26:17.112005 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:26:17.112435 [NOTICE] [6500] [AdminPHP] add child process pid: 6655
2023-11-20 23:26:17.112530 [NOTICE] [6500] [LocalWorker::workerExec] VHost:_AdminVHost suExec check uid 999 gid 65534 setuidmode 2.
2023-11-20 23:26:17.112543 [NOTICE] [6500] [LocalWorker::workerExec] Config[AdminPHP]: suExec uid -1 gid -1 cmd /usr/local/lsws/admin/fcgi-bin/admin_php -c ../conf/php.ini, final uid 999 gid 65534, flags: 0.
2023-11-20 23:26:17.113950 [NOTICE] [6500] [AdminPHP] add child process pid: 6656
2023-11-20 23:26:48.018810 [NOTICE] [6500] sendKillCmdToWatchdog: 'extappkill:6656:-3:0'.
2023-11-20 23:26:48.400786 [NOTICE] [6498] Cmd from child: [extappkill:6656:-3:0]
Which eventually results in the dreaded "No request delivery notification has been received from LSAPI application, possible dead lock."

Can anybody help shed any light please?
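In the log above, several of the AdminPHP children receive a kill command roughly 30 seconds after they are started. To quantify that over a longer log, here is a minimal Python sketch that pairs each "add child process pid" line with the matching "sendKillCmdToWatchdog" line and reports the time in between. The patterns are taken from the lines posted here and may not match other log formats; the function name is my own.

```python
import re
from datetime import datetime

TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})"
ADD_RE = re.compile(TS + r" .*add child process pid: (?P<pid>\d+)")
KILL_RE = re.compile(TS + r" .*sendKillCmdToWatchdog: 'extappkill:(?P<pid>\d+):")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def child_lifetimes(lines):
    """Return (pid, seconds) pairs: time from 'add child' to the kill command."""
    started = {}
    lifetimes = []
    for line in lines:
        m = ADD_RE.search(line)
        if m:
            started[m.group("pid")] = datetime.strptime(m.group("ts"), TS_FORMAT)
            continue
        m = KILL_RE.search(line)
        if m and m.group("pid") in started:
            killed = datetime.strptime(m.group("ts"), TS_FORMAT)
            begun = started.pop(m.group("pid"))
            lifetimes.append((int(m.group("pid")), (killed - begun).total_seconds()))
    return lifetimes
```

Feeding it the lines of the log file (for example via open(path)) gives one entry per killed child. Whether a consistent lifetime points at a timeout setting is something the sketch cannot tell you, but it makes the pattern easier to show to support.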
 

Cold-Egg

Administrator
#55
Hi @teedledee

In your case, it is probably not related to the process limit. Please try the latest OLS version, and clear the Memory Soft Limit (bytes), Memory Hard Limit (bytes), Process Soft Limit, and Process Hard Limit values under External App > LSAPI APP. If the issue persists, please increase the PHP max execution time and memory limit.

Let us know if it works.
 