Many "possible dead lock" errors on a new server with a simple script

#1
Hi,

The mysterious error "No request delivery notification has been received from LSAPI application, possible dead lock." has flooded the error log files
of my OpenLiteSpeed server, and I can't find a way to fix it after much checking.

I decided to set up another new server on DigitalOcean (Ubuntu 18.04 LTS) and run a really simple, short script to see how the error is triggered.

I would appreciate it if everyone could weigh in with their findings, in the interest of the community and to get the developers' attention,
as this seems to be a long-running, on-and-off issue.

Setup:
======================
Environment:
1. Ubuntu 18.04.2 (1 CPU, 2 GB RAM)
2. LITESPEED/1.4.49 OPEN
3. Only the test script on the web host; nothing else installed.

Steps to reproduce the deadlock errors:

1. Create the following .htaccess and place it at the root of the web server
(see the reference on the "No abort" settings: https://www.litespeedtech.com/support/wiki/doku.php/litespeed_wiki:php:run-without-timeouts).

# ############# .htaccess ############
# RewriteEngine On

# BEGIN litespeed noabort
# SetEnv noabort 1
# END litespeed noabort

<IfModule Litespeed>
RewriteEngine On
RewriteRule .* - [E=noconntimeout:1]
RewriteRule .* - [E=noabort:1]
</IfModule>

# BEGIN litespeed noabort
#<IfModule rewrite_module>
#RewriteEngine On
#RewriteRule .* - [E=noabort:1]
#</IfModule>
# END litespeed noabort
# ############# .htaccess ############

2. Place the simple test script "test.php" at the root of the web server:

Test script:

<?php
// Test OpenLiteSpeed deadlock error: No request delivery notification has been received from LSAPI application, possible dead lock.

// ini_set('max_execution_time', '300'); // 300 seconds = 5 minutes
ini_set('max_execution_time', '0'); // 0 = no execution time limit

// ob_end_flush();
// set_time_limit(0);
ini_set('output_buffering', 0); // note: output_buffering is PHP_INI_PERDIR, so this call has no effect at runtime
ini_set('implicit_flush', 1);
if (ob_get_level() > 0) {
    ob_end_flush(); // close the buffer opened by output_buffering, if there is one
}
ob_start();

// Toggle echoing to the screen (1 = on, 0 = off).
$echo = 1;

// Sleep per loop: longer durations trigger the deadlock message, shorter ones may not.
$sleep_sec = 5;

$max = 20; $i = 0;
echo "Max Loop: $max => ";

while ($i < $max)
{
    sleep($sleep_sec);
    if ($i == $max - 1 AND $echo == 1)
    {
        echo "test$i (Completed)";
    }
    else if ($echo == 1)
    {
        echo "test$i (slept $sleep_sec sec) => ";
    }

    # header("Status: 200");
    ob_flush(); // send PHP's output buffer on
    flush();    // flush PHP's write buffer to the web server

    $i++;
}
?>

================ Results ==============
Test scenarios, toggling echo to the screen on and off ($echo = 1 or $echo = 0):

off => the deadlock message appears, followed by error 503 "Service unavailable (Server is busy, try again later!)"; the script fails halfway.
on, with $sleep_sec > 5 => the deadlock message appears; the script may complete or fail.
on, with $sleep_sec <= 5 => the deadlock message may not appear; the script completes.
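To see whether output actually reaches the browser between sleeps (the difference between echo on and off above), a small client can timestamp each chunk as it arrives. This is a minimal Python sketch, not part of the original test: the script name in the usage line is hypothetical, it assumes test.php is reachable over HTTP(S), and any proxy or compression between you and OLS can still hold chunks back.

```python
import sys
import time
import urllib.request


def read_with_timestamps(stream, chunk_size=4096, clock=time.monotonic):
    """Read a binary stream until EOF, returning (seconds_since_start, data) per chunk.

    read1() returns what is available from a single underlying read, so each
    timestamp is close to when that piece of output reached the client.
    """
    start = clock()
    chunks = []
    while True:
        data = stream.read1(chunk_size)
        if not data:  # b"" signals end of stream
            break
        chunks.append((clock() - start, data))
    return chunks


def largest_gap(chunks):
    """Longest silence in seconds, from the start or between consecutive chunks."""
    times = [0.0] + [t for t, _ in chunks]
    return max((later - earlier for earlier, later in zip(times, times[1:])), default=0.0)


if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]  # the address of your own test.php
    # The timeout applies to each blocking socket operation, so keep it above $sleep_sec.
    with urllib.request.urlopen(url, timeout=600) as resp:
        chunks = read_with_timestamps(resp)
    for t, data in chunks:
        print(f"{t:8.2f}s  {data!r}")
    print(f"largest gap between chunks: {largest_gap(chunks):.2f}s")
```

Usage (hypothetical file name and URL): `python3 chunk_timer.py http://your-server/test.php`. With echo on, the gaps should roughly track $sleep_sec; with echo off, expect one long silence.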

Question 1: When the deadlock error message appears in the error log (viewed in the control panel), does the script stop?
Answer: It depends. The script may continue to the end or fail with a 503 error (Service Unavailable).

Question 2: What is the effect of the deadlock error?
Answer: It is not described in the OpenLiteSpeed error lists:
1. https://www.litespeedtech.com/support/wiki/doku.php/litespeed_wiki:php:execution-errors?s[]=deadlock
2. https://openlitespeed.org/kb/

Question 3: The wiki documentation for the OpenLiteSpeed "No abort" setting (https://www.litespeedtech.com/support/wiki/doku.php/litespeed_wiki:php:run-without-timeouts)
says: "Setting No abort Globally via the WebAdmin => WebAdmin console > Configuration Server > General > External Application Abort"

But I cannot find that setting after logging in. I checked every section of WebAdmin, and there is no "External Application Abort". Where is "External Application Abort"?
=====================================

I am running background PHP scripts for certain tasks:
1. Backing up and zipping files.
2. A batch job => query the DB for records, loop through them, send an API request to an external host, process the result, update the database, repeat.
It is inevitable that my scripts will run longer.

For another server with the same issue, which runs WordPress, I found this page: https://www.wordfence.com/help/advanced/system-requirements/litespeed/
It mentions that LiteSpeed is known for killing processes that run a few seconds longer.

I believe we share a common need (our scripts sometimes need to run longer), but the deadlock problem keeps appearing and flooding my error logs,
and we don't know whether the processes completed or failed. The current fix (the "No abort" settings) is more trial and error; I tried it, but I can't get "No abort" working either, as reported above.

Please test and share your experiences. Many people have asked about the same problem before, but there is no definite fix or way to solve it;
the answer is always "try this, and if it does not work, try that".

We need a solution to this, and it should be a simple one.

Thank you.
 
#2
Hi

I'm having the same problem with lots of clients too.
At least two WordPress API sites are broken on OLS because of deadlocks. If a script takes too long (actually just a few seconds) to return any data, we get a lot of deadlock messages in the log and eventually everything dies.

I've searched a lot but found no fix yet. This is the best topic on the subject I've found. I was able to replicate the problem using the OP's script too. :/
 
#3
Thanks for confirming that you can replicate this problem with my instructions. It means everyone will likely face this problem; I think this is going to be a showstopper.
 
#4
I have the same problem: a lot of "No request delivery notification has been received from LSAPI..." messages in the OLS error.log file. I use OpenLiteSpeed 1.4.50 on Ubuntu 18.04.3 LTS.
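To put numbers on "a lot", a short script can count the warnings per hour and show when they cluster. A minimal Python sketch: the marker string comes from the message quoted in this thread, while the `YYYY-MM-DD HH:MM:SS` line prefix and the /usr/local/lsws/logs/error.log location are assumptions about a default install, so check them against your own log.

```python
import re
import sys
from collections import Counter

# Substring of the warning quoted in this thread.
DEADLOCK_MARKER = "possible dead lock"

# Assumed line prefix "YYYY-MM-DD HH:MM:SS..."; group 1 captures the date and hour.
HOUR_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def deadlocks_per_hour(lines):
    """Count deadlock warnings per 'YYYY-MM-DD HH' bucket.

    Matching lines without a recognisable timestamp are counted under 'unknown'.
    """
    counts = Counter()
    for line in lines:
        if DEADLOCK_MARKER not in line:
            continue
        match = HOUR_PREFIX.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. /usr/local/lsws/logs/error.log (assumed default location)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        counts = deadlocks_per_hour(log)
    for hour, n in sorted(counts.items()):
        print(f"{hour}  {n}")
    print(f"total: {sum(counts.values())}")
```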
 

David

Active Member
#5
I will follow your steps to reproduce this issue; if I can reproduce it, I will fix it.
Can you give me more information about your lsphp version?
Thanks.
 

David
#6
I can reproduce it. After raising the lsphp timeouts in the server configuration file, it seems fixed.
Code:
extProcessor lsphp {
    type                            lsapi
    address                         uds://tmp/lshttpd/lsphp.sock
    ......
    initTimeout                     600
    pcKeepAliveTimeout              10000
}
 

David
#7
My test.php is
Code:
<?php
// Test OpenliteSpeed deadlock error: No request delivery notification has been received from LSAPI application, possible dead lock.

// ini_set('max_execution_time', '300'); //300 seconds = 5 minutes
ini_set('max_execution_time', '0'); // for infinite time of execution

// ob_end_flush();
// set_time_limit(0);
ini_set('output_buffering', 0);
ini_set('implicit_flush', 1);
ob_end_flush();
ob_start();

// turn on and off echo to screen.
$echo = 0;

// the length duration trigger the deadlocks message, longer duration triggers. shorter may not.
$sleep_sec = 10;

$max = 20 ; $i = 0 ;
echo "Max Loop: $max => ";

while ($i < $max)
{
sleep($sleep_sec);
if ( $i == $max-1 )
{
echo "test$i (Completed)";
}
else if($echo == 1)
{
echo "test$i (slept $sleep_sec sec) => ";
}



# header("Status: 200");
ob_flush();
flush();

$i++ ;
}
?>
 
#9
Thanks, David, for your test. From your information, I summarise the following:

1. OpenLiteSpeed does not respect PHP's ini_set('max_execution_time', '0') in the script. It kills the script when a "certain" timeout occurs, and which parameters affect that timeout cannot be determined predictably without trial and error over permutations of the config settings.

2. Whether the test script succeeds depends on these 3 parameters in /usr/local/lsws/conf/httpd_config.conf:

Code:
extProcessor lsphp{

    ......

    persistConn                     1
    initTimeout                     6000
    pcKeepAliveTimeout              20000

}
Note: per-website (vhost) settings also override the values above.

In my test, the script still failed halfway with your values (initTimeout 600 and pcKeepAliveTimeout 10000), so I increased initTimeout and pcKeepAliveTimeout to 6000 and 20000 (both in seconds). However, this does not seem to have any significant effect on the outcome when the wait is more than 10 seconds per loop.

In fact, I doubt that initTimeout and pcKeepAliveTimeout are really in seconds, as the documentation states somewhere: 10000 seconds for pcKeepAliveTimeout is about 166 minutes, yet it can't handle a script that takes 10 seconds x 20 loops (about 3.3 minutes in total per run).


3. This is a concern when scripts fail while handling activities such as online purchases; worse still if concurrent users are trying to complete their transactions and the script fails, or fails to process the payment API call from the payment gateway. True, we can tune the values, but it is more trial and error.

4. The "No abort" setting for long-running script failures adds more confusion, as it does not fix the problem but diverts attention from it, since the outcome can also be affected by initTimeout and pcKeepAliveTimeout.

5. The "many possible deadlock" errors are not helpful in diagnosing the problem, when the current approach is to tune many permutations of parameters with no guarantee of fixing the long-running script issue. In my tests, when this error occurred, the script sometimes ran to completion but at other times failed with error 503 (Service Unavailable).

6. It seems a long-running PHP script also needs to output (echo) something to keep OpenLiteSpeed from killing it, and echo alone is not enough; the output also has to be flushed:

Code:
// begin of script
ini_set('output_buffering', 0);
ini_set('implicit_flush', 1);
ob_end_flush();
ob_start();

echo "something";

// force the output; otherwise it contributes to the script being killed if it runs longer than x seconds.
ob_flush();
flush();

7. When PHP says ini_set('max_execution_time', '0'), it means it. That is not the case with OpenLiteSpeed, and it is not obvious: you have to play with its parameters until you hopefully get them right, and you may need to do so again when your server's load changes.

That's all of my observations so far as a new OpenLiteSpeed user. I hope they will be useful to new users who encounter similar issues, and perhaps encourage better documentation or make the issues more visible to the developers.
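For reference, the arithmetic in item 2 works out as follows. This is a plain calculation with the numbers quoted in this thread (10 s x 20 loops from post #7, pcKeepAliveTimeout 10000, the initTimeout values 600 and 6000, and the 60 s default David gives in post #10). The `fits` helper is a hypothetical simplification that compares total runtime against one timeout; it is not a model of how OpenLiteSpeed applies its timeouts.

```python
SECONDS_PER_MINUTE = 60


def minutes(seconds):
    """Convert seconds to minutes."""
    return seconds / SECONDS_PER_MINUTE


def fits(runtime_sec, timeout_sec):
    """True if a run of runtime_sec finishes within a single timeout of timeout_sec."""
    return runtime_sec <= timeout_sec


# Numbers from this thread.
sleep_sec, loops = 10, 20             # test.php as posted in #7
script_runtime = sleep_sec * loops    # total sleep time per run, in seconds

if __name__ == "__main__":
    print(f"script runtime: {script_runtime} s = {minutes(script_runtime):.1f} min")
    print(f"pcKeepAliveTimeout 10000 s = {minutes(10000):.1f} min")
    for init_timeout in (60, 600, 6000):  # default, value from #6, value from #9
        print(f"initTimeout {init_timeout:>5} s covers the run: {fits(script_runtime, init_timeout)}")
```

With these numbers, only the 60 s default is shorter than the 200 s run.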

Please continue to share your tests and contribute to this thread.
 

David
#10
1. The PHP setting ini_set('max_execution_time', '0') applies to the PHP process, like other PHP environment settings.
2. The OpenLiteSpeed setting `initTimeout` belongs to OpenLiteSpeed. It is measured in seconds and defaults to 60, so a request usually times out after 1 minute; we will tune this later.
3. The `pcKeepAliveTimeout` setting may not contribute here.
4. "No abort" is not currently implemented; LSWS has it, but the open-source version does not yet.
Thanks for your feedback; we will keep working to make OpenLiteSpeed better.
 
#11
Hi David! Glad to see you here taking care of this problem :)

I see you "fixed" the problem in your test by setting initTimeout to 600, so I thought I'd give it a try, but when I checked my config, my timeout was already 3000 and the deadlock still occurs.

I spent the last few hours trying various combinations to fix this problem, but no luck; every time, I got a few deadlocks.
My scripts run for about 5 to 10 seconds and then deadlock.
 
#13
I used the code you provided :)

Maybe some other configuration is wrong?


Code:
extprocessor xxxxxxxxxxxxxx {
  type                    lsapi
  address                 UDS://srv/xxxxxxxxxxxxxx/etc/php/lsws.sock
  maxConns                2
  env                     PHP_LSAPI_CHILDREN=2
  initTimeout             3000
  retryTimeout            0
  persistConn             1
  respBuffer              0
  autoStart               2
  path                    /usr/local/lsws/lsphp73/bin/lsphp
  backlog                 100
  instances               1
  extUser                 zlx
  extGroup                www-data
  umask                   022
  runOnStartUp            1
  priority                0
}
 

David
#14
You need to make sure this `extprocessor` is the one serving the test PHP, since you may have both a VHost-level PHP and a server-level PHP.
In my test case, I tested about 10 times, and all runs were good.
I checked the code, and `initTimeout` does exactly that.
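One way to check which value applies is to list every extprocessor block and its initTimeout across the server config and the vhost configs. A rough Python sketch that assumes the block layout shown in this thread (`extprocessor NAME {` ... `}`, one `key value` pair per line, no nested blocks); it is not a full parser for the LiteSpeed config format. A value of None means the block does not set initTimeout, so the default applies (60 s, per David's earlier note).

```python
import re
import sys

# Opening line of a block, e.g. "extProcessor lsphp{" or "extprocessor site {".
BLOCK_START = re.compile(r"^\s*extprocessor\s+([^\s{]+)\s*\{", re.IGNORECASE)


def ext_processor_settings(text, key="initTimeout"):
    """Map each extprocessor name to the value of `key` (None if the block omits it)."""
    result = {}
    current = None
    for line in text.splitlines():
        if current is None:
            match = BLOCK_START.match(line)
            if match:
                current = match.group(1)
                result[current] = None
        elif line.strip().startswith("}"):
            current = None  # end of the current block
        else:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0].lower() == key.lower():
                result[current] = parts[1].strip()
    return result


if __name__ == "__main__":
    # e.g. /usr/local/lsws/conf/httpd_config.conf plus each vhost config
    # (under /usr/local/lsws/conf/vhosts/ on an assumed default layout)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as conf:
            for name, value in ext_processor_settings(conf.read()).items():
                print(f"{path}: extprocessor {name} initTimeout={value}")
```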
 
#15
Thanks for your time, guys. The timeout value is not the timeout we probably think it is, or does not behave the way we think.

As David is taking the effort to test and fix it, we need more people to test and point it out if their results differ.
We need scripts not to die unexpectedly, for the good of all; otherwise we really can't use this platform in certain critical situations.

Please join us, follow the test instructions, and see whether you can provide more information from your observations.

ACP
 
#17
Right, I only have the VHost PHP extprocessor; I removed the server-level PHP.
I'll install OLS on a new server, keep all default values, and change only initTimeout to test the provided scripts.

I will keep you updated if I find anything else. Thanks guys :)
 
#18
OK, setting initTimeout higher does seem to fix the problem most of the time. I still get the occasional deadlock warning, but the script runs to the end. My remaining problem was caused by nginx, since we run nginx in front of OLS for its sub_filter module.

I adjusted proxy_read_timeout and proxy_send_timeout to match the initTimeout and max_execution_time values, and everything worked. I got only one "possible dead lock" warning after running the script three times.

I hope WordPress now stops breaking itself every time it tries to auto-update :)
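With a proxy in front, a long-running request has to survive every hop's timeout, so the tightest one wins. A simplified Python sketch of that alignment check: the helper names are hypothetical, and it treats every setting as a single ceiling with 0 meaning "no limit" (PHP's meaning for max_execution_time, applied to every hop here only for simplicity). That is an approximation: nginx's proxy_read_timeout and proxy_send_timeout limit the time between two successive read or write operations rather than the whole response, and both default to 60 s.

```python
def effective_limit(timeouts):
    """Return (name, seconds) for the smallest finite timeout, or None if all are unlimited.

    `timeouts` maps a setting name to seconds; 0 or None counts as "no limit".
    """
    finite = {name: sec for name, sec in timeouts.items() if sec}
    if not finite:
        return None
    name = min(finite, key=finite.get)
    return name, finite[name]


def too_short(timeouts, needed_sec):
    """Sorted names of finite timeouts shorter than needed_sec."""
    return sorted(name for name, sec in timeouts.items() if sec and sec < needed_sec)


if __name__ == "__main__":
    chain = {
        "nginx proxy_read_timeout": 60,   # nginx default
        "nginx proxy_send_timeout": 60,   # nginx default
        "OLS initTimeout": 3000,          # value from post #13
        "PHP max_execution_time": 0,      # ini_set('max_execution_time', '0')
    }
    print(effective_limit(chain))
    print(too_short(chain, 200))          # 200 s = 10 s x 20 loops from post #7
```

With nginx left at its 60 s defaults, the proxy is the tightest limit even though initTimeout is 3000, which is consistent with what was found here.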

Thanks again :D
 
#20
Setting initTimeout to 6000 does not fix it for me. What else can be done to fix it?
My config:
extprocessor lsphp {
  type                    lsapi
  address                 uds://tmp/lshttpd/lsphp.sock
  maxConns                100
  env                     PHP_LSAPI_CHILDREN=100
  initTimeout             6000
  retryTimeout            0
  persistConn             1
  respBuffer              0
  autoStart               1
  path                    $SERVER_ROOT/fcgi-bin/lsphpnew
  backlog                 100
  instances               1
  priority                0
  memSoftLimit            2047M
  memHardLimit            2047M
  procSoftLimit           400
  procHardLimit           500
}
 