No request delivery notification has been received from LSAPI application, possible dead lock.

#1
Hi,
My website's server log is full of "No request delivery notification has been received from LSAPI application, possible dead lock." and "ExtConn timed out while processing." messages.

I'm using LSPHP 8.1, the connection timeout is set to 600 seconds, and PHP_LSAPI_CHILDREN=10.
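For reference, these settings live in the lsphp external app definition in the OLS config. A minimal sketch of the relevant lines, assuming a stock install where the app is named lsphp:
Code:
extprocessor lsphp {
  type                    lsapi
  # maxConns should generally match PHP_LSAPI_CHILDREN
  maxConns                10
  env                     PHP_LSAPI_CHILDREN=10
}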

I tried to find the root cause by disabling all WordPress plugins, but even with a clean WordPress installation with zero plugins enabled and a standard WordPress theme, the error still appears.

Any idea what could be causing this?
 

Cold-Egg

Administrator
#2
What's the output of the following command?
Code:
ps aux | grep php
If you did not hit the process limit, does a simple PHP page, e.g. a phpinfo page, load?
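For example, a minimal test page (any file name, placed under the doc root) could be:
Code:
<?php
// If this page loads, OLS can reach the lsphp external app at all.
phpinfo();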
 
#3
Hi,
The website itself is working, and so is phpinfo. It is just the log files that are flooded with the "No request delivery notification has been received from LSAPI application, possible dead lock." and "ExtConn timed out while processing." messages.

I've attached a screenshot of the ps aux | grep php output. In the meantime I increased the PHP_LSAPI_CHILDREN limit to 100, because the stderr log also contained a message that it was reaching the limit of 10. That error message is gone now, but the deadlock and ExtConn messages are still there.
 

Attachments

Cold-Egg

Administrator
#6
@LiteCache

I just tried a long-running script to verify it, and "RewriteRule .* - [E=noabort:1,E=noconntimeout:1]" seems to work in my case. Just want to share it with you; maybe you can also give it a try. :)
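For anyone who wants to try it, the rule goes at the very top of the site's .htaccess, before any other rewrite rules. A sketch (RewriteEngine On may already be present in your file):
Code:
RewriteEngine On
# Ask the server not to abort the script and not to apply the connection timeout
RewriteRule .* - [E=noabort:1,E=noconntimeout:1]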
 

LiteCache

Active Member
#7
@Cold-Egg

That's very strange. Every attempt I have read about from numerous users came to the conclusion that these directives do not work in OLS, so from your answer I can only conclude that it must depend on the type of script and process. Can you tell me how you tested this?
 

Cold-Egg

Administrator
#8
Sure, my test environment is simple:

1. max_execution_time=120
2. Rewrite is enabled and the rewrite log level is set to 9
3. Add the following script under the doc root, with any name you want
Code:
<?php

// Remove PHP's own execution-time limit so only the server-side
// timeouts can kill the request.
set_time_limit(0);

$i = 0;

while ($i < 1000) {

    echo $i . '<br />';

    // Near the end, sleep for 15 minutes to outlast the connection timeout.
    if ($i == 900) {
        echo 'sleeping 900 secs<br />';
        sleep(900);
        echo 'waking up<br />';
    }

    $i++;
}

?>
Visit the script from the browser; you get a timeout error after a while.

Then add "RewriteRule .* - [E=noabort:1,E=noconntimeout:1]" to the top of the .htaccess file and restart OLS.
Test it again, and the page loads after 15 minutes.

The following "add ENV" entries show up in the error log as well.
Code:
[REWRITE] add ENV: 'noabort:1'
[REWRITE] add ENV: 'noconntimeout:1'
 
#10
Cold-Egg said: (quoting post #8 above in full)
Does this affect all subdirectories under the root directory if I update it in the root .htaccess file?
 
#13
We too are getting the same issues with PHP 8.2 and OLS 1.7.17. The only difference is that we get no response back from OpenLiteSpeed, and eventually the error changes. We have seen this on multiple servers, some with high traffic and some with zero traffic.

(Attached: 2023-09-11_09-35-ols-error-log.png)
 
#16
Hi, I also have this problem.
Code:
2023-09-23 14:23:13.004538 [NOTICE] [1061391] [37.47.84.255:30417:HTTP2-25#zawodtyper:lsapi] No request delivery notification has been received from LSAPI application, possible dead lock.
Some time ago, problems with the server began. From time to time, the site monitoring reported a 408 error, and users of the site experienced lag. Apart from those moments, the site worked normally and fast. I'm using Ubuntu 22.04 with the latest updates and OLS version 1.7.18. I have a pretty powerful dedicated server (AMD Ryzen 9 5900X, 12c/24t, 3.7/4.8 GHz, 64 GB RAM). Yesterday I even removed OLS and reinstalled it. It didn't help; it's the same.

The lsws/logs directory shows a lot of logs that look like this:

Code:
2023-09-23 08:03:10.896088 [INFO] logger: set pacer to info
2023-09-23 08:03:10.896107 [DEBUG] engine: next advisory tick is now: have 1 tickable connection (511FFAF03AD01F10 first)
2023-09-23 08:03:10.896111 [DEBUG] engine: decref conn 511FFAF03AD01F10, 'HT' -> 'H'
2023-09-23 08:03:10.896115 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 2
2023-09-23 08:03:10.896119 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 6
2023-09-23 08:03:10.896122 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 10
2023-09-23 08:03:10.896133 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 156
2023-09-23 08:03:10.896136 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 164
2023-09-23 08:03:10.896139 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 168
2023-09-23 08:03:10.896142 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: initialized; # elems: 6; sets: [ 00000000, 00000011 ]
2023-09-23 08:03:10.896145 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 2, incr: 1, priority 0
2023-09-23 08:03:10.896149 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 6, incr: 1, priority 0
2023-09-23 08:03:10.896152 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 10, incr: 1, priority 0
2023-09-23 08:03:10.896155 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 156, incr: 1, priority 4
2023-09-23 08:03:10.896158 [DEBUG] [QUIC:511FFAF03AD01F10-156] di: get_frame: no frame at read offset 23
2023-09-23 08:03:10.896161 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 164, incr: 1, priority 4
2023-09-23 08:03:10.896166 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164] QuicStream::onRead()
2023-09-23 08:03:10.896171 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164#zawodtyper] HttpSession::onReadEx(), state: 6!
2023-09-23 08:03:10.896174 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164#zawodtyper] Read Request Body!
2023-09-23 08:03:10.896177 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164] QuicStream::read(), to read: 48, ret: 0
2023-09-23 08:03:10.896180 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164] End of stream detected, mark EOF
2023-09-23 08:03:10.896182 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 168, incr: 1, priority 4
2023-09-23 08:03:10.896186 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168] QuicStream::onRead()
2023-09-23 08:03:10.896189 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168#zawodtyper] HttpSession::onReadEx(), state: 6!
2023-09-23 08:03:10.896192 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168#zawodtyper] Read Request Body!
2023-09-23 08:03:10.896194 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168] QuicStream::read(), to read: 50, ret: 0
2023-09-23 08:03:10.896197 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168] End of stream detected, mark EOF
2023-09-23 08:03:10.896200 [DEBUG] [QUIC:511FFAF03AD01F10] sendctl: send_ctl_can_send: n_out: 240 (unacked_all: 240); cwnd: 122171; ccfc: 321767/16777216
2023-09-23 08:03:10.896204 [DEBUG] [QUIC:511FFAF03AD01F10] sendctl: send_ctl_can_send: n_out: 240 (unacked_all: 240); cwnd: 122171; ccfc: 321767/16777216
2023-09-23 08:03:10.896207 [DEBUG] [QUIC:511FFAF03AD01F10] sendctl: app-limited
2023-09-23 08:03:10.896210 [DEBUG] engine: incref conn 511FFAF03AD01F10, 'H' -> 'HK'
2023-09-23 08:03:10.896213 [DEBUG] engine: decref conn 511FFAF03AD01F10, 'HK' -> 'H'
2023-09-23 08:03:10.896216 [DEBUG] [QUIC:511FFAF03AD01F10-156] di: get_frame: no frame at read offset 23
2023-09-23 08:03:10.896219 [DEBUG] [QUIC:511FFAF03AD01F10] conn: tickable: stream 164 can be read from
2023-09-23 08:03:10.896222 [DEBUG] engine: incref conn 511FFAF03AD01F10, 'H' -> 'HT'
2023-09-23 08:03:10.896225 [DEBUG] engine: next advisory tick is now: have 1 tickable connection (511FFAF03AD01F10 first)
2023-09-23 08:03:10.896228 [DEBUG] engine: next advisory tick is now: have 1 tickable connection (511FFAF03AD01F10 first)
2023-09-23 08:03:10.896231 [DEBUG] engine: decref conn 511FFAF03AD01F10, 'HT' -> 'H'
2023-09-23 08:03:10.896234 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 2
2023-09-23 08:03:10.896237 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 6
2023-09-23 08:03:10.896239 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 10
2023-09-23 08:03:10.896242 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 156
2023-09-23 08:03:10.896248 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 164
2023-09-23 08:03:10.896251 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: add stream 168
2023-09-23 08:03:10.896254 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: initialized; # elems: 6; sets: [ 00000000, 00000011 ]
2023-09-23 08:03:10.896257 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 2, incr: 1, priority 0
2023-09-23 08:03:10.896260 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 6, incr: 1, priority 0
2023-09-23 08:03:10.896263 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 10, incr: 1, priority 0
2023-09-23 08:03:10.896266 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 156, incr: 1, priority 4
2023-09-23 08:03:10.896269 [DEBUG] [QUIC:511FFAF03AD01F10-156] di: get_frame: no frame at read offset 23
2023-09-23 08:03:10.896272 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 164, incr: 1, priority 4
2023-09-23 08:03:10.896275 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164] QuicStream::onRead()
2023-09-23 08:03:10.896278 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164#zawodtyper] HttpSession::onReadEx(), state: 6!
2023-09-23 08:03:10.896280 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164#zawodtyper] Read Request Body!
2023-09-23 08:03:10.896283 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164] QuicStream::read(), to read: 48, ret: 0
2023-09-23 08:03:10.896285 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-164] End of stream detected, mark EOF
2023-09-23 08:03:10.896288 [DEBUG] [QUIC:511FFAF03AD01F10] hpi: read-0: lsquic_hpi_next: return stream 168, incr: 1, priority 4
2023-09-23 08:03:10.896291 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168] QuicStream::onRead()
2023-09-23 08:03:10.896293 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168#zawodtyper] HttpSession::onReadEx(), state: 6!
2023-09-23 08:03:10.896296 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168#zawodtyper] Read Request Body!
2023-09-23 08:03:10.896298 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168] QuicStream::read(), to read: 50, ret: 0
2023-09-23 08:03:10.896301 [DEBUG] [1039386] [178.235.180.214:51110-Q:511FFAF03AD01F10-168] End of stream detected, mark EOF
2023-09-23 08:03:10.896303 [DEBUG] [QUIC:511FFAF03AD01F10] sendctl: send_ctl_can_send: n_out: 240 (unacked_all: 240); cwnd: 122171; ccfc: 321767/16777216
2023-09-23 08:03:10.896307 [DEBUG] [QUIC:511FFAF03AD01F10] sendctl: send_ctl_can_send: n_out: 240 (unacked_all: 240); cwnd: 122171; ccfc: 321767/16777216
2023-09-23 08:03:10.896310 [DEBUG] [QUIC:511FFAF03AD01F10] sendctl: app-limited
This is despite debugging not being enabled: the log level in the server settings is ERROR (debug level None).
In the stderr.log file, I noticed these logs today:

Code:
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
2023-09-23 13:22:17.835 [STDERR] [UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
2023-09-23 13:22:17.935 [STDERR] [UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
2023-09-23 13:22:18.800 [STDERR] [UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
[UID:65534][1039489] Reached max children process limit: 500, extra: 0, current: 500, busy: 500, please increase LSAPI_CHILDREN.
After that, I increased PHP_LSAPI_CHILDREN and maxConns to 1000 (but that seems like a lot, doesn't it?).
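As a sanity check on whether that many children are actually in use, you can count the running lsphp workers and compare against PHP_LSAPI_CHILDREN. A sketch (the worker process name may differ, e.g. lsphp81):
Code:
# Count running lsphp workers; the [l] excludes the grep process itself
ps aux | grep -c '[l]sphp'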

This is my config:
Code:
#
# PLAIN TEXT CONFIGURATION FILE
#
# If not set, will use host name as serverName
serverName              
user                      nobody
group                     nogroup
priority                  0
inMemBufSize              60M
swappingDir               /tmp/lshttpd/swap
autoFix503                1
gracefulRestartTimeout    300
mime                      conf/mime.properties
showVersionNumber         0
adminEmails               myemail@gmail.com

errorlog logs/error.log {
  logLevel                ERROR
  debugLevel              0
  rollingSize             100M
  keepDays                1
  enableStderrLog         1
}

accesslog logs/access.log {
  rollingSize             10M
  keepDays                30
  compressArchive         0
}
indexFiles                index.html, index.php

expires  {
  enableExpires           1
  expiresByType           image/*=A31557600, text/css=A31557600, application/x-javascript=A31557600, application/javascript=A31557600, font/*=A31557600, application/x-font-ttf=A31557600
}

tuning  {
  maxConnections          10000
  maxSSLConnections       10000
  connTimeout             300
  maxKeepAliveReq         10000
  keepAliveTimeout        5
  sndBufSize              0
  rcvBufSize              0
  maxReqURLLen            32768
  maxReqHeaderSize        65536
  maxReqBodySize          2047M
  maxDynRespHeaderSize    32768
  maxDynRespSize          2047M
  maxCachedFileSize       4096
  totalInMemCacheSize     20M
  maxMMapFileSize         256K
  totalMMapCacheSize      40M
  useSendfile             1
  fileETag                28
  enableGzipCompress      1
  compressibleTypes       default
  enableDynGzipCompress   1
  gzipCompressLevel       6
  gzipAutoUpdateStatic    1
  gzipStaticCompressLevel 6
  brStaticCompressLevel   6
  gzipMaxFileSize         10M
  gzipMinFileSize         300

  quicEnable              1
  quicShmDir              /dev/shm
}

fileAccessControl  {
  followSymbolLink        1
  checkSymbolLink         0
  requiredPermissionMask  000
  restrictedPermissionMask 000
}

perClientConnLimit  {
  staticReqPerSec         0
  dynReqPerSec            0
  outBandwidth            0
  inBandwidth             0
  softLimit               10000
  hardLimit               10000
  gracePeriod             15
  banPeriod               300
}

CGIRLimit  {
  maxCGIInstances         20
  minUID                  11
  minGID                  10
  priority                0
  CPUSoftLimit            10
  CPUHardLimit            50
  memSoftLimit            2047M
  memHardLimit            2047M
  procSoftLimit           400
  procHardLimit           450
}

accessDenyDir  {
  dir                     /
  dir                     /etc/*
  dir                     /dev/*
  dir                     conf/*
  dir                     admin/conf/*
}

accessControl  {
  allow                   ALL
}

extprocessor lsphp {
  type                    lsapi
  address                 uds://tmp/lshttpd/lsphp.sock
  maxConns                1000
  env                     PHP_LSAPI_CHILDREN=1000
  env                     LSAPI_AVOID_FORK=1
  env                     LSAPI_SLOW_REQ_MSECS=2000
  initTimeout             60
  retryTimeout            0
  persistConn             1
  respBuffer              0
  autoStart               2
  path                    lsphp80/bin/lsphp
  backlog                 100
  instances               1
  priority                0
  memSoftLimit            0
  memHardLimit            0
  procSoftLimit           1400
  procHardLimit           1500
}

scripthandler  {
  add                     lsapi:lsphp php
}

railsDefaults  {
  maxConns                1
  env                     LSAPI_MAX_IDLE=60
  initTimeout             60
  retryTimeout            0
  pcKeepAliveTimeout      60
  respBuffer              0
  backlog                 50
  runOnStartUp            3
  extMaxIdleTime          300
  priority                3
  memSoftLimit            0
  memHardLimit            0
  procSoftLimit           500
  procHardLimit           600
}

wsgiDefaults  {
  maxConns                5
  env                     LSAPI_MAX_IDLE=60
  initTimeout             60
  retryTimeout            0
  pcKeepAliveTimeout      60
  respBuffer              0
  backlog                 50
  runOnStartUp            3
  extMaxIdleTime          300
  priority                3
  memSoftLimit            0
  memHardLimit            0
  procSoftLimit           500
  procHardLimit           600
}

nodeDefaults  {
  maxConns                5
  env                     LSAPI_MAX_IDLE=60
  initTimeout             60
  retryTimeout            0
  pcKeepAliveTimeout      60
  respBuffer              0
  backlog                 50
  runOnStartUp            3
  extMaxIdleTime          300
  priority                3
  memSoftLimit            0
  memHardLimit            0
  procSoftLimit           500
  procHardLimit           600
}

module cache {
  internal                1

checkPrivateCache   1
checkPublicCache    1
maxCacheObjSize     10000000
maxStaleAge         200
qsCache             1
reqCookieCache      1
respCookieCache     1
ignoreReqCacheCtrl  1
ignoreRespCacheCtrl 0

enableCache         1
expireInSeconds     600
enablePrivateCache  1
privateExpireInSeconds 600
  ls_enabled              1
}

virtualhost mydomainname {
  vhRoot                  mydomainname/
  configFile              conf/vhosts/mydomainname/vhconf.conf
  allowSymbolLink         1
  enableScript            1
  restrained              1
  setUIDMode              0
}

virtualhost myanotherdomainname {
  vhRoot                  myanotherdomainname/
  configFile              conf/vhosts/myanotherdomainname/vhconf.conf
  allowSymbolLink         1
  enableScript            1
  restrained              0
  setUIDMode              0
}

listener mydomainname {
  address                 *:443
  secure                  1
  keyFile                 /etc/letsencrypt/live/mydomainname.pl/privkey.pem
  certFile                /etc/letsencrypt/live/mydomainname.pl/fullchain.pem
  certChain               1
  map                     mydomainname mydomainname.pl
  map                     myanotherdomainname myanotherdomainname.com
}

listener http {
  address                 *:80
  secure                  0
  map                     mydomainname mydomainname.pl
  map                     myanotherdomainname myanotherdomainname.com
}

Please help. Is some script on my site causing these errors, or is it a bug in OLS?
 

Cold-Egg

Administrator
#17
That looks weird. 500 is already way too much; please set it back to a small number, and try setting both procSoftLimit and procHardLimit to 0. If that does not work, please submit the issue to support@litespeedtech.com with this forum post link appended so we can have a further look.
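Applied to the config posted above, that suggestion would look roughly like this (the value 35 is purely illustrative; tune it to your load):
Code:
extprocessor lsphp {
  ...
  maxConns                35
  env                     PHP_LSAPI_CHILDREN=35
  # set to 0 as suggested above
  procSoftLimit           0
  procHardLimit           0
  ...
}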
 
#18
(Reposting this from a response to an older similar post at https://forum.openlitespeed.org/threads/possible-dead-lock-lsapi-application-help.4394/, because I've had no response and perhaps this is a better place. Happy to raise an entirely new issue if required. Sometimes you're damned if you do, damned if you don't!)

I've recently had to stop using OLS (and return to Apache2, which does work but slowly) because of this problem. :-(

I accept that it may not be a bug in OLS itself; the root cause is more likely a bug in a WordPress plugin, or a misconfiguration of OLS that causes it to run out of resources. The problem is that there's no clear guidance for isolating the issue, yet it has been reported for years and must be damaging uptake of OLS.

For me, the problem is intermittent and none of the suggested configuration tweaks (for example to cpuAffinity or zlib.output_compression) have helped.

Apart from multiple "No request delivery notification has been received from LSAPI application, possible dead lock" errors, I haven't been able to find anything significant in any error logs. I don't see any hung processes when viewed with "top". Switching from memcached to redis seemed to help for a while, but the problem returned. Intermittent problems are a bitch. When it happens, it seems to simultaneously affect the back end of multiple sites more than the front end: it doesn't trigger site load speed monitors, and I only know about it when admins start complaining that everything runs "like treacle" or they have seen 503 timeouts. The problem can be fixed for a while (a few days) by restarting everything.
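(For the record, "restarting everything" means a full restart of OLS and its PHP backends, something like the following on a stock install; the path may differ:)
Code:
sudo /usr/local/lsws/bin/lswsctrl restart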

It's a production server (a small VPS with 4 GB RAM, 2 cores, and about 50 sites, all running on Virtualmin and Ubuntu 22.04), so there is a limit to the testing that can be done. Before deploying it, I did run some load tests to check resource settings, and it seemed reliable (crucially, more reliable than Apache2).

Does anyone have guidance on tweaking the numerous resource settings to avoid this? Does it affect people using the paid LiteSpeed version as well?
 
#19
We are seeing exactly the same thing as described initially:

"No request delivery notification has been received from LSAPI application, possible dead lock."

happening very often (several times a day). The server seems to work normally most of the time when this happens. Additionally, like above, we see the other error on stderr from time to time:

"Reached max children process limit: 10, extra: 0, current: 10, busy: 10, please increase LSAPI_CHILDREN "

though that happens rarely (every 2-5 days?). In the meantime, we also see 500 errors occasionally.

What does the first message mean? What are the guidelines to get rid of this log entry and any potentially related issues? Just increase PHP_LSAPI_CHILDREN?
 
#20
If it helps, I found that increasing PHP_LSAPI_CHILDREN did indeed eventually help. I had to increase it a LOT; it's currently at 100.

Be careful, though. There's a risk that increasing it too much will cause the server to run out of memory and crash. This happened to me three days in a row at 4am (UK time) and I lost a longstanding customer as a result. Doubling the resources of the server eventually fixed it (at considerable cost, of course).
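A rough way to estimate a safe ceiling is to measure the actual per-child memory use and multiply by PHP_LSAPI_CHILDREN. A sketch (the process name may vary by PHP version):
Code:
# Average and total resident memory of the lsphp children, in MB
ps -o rss= -C lsphp | awk '{sum+=$1; n++} END {if (n) printf "%d children, avg %.0f MB, total %.0f MB\n", n, sum/n/1024, sum/1024}'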

Here's the thing, though: I tested all this using sites like loadimpact.com before deploying the server. I found the maximum and minimum values of PHP_LSAPI_CHILDREN needed for a given number of simultaneous visitors and scaled the whole thing with plenty of headroom. And looking at the access logs and resource graphs at the time of the crashes, the resource usage (CPU, memory, bandwidth, disk) was never more than half the danger level.

We could really use some better guidance for setting OLS up in a reliable configuration and diagnosing it when things go wrong. The default values are clearly inadequate (especially PHP_LSAPI_CHILDREN), the log files are hard to find, and the messages are misleading.
 