Possible hidden BUG in OpenLiteSpeed

#1
Hi everyone,

We are having a very strange issue with OpenLiteSpeed that we cannot solve after running many different tests in different environments... We think it could be a bug in LiteSpeed...

Attached are two sample HTML files, along with some JS files, that you can use to reproduce the issue.

As you can see, we load several JS files from the HTML using the <script> tag.

If you upload these files to a LiteSpeed web server and open the first sample (sample1.html), under normal circumstances the HTML loads fine.

But sometimes, after reloading the page many times, the web server hangs while serving one of the JS files, and that file takes 1 minute (exactly 1 minute) to download. As a consequence, the whole page load hangs until all the JS files have loaded...

In the other attachment, you can see a screenshot of Chrome's Network panel.

You can see how the videojs.min.js file took 1 minute to download... It is the same file 100% of the time: videojs.min.js. We don't know what this file has to do with the issue, because it is a regular JavaScript file. The only notable thing is that it is a big file (more than 100 KB). We have tested with the plain (unminified) JavaScript, and we have tested on different servers and configurations, always with the same result...

But if we change the order in which the JS files are loaded, then the problem occurs with jquery-ui-1.8.24.custom.min.js and jquery.datetimepicker.js instead. These two files are also big. You can see this behaviour in sample2.html.

So the issue seems to be related to the download order and also to some specific JavaScript files.

You can watch a video showing the issue here: https://wetransfer.com/downloads/5f1d87579be4e84071682ba81329ef4720200723150413/76400f

It is not easy to reproduce this behaviour... The easiest way to catch the issue is as follows:

* Clear all Chrome cache
* Close Chrome entirely
* Open Chrome
* Open an "incognito" window
* Open the Network Panel on the Chrome Inspector
* Go to the URL where the HTML is stored and see if the issue occurs...


If the page loads fine, repeat the process; sooner or later it will happen.
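The manual reload loop above could also be roughly approximated with a script. This is only a sketch under assumptions: it fetches each JS file sequentially with Python's standard library, which is not the same as Chrome loading them in parallel, so it may not trigger the stall at all. The 55-second threshold and the helper names (`time_fetch`, `find_stalled`, `run_rounds`) are made up for illustration.

```python
import time
import urllib.request

# Assumption: anything close to the observed 1-minute hang counts as a stall.
STALL_SECONDS = 55.0


def time_fetch(url, timeout=90.0):
    """Download url once and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def find_stalled(timings, threshold=STALL_SECONDS):
    """Return the (url, seconds) pairs whose download time reached the threshold."""
    return [(url, secs) for url, secs in timings if secs >= threshold]


def run_rounds(urls, rounds=50, fetch=time_fetch):
    """Fetch every URL `rounds` times, in order, and collect the stalled downloads."""
    stalled = []
    for _ in range(rounds):
        timings = [(url, fetch(url)) for url in urls]
        stalled.extend(find_stalled(timings))
    return stalled
```

It would be called with the HTTPS URLs of the JS files in the order the HTML loads them, for example `run_rounds(["https://your-server.example/js/videojs.min.js"])` (a placeholder URL). An empty result does not prove the server is fine, since the browser's connection behaviour is different.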

We have tested this on Ubuntu 18.04, Ubuntu 20.04 and CentOS 7, on cloud servers at OVH, IONOS and Microsoft Azure. The problem occurs everywhere.

We need to put some servers into production, but we have to wait until this issue is solved... We would appreciate it if you could give us a quick solution.

Thank you very much!
 


Cold-Egg

Administrator
#2
Hi @natas123 ,

Did you set any Per Client Throttling under security settings?

I tried several times from my test server over both HTTP and HTTPS but could not reproduce it.
I will DM you the site URL so you can see whether you can reproduce it easily there.

Best,
Eric
 
#3
Hi Eric,

Thank you very much for your quick response!

Our configuration is fairly standard, as we use most of the default values that come with the standard setup.

We are using the latest stable version (1.6.14), although we have been suffering from this issue for a long time with previous versions. We have also tried version 1.7.3 (we don't know whether it is a final release or not) and it seems to work there. Maybe you have fixed this issue in that version?

One important thing: to reproduce the issue, please only test in HTTPS, because we think it does not fail with HTTP.

I have sent you the two configuration files we are using via the private conversation you created. Maybe you can spot something wrong in them... We have tried disabling/enabling compression, static file delivery, optimization, etc., but with no luck.

Thank you!
 
#4
@natas123 I've run into the "hanging for exactly one minute" problem a few times, until I finally figured out what was causing it.

For me, the problem was when I tried to upgrade to BBR for TCP congestion control on my server. If you look up some guides about making this upgrade, they often say to add two lines to your sysctl.conf file - but one of these was the culprit.

Good line:
net.ipv4.tcp_congestion_control = bbr

Bad! (on CentOS 7 with OpenLiteSpeed, at least)
net.ipv4.tcp_notsent_lowat = 16384

Check your conf files for this tcp_notsent_lowat line and comment it out, and see if that clears it up.
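If you would rather script that check, here is a minimal sketch that comments out every active line for a given key in sysctl-style text. The function name `comment_out_setting` is made up for illustration, and it only handles plain `key = value` lines; it assumes you read the file (typically /etc/sysctl.conf, or files under /etc/sysctl.d/) yourself and write the result back after reviewing it.

```python
import re


def comment_out_setting(conf_text, key):
    """Return (new_text, changed_lines) with every active `key = value` line commented out."""
    # Lines already starting with '#' do not match, so they are left untouched.
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    new_lines, changed = [], []
    for line in conf_text.splitlines():
        if pattern.match(line):
            changed.append(line.strip())
            new_lines.append("# " + line.lstrip())
        else:
            new_lines.append(line)
    return "\n".join(new_lines) + "\n", changed
```

For example, `comment_out_setting(text, "net.ipv4.tcp_notsent_lowat")` leaves the `tcp_congestion_control` line alone. Keep in mind that commenting the line out does not change the value already loaded in the running kernel, so a reboot (or setting the value back explicitly) is still needed before re-testing.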
 
#5
Hi @Lubos

We have been testing your solution in production for a while, and the error is indeed fixed.
We had the line you mentioned, "net.ipv4.tcp_notsent_lowat = xxxx", in sysctl.conf; we disabled it and the error stopped occurring.

You are a star, thank you very much for your help!
 