Assistance Required with ModSecurity Rule Compatibility for OpenLiteSpeed

#1
I recently came across the following ModSecurity rule intended to limit client hits by user agent:

Apache config:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit" \
    "id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebookexternalhit=+1,expirevar:global.ratelimit_facebookexternalhit=3"
SecRule GLOBAL:RATELIMIT_FACEBOOKEXTERNALHIT "@gt 1" \
    "chain,id:4000010,phase:2,pause:300,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
    SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
Unfortunately, this rule does not seem to work with OpenLiteSpeed. Could you please help me rewrite this ModSecurity rule to make it compatible with OpenLiteSpeed?

Thank you for your assistance.
 
#5
@admiral504

Just for testing, try 403 instead of 429 to check if OLS supports ErrorDocument 429.
Thank you for your guidance. I tried modifying the security rule to return a 403 status instead of 429. Here is the updated rule:

Code:
# Increment the global rate limit variable for any user agent containing 'facebook'
SecRule REQUEST_HEADERS:User-Agent "@contains facebook" \
    "id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebook=+1,expirevar:global.ratelimit_facebook=10"

# Check if the rate limit variable exceeds the threshold of 3 requests in 10 seconds
SecRule GLOBAL:RATELIMIT_FACEBOOK "@gt 3" \
    "chain,id:400010,phase:2,pause:10000,deny,status:403,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
    SecRule REQUEST_HEADERS:User-Agent "@contains facebook"

# Custom error message for status 403
ErrorDocument 403 "Access Denied: Too Many Requests"
I placed the rule in /usr/local/lsws/conf/modsec/rules.conf and then restarted OpenLiteSpeed. Afterward, I generated several requests by repeatedly re-crawling the page with the Facebook Debugger.

However, all the requests still returned a status of 200, not the expected 403.

Code:
"172.68.26.8 - - [11/Jun/2024:01:21:31 +0700] "GET /robots.txt HTTP/1.1" 200 128 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"172.71.174.164 - - [11/Jun/2024:01:21:32 +0700] "GET / HTTP/1.1" 200 34326 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"162.158.175.172 - - [11/Jun/2024:01:21:32 +0700] "GET / HTTP/1.1" 200 34326 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"172.71.166.170 - - [11/Jun/2024:01:21:34 +0700] "GET / HTTP/1.1" 200 34331 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"172.68.26.185 - - [11/Jun/2024:01:21:36 +0700] "GET / HTTP/1.1" 200 34326 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"162.158.114.2 - - [11/Jun/2024:01:21:38 +0700] "POST /wp-cron.php?doing_wp_cron=1718043698.6080009937286376953125 HTTP/1.1" 200 0 "-" "WordPress/6.5.3; https://truyenthongdps.com""
"172.69.65.211 - - [11/Jun/2024:01:21:37 +0700] "GET / HTTP/1.1" 200 34329 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"172.68.26.185 - - [11/Jun/2024:01:21:40 +0700] "GET / HTTP/1.1" 200 34326 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"172.68.26.37 - - [11/Jun/2024:01:21:41 +0700] "GET / HTTP/1.1" 200 34331 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
"172.69.65.34 - - [11/Jun/2024:01:21:45 +0700] "GET / HTTP/1.1" 200 34326 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)""
 
#6
I found this in /usr/local/lsws/logs/error.log:

2024-06-10 12:33:01.730873 [NOTICE] Loading LiteSpeed/1.7.19 Open (lsquic 3.3.2, modgzip 1.1, cache 1.66, mod_security 1.4 (with libmodsecurity v3.0.12)) BUILD (built: Tue Apr 16 15:14:26 UTC 2024) ...

Do you think I'm using mod_security version 1.4 or 3.0.12?

These packages came with the CyberPanel install.
 

Cold-Egg

Administrator
#7
I tried to use https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)-Variables#session as an example, and it works on my OLS site.
Code:
SecRule REQUEST_COOKIES:PHPSESSID !^$ "phase:2,id:1070,nolog,pass,setsid:%{REQUEST_COOKIES.PHPSESSID}"
SecRule REQUEST_HEADERS:User-Agent "@contains facebook" "phase:2,id:71,t:none,t:lowercase,t:normalizePath,pass,setvar:SESSION.score=+5"
SecRule SESSION:score "@gt 50" "phase:2,id:1072,pass,setvar:SESSION.blocked=1"
SecRule SESSION:blocked "@eq 1" "phase:2,id:1073,deny,status:403"
 
#8
Sorry, I don't understand your answer. I need a SecRule that limits user agents containing "facebook" to 3 requests per 10 seconds. How does your example help with that?
 

LiteCache

Active Member
#9
@admiral504

You should not use ModSecurity to limit the number of requests per unit of time, as this causes high load. If the FB bot is a legitimate bot, it respects robots.txt, and you can use robots.txt to limit its crawl frequency.

https://search.gov/indexing/robotstxt.html

Code:
User-agent: facebookexternalhit
Crawl-delay: 3
Or PHP:

Code:
define( 'FACEBOOK_REQUEST_THROTTLE', 2.0 ); // Number of seconds permitted between each hit from facebookexternalhit

if( !empty( $_SERVER['HTTP_USER_AGENT'] ) && preg_match( '/^facebookexternalhit/', $_SERVER['HTTP_USER_AGENT'] ) ) {
    $fbTmpFile = sys_get_temp_dir().'/facebookexternalhit.txt';
    if( $fh = fopen( $fbTmpFile, 'c+' ) ) {
        $lastTime = fread( $fh, 100 );
        $microTime = microtime( TRUE );
        // check current microtime with microtime of last access
        if( $microTime - $lastTime < FACEBOOK_REQUEST_THROTTLE ) {
            // bail if requests are coming too quickly with http 503 Service Unavailable
            header( $_SERVER["SERVER_PROTOCOL"].' 503' );
            die;
        } else {
            // write out the microsecond time of last access
            rewind( $fh );
            fwrite( $fh, $microTime );
        }
        fclose( $fh );
    } else {
        header( $_SERVER["SERVER_PROTOCOL"].' 503' );
        die;
    }
}
https://stackoverflow.com/questions/7716531/facebook-and-crawl-delay-in-robots-txt
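
For the 3-requests-per-10-seconds target mentioned earlier in the thread, the same temp-file idea can be extended from a fixed delay to a fixed-window counter. The following is only a minimal sketch, not a tested drop-in: the constants and file path are illustrative, it assumes a single server, and flock() is used so concurrent crawler hits don't race on the counter file.

Code:
<?php
// Illustrative limits: allow at most 3 facebookexternalhit requests per 10-second window.
define( 'FB_WINDOW_SECONDS', 10 );
define( 'FB_MAX_REQUESTS', 3 );

if ( !empty( $_SERVER['HTTP_USER_AGENT'] ) && preg_match( '/facebookexternalhit/i', $_SERVER['HTTP_USER_AGENT'] ) ) {
    $counterFile = sys_get_temp_dir() . '/facebookexternalhit_counter.json';
    if ( ( $fh = fopen( $counterFile, 'c+' ) ) && flock( $fh, LOCK_EX ) ) {
        // Load the previous window start time and hit count (an empty file decodes to null).
        $state = json_decode( stream_get_contents( $fh ), true );
        $now   = time();
        if ( !is_array( $state ) || !isset( $state['start'], $state['count'] ) || $now - $state['start'] >= FB_WINDOW_SECONDS ) {
            $state = array( 'start' => $now, 'count' => 0 ); // start a new window
        }
        $state['count']++;

        // Persist the updated state, then release the lock.
        rewind( $fh );
        ftruncate( $fh, 0 );
        fwrite( $fh, json_encode( $state ) );
        flock( $fh, LOCK_UN );
        fclose( $fh );

        if ( $state['count'] > FB_MAX_REQUESTS ) {
            // Over the limit: reply 429 and tell the crawler when to retry.
            header( $_SERVER['SERVER_PROTOCOL'] . ' 429 Too Many Requests' );
            header( 'Retry-After: ' . FB_WINDOW_SECONDS );
            exit;
        }
    }
}
Like the snippet above, this keeps its state in a single file per server, so a load-balanced setup would need a shared store instead.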
 
#10
Thank you for your reply.

You can refer to these samples, where many people have experienced similar issues with DDoS:

This might be due to Facebook's aggressive behavior not respecting robots.txt, or someone might have found a way to abuse the Facebook debugger tool to cause a DDoS attack.
 

LiteCache

Active Member
#11
@admiral504

Ignore Facebook (MugBook)! FB thinks primarily of itself, especially when it comes to AI features. The Facebook bot scrapes content from websites in order to keep users on the FB site and earn money from them. That's why I blocked the "MugBook" bot.
 
#12
It seems like if we can't limit the bots, the only choice is to block them.

I'm currently using this SecRule to block bots based on the user agent, but it only blocks static files. Can you point out what I should adjust so that it blocks the entire request from that bot, starting with the very first request?

Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?i:appinsights|semrush|ahrefs|dotbot|whatcms|rogerbot|trendictionbot|blexbot|linkfluence|magpie-crawler|mj12bot|mediatoolkitbot|aspiegelbot|domainstatsbot|cincraw|nimbostratus|httrack|serpstatbot|omgili|grapeshotcrawler|megaindex|petalbot|semanticbot|cocolyzebot|domcopbot|traackr|bomborabot|linguee|webtechbot|domainstatsbot|clickagy|sqlmap|internet-structure-research-project-bot|seekport|awariosmartbot|onalyticabot|buck|riddler|sbl-bot|df bot 1.0|pubmatic crawler bot|bvbot|sogou|barkrowler|admantx|adbeat|embed.ly|semantic-visions|voluumdsp|wc-test-dev-bot|amazonbot|gulperbot)" \
    "id:1000001,\
    phase:1,\
    t:none,\
    log,\
    msg:'Blocked bot User-Agent detected',\
    tag:'BOT',\
    severity:'WARNING',\
    deny,status:403"
 

LiteCache

Active Member
#13
I can't see facebookexternalhit UA in this code?!
 
#14
I can't see facebookexternalhit UA in this code?!
This is just an example. I will simply add facebook to the list, like this:

Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?i:appinsights|facebook|semrush|ahrefs|dotbot|whatcms|rogerbot|trendictionbot|blexbot|linkfluence|magpie-crawler|mj12bot|mediatoolkitbot|aspiegelbot|domainstatsbot|cincraw|nimbostratus|httrack|serpstatbot|omgili|grapeshotcrawler|megaindex|petalbot|semanticbot|cocolyzebot|domcopbot|traackr|bomborabot|linguee|webtechbot|domainstatsbot|clickagy|sqlmap|internet-structure-research-project-bot|seekport|awariosmartbot|onalyticabot|buck|riddler|sbl-bot|df bot 1.0|pubmatic crawler bot|bvbot|sogou|barkrowler|admantx|adbeat|embed.ly|semantic-visions|voluumdsp|wc-test-dev-bot|amazonbot|gulperbot)" \
    "id:1000001,\
    phase:1,\
    t:none,\
    log,\
    msg:'Blocked bot User-Agent detected',\
    tag:'BOT',\
    severity:'WARNING',\
    deny,status:403"
However, the primary issue is that it only blocks part of the request (static files). I need it to block the entire request from that bot.

Here is a screenshot. I tested the ahrefs UA at https://123angels.vn/
The first request still returns 200, and then the static files are blocked:
https://short.dpsmedia.vn/PnxeH
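
As an aside, while the ModSecurity side is being sorted out, the dynamic requests that still return 200 could be caught at the application level with a small guard near the top of the site's PHP front controller (for WordPress, e.g. via a must-use plugin), in the spirit of the earlier PHP snippet. A minimal sketch; the shortened user-agent list is illustrative only:

Code:
<?php
// Illustrative application-level fallback: deny known bot user agents
// before any page rendering happens. List shortened for brevity.
$blockedBots = '/(facebookexternalhit|ahrefs|semrush|mj12bot|petalbot)/i';

if ( !empty( $_SERVER['HTTP_USER_AGENT'] ) && preg_match( $blockedBots, $_SERVER['HTTP_USER_AGENT'] ) ) {
    header( $_SERVER['SERVER_PROTOCOL'] . ' 403 Forbidden' );
    exit( 'Access Denied' );
}
This only covers requests that reach PHP, so static files would still rely on the SecRule (or other server-level blocking) shown above.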
 
#16

LiteCache

Active Member
#17
At the website level, I can already block them using Cloudflare.
Cloudflare is the better way to block unwanted access. ModSecurity and .htaccess always cause a high load when dropping unwanted access, but CF doesn't have WAF functions to check headers the way it is described in the posted link. The question is what you prefer. The best blocking method? Use .htaccess, but with a higher load. Blocking without high load? Use CF, but with restrictions.
 