Post by HAO » Tue Jun 10, 2025 12:11 pm

In recent months, Bingbot has been consuming too much of our server's traffic.

How can I make Bingbot crawl only product and category pages to avoid wasting traffic?

This is the discussion I saw:
https://www.reddit.com/r/bing/comments/ ... tl=zh-hant

How should I modify .htaccess?

Please help me!

HAO
Active Member

Posts

Joined
Fri Jun 03, 2011 2:52 pm

Post by nonnedelectari » Tue Jun 10, 2025 12:47 pm

It depends on what your product and category URLs look like, as well as what the "wasted traffic" URLs look like.
For complex or flexible exclusions it is often better to use PHP to send robots headers (X-Robots-Tag) and/or robots meta tags than to rely on robots.txt, especially when you use SEO URLs.
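
As a rough sketch of the header approach done in .htaccess instead of PHP: this assumes Apache 2.4 with mod_headers enabled, and the search= pattern is only an example parameter, so adjust it to whichever URLs you want kept out of the index. Keep in mind that a robots header controls indexing; the crawler still has to request the page to see it.

Code: Select all

<IfModule mod_headers.c>
    # Apache 2.4 expression: true when the query string has a search= parameter
    <If "%{QUERY_STRING} =~ /(^|&)search=/">
        # Ask compliant crawlers not to index these pages or follow their links
        Header set X-Robots-Tag "noindex, nofollow"
    </If>
</IfModule>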

Active Member

Posts

Joined
Thu Mar 04, 2021 6:34 pm

Post by HAO » Tue Jun 10, 2025 1:09 pm

All I have are statistics showing that Bingbot is using over 500 GB of traffic.

I can't tell which types of links Bingbot is crawling, so I would like to start by blocking Bingbot from crawling search links.

Code: Select all

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Bingbot [NC]
RewriteCond %{REQUEST_URI} ^/search/?$ [NC]
RewriteRule .* - [F,L]
Can this be modified to match search links of this form?

Code: Select all

index.php?route=product/search&search=*
Thank you!


Post by nonnedelectari » Tue Jun 10, 2025 1:25 pm

You could just use:

Code: Select all

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Bingbot [NC]
RewriteCond %{QUERY_STRING} (^|&)search= [NC]
RewriteRule .* - [F,L]
as QUERY_STRING holds everything after the "?" (REQUEST_URI contains only the path, so it would never match search=).
But if you want to give a 403 to all these URLs, it is better to also add them to robots.txt, as Bingbot might still come back for them if you do not explicitly tell it that you do not want them crawled.

Code: Select all

User-agent: BingBot
Disallow: /*search=


Post by HAO » Tue Jun 10, 2025 5:33 pm

Thanks for your help, I have modified it.


Post by paulfeakins » Tue Jun 10, 2025 6:06 pm

HAO wrote:
Tue Jun 10, 2025 12:11 pm
In recent months, Our server has been consuming too much traffic from Bingbot.
No one uses Bing, so could you block it entirely?

UK OpenCart Hosting | OpenCart Audits | OpenCart Support - please email info@antropy.co.uk


User avatar
Legendary Member
Online

Posts

Joined
Mon Aug 22, 2011 11:01 pm
Location - London Gatwick, United Kingdom

Post by by mona » Wed Jun 11, 2025 2:51 am

If you set up a profile in Bing Webmaster Tools, there is an option to set the times when you want Bingbot to crawl and to rate-limit it.
You cannot stop it from crawling the pages it wants to crawl, but Bing does have other options to resolve your issues.

DISCLAIMER:
You should not modify core files .. if you would like to donate a cup of coffee I will write it in a modification for you.


https://www.youtube.com/watch?v=zXIxDoCRc84


User avatar
Expert Member

Posts

Joined
Mon Jun 10, 2019 9:31 am

Post by gogoweb » Sat Jun 28, 2025 5:44 am

Sure, .htaccess is your friend here.

Code: Select all

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Bingbot [NC]
# THE_REQUEST is the full request line (path plus query string); REQUEST_URI holds only the path
RewriteCond %{THE_REQUEST} (search=|tag=|account|limit=|filter) [NC]
RewriteRule .* - [F,L]
You could also restrict bots in general to crawling static SEO URLs only.
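
A minimal sketch of that last idea, assuming SEO URLs are enabled so that normal pages never need route= in the address (on a store that still links to index.php?route=... everywhere, this would lock crawlers out of every page). The user agent keywords are an assumption and only catch bots that announce themselves.

Code: Select all

RewriteEngine On
# Only apply to clients whose user agent looks like a crawler
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider) [NC]
# THE_REQUEST is the original request line, so OpenCart's internal _route_= rewrite for SEO URLs is not matched
RewriteCond %{THE_REQUEST} [?&]route= [NC]
RewriteRule ^ - [F,L]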

All mods | OpenCart Bulk Related Products Ultimate Edition |GeoIP hide Prices / no add to cart by country| CSS override | Direct link to checkout / skip add to cart / buy now link | AUTO pilot - reward & purchase points


New member

Posts

Joined
Sat Oct 18, 2014 6:45 pm


Post by HAO » Sat Jun 28, 2025 9:38 pm

Thank you, I learned something!

We currently only have URLs that use product IDs.

But what if I want to limit crawling to SEO URLs only?

PS. How can I allow only whitelisted bots to crawl my web pages?

I ask because my cPanel statistics show "Unknown robot identified by bot*" consuming a lot of my traffic. It is basically a bot attack from addresses listed on AbuseIPDB.

How can I ban them effectively?


Post by by mona » Sat Jun 28, 2025 11:09 pm

HAO wrote:
Sat Jun 28, 2025 9:38 pm
How can I allow only whitelisted bots to crawl my web pages?
You can't, not unless you want to block all legitimate traffic as well.
This has nothing to do with OpenCart. If you really are interested in learning, use Google and search for:

dealing with bots
good bots versus bad bots
htaccess to block crawling
using htaccess to limit crawling
using robots.txt
using headers instead of robots.txt
why do bots crawl my website
why can you not stop bad bots
what can I do about bad bots
how to block ips in htaccess

Note that there are some interesting .htaccess and robots.txt patterns here (not that bad robots would obey robots.txt anyway), and there are also robots directives in the headers, which may be the most effective solution for Bing, but no bad bot adheres to them. The majority of bad bots do not identify themselves as bots.

The question was how to properly restrict Bingbot.
For Bing or Google, if you want to do it PROPERLY, you do it with their tools.
Blocking them via .htaccess will not stop them from hitting your site; it will just refuse them when they try. The part that was not explained is that blocking them via .htaccess may affect your rankings. With Bing probably not, with Google probably yes.
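
For the bots that do announce themselves (like the "identified by bot*" entry in cPanel), a minimal sketch of a user agent allowlist could look like this. The two allowed names are only examples, and since a user agent string is trivial to fake, this will not stop bots that pretend to be a browser or Bingbot.

Code: Select all

RewriteEngine On
# Any client whose user agent contains "bot"...
RewriteCond %{HTTP_USER_AGENT} bot [NC]
# ...unless it claims to be one of the allowed crawlers
RewriteCond %{HTTP_USER_AGENT} !(bingbot|googlebot) [NC]
RewriteRule ^ - [F,L]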


Post by khnaz35 » Wed Jul 02, 2025 2:53 am

If nothing works directly within the .htaccess file, your best option would be to put a firewall in front of the site, such as Cloudflare (CF) or whichever you choose, look for the browser/user agent being used to abuse the site and block it, set up some rate-limit rules, and so on.
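
Without a firewall, individual addresses can still be refused in .htaccess. A minimal sketch, assuming Apache 2.4 (mod_authz_core) and a host that allows AuthConfig overrides; the addresses below are documentation placeholders, not real offenders.

Code: Select all

<RequireAll>
    # Allow everyone by default
    Require all granted
    # Placeholder addresses: replace with the IPs or ranges seen in your logs
    Require not ip 192.0.2.15
    Require not ip 198.51.100.0/24
</RequireAll>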

Got an urgent question that’s keeping you up at night? There might just be a magical inbox ready to help: khnaz35@gmail.com
Enjoy nature ;) :) :-*


User avatar
Active Member

Posts

Joined
Mon Aug 27, 2018 11:30 pm
Location - Malaysia

Post by paulfeakins » Wed Jul 02, 2025 7:03 pm

You could give the Bingbot a crawl delay as described here: https://www.antropy.co.uk/blog/how-to-r ... ary-links/
