Post by Sylvain » Fri Jan 09, 2015 4:42 am

Hello
Is it normal to have 5 guests from Beijing, China on my website all the time? Here are some of the IP addresses. They are all from
Beijing Baidu Netcom Science and Technology Co., L
180.76.6.58
180.76.6.152

thanks


Post by villagedefrance » Sat Jan 10, 2015 12:24 pm

Baidu is the number 1 search engine in China.
If your site has been catalogued by their robots and spiders, they will come back regularly to check for changes. Google, Bing and Yandex robots do the same thing all the time.
If you want to block these robots, you will need to edit your "robots.txt" file accordingly.
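
For illustration, here is a rough sketch of what such a robots.txt could look like and how to check what it permits, using Python's standard urllib.robotparser. The rules, the example.com URL, and the is_allowed helper are assumptions made up for this example, not taken from any real site, and the file only has an effect on crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: asks Baidu's crawler to stay out of
# the whole site while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for agent in ("Baiduspider", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/index.php"))
```

This only tells you what a well-behaved crawler should do with the file; it does not stop any requests by itself.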

OpenCart custom solutions @ https://villagedefrance.net



Post by Dhaupin » Tue Jan 13, 2015 11:43 pm

As with Google, robots.txt won't stop any of the 4 pools of Baidu services/bots. They blatantly ignore everything except a ban. They may not index the content, but they will index everything else, and they sure as heck are internally recording the words from the content. There is a common misconception that robots.txt (or noindex, or the X-Robots-Tag header) actually does anything to make bots stop visiting. If nothing else, it simply tells bots what you're trying to hide. They definitely still hit those areas... and yes, this applies even to Google.

So here are the IP ranges for Baidubots:

- Baidu Netcom is their large pool router. They seem to run mostly automation through it, although it's technically an ISP too, probably for their corporate use or VPN routes. Generally these IPs start with 180.76.

- Baidu itself is a different IP pool. They seem to use it less, in favor of the Netcom pools. As best I can tell, it's meant for "traditional" spiders, and it has been crawling longer than Netcom. Generally these IPs start with 123.125.

- Baidu Hong Kong is the new kid on the block. I think they set it up to route traffic outside HK to try to avoid the limits imposed by the Chinese government. It might use the Japan peers at this time. These IPs are less "botty" but still hit occasionally. Generally they start with 185.10.

- There is another Baidu bot range too, but I can't remember what IPs it uses. It doesn't hit often enough for us to track it, but I'm sure I've seen it in the past. Perhaps it's their Japan arm or one of their 2 peers-as-proxy via Japan (http://www.ntt.com or http://www.kddi.com)
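
To see which of the pools above a visitor IP falls into, something like the following sketch could be run against the addresses in your access log. It uses Python's standard ipaddress module. Treating each "starts with" prefix as a full /16 block is my assumption, as are the labels and the classify_ip name; the real allocations may be narrower or wider, so check them against whois before banning anything.

```python
import ipaddress

# Assumption: each "starts with" prefix from the list above is treated as
# a /16 block. The real Baidu allocations may differ.
BAIDU_RANGES = {
    "Baidu Netcom": ipaddress.ip_network("180.76.0.0/16"),
    "Baidu": ipaddress.ip_network("123.125.0.0/16"),
    "Baidu Hong Kong": ipaddress.ip_network("185.10.0.0/16"),
}


def classify_ip(address: str):
    """Return the label of the pool containing address, or None if no pool matches."""
    ip = ipaddress.ip_address(address)
    for label, network in BAIDU_RANGES.items():
        if ip in network:
            return label
    return None


if __name__ == "__main__":
    # The two addresses reported in the first post of this thread.
    for addr in ("180.76.6.58", "180.76.6.152"):
        print(addr, classify_ip(addr))
```

The same blocks could then be fed into whatever ban mechanism your server uses, if you decide a ban is what you want.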

https://creadev.org | support@creadev.org - Opencart Extensions, Integrations, & Development. Made in the USA.

