Post by MarkIce » Fri Nov 14, 2014 9:56 pm

hi,

does anyone know how to stop Google caching the unfriendly URLs

www.domain_name/category_name?limit=25
www.domain_name/category_name?page=1

Post by uksitebuilder » Fri Nov 14, 2014 9:58 pm

You can add those query strings to a robots.txt file.

Once done, you can log in to Google Webmaster Tools and tell it to delete those pages from its index.

The next time a spider crawls your site, it should (no promises) ignore those URLs/links.
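
For illustration (a hypothetical sketch, not from the post above), the rules for just the two query strings mentioned would look something like this, covering both the first (?) and later (&) parameter positions:

Code: Select all

User-agent: *
# block "limit" and "page" wherever they appear in the query string
Disallow: /*?limit=
Disallow: /*&limit=
Disallow: /*?page=
Disallow: /*&page=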

Post by Martijn ISB » Fri Nov 14, 2014 10:32 pm

You probably mean indexing, which uksitebuilder answered, but in cache (get it?) you want to block Google Cache, you can add the following robots meta tag:

Code: Select all

<meta name="robots" content="noarchive" />
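
And as a sketch of the related directive, in case the goal is to keep the pages out of the index entirely rather than just out of the cache, the same tag takes noindex instead of noarchive:

Code: Select all

<meta name="robots" content="noindex" />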

Post by MarkIce » Sat Nov 15, 2014 6:49 pm

hi,
thanks for the replies. Yes, sorry, I did mean Google indexing.

I already have some disallows in my robots.txt file. Google told me to add the "first user-agent" part to my robots.txt to help them with Google Products images, so the website images can be spidered OK:


Code: Select all

User-agent: *
User-agent: Googlebot
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Googlebot-Image
Disallow:
User-agent: Twitterbot
Disallow:

User-agent: *
Disallow: /*?page=
Disallow: /*&page=
Disallow: /?limit=
Disallow: /&limit=
Disallow: /image/
Disallow: /admin/
Disallow: /catalog/
Disallow: /vqmod/
Disallow: /download/
Disallow: /cgi-bin/
Disallow: /cache/
Disallow: /export/
Disallow: /system/

plus more.

Post by uksitebuilder » Sat Nov 15, 2014 7:39 pm

This should really be all you need:

Code: Select all

User-agent: *
Disallow: /*&limit
Disallow: /*&sort
Disallow: /*&page
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search
Disallow: /*?route=affiliate/

Sitemap: http://yourdomain.com/link-to-sitemap

You should not put the admin directory etc. in robots.txt, as it just gives script kiddies the locations of your private stuff.

Please see https://support.google.com/webmasters/a ... 2608?hl=en for more info on robots.txt
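
One thing worth double-checking (a hypothetical example, not from the post above): the & patterns only match a parameter that follows another one, so a keyword URL like the ones in the first post (/category_name?limit=25) would not be caught unless the ? variants are added too:

Code: Select all

User-agent: *
# matches e.g. /index.php?route=product/category&path=20&limit=25
Disallow: /*&limit
# would also be needed for e.g. /category_name?limit=25
Disallow: /*?limit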

Post by MarkIce » Sat Nov 15, 2014 9:18 pm

hi,

thanks, will try your code. My admin folder's name has been changed now and isn't showing in my robots.txt file.
I seem to have a problem with the Google spider and my product image folder for Google Products, which uses the same folder for Google Products and the main website's product page images.

Google said to add the below code to my robots.txt:

Code: Select all

User-agent: *
User-agent: Googlebot
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /image/
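
If it helps to see why Google suggested this (my reading of standard robots.txt group matching, so treat it as a sketch): a crawler obeys only the group that names it most specifically, so the named Google bots match their own empty Disallow: groups and can still fetch /image/, while everything else falls through to the * group:

Code: Select all

# Googlebot-Image matches this group; an empty Disallow blocks nothing,
# so it can still crawl /image/ for Google Products
User-agent: Googlebot-Image
Disallow:

# all other bots use this group and stay out of /image/
User-agent: *
Disallow: /image/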

Post by uksitebuilder » Sat Nov 15, 2014 10:02 pm

There should be no reason to disallow the image folder from bots.

Hence why, in the code snippet I posted, there is no mention of it.
