Post by jcgadgets » Fri Jan 14, 2011 6:23 am

Chones wrote: Just in case this is useful, this is my robots.txt file:

Code:

User-agent: *
Disallow: /*?sort
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search
Disallow: /*?page=1
Disallow: /*&create=1
Allow: /

That stops the same category page being indexed for every possible sort option. It stops all checkout pages, account pages and search pages being indexed. It stops a duplicate of a category's first page being indexed when that page is reached through the ?page=1 link from page 2, 3, etc., and it stops the Terms and Conditions and Privacy Policy, as accessed from the Checkout stage, being indexed.

I'd suggest everyone do this. Google seems to like me and this may be one of the reasons.

I genuinely think this is a post that should be stickied or mentioned in one of the huge posts on how to get started in OpenCart. I had fixed part of this issue in different ways, but since search engines will see every sorting method as an entirely new page, you can get a lot of duplicate content pretty quickly. I wasn't sure how to get rid of it entirely, but here we are! Didn't expect to find it here.

THANK YOU.


Jared

Active Member

Posts

Joined
Sun Oct 31, 2010 4:49 pm

Post by Chones » Fri Jan 14, 2011 8:23 am

Glad to be of help, Jared. I'm surprised myself that nobody has posted this before.

http://scarletandjones.com/
http://sharpdressedman.co.uk/
http://coffincompany.co.uk/
http://horsesculptures.co.uk/
If I've helped you out, why not buy me a beer? http://craigmurray.me.uk


User avatar
Active Member

Posts

Joined
Wed Mar 24, 2010 9:07 pm
Location - London

Post by patchouly » Sat Jan 15, 2011 4:25 am

Chones,
Do you put your code in .htaccess?

And do I put it in as is, or is there supposed to be something next to the *?

New member

Posts

Joined
Tue Jun 15, 2010 10:15 pm

Post by Chones » Sat Jan 15, 2011 4:45 am

No, copy that text and save it as a file called robots.txt

Put it in the root of your domain - so it would be at www.yourdomain.com/robots.txt

Just copy as is, no need to do anything with the *


Post by Chones » Tue Jan 18, 2011 6:44 am

No need to do that. Every rule in the robots.txt file starts with /*, and the * matches whatever comes before the part of the URL you are blocking, so it doesn't care if your store is at www.yoursite.com or www.yoursite.com/store


Post by speedingorange » Sat Mar 05, 2011 7:30 am

Sorry to drag up a slightly older post, but what's the best way to deal with the multiple home urls situation?

All the default links point to

Code:

http://www.mystore.com/index.php?route=common/home

where ideally I want them to point at

Code:

http://www.mystore.com/

I mean, I could manually edit them, but is that not a bit of a bodge?

Cheers
James

EDIT:

Found it!

http://forum.opencart.com/viewtopic.php?f=20&t=6298

Active Member

Posts

Joined
Tue Feb 23, 2010 7:33 pm

Post by glolar » Thu Mar 10, 2011 5:51 pm

Chones,

I just want to clarify: The robots.txt file goes in the root folder of your domain (i.e., www.domain.com), even if your store is at www.domain.com/store?

Thanks for this great tip.

Increase Your Child's I.Q.
iPad Wallpapers
Turtle & Tortoise Screen Savers


User avatar
Active Member

Posts

Joined
Thu Jul 29, 2010 12:35 pm
Location - San Diego, CA

Post by Chones » Fri Mar 11, 2011 3:50 am

Yes - you can test it in Google's webmaster tools. From what I remember, you just put the text of the robots.txt file in the first box, then enter the URLs you want to test and run it - it will tell you whether each one is blocked or not.
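For anyone who wants to run the same kind of check locally, here is a minimal Python sketch of the test described above. It is not Google's tool: it assumes Google-style matching (`*` matches any run of characters, a trailing `$` anchors the end, the longest matching rule wins and Allow wins a tie). The sample URLs, including the `/store` prefix and the `desktops` keyword, are made up for illustration.

```python
import re

# Rules copied from the robots.txt posted at the top of this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?sort
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search
Disallow: /*?page=1
Disallow: /*&create=1
Allow: /
"""


def parse_rules(text):
    """Return (directive, pattern) pairs for the Allow/Disallow lines."""
    rules = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in ("allow", "disallow") and value:
            rules.append((key, value))
    return rules


def matches(pattern, path):
    """Assumed semantics: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_allowed(path, rules):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    hits = [(len(p), d == "allow") for d, p in rules if matches(p, path)]
    return max(hits)[1] if hits else True


RULES = parse_rules(ROBOTS_TXT)

# Hypothetical URLs (path plus query string) to try.
SAMPLE_PATHS = [
    "/",
    "/index.php?route=account/login",
    "/store/index.php?route=checkout/cart",
    "/desktops?sort=p.price&order=ASC",
    "/desktops?page=12",
    "/index.php?route=product/category&path=20&sort=p.price",
]

for path in SAMPLE_PATHS:
    print("allowed" if is_allowed(path, RULES) else "blocked", path)
```

Two things stand out under these assumptions. `/desktops?page=12` is reported as blocked, because `/*?page=1` is a prefix match and also catches 10 to 19. The last URL is reported as allowed, because its `sort` parameter follows an `&` and the rule only looks for `?sort`.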


Post by glolar » Fri Mar 11, 2011 4:36 am

Chones,

Here are my robots.txt test results (I have no idea what they mean):

Test results
Url: http://www.glolar.com/

Googlebot:
Allowed by line 8: Allow: /
Detected as a directory; specific files may have different restrictions

Adsbot-Google:
Allowed
Detected as a directory; specific files may have different restrictions

Any clues?

Thanks.


Post by Chones » Fri Mar 11, 2011 4:57 am

You want / allowed. That's fine.

What you don't want allowed are urls such as

www.yoursite.com/index.php?route=account/login

Try testing urls like that.


Post by glolar » Fri Mar 11, 2011 7:10 am

Chones,

Thanks, it all makes sense now.

Regards


Post by mfive » Wed Apr 06, 2011 1:04 am

Chones wrote: Firstly, you don't want your checkout pages SEO friendly. I actually block all robots from indexing my Checkout pages, Account pages and Search results. Why would you want those indexed by Google or Bing? You want your home page, your products, categories and manufacturer pages indexed, nothing else. Google has actually asked shop owners to use robots.txt to stop indexing of shopping carts and checkout pages as it is useless to users and just clogs up their servers. You never know, they may even give you a boost in SERPs if they see you trying to do the right thing.

Secondly, to use HTTPS all you need is to have a certificate installed and then in Admin go to System > Settings and under the Server tab click Yes for Use SSL, and add the SSL URL to the Config file, as it says.

One very important note to make - I don't think this is always about SEO. For me, it's about reducing the possibility of exposing the architecture that my site runs on. I do not want just anyone to know I'm using OpenCart as my platform. You never know what security vulnerabilities could be exploited both now and in the future... so why advertise it?

New member

Posts

Joined
Sat Jan 29, 2011 9:49 am

Post by opencart-templates » Sat Apr 07, 2012 12:06 am

Here is a more up-to-date robots.txt:

Code:

User-agent: *
Disallow: /*&limit
Disallow: /*&sort
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search
Disallow: /*?route=affiliate/
Disallow: /*?route=information/contact
Disallow: /*?route=information/information&information_id=6

E.g. information_id=6 is the delivery information page.

I am not 100% sure if the &sort needs to be ?sort
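On the `&sort` versus `?sort` question, a small Python sketch may help. It assumes Google-style wildcard matching (`*` matches any run of characters, and rules match from the start of the path) and two hypothetical URL shapes: a default-style link where `sort` follows other parameters, and an SEO-style link (the `desktops` keyword is made up) where `sort` is the first parameter.

```python
import re


def matches(pattern, path):
    """Assumed semantics: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


# Hypothetical example URLs (path plus query string).
DEFAULT_URL = "/index.php?route=product/category&path=20&sort=p.price&order=ASC"
SEO_URL = "/desktops?sort=p.price&order=ASC"

for pattern in ("/*&sort", "/*?sort"):
    for url in (DEFAULT_URL, SEO_URL):
        verdict = "matches" if matches(pattern, url) else "does not match"
        print(pattern, verdict, url)
```

Under these assumptions each rule covers only one of the two shapes, so a store that serves both kinds of URL would probably want both lines.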

Advanced Professional Email Template


User avatar
Active Member

Posts

Joined
Mon May 16, 2011 7:24 pm
Location - UK

Post by MrTech » Sat Apr 07, 2012 12:55 am

For those of us using SEO URLs, should we be changing or adding links to the robots file?

Disallow: /*contact-us/
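A quick way to see what a rule like `/*contact-us/` would cover is to try it against a few URLs. The Python sketch below assumes Google-style wildcard matching (`*` matches any run of characters, and rules match from the start of the path). The candidate URLs are hypothetical, so your own SEO keywords may differ.

```python
import re


def matches(pattern, path):
    """Assumed semantics: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


PATTERN = "/*contact-us/"

# Hypothetical URLs: SEO-style with and without a trailing slash,
# inside a /store subfolder, and the original route= form.
CANDIDATES = [
    "/contact-us/",
    "/contact-us",
    "/store/contact-us/",
    "/index.php?route=information/contact",
]

covered = [url for url in CANDIDATES if matches(PATTERN, url)]
print(covered)
```

In this sketch the trailing slash matters: `/contact-us` without it is not covered, and neither is the original `route=` URL. That suggests SEO-style rules are best added alongside the `route=` rules rather than swapped in for them.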

~
Install Extensions OR OpenCart Fast Service! PayPal Accepted
I will professionally install and configure any free or purchased theme, module or extension.

Visit http://www.mrtech.ca if you need an OpenCart webmaster
~


User avatar
Active Member

Posts

Joined
Mon Jan 09, 2012 2:39 pm
Location - Canada, Eh!

Post by modernmagic » Wed May 07, 2014 3:17 am

These are considered duplicate content, so I would like to disallow the latter, with the brand "/cadwell":
http://www.admarneuro.com/emg-ncv-ep-sy ... ierra-wave
http://www.admarneuro.com/cadwell/emg-n ... ierra-wave

What is the code to disallow the brand result pages from being indexed?
Is it: /*?route=brand/

Joomla Web Site Designer at www.modernmagic.com


User avatar
New member

Posts

Joined
Sat Feb 09, 2013 4:55 am


Post by bonkopencart » Fri Mar 24, 2017 2:10 pm

modernmagic wrote:
Wed May 07, 2014 3:17 am
These are considered duplicate content, so I would like to disallow the latter, with the brand "/cadwell":
http://www.admarneuro.com/emg-ncv-ep-sy ... ierra-wave
http://www.admarneuro.com/cadwell/emg-n ... ierra-wave

What is the code to disallow the brand result pages from being indexed?
Is it: /*?route=brand/

I think this has to be handled with a rel="canonical" link.

New member

Posts

Joined
Fri Mar 24, 2017 1:53 pm