I've recently changed from OpenCart 1.5.5.1 to OC3.
The robots.txt file included in OC3 is different from the one I've used for many years.
It also doesn't include "User-agent: *" anywhere, which according to the Bing robots.txt tester seems to mean it is doing nothing.
I have a couple of questions:
1) Should I include User-agent: * at the beginning of the OC3 robots.txt?
2) If so, why wasn't it in the file?
3) Does anyone have an example of a "best practice" robots.txt file for OC3?
Thanks very much
2. It's just another bug that seems to have made it into the latest release.
3. Depends on your site and what has already been indexed.
User-agent: *
Disallow: /cgi-bin/
Disallow: /vqmod/
Disallow: /system/
Disallow: /admin/
Disallow: /*?page=$
Disallow: /*&page=$
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?limit=
Disallow: /*&limit=
Disallow: /*?filter_name=
Disallow: /*&filter_name=
Disallow: /*?filter_sub_category=
Disallow: /*&filter_sub_category=
Disallow: /*?filter_description=
Disallow: /*&filter_description=
If you aren't using vqmod then you can delete Disallow: /vqmod/
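On question 1, the effect of a missing User-agent line can be checked locally. Here is a minimal sketch using Python's standard-library urllib.robotparser, on the assumption that it treats ungrouped rules the way the Bing tester does. It only does plain prefix matching, so the wildcard rules above are left out. The is_blocked helper and the example.com URL are made up for illustration:

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines disallow `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(agent, url)


# Rules with no User-agent line above them belong to no group and are ignored.
without_group = ["Disallow: /admin/", "Disallow: /system/"]
# The same rules inside a "User-agent: *" group apply to every crawler.
with_group = ["User-agent: *"] + without_group

url = "https://example.com/admin/index.php"
print(is_blocked(without_group, url))
print(is_blocked(with_group, url))
```

With this parser, the first call reports the URL as not blocked and the second as blocked, which matches what the tester was saying about the stock file.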
Personally, I've had better luck blocking search engines with noindex directives set in .htaccess files (sent as an X-Robots-Tag header, since a meta tag itself has to live in the HTML). It seems like a lot of them ignore the robots.txt directives.
I also used to have these lines in my OpenCart 1.5 robots.txt. Is it worth including them in the OC3 robots.txt?
Disallow: /*?route=account/
Disallow: /*?route=affiliate/
Disallow: /*?route=checkout/
Disallow: /*?route=product/search
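robots.txt, if obeyed, indicates which advertised URLs search engines should not request in the first place, which is good for reducing search engine traffic. Robots headers and tags indicate what search engines should do with a page after they have retrieved it.
So any links you have marked noindex and nofollow are better put into robots.txt, as they have no value when retrieved anyway.
Paths like /admin/ and /cgi-bin/ are useless in robots.txt, as you do not advertise those.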
robots.txt:
User-agent: *
Disallow: /*checkout/
Disallow: /*account/
Sitemap: https://YOUR-DOMAIN/index.php?route=extension/feed/google_sitemap
Sitemap: https://YOUR-DOMAIN/sitemap.xml
Host: YOUR-DOMAIN
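To see what patterns like /*checkout/ actually match, here is a rough sketch of the wildcard rules that Google and Bing document: * matches any run of characters, and a trailing $ anchors the end of the URL. The helper names are made up for illustration, and real crawlers may differ in edge cases:

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes the common wildcard extension: '*' matches any sequence of
    characters and a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path_and_query):
    """True if the rule pattern matches from the start of path plus query."""
    return rule_to_regex(pattern).match(path_and_query) is not None


print(matches("/*checkout/", "/index.php?route=checkout/cart"))
print(matches("/*account/", "/index.php?route=account/login"))
print(matches("/*checkout/", "/index.php?route=product/product&product_id=40"))
print(matches("/*?page=$", "/category?page=2"))
print(matches("/*?page=$", "/category?page="))
```

One thing this makes visible: under these rules, a pattern such as /*?page=$ from the longer listing earlier in the thread only matches URLs that end exactly in page=, so a URL ending in ?page=2 would not be caught by it.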
For the robots headers and tags:
in catalog/controller/common/header.php
before:
return $this->load->view('common/header', $data);
add:
// index and follow headers and tags
$data['robots'] = false;
if (
    isset($this->request->get['search']) ||
    isset($this->request->get['order']) ||
    isset($this->request->get['sort']) ||
    isset($this->request->get['limit']) ||
    isset($this->request->get['page']) ||
    isset($this->request->get['start'])
) {
    // search results and sorted, limited or paginated listings
    header("X-Robots-Tag: noindex, follow", true);
    $data['robots'] = '<meta name="robots" content="noindex, follow">';
} elseif (
    isset($this->request->get['route']) &&
    (
        stristr($this->request->get['route'],'error')
    )
) {
    // error pages
    header("X-Robots-Tag: noindex, follow", true);
    $data['robots'] = '<meta name="robots" content="noindex, follow">';
} elseif (
    isset($this->request->get['route']) &&
    (
        stristr($this->request->get['route'],'checkout/') ||
        stristr($this->request->get['route'],'account/')
    )
) {
    // checkout and account pages
    header("X-Robots-Tag: noindex, nofollow", true);
    $data['robots'] = '<meta name="robots" content="noindex, nofollow">';
} elseif (
    isset($this->request->get['route']) &&
    (
        stristr($this->request->get['route'],'FUTURE_PLACEHOLDER')
    )
) {
    // placeholder branch for routes to index without following links
    header("X-Robots-Tag: index, nofollow, noarchive", true);
    $data['robots'] = '<meta name="robots" content="index, nofollow, noarchive">';
} else {
    // everything else
    header("X-Robots-Tag: index, follow, noarchive", true);
    $data['robots'] = '<meta name="robots" content="index, follow, noarchive">';
}
in catalog/view/theme/default/template/common/header.twig
after: <head>
add:
{% if robots %}
{{ robots }}
{% endif %}
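For anyone who wants to sanity-check which directive a given URL ends up with before touching the controller, the branching above can be mirrored in a small standalone function. This is only an illustration in Python, not OpenCart code: robots_directive is a made-up name, the query string is passed in as a plain dict, and the FUTURE_PLACEHOLDER branch is left out:

```python
def robots_directive(query):
    """Mirror the if/elseif chain from the header controller snippet.

    `query` is a dict of GET parameters, e.g. {"route": "checkout/cart"}.
    """
    listing_params = ("search", "order", "sort", "limit", "page", "start")
    # stristr() is case-insensitive, so compare against a lowercased route.
    route = query.get("route", "").lower()

    if any(name in query for name in listing_params):
        return "noindex, follow"
    if "error" in route:
        return "noindex, follow"
    if "checkout/" in route or "account/" in route:
        return "noindex, nofollow"
    return "index, follow, noarchive"


print(robots_directive({"route": "product/category", "path": "20", "page": "2"}))
print(robots_directive({"route": "checkout/cart"}))
print(robots_directive({"route": "product/product", "product_id": "40"}))
```

Because the listing parameters are tested first, a paginated account page would get "noindex, follow" rather than "noindex, nofollow", the same as in the PHP version.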
DISCLAIMER:
You should not modify core files. If you would like to donate a cup of coffee, I will write it as a modification for you.
@by mona Do you have anything related to Content Security Policy (CSP) and Subresource Integrity (SRI)?
by mona wrote: ↑Wed Jul 14, 2021 3:47 pm
robots.txt, if obeyed, is to indicate which advertised urls search engines should not request in the first place, good to reduce SE traffic. Robots headers and tags are there to indicate what SEs should do with it after they have retrieved it.
Thanks, by mona!
Will this work for 3.0.3.8?
ADD Creative wrote: ↑Wed Jul 14, 2021 7:25 am
2. It's just another bug that seems to have made it into the latest release.
Still no User Agent here:
https://github.com/opencart/opencart/bl ... robots.txt
It probably all wants removing anyway, as it will likely cause "Indexed, though blocked by robots.txt" issues. Blocking the page parameter could also cause issues with products that don't appear on the first page of a category.
Could it be worth raising it as an issue on GitHub?
I did point out some potential problems when the robots.txt was first added. So I guess the developers disagreed.
https://github.com/opencart/opencart/is ... -662372780
That's not the missing user agent problem though.
Sorry, you replied to my post about the robots.txt file being wrong regardless of whether it had a user agent or not, so I thought you were talking about that.
Sorry if I wasn't clear.