Prevent indexing



Medical sooq

New member

Posts

Joined

Sun Jul 23, 2017 2:42 am

Post by Medical sooq » Wed Sep 30, 2020 5:46 am

Version 3.0.3.2

I have a problem with duplicate search pages being indexed by Google. There are even pages from the old OpenCart site that are still indexed in Google.
I have set these rules to prevent indexing:

User-agent: *
Disallow: /*&limit
Disallow: /*?sort
Disallow: /*&sort
Disallow: /*?route=checkout/
Disallow: /*?route=account/
Disallow: /*?route=product/search

Sitemap: https://www.medicalsooq.com/index.php?r ... le_sitemap

The Coverage report in Search Console shows a very high number of indexed pages because of the search pages. How can I stop both old and new search pages from being indexed by Google and other search engines, and remove the ones that are already indexed?

I have a huge number of indexed pages, all of them old duplicates or search pages.

Re: Prevent indexing

by mona

Expert Member

Posts

3295

Joined

Mon Jun 10, 2019 9:31 am

Post by by mona » Wed Sep 30, 2020 6:30 am

Use Google's tools:
https://support.google.com/webmasters/answer/9689846?

Make sure your sitemap is up to date in Google Search Console.
Search Google for "Google Webmaster Tools" etc.

Learn about Google Search Console:
https://search.google.com/search-console/about

DISCLAIMER:
You should not modify core files. If you would like to donate a cup of coffee, I will write it as a modification for you.

https://www.youtube.com/watch?v=zXIxDoCRc84

Re: Prevent indexing

letxobnav

Expert Member

Posts

3264

Joined

Fri Aug 18, 2017 4:35 pm

Location

Taiwan

Post by letxobnav » Wed Sep 30, 2020 9:24 am

Search engines do not index search result pages, as they do not enter search terms, unless of course you advertise search result URLs with search terms.
If you do (or did) that, you can simply use:

Code:

User-agent: *
Disallow: /*search=*

Google will start to complain that you are disallowing pages in robots.txt which are already indexed; just ignore that.
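To check which URLs a wildcard rule such as /*search=* actually blocks, here is a minimal PHP sketch of Google-style matching. It assumes the wildcard behaviour Google documents for robots.txt: '*' matches any sequence of characters, a trailing '$' anchors the end, and a rule matches from the start of the path including the query string. robots_rule_matches() is a hypothetical helper, not part of OpenCart, and it ignores Allow rules and longest-match precedence.

```php
<?php
// Minimal sketch of Google-style robots.txt rule matching (hypothetical
// helper, not part of OpenCart). Assumes the documented wildcard rules:
// '*' matches any run of characters, a trailing '$' anchors the end,
// and a rule matches from the start of the path + query string.
function robots_rule_matches(string $pattern, string $path): bool
{
    // A trailing '$' means the rule must match through the end of the URL.
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }

    // Escape regex metacharacters, then turn each escaped '*' into '.*'.
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));

    // Rules are prefix matches unless anchored with '$'.
    $regex = '#^' . $regex . ($anchored ? '$' : '') . '#';

    return preg_match($regex, $path) === 1;
}

// Example: which of these requests does "Disallow: /*search=*" block?
$rule = '/*search=*';
$requests = [
    '/index.php?route=product/search&search=bandage',
    '/index.php?route=product/category&path=20',
];
foreach ($requests as $request) {
    $verdict = robots_rule_matches($rule, $request) ? 'blocked' : 'allowed';
    echo $verdict, '  ', $request, "\n";
}
```

The search URL is reported as blocked and the category URL as allowed, since only the first contains "search=".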

And/or use robots meta tags and headers.
Put this in your header.php controller (inside the index() method, before the view is loaded and returned):

Code:

		// Decide the robots directive for this request and send it both as
		// an X-Robots-Tag HTTP header and as a <meta name="robots"> tag
		$data['robots'] = false;
		if (isset($this->request->get['search']) ||
			isset($this->request->get['order']) ||
			isset($this->request->get['sort']) ||
			isset($this->request->get['limit']) ||
			isset($this->request->get['start'])
			) {
			// Searched, sorted or limited listings: do not index, but follow links
			header("X-Robots-Tag: noindex, follow", true);
			$data['robots'] = '<meta name="robots" content="noindex, follow">';
		} elseif (isset($this->request->get['route']) &&
			(
			stristr($this->request->get['route'],'error')
			)
			) {
			// Error pages (e.g. error/not_found): do not index, but follow links
			header("X-Robots-Tag: noindex, follow", true);
			$data['robots'] = '<meta name="robots" content="noindex, follow">';
		} elseif (isset($this->request->get['route']) &&
			(
			stristr($this->request->get['route'],'checkout/') ||
			stristr($this->request->get['route'],'account/')
			)
			) {
			// Private customer pages: neither index nor follow
			header("X-Robots-Tag: noindex, nofollow", true);
			$data['robots'] = '<meta name="robots" content="noindex, nofollow">';
		} elseif (isset($this->request->get['route']) &&
			(
			stristr($this->request->get['route'],'FUTURE_PLACEHOLDER')
			)
			) {
			// Placeholder: routes that may be indexed but whose links should not be followed
			header("X-Robots-Tag: index, nofollow", true);
			$data['robots'] = '<meta name="robots" content="index, nofollow">';
		} else {
			// Everything else: index and follow (the default)
			header("X-Robots-Tag: index, follow", true);
			$data['robots'] = '<meta name="robots" content="index, follow">';
		}

Put this in the <head> section of your header.twig view:

Code:

{% if robots %}
	{{ robots }}
{% endif %}
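The branching in the controller code can also be written as a small pure function, which makes the rules easy to check outside a web request. This is a hypothetical refactor, not part of OpenCart: robots_directive() and its $get argument (standing in for $this->request->get) are illustrative names, and the FUTURE_PLACEHOLDER branch is left out.

```php
<?php
// Hypothetical refactor of the header.php branching into a pure function.
// $get stands in for $this->request->get; the returned string is the value
// the controller would put in both the meta tag and the X-Robots-Tag header.
function robots_directive(array $get): string
{
    // Searched, sorted or limited listings: keep the links, drop the page.
    foreach (['search', 'order', 'sort', 'limit', 'start'] as $param) {
        if (isset($get[$param])) {
            return 'noindex, follow';
        }
    }

    $route = $get['route'] ?? '';

    // Error pages such as error/not_found (case-insensitive, like stristr).
    if (stripos($route, 'error') !== false) {
        return 'noindex, follow';
    }

    // Private customer areas: neither index nor follow.
    if (stripos($route, 'checkout/') !== false || stripos($route, 'account/') !== false) {
        return 'noindex, nofollow';
    }

    // Everything else keeps the default.
    return 'index, follow';
}

echo robots_directive(['route' => 'product/search', 'search' => 'mask']), "\n";
echo robots_directive(['route' => 'account/login']), "\n";
echo robots_directive(['route' => 'product/product', 'product_id' => '42']), "\n";
```

The three calls print "noindex, follow", "noindex, nofollow" and "index, follow". Inside the controller you would pass $this->request->get and could send the header with $this->response->addHeader('X-Robots-Tag: ' . $directive) instead of calling header() directly, assuming OpenCart 3's Response::addHeader().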

robots.txt tells search engines not to request those URLs; the tags and headers tell them what to do with a page when they do request it.
You can ask Google to remove already indexed pages, but only individually. If there are a lot of them, don't bother and let them drop out of the index naturally.
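The difference between the two mechanisms matters for pages that are already indexed. Google documents that a noindex rule only takes effect if the page is not blocked by robots.txt, because a blocked URL is never fetched, so its meta tag or X-Robots-Tag header is never read. The sketch below models that decision for a compliant crawler such as Googlebot; crawl_outcome() and its return strings are hypothetical, for illustration only.

```php
<?php
// Sketch of how a compliant crawler treats a URL, given (a) whether
// robots.txt disallows it and (b) the robots directive the page would send
// (meta tag or X-Robots-Tag). Hypothetical helper for illustration only.
function crawl_outcome(bool $disallowedByRobotsTxt, string $robotsDirective): string
{
    if ($disallowedByRobotsTxt) {
        // The page is never fetched, so its noindex is never seen. A URL that
        // is already indexed, or linked from elsewhere, can stay in the index.
        return 'not crawled; noindex not seen';
    }

    // Split e.g. "noindex, follow" into lowercase, trimmed tokens.
    $tokens = array_map('trim', explode(',', strtolower($robotsDirective)));

    // "none" is shorthand for "noindex, nofollow".
    if (in_array('noindex', $tokens, true) || in_array('none', $tokens, true)) {
        return 'crawled; dropped from index';
    }

    return 'crawled; may be indexed';
}

echo crawl_outcome(true, 'noindex, follow'), "\n";
echo crawl_outcome(false, 'noindex, follow'), "\n";
```

Under that assumption, one way to combine both options is to let already indexed search URLs be crawled with noindex first, and add the robots.txt rule once they have dropped out.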

Crystal Light Centrum Taiwan
Extensions: MailQueue | SUKHR | VBoces

“Data security is paramount at [...], and we are committed to protecting the privacy of anyone who is associated with our [...]. We’ve made a lot of improvements and will continue to make them.”
When you know your life savings are gone.

Re: Prevent indexing

Medical sooq

New member

Posts

Joined

Sun Jul 23, 2017 2:42 am

Post by Medical sooq » Thu Oct 01, 2020 2:01 am

Thank you, by mona.
Thank you, letxobnav.
letxobnav, I will use both options together, monitor the indexing, and get back to you with the results.

To make sure the code is in the right place: I put it in these paths:

/public_html/admin/controller/common/header.php
/public_html/admin/view/template/common/header.twig

Is that correct?

Re: Prevent indexing

sw!tch

Active Member

Posts

1470

Joined

Sat Apr 28, 2012 2:32 pm

Post by sw!tch » Thu Oct 01, 2020 3:13 am

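You would want to place it on the catalog side, not the admin: catalog/controller/common/header.php and catalog/view/theme/default/template/common/header.twig (or the header.twig of the theme you actually use).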

Backup and learn how to recover before you make any changes!

Re: Prevent indexing

Ahmed-E-Shop

New member

Posts

Joined

Mon Nov 11, 2024 7:40 pm

Post by Ahmed-E-Shop » Tue Feb 11, 2025 2:53 am

Medical sooq wrote (Thu Oct 01, 2020 2:01 am):
Thank you, by mona.
Thank you, letxobnav.
letxobnav, I will use both options together, monitor the indexing, and get back to you with the results.

Medical sooq, I am seeing the same thing in my Search Console. Did the solution in this thread solve your problem?

Re: Prevent indexing

by mona

Expert Member

Posts

3295

Joined

Mon Jun 10, 2019 9:31 am

Post by by mona » Tue Feb 11, 2025 9:09 pm

The point of a robots.txt file is to give instructions to robots, more commonly known as bots.

Google respects robots.txt, which is a good thing, so as per letxobnav's post, using robots.txt is correct, but Google will "complain".

Google "complaining" is not an issue; your console is just there for you to check that there are no problems. So if it highlights that these pages are not being indexed because they are blocked by robots.txt, that is correct. Just because something is shown in red does not necessarily mean it is wrong, only that it should be checked.

Since Google respects robots.txt, it doesn't actually matter for Google, but not all robots do. Then again, not all bots are good (not even most of them are), but aside from that, you can also use the robots directives (noindex, nofollow) in your page header. letxobnav has given you the code for that.

https://support.google.com/webmasters/a ... 7942?hl=en
Notably, on robots.txt:
https://support.google.com/webmasters/answer/12818275
