Post by head_dunce » Thu Sep 02, 2021 4:47 am

I found this problem when I clicked one of my site's category pages in Google and it came up without any products. I then realized that Google had indexed a category URL with ?page=2, and that page doesn't contain any products. Maybe it did a while ago, but it should just redirect to the main page now, since there are no products on page=2. Has anyone figured out how to do this already? Otherwise I'll have to give it a look when I have some time. Thank you.

Jim
https://www.carguygarage.com
Yahoo Store since 2006 moved to OpenCart on January 24, 2020


Active Member

Posts

Joined
Thu Apr 04, 2019 11:50 pm

Post by straightlight » Thu Sep 02, 2021 6:05 am

Which OC version? And what exactly are you referring to?

Dedication and passion goes to those who are able to push and merge a project.

Regards,
Straightlight
Programmer / Opencart Tester


Legendary Member

Posts

Joined
Mon Nov 14, 2011 11:38 pm
Location - Canada, ON

Post by by mona » Thu Sep 02, 2021 3:34 pm

You should not allow search engines to index urls with page queries (page=xx) just like sort=xx and order=xx etc.
This is exactly because the availability of those pages can vary and they rarely add search engine value.
Today you may have page=10 content, tomorrow you may only have up to page=9.
Redirecting page=10 to your home page is a bad idea.

So ideally, do not allow those to be indexed, via htaccess and/or X-Robots-Tag headers/meta tags.
If they are already indexed, give a 404 or a 410 when the requested page number exceeds the number of pages calculated in your controller. This would require a small alteration, though.

DISCLAIMER:
You should not modify core files .. if you would like to donate a cup of coffee I will write it in a modification for you.


https://www.youtube.com/watch?v=zXIxDoCRc84


Expert Member

Posts

Joined
Mon Jun 10, 2019 9:31 am

Post by JNeuhoff » Thu Sep 02, 2021 4:48 pm

Try something like this in your robots.txt:

Code: Select all

.....
Disallow: /*&tag=
Disallow: /*?tag=
Disallow: /*&filter_name=
Disallow: /*?filter_name=
Disallow: /*?route=checkout/
Disallow: /*&route=checkout/
Disallow: /*?route=account/
Disallow: /*&route=account/
Disallow: /*?route=product/search
Disallow: /*&route=product/search
Disallow: /*?route=product/product/review
Disallow: /*&route=product/product/review
Disallow: /*?route=information/information/agree
Disallow: /*&route=information/information/agree
Disallow: /*?page=1
Disallow: /*&page=1
Disallow: /*?route=affiliate/
Disallow: /*&route=affiliate/
Disallow: /*?keyword=
Disallow: /*&keyword=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?sort=
Disallow: /*&sort=
.....

Export/Import Tool * SpamBot Buster * Unused Images Manager * Instant Option Price Calculator * Number Option * Google Tag Manager * Survey Plus * OpenTwig


Guru Member

Posts

Joined
Wed Dec 05, 2007 3:38 am


Post by head_dunce » Thu Sep 02, 2021 5:08 pm

straightlight wrote:
Thu Sep 02, 2021 6:05 am
OC version. So, what are you referring about exactly?
Version 3.0.3.2
I see this page in Google SERPS -

Code: Select all

https://www.carguygarage.com/storage/cabinets/polished-stainless-steel-cabinets?page=2
I see this page in Google Search Console -

Code: Select all

https://www.carguygarage.com/accessories/decor?sort=pd.name&order=DESC&limit=isjfgyevlvttii
Neither page should exist with those parameters but they do
####################################################################
by mona wrote:
Thu Sep 02, 2021 3:34 pm
Redirecting page=10 to your home page is a bad idea.

So ideally, do not allow indexing of those, via htaccess and/or x-robot headers/meta-tags.
If they already are, give a 404 or a 410 when the page number requested exceeds the number-of-pages value you have in your controller, this would require a small alteration though.
I would redirect to the main category page without parameters although a 404 would be fine too. I'm asking if anyone has made the code change yet and would share? Otherwise I'll have to dive in and figure it out.
####################################################################
JNeuhoff wrote:
Thu Sep 02, 2021 4:48 pm
Try something like this in your robots.txt:
Thanks, but I do want valid page 2 etc crawled so the products on those pages are indexed and this would prevent that.

Post by by mona » Thu Sep 02, 2021 5:21 pm

Code: Select all

Disallow: /*?page=1
Disallow: /*&page=1
If the rules above do not help here, then use:

Code: Select all

Disallow: /*page=
The best way is to use meta tags, X-Robots-Tag headers, or both for these, because then you can set them to noindex, follow.

In your header.php controller:

Code: Select all

// Keep parameterised listing URLs out of the index, but let crawlers follow their links
if (
	isset($this->request->get['search']) ||
	isset($this->request->get['order']) ||
	isset($this->request->get['sort']) ||
	isset($this->request->get['limit']) ||
	isset($this->request->get['page'])
) {
	header("X-Robots-Tag: noindex, follow", true);
	$data['robots'] = '<meta name="robots" content="noindex, follow">';
} else {
	header("X-Robots-Tag: index, follow", true);
	$data['robots'] = '<meta name="robots" content="index, follow">';
}
Or, a more complete version:

Code: Select all

// index and follow headers
if (
	isset($this->request->get['search']) ||
	isset($this->request->get['order']) ||
	isset($this->request->get['sort']) ||
	isset($this->request->get['limit']) ||
	isset($this->request->get['page']) ||
	isset($this->request->get['start'])
) {
	// Parameterised listings: do not index, but follow the links
	header("X-Robots-Tag: noindex, follow", true);
	$data['robots'] = '<meta name="robots" content="noindex, follow">';
} elseif (
	isset($this->request->get['route']) &&
	(
		stristr($this->request->get['route'], 'error/') ||
		stristr($this->request->get['route'], 'checkout/') ||
		stristr($this->request->get['route'], 'account/') ||
		stristr($this->request->get['route'], 'affiliate/')
	)
) {
	// Error, checkout, account and affiliate routes: neither index nor follow
	header("X-Robots-Tag: noindex, nofollow", true);
	$data['robots'] = '<meta name="robots" content="noindex, nofollow">';
} else {
	// Everything else: index and follow, but do not keep a cached copy
	header("X-Robots-Tag: index, follow, noarchive", true);
	$data['robots'] = '<meta name="robots" content="index, follow, noarchive">';
}
In your header.twig view, between <head> and </head>:

Code: Select all

{{ robots }}
This means that search engines will request those pages and follow (collect) the links on them, but will not index the requested page itself.

You cannot do this with robots.txt, where you can only ask search engines not to request the page in the first place, so they will not pick up any of the links it contains either.

Post by head_dunce » Thu Sep 02, 2021 6:55 pm

by mona wrote:
Thu Sep 02, 2021 5:21 pm
Best way is to use meta tags or x-robot headers or both for these because then you can set them to noindex, follow.
Ah, interesting approach! This looks like a good idea!

I was just looking at the /catalog/controller/product/category.php
And made a quick change to test. I'd have to do this for all the available parameters, but I like your idea better.

Code: Select all

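if (isset($this->request->get['limit'])) {
	$limit = $this->request->get['limit'];
	// Flag the response as a 404 when limit is not a whole number
	if (!is_numeric($limit) || floor($limit) != $limit) {
		$this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
	}
	// (excerpt: the rest of the original limit block continues from here)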

Post by straightlight » Thu Sep 02, 2021 7:32 pm

head_dunce wrote:
Thu Sep 02, 2021 6:55 pm
by mona wrote:
Thu Sep 02, 2021 5:21 pm
Best way is to use meta tags or x-robot headers or both for these because then you can set them to noindex, follow.
Ah, interesting approach! This looks like a good idea!

I was just looking at the /catalog/controller/product/category.php
And made a quick change to test. I'd have to do this for all the available parameters, but I like your idea better.

Code: Select all

                if (isset($this->request->get['limit'])) {
                        $limit = $this->request->get['limit'];
                        if( (!is_numeric($limit)) || (floor($limit) != $limit) ){
                                $this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
                        }

Replace:

Code: Select all

$limit = $this->request->get['limit'];
with:

Code: Select all

$limit = (int)$this->request->get['limit'];

Post by ADD Creative » Thu Sep 02, 2021 8:28 pm

head_dunce wrote:
Thu Sep 02, 2021 6:55 pm
by mona wrote:
Thu Sep 02, 2021 5:21 pm
Best way is to use meta tags or x-robot headers or both for these because then you can set them to noindex, follow.
Ah, interesting approach! This looks like a good idea!

I was just looking at the /catalog/controller/product/category.php
And made a quick change to test. I'd have to do this for all the available parameters, but I like your idea better.

Code: Select all

                if (isset($this->request->get['limit'])) {
                        $limit = $this->request->get['limit'];
                        if( (!is_numeric($limit)) || (floor($limit) != $limit) ){
                                $this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
                        }
Check that your pages have the correct canonical links and that you have correctly configured the URL Parameters for your site in Google Search Console. The canonical links should take care of any duplicates caused by URL parameters other than page.

Be careful about adding noindex (and possibly 404) to pages that have a canonical link to a page you need indexed. It's something Google warns against doing. https://developers.google.com/search/do ... age%20when

For the original page issue, you could try the following. After:

Code: Select all

$results = $this->model_catalog_product->getProducts($filter_data);
Add:

Code: Select all

 if (!$results && (int)$page != 1) {
	$this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
}
But again this could cause issues if the limit is set. For example category?page=2&limit=100 would return no products and a 404, but the canonical link to category?page=2 would be valid and require indexing.
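
The limit conflict described above comes down to simple arithmetic. The sketch below is only an illustration: `pageCount()` is a hypothetical helper, not an OpenCart function, and the product count and limits are made-up figures.

```php
<?php
// Hypothetical helper (not part of OpenCart): number of listing pages
// for a given product count and per-page limit.
function pageCount(int $productTotal, int $limit): int
{
    if ($limit < 1) {
        // Assumption: treat a non-positive limit as "no pages" instead of dividing by zero.
        return 0;
    }

    return (int)ceil($productTotal / $limit);
}

$productTotal = 30; // made-up figure for illustration

echo pageCount($productTotal, 15) . PHP_EOL;  // 2 pages, so ?page=2 has products
echo pageCount($productTotal, 100) . PHP_EOL; // 1 page, so ?page=2&limit=100 is empty
```

With these figures, ?page=2&limit=100 would get a 404 from a check on empty results, even though its canonical URL, ?page=2, is a valid page.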

www.add-creative.co.uk


Guru Member

Posts

Joined
Sat Jan 14, 2012 1:02 am
Location - United Kingdom

Post by head_dunce » Thu Sep 02, 2021 11:16 pm

straightlight wrote:
Thu Sep 02, 2021 7:32 pm

Code: Select all

$limit = $this->request->get['limit'];
for:

Code: Select all

$limit = (int)$this->request->get['limit'];
Actually that's a problem, and that's how the original OC source code has it. If what's supplied is not a number, (int) will return zero.
By using (!is_numeric($limit)) || (floor($limit) != $limit) you're able to check if it is a number and if it is an integer. If what was supplied was anything else, it should just give a 404 since whatever was given was not supplied by OC. As noted above, I am showing links in Google Search Console with "limit=" being random strings. No idea how or why Google found those pages, but OC shouldn't be allowing them anyway.
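
As a rough sketch of the same idea, PHP's built-in `filter_var()` with `FILTER_VALIDATE_INT` can check and convert in one step. The helper name `positiveIntOrNull()` is made up for this example and is not part of OpenCart. Unlike the `is_numeric()`/`floor()` test, this variant also rejects exponent notation such as `1e2`.

```php
<?php
// Hypothetical helper (not part of OpenCart): returns the value as an int
// when it is a whole number >= 1, or null for anything else.
function positiveIntOrNull($value): ?int
{
    $result = filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);

    return $result === false ? null : $result;
}

// Sample inputs, including the random string reported in Search Console.
$samples = ['25', 'isjfgyevlvttii', '0', '-3', '2.5', '1e2'];

foreach ($samples as $sample) {
    $checked = positiveIntOrNull($sample);
    echo $sample . ' => ' . ($checked === null ? 'reject (404)' : 'accept as ' . $checked) . PHP_EOL;
}
```

In the controller, a null result would be the point at which to send the 404, and a non-null result is already a clean integer for the rest of the code.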

Post by head_dunce » Fri Sep 03, 2021 12:02 am

ADD Creative wrote:
Thu Sep 02, 2021 8:28 pm
Check your pages have the correct canonical links and that you have correctly configured the URL Parameters for your site in Google's Search console. The canonical links should take care on any duplicates caused by URL parameters other than page. Be careful of adding noindex (and possibly 404) to pages that have a canonical link to a page you need to be indexed. It's something Google warn against doing. https://developers.google.com/search/do ... age%20when
The URL Parameters in Google Search Console is a legacy tool, so I'm not sure how long that will stick around. I see what you mean about the canonical, the "page=" seems to be the only parameter passed to the canonical link (sort, order, and limit are not passed to the canonical link.)
ADD Creative wrote:
Thu Sep 02, 2021 8:28 pm
For the original page issue. You could try. After.

Code: Select all

$results = $this->model_catalog_product->getProducts($filter_data);
Add.

Code: Select all

 if (!$results && (int)$page != 1) {
	$this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
}
But again this could cause issues if the limit is set. For example category?page=2&limit=100 would return no products and a 404, but the canonical link to category?page=2 would be valid and require indexing.
So this raises the question: can I set the canonical link to be an error page after this check?

Code: Select all

 if (!$results && (int)$page != 1) {
	$this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
}
Then I could return a 404 with an error-page canonical.

Post by straightlight » Fri Sep 03, 2021 12:06 am

head_dunce wrote:
Thu Sep 02, 2021 11:16 pm
straightlight wrote:
Thu Sep 02, 2021 7:32 pm

Code: Select all

$limit = $this->request->get['limit'];
for:

Code: Select all

$limit = (int)$this->request->get['limit'];
Actually that's a problem, and that's how the original OC source code has it. If what's supplied is not a number, (int) will return zero.
By using (!is_numeric($limit)) || (floor($limit) != $limit) you're able to check if it is a number and if it is an integer. If what was supplied was anything else, it should just give a 404 since whatever was given was not supplied by OC. As noted above, I am showing links in Google Search Console with "limit=" being random strings. No idea how or why Google found those pages, but OC shouldn't be allowing them anyway.
May cause an undefined variable error message in the error triggers if it's not sanitized, that's why.

Post by head_dunce » Fri Sep 03, 2021 12:15 am

straightlight wrote:
Fri Sep 03, 2021 12:06 am
May cause an undefined variable error message in the error triggers if it's not sanitized, that's why.
Yeah, but if it's not a positive integer then OC should return the 404 or do something else besides generating the page as if all is well. You can sanitize it after it's checked to avoid the php error if you'd like.

Post by straightlight » Fri Sep 03, 2021 1:18 am

head_dunce wrote:
Fri Sep 03, 2021 12:15 am
straightlight wrote:
Fri Sep 03, 2021 12:06 am
May cause an undefined variable error message in the error triggers if it's not sanitized, that's why.
Yeah, but if it's not a positive integer then OC should return the 404 or do something else besides generating the page as if all is well. You can sanitize it after it's checked to avoid the php error if you'd like.
Cinderella request. Browsers require that the variable is sanitized before the load completes, not after.

Post by head_dunce » Fri Sep 03, 2021 2:06 am

straightlight wrote:
Fri Sep 03, 2021 1:18 am
Cinderella request. Browsers requires that the variable gets sanitized before the load completes, not after.
Huh? Try on this glass slipper, let me know if this fits for you - you know the OC code best, I'm just trying to fix this SEO problem
/catalog/controller/product/category.php

Code: Select all

                if (isset($this->request->get['limit'])) {
                        $limit = $this->request->get['limit'];
                        if( (!is_numeric($limit)) || (floor($limit) != $limit) || ($limit < 1) ){
                                $this->response->redirect($this->url->link('error/not_found'));
                        }
That at least solves one of the problems I'm seeing in Google Search Console where the limit parameter is being fed random strings.

Post by head_dunce » Fri Sep 03, 2021 2:17 am

ADD Creative wrote:
Thu Sep 02, 2021 8:28 pm
For the original page issue. You could try. After.

Code: Select all

$results = $this->model_catalog_product->getProducts($filter_data);
Add.

Code: Select all

 if (!$results && (int)$page != 1) {
	$this->response->addHeader($this->request->server['SERVER_PROTOCOL'] . ' 404 Not Found');
}
But again this could cause issues if the limit is set. For example category?page=2&limit=100 would return no products and a 404, but the canonical link to category?page=2 would be valid and require indexing.
Building on this, it seems like this might be the answer to the problem of pages with no products
/catalog/controller/product/category.php

Code: Select all

 if (!$results && (int)$page != 1) {
	$this->response->redirect($this->url->link('error/not_found'));
}

Post by ADD Creative » Fri Sep 03, 2021 4:44 pm

If you are going to redirect use a 301 (moved permanently).

Code: Select all

$this->response->redirect($this->url->link('error/not_found'), 301);
Another option would be to redirect to the first page of the category. This would probably be better from a customer's point of view.

Code: Select all

$this->response->redirect($this->url->link('product/category', 'path=' . $category_info['category_id']), 301);

Post by by mona » Fri Sep 03, 2021 6:03 pm

Code: Select all

 if (!$results && (int)$page != 1) {
	$this->response->redirect($this->url->link('error/not_found'));
}
You can do that, but remember: a 404/410 should be given when a page does not exist, not when no products are found. That is website logic vs. business logic.

When a request is made for page=10 but page 10 does not exist (given the number of products and the limit), you could give a 404. And if you do not let page= URLs be indexed, requests for them should not happen unless they are entered manually.

Code: Select all

			// non-existent pages (requested page > number of pages)
			if (!empty($this->request->get['page'])) {
				if (ceil($product_total / $limit) < $this->request->get['page']) {
					header($_SERVER['SERVER_PROTOCOL'] . ' 404 Not Found', true);
					exit();
				}
			}

However, when a page does not yield products, you simply state that no products were found. You do not give a 404: the page exists, the products don't.
You could have combinations of filters, price ranges, search keywords, etc. which simply yield no products.
In short, the $results are irrelevant for a 404 response.
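
The distinction can be sketched as a small decision function. This is an illustration under stated assumptions, not OpenCart code: `listingStatus()` is a hypothetical name, and treating page 1 as always existing (even with zero products) is an assumption added here to reflect "the page exists, the products don't".

```php
<?php
// Hypothetical helper (not part of OpenCart): HTTP status for a paginated listing.
// The decision depends only on whether the requested page exists,
// never on whether the query returned any products.
function listingStatus(int $productTotal, int $limit, int $page): int
{
    if ($page < 1 || $limit < 1) {
        return 404; // malformed paging values
    }

    if ($page === 1) {
        return 200; // assumption: the first page always exists, even when it is empty
    }

    $lastPage = (int)ceil($productTotal / $limit);

    return $page <= $lastPage ? 200 : 404;
}

echo listingStatus(0, 15, 1) . PHP_EOL;   // 200: empty result, but the page exists
echo listingStatus(30, 15, 2) . PHP_EOL;  // 200: last valid page
echo listingStatus(30, 15, 10) . PHP_EOL; // 404: beyond the last page
```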

Post by straightlight » Fri Sep 03, 2021 8:52 pm

Code: Select all

// non-existent pages (requested page > number of pages)
if (isset($this->request->get['page'])) {
	if (ceil($product_total / $limit) < (int)$this->request->get['page']) {
		// The status code is the second argument of redirect(), not an argument of link()
		$this->response->redirect($this->url->link('product/category', 'path=' . $category_info['category_id']), 301);
		exit();
	}
}

Post by by mona » Fri Sep 03, 2021 10:34 pm

Code: Select all

// non-existent pages (requested page > number of pages)
if (isset($this->request->get['page'])) {
	if (ceil($product_total / $limit) < (int)$this->request->get['page']) {
		// The status code is the second argument of redirect(), not an argument of link()
		$this->response->redirect($this->url->link('product/category', 'path=' . $category_info['category_id']), 301);
		exit();
	}
}

This is not correct as search engines cache 301 permanent redirects.
Today you may not have a page 10, so you give a permanent redirect to the base page.
Tomorrow you may have a page 10 but since the search engine remembers the redirect it will not request that page but again retrieve the base page.

It is better to use a 302 redirect, which is not cached.

But if it is your goal to have indexed urls with page= querystring parameters removed from the index, you need to give a 404 or better a 410 or they will not remove them from the index.
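
The trade-off above can be summarised in a few lines. This is only a sketch: `outOfRangeResponse()` and its array shape are invented for illustration, and the URL in the usage lines is a placeholder.

```php
<?php
// Hypothetical helper (not part of OpenCart): choose a response for a page
// number beyond the last page, following the reasoning in this post.
function outOfRangeResponse(bool $removeFromIndex, string $categoryUrl): array
{
    if ($removeFromIndex) {
        // Goal: get already-indexed page= URLs dropped from the index.
        return ['status' => 410, 'location' => null];
    }

    // Goal: send the visitor somewhere useful without a permanent redirect,
    // since the page may exist again later.
    return ['status' => 302, 'location' => $categoryUrl];
}

$cleanup = outOfRangeResponse(true, 'https://example.com/category');
$visitor = outOfRangeResponse(false, 'https://example.com/category');

echo $cleanup['status'] . PHP_EOL; // 410
echo $visitor['status'] . ' -> ' . $visitor['location'] . PHP_EOL; // 302 -> https://example.com/category
```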
