Post by lfairban » Fri Jun 24, 2016 12:59 am

When I look at the Google Sitemap feed page, it shows errors for products whose name (the oc_product_description.name column) contains numbers, special characters or spaces. The sitemap appears to fail at the thumbnail caption.

As an example, it fails at the & in: <image:caption>A & E - It'

The data in the table is obviously fine, because it's used elsewhere. So what is it about the thumbnail caption that makes it fail? Has anyone else had this issue? Any workarounds?

User avatar
New member

Posts

Joined
Fri Feb 28, 2014 5:01 am
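In XML, a bare & always starts an entity or character reference, so a caption such as A & E - It makes the whole sitemap malformed, and the parser stops at the ampersand. A minimal sketch, assuming the product name arrives as plain UTF-8 text with no HTML entities already in it, escapes it with PHP's built-in htmlspecialchars() before concatenating; xmlCaption() is a hypothetical helper, not an OpenCart function.

```php
<?php
// Hypothetical helper (not part of OpenCart): build a caption element
// from a plain-text product name.
function xmlCaption(string $name): string
{
    // ENT_XML1 applies XML escaping rules; ENT_QUOTES also escapes ' and ".
    return '<image:caption>'
        . htmlspecialchars($name, ENT_QUOTES | ENT_XML1, 'UTF-8')
        . '</image:caption>';
}

echo xmlCaption('A & E - It'), "\n";
// <image:caption>A &amp; E - It</image:caption>
```

If the names are already stored HTML-encoded (for example A &amp; E), this would encode them a second time, so it is worth checking what the column actually holds first.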

Post by straightlight » Fri Jun 24, 2016 1:13 am

In the catalog/controller/feed/google_sitemap.php file,

find:

Code:

$output .= '<image:caption>' . $product['name'] . '</image:caption>';
replace with:

Code:

$output .= '<image:caption>' . str_replace('&', '&amp;', html_entity_decode($product['name'], ENT_QUOTES, 'UTF-8')) . '</image:caption>';
Last edited by straightlight on Fri Jun 24, 2016 1:50 am, edited 3 times in total.

Dedication and passion goes to those who are able to push and merge a project.

Regards,
Straightlight
Programmer / Opencart Tester


Legendary Member

Posts

Joined
Mon Nov 14, 2011 11:38 pm
Location - Canada, ON
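The replacement line decodes HTML entities in the stored name into plain UTF-8 characters and then escapes ampersands. The sketch below wraps that expression in a hypothetical captionFix() function, assuming the second str_replace() argument is the escaped form '&amp;', so its effect on names from this thread can be checked. It handles & only, so a < or > in a name would still break the feed.

```php
<?php
// Hypothetical wrapper around the replacement expression.
function captionFix(string $name): string
{
    // 1. Turn HTML entities (&amp;, &trade;, ...) into literal UTF-8 characters.
    $decoded = html_entity_decode($name, ENT_QUOTES, 'UTF-8');
    // 2. Re-escape ampersands, the only character this fix handles.
    return str_replace('&', '&amp;', $decoded);
}

echo captionFix('A &amp; E'), "\n";   // A &amp; E
echo captionFix('10+1&trade;'), "\n"; // 10+1™
echo captionFix('A < B'), "\n";       // A < B  (still not valid XML text)
```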

Post by lfairban » Fri Jun 24, 2016 1:30 am

Thanks much.

That did something, but didn't solve it. Any ideas?


Post by straightlight » Fri Jun 24, 2016 1:33 am

I edited the code above. Try with the new replacement.


Post by lfairban » Fri Jun 24, 2016 1:47 am

10+1&trade;
It looks like it might be failing on the number 1?


Post by straightlight » Fri Jun 24, 2016 1:49 am

Which OpenCart version are you using? Can you also post the exact URL where this is happening? I have also re-edited the code above. Ampersand characters should not show up on this page. Is this the only page where this behavior occurs?



Post by IP_CAM » Fri Jun 24, 2016 4:09 am

&trade; is probably NOT allowed even in HTML, because it's likely 'interpreted' as being some (invalid) HTML code!
Ernie

My Github OC Site: https://github.com/IP-CAM
5'600 + FREE OC Extensions, on the World's largest private Github OC Repository Archive Site.


User avatar
Legendary Member

Posts

Joined
Tue Mar 04, 2014 1:37 am
Location - Switzerland
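&trade; is a valid entity in HTML, but XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;); any other name, such as &trade; or &reg;, is an undeclared entity and makes the document malformed. A numeric character reference (&#8482;) or the literal UTF-8 character (™) is fine. The sketch below checks this with PHP's SimpleXML extension, assumed to be enabled as it is in most PHP builds; isWellFormed() is a hypothetical helper.

```php
<?php
// Hypothetical helper: report whether a string parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc !== false;
}

var_dump(isWellFormed('<caption>10+1&trade;</caption>'));  // false: not an XML entity
var_dump(isWellFormed('<caption>10+1&#8482;</caption>'));  // true: numeric reference
var_dump(isWellFormed("<caption>10+1\u{2122}</caption>")); // true: literal UTF-8
```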

Post by lfairban » Fri Jun 24, 2016 4:44 am

IP_CAM wrote:&trade; is probably NOT allowed even in HTML, because it's likely 'interpreted' as being some (invalid) HTML code!
Ernie
I think ™ and ® are reasonable characters to expect in a product name, and there are just so many possible characters to anticipate. Something's just not set up to handle them correctly.

Straightlight, you've been very helpful with this error, but I've tried fixing hundreds of variants and I'm still getting errors on the sitemap with no finish line in sight.

For now I'm just going to submit my own sitemap and forget about the hack. Well, unless something better comes along.

Thanks for your help.

BTW, is this something that's still broken in version 2.2?


Post by lfairban » Sat Jul 23, 2016 2:06 am

I submitted my own sitemap to Google, but I still wonder whether there's a fix for this.


Post by straightlight » Sat Jul 23, 2016 3:17 am

I've been working on a solution to this long-reported problem. I found an interesting topic on Stack Overflow that may fix this encoding issue with the application/xml header: http://stackoverflow.com/questions/2822 ... afe-values

In the catalog/controller/feed/google_sitemap.php file,

find:

Code:

protected function getCategories($parent_id, $current_path = '') {
add above:

Code:

protected function sanitizeXML($xml_content, $xml_followdepth = true) {
	// Match every element: [1] = tag with attributes, [2] = tag name, [3] = content.
	if (preg_match_all('%<((\w+)\s?.*?)>(.+?)</\2>%si', $xml_content, $xmlElements, PREG_SET_ORDER)) {
		$xmlSafeContent = '';

		foreach ($xmlElements as $xmlElem) {
			$xmlSafeContent .= '<' . $xmlElem[1] . '>';

			if (preg_match('%<((\w+)\s?.*?)>(.+?)</\2>%si', $xmlElem[3])) {
				// The content holds child elements: sanitize them recursively.
				$xmlSafeContent .= $this->sanitizeXML($xmlElem[3], false);
			} else {
				// Plain text: escape &, < and >.
				$xmlSafeContent .= htmlspecialchars($xmlElem[3], ENT_NOQUOTES);
			}

			$xmlSafeContent .= '</' . $xmlElem[2] . '>';
		}

		if (!$xml_followdepth) {
			// Nested call: return the fragment only.
			return $xmlSafeContent;
		} else {
			// Top-level call: prepend the XML declaration.
			return "<?xml version='1.0' encoding='UTF-8'?>" . $xmlSafeContent;
		}
	} else {
		return htmlspecialchars($xml_content, ENT_NOQUOTES);
	}
}
Then, find:

Code:

$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
replace with:

Code:

$parse_output = $this->sanitizeXML($output);

$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
Hope this helps.

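The sanitizeXML() approach escapes element text with htmlspecialchars($text, ENT_NOQUOTES). One side effect worth checking: htmlspecialchars() has a fourth parameter, double_encode, which defaults to true, so text that already contains entities gets escaped again. The sample text below is assumed for illustration.

```php
<?php
// Text as it might already appear in the feed (entities already present).
$text = 'A &amp; E 10+1&trade;';

// Default double_encode = true, as in sanitizeXML(): existing entities are escaped again.
$doubled = htmlspecialchars($text, ENT_NOQUOTES);

// double_encode = false: entities PHP recognises (HTML 4.01 table here) are left alone.
$kept = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8', false);

echo $doubled, "\n"; // A &amp;amp; E 10+1&amp;trade;
echo $kept, "\n";    // A &amp; E 10+1&trade;
```

Note that the double_encode = false variant leaves &trade; in place, and XML still rejects it, so decoding with html_entity_decode() before escaping is likely the more robust order.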

Post by lfairban » Fri Jul 29, 2016 9:35 pm

Thanks, I'll try that.
Where your search was:

Code:

$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
my file had the following instead (with $output):

Code:

$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($output);


Post by straightlight » Fri Jul 29, 2016 9:49 pm

lfairban wrote:Thanks, I'll try that.
Where your search was:

Code:

$this->response->addHeader('Content-Type: application/xml');
         $this->response->setOutput($parse_output);
I had the following (output)

Code:

$this->response->addHeader('Content-Type: application/xml');
			$this->response->setOutput($output);
This is intended: $parse_output is the parsed variable that $output should be replaced with. For various reasons, I did not try reusing the same $output variable in this case.


Post by lfairban » Fri Jul 29, 2016 10:12 pm

That doesn't seem to have done the trick.
I wonder whether the expression (%<((\w+)\s?.*?)>(.+?)</\2>%si) catches all of the errors I'm seeing.

I'm not very good with regular expressions. What is this expression matching?

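Reading the pattern piece by piece: % is the delimiter; the s flag lets . match newlines and i makes it case-insensitive; <((\w+)\s?.*?)> captures the opening tag (group 1: name plus any attributes, group 2: just the name); (.+?) lazily captures the content (group 3); and </\2> requires a closing tag with the same name. So it matches whole elements, not entities. One detail worth checking: \w does not match a colon, so for a namespaced tag such as <image:caption> group 2 is only image, and </\2> would then need a literal </image>, which the sitemap never contains. The snippet below uses a made-up sample fragment to illustrate this; since sanitizeXML() rebuilds its output only from matches, such elements would likely never be escaped and may be left out of the rebuilt output.

```php
<?php
$pattern = '%<((\w+)\s?.*?)>(.+?)</\2>%si';
// Made-up fragment of one <url> entry, for illustration only.
$fragment = '<loc>http://example.com/a</loc>'
          . '<image:caption>A & E</image:caption>';

preg_match_all($pattern, $fragment, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = tag with attributes, $m[2] = tag name, $m[3] = content
    echo $m[2], ' => ', $m[3], "\n";
}
// Prints only: loc => http://example.com/a
```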

Post by straightlight » Fri Jul 29, 2016 10:16 pm

lfairban wrote:That doesn't seem to have done the trick.
I wonder whether the expression (%<((\w+)\s?.*?)>(.+?)</\2>%si) catches all of the errors I'm seeing.

I'm not very good with regular expressions. What is this expression matching?
Can you clarify what you mean by 'doesn't seem to have done the trick'?


Post by lfairban » Fri Jul 29, 2016 10:39 pm

That regular expression isn't finding "&reg;". From what I found on the web, this pattern looks like it would find all special characters (HTML entities):

Code:

&[#0-9a-z]+;

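The pattern &[#0-9a-z]+; matches entity references such as &reg;, &trade; or &#8482; (lowercase names only, unless the i flag is added). It does not match a bare ampersand, which is what breaks captions like A & E. A quick check on made-up sample text:

```php
<?php
$pattern = '/&[#0-9a-z]+;/';
$sample  = 'A & E &reg; 10+1&trade; &#8482; &Eacute;';

preg_match_all($pattern, $sample, $found);

// Full matches: &reg;, &trade;, &#8482; (not the bare & or the uppercase &Eacute;)
print_r($found[0]);
```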

Post by lfairban » Fri Jul 29, 2016 11:04 pm

Can you clarify what you mean by 'doesn't seem to have done the trick'?
That regular expression appears to match only standard HTML tags, so it doesn't find ®, ™ and other special characters in the product name.

Code:

%<((\w+)\s?.*?)>(.+?)</\2>%si
So, when I hit the Google sitemap page, it fails because of those characters.

Since I'm not putting HTML in the Name field, it doesn't find any. I need help building something that would search for special characters... Let me try something. I may have answered my own question.
Last edited by lfairban on Fri Jul 29, 2016 11:08 pm, edited 1 time in total.


Post by straightlight » Fri Jul 29, 2016 11:04 pm

According to this Stack Overflow thread: http://stackoverflow.com/questions/2747 ... p-with-php, revert the changes above and then try this solution:

Change:

Code:

$this->response->addHeader('Content-Type: application/xml');
to read:

Code:

$this->response->addHeader('Content-Type: text/xml');


Post by lfairban » Fri Jul 29, 2016 11:37 pm

Thank you so much for your help.

Reverting to the original code and swapping application/xml for text/xml did not get rid of the error.


Post by lfairban » Fri Jul 29, 2016 11:49 pm

Oh, brother!
The vQmod search wasn't matching as expected.

I changed the file directly and got different results: it's failing on the image caption now.

Code:

</image:loc><image:caption>Coin & Feather
Now I need to go back and test your original fix for this.

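Another common way to put arbitrary text into an XML element, not something tried in this thread, is to wrap it in a CDATA section, inside which & and < need no escaping. The only sequence a CDATA section cannot contain is ]]>, so it has to be split across two sections. A minimal sketch with a hypothetical cdata() helper, assuming the caption is plain text (HTML entities such as &amp; inside CDATA are not decoded and would appear literally):

```php
<?php
// Hypothetical helper: wrap plain text in a CDATA section.
function cdata(string $text): string
{
    // "]]>" would end the section early, so split it across two sections.
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<image:caption>' . cdata('Coin & Feather') . '</image:caption>', "\n";
// <image:caption><![CDATA[Coin & Feather]]></image:caption>
```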