Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 12:59 am
by lfairban
When looking at the Google Sitemap Feed page, I'm seeing errors for products whose Name field (oc_product_description.name) contains numbers, special characters, or spaces. The sitemap appears to fail at the thumbnail caption.
As an example: <image:caption>A & E - It' fails with the &
The data in the table is obviously fine because it's used elsewhere. What is it about the thumbnail caption that's causing it to fail? Has anyone else had this issue? Any workarounds?
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 1:13 am
by straightlight
In the catalog/controller/feed/google_sitemap.php file,
find:
Code: Select all
$output .= '<image:caption>' . $product['name'] . '</image:caption>';
replace with:
Code: Select all
$output .= '<image:caption>' . str_replace('&', '&amp;', html_entity_decode($product['name'], ENT_QUOTES, 'UTF-8')) . '</image:caption>';
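As a rough sketch of the same idea, the caption can be decoded and then escaped with htmlspecialchars(), which also covers < and >. XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so an HTML entity such as &trade; is not valid in a sitemap unless it is decoded first. The helper name xmlCaption() is hypothetical and not part of OpenCart, and the sketch assumes the stored names may contain HTML entities.

```php
<?php
// Hypothetical helper (not part of OpenCart): make a product name safe
// to place inside an XML element such as <image:caption>.
function xmlCaption($name) {
    // Assumption: the stored name may contain HTML entities such as
    // &amp; or &trade;. Decode them to plain UTF-8 characters first.
    $plain = html_entity_decode($name, ENT_QUOTES, 'UTF-8');

    // Then escape only the characters that are special in XML.
    // ENT_XML1 requires PHP 5.4 or later.
    return htmlspecialchars($plain, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xmlCaption('A &amp; E'), "\n";      // A &amp; E
echo xmlCaption('10+1&trade;'), "\n";    // 10+1™ (a plain UTF-8 character)
echo xmlCaption('Coin & Feather'), "\n"; // Coin &amp; Feather
```

In the line above, this would mean wrapping $product['name'] in such a helper instead of the str_replace() call.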
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 1:30 am
by lfairban
Thanks much.
That did something but didn't solve it. Any ideas?
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 1:33 am
by straightlight
I edited the code above. Try with the new replacement.
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 1:47 am
by lfairban
10+1™
It looks like it might be failing on the number 1?
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 1:49 am
by straightlight
Which OC version are you using? Can you also post the exact URL to where this is happening? I have also re-edited the code above. Amp characters should not show up on this page ... is this the only page where this behavior occurs?
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 2:09 am
by lfairban
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 4:09 am
by IP_CAM
™ is probably NOT allowed even in HTML because it's likely 'interpreted' as BEING some (invalid) HTML code!
Ernie
Re: Sitemap Error with numbers and special characters
Posted: Fri Jun 24, 2016 4:44 am
by lfairban
IP_CAM wrote:™ is probably NOT allowed even in HTML because it's likely 'interpreted' as BEING some (invalid) HTML code!
Ernie
I think ™ and ® are reasonable characters to expect in a product name, and there are just too many possible characters to anticipate. Something's just not set up to handle them correctly.
Straightlight, you've been very helpful with this error but I've tried to fix hundreds of variants and I'm still getting errors on the sitemap with no finish line in sight.
For now I'm just going to submit my own sitemap and forget about the hack. Well, unless something better comes along.
Thanks for your help.
BTW, is this something that's still broken in version 2.2?
Re: Sitemap Error with numbers and special characters
Posted: Sat Jul 23, 2016 2:06 am
by lfairban
I submitted my own sitemap to Google but I still wonder if there's a fix to this?
Re: Sitemap Error with numbers and special characters
Posted: Sat Jul 23, 2016 3:17 am
by straightlight
I've been working on a solution to this long-reported problem. I found an interesting topic on Stack Overflow that may fix this encoding issue with the application/xml header:
http://stackoverflow.com/questions/2822 ... afe-values
In the catalog/controller/feed/google_sitemap.php file,
find:
Code: Select all
protected function getCategories($parent_id, $current_path = '') {
add above:
Code: Select all
protected function sanitizeXML($xml_content, $xml_followdepth = true) {
    if (preg_match_all('%<((\w+)\s?.*?)>(.+?)</\2>%si', $xml_content, $xmlElements, PREG_SET_ORDER)) {
        $xmlSafeContent = '';

        foreach ($xmlElements as $xmlElem) {
            $xmlSafeContent .= '<' . $xmlElem['1'] . '>';

            if (preg_match('%<((\w+)\s?.*?)>(.+?)</\2>%si', $xmlElem['3'])) {
                $xmlSafeContent .= $this->sanitizeXML($xmlElem['3'], false);
            } else {
                $xmlSafeContent .= htmlspecialchars($xmlElem['3'], ENT_NOQUOTES);
            }

            $xmlSafeContent .= '</' . $xmlElem['2'] . '>';
        }

        if (!$xml_followdepth) {
            return $xmlSafeContent;
        } else {
            return "<?xml version='1.0' encoding='UTF-8'?>" . $xmlSafeContent;
        }
    } else {
        return htmlspecialchars($xml_content, ENT_NOQUOTES);
    }
}
Then, find:
Code: Select all
$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
replace with:
Code: Select all
$parse_output = $this->sanitizeXML($output);
$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
Hope this helps.
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 9:35 pm
by lfairban
Thanks, I'll try that.
Where your search was:
Code: Select all
$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
I had the following (output)
Code: Select all
$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($output);
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 9:49 pm
by straightlight
lfairban wrote:Thanks, I'll try that.
Where your search was:
Code: Select all
$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($parse_output);
I had the following (output)
Code: Select all
$this->response->addHeader('Content-Type: application/xml');
$this->response->setOutput($output);
This is intended. $parse_output is the parsed variable that $output should be replaced with. I did not try reusing the same $output variable, for various reasons, in this case.
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 10:12 pm
by lfairban
That doesn't seem to have done the trick.
I wonder if the expression (%<((\w+)\s?.*?)>(.+?)</\2>%si) is catching all of the errors I'm seeing?
I'm not all that good with regular expressions. What is this expression finding?
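For reference, a small sketch of what the pattern appears to match: an opening tag, its content, and a closing tag whose name equals the word captured at the start of the opening tag. The sample strings are made up for illustration. One consequence, as far as I can tell, is that a prefixed tag such as image:caption is not matched at all, because \w+ stops at the colon and the pattern then looks for a closing </image>.

```php
<?php
// The pattern used by sanitizeXML() in this thread.
$pattern = '%<((\w+)\s?.*?)>(.+?)</\2>%si';

// A plain element: group 2 captures "loc" and the closing </loc> matches \2.
$plain = preg_match($pattern, '<loc>A & E</loc>', $m);
// $m[2] is the tag name, $m[3] is the element content.
echo $plain, ' ', $m[2], ' ', $m[3], "\n";   // 1 loc A & E

// A prefixed element: \w+ stops at the colon, so \2 is "image" and the
// pattern then looks for </image>, which never appears.
$prefixed = preg_match($pattern, '<image:caption>A & E</image:caption>', $m2);
echo $prefixed, "\n";                        // 0
```

So the expression looks for element tags, not for individual special characters inside a name.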
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 10:16 pm
by straightlight
lfairban wrote:That doesn't seem to have done the trick.
I wonder if the expression (%<((\w+)\s?.*?)>(.+?)</\2>%si) is catching all of the errors I'm seeing?
I'm not all that good with regular expressions. What is this expression finding?
Can you clarify: 'Doesn't seem to have done the trick?'
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 10:39 pm
by lfairban
It's not finding "®" in that regular expression. From the web, this looks like it would find all special characters
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 11:04 pm
by lfairban
Can you clarify: 'Doesn't seem to have done the trick?'
That regular expression appears to search only for standard HTML, so it's not finding ®, ™ and other special characters in the name of the product.
So, when I hit the Google sitemap page, it fails because of those characters.
Since I'm not putting HTML in the Name field, it's not finding HTML. I need help building something that would search for special characters. Let me try something. I may have answered my own question.
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 11:04 pm
by straightlight
According to this site:
http://stackoverflow.com/questions/2747 ... p-with-php , after reverting the changes above, try this solution:
Change:
Code: Select all
$this->response->addHeader('Content-Type: application/xml');
to read:
Code: Select all
$this->response->addHeader('Content-Type: text/xml');
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 11:37 pm
by lfairban
Thank you so much for your help.
Reverting back to original and swapping out the application for text did not get rid of the error.
Re: Sitemap Error with numbers and special characters
Posted: Fri Jul 29, 2016 11:49 pm
by lfairban
Oh, Brother!
The VQMOD search wasn't selecting as expected.
I changed the file directly and got different results... it's failing on the image caption now.
Code: Select all
</image:loc><image:caption>Coin & Feather
Now I need to go back and test your original fix to this.
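One way to test a fix without reloading the feed in a browser each time is to run the generated XML through a parser and print where it breaks. This is only a sketch: xmlErrors() is a hypothetical helper, the sample strings are invented, and it assumes PHP's DOM/libxml extension is enabled.

```php
<?php
// Hypothetical helper: return libxml's parse errors for an XML string.
function xmlErrors($xml) {
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $errors;
}

// Invented samples: a bare ampersand versus an escaped one.
$bad  = '<url><caption>Coin & Feather</caption></url>';
$good = '<url><caption>Coin &amp; Feather</caption></url>';

foreach (xmlErrors($bad) as $e) {
    echo 'line ', $e->line, ', column ', $e->column, ': ', trim($e->message), "\n";
}

echo count(xmlErrors($good)), "\n"; // 0
```

Passing the feed's $output string to such a helper should report the line and column of the first character the parser rejects.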