Post by serpro-ltd » Tue Aug 20, 2024 2:08 am

Hi all

Penny pinching me again.
I'm in the process of building umpteen sites using OpenCart's multi-site facility, which is great for this, and I needed a script that would create a sitemap XML file for each one we create.

I've got no problem with people wanting the easy way to do things, but I'm a bit tight in my old age and just wanted a script that would create the file for Google and ping them when finished.

It's not an integral bit of code for use inside OpenCart, but once you've proven that it works for you, you can call it from a cron job. You can then have lots of them, one for each of the sites within the multi-site framework.
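If you'd rather not keep a separate copy per site, something like the following might do as a starting point. It's only a sketch: the store list, the helper name sitemapFileFor() and its file-naming scheme are made up for illustration and are not part of OpenCart.

```php
<?php
// Hypothetical store list: replace with your own multi-site domains.
$stores = [
    'https://www.your-site.com/',
    'https://www.your-other-site.com/',
];

// Derive a sitemap filename from a base URL.
// (Illustrative helper and naming scheme, not an OpenCart function.)
function sitemapFileFor(string $baseUrl): string {
    $host = parse_url($baseUrl, PHP_URL_HOST);    // e.g. "www.your-site.com"
    $host = preg_replace('/^www\./', '', $host);  // drop a leading "www."
    return 'sitemap-' . str_replace('.', '-', $host) . '.xml';
}

foreach ($stores as $baseUrl) {
    $sitemapFile = sitemapFileFor($baseUrl);
    echo "Would crawl $baseUrl and write $sitemapFile\n";
    // The crawl and save steps from the script would run here for this $baseUrl.
}
```

One cron entry calling this single file would then cover every site in the list, instead of one entry per script.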

I think it's self-explanatory. Just change all my URLs to yours; you'll have to go through the script to find them, but there aren't too many. The script automatically leaves the superfluous pages listed in $excludedUrls (login, cart, account pages and so on) out of the sitemap.
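To cut down on the URL hunting, the exclusion list could be built from a single base URL plus a list of routes. This is a rough sketch only: buildExcludedUrls() and $excludedRoutes are names invented here, and the three routes shown are just a sample of the ones in the script.

```php
<?php
// Build full exclusion URLs from one base URL and a list of routes,
// so only $baseUrl needs editing per site.
// (Hypothetical helper, not part of OpenCart.)
function buildExcludedUrls(string $baseUrl, array $routes): array {
    $root = rtrim($baseUrl, '/') . '/index.php?route=';
    return array_map(function (string $route) use ($root): string {
        return $root . $route;
    }, $routes);
}

$baseUrl = 'https://www.your-site.com/';
$excludedRoutes = ['account/login', 'account/register', 'checkout/cart'];
$excludedUrls = buildExcludedUrls($baseUrl, $excludedRoutes);

print_r($excludedUrls);
```

The resulting $excludedUrls array has the same shape as the hand-written one, so the rest of the script would not need to change.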

If you change the code for the better, please post it here for other users to benefit from. I make no apologies for the coding: I'm 70 years old and am not into people saying I should have done this or that. Change it if you wish; it's up to you.

Obviously, make backups of your whole site before using mods and scripts. You may have to change directory names.

Have fun
Regards
Adrian
=================================================

Code:

<?php

// Configuration
$baseUrl = 'https://www.your-site.com/';
$visited = [];
$toVisit = [$baseUrl];
$pages = [];
$sitemapFile = 'sitemap-your-site.xml';

// Exclude specific URLs from the sitemap
$excludedUrls = [
    'https://www.your-site.com/index.php?route=account/login',
    'https://www.your-site.com/index.php?route=account/register',
    'https://www.your-site.com/index.php?route=checkout/cart',
    'https://www.your-site.com/index.php?route=account/return/add',
    'https://www.your-site.com/index.php?route=account/forgotten',
    'https://www.your-site.com/index.php?route=information/information/agree&information_id=5',
    'https://www.your-site.com/index.php?route=product/compare',
    'https://www.your-site.com/index.php?route=product/special',
    'https://www.your-site.com/index.php?route=product/search',
    'https://www.your-site.com/index.php?route=account/transaction',
    'https://www.your-site.com/index.php?route=account/return',
    'https://www.your-site.com/index.php?route=account/reward',
    'https://www.your-site.com/index.php?route=account/recurring',
    'https://www.your-site.com/index.php?route=account/download',
    'https://www.your-site.com/index.php?route=account/address',
    'https://www.your-site.com/index.php?route=information/sitemap',
    'https://www.your-site.com/index.php?route=account/password',
    'https://www.your-site.com/index.php?route=account/edit'
];

// Function to get all links from a given URL
function getLinks($url) {
    $htmlContent = @file_get_contents($url);
    $links = [];
    
    if ($htmlContent === FALSE) {
        echo "Failed to retrieve content from $url\n";
        return $links;
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($htmlContent);
    $xpath = new DOMXPath($dom);
    $anchorTags = $xpath->query("//a[@href]");
    
    foreach ($anchorTags as $anchor) {
        $link = $anchor->getAttribute('href');
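        if (strpos($link, '#') === 0 || preg_match('/^(mailto|tel|javascript):/i', $link)) {
            continue; // Ignore in-page anchors and non-HTTP links
        }
        // Convert relative URLs to absolute
        if (strpos($link, 'http') !== 0) {
            $parts = parse_url($url);
            $origin = $parts['scheme'] . '://' . $parts['host'];
            if (strpos($link, '//') === 0) {
                $link = $parts['scheme'] . ':' . $link; // Protocol-relative link
            } elseif (strpos($link, '/') === 0) {
                $link = $origin . $link; // Root-relative link
            } else {
                // Page-relative link: resolve against the current page's directory
                $path = isset($parts['path']) ? $parts['path'] : '/';
                $link = $origin . substr($path, 0, strrpos($path, '/') + 1) . $link;
            }
        }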
        $links[] = $link;
    }
    
    return $links;
}

// Crawl the site and gather all unique pages
while (!empty($toVisit)) {
    $url = array_pop($toVisit);
    
    if (!in_array($url, $visited)) {
        $visited[] = $url;

        // Exclude specific URLs
        if (!in_array($url, $excludedUrls)) {
            $pages[] = $url;
        }

        $foundLinks = getLinks($url);
        
        foreach ($foundLinks as $link) {
            // Only follow links on this site; uses $baseUrl so there is one less URL to edit
            if (strpos($link, rtrim($baseUrl, '/')) === 0 && !in_array($link, $visited) && !in_array($link, $toVisit)) {
                $toVisit[] = $link;
            }
        }
    }
}

// Create the XML structure for the sitemap
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>');

// Add each valid page to the sitemap
foreach ($pages as $page) {
    $urlElement = $xml->addChild('url');
    $urlElement->addChild('loc', htmlspecialchars($page));
    $urlElement->addChild('lastmod', date('Y-m-d'));
    $urlElement->addChild('priority', '0.80');
}

// Save the XML sitemap to a file
if ($xml->asXML($sitemapFile) === FALSE) {
    echo "Failed to save the sitemap to file: $sitemapFile\n";
} else {
    echo "Sitemap successfully created: $sitemapFile\n";

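    // Ping Google to notify of the new sitemap.
    // Note: Google announced in 2023 that it was retiring this ping endpoint, so
    // expect this call to fail. Submitting the sitemap in Search Console or
    // listing it in robots.txt is the supported route.
    $googlePingUrl = 'https://www.google.com/ping?sitemap=' . urlencode($baseUrl . $sitemapFile);
    $pingResponse = @file_get_contents($googlePingUrl);

    if ($pingResponse !== FALSE) {
        echo "Google successfully pinged.\n";
    } else {
        echo "Failed to ping Google (the ping endpoint may no longer be available).\n";
    }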
}

Serpro Spill Management
www.serpro.co.uk

Opencart 3.0.03.9
Journal3
Order Number Manager
PDF Invoice Pro
Category Description By Multi Store
MASS products update: Stores
MultiStore Payment Methods
Price Update With Option Change
Product Description By Multi Store
Quick Admin Search By Mart Extensions
Ultimate Shipping



Post by IP_CAM » Wed Aug 21, 2024 2:51 am


My Github OC Site: https://github.com/IP-CAM
5'600 + FREE OC Extensions, on the World's largest private Github OC Repository Archive Site.

