#33998 closed Bug (needsinfo)

Use alternates and i18n generate duplicated URLs in the sitemap

Reported by: brenosss Owned by: nobody
Component: contrib.sitemaps Version: 4.0
Severity: Normal Keywords: sitemap, i18n, alternates
Cc: Florian Demmer Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by brenosss)

If the i18n attribute is set to True, the Sitemap class generates a separate URL for each language in LANGUAGES. However, I'm also using alternates, so I expected only one URL per item, for the default language, with the translated versions listed as alternate URLs.

The current function:

    def _items(self):
        if self.i18n:
            # Create (item, lang_code) tuples for all items and languages.
            # This is necessary to paginate with all languages already considered.
            items = [
                (item, lang_code)
                for lang_code in self._languages()
                for item in self.items()
            ]
            return items
        return self.items()

This is an example of the result:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <url>
        <loc>https://example.com/en/contact</loc>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
        <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/contact" />
        <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/contact" />
        <xhtml:link rel="alternate" hreflang="el" href="https://example.com/el/contact" />
    </url>
    <url>
        <loc>https://example.com/es/contact</loc>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
        <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/contact" />
        <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/contact" />
        <xhtml:link rel="alternate" hreflang="el" href="https://example.com/el/contact" />
    </url>
    <url>
        <loc>https://example.com/el/contact</loc>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
        <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/contact" />
        <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/contact" />
        <xhtml:link rel="alternate" hreflang="el" href="https://example.com/el/contact" />
    </url>
</urlset>
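To make the source of the three entries concrete, here is a minimal sketch of the cross product that the i18n branch of _items builds. It uses plain lists rather than Django; `expand_current`, `languages`, and `pages` are hypothetical stand-ins for the method, self._languages(), and self.items().

```python
# Stand-ins for self._languages() and self.items(); this is not Django code.
languages = ["en", "es", "el"]
pages = ["contact"]


def expand_current(pages, languages):
    """Mirror the i18n branch of _items: one (item, lang_code) pair per language."""
    return [(item, lang_code) for lang_code in languages for item in pages]


pairs = expand_current(pages, languages)
print(pairs)  # [('contact', 'en'), ('contact', 'es'), ('contact', 'el')]
```

Each pair becomes its own url entry, so one item with three languages produces three entries.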

I propose checking whether alternates is True and, if so, generating the items only for the default language:

    def _items(self):
        if self.i18n:
            if self.alternates:
                # Generate each URL once, for the default language only;
                # the translations are added as alternate links.
                lang_code = self.default_lang or settings.LANGUAGE_CODE
                return [(item, lang_code) for item in self.items()]
            # Create (item, lang_code) tuples for all items and languages.
            # This is necessary to paginate with all languages already considered.
            items = [
                (item, lang_code)
                for lang_code in self._languages()
                for item in self.items()
            ]
            return items
        return self.items()
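The effect of the proposed branch can be sketched outside Django as follows. This is only an illustration under assumptions: `expand_proposed` is a hypothetical free function, and `alternates` and `default_lang` are passed in as arguments instead of being read from the sitemap instance or settings.

```python
def expand_proposed(pages, languages, alternates, default_lang):
    """Return (item, lang_code) pairs the way the proposed _items would."""
    if alternates:
        # Proposed branch: a single pair per item, in the default language.
        return [(item, default_lang) for item in pages]
    # Existing branch: one pair per item and language.
    return [(item, lang_code) for lang_code in languages for item in pages]


languages = ["en", "es", "el"]
pages = ["contact", "about"]

with_alternates = expand_proposed(pages, languages, alternates=True, default_lang="en")
without_alternates = expand_proposed(pages, languages, alternates=False, default_lang="en")
print(len(with_alternates), len(without_alternates))  # 2 6
```

With alternates enabled, the number of pairs drops from items times languages to the number of items.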

I would then expect a result more like:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <url>
        <loc>https://example.com/en/contact</loc>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
        <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/contact" />
        <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/contact" />
        <xhtml:link rel="alternate" hreflang="el" href="https://example.com/el/contact" />
    </url>
</urlset>

Does this make sense? Can I start working on it?

Change History (4)

comment:1 by brenosss, 20 months ago

Description: modified (diff)

comment:2 by Mariusz Felisiak, 20 months ago

Cc: Florian Demmer added

comment:3 by Florian Demmer, 20 months ago

Thank you for your report, but it is my understanding that the result you expect would not be correct. Each translation of a page needs a separate url entry.

Here is an example that lists every language version of a page as its own entry, each with the full set of alternates, including itself: https://developers.google.com/search/docs/advanced/crawling/localized-versions#sitemap

Do you have any references that support your expectation?
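For illustration, the layout described above (one url entry per language version, each carrying every alternate including itself) can be sketched with the standard library. This is not how contrib.sitemaps renders its output; `build_urlset` and the URL scheme `base/lang/path` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_urlset(base, path, languages):
    """Build a urlset with one url entry per language, each listing all alternates."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # Assumed URL scheme: the language code is the first path segment.
    hrefs = {lang: f"{base}/{lang}/{path}" for lang in languages}
    for lang in languages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = hrefs[lang]
        # Every entry repeats the full alternate set, including its own URL.
        for alt_lang, href in hrefs.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=alt_lang,
                href=href,
            )
    return urlset


urlset = build_urlset("https://example.com", "contact", ["en", "es", "el"])
print(ET.tostring(urlset, encoding="unicode"))
```

For three languages this yields three url entries with three alternate links each, matching the structure of the output shown in the ticket description.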

comment:4 by Mariusz Felisiak, 20 months ago

Resolution: needsinfo
Status: new → closed

Florian, thanks for checking.
