Opened 8 years ago

Closed 8 years ago

#26077 closed Cleanup/optimization (wontfix)

Change latin map in urlify to correctly translate Umlauts

Reported by: Christian Peters
Owned by: nobody
Component: Internationalization
Version: 1.9
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The latin map translates the umlauts ä, ü and ö to a, u and o:

https://github.com/django/django/blob/master/django/contrib/admin/static/admin/js/urlify.js#L5-L16

The correct way of doing this would be ae, ue and oe (this is the correct spelling and is preferred for SEO reasons).

Change History (7)

comment:1 by Tim Graham, 8 years ago

Could you provide a reference for the claim or point to another system that has the proposed behavior? Thanks.

comment:2 by Christian Peters, 8 years ago

I'll look for references, but the basic point is that the URL version of a German umlaut should be as proposed.

If you have a website with mydomain.de/apple and mydomain.de/apples, then the German version of it should be mydomain.de/apfel and mydomain.de/aepfel.

In other cases, where there is no singular/plural ambiguity, the slugified version is simply a misspelling of the original word, and Google notices this (for the German Rätsel, i.e. riddle):

https://www.google.com/search?q=ratsel -> Google tries to autocorrect the query to rätsel
https://www.google.com/search?q=raetsel -> Google accepts the query as rätsel

This link describes the issue: http://blog.webcertain.com/do-umlauts-matter-how-to-handle-the-most-annoying-characters-in-german-seo-2/10/04/2014/

TL;DR: Converting ä to a results in misspelled words.
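The difference between the two mappings can be sketched in Python. This is only an illustration under stated assumptions: `GERMAN_DIGRAPHS`, `strip_diacritics` and `german_digraphs` are hypothetical names, not part of Django or of urlify.js, and the map covers only the three umlauts discussed here.

```python
import unicodedata

# Hypothetical German-specific map (an assumption for this sketch, not Django code).
GERMAN_DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def strip_diacritics(text):
    """Drop combining marks, so that ä becomes a (the behaviour reported above)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def german_digraphs(text):
    """Replace each umlaut with its digraph, so that ä becomes ae (the proposal)."""
    # Normalize to NFC first so that "a" + combining diaeresis is a single "ä".
    composed = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in composed)


for word in ("Apfel", "Äpfel", "Rätsel"):
    print(word, strip_diacritics(word).lower(), german_digraphs(word).lower())
```

With the stripping approach, Apfel and Äpfel end up with the same slug, while the digraph approach keeps them distinct.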

comment:3 by Aymeric Augustin, 8 years ago

Among the Western languages I'm familiar with, ä, ö and ü are most common in German and this is indeed the proper way to transliterate them.

However, it would look rather weird for the handful of French, Spanish and Brazilian Portuguese words that include those letters.

I quickly checked the Scandinavian languages and it looks like there's no general rule. Per https://en.wikipedia.org/wiki/Finnish_orthography:

The Germanic umlaut or convention of considering digraph ae equivalent to ä, and oe equivalent to ö is inapplicable in Finnish.

comment:4 by Aymeric Augustin, 8 years ago

It's a judgement call, but I'm -0 on this change. It would involve doing something more complicated that sometimes doesn't make sense, rather than doing something simple that isn't always optimal.

I won't stand in the way if we consider that German usage of umlauts is so dominant that we should ignore the edge cases in other languages.

comment:5 by Christian Peters, 8 years ago

I get the point.

Maybe URLify could expose an API for adding or overriding its maps? One could then add some logic based on settings.py to configure the correct language.

It's used very heavily in Wagtail for slug generation, and at the moment I override the entire URLify.
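A rough sketch of what such an override mechanism might look like, written in Python for brevity. Everything here is an assumption made for illustration: `DEFAULT_MAP`, `LANGUAGE_OVERRIDES`, `build_map` and `transliterate` are hypothetical names, and no such API exists in URLify. Input is assumed to be NFC-normalized.

```python
# Default map: umlauts simply lose their diaeresis, as described in this ticket.
DEFAULT_MAP = {"ä": "a", "ö": "o", "ü": "u"}

# Hypothetical per-language overrides, keyed by language code.
LANGUAGE_OVERRIDES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue"},
}


def build_map(language_code=None):
    """Return a copy of the default map with the language's overrides applied."""
    merged = dict(DEFAULT_MAP)
    merged.update(LANGUAGE_OVERRIDES.get(language_code, {}))
    return merged


def transliterate(text, language_code=None):
    """Lowercase the text and replace mapped characters; others pass through."""
    char_map = build_map(language_code)
    return "".join(char_map.get(ch, ch) for ch in text.lower())


print(transliterate("Rätsel", "de"))  # German override applies
print(transliterate("pöytä", "fi"))   # no override for Finnish, default map applies
```

The language code could come from project settings, so German sites would get ae/oe/ue while other languages keep the default.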

comment:6 by Aymeric Augustin, 8 years ago

For better or worse, that's pretty much the expected solution if you need something more tailored than Django's default utilities :-|

There's the same situation with the Python slugify function. Django has a naive, four-line version that works fine for Western languages. If you want something more advanced, there's https://github.com/mozilla/unicode-slugify.
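For illustration, a naive slugify in that spirit might look roughly like the sketch below. `naive_slugify` is a hypothetical name and this is not presented as Django's actual implementation; it only shows why an NFKD-and-strip approach turns ä into a rather than ae.

```python
import re
import unicodedata


def naive_slugify(value):
    """Reduce text to an ASCII, lowercase, hyphen-separated slug."""
    # Decompose accented characters, then drop everything that is not ASCII,
    # which removes the combining diaeresis and leaves the base letter.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove characters that are not word characters, whitespace or hyphens.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value)


print(naive_slugify("Rätsel lösen"))
```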

comment:7 by Tim Graham, 8 years ago

Component: Uncategorized → Internationalization
Resolution: → wontfix
Status: new → closed
Type: Uncategorized → Cleanup/optimization

I guess the original proposal is a "wontfix" unless a discussion on the DevelopersMailingList yields a different consensus.
