Opened 8 years ago

Closed 7 years ago

Last modified 3 years ago

#2282 closed enhancement (wontfix)

Urlify in admin compatible with accents

Reported by: David Larlet <larlet@…> Owned by: nobody
Component: contrib.admin Version: master
Severity: normal Keywords:
Cc: semente@…, serialx.net@…, mmitar@… Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This feature is for non-English writers. When you write a title for an object that has a slug field, urlify.js just converts the string into a URL-safe one by removing special characters. Unfortunately, there is an issue if the title contains accents: e.g. "Modèles" is urlified as "modles", which is not really "safe" for search engine ranking or URL comprehension. So maybe a little hack of this file would be awesome; for the moment I have to edit the slug field by hand...

It could also be useful to adapt the removelist for non-English applications (not really the same enhancement, I know).

Change History (22)

comment:1 Changed 8 years ago by ubernostrum

The issue of converting non-"English" characters to a form suitable for a URL has been brought up several times, and always stalls out at the "OK, so how do we do it" phase.

See this thread on the django-developers mailing list for some discussion:

http://groups.google.com/group/django-developers/browse_thread/thread/e1079d2f77c7f938/086bc7e24ad26ded

comment:2 Changed 8 years ago by David Larlet

That thread is not really about the same problem. UTF-8 URLs look too close to phishing today; I just want to convert accents in order to keep understandable slugs. In my current blog app I have:

file attached, too ugly with wiki formatting...

Is it possible to add this to the current urlify? (If I'm not clear, you can ask me on IRC: david`bgk)

Changed 8 years ago by David Larlet

str2url

comment:5 Changed 8 years ago by mtredinnick

Note that #1602 (which is now marked as a dupe of this one) has some Serbian mappings that we can use in the eventual patch.

comment:6 Changed 8 years ago by David Larlet

Any pros and cons for this patch?

comment:7 Changed 8 years ago by Orestis Markou <orestis@…>

Here's a thought:

Can't every language have its own urlify.js file? Like urlify.fr.js,
urlify.el.js, urlify.de.js etc. Django selects the correct one by
looking at the locale settings, or falls back to the default (English)
one if there isn't any.

This is:

a) Easier to maintain, the group responsible for the i18n gets
responsibility of sensibly defining the transliteration scheme and
b) Easier to extend, as each language can have custom exclude word
lists, custom techniques etc.

Maybe there can be a way of passing parameters to the javascript
function, so one can select between different methods of slugifying.

I'm Greek, Greeks use a transliteration scheme called "Greeklish" that
uses the Latin alphabet, which is widely understood.
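
A minimal sketch of the per-language idea above, using one lookup table per language code in a single file rather than separate urlify.xx.js files. The names `TRANSLIT_TABLES`, `tableFor` and `transliterate` are hypothetical, the mappings are small illustrative samples, and nothing here is taken from Django's urlify.js:

```javascript
// Hypothetical per-language transliteration tables (illustrative samples only).
var TRANSLIT_TABLES = {
    fr: { '\u00e8': 'e', '\u00e9': 'e', '\u00e0': 'a', '\u00e7': 'c' },     // è é à ç
    de: { '\u00e4': 'ae', '\u00f6': 'oe', '\u00fc': 'ue', '\u00df': 'ss' }, // ä ö ü ß
    el: { '\u03b1': 'a', '\u03b2': 'v', '\u03b3': 'g', '\u03b4': 'd' }      // α β γ δ
};

// Pick the table for a language code such as 'fr' or 'de-AT'.
// Unknown languages fall back to an empty table (the plain English behaviour).
function tableFor(languageCode) {
    var base = String(languageCode || '').toLowerCase().split('-')[0];
    return Object.prototype.hasOwnProperty.call(TRANSLIT_TABLES, base)
        ? TRANSLIT_TABLES[base]
        : {};
}

// Replace each character that has a mapping; leave everything else untouched.
function transliterate(s, languageCode) {
    var table = tableFor(languageCode);
    var out = '';
    for (var i = 0; i < s.length; i++) {
        var ch = s.charAt(i);
        out += Object.prototype.hasOwnProperty.call(table, ch) ? table[ch] : ch;
    }
    return out;
}
```

With these sample tables, `transliterate('Modèles', 'fr')` gives `'Modeles'` and `transliterate('Käse', 'de-AT')` gives `'Kaese'`, while an unknown language code leaves the string unchanged.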

comment:8 Changed 7 years ago by Michael Radziej <mir@…>

  • Resolution set to duplicate
  • Status changed from new to closed
  • Triage Stage changed from Unreviewed to Design decision needed

#3309 marked as duplicate. #3309 is about the javascript helper, and both parts must use the same approach to handle non-ASCII characters, so it's really one problem.

My personal view is that, at the current stage of implementation, it is too costly to try to maintain a best approach for each of the several hundred languages in use today. I'd suggest:

  • to provide one common standard version for all languages that might remove all accents ('ä' -> 'a'), which can be done generically (but perhaps not easily from within JavaScript),
  • and add a hook so that everybody can link in their favourite way of handling this

I don't have any idea how to treat non-Latin scripts like Chinese, or even Russian or Greek, but table-based transliterations are too much effort when they need to be maintained for each language.
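
The two suggestions above (generic accent removal plus a hook) could look roughly like the sketch below. It relies on `String.prototype.normalize`, a standard JavaScript method that was added in ES2015 and so was not available when this comment was written. `stripAccents` and `urlifyWithHook` are hypothetical names, not part of urlify.js:

```javascript
// Decompose accented letters into base letter + combining mark (NFD),
// then drop the combining marks (U+0300 to U+036F).
function stripAccents(s) {
    return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// 'hook' is an optional caller-supplied function that runs first, so a site
// can plug in its own preferred handling (for example a language-specific map).
function urlifyWithHook(s, hook) {
    if (typeof hook === 'function') {
        s = hook(s);
    }
    s = stripAccents(s);
    s = s.replace(/[^-\w\s]/g, '');   // remove unneeded chars
    s = s.replace(/^\s+|\s+$/g, '');  // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');    // convert spaces to hyphens
    return s.toLowerCase();           // convert to lowercase
}
```

Without a hook, `urlifyWithHook('Modèles')` gives `'modeles'`. A German site that prefers 'ä' -> 'ae' could pass a hook such as `function (s) { return s.replace(/ä/g, 'ae'); }`. Characters with no decomposition (for example 'ß' or any non-Latin script) are still dropped by the generic step, which is the limitation described above.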

comment:9 Changed 7 years ago by Michael Radziej <mir@…>

  • Keywords reopen added

Argh, I didn't want to close this one! Can anybody please reopen it? Sorry.

comment:10 Changed 7 years ago by orestis@…

Table-based transliterations only have to be set up once; then they can be forgotten...

comment:11 Changed 7 years ago by jacob

  • Resolution duplicate deleted
  • Status changed from closed to reopened

comment:12 Changed 7 years ago by Michael Radziej <mir@…>

  • Keywords reopen removed

comment:13 Changed 7 years ago by David Larlet <larlet@…>

For the record:
2007/3/1, James Bennett:

On 3/1/07, David Larlet:

What about bug #2282? What's the actual status? A patch is proposed,
it doesn't suit every case but it's a good start for a part of the
world.

It still needs a discussion and decision about how far we want to go
to support (essentially) arbitrary Unicode in slugs. That's come up a
time or two, and there's never been any sort of consensus on a best
practice, so I'd say this probably isn't going to change before 0.96.

Source: http://groups.google.com/group/django-developers/msg/f20dea3b549af9c8?hl=en&

comment:14 Changed 7 years ago by Guilherme M. Gondim (semente) <semente@…>

Only signing the ticket for mail updates.

comment:15 Changed 7 years ago by SmileyChris

  • Cc semente@… added

Guilherme, you have to add your address to the CC field. Let me do it for you ;)

comment:16 follow-up: Changed 7 years ago by serialx

  • Cc serialx.net@… added

What about the CJK languages?

I think something must be done with the CJK languages separately.

comment:17 in reply to: ↑ 16 Changed 7 years ago by anonymous

Replying to serialx:

What about the CJK languages?

I think something must be done with the CJK languages separately.

As for CJK languages, I prefer to use encodeURI to encode the CJK characters, in function URLify:

    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = encodeURI(s);
    s = s.replace(/[^-\w\s%]/g, '');  // remove unneeded chars #besides# %
    s = s.toLowerCase();             // convert to lowercase

comment:18 Changed 7 years ago by askfor@…

    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = encodeURI(s);
    s = s.replace(/[^-\w\s%]/g, '');  // remove unneeded chars #besides# %
    s = s.toLowerCase();             // convert to lowercase
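
For illustration, the five steps above can be wrapped in a standalone function and tried outside the admin. The function name `urlifyPercentEncoded` is hypothetical; the body is the snippet as posted:

```javascript
// Hypothetical wrapper around the steps posted above.
function urlifyPercentEncoded(s) {
    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = encodeURI(s);                // percent-encode non-ASCII as UTF-8 bytes
    s = s.replace(/[^-\w\s%]/g, ''); // remove unneeded chars besides %
    s = s.toLowerCase();             // convert to lowercase (including hex digits)
    return s;
}

// ASCII input behaves like a plain slugifier.
var asciiSlug = urlifyPercentEncoded('  Hello, World!  '); // 'hello-world'

// CJK input survives as percent escapes and can be decoded again.
var cjkSlug = urlifyPercentEncoded('한글 제목');
var roundTrip = decodeURIComponent(cjkSlug); // '한글-제목'
```

One trade-off of this approach is length: each Hangul or Han character becomes three percent-encoded bytes, that is nine characters in the slug, so a length-limited slug field fills up quickly.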

comment:19 Changed 7 years ago by mtredinnick

  • Resolution set to wontfix
  • Status changed from reopened to closed

We've debated this seemingly endlessly on django-dev. Realistically, continuing to tweak urlify.js is not an option: it will become ridiculously large and unmanageable.

This is a helper function for doing simple transliteration where simple cases are possible. It is not a solution to all the world's localisation problems and is not intended to be. If a particular locale cannot be handled via simple character mapping (and East Asian languages are an obvious example of those that can't be), then using automatic slug generation via this filter is simply not an option for those locales. Fortunately, even if automatic generation is turned on, it can be manually overridden by somebody using the admin, so it isn't a show-stopper.

Let's leave urlify.js doing its current job; people who want extra features can write them as separate JavaScript files and include them directly into admin and on the field. Trying to achieve or expect too much with a simple helper is unrealistic.

comment:20 Changed 3 years ago by mitar

  • Cc mmitar@… added
  • Easy pickings unset
  • UI/UX unset

I have made a slugify2 function which first downcodes and then translates to a slug. It behaves exactly the same as its JavaScript counterpart (which can be used to override the bundled function simply by loading it later in the HTML). So it is now possible to have the same behavior in both Python and JavaScript.
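
As a rough illustration of the "downcode first, then slugify" idea and of overriding by load order, here is a sketch of a script that could be loaded after the bundled urlify.js. It is not mitar's code: `DOWNCODE_MAP` and `downcode` are hypothetical names with a tiny sample mapping, and the `URLify(s, num_chars)` signature is assumed. In a classic browser script, a later global function declaration with the same name replaces the earlier one.

```javascript
// Hypothetical sample mapping; a real table would be much larger.
var DOWNCODE_MAP = {
    '\u00e8': 'e', '\u00e9': 'e',   // è é
    '\u00e4': 'a', '\u00df': 'ss',  // ä ß
    '\u0111': 'dj'                  // đ
};

// Replace every mapped non-ASCII character with its ASCII equivalent.
function downcode(s) {
    return s.replace(/[^\x00-\x7f]/g, function (ch) {
        return Object.prototype.hasOwnProperty.call(DOWNCODE_MAP, ch)
            ? DOWNCODE_MAP[ch]
            : ch;
    });
}

// Declared under the same global name as the bundled helper (assumed signature),
// so loading this file later makes the admin use this version instead.
function URLify(s, num_chars) {
    s = downcode(s);
    s = s.replace(/[^-\w\s]/g, '');   // remove unneeded chars
    s = s.replace(/^\s+|\s+$/g, '');  // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');    // convert spaces to hyphens
    s = s.toLowerCase();              // convert to lowercase
    return s.substring(0, num_chars); // trim to the first num_chars chars
}
```

With this sample map, `URLify('Modèles', 50)` gives `'modeles'` instead of `'modles'`. Unmapped non-ASCII characters are still removed by the second step.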
