Opened 19 years ago

Closed 17 years ago

Last modified 14 years ago

#2282 closed enhancement (wontfix)

Urlify in admin compatible with accents

Reported by: David Larlet <larlet@…>
Owned by: nobody
Component: contrib.admin
Version: dev
Severity: normal
Keywords:
Cc: semente@…, serialx.net@…, mmitar@…
Triage Stage: Design decision needed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

This feature is for non-English writers. When you write a title for an object that has a slug field, urlify.js just converts the string into a URL-safe one by removing special characters. Unfortunately, there is an issue if you put accents in the title (e.g. "Modèles" will be urlified as "modles", which is not really "safe" for search engine indexing and URL comprehension). So maybe a little hack of this file would be awesome; for the moment I have to edit the slug field by hand...

It could also be useful to adapt the removelist for non-English applications (not really the same enhancement, I know).

Change History (22)

comment:1 by James Bennett, 19 years ago

The issue of converting non-"English" characters to a form suitable for a URL has been brought up several times, and always stalls out at the "OK, so how do we do it" phase.

See this thread on the django-developers mailing list for some discussion:

http://groups.google.com/group/django-developers/browse_thread/thread/e1079d2f77c7f938/086bc7e24ad26ded

comment:2 by David Larlet, 19 years ago

This is not really the same problem as in that thread. UTF-8 URLs look too close to phishing today; I just want to convert accents in order to keep understandable slugs. In my current blog app I have:

file attached, too ugly with wiki formatting...

Is it possible to add this to the current urlify? (If I'm not clear, you can ask me on IRC, david`bgk.)

by David Larlet, 19 years ago

Attachment: str2url.js added

str2url

comment:5 by Malcolm Tredinnick, 19 years ago

Note that #1602 (which is now marked as a dupe of this one) has some Serbian mappings that we can use in the eventual patch.

comment:6 by David Larlet, 18 years ago

Any pros and cons for this patch?

comment:7 by Orestis Markou <orestis@…>, 18 years ago

Here's a thought:

Can't every language have its own urlify.js file? Like urlify.fr.js,
urlify.el.js, urlify.de.js etc. Django selects the correct one by
looking at the Locale settings, or falls back to the default (English)
one if there isn't any.

This is:

a) Easier to maintain, the group responsible for the i18n gets
responsibility of sensibly defining the transliteration scheme and
b) Easier to extend, as each language can have custom exclude word
lists, custom techniques etc.

Maybe there can be a way of passing parameters to the javascript
function, so one can select between different methods of slugifying.

I'm Greek, Greeks use a transliteration scheme called "Greeklish" that
uses the Latin alphabet, which is widely understood.
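
The per-language idea above can be sketched in JavaScript as follows. This is only an illustration, not code from the attached patch or from urlify.js: the names TRANSLIT_TABLES, tableForLocale and urlifyForLocale are hypothetical, and the tables contain a handful of sample entries rather than complete transliteration schemes.

```javascript
// Hypothetical per-locale transliteration tables; entries are illustrative only.
const TRANSLIT_TABLES = {
  fr: { 'é': 'e', 'è': 'e', 'ê': 'e', 'à': 'a', 'ç': 'c', 'ù': 'u' },
  de: { 'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss' },
  el: { 'α': 'a', 'β': 'v', 'γ': 'g', 'δ': 'd', 'ε': 'e' },
};

// Pick the table for a locale such as "fr-FR" or "de_DE".
// Unknown locales fall back to an empty table (the default, English behaviour).
function tableForLocale(locale) {
  const lang = String(locale || '').toLowerCase().split(/[-_]/)[0];
  return TRANSLIT_TABLES[lang] || {};
}

function urlifyForLocale(s, locale) {
  const table = tableForLocale(locale);
  let out = '';
  // Lowercase first so the tables only need lowercase keys.
  for (const ch of s.toLowerCase()) {
    out += Object.prototype.hasOwnProperty.call(table, ch) ? table[ch] : ch;
  }
  return out
    .replace(/[^-\w\s]/g, '')   // drop characters that are still unmapped
    .replace(/^\s+|\s+$/g, '')  // trim leading/trailing spaces
    .replace(/[-\s]+/g, '-');   // convert spaces to hyphens
}
```

With the French table selected, "Modèles" keeps its vowel; with no matching table the accented letter is simply dropped, which is the behaviour the ticket describes.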

comment:8 by Michael Radziej <mir@…>, 18 years ago

Resolution: duplicate
Status: new → closed
Triage Stage: Unreviewed → Design decision needed

#3309 marked as duplicate. #3309 is about the javascript helper, and both parts must use the same approach to handle non-ASCII characters, so it's really one problem.

My personal view is that, at the current stage of implementation, it is too costly to try to maintain a best approach for each of the several hundred languages in use today. I'd suggest:

  • to provide one common standard version for all languages that might remove all accents ('ä' -> 'a'), which can be done generically (but perhaps not easily from within JavaScript),
  • and add a hook so that everybody can link in their favourite way of handling this

I have no idea how to treat non-Latin scripts such as Chinese, or even Russian or Greek, but table-based transliterations are too much effort when they need to be maintained for each language.
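
As a rough sketch of the two suggestions above (one generic accent-stripping default plus a hook), the following uses String.prototype.normalize, which is available in current JavaScript engines but postdates this discussion. The function names stripAccents and slugify and the hook parameter are assumptions made for illustration; they are not part of urlify.js.

```javascript
// Generic default: decompose to NFD, then drop the combining diacritical marks,
// so 'ä' becomes 'a' followed by U+0308, and the mark is removed.
function stripAccents(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical hook: callers may pass their own transliteration function
// in place of the generic default.
function slugify(s, transliterate = stripAccents) {
  return transliterate(s)
    .toLowerCase()
    .replace(/[^-\w\s]/g, '')   // remove whatever is still non-ASCII or punctuation
    .replace(/^\s+|\s+$/g, '')  // trim leading/trailing spaces
    .replace(/[-\s]+/g, '-');   // convert spaces to hyphens
}
```

Letters with no canonical decomposition (for example 'ß' or 'ø') and all non-Latin scripts are still dropped by the default, which is where a caller-supplied function such as `slugify(title, myGermanTable)` would come in.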

comment:9 by Michael Radziej <mir@…>, 18 years ago

Keywords: reopen added

Argh, I didn't want to close this one! Can anybody please reopen it? Sorry.

comment:10 by orestis@…, 18 years ago

Table-based transliterations only have to be set up once; then they can be forgotten...

comment:11 by Jacob, 18 years ago

Resolution: duplicate
Status: closed → reopened

comment:12 by Michael Radziej <mir@…>, 18 years ago

Keywords: reopen removed

comment:13 by David Larlet <larlet@…>, 18 years ago

For the record:
2007/3/1, James Bennett:

On 3/1/07, David Larlet:

What about bug #2282? What's the actual status? A patch is proposed;
it doesn't suit every case but it's a good start for a part of the
world.

It still needs a discussion and decision about how far we want to go
to support (essentially) arbitrary Unicode in slugs. That's come up a
time or two, and there's never been any sort of consensus on a best
practice, so I'd say this probably isn't going to change before 0.96.

Source: http://groups.google.com/group/django-developers/msg/f20dea3b549af9c8?hl=en&

comment:14 by Guilherme M. Gondim (semente) <semente@…>, 18 years ago

Only signing the ticket for mail updates.

comment:15 by Chris Beaven, 18 years ago

Cc: semente@… added

Guilherme, you have to add your address to the CC field. Let me do it for you ;)

comment:16 by Sung-jin Hong, 17 years ago

Cc: serialx.net@… added

What about the CJK languages?

I think something must be done with the CJK languages separately.

in reply to:  16 comment:17 by anonymous, 17 years ago

Replying to serialx:

What about the CJK languages?

I think something must be done with the CJK languages separately.

As for CJK languages, I prefer to use encodeURI to encode those CJK characters, in function URLify:

    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = encodeURI(s);
    s = s.replace(/[^-\w\s%]/g, '');  // remove unneeded chars #besides# %
    s = s.toLowerCase();             // convert to lowercase

comment:18 by askfor@…, 17 years ago

    s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    s = encodeURI(s);
    s = s.replace(/[^-\w\s%]/g, '');  // remove unneeded chars #besides# %
    s = s.toLowerCase();             // convert to lowercase
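
For reference, here are the same five statements wrapped in a standalone function so they can be tried outside urlify.js. The function name urlifyEncoded is hypothetical; the body is unchanged from the snippet above.

```javascript
// Hypothetical wrapper around the snippet above; not part of urlify.js.
function urlifyEncoded(s) {
  s = s.replace(/^\s+|\s+$/g, ''); // trim leading/trailing spaces
  s = s.replace(/[-\s]+/g, '-');   // convert spaces to hyphens
  s = encodeURI(s);                // percent-encode non-ASCII as UTF-8 bytes
  s = s.replace(/[^-\w\s%]/g, ''); // remove unneeded chars besides %
  s = s.toLowerCase();             // convert to lowercase (hex digits included)
  return s;
}
```

ASCII titles behave as before, while each CJK character becomes three percent-encoded bytes, so the resulting slug is long and not human-readable. Whether a given slug field accepts the '%' character depends on its validation, which this sketch does not address.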

comment:19 by Malcolm Tredinnick, 17 years ago

Resolution: wontfix
Status: reopened → closed

We've debated this seemingly endlessly on django-dev. Realistically, continuing to tweak urlify.js is not an option: it will become ridiculously large and unmanageable.

This is a helper function for doing simple transliteration where simple cases are possible. It is not a solution to all the world's localisation problems and is not intended to be. If a particular locale cannot be handled via simple character mapping (and East Asian languages are an obvious example of those that can't be), then using automatic slug generation via this filter is simply not an option for those locales. Fortunately, even if automatic generation is turned on, it can be manually overridden by somebody using the admin, so it isn't a show-stopper.

Let's leave urlify.js doing its current job; people who want extra features can write them as separate JavaScript files and include them directly into admin and on the field. Trying to achieve or expect too much with a simple helper is unrealistic.
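
A separate file of the kind described above might look like the following sketch. It assumes the bundled helper is exposed as a global function URLify(s, num_chars) and that this file is loaded after it; the file name custom_urlify.js and the accent-stripping step are illustrative choices, not something the ticket prescribes.

```javascript
// custom_urlify.js (hypothetical): load after the bundled urlify.js.
(function (global) {
  // Keep a reference to the bundled helper, if one was loaded.
  const bundled = global.URLify;

  global.URLify = function (s, numChars) {
    // Strip accents first so the rest of the pipeline sees plain ASCII letters.
    const ascii = String(s).normalize('NFD').replace(/[\u0300-\u036f]/g, '');
    if (typeof bundled === 'function') {
      return bundled(ascii, numChars); // delegate to the bundled behaviour
    }
    // Minimal fallback when no bundled helper is present (assumption).
    const slug = ascii
      .toLowerCase()
      .replace(/[^-\w\s]/g, '')   // remove unneeded chars
      .replace(/^\s+|\s+$/g, '')  // trim leading/trailing spaces
      .replace(/[-\s]+/g, '-');   // convert spaces to hyphens
    return typeof numChars === 'number' ? slug.substring(0, numChars) : slug;
  };
})(typeof globalThis !== 'undefined' ? globalThis : this);
```

Because the replacement wraps rather than copies the bundled function, the stock behaviour is left untouched for titles that contain no accents.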

comment:20 by Mitar, 14 years ago

Cc: mmitar@… added
Easy pickings: unset
UI/UX: unset

I have made a slugify2 function which first downcodes and then translates to a slug. It behaves exactly the same as its JavaScript counterpart (which can be used to override the bundled function simply by loading it later in the HTML). So now it is possible to have the same behavior in both Python and JavaScript.
