#16501 closed New feature (fixed)
Add option to accept unicode characters in SlugField
Reported by: | norn | Owned by: | Piotr Banaszkiewicz |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | unicode, slug |
Cc: | kwadrat | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
I patch every new version of my django installation and wondering why nobody fixed this in branch tree.
The problem is extremely simple - I like to use slug in my native language (and google likes the same). To fix the problem we just need to do this in validators.py:
slug_re = re.compile(r'^[-\w]+$',flags=re.U)
instead of this
slug_re = re.compile(r'^[-\w]+$')
The solution is simple. And the same we should do with URLValidator.
Attachments (1)
Change History (29)
comment:2 by , 13 years ago
The world is not english speaking only.
Why do we have to do some magic instead of using out-of-the-box functionality?
follow-up: 4 comment:3 by , 13 years ago
Non-ASCII letters are completely valid in URL schemes. The whole point of a slug isn't to contain ASCII letters. It is merely to provide a human-readable URL entry. There are plenty of humans who read outside of ASCII. Assuming the fix can be made backwards compatible, I see no reason not to have it.
Also, aaugustin, your tone came off a little combative/hostile. I'm sure you didn't mean it that way, but that's how it came out. Just a heads up.
follow-up: 5 comment:4 by , 13 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
Replying to anonymous:
The world is not english speaking only.
Sure — as a native speaker of a language that uses accents heavily, I agree completely :)
Why do we have to do some magic instead of using out-of-the-box functionality?
Well, if you don't like the out-of-the-box functionality that everyone else uses, it sounds reasonable to write you own variant. There's hardly any magic in my suggestion.
Replying to justinlilly:
Non-ASCII letters are completely valid in URL schemes.
Yes, since RFC 3986, it's possible to use non-ASCII characters in URLs without ambiguities: the charset must be UTF-8.
That RFC was published is 2005. Browser vendors may not have changed it immediately (historically, most browsers defaulted to latin1 in western languages), and some people still use browsers from a few years ago. I don't know exactly what's the current status. We need to check how mainstream browsers react to an URL like "http://localhost/how-to-brew-café/" before proceeding: do they properly utf-8-encode and percent-encode it?
The whole point of a slug isn't to contain ASCII letters. It is merely to provide a human-readable URL entry. There are plenty of humans who read outside of ASCII.
I'm a bit confused by this. Django provides a SlugField
that is designed to contain an URL-friendly version of a CharField
. For instance, the title of a blog post could be "How to brew café" (in the CharField
) and the slug would be "how-to-brew-cafe" (in the SlugField
). If you assume that an URL can contain any text, you don't need SlugField
at all; you can just use the original title.
A drawback is that the actual, correct URL is: http://localhost/how-to-brew-caf%C3%A9/. Some browsers may percent-decode and utf-8-decode this to display it as: http://localhost/how-to-brew-café/ — I think at least Firefox does. I don't know about IE. If it turns out we have to choose between http://localhost/how-to-brew-caf%C3%A9/ and http://localhost/how-to-brew-cafe/, I think the latter is more readable.
Also, aaugustin, your tone came off a little combative/hostile. I'm sure you didn't mean it that way, but that's how it came out. Just a heads up.
Thanks for the hint and sorry -- the tone of the original report got a bit on my nerves. Comment 2 is great too... :)
comment:5 by , 13 years ago
Replying to aaugustin:
Replying to justinlilly:
Non-ASCII letters are completely valid in URL schemes.
Yes, since RFC 3986, it's possible to use non-ASCII characters in URLs without ambiguities: the charset must be UTF-8.
That RFC was published is 2005. Browser vendors may not have changed it immediately (historically, most browsers defaulted to latin1 in western languages), and some people still use browsers from a few years ago. I don't know exactly what's the current status. We need to check how mainstream browsers react to an URL like "http://localhost/how-to-brew-café/" before proceeding: do they properly utf-8-encode and percent-encode it?
Well, why should we fear of browser incompatibility to pre-2005 standards in Django to be released in 2011/2012?
Django 1.0 to 1.3 do not allow utf8 in slug, but world is changing. If RFC3986 was approved in 2005, then django have to comply it.
I am sorry if my tone seems not friendly. English is not my native language, but I am definitely do not want to offense anybody. I am just trying to solve the problem and communicate in the most effecient way without formal shell. Peace!
comment:6 by , 13 years ago
Regarding browser support, the current consensus (per a recent discussion on the mailing list) is to drop IE6 and start at IE7 — released in 2006. We have to live with the fact that many developers use Django to create websites for end users that aren't on the cutting edge of technology :)
In my opinion, the real issue here isn't the date of the RFC. It's the support in mainstream browsers (IE, Firefox, Chrome, and to a lesser extent Safari, Opera). I don't know if the RFC was actually implemented in these browsers, and when — many RFCs die unimplemented... That why I suggested that, if someone is interested in this, the first step is to prove that it works correctly in IE >= 7, Firefox >= 3, and the latest Chrome, Safari and Opera.
Then, there's still the question of whether the percent-encoded version is more readable than the ASCII version, but that's a matter of taste, and I'm not the one who makes these decisions :)
comment:7 by , 13 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Marking accepted: SlugField
should be useful in most of the world, not just the English-speaking parts.
(SlugField
probably should take a regex
argument so people could easily change the concept of what a "slug" is. But that's another ticket :)
comment:8 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 13 years ago
Attachment: | slug_unicode_16501.diff added |
---|
comment:9 by , 13 years ago
Has patch: | set |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Type: | Bug → New feature |
comment:10 by , 13 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Closed accidentally, sorry.
comment:11 by , 13 years ago
Cc: | added |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Reviewed - ready for checkin.
comment:12 by , 13 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
I'm bumping this back out of RFC. While I agree wholeheartedly that we need to allow users to choose unicode slugs if they so desire, we can't just arbitrarily change the way this works. Many developers depend on the promise that the slug is ASCII. Arbitrarily switching to allow unicode characters will produce bugs in previously stable code, and may introduce security vulnerabilities.
Django's backwards compatibility policy requires that features like this continue to work as documented. Before this patch is ready to commit, the patch must be re-worked so that the default slug behavior of reducing to ASCII remains. Whether or not the default behavior should change at some point in the future (1.5 or 1.6) is a matter that should be discussed on the django-dev list.
comment:13 by , 12 years ago
Status: | reopened → new |
---|
comment:14 by , 11 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
I would propose to add a UnicodeSlugField instead of introducing an incompatible change, thats why I would propose a patch needs update.
comment:15 by , 11 years ago
One of the requirements in my app is that I need to have non-English slugs for use in URLs. This should pose no problem, since modern browsers support Unicode in URLs, and Django is a framework that supports modern practices as well, right? So I try to do it, but find out that SlugField
only works with ASCII by default. This is mildly annoying, but not really a problem, since you can just extend SlugField
, yes? So I look up the source for SlugField
and modify the validator regexp like this:
slug_re = re.compile(r'^[-\w_]+$', re.UNICODE) validate_slug = RegexValidator(slug_re, "Enter a valid name consisting of letters, numbers, underscores or hyphens.", 'invalid') class SlugField(models.SlugField): default_validators = [validate_slug]
As it turns out, it just doesn't work, Unicode alphanumerical characters are still rejected as invalid, and I have not the slightest idea why. A workaround I found was extending directly from CharField
, but having to completely reimplement Django functionality just because I want to use more than ASCII is not very nice at all.
comment:16 by , 11 years ago
I've wrote a patch adding a unicode
option for SlugField. Pull Request is at https://github.com/django/django/pull/1979
comment:17 by , 11 years ago
I've created an alternative pul request that adds db.models.UnicodeSlugField
and forms.UnicodeSlugField
https://github.com/django/django/pull/1987
On one hand, I like the new field approach because it gives us an easier upgrade path. On the other hand, it feels like I'm polluting django.db.models
and django.forms
with slightly different versions of something that's already there.
Unfortunately, just passing a regex argument (as suggested by Jacob) is not going to be enough, because we'll also have to pass an error message string for the validator.
comment:18 by , 10 years ago
An alternate approach suggested by Claude on the mailing list was to "create a new validate_uslug
validator (better name anyone?) and allow a custom validator to be passed to the SlugField constructor." We need someone to compare these three approaches (at least) and find a consensus on the DevelopersMailingList in order to move this issue forward.
comment:19 by , 10 years ago
Well, I realized that 3rd approach is more or less the same as the PR linked in comment 16.
I created a django-developers thread to get feedback on the two approaches.
comment:20 by , 10 years ago
Summary: | validators.py don't like unicode slug → Add option to accept unicode characters in SlugField |
---|
comment:21 by , 10 years ago
comment:22 by , 10 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
Version: | 1.3 → master |
comment:23 by , 10 years ago
Patch needs improvement: | set |
---|
comment:24 by , 9 years ago
Submitted a new PR #4518 to replace PR #3729.
The new pull request is rebased onto the latest master, and fixes some items discussed in the previous pull request.
There is one item that needs to be reviwed.
django/contrib/admin/options.py:1744
@property def media(self): extra = '' if settings.DEBUG else '.min' js = ['jquery%s.js' % extra, 'jquery.init.js', 'inlines%s.js' % extra] # TODO: Was removed in master, modified in patch. if self.prepopulated_fields: js.extend(['xregexp.min.js', 'urlify.js', 'prepopulate%s.js' % extra]) if self.filter_vertical or self.filter_horizontal: js.extend(['SelectBox.js', 'SelectFilter2.js']) return forms.Media(js=[static('admin/js/%s' % url) for url in js])
I am not sure what to do with the updated code.
Congrats, you have found the very best way to piss off the maintainers of an open-source project in your first sentence :)
You may want to read the FAQ: https://docs.djangoproject.com/en/1.3/faq/contributing/#faq-contributing-code
A bug won't be fixed if you don't submit a patch: https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/submitting-patches/
Now, regarding your report, out of the 4 files called
validators.py
in Django, you're referring to the one indjango/core
. You're saying that the built-in validatordjango.core.validators.validate_slug
should accept unicode characters.This function is currently documented as:
We could debate if "letters" means "ASCII letters" or "Unicode characters that have some property that says they're a letter". Currently, it means "ASCII letters".
validate_slug
is used in only one place in Django, as the default validator forSlugField
.So, the real problem here is that
SlugField
will only accept ASCII letters.In my opinion, the whole point of a slug is to contain only ASCII characters, to make sure URLs built with slugs have no charset issues, no matter where they're copy-pasted (URL bars, email, IRC, IM, etc.)
If the
SlugField
provided by Django doesn't meet your expectations, you can easily build your own:I'm leaning towards WONTFIX, but I'd like someone else to confirm my analysis before closing the ticket.