#3690 closed (fixed)

the smart_unicode in newforms/util.py shouldn't assume utf-8 encoded strings in utf-8 environment

Reported by:	fback+django@…	Owned by:	Adrian Holovaty
Component:	Forms	Version:	dev
Severity:		Keywords:	unicode-branch
Cc:		Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	yes	Patch needs improvement:	yes
Easy pickings:	no	UI/UX:	no

Description

In a UTF-8 environment:

>>> from django import newforms as forms
>>> f = forms.CharField()
>>> f.clean('aaa')
u'aaa'
>>> f.clean('ąąą')  <---- these are latin2-encoded characters, not UTF-8
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/newforms/fields.py", line 99, in clean
    value = smart_unicode(value)
  File "/usr/lib/python2.4/site-packages/django/newforms/util.py", line 15, in smart_unicode
    s = unicode(s, settings.DEFAULT_CHARSET)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
>>>

Django should not trust anything that the browser sends. It is trivial (and common in non-latin1 countries) to set a custom
encoding instead of auto-detect; this helps with misconfigured web servers that declare latin1 but in fact send
locally encoded characters (e.g. latin2, KOI8-R, ...). Such input currently causes 500 server errors.

Maybe the solution would be to catch UnicodeError and decode the incoming string like this:

>>> "ąąą".decode('ascii', 'replace')  <--- this is again latin2-encoded, not UTF-8
u'\ufffd\ufffd\ufffd'
>>> print "ąąą".decode('ascii', 'replace')
���
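
On Python 3, where byte strings and text are separate types, the catch-and-replace idea above might look roughly like the sketch below. The helper name decode_or_replace is hypothetical and not part of Django, and its charset argument only stands in for settings.DEFAULT_CHARSET.

```python
def decode_or_replace(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw bytes; fall back to U+FFFD substitution on failure."""
    try:
        return raw.decode(charset)
    except UnicodeError:
        # Undecodable bytes become U+FFFD instead of raising.
        return raw.decode(charset, errors="replace")


# 'ą' is the single byte 0xb1 in latin2 (ISO-8859-2), as in the traceback.
latin2_bytes = "ąąą".encode("iso-8859-2")

print(ascii(decode_or_replace(latin2_bytes)))  # '\ufffd\ufffd\ufffd'
print(decode_or_replace(b"aaa"))               # aaa
```

The request no longer fails, but the original characters are lost, which is the trade-off of this approach.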

Another solution could be to add a FALLBACK_CHARSET setting that would be used to decode the string when the default charset fails.
This setting could be defined per language in i18n environments, so one could use latin1 for de and fr,
latin2 for cz and pl, and other encodings as appropriate.
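
A minimal Python 3 sketch of the per-language fallback idea, under stated assumptions: FALLBACK_CHARSETS and decode_with_fallback are hypothetical names (no such setting exists in Django), and the mapping uses the language code cs for Czech.

```python
# Hypothetical per-language fallback table; only proposed in this ticket.
FALLBACK_CHARSETS = {
    "de": "iso-8859-1",
    "fr": "iso-8859-1",
    "cs": "iso-8859-2",
    "pl": "iso-8859-2",
}


def decode_with_fallback(raw: bytes, language: str,
                         default_charset: str = "utf-8") -> str:
    """Try the default charset first, then the language's fallback."""
    try:
        return raw.decode(default_charset)
    except UnicodeDecodeError:
        fallback = FALLBACK_CHARSETS.get(language)
        if fallback is None:
            # No fallback known for this language: substitute U+FFFD.
            return raw.decode(default_charset, errors="replace")
        return raw.decode(fallback, errors="replace")


# latin2 input from a Polish user decodes correctly via the fallback.
print(decode_with_fallback("ąąą".encode("iso-8859-2"), "pl"))
```

Because a single-byte charset such as latin2 accepts almost any byte sequence, the fallback only yields correct text when the guess about the user's encoding is right.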

Regards,
fback

Attachments (1)

util.py.diff (483 bytes) - added by fback+django@… 19 years ago: proposed patch

Change History (6)

by fback+django@…, 19 years ago

Attachment:	util.py.diff added

proposed patch

comment:1 by fback+django@…, 19 years ago

Has patch:	set
Patch needs improvement:	set

This is the easiest patch that solves it.

Problems:

- This will work for European languages, but not for Russian or Asian languages.
- After discussion on #django we did not agree on whether this should be done here or in some middleware.
- This could be improved in two ways: an alternative encoding could be passed as an additional argument to smart_unicode(), or it could try to guess the incoming encoding.
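
The first and third points can be illustrated with a short Python 3 sketch. It assumes, since the attached patch is not shown here, that the fallback is a single-byte Latin charset; smart_decode and its parameters are hypothetical names, not the real smart_unicode() signature.

```python
from typing import Optional


def smart_decode(raw: bytes, encoding: Optional[str] = None,
                 default: str = "utf-8",
                 fallback: str = "iso-8859-2") -> str:
    """Decode with an explicit encoding if given, else default then fallback."""
    if encoding is not None:
        return raw.decode(encoding, errors="replace")
    try:
        return raw.decode(default)
    except UnicodeDecodeError:
        # A single-byte fallback rarely fails, even when it is wrong.
        return raw.decode(fallback, errors="replace")


cyrillic = "привет"
raw = cyrillic.encode("koi8_r")

# No exception is raised, but the result is mojibake, not the original.
print(smart_decode(raw) == cyrillic)                     # False
# Passing the real encoding as an extra argument recovers the text.
print(smart_decode(raw, encoding="koi8_r") == cyrillic)  # True
```

The Latin fallback silently produces wrong text for KOI8-R input, which is why a fixed fallback does not generalize beyond the languages it was chosen for.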

comment:2 by Simon G. <dev@…>, 19 years ago

Needs tests:	set
Triage Stage:	Unreviewed → Accepted

comment:3 by Malcolm Tredinnick, 19 years ago

Keywords:	unicode-branch added

This was fixed on the unicode branch in [5197], in a different way to what is given here: we fix it at the source (input), rather than later on.

I will close the ticket when the branch is merged back into trunk.

comment:4 by Malcolm Tredinnick, 19 years ago

Resolution:	→ fixed
Status:	new → closed

(In [5609]) Merged Unicode branch into trunk (r4952:5608). This should be fully
backwards compatible for all practical purposes.

Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

comment:12 by Martin v. Löwis, 15 years ago

In [16948]:

Dummy merge.
Merged revisions 5609-5612,5614-5626,5629-5632,5636,5638-5646,5649-5654,5658-5660,5662-5700 via svnmerge from
https://code.djangoproject.com/svn/django/trunk

........

r5609 | mtredinnick | 2007-07-04 14:11:04 +0200 (Mi, 04 Jul 2007) | 5 lines

Merged Unicode branch into trunk (r4952:5608). This should be fully
backwards compatible for all practical purposes.

Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

........

r5610 | mtredinnick | 2007-07-04 14:25:43 +0200 (Mi, 04 Jul 2007) | 3 lines

Fixed Javascript syntax from [5608] that was causing a problem in Opera. Fixed
#4365.

........

r5611 | mtredinnick | 2007-07-04 14:31:19 +0200 (Mi, 04 Jul 2007) | 3 lines

Fixed #4766 -- Added Russian support to Javascript slug creation. Thanks,
boobsd@….

........

r5612 | mtredinnick | 2007-07-04 14:48:12 +0200 (Mi, 04 Jul 2007) | 2 lines

Fixed some ReST errors.

........

r5614 | mtredinnick | 2007-07-05 03:25:05 +0200 (Do, 05 Jul 2007) | 3 lines

Form encoding should be changed only via HttpRequest, not on GET and POST
directly.

........

r5615 | mtredinnick | 2007-07-05 05:25:11 +0200 (Do, 05 Jul 2007) | 2 lines

Fixed #4717 -- Updated Catalan translation. Thanks, marc.garcia@….

........

r5616 | mtredinnick | 2007-07-05 05:29:18 +0200 (Do, 05 Jul 2007) | 3 lines

Fixed #4753 -- Updated Spanish translation. Also move translators' names out of
PO file and into AUTHORS.

........

r5617 | mtredinnick | 2007-07-05 12:27:22 +0200 (Do, 05 Jul 2007) | 3 lines

Added a test that shows the problem in #4470. This fails only for the mysql_old
backend. Refs #4470.

........

r5618 | mtredinnick | 2007-07-05 13:08:40 +0200 (Do, 05 Jul 2007) | 3 lines

Added CACHE_MIDDLEWARE_SECONDS to global settings and documentation (it's
used by the cache middleware). Refs #1015.

........

r5619 | mtredinnick | 2007-07-05 13:10:27 +0200 (Do, 05 Jul 2007) | 5 lines

Fixed #1015 -- Fixed decorator_from_middleware to return a real decorator even
when arguments are given. This looks a bit ugly, but it's fully backwards
compatible and all the extra work is done at import time, so it shouldn't have
any real performance impact.

........

r5620 | russellm | 2007-07-05 14:54:42 +0200 (Do, 05 Jul 2007) | 2 lines

Fixed minor typo in assertion message.

........

r5621 | gwilson | 2007-07-06 06:04:42 +0200 (Fr, 06 Jul 2007) | 2 lines

Fixed #4779 -- Fixed a couple typos in the test_client_regress tests that surfaced when typo was corrected in [5620]. Thanks ferringb@….

........

r5622 | mtredinnick | 2007-07-06 08:53:27 +0200 (Fr, 06 Jul 2007) | 2 lines

Fixed #4781 -- Typo fix. Pointed out by Simon Litchfield.

........

r5623 | mtredinnick | 2007-07-06 10:04:04 +0200 (Fr, 06 Jul 2007) | 4 lines

Fixed #4770 -- Fixed some Unicode conversion problems in the mysql_old backend
with old MySQLdb versions. Tested against 1.2.0, 1.2.1 and 1.2.1p2 with only
expected failures.

........

r5624 | mtredinnick | 2007-07-06 10:35:25 +0200 (Fr, 06 Jul 2007) | 3 lines

Fixed #4782 -- Updated Slovenian translation. Thanks, Gasper Koren. Also moved
contributor names into AUTHORS file.

........

r5625 | mtredinnick | 2007-07-06 12:21:14 +0200 (Fr, 06 Jul 2007) | 4 lines

Fixed #4776 -- Fixed a problem with handling of upload_to attributes. The new
solution still works with non-ASCII filenames. Based on a patch from
mike.j.thompson@….

........

r5626 | russellm | 2007-07-07 04:16:23 +0200 (Sa, 07 Jul 2007) | 2 lines

Added some uncredited authors that worked on the Oracle branch.

........

r5629 | mtredinnick | 2007-07-07 19:15:54 +0200 (Sa, 07 Jul 2007) | 8 lines

Changed HttpRequest.path to be a Unicode object. It has already been
URL-decoded by the time we see it anyway, so keeping it as a UTF-8 bytestring
was causing unnecessary problems.

Also added handling for non-ASCII URL fragments in feed creation (the portion
that was outside the control of the Feed class was messed up).

........

r5630 | mtredinnick | 2007-07-07 20:24:27 +0200 (Sa, 07 Jul 2007) | 4 lines

Fixed #4772 -- Fixed reverse URL creation to work with non-ASCII arguments.
Also included a test for non-ASCII strings in URL patterns, although that
already worked correctly.

........

r5631 | mtredinnick | 2007-07-07 20:39:23 +0200 (Sa, 07 Jul 2007) | 3 lines

Corrected misleading comment from [5619]. Not sure what I was smoking at the
time.

........

r5632 | mtredinnick | 2007-07-08 02:39:32 +0200 (So, 08 Jul 2007) | 5 lines

Fixed reverse URL lookup using functions when the original URL pattern was a
string. This is now just as fragile as it was prior to [5609], but works in a
few cases that people were relying on, apparently.

........

r5636 | mtredinnick | 2007-07-08 13:22:53 +0200 (So, 08 Jul 2007) | 4 lines

Fixed #4798-- Made sure that function keyword arguments are strings (for the
keywords themselves) when using Unicode URL patterns.

........

r5638 | gwilson | 2007-07-10 04:34:42 +0200 (Di, 10 Jul 2007) | 2 lines

Fixed #4817 -- Removed leading forward slashes from some urlconf examples in the documentation.

........

r5639 | gwilson | 2007-07-10 04:45:11 +0200 (Di, 10 Jul 2007) | 2 lines

Fixed #4814 -- Fixed some whitespace issues in tutorial01, thanks John Shaffer.

........

r5640 | gwilson | 2007-07-10 05:26:26 +0200 (Di, 10 Jul 2007) | 2 lines

Fixed #4812 -- Fixed an octal escape in regular expression that is used in the isValidEmail validator, thanks batchman@….

........

r5641 | mtredinnick | 2007-07-10 14:02:06 +0200 (Di, 10 Jul 2007) | 3 lines

Fixed #4823 -- Fixed a Python 2.3 incompatibility from [5636] (it was even
demonstrated by existing tests, so I really screwed this up).

........

r5642 | mtredinnick | 2007-07-10 14:03:36 +0200 (Di, 10 Jul 2007) | 3 lines

Fixed #4804 -- Fixed a problem when validating choice lists with non-ASCII
data. Thanks, django@….

........

r5643 | mtredinnick | 2007-07-10 14:33:55 +0200 (Di, 10 Jul 2007) | 4 lines

Fixed #3760 -- Added the ability to manually set feed- and item-level id
elements in Atom feeds. This is fully backwards compatible. Based on a patch
from spark343@….

........

r5644 | mtredinnick | 2007-07-11 08:55:12 +0200 (Mi, 11 Jul 2007) | 3 lines

Fixed #4815 -- Fixed decoding of request parameters when the input encoding is
not UTF-8. Thanks, Jordan Dimov.

........

r5645 | mtredinnick | 2007-07-11 09:00:27 +0200 (Mi, 11 Jul 2007) | 3 lines

Fixed #4802 -- Updated French translation. Combined contribution from
baptiste.goupil@… and rocherl@….

........

r5646 | mtredinnick | 2007-07-11 09:12:50 +0200 (Mi, 11 Jul 2007) | 2 lines

Fixed #4753 -- Small update to Spanish translation from Mario Gonzalez.

........

r5649 | jacob | 2007-07-12 02:33:44 +0200 (Do, 12 Jul 2007) | 1 line

Fixed #4615: corrected reverse URL resolution examples in tutorial 4. Thanks for the patch, simeonf.

........

r5650 | adrian | 2007-07-12 06:43:29 +0200 (Do, 12 Jul 2007) | 1 line

Added 'New in Django development version' note to docs/syndication_feeds.txt changes from [5643]

........

r5651 | adrian | 2007-07-12 06:44:45 +0200 (Do, 12 Jul 2007) | 1 line

Edited changes to docs/tutorial04.txt from [5649]

........

r5652 | adrian | 2007-07-12 07:23:47 +0200 (Do, 12 Jul 2007) | 1 line

Added helpful error message to SiteManager.get_current() if the user hasn't set SITE_ID

........

r5653 | adrian | 2007-07-12 07:28:04 +0200 (Do, 12 Jul 2007) | 1 line

Added RequestSite class to sites framework

........

r5654 | adrian | 2007-07-12 07:29:32 +0200 (Do, 12 Jul 2007) | 1 line

Improved syndication feed framework to use RequestSite if the sites framework is not installed -- i.e., the sites framework is no longer required to use the syndication feed framework. This is backwards incompatible if anybody has subclassed Feed and overridden init(), because the second parameter is now expected to be an HttpRequest object instead of request.path

........

r5658 | russellm | 2007-07-12 09:45:35 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4459 -- Added 'raw' argument to save method, to override any pre-save processing, and modified serializers to use a raw-save. This enables serialization of DateFields with auto_now/auto_now_add. Also modified serializers to invoke save() directly on the model baseclass, to avoid any (potentially order-dependent, data modifying) behavior in a custom save() method.

........

r5659 | russellm | 2007-07-12 13:24:16 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #3770 -- Remove null=True tag from OneToOne serialization test. OneToOne fields can't have a value of null.

........

r5660 | russellm | 2007-07-12 13:27:38 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #3768 -- Disabled NullBooleanField PK serialization test. We can't and don't test null PK values.

........

r5662 | russellm | 2007-07-12 14:33:24 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4837 -- Updated Debian packaging details. Thanks for the suggestion, Yasushi Masuda <whosaysni@…>.

........

r5663 | russellm | 2007-07-12 14:44:05 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4808 -- Added Chilean regions in localflavor. Thanks, Marijn Vriens <marijn@…>.

........

r5664 | russellm | 2007-07-12 14:48:27 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4745 -- Updated docs to point out that 0 is not a valid SITE_ID when running the tests. Thanks for the suggestion, Lars Stavholm <stava@…>.

........

r5665 | russellm | 2007-07-12 14:50:02 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4763 -- Minor typo in cache documentations. Thanks, dan@….

........

r5666 | russellm | 2007-07-12 14:55:28 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4627 -- Added details on MacPorts packaging of Django. Thanks, Paul Bissex.

........

r5667 | russellm | 2007-07-12 15:23:11 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4640 -- Fixed import to stringfilter in docs. Proposed solution to move stringfilter into django.template.init introduces a circular import problem.

........

r5668 | russellm | 2007-07-12 15:32:00 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4722 -- Clarified discussion about PYTHONPATH in modpython docs. Thanks for the suggestion, Collin Grady <cgrady@…>.

........

r5669 | russellm | 2007-07-12 15:37:59 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4755 -- Modified newforms MultipleChoiceField to use list comprehension, rather than iteration.

........

r5670 | russellm | 2007-07-12 15:41:27 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4764 -- Added reference to Locale middleware in middleware docs. Thanks, dan@….

........

r5671 | russellm | 2007-07-12 15:55:19 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4768 -- Converted timesince and dateformat to use explicit floor division (pre-emptive avoidance of Python 3000 compatibility problem), and removed a redundant millisecond check. Thanks, John Shaffer <jshaffer2112@…>.

........

r5672 | russellm | 2007-07-12 16:00:13 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4775 -- Added some missing Hungarian accents to the urlify.js LATIN_MAP. Thanks, Pistahh <szekeres@…>.

........

r5673 | russellm | 2007-07-12 16:05:16 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4502 -- Clarified reference to view in tutorial. Thanks for the suggestion, Carl Karsten <carl@…>.

........

r5674 | russellm | 2007-07-12 16:11:41 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4522 -- Clarified the allowed filter arguments on the time and date filters. Thanks for the suggestion, admackin@….

........

r5675 | russellm | 2007-07-12 16:21:51 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4525 -- Fixed mistaken documentation on arguments to runfcgi. Thanks, Johan Bergstrom <bugs@…>.

........

r5676 | russellm | 2007-07-12 16:41:32 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4538 -- Split the installation instructions to differentiate between installing a distribution package and installing an official release. Thanks to Carl Karsten for the idea, and Paul Bissex for the patch.

........

r5677 | russellm | 2007-07-12 17:26:37 +0200 (Do, 12 Jul 2007) | 2 lines

Fixed #4526 -- Modified the test Client login method to fail when a user is inactive. Thanks, marcin@….

........

r5678 | russellm | 2007-07-13 07:03:33 +0200 (Fr, 13 Jul 2007) | 2 lines

Fixed #3505 -- Added handling for the error raised when the user forgets the comma in a single element tuple when defining AUTHENTICATION_BACKENDS. Thanks for the help identifying this problem, Mario Gonzalez <gonzalemario@…>.

........

r5679 | mtredinnick | 2007-07-13 10:52:07 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #2591 -- Fixed a problem with inspectdb with psycopg2 (only). Patch from
Gary Wilson.

........

r5680 | mtredinnick | 2007-07-13 11:09:59 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4807 -- Fixed a couple of corner cases in decimal form input validation.
Based on a suggestion from Chriss Moffit.

........

r5681 | mtredinnick | 2007-07-13 11:14:51 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4839 -- Added repr methods to URL classes that show the pattern they
contain. Thanks, Thomas Güttler.

........

r5682 | mtredinnick | 2007-07-13 12:56:30 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4842 -- Added slightly more robust error reporting. Thanks, Thomas
Güttler.

........

r5683 | mtredinnick | 2007-07-13 13:05:01 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4846 -- Fixed some Python 2.3 encoding problems in the admin interface.
Based on a patch from daybreaker12@….

........

r5684 | mtredinnick | 2007-07-13 14:03:20 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4861 -- Removed some duplicated logic from the newforms RegexField by
making it a subclass of CharField. Thanks, Collin Grady.

........

r5685 | mtredinnick | 2007-07-13 15:15:35 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4865 -- Replaced a stray generator comprehension with a list
comprehension so that we don't break Python 2.3.

........

r5686 | mtredinnick | 2007-07-13 16:13:35 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4469 -- Added slightly more informative error messages to max- and
min-length newform validation. Based on a patch from A. Murat Eren.

........

r5687 | mtredinnick | 2007-07-13 16:14:47 +0200 (Fr, 13 Jul 2007) | 2 lines

Added author credit for [5686]. Refs #4469.

........

r5688 | mtredinnick | 2007-07-13 16:33:46 +0200 (Fr, 13 Jul 2007) | 3 lines

Fixed #4484 -- Fixed APPEND_SLASH handling to handle an empty path value.
Thanks, VesselinK.

........

r5689 | mtredinnick | 2007-07-13 16:40:39 +0200 (Fr, 13 Jul 2007) | 2 lines

Fixed #4556 -- Stylistic changes to [5500]. Thanks, glin@….

........

r5690 | gwilson | 2007-07-13 22:36:01 +0200 (Fr, 13 Jul 2007) | 2 lines

Refs #2591 -- Removed int conversion and try/except since the value in the single-item list is already an int. I overlooked this in my original patch, which was applied in [5679].

........

r5691 | adrian | 2007-07-13 23:20:07 +0200 (Fr, 13 Jul 2007) | 1 line

Documented the 'commit' argument to save() methods on forms created via form_for_model() or form_for_instance()

........

r5692 | mtredinnick | 2007-07-14 07:27:22 +0200 (Sa, 14 Jul 2007) | 3 lines

Fixed #4869 -- Added a note that syncdb does not alter existing tables. Thanks,
James Bennett.

........

r5693 | mtredinnick | 2007-07-14 14:48:24 +0200 (Sa, 14 Jul 2007) | 3 lines

Fixed #4863 -- Removed comment references to a no-longer present link. Pointed
out by Thomas Güttler.

........

r5694 | mtredinnick | 2007-07-14 15:14:28 +0200 (Sa, 14 Jul 2007) | 2 lines

Fixed #4862 -- Fixed invalid Javascript creation in popup windows in admin.

........

r5695 | mtredinnick | 2007-07-14 15:39:41 +0200 (Sa, 14 Jul 2007) | 2 lines

Fixed a problem with translatable strings from [5686].

........

r5696 | mtredinnick | 2007-07-14 16:47:14 +0200 (Sa, 14 Jul 2007) | 3 lines

Fixed #4731 -- Changed management.setup_environ() so that it no longer assumes
the settings module is called "settings". Patch from SmileyChris.

........

r5697 | mtredinnick | 2007-07-14 16:50:35 +0200 (Sa, 14 Jul 2007) | 3 lines

Fixed #4870 -- Removed unneeded import and fixed a docstring in an example.
Thanks, Collin Grady.

........

r5698 | adrian | 2007-07-14 18:58:54 +0200 (Sa, 14 Jul 2007) | 1 line

Edited docs/db-api.txt changes from [5658]

........

r5699 | adrian | 2007-07-14 19:04:30 +0200 (Sa, 14 Jul 2007) | 1 line

Negligible capitalization fix in test/client.py docstring

........

r5700 | russellm | 2007-07-15 06:41:59 +0200 (So, 15 Jul 2007) | 2 lines

Clarified the documentation on the steps that happen during a save, and how raw save affects those steps.

........
