Opened 16 years ago

Closed 16 years ago

#5875 closed (wontfix)

Non-ascii chars in source code causing UnicodeEncodeError for default 500 error handler, resulting in non-descriptive traceback-only error page

Reported by: tt@…
Owned by: nobody
Component: Core (Other)
Version: dev
Severity:
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The project was developed for a pre-unicode Django version. It does not use i18n and has national (non-ASCII) characters in source files saved as ISO-8859-1.

Django is configured with DEFAULT_CHARSET and FILE_CHARSET both set to "iso-8859-1".

After upgrading Django to current SVN, the mod_apache error handler appears, reporting a UnicodeEncodeError raised deep inside Django:

Traceback (most recent call last):

  File "/usr/lib/python2.3/site-packages/django/core/servers/basehttp.py", line 279, in run
    self.finish_response()

  File "/usr/lib/python2.3/site-packages/django/core/servers/basehttp.py", line 317, in finish_response
    for data in self.result:

  File "/usr/lib/python2.3/site-packages/django/http/__init__.py", line 332, in next
    chunk = chunk.encode(self._charset)

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 5735-5737: ordinal not in range(256)

The core problem seems to be that characters outside the allowed range enter Django's unicode-only code path, for example when a model's __str__ method does not return unicode.
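One plausible way such characters can appear is that an ISO-8859-1 bytestring gets decoded as UTF-8 with a lenient error handler. This is an assumption for illustration, since the ticket never pins down the exact path. Each undecodable byte becomes U+FFFD, which lies outside Latin-1, so a strict `chunk.encode(self._charset)` step then fails the same way as in the traceback. The sketch uses Python 3 `bytes`/`str` in place of Python 2 `str`/`unicode`. `encode_chunk` is a hypothetical stand-in for the encoding line shown in `django/http/__init__.py`.

```python
def encode_chunk(chunk: str, charset: str) -> bytes:
    # Hypothetical stand-in for the strict chunk.encode(self._charset) line.
    return chunk.encode(charset)


# "café" as it would be stored in an ISO-8859-1 source file: b"caf\xe9".
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# Decoding those bytes as UTF-8 with errors="replace" turns 0xE9 into U+FFFD.
garbled = latin1_bytes.decode("utf-8", errors="replace")

try:
    encode_chunk(garbled, "iso-8859-1")
    failed = False
except UnicodeEncodeError as exc:
    # U+FFFD (ordinal 65533) cannot be represented in ISO-8859-1.
    failed = True
    print(exc)
```

The correctly decoded text, `"caf\u00e9"`, encodes to ISO-8859-1 without error. The failure only appears once the wrong codec has introduced a character the output charset cannot hold.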

That is obviously a problem in itself, but it should not trip Django up this badly.

Other cases have also been observed, but all involve database or form data being touched by non-unicode strings.

Catching UnicodeEncodeError and falling back to the "replace" error handler at least lets the developer find the offending characters and convert the code to unicode.
I don't know whether this patch does the right thing, but it gives a workable error message.
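For readers unfamiliar with the "replace" mechanism: Python's codec error handlers let `encode()` substitute a placeholder instead of raising. The minimal Python 3 sketch below contrasts the strict behaviour seen in the traceback with `errors="replace"`. It is not the attached patch, whose contents are not reproduced in this ticket.

```python
text = "Price: 5\u20ac"  # EURO SIGN (U+20AC) is not in ISO-8859-1

# Strict encoding (the behaviour in the traceback) raises; record where.
try:
    text.encode("iso-8859-1")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = (exc.start, exc.end)
print("strict:", strict_error)  # (8, 9)

# errors="replace" substitutes "?" for each unencodable character.
replaced = text.encode("iso-8859-1", errors="replace")
print("replace:", replaced)  # b'Price: 5?'
```

The "?" marks where the bad character was, which is what makes this useful for locating it. It is also lossy, and that is the reason for the objection to this approach in the comments.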

See attached patch for details.

Attachments (1)

patch.diff (519 bytes ) - added by tt@… 16 years ago.
patch for unicode exception


Change History (7)

by tt@…, 16 years ago

Attachment: patch.diff added

patch for unicode exception

comment:1 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: new → closed

This is not the right fix. We should be raising errors when bad data is passed in, not quietly disguising it. If your code is generating invalid data, it should be caught. If you want to make this change in your local copy (since the patch doesn't identify the path to the file, I can only guess you're patching django.http.__init__.py), go for it, but this isn't a patch for core.

It's not clear from your description what the root cause of the problem is. Try to construct a small example that demonstrates it, and post to django-users if you need help. Remember that any data you pass to Django must be either unicode objects or UTF-8 encoded bytestrings.
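Here is a minimal sketch of following that rule in a project whose source files are ISO-8859-1. The idea is to decode bytestrings with their real encoding at the boundary, so that only text reaches Django. `to_text`, `SOURCE_ENCODING` and `Customer` are hypothetical names used for illustration. In the Python 2 code of the time, the equivalent would be `value.decode('iso-8859-1')`, and Django's `django.utils.encoding.force_unicode` helper served a similar purpose.

```python
SOURCE_ENCODING = "iso-8859-1"  # hypothetical: the project's legacy source encoding


def to_text(value, encoding: str = SOURCE_ENCODING) -> str:
    """Return text, decoding bytestrings with their real encoding (hypothetical helper)."""
    if isinstance(value, bytes):
        # Default errors="strict": bytes invalid for `encoding` raise instead of being masked.
        return value.decode(encoding)
    return str(value)


class Customer:
    """Hypothetical model-like class holding a legacy ISO-8859-1 bytestring."""

    def __init__(self, name: bytes):
        self.name = name

    def __str__(self) -> str:
        # Return text, never raw bytes, so downstream code can encode it safely.
        return to_text(self.name)


c = Customer("Jos\u00e9".encode("iso-8859-1"))
print(str(c))                  # José
print(str(c).encode("utf-8"))  # b'Jos\xc3\xa9'
```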

comment:2 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: closed → reopened
Triage Stage: Unreviewed → Accepted

Ah, I see: you're talking about what happens when we try to display the 500 error page (which is mentioned in the title but nowhere in the description).

We should force that to use UTF-8 output, for sure. That will require a change to HttpResponse to allow explicit overriding of _charset, but that's pretty easy.
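The sketch below shows the proposed change on a simplified response class. The charset defaults to the site-wide setting but can be overridden per response, so the 500 page can always be encoded as UTF-8. `SketchResponse` and its `charset` parameter are hypothetical. They only mirror the `_charset` attribute and the `chunk.encode(self._charset)` line seen in the traceback, and they are not Django's actual `HttpResponse`.

```python
DEFAULT_CHARSET = "iso-8859-1"  # as in the reporter's settings


class SketchResponse:
    """Hypothetical, simplified stand-in for HttpResponse."""

    def __init__(self, content: str, charset=None):
        self._container = [content]
        # An explicit override wins; otherwise fall back to the site setting.
        self._charset = charset or DEFAULT_CHARSET

    def __iter__(self):
        for chunk in self._container:
            # Same strict encode step as in the traceback.
            yield chunk.encode(self._charset)


# Error-page text containing characters outside ISO-8859-1 (curly quotes).
page = "Traceback ... \u2018quoted\u2019 value"

try:
    b"".join(SketchResponse(page))
    default_failed = False
except UnicodeEncodeError:
    default_failed = True

body = b"".join(SketchResponse(page, charset="utf-8"))
print(default_failed)                # True
print(body.decode("utf-8") == page)  # True
```

Keeping the override explicit, rather than catching the error, matches the preference for strict output. The error page is produced in an encoding that can represent any text, while ordinary responses still fail loudly on bad data.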

comment:3 by tt@…, 16 years ago

Well, I do allude to it in the description ("this badly", "mod_apache error handler" and so on), but I could have been clearer, sure.

Anyway, yes, Django choking on its own error handler is the concrete problem.
Will serving the error page as UTF-8 resolve the problems with non-unicode strings used in the source?

From my point of view, the priority is handling this error scenario more gracefully than now.
Django should of course clearly report encoding issues rather than make a best-effort guess and get it wrong, but better error handling would be nice to have.

comment:4 by Malcolm Tredinnick, 16 years ago

By the way, we'll still need an example of how to reproduce this problem. There's already quite a lot of tolerance for displaying invalid characters in the technical_500_handler() function, so if you're able to cause the crash with valid code in current subversion trunk, please post a short example of how to do this.

I'm not inclined to move to any kind of softer error handling, such as using "replace" or "ignore" anywhere there. We should never be sending out bad data, and if we are, it must be caught somewhere. We are permissive in what we accept (form input, etc.) and strict in what we produce. Best practice.

comment:5 by tt@…, 16 years ago

I am sorry, but I seem to be unable to produce a good test case against current SVN.

The code that triggered the report is being ported from 0.96 to current SVN and did some interesting things with non-ASCII characters, but I am not able to replicate that in a small test case.

You can probably close this bug for now, as this might be more a problem in our code than in Django.
If I can pin this down with a good test case, should I reopen this bug or open a new one?

comment:6 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: reopened → closed

Reopen if you find a testcase, by all means. I'm interested in solving problems that occur, but we need to be able to identify what they are first. :-)
