Opened 5 months ago

Closed 4 months ago

Last modified 4 months ago

#21574 closed Bug (fixed)

Different behaviour in Python 2 and 3 when normalizing newlines with django.utils.text.normalize_newlines

Reported by:     vajrasky      Owned by:                 vajrasky
Component:       Utilities     Version:                  master
Severity:        Normal        Keywords:
Cc:              sky.kok@…     Triage Stage:             Unreviewed
Has patch:       no            Needs documentation:      no
Needs tests:     no            Patch needs improvement:  no
Easy pickings:   yes           UI/UX:                    no

Description

Python 3.3:

>>> from django.utils.text import normalize_newlines
>>> normalize_newlines("abc\r\ndef")
'abc\ndef'
>>> normalize_newlines(b"abc\r\ndef")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/sky/Code/python/env/django/lib/python3.3/site-packages/django/utils/functional.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/home/sky/Code/python/env/django/lib/python3.3/site-packages/django/utils/text.py", line 252, in normalize_newlines
    return force_text(re.sub(r'\r\n|\r|\n', '\n', text))
  File "/usr/lib64/python3.3/re.py", line 170, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: can't use a string pattern on a bytes-like object
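
The failure does not depend on Django. A minimal sketch, assuming only the standard re module (the helper name sub_raises_type_error is hypothetical and used only for illustration), suggests that Python 3 refuses to apply a str pattern to a bytes object, while matching pattern and input types are accepted:

```python
import re

# Same pattern as in the traceback above, once as text and once as bytes.
TEXT_PATTERN = r'\r\n|\r|\n'
BYTES_PATTERN = br'\r\n|\r|\n'


def sub_raises_type_error(pattern, repl, value):
    """Return True if re.sub() rejects this pattern/value combination."""
    try:
        re.sub(pattern, repl, value)
    except TypeError:
        return True
    return False


# A str pattern applied to bytes is rejected under Python 3 ...
print(sub_raises_type_error(TEXT_PATTERN, '\n', b"abc\r\ndef"))
# ... while matching pattern and input types are accepted.
print(sub_raises_type_error(TEXT_PATTERN, '\n', "abc\r\ndef"))
print(sub_raises_type_error(BYTES_PATTERN, b'\n', b"abc\r\ndef"))
```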

Python 2.7:

>>> from django.utils.text import normalize_newlines
>>> normalize_newlines(u"abc\r\ndef")
u'abc\ndef'
>>> normalize_newlines("abc\r\ndef")
u'abc\ndef'

I can produce the patch, but I need to know which behaviour is at fault here: Python 2 or Python 3? Should Python 2 reject binary input, or should Python 3 accept it?

Attachments (0)

Change History (4)

comment:1 Changed 5 months ago by vajrasky

  • Cc sky.kok@… added
  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to vajrasky
  • Patch needs improvement unset
  • Status changed from new to assigned

comment:2 Changed 5 months ago by vajrasky

Alternatively, normalize_newlines could return bytes when given bytes and a string when given a string. However, I believe that would break backward compatibility. My vote is for banning bytes.
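
For reference, the bytes-in, bytes-out alternative mentioned in this comment could look roughly like the sketch below. It is Python 3 only, uses nothing but the standard re module, and the function name normalize_newlines_preserving_type is hypothetical. It is not the behaviour the ticket was resolved with.

```python
import re

# One compiled pattern per input type; Python 3 requires the pattern type
# to match the type of the object being searched.
_TEXT_NEWLINES = re.compile(r'\r\n|\r|\n')
_BYTES_NEWLINES = re.compile(br'\r\n|\r|\n')


def normalize_newlines_preserving_type(value):
    """Normalize CRLF and CR to LF, returning the same type as the input.

    Hypothetical illustration only: bytes stay bytes, str stays str.
    """
    if isinstance(value, bytes):
        return _BYTES_NEWLINES.sub(b'\n', value)
    return _TEXT_NEWLINES.sub('\n', value)
```

The trade-off is the one raised in the comment: callers that currently always receive text under Python 2 would start receiving bytes for bytes input.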

comment:3 Changed 4 months ago by Baptiste Mispelon <bmispelon@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 2c837233f5de7d5e309833e39782c7a208a03880:

Fixed #21574 -- Handle bytes consistently in utils.text.normalize_newlines.

All input is now coerced to text before being normalized.
This changes nothing under Python 2 but it allows bytes
to be passed to the function without a TypeError under Python 3
(bytes are assumed to be utf-8 encoded text).

Thanks to trac user vajrasky for the report.
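
The approach described in the commit message (coerce to text first, then normalize) can be sketched without Django as follows. This is an illustration under stated assumptions, not the committed code: to_text is a hypothetical stand-in for Django's force_text that only handles the utf-8 bytes case the message mentions, and normalize_newlines_sketch is likewise a made-up name.

```python
import re


def to_text(value, encoding='utf-8'):
    """Hypothetical stand-in for force_text: decode bytes, pass text through."""
    if isinstance(value, bytes):
        # Assumption taken from the commit message: bytes are utf-8 text.
        return value.decode(encoding)
    return str(value)


def normalize_newlines_sketch(value):
    """Coerce the input to text, then normalize CRLF and CR newlines to LF."""
    return re.sub(r'\r\n|\r|\n', '\n', to_text(value))
```

With this ordering, both str and bytes input yield a text result, which matches the Python 2.7 session in the description.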

comment:4 Changed 4 months ago by Baptiste Mispelon <bmispelon@…>

In db41778e8ccbbba19954c3b47853b8520ab263a1:

Removed unnecessary call to force_text in utils.html.clean_html.

Refs #21574
