#23010 closed Bug (fixed)

UnicodeDecodeError in makemessages’ call to os.walk

Reported by: alub
Owned by: claudep
Component: Core (Management commands)
Version: 1.7-rc-1
Severity: Release blocker
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

In my project, using Python 2.7.8, if I try the following command:

./manage.py makemessages -l en

I get this error:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/__init__.py", line 385, in execute_from_command_line
    utility.execute()
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/__init__.py", line 377, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/base.py", line 337, in execute
    output = self.handle(*args, **options)
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/base.py", line 532, in handle
    return self.handle_noargs(**options)
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/commands/makemessages.py", line 288, in handle_noargs
    potfiles = self.build_potfiles()
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/commands/makemessages.py", line 304, in build_potfiles
    file_list = self.find_files(".")
  File "/srv/http/webapp/venv/lib/python2.7/site-packages/Django-1.7c1-py2.7.egg/django/core/management/commands/makemessages.py", line 355, in find_files
    for dirpath, dirnames, filenames in os.walk(root, topdown=True, followlinks=self.symlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/srv/http/webapp/venv/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 4: ordinal not in range(128)

It works in Django 1.6.5, but has been broken since 1.7rc1. I think it is related to commit dbb48d2bb99a5f660cf2d85f137b8d87fc12d99f, which introduced unicode_literals in this command, because of this:

Python 2.7.8 (default, Jul  1 2014, 17:30:21) 
[GCC 4.9.0 20140604 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> _ = list(os.walk('.'))
>>> from __future__ import unicode_literals
>>> _ = list(os.walk('.'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/srv/http/webapp/venv/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/srv/http/webapp/venv/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 4: ordinal not in range(128)

So I don’t really know if it’s a Django or Python bug, but it seems like a regression.
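For readers on Python 3, here is a minimal sketch of the step that fails in the traceback above. It assumes the Python 2 behaviour where posixpath.join appends '/' + name to a unicode path and the byte string is implicitly decoded as ASCII. The function names and the file name b"abc\x96.jpg" are hypothetical and chosen only for illustration.

```python
def py2_style_join(top, name):
    """Emulate Python 2's posixpath.join(unicode_top, bytes_name).

    Python 2 builds '/' + name as a byte string and then coerces it
    to unicode with the default ASCII codec when appending to top.
    """
    return top + (b"/" + name).decode("ascii")


def try_join(top, name):
    # Report where decoding fails instead of letting the error propagate.
    try:
        return py2_style_join(top, name)
    except UnicodeDecodeError as exc:
        return "error at byte %d" % exc.start


# An ASCII-only name joins fine; a name with a non-ASCII byte does not.
print(try_join(u"./media", b"ok.txt"))
print(try_join(u"./media", b"abc\x96.jpg"))
```

With a byte-string root there is no unicode operand, so no implicit decode takes place, which is consistent with the first os.walk('.') call in the session above succeeding.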

Change History (9)

comment:1 Changed 12 months ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to claudep
  • Patch needs improvement unset
  • Severity changed from Normal to Release blocker
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

Thanks for the report. I'll look into it.

comment:2 follow-up: Changed 12 months ago by claudep

I cannot reproduce. What is your system locale?

comment:3 Changed 12 months ago by alub

My system locale is fr_FR.UTF-8.

comment:4 in reply to: ↑ 2 Changed 12 months ago by alub

You can try the following to reproduce:

echo -e '\xe9' | xargs touch

and then run the command.

comment:5 Changed 12 months ago by alub

I must have other invalid (as in “not UTF-8”) file names in my project (in my media directory), so the UnicodeDecodeError from os.walk is to be expected in that case.
As for Django, I really don’t know if it should handle that or leave it that way.

comment:6 Changed 12 months ago by claudep

Ah, it's about invalid file names. Python 3 seems to handle this case much more gracefully.
Unfortunately, the exception location and message don't allow us to tell the user about the problematic file name.

Even if this is a bit of an edge case, I'm tempted to partially revert the commit [dbb48d2bb99a5f6] and keep feeding os.walk with a bytestring on Python 2.
The alternative would be to catch this exception and simply tell the user to check for invalid file names in their file tree.
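The first option in this comment, feeding os.walk a byte string, could look roughly like the sketch below. It is not the code from the commit; walk_as_bytes is a hypothetical helper, and the fallback to UTF-8 when the file system encoding is unknown is an assumption.

```python
import os
import sys


def walk_as_bytes(root):
    """Walk root with a byte-string path so file names stay raw bytes."""
    if not isinstance(root, bytes):
        # Assumption: encode with the file system encoding, else UTF-8.
        root = root.encode(sys.getfilesystemencoding() or "utf-8")
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # dirpath and name are both bytes here, so joining them
            # never triggers a decode of an invalid file name.
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

The trade-off is that callers receive byte strings and have to decode them (or handle decode failures) themselves before treating them as text.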

comment:7 Changed 12 months ago by claudep

  • Has patch set

After some thought, I think that it would be better to ignore STATIC_ROOT/MEDIA_ROOT dirs in makemessages, which are the most likely source of weird file names. And it's probably the right thing to do anyway.
https://github.com/django/django/pull/2922
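As an illustration of the idea only (this is not the patch from the pull request), directories can be skipped during os.walk by pruning dirnames in place when topdown=True. The helper name find_files_skipping and its ignored_roots parameter are hypothetical; ignored_roots stands in for the values of settings.STATIC_ROOT and settings.MEDIA_ROOT.

```python
import os


def find_files_skipping(root, ignored_roots):
    """List files under root, never descending into ignored_roots."""
    # Unset values (None or '') are skipped; the rest are normalised
    # to absolute paths so they can be compared reliably.
    ignored = {os.path.abspath(p) for p in ignored_roots if p}
    found = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # With topdown=True, editing dirnames in place tells os.walk
        # which subdirectories to visit, so ignored trees are never
        # listed and their file names are never joined.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) not in ignored
        ]
        found.extend(os.path.join(dirpath, f) for f in filenames)
    return sorted(found)
```

Because the pruned directories are never entered, an invalid file name inside them cannot reach the join that raised the error. Invalid names elsewhere in the tree would still fail.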

comment:8 Changed 12 months ago by timo

  • Triage Stage changed from Accepted to Ready for checkin

Patch looks good. I don't like adding more dependencies on settings in the command (especially if we move static files settings to an AppConfig), but I don't have alternative suggestions.

comment:9 Changed 12 months ago by claudep

  • Resolution set to fixed
  • Status changed from assigned to closed