Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#19397 closed Bug (fixed)

UnicodeDecodeError on binary file when using custom project template/skeleton

Reported by: gw.2012@…
Owned by: nobody
Component: Core (Management commands)
Version: dev
Severity: Release blocker
Keywords: project template, skeleton, utf8
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

There is a regression in the current development version of Django (1.5) when using startproject (and startapp) with a custom project/app template (skeleton) directory.

In Django 1.4 the following worked flawlessly, but on current master an error occurs while processing binary files (which imho should not be parsed unless explicitly requested). Steps to reproduce:

$ virtualenv --no-site-packages testdj15; cd testdj15/; . bin/activate
...
$ pip install git+http://github.com/django/django.git
...
$ mkdir skeleton; dd if=/dev/urandom of=skeleton/test.png bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0850216 s, 12.3 MB/s
$ django-admin.py startproject --template skeleton abcproject
UnicodeDecodeError: 'utf8' codec can't decode byte 0x93 in position 3: invalid start byte
$ ls -al abcproject/test.png skeleton/test.png
ls: cannot access abcproject/test.png: No such file or directory
-rw-r--r-- 1 gw gw 1048576 Nov 30 13:47 skeleton/test.png

Change History (6)

comment:1 by Preston Holmes, 11 years ago

Triage Stage: Unreviewed → Accepted

While the example of using a PNG as a project template may not make sense at first glance, I've verified the error happens if you have a PNG inside a zipped tar as well - which is entirely possible.

This is in fact a regression introduced in https://github.com/django/django/commit/3afb5916b215c79e36408b729c9516bc435f5cb7

We will probably have to come up with a way of checking each file walked in the template.

http://stackoverflow.com/questions/898669/how-can-i-detect-if-a-file-is-binary-non-text-in-python

has some food for thought.
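
As an illustration of the kind of per-file check being considered here, the following is a minimal sketch of one commonly used heuristic: treat a file as binary if its first block contains a NUL byte. The function name `looks_binary` and the `blocksize` default are assumptions made for this example, not part of Django, and the rest of this thread settles on a different approach.

```python
def looks_binary(path, blocksize=8000):
    """Guess whether a file is binary by looking for a NUL byte.

    This is only a heuristic: UTF-16 text contains NUL bytes and would be
    misclassified, and a binary file with no NUL in its first block would
    be treated as text.
    """
    with open(path, 'rb') as f:
        chunk = f.read(blocksize)
    return b'\x00' in chunk
```

A heuristic like this needs no list of extensions, but it can misjudge files, which is one reason an explicit rule may be preferable.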

comment:2 by gw.2012@…, 11 years ago

The use case for PNG, ICO or similar files in a project template is when someone is creating an educational template for a series of very similar projects and wants to put everything in it, such as apple-touch-icon.png and favicon.ico, which are binary.

Anyway, a solution would be to decide based on the extension whether a file is binary or not, like this:

django/django/core/management/templates.py:
if filename.endswith(extensions) or filename in extra_files:
    ... codecs.open(old_file,.., 'utf-8'), read, template.render, codecs.open(new_file,.., 'utf-8'), write ...
else:
    ... use a binary file copying method without rendering, eg. with shutil.copyfile(old_file, new_file) ...
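
Spelled out a little further, the idea above might look like the sketch below. It is written for Python 3 and is not the code in templates.py: `copy_or_render`, `RENDERED_EXTENSIONS` and the `context` dict are hypothetical names for this example, and `string.Template` stands in for the Django template engine so the snippet runs on its own.

```python
import os
import shutil
from string import Template

# Hypothetical whitelist for this sketch; the real command builds its own.
RENDERED_EXTENSIONS = ('.py',)


def copy_or_render(old_file, new_file, context,
                   extensions=RENDERED_EXTENSIONS, extra_files=()):
    """Render whitelisted text files; copy everything else byte for byte."""
    filename = os.path.basename(old_file)
    if filename.endswith(tuple(extensions)) or filename in extra_files:
        with open(old_file, 'r', encoding='utf-8') as src:
            content = src.read()
        # string.Template is a stand-in for the Django template engine.
        rendered = Template(content).safe_substitute(context)
        with open(new_file, 'w', encoding='utf-8') as dst:
            dst.write(rendered)
    else:
        # Binary or unknown file: never decoded, so no UnicodeDecodeError.
        shutil.copyfile(old_file, new_file)
```

With this split, a file such as test.png is never decoded at all, so its contents cannot trigger the error from the report.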

comment:3 by Aymeric Augustin, 11 years ago

First, that commit isn't correct; it should have used settings.FILE_CHARSET instead of hardcoding utf-8.

I propose to load the file contents as a bytestring, attempt to decode it with FILE_CHARSET, and skip the file if that raises a UnicodeDecodeError.
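
A rough sketch of that proposal, assuming that "skip" means "leave the bytes untouched" rather than "omit the file": `render_if_text` and its `render` callback are hypothetical names used only for illustration, and the charset is passed in explicitly.

```python
def render_if_text(raw, render, charset='utf-8'):
    """Decode, render and re-encode `raw`, or return it unchanged.

    `raw` is the file content as bytes; `render` is any callable that maps
    text to text (a placeholder for template rendering).
    """
    try:
        text = raw.decode(charset)
    except UnicodeDecodeError:
        # Not valid text in this charset: pass the bytes through untouched.
        return raw
    return render(text).encode(charset)
```

One caveat of this approach is that a binary file whose bytes happen to form valid text in the chosen charset would still be rendered.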

comment:4 by Aymeric Augustin, 11 years ago

Yay for speaking too fast...

Settings aren't available in startproject. utf-8 is a reasonable default; if that's a problem, we could add an option to specify a different charset.

There's a whitelist of file extensions that can be processed. The easiest solution is to decode/render/encode only those.

comment:5 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: new → closed

In c9a47fb379cab4c0fe9be27c9924236e75327bd0:

[1.5.x] Fixed #19397 -- Crash on binary files in project templates.

Thanks gw 2012 at tnode com for the report.

Backport of baae4b8.

comment:6 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In baae4b818778180fedfcfcfc7aa77acfb9b237fb:

Fixed #19397 -- Crash on binary files in project templates.

Thanks gw 2012 at tnode com for the report.
