Opened 6 years ago

Closed 6 years ago

#12301 closed (invalid)

Template adds extra characters when using utf8 file encoding

Reported by: anonymous Owned by: nobody
Component: Uncategorized Version: 1.1
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

I have this setup:

website/
   templates/base.html
   templates/header.html

plus the other files needed to run, which are not important here. In base.html I include header.html, which contained a UTF-8 character, so it threw an exception. I changed the header.html file encoding from ANSI to UTF-8 and the exception went away, but now an extra space appears at the top of the page. It doesn't happen if I write the markup directly into base.html, only when I include it.

base.html file looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8" />
        <title>test</title>
    </head>
    <body style="margin: 0px; padding: 0px;">
	    <div>{% include "header.html" %}</div>
    </body>
</html>

header.html looks like this:

<div style="border: solid 1px;">something</div>

Attachments (3)

not.PNG (32.6 KB) - added by anonymous 6 years ago.
with extra space line on the top
fine.PNG (32.8 KB) - added by anonymous 6 years ago.
no space, ANSI file encoding
website.zip (3.8 KB) - added by anonymous 6 years ago.
simple project

Change History (11)

Changed 6 years ago by anonymous

with extra space line on the top

Changed 6 years ago by anonymous

no space, ANSI file encoding

comment:1 Changed 6 years ago by lukeplant

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

I'm afraid there is no evidence that you've found a bug in Django here. Obviously there must be a reason why Firefox displays the two pages differently, and the difference between the pages can probably be discovered using "view source".

I'm closing; re-open if you can show that Django is doing something it shouldn't. We would need the actual base.html and header.html files, along with the output generated. Perhaps your header.html file ends or starts with a newline, in which case the above will behave differently from simply modifying base.html to read: <div><div style="border: solid 1px;">something</div></div>

comment:2 Changed 6 years ago by anonymous

  • Resolution invalid deleted
  • Status changed from closed to reopened

Ok, I'm attaching a simple project that has all the files needed to run. If you run it without any changes, you should get an invisible character that produces an extra line; if you change the header.html file encoding to ANSI, the space disappears. Nothing changes except that file's encoding. It is the same in both IE8 and FF, except that FF's Firebug shows an empty line and the IE8 developer tools show the character as a square. I suggest you just try it and you should be able to see it. I'm running Windows XP, Python 2.6.4 and the latest stable Django, which is 1.1.1.

Changed 6 years ago by anonymous

simple project

comment:3 Changed 6 years ago by anonymous

BTW, viewing the page source in both browsers (IE8 and FF) shows the same output, with nothing special in it:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8" />
        <title>test</title>
    </head>
    <body style="margin: 0px; padding: 0px;">
	    <div><div style="border: solid 1px;">something</div>
</div>
    </body>
</html>

and if you copy this into a plain text file and try to save it in ANSI encoding, you will get an error saying it contains Unicode symbols, which you can't see here...

comment:4 Changed 6 years ago by anonymous

This Unicode character appears on the line after <body>, between the first two <div> tags. Clearly, there is nothing visible there, not even a space... so try to copy it :)

comment:5 Changed 6 years ago by emulbreh

This is probably a byte order mark issue.
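One way to check this guess is to list every non-ASCII character in the rendered output along with its code point and Unicode name. The sketch below is written for Python 3 and is only illustrative: the helper name `find_non_ascii` is hypothetical, and the `rendered` string is an assumed stand-in for the page output with U+FEFF placed between the first two <div> tags, where the hidden character was reported.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Assumed sample: U+FEFF sits right after the outer <div>, as it would if
# the included file started with a byte-order mark.
rendered = '<div>\ufeff<div style="border: solid 1px;">something</div>\n</div>'

for hit in find_non_ascii(rendered):
    print(hit)  # (5, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')
```

If the only hit is U+FEFF, the invisible character is a byte-order mark that survived decoding.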

comment:6 Changed 6 years ago by kmtracey

The first bytes of your header.html file are EF BB BF. That's the UTF-8 encoding of the BOM (byte-order mark, see: http://en.wikipedia.org/wiki/Byte-order_mark). A BOM is neither required nor recommended for utf-8 encoded files. I'd guess the browsers are not expecting to find a BOM mid-stream and the extra line you see is a reaction to encountering this unexpected character. To get rid of it, use some tool that does not insert BOMs for utf-8 encoded files to change the file encoding.
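If no BOM-free editor is at hand, the three bytes can also be removed with a few lines of Python. This is a minimal sketch written for Python 3, not something Django provides; the function names `has_utf8_bom`, `strip_utf8_bom` and `strip_bom_in_file` are hypothetical. `codecs.BOM_UTF8` is the standard-library constant for EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; all other bytes are untouched."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at `path` without its leading BOM.

    Returns True if the file was changed, False if it had no BOM.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

For the layout in this ticket, the call would presumably be `strip_bom_in_file("templates/header.html")`, run from the website/ directory.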

There was a (very old) thread on django-dev that noted this behavior: http://groups.google.com/group/django-developers/browse_thread/thread/b19bd59d61a688b8/

It proposes noting, stripping, and relocating the BOM to the front of the ultimately rendered template. That seems overly complicated to me, at least for utf-8. But, the idea raises the question of what happens to included BOMs for files encoded with utf-16 or utf-32 (are these encodings used in practice?). Seems Django might have an issue there, especially if different included files had different endianness, but I don't have time to check in detail and it seems like a very contrived case. Perhaps Django could just strip the BOM from included template files?
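To make the "just strip the BOM" idea concrete, here is a rough Python 3 sketch of decoding template bytes so that a leading BOM never reaches the rendered output. It is an illustration only, not how Django loads templates; `decode_template_source` is a hypothetical helper. It relies on the standard `utf-8-sig`, `utf-16` and `utf-32` codecs, which consume a leading BOM while decoding (the latter two also use it to pick the byte order).

```python
import codecs


def decode_template_source(raw: bytes) -> str:
    """Hypothetical helper: decode template bytes, dropping any leading BOM."""
    # Check UTF-32 first: BOM_UTF32_LE (FF FE 00 00) begins with the same
    # two bytes as BOM_UTF16_LE (FF FE).
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return raw.decode("utf-32")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    # "utf-8-sig" strips an optional UTF-8 BOM and otherwise acts like "utf-8".
    return raw.decode("utf-8-sig")


body = "<div>something</div>"
with_bom = codecs.BOM_UTF8 + body.encode("utf-8")

# Plain "utf-8" keeps the BOM as U+FEFF, which is what ends up mid-page.
print(repr(with_bom.decode("utf-8")[0]))
# The helper returns the markup without it.
print(decode_template_source(with_bom) == body)
```

Because each file is decoded on its own, included files with different encodings or endianness would each lose their BOM before the pieces are joined.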

comment:7 Changed 6 years ago by anonymous

Thanks for explaining, I had never heard of this thing called a BOM :) Ok, so now I know that I shouldn't use Notepad to change a file's encoding and, I guess, you can close this ticket.

Thanks again :)

comment:8 Changed 6 years ago by SmileyChris

  • Resolution set to invalid
  • Status changed from reopened to closed