#20568 closed Bug (fixed)

templatetag truncatewords_html split words containing HTML entities

Reported by:	yann0@…	Owned by:	Jaap Roes
Component:	Utilities	Version:	dev
Severity:	Normal	Keywords:
Cc:	bmispelon@…	Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

I'm working on an English/French website, and when I use truncatewords_html on French texts with special characters like "é, è, à, etc." (which are very common), it splits words in half at those characters.

Example:
Depuis mars 2008, le programme RECYC-FRIGO d’Hydro-Québec vous permet de vous débarrasser d’un vieil appareil, réfrigérateur ou congélateur, facilement [...]

becomes:
Depuis mars 2008, le programme RECYC-FRIGO d’Hydro-Québec vous permet de vous débarrasser d’un vieil appareil, r ...

Change History (6)

comment:1 by Baptiste Mispelon, 12 years ago

Resolution:	→ worksforme
Status:	new → closed

Hi,

I cannot reproduce the issue you're describing.
I tried the following code with both 1.3 and master but it seems to be working correctly for me:

>>> from django.template.defaultfilters import truncatewords_html
>>> s = u"Depuis mars 2008, le programme RECYC-FRIGO d’Hydro-Québec vous permet de vous débarrasser d’un vieil appareil, réfrigérateur ou congélateur, facilement [...]"
>>> truncatewords_html(s, 18)
u'Depuis mars 2008, le programme RECYC-FRIGO d\u2019Hydro-Qu\xe9bec vous permet de vous d\xe9barrasser d\u2019un vieil appareil, r\xe9frig\xe9rateur ...'

I'm closing this ticket as worksforme.
Could you please reopen it with an example of a piece of code that shows the issue you're having?

Thanks.

comment:2 by Jaap Roes, 12 years ago

Resolution:	worksforme
Status:	closed → new

I can reproduce it, but only if I convert the special characters to HTML entities first. I think that might be the actual cause:

>>> s = u'Depuis mars 2008, le programme RECYC-FRIGO d\u2019Hydro-Qu&eacute;bec vous permet de vous d&eacute;barrasser d\u2019un vieil appareil, r&eacute;frig&eacute;rateur ou cong&eacute;lateur, facilement'
>>> truncatewords_html(s, 8)
u'Depuis mars 2008, le programme RECYC-FRIGO d\u2019Hydro-Qu ...'

comment:3 by Baptiste Mispelon, 12 years ago

Cc:	bmispelon@… added
Summary:	templatetag truncatewords_html split words on special caracters → templatetag truncatewords_html split words containing HTML entities
Triage Stage:	Unreviewed → Accepted
Version:	1.3 → master

Hi,

Thanks for reopening this, there does appear to be an issue.

I made some quick tests and it seems that this behavior has always been present.

The problem seems to be that the regexp used to split words [1] doesn't consider "&" to be part of a word, so a word is cut at the "&" that starts an entity such as "&eacute;".
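To illustrate the diagnosis, here is a minimal Python 3 sketch using a simplified word pattern, r"\w[\w-]*". This pattern is an assumption chosen for illustration, not necessarily the exact regexp Django used at the time:

```python
import re

# Simplified stand-in for the word-splitting regexp (an assumption, not
# Django's exact pattern): a word is a word character followed by word
# characters or hyphens. "&" and ";" match neither, so they end a word.
SIMPLE_WORD_RE = re.compile(r"\w[\w-]*")

text = "d\u2019Hydro-Qu&eacute;bec vous permet"
words = SIMPLE_WORD_RE.findall(text)
print(words)  # ['d', 'Hydro-Qu', 'eacute', 'bec', 'vous', 'permet']
```

Under this pattern "Hydro-Qu&eacute;bec" counts as three words ("Hydro-Qu", "eacute", "bec"), which is consistent with the comment:2 output being cut right after "Hydro-Qu" when truncating to 8 words.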

comment:4 by Jaap Roes, 12 years ago

What about converting HTML entities back to characters before applying the regex? I just whipped up a quick proof of concept that seems to work fine (and uses only stdlib code):

>>> import xml.sax.saxutils
>>> import htmlentitydefs
>>> entity2unicode = dict([('&%s;' % k, unichr(v)) for k, v in htmlentitydefs.name2codepoint.items()])
>>> truncatewords_html(xml.sax.saxutils.unescape(s, entity2unicode), 8)
u'Depuis mars 2008, le programme RECYC-FRIGO d\u2019Hydro-Qu\xe9bec ...'
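The same idea can be sketched in Python 3, where htmlentitydefs and unichr no longer exist and html.unescape from the standard library decodes named and numeric entities. The helper truncate_words_unescaped below is hypothetical: it counts words with a simplified pattern (an assumption) and, unlike truncatewords_html, does not handle HTML tags at all:

```python
import html
import re

# Simplified word pattern (an assumption, not Django's exact regexp).
WORD_RE = re.compile(r"\w[\w-]*")


def truncate_words_unescaped(text, limit, ellipsis=" ..."):
    """Hypothetical helper: unescape entities, then keep `limit` words.

    Tags are not treated specially, so this is only a sketch of the
    "unescape first" idea, not a replacement for truncatewords_html.
    """
    text = html.unescape(text)  # "&eacute;" -> "é", so it is a word character
    matches = list(WORD_RE.finditer(text))
    if len(matches) <= limit:
        return text
    # Cut right after the last word that is kept.
    return text[: matches[limit - 1].end()] + ellipsis


s = "Depuis mars 2008, le programme RECYC-FRIGO d\u2019Hydro-Qu&eacute;bec vous permet"
print(truncate_words_unescaped(s, 8))
```

Because "é" is a word character once decoded, "Hydro-Québec" stays whole and the 8-word result ends with "d’Hydro-Québec ...", matching the proof of concept above.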

comment:5 by Jaap Roes, 12 years ago

Has patch:	set
Owner:	changed from nobody to Jaap Roes
Status:	new → assigned

I noticed that the django.utils.text module already has an unescape_entities function, so I created this pull request:

https://github.com/django/django/pull/1332
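Unescaping changes the entities in the returned string (in the proof of concept above, "Qu&eacute;bec" comes back as "Qu\xe9bec"). If keeping the original entities matters, one alternative is to let the word pattern treat a complete entity as part of a word. This is a hypothetical sketch, not necessarily what the merged commit does; the pattern and the helper name truncate_words_keep_entities are assumptions:

```python
import re

# Hypothetical entity-aware word pattern: a word is one or more runs of
# word characters/hyphens or complete entities such as "&eacute;" or "&#233;".
ENTITY_WORD_RE = re.compile(r"(?:\w[\w-]*|&[#\w]+;)+")


def truncate_words_keep_entities(text, limit, ellipsis=" ..."):
    """Keep the first `limit` words without decoding entities (tags ignored)."""
    matches = list(ENTITY_WORD_RE.finditer(text))
    if len(matches) <= limit:
        return text
    # Slice the original text, so entities are returned unchanged.
    return text[: matches[limit - 1].end()] + ellipsis


s = "Depuis mars 2008, le programme RECYC-FRIGO d\u2019Hydro-Qu&eacute;bec vous permet"
print(truncate_words_keep_entities(s, 8))
```

Here "Hydro-Qu&eacute;bec" is matched as a single word, so the result ends with "d’Hydro-Qu&eacute;bec ..." and the entity survives untouched.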

comment:6 by Tim Graham <timograham@…>, 12 years ago

Resolution:	→ fixed
Status:	assigned → closed

In 40b95a24ae159b6600457a23d6c2779a18037b7b:

Fixed #20568 -- truncatewords_html no longer splits words containing HTML entities.

Thanks yann0 at hotmail.com for the report.
