Code

Opened 12 months ago

Closed 7 months ago

Last modified 7 months ago

#20364 closed Bug (fixed)

urlize incorrectly includes quotation marks in links

Reported by: EmilStenstrom Owned by: anonymous
Component: Template system Version: master
Severity: Normal Keywords: urlize, quotes
Cc: piwonski.kris@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

>>> urlize("some text here \"hi@example.com\" and some text afterwards")
u'some text here <a href="mailto:"hi@example.com"">"hi@example.com"</a> and some text afterwards'

Note the extra quotes inside the links.

Luckily, this is solved quite easily, just do this:

>>> from django.utils import html
>>> html.TRAILING_PUNCTUATION += ['"', '\'']
>>> html.WRAPPING_PUNCTUATION += [('"', '"'), ('\'', '\'')]
>>> urlize("some text here \"hi@example.com\" and some text afterwards")
u'some text here "<a href="mailto:hi@example.com">hi@example.com</a>" and some text afterwards'

I suggest quotation marks are included in the *_PUNCTUATION variables in django.utils.html.

Attachments (1)

20364.diff (1.3 KB) - added by Chris Piwoński <piwonski.kris@…> 11 months ago.
ticket patch

Download all attachments as: .zip

Change History (18)

comment:1 Changed 12 months ago by EmilStenstrom

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I don't know if this matters, but django-rest-framework has both of the changes above added in their version of urlize: https://github.com/glic3rinu/django-rest-framework/commit/2c0363ddaec22ac54385f7e0c2e1401ed3ff0879

comment:2 Changed 12 months ago by EmilStenstrom

Similar ticket, that has to do with JSON somehow... https://code.djangoproject.com/ticket/17576

comment:3 Changed 11 months ago by wim@…

  • Triage Stage changed from Unreviewed to Accepted

Hi Emil,

Thanks for reporting! Clear that we get erroneous output here. You are welcome to write a patch, if you want to.

Changed 11 months ago by Chris Piwoński <piwonski.kris@…>

ticket patch

comment:4 Changed 11 months ago by Chris Piwoński <piwonski.kris@…>

  • Cc piwonski.kris@… added
  • Has patch set
  • Owner changed from nobody to anonymous
  • Status changed from new to assigned
  • Triage Stage changed from Accepted to Ready for checkin

comment:5 Changed 11 months ago by EmilStenstrom

@Chris Piwoński: Thanks for a patch!

Can i suggest you add a few more tests? The first only works with the TRAILING_PUNCTUATION fix. The second seems to work with the WRAPPING_PUNCTUATION fix.

>>> html.urlize("some text here hi@example.com\" and some text afterwards")
u'some text here <a href="mailto:hi@example.com"">hi@example.com"</a> and some text afterwards'
>>> html.urlize("some text here \"hi@example.com and some text afterwards")
u'some text here <a href="mailto:"hi@example.com">"hi@example.com</a> and some text afterwards'

And maybe some tests for the single quote case too?

Version 0, edited 11 months ago by EmilStenstrom (next)

comment:6 Changed 11 months ago by Chris Piwoński <piwonski.kris@…>

Made tests for all six cases.

Changes are on github: https://github.com/kriestof/django/tree/ticket_20364

comment:7 Changed 11 months ago by claudep

  • Triage Stage changed from Ready for checkin to Accepted

Sorry, but the RFC flag should never be set by the patch author.

comment:8 Changed 11 months ago by EmilStenstrom

  • Triage Stage changed from Accepted to Ready for checkin

This looks great Chris. The new tests pass, and the code looks good.

comment:9 Changed 11 months ago by aaugustin

If you read #17576, which is linked from above, it provides you with a failing test case: adding a comma after the quote.

For instance I'm almost sure this doesn't work with the current patch.

urlize('Email us at "hi@example.com", or phone us at +xx.yy').

The problem here is that urlize cannot deal with multiple punctuation, and quotes are often followed by other punctuation. I'd rather not merge a half-broken feature, and in my humble opinion this ticket should be closed wontfix like #17576.

That said, urlize seems to attract tons of contributions, so maybe it's a very important part of Django and I should stop wontfixing such tickets... I'll let another committer make the decision!

Last edited 11 months ago by aaugustin (previous) (diff)

comment:10 Changed 11 months ago by KJ

In case this ticket isn’t marked as wontfix, shouldn’t we also support localized quotation marks like „” and «»? There is a list of them available on Wikipedia.

comment:11 Changed 8 months ago by akaariai

  • Triage Stage changed from Ready for checkin to Accepted

I am not that good with HTML stuff, so I am not going to make wontfix call... But in any case this isn't Ready for checkin...

comment:12 Changed 7 months ago by GilesGreenway

aaugustin's example is in fact passed with flying colours after applying the patch:

>>> urlize('Email us at "hi@example.com", or phone us at +xx.yy')
u'Email us at "<a href="mailto:hi@example.com">hi@example.com</a>", or phone us at +xx.yy'
>>> 
>>> urlize('Visit us at "http://www.example.com", or phone us at +xx.yy')
u'Visit us at "<a href="http://www.example.com">http://www.example.com</a>", or phone us at +xx.yy'

I'll see if I can get a test written. (I'm a newbie contributor at the PyConUK Django sprint.)

comment:13 Changed 7 months ago by GilesGreenway

Test pushed, pull request made:

https://github.com/django/django/pull/1662

comment:14 Changed 7 months ago by EvilDMP

  • Triage Stage changed from Accepted to Ready for checkin

I've tested the patch at https://github.com/django/django/pull/1662 locally and it looks good to me. Commit/merge message needs to thank EmilStenstrom for the report and Chris Piwoński for the first patch, but other than that I think it's ready for checkin.

comment:15 Changed 7 months ago by GilesGreenway

Done. 'Hope this is suitably congratulatory:

https://github.com/django/django/pull/1662/commits

comment:16 Changed 7 months ago by Florian Apolloner <florian@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 6c06adad1dc45631c1c220d7f8fb531a9cf3ed55:

Fixed #20364 -- Changed urlize regexes to include quotation marks as punctation.

Thanks to EmilStenstrom for raising this, and to Chris Piwoński for all of the fixes and most of the tests.

comment:17 Changed 7 months ago by EmilStenstrom

I think far to few Ticket threads end with the reporter wrapping up the thread, so here goes:

YOU ARE AWESOME :D

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.