Opened 16 years ago

Closed 16 years ago

Last modified 8 years ago

#7355 closed (fixed)

Urlize function in django.utils.html does not properly work on https:// links

Reported by: clint Owned by: nobody
Component: Core (Other) Version: dev
Severity: Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description (last modified by clint)

Just tested this out on the version of utils/html.py in [7569]

urlize() incorrectly adds a http:// to the beginning of a https link

To replicate:

>>> from django.utils.html import urlize
>>> words = "Hi there https://www.google.com"
>>> urlize(words)
u'Hi there <a href="http://https://www.google.com">http://https://www.google.com</a>'

I did a search in trac for "urlize https" and came up with no hits. As far as I can tell this bug has been around for a while, not sure why no one would've caught this.

Patch:
===================================================================
--- html.py	(revision 7569)
+++ html.py	(working copy)
@@ -99,7 +99,7 @@
             lead, middle, trail = match.groups()
             if safe_input:
                 middle = mark_safe(middle)
-            if middle.startswith('www.') or ('@' not in middle and not middle.startswith('http://') and \
+            if middle.startswith('www.') or ('@' not in middle and not (middle.startswith('http://') or middle.startswith('https://')) and \
                     len(middle) > 0 and middle[0] in string.ascii_letters + string.digits and \
                     (middle.endswith('.org') or middle.endswith('.net') or middle.endswith('.com'))):
                 middle = 'http://%s' % middle

And a test to check for this

Index: tests.py
===================================================================
--- tests.py	(revision 7569)
+++ tests.py	(working copy)
@@ -166,6 +166,11 @@
 >>> urlizetrunc(uri, 2)
 u'<a href="http://31characteruri.com/test/" rel="nofollow">...</a>'
 
+# Check normal urlize
+>>> url = 'https://google.com'
+>>> urlize(url)
+u'<a href="https://google.com" rel="nofollow">https://google.com</a>'
+
 >>> wordcount('')
 0

Attachments (3)

html.py.diff (813 bytes ) - added by clint 16 years ago.
The patch for html.py
tests.py.diff (535 bytes ) - added by clint 16 years ago.
Update for tests
urlize-https-fix-tests.diff (1.3 KB ) - added by clint 16 years ago.

Download all attachments as: .zip

Change History (14)

comment:1 by clint, 16 years ago

Description: modified (diff)

Uh.. sorry for the horrible formatting...

v

comment:2 by Collin Grady, 16 years ago

although it only seems to do this if you leave off the trailing slash

>>> urlize("https://www.google.com/")                                                                                                                                                     
u'<a href="https://www.google.com/" rel="nofollow">https://www.google.com/</a>'
>>> urlize("https://www.google.com")                                                                                                                                                                     
u'<a href="http://https://www.google.com" rel="nofollow">http://https://www.google.com</a>'

comment:3 by clint, 16 years ago

Could this mean that the real root of the issue is that the regexp that splits urls into lead,middle,and trailing doesn't work right if there's no trailing slash?

comment:4 by clint, 16 years ago

Actually it looks like if you have a trailing slash on your url, then:

middle = 'https://google.com/'

and the check done on line 104-106 fails:

            if middle.startswith('www.') or ('@' not in middle and not middle.startswith('http://') and \
                    len(middle) > 0 and middle[0] in string.ascii_letters + string.digits and \
                    (middle.endswith('.org') or middle.endswith('.net') or middle.endswith('.com'))):

And it then matches the next set of conditions on line 108 and is handled properly:

if middle.startswith('http://') or middle.startswith('https://'):

However if you URL has no slash at the end of the URL:

middle = 'https://google.com'

Then you do not fail the first chunk (which it should, but does not because it only checks for http://), and an http:// is appended to the front of the valid https:// URL. So it would appear that my patch looks like it probably is the correct solution.

comment:5 by Simon Greenhill, 16 years ago

Triage Stage: UnreviewedAccepted

Clint, this looks good - thanks for tracking it down. Can you attach the patches as diffs (svn diff) and promote to ready for checkin?

by clint, 16 years ago

Attachment: html.py.diff added

The patch for html.py

by clint, 16 years ago

Attachment: tests.py.diff added

Update for tests

comment:6 by clint, 16 years ago

Triage Stage: AcceptedReady for checkin

I've added them and I'm moving this to rdy for checkin

comment:7 by Marc Fargas, 16 years ago

You post a single diff with everything in it.

by clint, 16 years ago

Attachment: urlize-https-fix-tests.diff added

comment:8 by clint, 16 years ago

Doh. Sorry about that. I cat'd them together. I hope that's similar to the format you need. (I don't have the tests fix deployed on the version of Django I'd updated, so I can't just pop out a single svn diff right now).

comment:9 by Marc Fargas, 16 years ago

milestone: 1.0

Thanks that makes things easier for commiters later as they only have to review and apply the latest patch in the ticket ;)

Marking 1.0 as this is a bug.

comment:10 by Russell Keith-Magee, 16 years ago

Resolution: fixed
Status: newclosed

(In [7701]) Fixed #7355 -- Modified urlize utility to handle https:// addresses. Thanks for the report and patch, clint.

comment:11 by Jacob, 13 years ago

milestone: 1.0

Milestone 1.0 deleted

Note: See TracTickets for help on using tickets.
Back to Top