Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#25620 closed Bug (fixed)

URLValidator regex does not trigger on consecutive periods

Reported by: Sully Owned by: nobody
Component: Core (Other) Version: 1.8
Severity: Normal Keywords:
Cc: dheeru.rathor14@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The regular expression for URLValidator accepts consecutive periods as valid. This bug was introduced in 1.8.3.

Steps to Reproduce

>>> from django.core.validators import URLValidator
>>> validate = URLValidator()
>>> validate('http://example..com')
>>> validate('http://example...............com')

Expected Result

A ValidationError exception should be raised.

Current Result

No exception is raised, and the URL is deemed valid.

Reference

RFC 2181:

The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators). The zero length full name is defined as representing the root of the DNS tree, and is typically written and displayed as ".".

Attachments (1)

0001-Fixed-25620-Changed-to-in-domain-name-regex.patch (1.4 KB ) - added by Dheerendra Rathor 9 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 by Dheerendra Rathor, 9 years ago

Here are the relevant code part from URLValidator (this is from the master branch)

hostname_re = r'[a-z' + ul + r'0-9](?:[a-z' + ul + r'0-9-]*[a-z' + ul + r'0-9])?'
domain_re = r'(?:\.(?!-)[a-z' + ul + r'0-9-]*(?<!-))*'
tld_re = r'\.(?:[a-z' + ul + r']{2,}|xn--[a-z0-9]+)\.?'
host_re = '(' + hostname_re + domain_re + tld_re + '|localhost)'

The culprit is domain_re which allows multiple dots due to '*' on [a-z0-9] part. Changing * to + should solve the problem.

comment:2 by Dheerendra Rathor, 9 years ago

Triage Stage: UnreviewedAccepted

comment:3 by Tim Graham, 9 years ago

Has patch: set

Could you submit the patch as a pull request?

comment:4 by Dheerendra Rathor, 9 years ago

Recently I've also noted that scheme regex is r'^(?:[a-z0-9\.\-]*)://' but it should have been r'^(?:[a-z0-9\.\-\+]+)://' according to rfc1738. I'll update my PR soon.

comment:5 by Dheerendra Rathor, 9 years ago

Cc: dheeru.rathor14@… added

comment:6 by Dheerendra Rathor, 9 years ago

Also current regex are not handling label limit of 63 characters and total limit of 253 characters. Should I modify regex to handle them as well?

comment:7 by Tim Graham, 9 years ago

It's probably better to handle each issue separately. Otherwise, it's difficult to determine which change matches which test.

Making the regex more complex must be done very carefully to avoid issues like 17d3a6d8044752f482453f5906026eaf12c39e8e.

comment:8 by Dheerendra Rathor, 9 years ago

Cool, then I'll modify regex for domain name and scheme. For length validation I'll open another ticket.

comment:9 by Tim Graham <timograham@…>, 9 years ago

Resolution: fixed
Status: newclosed

In 96fe90f:

Fixed #25620 -- Made URLValidator prohibit URLs with consecutive dots in the domain section.

comment:10 by Tim Graham <timograham@…>, 9 years ago

In 6bb9f51:

[1.9.x] Fixed #25620 -- Made URLValidator prohibit URLs with consecutive dots in the domain section.

Backport of 96fe90f5356971e0e51a0bc41e045dde600d7521 from master

comment:11 by Tim Graham <timograham@…>, 9 years ago

In 540de2f:

[1.8.x] Fixed #25620 -- Made URLValidator prohibit URLs with consecutive dots in the domain section.

Backport of 96fe90f5356971e0e51a0bc41e045dde600d7521 from master

Note: See TracTickets for help on using tickets.
Back to Top