Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#25620 closed Bug (fixed)

URLValidator regex does not trigger on consecutive periods

Reported by: Sully Owned by: nobody
Component: Core (Other) Version: 1.8
Severity: Normal Keywords:
Cc: dheeru.rathor14@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The regular expression for URLValidator accepts consecutive periods as valid. This bug was introduced in 1.8.3.

Steps to Reproduce

>>> from django.core.validators import URLValidator
>>> validate = URLValidator()
>>> validate('http://example..com')
>>> validate('http://example...............com')

Expected Result

A ValidationError exception should be raised.

Current Result

No exception is raised, and the URL is deemed valid.

Reference

RFC 2181:

The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators). The zero length full name is defined as representing the root of the DNS tree, and is typically written and displayed as ".".

Attachments (1)

0001-Fixed-25620-Changed-to-in-domain-name-regex.patch (1.4 KB) - added by Dheerendra Rathor 4 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 4 years ago by Dheerendra Rathor

Here are the relevant code part from URLValidator (this is from the master branch)

hostname_re = r'[a-z' + ul + r'0-9](?:[a-z' + ul + r'0-9-]*[a-z' + ul + r'0-9])?'
domain_re = r'(?:\.(?!-)[a-z' + ul + r'0-9-]*(?<!-))*'
tld_re = r'\.(?:[a-z' + ul + r']{2,}|xn--[a-z0-9]+)\.?'
host_re = '(' + hostname_re + domain_re + tld_re + '|localhost)'

The culprit is domain_re which allows multiple dots due to '*' on [a-z0-9] part. Changing * to + should solve the problem.

comment:2 Changed 4 years ago by Dheerendra Rathor

Triage Stage: UnreviewedAccepted

Changed 4 years ago by Dheerendra Rathor

comment:3 Changed 4 years ago by Tim Graham

Has patch: set

Could you submit the patch as a pull request?

comment:4 Changed 4 years ago by Dheerendra Rathor

Recently I've also noted that scheme regex is r'^(?:[a-z0-9\.\-]*)://' but it should have been r'^(?:[a-z0-9\.\-\+]+)://' according to rfc1738. I'll update my PR soon.

comment:5 Changed 4 years ago by Dheerendra Rathor

Cc: dheeru.rathor14@… added

comment:6 Changed 4 years ago by Dheerendra Rathor

Also current regex are not handling label limit of 63 characters and total limit of 253 characters. Should I modify regex to handle them as well?

comment:7 Changed 4 years ago by Tim Graham

It's probably better to handle each issue separately. Otherwise, it's difficult to determine which change matches which test.

Making the regex more complex must be done very carefully to avoid issues like 17d3a6d8044752f482453f5906026eaf12c39e8e.

comment:8 Changed 4 years ago by Dheerendra Rathor

Cool, then I'll modify regex for domain name and scheme. For length validation I'll open another ticket.

comment:9 Changed 4 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In 96fe90f:

Fixed #25620 -- Made URLValidator prohibit URLs with consecutive dots in the domain section.

comment:10 Changed 4 years ago by Tim Graham <timograham@…>

In 6bb9f51:

[1.9.x] Fixed #25620 -- Made URLValidator prohibit URLs with consecutive dots in the domain section.

Backport of 96fe90f5356971e0e51a0bc41e045dde600d7521 from master

comment:11 Changed 4 years ago by Tim Graham <timograham@…>

In 540de2f:

[1.8.x] Fixed #25620 -- Made URLValidator prohibit URLs with consecutive dots in the domain section.

Backport of 96fe90f5356971e0e51a0bc41e045dde600d7521 from master

Note: See TracTickets for help on using tickets.
Back to Top