Opened 2 months ago

Last modified 4 weeks ago

#28628 assigned Cleanup/optimization

Audit for and abolish all use of '\d' in regexes

Reported by: James Bennett Owned by: JunyiJ
Component: Documentation Version: master
Severity: Normal Keywords:
Cc: Sergey Fedoseev Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by James Bennett)

Now that we're in the 2.0 release cycle and Python 3 only, any examples or code in Django using a \d in a regex should be replaced with [0-9], as \d on Python 3 matches any character with Unicode category [Nd], which is almost certainly not what people expect. Changing to explicit [0-9], perhaps with a note about why it should be preferred, would be better.

Change History (10)

comment:1 Changed 2 months ago by James Bennett

Here's a tentative list of places that should change, from grepping on current master:

django/contrib/gis/db/backends/postgis/operations.py:344:        proj_regex = re.compile(r'(\d+)\.(\d+)\.(\d+)')
django/contrib/gis/gdal/libgdal.py:87:version_regex = re.compile(r'^(?P<major>\d+)\.(?P<minor>\d+)(\.(?P<subminor>\d+))?')
django/contrib/gis/geometry.py:7:wkt_regex = re.compile(r'^(SRID=(?P<srid>\-?\d+);)?'
django/contrib/gis/geometry.py:11:                       r'[ACEGIMLONPSRUTYZ\d,\.\-\+\(\) ]+)$',
django/contrib/gis/geos/geometry.py:116:            match = re.match(b'SRID=(?P<srid>\-?\d+)', srid_part)
django/contrib/gis/templates/gis/admin/openlayers.js:4:{{ module }}.map = null; {{ module }}.controls = null; {{ module }}.panel = null; {{ module }}.re = new RegExp("^SRID=\\d+;(.+)", "i"); {{ module }}.layers = {};
django/contrib/humanize/templatetags/humanize.py:48:    new = re.sub(r"^(-?\d+)(\d{3})", r'\g<1>,\g<2>', orig)
django/core/management/commands/makemessages.py:402:        m = re.search(r'(\d+)\.(\d+)\.?(\d+)?', out)
django/core/management/commands/runserver.py:18:    (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |         # IPv4 address
django/core/management/commands/runserver.py:21:):)?(?P<port>\d+)$""", re.X)
django/core/validators.py:78:    ipv4_re = r'(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
django/core/validators.py:99:        r'(?::\d{2,5})?'  # port
django/core/validators.py:136:            host_match = re.search(r'^\[(.+)\](?::\d{2,5})?$', urlsplit(value).netloc)
django/core/validators.py:153:    _lazy_re_compile(r'^-?\d+\Z'),
django/core/validators.py:296:    regexp = _lazy_re_compile(r'^%(neg)s\d+(?:%(sep)s%(neg)s\d+)*\Z' % {
django/db/backends/mysql/base.py:49:server_version_re = re.compile(r'(\d{1,2})\.(\d{1,2})\.(\d{1,2})')
django/db/backends/sqlite3/introspection.py:8:field_size_re = re.compile(r'^\s*(?:var)?char\s*\(\s*(\d+)\s*\)\s*$')
django/db/migrations/autodetector.py:1233:        match = re.match(r'^\d+', name)
django/db/migrations/writer.py:170:            if re.match(r"^import (.*)\.\d+[^\s]*$", line):
django/forms/widgets.py:925:    date_re = re.compile(r'(\d{4}|0)-(\d\d?)-(\d\d?)$')
django/http/request.py:21:host_validation_re = re.compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(:\d+)?$")
django/template/base.py:606:    'num': r'[-+\.]?\d[\d\.e]*',
django/template/defaultfilters.py:238:    return re.sub(r"\d([A-Z])", lambda m: m.group(0).lower(), t)
django/test/client.py:34:CONTENT_TYPE_RE = re.compile(r'.*; charset=([\w\d-]+);?')
django/utils/dateparse.py:14:    r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})$'
django/utils/dateparse.py:18:    r'(?P<hour>\d{1,2}):(?P<minute>\d{1,2})'
django/utils/dateparse.py:19:    r'(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?'
django/utils/dateparse.py:23:    r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})'
django/utils/dateparse.py:24:    r'[T ](?P<hour>\d{1,2}):(?P<minute>\d{1,2})'
django/utils/dateparse.py:25:    r'(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?'
django/utils/dateparse.py:26:    r'(?P<tzinfo>Z|[+-]\d{2}(?::?\d{2})?)?$'
django/utils/dateparse.py:31:    r'(?:(?P<days>-?\d+) (days?, )?)?'
django/utils/dateparse.py:32:    r'((?:(?P<hours>-?\d+):)(?=\d+:\d+))?'
django/utils/dateparse.py:33:    r'(?:(?P<minutes>-?\d+):)?'
django/utils/dateparse.py:34:    r'(?P<seconds>-?\d+)'
django/utils/dateparse.py:35:    r'(?:\.(?P<microseconds>\d{1,6})\d{0,6})?'
django/utils/dateparse.py:44:    r'(?:(?P<days>\d+(.\d+)?)D)?'
django/utils/dateparse.py:46:    r'(?:(?P<hours>\d+(.\d+)?)H)?'
django/utils/dateparse.py:47:    r'(?:(?P<minutes>\d+(.\d+)?)M)?'
django/utils/dateparse.py:48:    r'(?:(?P<seconds>\d+(.\d+)?)S)?'
django/utils/dateparse.py:58:    r'(?:(?P<days>-?\d+) (days? ?))?'
django/utils/dateparse.py:60:    r'(?P<hours>\d+):'
django/utils/dateparse.py:61:    r'(?P<minutes>\d\d):'
django/utils/dateparse.py:62:    r'(?P<seconds>\d\d)'
django/utils/dateparse.py:63:    r'(?:\.(?P<microseconds>\d{1,6}))?'
django/utils/html.py:28:unencoded_ampersands_re = re.compile(r'&(?!(\w+|#\d+);)')
django/utils/http.py:30:__D = r'(?P<day>\d{2})'
django/utils/http.py:31:__D2 = r'(?P<day>[ \d]\d)'
django/utils/http.py:33:__Y = r'(?P<year>\d{4})'
django/utils/http.py:34:__Y2 = r'(?P<year>\d{2})'
django/utils/http.py:35:__T = r'(?P<hour>\d{2}):(?P<min>\d{2}):(?P<sec>\d{2})'
django/utils/translation/trans_real.py:35:        (?:\s*;\s*q=(0(?:\.\d{,3})?|1(?:\.0{,3})?))?  # Optional "q=1.00", "q=0.8"
django/views/i18n.py:249:        match = re.search(r'nplurals=\s*(\d+)', self._plural_string or '')

comment:2 Changed 2 months ago by James Bennett

Description: modified (diff)
Summary: Audit for and abolish all use of '\d' in URL patternsAudit for and abolish all use of '\d' in regexes

comment:3 Changed 2 months ago by James Bennett

A few more:

tests/gis_tests/geo3d/tests.py:218:        ref_kml_regex = re.compile(r'^<Point><coordinates>-95.363\d+,29.763\d+,18</coordinates></Point>$')
tests/gis_tests/geoapp/test_functions.py:108:                r'<gml:coordinates decimal="\." cs="," ts=" ">-104.60925\d+,38.25500\d+ '
tests/gis_tests/geoapp/test_functions.py:114:                r'-104\.60925\d+,38\.255001</gml:coordinates></gml:Point>'
tests/i18n/test_extraction.py:100:        (or `#: .\dirA\dirB\foo.py:42` on Windows)
tests/i18n/tests.py:524:            self.assertEqual(r'j \d\e F \d\e Y', get_format('DATE_FORMAT'))
tests/logging_tests/tests.py:527:            self.assertRegex(logger_output.getvalue(), r'^\[[-:,.\s\d]+\] %s' % log_msg)
tests/sitemaps_tests/urls/http.py:198:    url(r'^i18n/testmodel/(?P<id>\d+)/$', testmodelview, name='i18n_testmodel'),
tests/urlpatterns_reverse/urls.py:28:    url(r'^optional/(?P<arg1>\d+)/(?:(?P<arg2>\d+)/)?', absolute_kwargs_view, name="named_optional"),
tests/urlpatterns_reverse/urls.py:29:    url(r'^optional/(?P<arg1>\d+)/(?:(?P<arg2>\d+)/)?$', absolute_kwargs_view, name="named_optional_terminated"),

comment:4 Changed 2 months ago by Aymeric Augustin

Triage Stage: UnreviewedAccepted

Definitely a good idea! Also one of my pet peeves.

comment:5 Changed 8 weeks ago by Sergey Fedoseev

Cc: Sergey Fedoseev added

comment:6 Changed 8 weeks ago by JunyiJ

Owner: changed from nobody to JunyiJ
Status: newassigned

comment:7 Changed 8 weeks ago by Sergey Fedoseev

I'm not sure if this change should be done for bytes, also the usage of https://docs.python.org/3/library/re.html#re.ASCII flag could be better alternative to explicit [0-9].

comment:8 Changed 8 weeks ago by JunyiJ

I searched and replace '/d' using bash and the following places are changed:

django/django/http/request.py:21:host_validation_re = re.compile(r"^([a-z0-9.-]+|\[[a-f0-9]*:[a-f0-9\.:]+\])(:\d+)?$")
django/django/contrib/gis/geometry.py:7:wkt_regex = re.compile(r'^(SRID=(?P<srid>\-?\d+);)?'
django/django/contrib/gis/gdal/libgdal.py:87:version_regex = re.compile(r'^(?P<major>\d+)\.(?P<minor>\d+)(\.(?P<subminor>\d+))?')
django/django/contrib/gis/geos/geometry.py:116:            match = re.match(b'SRID=(?P<srid>\-?\d+)', srid_part)
django/django/contrib/gis/db/backends/postgis/operations.py:344:        proj_regex = re.compile(r'(\d+)\.(\d+)\.(\d+)')
django/django/contrib/humanize/templatetags/humanize.py:48:    new = re.sub(r"^(-?\d+)(\d{3})", r'\g<1>,\g<2>', orig)
django/django/core/management/commands/makemessages.py:402:        m = re.search(r'(\d+)\.(\d+)\.?(\d+)?', out)
django/django/core/validators.py:78:    ipv4_re = r'(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
django/django/core/validators.py:99:        r'(?::\d{2,5})?'  # port
django/django/core/validators.py:136:            host_match = re.search(r'^\[(.+)\](?::\d{2,5})?$', urlsplit(value).netloc)
django/django/core/validators.py:153:    _lazy_re_compile(r'^-?\d+\Z'),
django/django/core/validators.py:296:    regexp = _lazy_re_compile(r'^%(neg)s\d+(?:%(sep)s%(neg)s\d+)*\Z' % {
django/django/template/defaultfilters.py:238:    return re.sub(r"\d([A-Z])", lambda m: m.group(0).lower(), t)
django/django/views/i18n.py:249:        match = re.search(r'nplurals=\s*(\d+)', self._plural_string or '')
django/django/utils/dateparse.py:14:    r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})$'
django/django/utils/dateparse.py:18:    r'(?P<hour>\d{1,2}):(?P<minute>\d{1,2})'
django/django/utils/dateparse.py:19:    r'(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?'
django/django/utils/dateparse.py:23:    r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})'
django/django/utils/dateparse.py:24:    r'[T ](?P<hour>\d{1,2}):(?P<minute>\d{1,2})'
django/django/utils/dateparse.py:25:    r'(?::(?P<second>\d{1,2})(?:\.(?P<microsecond>\d{1,6})\d{0,6})?)?'
django/django/utils/dateparse.py:26:    r'(?P<tzinfo>Z|[+-]\d{2}(?::?\d{2})?)?$'
django/django/utils/dateparse.py:31:    r'(?:(?P<days>-?\d+) (days?, )?)?'
django/django/utils/dateparse.py:32:    r'((?:(?P<hours>-?\d+):)(?=\d+:\d+))?'
django/django/utils/dateparse.py:33:    r'(?:(?P<minutes>-?\d+):)?'
django/django/utils/dateparse.py:34:    r'(?P<seconds>-?\d+)'
django/django/utils/dateparse.py:35:    r'(?:\.(?P<microseconds>\d{1,6})\d{0,6})?'
django/django/utils/dateparse.py:44:    r'(?:(?P<days>\d+(.\d+)?)D)?'
django/django/utils/dateparse.py:46:    r'(?:(?P<hours>\d+(.\d+)?)H)?'
django/django/utils/dateparse.py:47:    r'(?:(?P<minutes>\d+(.\d+)?)M)?'
django/django/utils/dateparse.py:48:    r'(?:(?P<seconds>\d+(.\d+)?)S)?'
django/django/utils/dateparse.py:58:    r'(?:(?P<days>-?\d+) (days? ?))?'
django/django/utils/dateparse.py:60:    r'(?P<hours>\d+):'
django/django/utils/dateparse.py:61:    r'(?P<minutes>\d\d):'
django/django/utils/dateparse.py:62:    r'(?P<seconds>\d\d)'
django/django/utils/dateparse.py:63:    r'(?:\.(?P<microseconds>\d{1,6}))?'
django/django/utils/http.py:30:__D = r'(?P<day>\d{2})'
django/django/utils/http.py:31:__D2 = r'(?P<day>[ \d]\d)'
django/django/utils/http.py:33:__Y = r'(?P<year>\d{4})'
django/django/utils/http.py:34:__Y2 = r'(?P<year>\d{2})'
django/django/utils/http.py:35:__T = r'(?P<hour>\d{2}):(?P<min>\d{2}):(?P<sec>\d{2})'
django/django/utils/html.py:27:unencoded_ampersands_re = re.compile(r'&(?!(\w+|#\d+);)')
django/django/test/client.py:34:CONTENT_TYPE_RE = re.compile(r'.*; charset=([\w\d-]+);?')
django/django/forms/widgets.py:925:    date_re = re.compile(r'(\d{4}|0)-(\d\d?)-(\d\d?)$')
django/django/db/backends/sqlite3/introspection.py:8:field_size_re = re.compile(r'^\s*(?:var)?char\s*\(\s*(\d+)\s*\)\s*$')
django/django/db/backends/mysql/base.py:49:server_version_re = re.compile(r'(\d{1,2})\.(\d{1,2})\.(\d{1,2})')
django/django/db/migrations/writer.py:170:            if re.match(r"^import (.*)\.\d+[^\s]*$", line):
django/django/db/migrations/autodetector.py:1233:        match = re.match(r'^\d+', name)
django/tests/i18n/tests.py:524:            self.assertEqual(r'j \d\e F \d\e Y', get_format('DATE_FORMAT'))
django/tests/logging_tests/tests.py:527:            self.assertRegex(logger_output.getvalue(), r'^\[[-:,.\s\d]+\] %s' % log_msg)
django/tests/gis_tests/geo3d/tests.py:218:        ref_kml_regex = re.compile(r'^<Point><coordinates>-95.363\d+,29.763\d+,18</coordinates></Point>$')
django/tests/sitemaps_tests/urls/http.py:198:    url(r'^i18n/testmodel/(?P<id>\d+)/$', testmodelview, name='i18n_testmodel'),
django/tests/urlpatterns_reverse/urls.py:28:    url(r'^optional/(?P<arg1>\d+)/(?:(?P<arg2>\d+)/)?', absolute_kwargs_view, name="named_optional"),
django/tests/urlpatterns_reverse/urls.py:29:    url(r'^optional/(?P<arg1>\d+)/(?:(?P<arg2>\d+)/)?$', absolute_kwargs_view, name="named_optional_terminated"),

However, when I sent a pull request, I found the checks are not successful. I am still trying to figure it out. Help are welcome**

comment:9 in reply to:  7 Changed 8 weeks ago by JunyiJ

Based on https://docs.python.org/3/library/re.html:

\d
For Unicode (str) patterns:

Matches any Unicode decimal digit (that is, any character in Unicode character category [Nd]). This includes [0-9], and also many other digit characters. If the ASCII flag is used only [0-9] is matched (but the flag affects the entire regular expression, so in such cases using an explicit [0-9] may be a better choice).

I think using [0-9] may be better than ASCII. Correct me if I am wrong.

Replying to Sergey Fedoseev:

I'm not sure if this change should be done for bytes, also the usage of https://docs.python.org/3/library/re.html#re.ASCII flag could be better alternative to explicit [0-9].

comment:10 Changed 4 weeks ago by Tim Graham

Has patch: set
Needs tests: set

PR. I think tests should be added to demonstrate the benefit of each change.

Note: See TracTickets for help on using tickets.
Back to Top