Opened 8 hours ago
Last modified 5 hours ago
#36747 new Bug
parse_duration() fails to parse valid ISO-8601 durations including years, months, and weeks due to incorrect regex — at Initial Version
| Reported by: | florianvazelle | Owned by: | |
|---|---|---|---|
| Component: | Uncategorized | Version: | 5.2 |
| Severity: | Normal | Keywords: | |
| Cc: | | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
django.utils.dateparse.parse_duration() claims to support ISO-8601 duration strings, but the current implementation only handles the PnDTnHnMnS subset. Valid ISO-8601 components such as years (Y), months (M), and weeks (W) are not supported.
The internal regex used for ISO-8601 durations:
iso8601_duration_re = _lazy_re_compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)
This means Django currently rejects valid duration strings such as:
P1Y2M
P3W
P1Y
P2M10DT2H
Despite the documentation suggesting ISO-8601 support, these forms cannot be parsed.
Steps to Reproduce
from django.utils.dateparse import parse_duration
parse_duration("P1Y") # returns None
parse_duration("P2M") # returns None
parse_duration("P3W") # returns None
parse_duration("P1Y2M3DT4H") # returns None
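
The rejection can also be checked without a Django install. The following is a minimal sketch that copies the pattern quoted above and compiles it with the standard `re.compile` in place of Django's internal `_lazy_re_compile`; the names `current_iso8601_duration_re` and `is_accepted` are hypothetical and used only for illustration.

```python
import re

# Copy of the current pattern quoted in this ticket, compiled with
# re.compile instead of Django's internal _lazy_re_compile helper.
current_iso8601_duration_re = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)


def is_accepted(value):
    """Return True if the current pattern matches the whole string."""
    return current_iso8601_duration_re.match(value) is not None


# Only the first string uses the PnDTnHnMnS subset, so only it matches.
for value in ("P3DT4H", "P1Y", "P2M", "P3W", "P1Y2M3DT4H"):
    print(value, is_accepted(value))
```

Only `P3DT4H` is accepted; every string containing a Y, W, or date-part M designator is rejected at the regex stage, before any conversion to timedelta is attempted.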
Expected Behavior
parse_duration() should parse all valid ISO-8601 durations that can be represented as a timedelta, including:
- PnW
- PnYnMnD
- ...
Django should either capture each calendar unit or clearly document the limitation.
Proposed Fix
- Replace the current iso8601_duration_re with one that uses distinct group names for each ISO-8601 calendar unit:
iso8601_duration_re = _lazy_re_compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)
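
To see how the proposed pattern separates the date-part M (months) from the time-part M (minutes), here is a minimal sketch that compiles it with the standard `re.compile` rather than `_lazy_re_compile`. The names `proposed_iso8601_duration_re` and `captured_units` are hypothetical and exist only for this illustration.

```python
import re

# Copy of the proposed pattern, compiled with re.compile so the
# example runs without Django.
proposed_iso8601_duration_re = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)


def captured_units(value):
    """Return the non-empty named groups, or None if the string is rejected."""
    match = proposed_iso8601_duration_re.match(value)
    if match is None:
        return None
    # Drop groups that did not participate (None) and the empty sign ("").
    return {k: v for k, v in match.groupdict().items() if v}


for value in ("P1Y2M", "P3W", "P2M10DT2H", "PT5M"):
    print(value, captured_units(value))
```

An M before the T designator lands in the `months` group and an M after it lands in `minutes`, so `P2M10DT2H` and `PT5M` are no longer ambiguous.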
- Extend parse_duration() to convert these new fields to timedelta.
 def parse_duration(value):
     match = (
         standard_duration_re.match(value)
         or iso8601_duration_re.match(value)
         or postgres_interval_re.match(value)
     )
     if match:
         kw = match.groupdict()
         sign = -1 if kw.pop("sign", "+") == "-" else 1
         if kw.get("microseconds"):
             kw["microseconds"] = kw["microseconds"].ljust(6, "0")
         kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v is not None}
         days = datetime.timedelta(kw.pop("days", 0.0) or 0.0)
         if match.re == iso8601_duration_re:
+            # timedelta() accepts neither years nor months, so both are
+            # converted to days here. The 365-day and 30-day factors are
+            # placeholder assumptions; the conversion policy is undecided.
+            years = kw.pop("years", 0.0)
+            months = kw.pop("months", 0.0)
+            weeks = kw.pop("weeks", 0.0)
+
+            days += datetime.timedelta(days=years * 365 + months * 30 + weeks * 7)
             days *= sign
         return days + sign * datetime.timedelta(**kw)
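
Because `datetime.timedelta` has no `years` or `months` arguments, any implementation has to pick a conversion for those two units. The following standalone sketch shows one way the conversion could look. It assumes a year is 365 days and a month is 30 days, which ISO 8601 does not mandate and Django may choose differently. `parse_iso8601_duration`, `DAYS_PER_YEAR` and `DAYS_PER_MONTH` are hypothetical names, not part of Django.

```python
import datetime
import re

# Assumed conversion factors: ISO 8601 does not fix the length of a
# calendar year or month, so these values are a policy choice.
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30

iso8601_duration_re = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)


def parse_iso8601_duration(value):
    """Hypothetical helper: return a timedelta, or None if value is rejected."""
    match = iso8601_duration_re.match(value)
    if match is None:
        return None
    kw = match.groupdict()
    sign = -1 if kw.pop("sign") == "-" else 1
    # Accept both "." and "," as the decimal separator.
    kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v is not None}
    # Years and months are folded into days using the assumed factors.
    years = kw.pop("years", 0.0)
    months = kw.pop("months", 0.0)
    calendar_days = years * DAYS_PER_YEAR + months * DAYS_PER_MONTH
    # The remaining keys (weeks, days, hours, minutes, seconds) are all
    # valid timedelta keyword arguments.
    return sign * (datetime.timedelta(days=calendar_days) + datetime.timedelta(**kw))


print(parse_iso8601_duration("P3W"))
print(parse_iso8601_duration("P1Y2M3DT4H"))
```

Weeks need no special handling because `timedelta` accepts a `weeks` argument directly; only years and months depend on the assumed factors.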
I can provide a full patch (tests + implementation) if desired.