Opened 8 hours ago

Last modified 5 hours ago

#36747 new Bug

parse_duration() fails to parse valid ISO-8601 durations including years, months, and weeks due to incorrect regex — at Initial Version

Reported by: florianvazelle
Owned by:
Component: Uncategorized
Version: 5.2
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

django.utils.dateparse.parse_duration() claims to support ISO-8601 duration strings, but the current implementation only handles the PnDTnHnMnS subset. Valid ISO-8601 components such as years (Y), months (M), and weeks (W) are not supported.

The internal regex used for ISO-8601 durations only defines groups for days, hours, minutes, and seconds:

iso8601_duration_re = _lazy_re_compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)

This means Django currently rejects valid duration strings such as:

P1Y2M
P3W
P1Y
P2M10DT2H

Despite the documentation suggesting ISO-8601 support, these forms cannot be parsed.
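The rejection can be checked without a Django install. The sketch below copies the pattern quoted above into a plain re.compile() call (standing in for Django's _lazy_re_compile, an assumption made only to keep the example self-contained) and tests a few strings against it. The helper name matches_current is hypothetical.

```python
import re

# Pattern copied from the ticket; re.compile stands in for _lazy_re_compile.
current_iso8601_duration_re = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)


def matches_current(value):
    """Return True if the current pattern accepts the string (hypothetical helper)."""
    return current_iso8601_duration_re.match(value) is not None


# The first string uses only D/H/M/S; the rest use Y, date-part M, or W.
for candidate in ("P4DT12H30M5S", "P1Y2M", "P3W", "P1Y", "P2M10DT2H"):
    print(candidate, matches_current(candidate))
```

Only the first string should be accepted, since it stays within the PnDTnHnMnS subset.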

Steps to Reproduce

from django.utils.dateparse import parse_duration

parse_duration("P1Y")      # returns None
parse_duration("P2M")      # returns None
parse_duration("P3W")      # returns None
parse_duration("P1Y2M3DT4H")  # returns None

Expected Behavior

parse_duration() should parse all valid ISO-8601 durations that can be represented as timedelta, including:

  • PnW
  • PnYnMnD
  • ...

Django should either parse each calendar unit or document clearly which ISO-8601 forms parse_duration() does not support.
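The qualifier "can be represented as timedelta" matters because datetime.timedelta accepts a weeks argument but has no years or months arguments, as those units do not have a fixed length. A quick standard-library check (accepts_unit is a hypothetical helper written for this illustration):

```python
import datetime

# Weeks have a fixed length, so timedelta accepts them directly.
three_weeks = datetime.timedelta(weeks=3)
print(three_weeks == datetime.timedelta(days=21))


def accepts_unit(unit):
    """Return True if timedelta accepts `unit` as a keyword argument."""
    try:
        datetime.timedelta(**{unit: 1})
    except TypeError:
        return False
    return True


# Years and months are rejected with a TypeError.
print(accepts_unit("weeks"), accepts_unit("years"), accepts_unit("months"))
```

Any support for Y and M therefore has to pick a conversion to days, or reject those units explicitly.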

Proposed Fix

  1. Replace the current iso8601_duration_re with one that uses distinct group names for each ISO-8601 calendar unit:
iso8601_duration_re = _lazy_re_compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)
  2. Extend parse_duration() to convert these new fields to timedelta.
def parse_duration(value):
    match = (
        standard_duration_re.match(value)
        or iso8601_duration_re.match(value)
        or postgres_interval_re.match(value)
    )
    if match:
        kw = match.groupdict()
        sign = -1 if kw.pop("sign", "+") == "-" else 1
        if kw.get("microseconds"):
            kw["microseconds"] = kw["microseconds"].ljust(6, "0")
        kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v is not None}
        days = datetime.timedelta(kw.pop("days", 0.0) or 0.0)

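        if match.re == iso8601_duration_re:
+           # timedelta() accepts no years= or months= arguments, so calendar
+           # units are folded into days. The 365-day year and 30-day month
+           # are assumed factors, not something ISO 8601 defines.
+           years = kw.pop("years", 0.0)
+           months = kw.pop("months", 0.0)
+           weeks = kw.pop("weeks", 0.0)
+           days += datetime.timedelta(days=years * 365 + months * 30 + weeks * 7)
            days *= sign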

        return days + sign * datetime.timedelta(**kw)
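To see the two proposed steps working together outside Django, here is a minimal standalone sketch. It is not the patch itself: re.compile stands in for _lazy_re_compile, the function name parse_iso8601_duration_sketch is hypothetical, and the constants DAYS_PER_YEAR and DAYS_PER_MONTH are assumed conversion factors, since ISO 8601 does not fix the length of a year or month.

```python
import datetime
import re

# Proposed pattern from the ticket; re.compile stands in for _lazy_re_compile.
proposed_iso8601_duration_re = re.compile(
    r"^(?P<sign>[-+]?)"
    r"P"
    r"(?:(?P<years>\d+([.,]\d+)?)Y)?"
    r"(?:(?P<months>\d+([.,]\d+)?)M)?"
    r"(?:(?P<weeks>\d+([.,]\d+)?)W)?"
    r"(?:(?P<days>\d+([.,]\d+)?)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+([.,]\d+)?)H)?"
    r"(?:(?P<minutes>\d+([.,]\d+)?)M)?"
    r"(?:(?P<seconds>\d+([.,]\d+)?)S)?"
    r")?"
    r"$"
)

# Assumed conversion factors (not defined by ISO 8601 or by Django).
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30


def parse_iso8601_duration_sketch(value):
    """Parse an ISO-8601 duration into a timedelta, or return None."""
    match = proposed_iso8601_duration_re.match(value)
    if match is None:
        return None
    kw = match.groupdict()
    sign = -1 if kw.pop("sign") == "-" else 1
    # Accept both "." and "," as decimal separators, as the pattern does.
    kw = {k: float(v.replace(",", ".")) for k, v in kw.items() if v is not None}
    # timedelta has no years/months arguments, so fold them into days.
    years = kw.pop("years", 0.0)
    months = kw.pop("months", 0.0)
    days = kw.pop("days", 0.0) + years * DAYS_PER_YEAR + months * DAYS_PER_MONTH
    # The remaining keys (weeks, hours, minutes, seconds) are valid
    # timedelta keyword arguments.
    return sign * datetime.timedelta(days=days, **kw)


for text in ("P3W", "P1Y", "P1M", "PT1M", "P1Y2M3DT4H"):
    print(text, parse_iso8601_duration_sketch(text))
```

Note how the T separator disambiguates the two meanings of M: P1M is one month, while PT1M is one minute. A different choice of day factors, or rejecting Y and M outright, would be an equally defensible design.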

I can provide a full patch (tests + implementation) if desired.
