id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
32892	Optimisation for parse_datetime by preferring datetime.fromisoformat for well-formed values.	Keryn Knight	Keryn Knight	"With 4.0+ having a minimum supported version of python `3.8` an opportunity to speedup the `parse_datetime` (and possibly `parse_date`) function may exist.

As of python 3.7, the `datetime` class includes a `fromisoformat` method which handles the happy path much faster than the regex based method used by `parse_datetime`

The values it cannot parse, from the whole Django test suite, are given as examples below (duplicates remain). For those cases (eg: the recently handled #32727) it still requires going through the regex variant:

{{{
'2010-1-10 14:49:00'
'2006-10-25 14:30:45.0002'
'2014-09-23T22:34Z'
'2014-09-23T22:34Z'
'2014-09-23T28:23'
'2011-09-01T10:20:30Z'
'2012-4-9 4:8:16'
'2012-04-23T09:15:00Z'
'2012-4-9 4:8:16-0320'
'2012-04-23T10:20:30.400+02'
'2012-04-23T10:20:30.400-02'
'2012-04-23T10:20:30,400-02'
'2012-04-23T10:20:30.400 +0230'
'2012-04-23T10:20:30,400 +00'
'2012-04-23T10:20:30   -02'
'2012-04-56T09:15:90'
'2011-10-32 10:10'
}}}

The only difference in output as far as I can tell is that the `timezone` instance attached to the `tzinfo` parameter sets the `_name` attribute to the fixed offset (`str`) where the stdlib one doesn't set it, and thus it's `None`. I think that's fine, but YMMV.

On the happy path, using the regex version of `parse_datetime` as is:
{{{
'2021-06-30 10:42:12.638719'        -> 5.99 µs ± 88.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
'2008-04-02 12:00:00'               -> 5.57 µs ± 50.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
'2011-11-04T00:05:23+04:00'         -> 8.39 µs ± 98.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
'2011-11-04 00:05:23.283+00:00'     -> 8.81 µs ± 93.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
}}}

vs using `fromisoformat`:
{{{
'2021-06-30 10:42:12.638719'        -> 227 ns ± 0.935 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each
'2008-04-02 12:00:00'               -> 216 ns ± 2.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
'2011-11-04T00:05:23+04:00'         -> 266 ns ± 0.612 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
'2011-11-04 00:05:23.283+00:00'     -> 234 ns ± 2.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
}}}

when combining both, such as:

{{{
def parse_datetime(value):
    """""" ... """"""
    try:
        return datetime.datetime.fromisoformat(value)
    except (TypeError, ValueError):
        match = datetime_re.match(value)
        if match:
            ...
}}}

it's easy enough to spot which formats end up hitting the regex case:

{{{
'2021-06-30 10:42:12.638719'        -> 230 ns ± 1.57 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
'2008-04-02 12:00:00'               -> 215 ns ± 2.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
'2011-11-04T00:05:23+04:00'         -> 270 ns ± 4.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
'2011-11-04 00:05:23.283+00:00'     -> 234 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
'2012-04-23T10:20:30.400 +0230'     -> 10.3 µs ± 281 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
'2012-04-23T10:20:30   -02'         -> 9.14 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
}}}

Most of the speedup gain is going to be found when using sqlite3 as a DB backend, because it has convertery UDF type things for them, and generally speaking Django stores and parses them in the well-formed format:

{{{
Database.register_converter(""datetime"", decoder(parse_datetime))
Database.register_converter(""timestamp"", decoder(parse_datetime))
}}}

when using `cProfile` to execute `tuple(User.objects.all())` with 101 users, `parse_datetime` accounts for 10+% of the overall time (with `Model.__init__` being the only bigger component), ordered by tottime (column 2), for the purposes of amortising the noise elsewhere, the call has been executed `1000` times in a loop:

{{{
101000    0.451    0.000    0.628    0.000 django/db/models/base.py:406(__init__)
102000    0.379    0.000    0.837    0.000 django/utils/dateparse.py:98(parse_datetime)
102000    0.202    0.000    0.699    0.000 django/db/models/sql/compiler.py:1128(apply_converters)
102000    0.158    0.000    0.158    0.000 django/utils/dateparse.py:124(<dictcomp>)
}}}

but after adapting as above, `parse_datetime` is about 1%, same ordering:
{{{
101000    0.433    0.000    0.606    0.000 django/db/models/base.py:406(__init__)
102000    0.198    0.000    0.682    0.000 django/db/models/sql/compiler.py:1128(apply_converters)
102000    0.031    0.000    0.048    0.000 django/utils/dateparse.py:98(parse_datetime)
}}}
(the dictcomp `{k: int(v) for k, v in kw.items() if v is not None}` is never accounted for because it never executes)

It would be possible to restrict the variants passed through to the `fromisoformat` branch further, because the minimum acceptable length string is `10` (for `YYYY-MM-DD`) and then there's ... I dunno, some other number of valid options for the time parsing (`HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]` - eg, the tz string is only `5` `8` or `15` chars long), but I doubt it's worth fussing over that.

I suspect the same is true for `parse_date` which could use `date.fromisoformat` if the input value is exactly `10` long, and fall back to the regex method for the optional leading-zeroes `date_re` allows for.

Zero tests seem to fail with the adapted method proposed above."	Cleanup/optimization	closed	Utilities	dev	Normal	fixed			Ready for checkin	1	0	0	0	0	0