Opened 5 years ago

Last modified 10 months ago

#25762 assigned Cleanup/optimization

Optimize numberformat.format

Reported by: Jaap Roes Owned by: tim-mccurrach
Component: Utilities Version: master
Severity: Normal Keywords:
Cc: Johannes Hoppe Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

numberformat.format is used a lot by Django and in most cases it's called with a number. Yet the current implementation is pretty much built for strings.

I've refactored the function to be up to 4x faster on my machine (and in the worst case it performs roughly the same). Attached is the benchmark script I've used.

Attachments (1)

bench_numberformat.py (2.6 KB) - added by Jaap Roes 5 years ago.
Benchmark for numberformat.format

Download all attachments as: .zip

Change History (8)

Changed 5 years ago by Jaap Roes

Attachment: bench_numberformat.py added

Benchmark for numberformat.format

comment:1 Changed 5 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

I'll mention djangobench in case you want to adapt any of your testing scripts so they can live on.

comment:2 Changed 5 years ago by Jaap Roes

Ah nice, I'll look into adding some benchmarks to djangobench

comment:3 Changed 5 years ago by Jaap Roes

By the way, while refactoring this code I was wondering if supporting NUMBER_GROUPING other than 3 is even useful. None of the locales shipped with Django define anything other than 3, and the other related configuration options (THOUSAND_SEPARATOR and USE_THOUSAND_SEPARATOR) all refer to grouping by 3 anyway.

It would also be nice if number formatting on strings was deprecated. It seems that Django already tries to call it with numbers in the places it's used, and there's no real check if the strings that are passed into the function are even number like.

comment:4 Changed 5 years ago by Tim Graham

Patch needs improvement: set

Left comments for improvement on the pull request. Maybe you'd like to raise the other questions on the DevelopersMailingList. I'm not sure about the answers myself.

comment:5 Changed 12 months ago by tim-mccurrach

Owner: changed from nobody to tim-mccurrach
Status: newassigned

comment:6 Changed 12 months ago by tim-mccurrach

The approach used in the linked PR (https://github.com/django/django/pull/5668/files) actually has a problem. The following approach is used to avoid the rounding behaviour of the builtin format function:

                # If `number` is 10.888 and `decimal_pos` is 2 return 10.88 not 10.89.
                format_string = '.%sf' % (decimal_pos + 1)

and the string is then truncated to decimal_pos decimals places. However, for a number such as 10.999, this will still be rounded 11.00. This is inconsistent with the behaviour of the rest of the function, which truncates and doesn't round.

We therefore can't use the bultin format to do any kind of decimal formatting as there doesn't appear (as it currently stands) to be an option to truncate with it, however the speed advantages it offers can still be used for the common case of seperating a number into thousands.

Using the test cases in bench_number_format.py linked above, an improved function performs anywhere between 2x and 5x faster for most cases, and again in the worst case is roughly the same (fluctuating between +2% and -2% compared with the original function).

I'll add a new PR.

comment:7 Changed 10 months ago by Johannes Hoppe

Cc: Johannes Hoppe added
Note: See TracTickets for help on using tickets.
Back to Top