Opened 2 years ago

Closed 2 years ago

#23399 closed Cleanup/optimization (fixed)

Update int_to_base36

Reported by: None Owned by: None
Component: Utilities Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I propose change function int_to_base36:

def int_to_base36(n):
    assert isinstance(n, (int, long))
    c = '0123456789abcdefghijklmnopqrstuvwxyz'
    sign = ''
    if n < 0:
        sign, n = '-', -n
    if 0 <= n < 36:
        return sign + c[n]
    b36 = ''
    while n != 0:
        n, i = divmod(n, 36)
        b36 = c[i] + b36
    return sign + b36

I think that it's better and faster.

Change History (5)

comment:1 Changed 2 years ago by Tim Graham

This implementation isn't backwards compatible with respect to the errors it raises on invalid inputs.

Some benchmarks would be helpful as far as speed goes.

comment:2 Changed 2 years ago by Keryn Knight

It's an interesting idea. For the purposes of demonstration, here's some back-of-the-napkin unscientific benchmarks, from Python 2.7.6, with sys.maxint = 9223372036854775807:


time python baseline.py

import sys                                                                                                          
print("sys.maxint is: %s" % sys.maxint)
for x in xrange(0, 1000000):
    continue
print("Complete")

Consistently yields something about the range:

real 0m0.080s
user 0m0.066s
sys 0m0.012s


the version in django:

time python original.py

import sys                                                                                                          
from django.utils.http import int_to_base36
print("sys.maxint is: %s" % sys.maxint)

for x in xrange(0, 1000000):
    int_to_base36(i=x)

print("Complete")

yields:

real 0m3.337s
user 0m3.314s
sys 0m0.020s


The version proposed by liminspace, adjusted to raise the same TypeErrors as Django's existing one:

import sys                                                                                                          
from django.utils import six
print("sys.maxint is: %s" % sys.maxint)

def int_to_base36(n):
    if n < 0: 
        raise ValueError("Negative base36 conversion input.")
    if six.PY2:
        if not isinstance(n, six.integer_types):
            raise TypeError("Non-integer base36 conversion input.")
        if n > sys.maxint:
            raise ValueError("Base36 conversion input too large.")
    c = '0123456789abcdefghijklmnopqrstuvwxyz'
    sign = '' 
    if n < 0: 
        sign, n = '-', -n 
    if 0 <= n < 36:
        return sign + c[n]
    b36 = '' 
    while n != 0: 
        n, i = divmod(n, 36)
        b36 = c[i] + b36
    return sign + b36

for x in xrange(0, 1000000):
    int_to_base36(n=x)

print("Complete")

yields around:

real 0m2.859s
user 0m2.847s
sys 0m0.010s


So, over the first million smallest integers, there is some small speed-up (takes about 85% of the time), by the look of those numbers.
For the top million (changed to xrange(maxint - 1000000, 100000), the times seem a little more pronounced (the new version is about 60% of the original's time):

time python baseline.py
real 0m0.069s
user 0m0.060s
sys 0m0.008s

Django's original:

time python original.py
real 0m11.840s
user 0m11.805s
sys 0m0.029s

The new version I pasted above:

time python new.py
real 0m6.897s
user 0m6.877s
sys 0m0.017s

comment:3 Changed 2 years ago by None

kezabelle, try this:

def int_to_base36(n):
    if n < 0: 
        raise ValueError("Negative base36 conversion input.")
    if six.PY2:
        if not isinstance(n, six.integer_types):
            raise TypeError("Non-integer base36 conversion input.")
        if n > sys.maxint:
            raise ValueError("Base36 conversion input too large.")
    c = '0123456789abcdefghijklmnopqrstuvwxyz'
    if n < 36:
        return c[n]
    b36 = '' 
    while n != 0: 
        n, i = divmod(n, 36)
        b36 = c[i] + b36
    return b36
Last edited 2 years ago by None (previous) (diff)

comment:4 Changed 2 years ago by Keryn Knight

First million:

time python new2.py
real 0m2.682s
user 0m2.667s
sys 0m0.012s


last million:

time python new2a.py
real 0m6.539s
user 0m6.520s
sys 0m0.015s

Also did a quick test of assert new_int_to_base36(n=x) == original_int_to_base36(i=x) for both the first and last million, which raised no errors.

comment:5 Changed 2 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In 2508be35ca8bc5440b93d3152cd80ee5f8e159e3:

Fixed #23399 -- Optimized django.utils.http.int_to_bas36()

Thanks liminspace for the patch and Keryn Knight for benchmarking.

Note: See TracTickets for help on using tickets.
Back to Top