#23399 closed Cleanup/optimization (fixed)

Update int_to_base36

Reported by: liminspace Owned by: liminspace
Component: Utilities Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I propose change function int_to_base36:

def int_to_base36(n):
    assert isinstance(n, (int, long))
    c = '0123456789abcdefghijklmnopqrstuvwxyz'
    sign = ''
    if n < 0:
        sign, n = '-', -n
    if 0 <= n < 36:
        return sign + c[n]
    b36 = ''
    while n != 0:
        n, i = divmod(n, 36)
        b36 = c[i] + b36
    return sign + b36

I think that it's better and faster.

Change History (5)

comment:1 Changed 11 months ago by timgraham

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

This implementation isn't backwards compatible with respect to the errors it raises on invalid inputs.

Some benchmarks would be helpful as far as speed goes.

comment:2 Changed 11 months ago by kezabelle

It's an interesting idea. For the purposes of demonstration, here's some back-of-the-napkin unscientific benchmarks, from Python 2.7.6, with sys.maxint = 9223372036854775807:


time python baseline.py

import sys                                                                                                          
print("sys.maxint is: %s" % sys.maxint)
for x in xrange(0, 1000000):
    continue
print("Complete")

Consistently yields something about the range:

real 0m0.080s
user 0m0.066s
sys 0m0.012s


the version in django:

time python original.py

import sys                                                                                                          
from django.utils.http import int_to_base36
print("sys.maxint is: %s" % sys.maxint)

for x in xrange(0, 1000000):
    int_to_base36(i=x)

print("Complete")

yields:

real 0m3.337s
user 0m3.314s
sys 0m0.020s


The version proposed by liminspace, adjusted to raise the same TypeErrors as Django's existing one:

import sys                                                                                                          
from django.utils import six
print("sys.maxint is: %s" % sys.maxint)

def int_to_base36(n):
    if n < 0: 
        raise ValueError("Negative base36 conversion input.")
    if six.PY2:
        if not isinstance(n, six.integer_types):
            raise TypeError("Non-integer base36 conversion input.")
        if n > sys.maxint:
            raise ValueError("Base36 conversion input too large.")
    c = '0123456789abcdefghijklmnopqrstuvwxyz'
    sign = '' 
    if n < 0: 
        sign, n = '-', -n 
    if 0 <= n < 36:
        return sign + c[n]
    b36 = '' 
    while n != 0: 
        n, i = divmod(n, 36)
        b36 = c[i] + b36
    return sign + b36

for x in xrange(0, 1000000):
    int_to_base36(n=x)

print("Complete")

yields around:

real 0m2.859s
user 0m2.847s
sys 0m0.010s


So, over the first million smallest integers, there is some small speed-up (takes about 85% of the time), by the look of those numbers.
For the top million (changed to xrange(maxint - 1000000, 100000), the times seem a little more pronounced (the new version is about 60% of the original's time):

time python baseline.py
real 0m0.069s
user 0m0.060s
sys 0m0.008s

Django's original:

time python original.py
real 0m11.840s
user 0m11.805s
sys 0m0.029s

The new version I pasted above:

time python new.py
real 0m6.897s
user 0m6.877s
sys 0m0.017s

comment:3 Changed 11 months ago by liminspace

kezabelle, try this:

def int_to_base36(n):
    if n < 0: 
        raise ValueError("Negative base36 conversion input.")
    if six.PY2:
        if not isinstance(n, six.integer_types):
            raise TypeError("Non-integer base36 conversion input.")
        if n > sys.maxint:
            raise ValueError("Base36 conversion input too large.")
    c = '0123456789abcdefghijklmnopqrstuvwxyz'
    if n < 36:
        return c[n]
    b36 = '' 
    while n != 0: 
        n, i = divmod(n, 36)
        b36 = c[i] + b36
    return b36
Last edited 11 months ago by liminspace (previous) (diff)

comment:4 Changed 11 months ago by kezabelle

First million:

time python new2.py
real 0m2.682s
user 0m2.667s
sys 0m0.012s


last million:

time python new2a.py
real 0m6.539s
user 0m6.520s
sys 0m0.015s

Also did a quick test of assert new_int_to_base36(n=x) == original_int_to_base36(i=x) for both the first and last million, which raised no errors.

comment:5 Changed 11 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 2508be35ca8bc5440b93d3152cd80ee5f8e159e3:

Fixed #23399 -- Optimized django.utils.http.int_to_bas36()

Thanks liminspace for the patch and Keryn Knight for benchmarking.

Note: See TracTickets for help on using tickets.
Back to Top