Opened 14 years ago

Closed 13 years ago

Last modified 12 years ago

#14290 closed (fixed)

Rendering is very inefficient when USE_L10N. Caching format values gives 2-7 times improvement

Reported by: Teemu Kurppa Owned by: Jannis Leidel
Component: Internationalization Version: dev
Severity: Keywords: localization, rendering,
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django's rendering is really inefficient if localization is turned on (USE_L10N = True).

I have a spreadsheet like view with a few thousand numbers, dates and related data and it spend almost all of it's time in template rendering even when I preprocessed data carefully.
I investigated the issue and it seems that Django's localization implementation is doing a lot of unnecessary work for every rendered number, date and time.

Following patch introduces caching of locale-specific format values.

Speed improvements are considerable. On my laptop and our server, this gives 5-7 times improvement (from ~5 sec to <1sec) to rendering speed of an anomalistic view with 10000 integers. For my real-world spreadsheet view, the whole processing time of Django improves 2-3 times (from ~2sec to ~0.8sec)

In addition to the patch, I've included a test project, which can be used to test observe and test the problem.

Run test to see raw rendering performance

python manage.py test numbersperf

And to test it with server try

python manage.py runserver 

And then visit these urls to observe and profile the problem

http://localhost:8000/test/10000
http://localhost:8000/test/10000?timing
http://localhost:8000/test/10000?prof

Attachments (5)

l10n_performance.diff (2.5 KB ) - added by Teemu Kurppa <teemu.kurppa@…> 14 years ago.
Patch to cache localization formats
l10nperf.tar.gz (26.0 KB ) - added by Teemu Kurppa <teemu.kurppa@…> 14 years ago.
Test project to test rendering performance under L10N
ticket14290_agressive.diff (5.1 KB ) - added by Anssi Kääriäinen 14 years ago.
More aggressive optimizations
ticket14290.diff (9.1 KB ) - added by Anssi Kääriäinen 14 years ago.
For SVN HEAD
ticket14290_for_12.diff (6.0 KB ) - added by Anssi Kääriäinen 14 years ago.
For 1.2

Download all attachments as: .zip

Change History (27)

by Teemu Kurppa <teemu.kurppa@…>, 14 years ago

Attachment: l10n_performance.diff added

Patch to cache localization formats

by Teemu Kurppa <teemu.kurppa@…>, 14 years ago

Attachment: l10nperf.tar.gz added

Test project to test rendering performance under L10N

comment:1 by Teemu Kurppa <teemu.kurppa@…>, 14 years ago

Owner: changed from nobody to anonymous
Status: newassigned

comment:2 by Teemu Kurppa, 14 years ago

Owner: changed from anonymous to Teemu Kurppa
Status: assignednew

comment:3 by Jannis Leidel, 14 years ago

Owner: changed from Teemu Kurppa to Jannis Leidel
Status: newassigned
Triage Stage: UnreviewedAccepted

comment:4 by Gregor Müllegger, 14 years ago

I think the following change at the bottom of django/utils/formats.py would have the same effect with much less code:

from django.utils.functional import memoize

get_format = memoize(get_format)

This would also cache the results of get_format dependend of the arguments you pass to the function.

comment:5 by Teemu Kurppa, 14 years ago

grefmueller, that sounds nice, thanks.

I just noticed that if you are using LocaleMiddleware, my patch works incorrectly. It seems that the expensive operation is figuring out language and locale. Optimization for this case is definitely possible, but requires a bigger code change. Language and locale should be queried once per request. Current Django implementation does it for every rendered date and time and for numbers even 3 times. Lang or locale per request should be stored either in global state or passed to every localization function. I'm leaving for a short vacation, so I don't have a time to do that big change now, so if somebody is willing to fix this issue so that it works even in the case of LocaleMiddleware, be my guest.

comment:6 by Anssi Kääriäinen, 14 years ago

Patch needs improvement: set

You can't cache the active language. That would break changing the current language via activate, which can be done mid-request. Also, I don't think that the get_language calls are the main problem, it is probably the re-importing done for every call...

What could be done is:

  1. Check the active language once in get_format. This

will reduce the calls to get_language.

  1. Allow passing the active language to get_format:
def get_format(format_type, lang=None):
    if lang is None:
        lang = get_language()
    ...

And use this in callers that can cache the language, when they know the language will not change between calls to get_format. This will likely get dirty when trying to cache the active language in upper layers.

  1. Build a cache of (language, format_type) -> format

However, I am not sure if this is valid, as it might be possible that the list of format modules or what a given module will return can be changed dynamically (for example database based format module, although I know that is a stupid thing to do). The intention probably is that format modules will always return the same thing.

If 1 and 3 will be done, I am sure the performance will be very good. However 3 needs input from somebody with more knowledge about format modules. The main questions are if they are supposed to always return the same language, and if the list of the format modules is supposed to be static.


comment:7 by Anssi Kääriäinen, 14 years ago

Ok, now there is a get_language safe patch for this.

The performance statistics are as follows, using http://localhost:8000/test/10000?timing:

With this patch, 1,5 seconds on average

  • get_language calls (10000) use about 0.3 seconds of this.
  • smart_str in start of get_format uses 0.15 seconds.
  • the new caching mechanism uses about 0.1 seconds of this.

Without this patch, about 9 seconds on average.

With USE_L10N=False, about 0.7 seconds.

I haven't yet ran the tests, will do that tomorrow.

by Anssi Kääriäinen, 14 years ago

Attachment: ticket14290_agressive.diff added

More aggressive optimizations

comment:8 by Anssi Kääriäinen, 14 years ago

Patch needs improvement: unset

I have attached two patches, the ticket14290.diff patch does some more optimizations. In particular it makes get_language calls much faster.

The more aggressive version (ticket14290_aggressive.diff) does even more aggressive optimizations, but it is a bit (more) ugly...

The first patch gets performance from 1.5 seconds down to 1.3 seconds, and the second version gets it down to 1.0 seconds.

Both patches pass tests.

PS: the real problem seems to be that if settings.FOO is called in hot spots, it will cause performance problems. Will look to that next, but that will be another ticket.

comment:9 by Luke Plant, 14 years ago

Thanks so much for the work on this. In the patch, the use of a 1-tuple as a value seems to be both strange and unnecessary - just store the value or None directly, and use a KeyError exception to differentiate between None and the key not being present at all.

comment:10 by Anssi Kääriäinen, 14 years ago

I will post an updated patch today. I will update the usage of the cache dictionary, but there is some other work to do also.

Also, my performance numbers for the SVN base case were a bit off. It actually takes 16.5 seconds on my laptop to run this test. With the updated patch and also the ticket #12497 patch it takes 0.95 seconds. There is even more that can be done, but that work might go into premature optimizations category.

comment:11 by Anssi Kääriäinen, 14 years ago

Update patch attached (ticket14290.diff, the aggressive one isn't good). It should be nice and clean now. There are also two related tickets with patches, settings optimization (#14297) and translations cleanup (#14306), witch has better performance as a bonus. Performance numbers for various combinations:

versionwithout settings optimizationwith settings optimizationwith settings + translations optimization
SVN HEAD, L18N=True16.5 seconds15.5 seconds13.0 seconds
get_format cached, L!8N=True1.41.050.85
SVN HEAD, L18N=False1.080.850.82
get_format cached, L18N=False1.41.020.82

So, all in all, from 16.5 seconds to about 0.8 seconds. After these patches USE_L18N should have almost no effect on performance. Note that with USE_L18N = False the performance will degrade a little, if other patches are rejected. I am sorry that there are no numbers for with translations optimization + this ticket, that would also be interesting combination...

All patches should pass tests separately. Also when all patches are applied, tests are passed. I haven't tested all the combinations.

I know I have probably done too much optimization (that is, just caching the value would be enough), but in my opinion the related patches are cleaner than the current implementations. Particularly so for the translations optimization patch.

The only question regarding this patch is: Is it considered safe to cache the values, that is, are the format modules allowed to change dynamically which format they return or not?

PS: This patch has a nice side effect: Now it is possible to get the format for other than the active language, too. Should this be tested & documented?

comment:12 by Teemu Kurppa, 14 years ago

Anssi, fantastic work you've done there. Once I get back from my short vacation next week, I'll give an update how your patches affect the performance of my real world spreadsheet scenario. Thanks for diving into this.

in reply to:  11 ; comment:13 by Jannis Leidel, 14 years ago

Needs documentation: set
Needs tests: set

Replying to akaariai:

Update patch attached (ticket14290.diff, the aggressive one isn't good). It should be nice and clean now. There are also two related tickets with patches, settings optimization (#14297) and translations cleanup (#14306), witch has better performance as a bonus. Performance numbers for various combinations:

versionwithout settings optimizationwith settings optimizationwith settings + translations optimization
SVN HEAD, L18N=True16.5 seconds15.5 seconds13.0 seconds
get_format cached, L!8N=True1.41.050.85
SVN HEAD, L18N=False1.080.850.82
get_format cached, L18N=False1.41.020.82

You probably meant the L10N setting here, which triggers the format localization.

So, all in all, from 16.5 seconds to about 0.8 seconds. After these patches USE_L18N should have almost no effect on performance. Note that with USE_L18N = False the performance will degrade a little, if other patches are rejected. I am sorry that there are no numbers for with translations optimization + this ticket, that would also be interesting combination...

All patches should pass tests separately. Also when all patches are applied, tests are passed. I haven't tested all the combinations.

I know I have probably done too much optimization (that is, just caching the value would be enough), but in my opinion the related patches are cleaner than the current implementations. Particularly so for the translations optimization patch.

Indeed, you did a bit of a yak shaving here.

The only question regarding this patch is: Is it considered safe to cache the values, that is, are the format modules allowed to change dynamically which format they return or not?

Yes, caching the format modules is fine since the formats are startup variables similar to the languages that are cached in a dict, too. In other words, clearing the cache dict to dynamically change the format strings is an an easy option for this edge case. For everything else we provide a way to override the default format modules (FORMAT_MODULE_PATH).

PS: This patch has a nice side effect: Now it is possible to get the format for other than the active language, too. Should this be tested & documented?

Yes, this should be tested and documented. Unfortunately this brings us a bit in a trouble, since we can't add a new feature to the 1.2.X branch, but only in trunk. Can you think of a way to add the caching without changing the API (e.g. function signature), since that seems like a bug fix to me.

in reply to:  13 comment:14 by Luke Plant, 14 years ago

Replying to jezdez:

Yes, this should be tested and documented. Unfortunately this brings us a bit in a trouble, since we can't add a new feature to the 1.2.X branch, but only in trunk.

I think we can bend the rules here - there isn't an absolutely strict rule on no feature additions, and in fact we've already had some in a similar situation - [13642] which fixed #14116 on 1.2.X, but added a feature to the test client in order to do so.

comment:15 by Luke Plant, 14 years ago

I actually meant that the feature had to be added to fix #14156.

The only slight issue with adding features is that you need to add a 'versionadded' attribute in the docs that refers to '1.2.4', which doesn't exist yet, and isn't automatically converted to 'development version' like '1.3' is. But again there is precedent for that.

comment:16 by Anssi Kääriäinen, 14 years ago

Another day, another status update:

  • I have now patch against 1.2. No API changes, performance when USE_L10N = True: 1.25 seconds, L10N = False, a bit less than 0.8 seconds. So win in both situations. No need for the other patches.
  • Updated patch for 1.3. This one gets 0.55 seconds when USE_L10N = True (30ximprovement), and 0.52 seconds when false (2ximprovement). This is with all three patches applied.
  • Again all tests passed, both on 1.3 and 1.2.

There is a small modification to django/utils/encoding.py:force_unicode() in the 1.3 version of the patch. A isinstance(s, unicode) check is added in the beginning of the function. The reasoning for the modification is simple: That function gets called _very_ often with strings already unicode (30000 calls in the runserver test case). So, make that case a bit faster. This modification might deserve another ticket.

I will not have time to work on these patches for the next couple of days. But I will try to follow this ticket.

by Anssi Kääriäinen, 14 years ago

Attachment: ticket14290.diff added

For SVN HEAD

by Anssi Kääriäinen, 14 years ago

Attachment: ticket14290_for_12.diff added

For 1.2

comment:17 by Anssi Kääriäinen, 14 years ago

There is a different approach to all this. Cache the format string for given (language, format_type) in render context. This approach is even faster than the method used in the current implementation. It gets around 0.43 for the USE_L10N case, and 0.37 if not USE_L10N. This is with the settings optimization and translations optimization patches. The speedup is because we avoid going through a lot of functions when formatting, and we can avoid escaping formatted numbers. The speedup is for integers. Formatting floats, and the various date types is still slow. We could make the USE_L10N case as fast as the USE_L10N=False case, but that would mean that language changes in template tags or filters will not have any effect to localization in the rendering. Making a default tag that changes the language by flushing the cached values in context is easy, though. But this would most likely be backwards incompatible change...

It is of course possible to apply the patch in this ticket AND use caching of formats in render context. This way using xxx_format would be fast if used in python code, and localization would be even faster when used in template rendering.

For the interested, Jinja2 gets around 0.1 seconds for this case, with escaping on.

There is a patch showing how the concept works at http://github.com/akaariai/django. Branch ticket14290_filter is having just this idea, ticket_14290_alt has all the patches (plus a bit more) applied. Try with DEBUG=True, the DEBUG=False renderer is a bit broken at the moment.

comment:18 by Teemu Kurppa, 14 years ago

Just a quick update that the latest patch by akaariai is a great improvement for my real world scenario too.

I've a sparse spreadsheet view for inputting and viewing weekly / monthly sales data. Cells are either empty, ints, decimals or dates in that order of frequency. Non-trivial database queries are used to construct this view but Django rendering is still the biggest bottleneck.

xSVN HEADSVN HEAD with patchImprovement
Monthly (~8000 cells)8.5s2.6s3.3x
Weekly (~2000 cells)1.9s0.75s2.5x

Thus, the patch is awesome improvement for real world scenario. After applying the patch, still a good chunk of time is spent in rendering, I think there are opportunities for improvement here, I'll look if I find some low-hanging fruits.

Profiling data after the patch is applied:

         956888 function calls (867495 primitive calls) in 3.294 CPU seconds
...
  ---- By file ----

      tottime
28.6%   0.871 /Users/teemu/projects/xxxx/django-performance/django/template/__init__.py
 9.0%   0.276 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/cursors.py
 6.1%   0.186 /Users/teemu/projects/xxxx/xxxxsales/sales/spreadsheet.py
 6.0%   0.184 /Users/teemu/projects/xxxx/django-performance/django/utils/safestring.py
 6.0%   0.184 /Users/teemu/projects/xxxx/django-performance/django/db/models/base.py
 5.2%   0.160 /Users/teemu/projects/xxxx/django-performance/django/utils/formats.py
 4.4%   0.133 /Users/teemu/projects/xxxx/django-performance/django/utils/functional.py
 4.2%   0.128 /Users/teemu/projects/xxxx/django-performance/django/template/context.py
 3.7%   0.114 /Users/teemu/projects/xxxx/django-performance/django/template/defaulttags.py
 3.4%   0.103 /Users/teemu/projects/xxxx/django-performance/django/utils/translation/__init__.py
 2.8%   0.086 /Users/teemu/projects/xxxx/django-performance/django/utils/encoding.py

comment:19 by Anssi Kääriäinen, 14 years ago

Was this with just the latest patch to this ticket, or also with #14297 and #14306 applied? Also, could you try this with the github branch ticket_14290_alt? It should be even faster than the latest patch to this ticket.

But, if you really want to "render" the table fast, use python code directly to do that. It is easy to do:

buffer = []
for obj in queryset:
    buffer.append(u'<tr><td>')
    # escape and localize as needed.
    buffer.append(obj.first_val)
    buffer.append(u'</td><td>')
    buffer.append(obj.second_val)
    ...
u''.join(buffer)

No amount of optimization to the template rendering system can beat the performance of that. I think you will easily get to somewhere around 1s for the monthly one. But going much lower than that seems hard, as then it is the database that will be your problem.

comment:20 by Jannis Leidel, 13 years ago

Resolution: fixed
Status: assignedclosed

(In [13898]) Fixed #14290 -- Made format localization faster by caching the format modules. Thanks, Teemu Kurppa and Anssi Kääriäinen for the report and initial patches.

comment:21 by Jannis Leidel, 13 years ago

Also partially applied to 1.2.X in r13903.

comment:22 by Jacob, 12 years ago

milestone: 1.3

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top