Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#14290 closed (fixed)

Rendering is very inefficient when USE_L10N. Caching format values gives 2-7 times improvement

Reported by: teemu Owned by: jezdez
Component: Internationalization Version: master
Severity: Keywords: localization, rendering,
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: UI/UX:

Description

Django's rendering is really inefficient if localization is turned on (USE_L10N = True).

I have a spreadsheet like view with a few thousand numbers, dates and related data and it spend almost all of it's time in template rendering even when I preprocessed data carefully.
I investigated the issue and it seems that Django's localization implementation is doing a lot of unnecessary work for every rendered number, date and time.

Following patch introduces caching of locale-specific format values.

Speed improvements are considerable. On my laptop and our server, this gives 5-7 times improvement (from ~5 sec to <1sec) to rendering speed of an anomalistic view with 10000 integers. For my real-world spreadsheet view, the whole processing time of Django improves 2-3 times (from ~2sec to ~0.8sec)

In addition to the patch, I've included a test project, which can be used to test observe and test the problem.

Run test to see raw rendering performance

python manage.py test numbersperf

And to test it with server try

python manage.py runserver 

And then visit these urls to observe and profile the problem

http://localhost:8000/test/10000
http://localhost:8000/test/10000?timing
http://localhost:8000/test/10000?prof

Attachments (5)

l10n_performance.diff (2.5 KB) - added by Teemu Kurppa <teemu.kurppa@…> 4 years ago.
Patch to cache localization formats
l10nperf.tar.gz (26.0 KB) - added by Teemu Kurppa <teemu.kurppa@…> 4 years ago.
Test project to test rendering performance under L10N
ticket14290_agressive.diff (5.1 KB) - added by akaariai 4 years ago.
More aggressive optimizations
ticket14290.diff (9.1 KB) - added by akaariai 4 years ago.
For SVN HEAD
ticket14290_for_12.diff (6.0 KB) - added by akaariai 4 years ago.
For 1.2

Download all attachments as: .zip

Change History (27)

Changed 4 years ago by Teemu Kurppa <teemu.kurppa@…>

Patch to cache localization formats

Changed 4 years ago by Teemu Kurppa <teemu.kurppa@…>

Test project to test rendering performance under L10N

comment:1 Changed 4 years ago by Teemu Kurppa <teemu.kurppa@…>

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to anonymous
  • Patch needs improvement unset
  • Status changed from new to assigned

comment:2 Changed 4 years ago by teemu

  • Owner changed from anonymous to teemu
  • Status changed from assigned to new

comment:3 Changed 4 years ago by jezdez

  • Owner changed from teemu to jezdez
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 4 years ago by gregmuellegger

I think the following change at the bottom of django/utils/formats.py would have the same effect with much less code:

from django.utils.functional import memoize

get_format = memoize(get_format)

This would also cache the results of get_format dependend of the arguments you pass to the function.

comment:5 Changed 4 years ago by teemu

grefmueller, that sounds nice, thanks.

I just noticed that if you are using LocaleMiddleware, my patch works incorrectly. It seems that the expensive operation is figuring out language and locale. Optimization for this case is definitely possible, but requires a bigger code change. Language and locale should be queried once per request. Current Django implementation does it for every rendered date and time and for numbers even 3 times. Lang or locale per request should be stored either in global state or passed to every localization function. I'm leaving for a short vacation, so I don't have a time to do that big change now, so if somebody is willing to fix this issue so that it works even in the case of LocaleMiddleware, be my guest.

comment:6 Changed 4 years ago by akaariai

  • Patch needs improvement set

You can't cache the active language. That would break changing the current language via activate, which can be done mid-request. Also, I don't think that the get_language calls are the main problem, it is probably the re-importing done for every call...

What could be done is:

  1. Check the active language once in get_format. This

will reduce the calls to get_language.

  1. Allow passing the active language to get_format:
def get_format(format_type, lang=None):
    if lang is None:
        lang = get_language()
    ...

And use this in callers that can cache the language, when they know the language will not change between calls to get_format. This will likely get dirty when trying to cache the active language in upper layers.

  1. Build a cache of (language, format_type) -> format

However, I am not sure if this is valid, as it might be possible that the list of format modules or what a given module will return can be changed dynamically (for example database based format module, although I know that is a stupid thing to do). The intention probably is that format modules will always return the same thing.

If 1 and 3 will be done, I am sure the performance will be very good. However 3 needs input from somebody with more knowledge about format modules. The main questions are if they are supposed to always return the same language, and if the list of the format modules is supposed to be static.


comment:7 Changed 4 years ago by akaariai

Ok, now there is a get_language safe patch for this.

The performance statistics are as follows, using http://localhost:8000/test/10000?timing:

With this patch, 1,5 seconds on average

  • get_language calls (10000) use about 0.3 seconds of this.
  • smart_str in start of get_format uses 0.15 seconds.
  • the new caching mechanism uses about 0.1 seconds of this.

Without this patch, about 9 seconds on average.

With USE_L10N=False, about 0.7 seconds.

I haven't yet ran the tests, will do that tomorrow.

Changed 4 years ago by akaariai

More aggressive optimizations

comment:8 Changed 4 years ago by akaariai

  • Patch needs improvement unset

I have attached two patches, the ticket14290.diff patch does some more optimizations. In particular it makes get_language calls much faster.

The more aggressive version (ticket14290_aggressive.diff) does even more aggressive optimizations, but it is a bit (more) ugly...

The first patch gets performance from 1.5 seconds down to 1.3 seconds, and the second version gets it down to 1.0 seconds.

Both patches pass tests.

PS: the real problem seems to be that if settings.FOO is called in hot spots, it will cause performance problems. Will look to that next, but that will be another ticket.

comment:9 Changed 4 years ago by lukeplant

Thanks so much for the work on this. In the patch, the use of a 1-tuple as a value seems to be both strange and unnecessary - just store the value or None directly, and use a KeyError exception to differentiate between None and the key not being present at all.

comment:10 Changed 4 years ago by akaariai

I will post an updated patch today. I will update the usage of the cache dictionary, but there is some other work to do also.

Also, my performance numbers for the SVN base case were a bit off. It actually takes 16.5 seconds on my laptop to run this test. With the updated patch and also the ticket #12497 patch it takes 0.95 seconds. There is even more that can be done, but that work might go into premature optimizations category.

comment:11 follow-up: Changed 4 years ago by akaariai

Update patch attached (ticket14290.diff, the aggressive one isn't good). It should be nice and clean now. There are also two related tickets with patches, settings optimization (#14297) and translations cleanup (#14306), witch has better performance as a bonus. Performance numbers for various combinations:

versionwithout settings optimizationwith settings optimizationwith settings + translations optimization
SVN HEAD, L18N=True16.5 seconds15.5 seconds13.0 seconds
get_format cached, L!8N=True1.41.050.85
SVN HEAD, L18N=False1.080.850.82
get_format cached, L18N=False1.41.020.82

So, all in all, from 16.5 seconds to about 0.8 seconds. After these patches USE_L18N should have almost no effect on performance. Note that with USE_L18N = False the performance will degrade a little, if other patches are rejected. I am sorry that there are no numbers for with translations optimization + this ticket, that would also be interesting combination...

All patches should pass tests separately. Also when all patches are applied, tests are passed. I haven't tested all the combinations.

I know I have probably done too much optimization (that is, just caching the value would be enough), but in my opinion the related patches are cleaner than the current implementations. Particularly so for the translations optimization patch.

The only question regarding this patch is: Is it considered safe to cache the values, that is, are the format modules allowed to change dynamically which format they return or not?

PS: This patch has a nice side effect: Now it is possible to get the format for other than the active language, too. Should this be tested & documented?

comment:12 Changed 4 years ago by teemu

Anssi, fantastic work you've done there. Once I get back from my short vacation next week, I'll give an update how your patches affect the performance of my real world spreadsheet scenario. Thanks for diving into this.

comment:13 in reply to: ↑ 11 ; follow-up: Changed 4 years ago by jezdez

  • Needs documentation set
  • Needs tests set

Replying to akaariai:

Update patch attached (ticket14290.diff, the aggressive one isn't good). It should be nice and clean now. There are also two related tickets with patches, settings optimization (#14297) and translations cleanup (#14306), witch has better performance as a bonus. Performance numbers for various combinations:

versionwithout settings optimizationwith settings optimizationwith settings + translations optimization
SVN HEAD, L18N=True16.5 seconds15.5 seconds13.0 seconds
get_format cached, L!8N=True1.41.050.85
SVN HEAD, L18N=False1.080.850.82
get_format cached, L18N=False1.41.020.82

You probably meant the L10N setting here, which triggers the format localization.

So, all in all, from 16.5 seconds to about 0.8 seconds. After these patches USE_L18N should have almost no effect on performance. Note that with USE_L18N = False the performance will degrade a little, if other patches are rejected. I am sorry that there are no numbers for with translations optimization + this ticket, that would also be interesting combination...

All patches should pass tests separately. Also when all patches are applied, tests are passed. I haven't tested all the combinations.

I know I have probably done too much optimization (that is, just caching the value would be enough), but in my opinion the related patches are cleaner than the current implementations. Particularly so for the translations optimization patch.

Indeed, you did a bit of a yak shaving here.

The only question regarding this patch is: Is it considered safe to cache the values, that is, are the format modules allowed to change dynamically which format they return or not?

Yes, caching the format modules is fine since the formats are startup variables similar to the languages that are cached in a dict, too. In other words, clearing the cache dict to dynamically change the format strings is an an easy option for this edge case. For everything else we provide a way to override the default format modules (FORMAT_MODULE_PATH).

PS: This patch has a nice side effect: Now it is possible to get the format for other than the active language, too. Should this be tested & documented?

Yes, this should be tested and documented. Unfortunately this brings us a bit in a trouble, since we can't add a new feature to the 1.2.X branch, but only in trunk. Can you think of a way to add the caching without changing the API (e.g. function signature), since that seems like a bug fix to me.

comment:14 in reply to: ↑ 13 Changed 4 years ago by lukeplant

Replying to jezdez:

Yes, this should be tested and documented. Unfortunately this brings us a bit in a trouble, since we can't add a new feature to the 1.2.X branch, but only in trunk.

I think we can bend the rules here - there isn't an absolutely strict rule on no feature additions, and in fact we've already had some in a similar situation - [13642] which fixed #14116 on 1.2.X, but added a feature to the test client in order to do so.

comment:15 Changed 4 years ago by lukeplant

I actually meant that the feature had to be added to fix #14156.

The only slight issue with adding features is that you need to add a 'versionadded' attribute in the docs that refers to '1.2.4', which doesn't exist yet, and isn't automatically converted to 'development version' like '1.3' is. But again there is precedent for that.

comment:16 Changed 4 years ago by akaariai

Another day, another status update:

  • I have now patch against 1.2. No API changes, performance when USE_L10N = True: 1.25 seconds, L10N = False, a bit less than 0.8 seconds. So win in both situations. No need for the other patches.
  • Updated patch for 1.3. This one gets 0.55 seconds when USE_L10N = True (30ximprovement), and 0.52 seconds when false (2ximprovement). This is with all three patches applied.
  • Again all tests passed, both on 1.3 and 1.2.

There is a small modification to django/utils/encoding.py:force_unicode() in the 1.3 version of the patch. A isinstance(s, unicode) check is added in the beginning of the function. The reasoning for the modification is simple: That function gets called _very_ often with strings already unicode (30000 calls in the runserver test case). So, make that case a bit faster. This modification might deserve another ticket.

I will not have time to work on these patches for the next couple of days. But I will try to follow this ticket.

Changed 4 years ago by akaariai

For SVN HEAD

Changed 4 years ago by akaariai

For 1.2

comment:17 Changed 4 years ago by akaariai

There is a different approach to all this. Cache the format string for given (language, format_type) in render context. This approach is even faster than the method used in the current implementation. It gets around 0.43 for the USE_L10N case, and 0.37 if not USE_L10N. This is with the settings optimization and translations optimization patches. The speedup is because we avoid going through a lot of functions when formatting, and we can avoid escaping formatted numbers. The speedup is for integers. Formatting floats, and the various date types is still slow. We could make the USE_L10N case as fast as the USE_L10N=False case, but that would mean that language changes in template tags or filters will not have any effect to localization in the rendering. Making a default tag that changes the language by flushing the cached values in context is easy, though. But this would most likely be backwards incompatible change...

It is of course possible to apply the patch in this ticket AND use caching of formats in render context. This way using xxx_format would be fast if used in python code, and localization would be even faster when used in template rendering.

For the interested, Jinja2 gets around 0.1 seconds for this case, with escaping on.

There is a patch showing how the concept works at http://github.com/akaariai/django. Branch ticket14290_filter is having just this idea, ticket_14290_alt has all the patches (plus a bit more) applied. Try with DEBUG=True, the DEBUG=False renderer is a bit broken at the moment.

comment:18 Changed 4 years ago by teemu

Just a quick update that the latest patch by akaariai is a great improvement for my real world scenario too.

I've a sparse spreadsheet view for inputting and viewing weekly / monthly sales data. Cells are either empty, ints, decimals or dates in that order of frequency. Non-trivial database queries are used to construct this view but Django rendering is still the biggest bottleneck.

xSVN HEADSVN HEAD with patchImprovement
Monthly (~8000 cells)8.5s2.6s3.3x
Weekly (~2000 cells)1.9s0.75s2.5x

Thus, the patch is awesome improvement for real world scenario. After applying the patch, still a good chunk of time is spent in rendering, I think there are opportunities for improvement here, I'll look if I find some low-hanging fruits.

Profiling data after the patch is applied:

         956888 function calls (867495 primitive calls) in 3.294 CPU seconds
...
  ---- By file ----

      tottime
28.6%   0.871 /Users/teemu/projects/xxxx/django-performance/django/template/__init__.py
 9.0%   0.276 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/cursors.py
 6.1%   0.186 /Users/teemu/projects/xxxx/xxxxsales/sales/spreadsheet.py
 6.0%   0.184 /Users/teemu/projects/xxxx/django-performance/django/utils/safestring.py
 6.0%   0.184 /Users/teemu/projects/xxxx/django-performance/django/db/models/base.py
 5.2%   0.160 /Users/teemu/projects/xxxx/django-performance/django/utils/formats.py
 4.4%   0.133 /Users/teemu/projects/xxxx/django-performance/django/utils/functional.py
 4.2%   0.128 /Users/teemu/projects/xxxx/django-performance/django/template/context.py
 3.7%   0.114 /Users/teemu/projects/xxxx/django-performance/django/template/defaulttags.py
 3.4%   0.103 /Users/teemu/projects/xxxx/django-performance/django/utils/translation/__init__.py
 2.8%   0.086 /Users/teemu/projects/xxxx/django-performance/django/utils/encoding.py

comment:19 Changed 4 years ago by akaariai

Was this with just the latest patch to this ticket, or also with #14297 and #14306 applied? Also, could you try this with the github branch ticket_14290_alt? It should be even faster than the latest patch to this ticket.

But, if you really want to "render" the table fast, use python code directly to do that. It is easy to do:

buffer = []
for obj in queryset:
    buffer.append(u'<tr><td>')
    # escape and localize as needed.
    buffer.append(obj.first_val)
    buffer.append(u'</td><td>')
    buffer.append(obj.second_val)
    ...
u''.join(buffer)

No amount of optimization to the template rendering system can beat the performance of that. I think you will easily get to somewhere around 1s for the monthly one. But going much lower than that seems hard, as then it is the database that will be your problem.

comment:20 Changed 4 years ago by jezdez

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [13898]) Fixed #14290 -- Made format localization faster by caching the format modules. Thanks, Teemu Kurppa and Anssi Kääriäinen for the report and initial patches.

comment:21 Changed 4 years ago by jezdez

Also partially applied to 1.2.X in r13903.

comment:22 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

Note: See TracTickets for help on using tickets.
Back to Top