#14290 closed (fixed)
Rendering is very inefficient when USE_L10N. Caching format values gives 2-7 times improvement
| Reported by: | Teemu Kurppa | Owned by: | Jannis Leidel | 
|---|---|---|---|
| Component: | Internationalization | Version: | dev | 
| Severity: | Keywords: | localization, rendering, | |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | yes | 
| Needs tests: | yes | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
Django's rendering is really inefficient if localization is turned on (USE_L10N = True).  
I have a spreadsheet like view with a few thousand numbers, dates and related data and it spend almost all of it's time in template rendering even when I preprocessed data carefully.
I investigated the issue and it seems that Django's localization implementation is doing a lot of unnecessary work for every rendered number, date and time. 
Following patch introduces caching of locale-specific format values.
Speed improvements are considerable. On my laptop and our server, this gives 5-7 times improvement (from ~5 sec to <1sec) to rendering speed of an anomalistic view with 10000 integers. For my real-world spreadsheet view, the whole processing time of Django improves  2-3 times (from ~2sec to ~0.8sec) 
In addition to the patch, I've included a test project, which can be used to test observe and test the problem.
Run test to see raw rendering performance
python manage.py test numbersperf
And to test it with server try
python manage.py runserver
And then visit these urls to observe and profile the problem
http://localhost:8000/test/10000 http://localhost:8000/test/10000?timing http://localhost:8000/test/10000?prof
Attachments (5)
Change History (27)
by , 15 years ago
| Attachment: | l10n_performance.diff added | 
|---|
by , 15 years ago
| Attachment: | l10nperf.tar.gz added | 
|---|
Test project to test rendering performance under L10N
comment:1 by , 15 years ago
| Owner: | changed from to | 
|---|---|
| Status: | new → assigned | 
comment:2 by , 15 years ago
| Owner: | changed from to | 
|---|---|
| Status: | assigned → new | 
comment:3 by , 15 years ago
| Owner: | changed from to | 
|---|---|
| Status: | new → assigned | 
| Triage Stage: | Unreviewed → Accepted | 
comment:4 by , 15 years ago
I think the following change at the bottom of django/utils/formats.py would have the same effect with much less code:
from django.utils.functional import memoize get_format = memoize(get_format)
This would also cache the results of get_format dependend of the arguments you pass to the function.
comment:5 by , 15 years ago
grefmueller, that sounds nice, thanks. 
I just noticed that if you are using LocaleMiddleware, my patch works incorrectly. It seems that the expensive operation is figuring out language and locale. Optimization for this case is definitely possible, but requires a bigger code change. Language and locale should be queried once per request. Current Django implementation does it for every rendered date and time and for numbers even 3 times. Lang or locale per request should be stored either in global state or passed to every localization function. I'm leaving for a short vacation, so I don't have a time to do that big change now, so if somebody is willing to fix this issue so that it works even in the case of LocaleMiddleware, be my guest. 
comment:6 by , 15 years ago
| Patch needs improvement: | set | 
|---|
You can't cache the active language. That would break changing the current language via activate, which can be done mid-request. Also, I don't think that the get_language calls are the main problem, it is probably the re-importing done for every call...
What could be done is:
- Check the active language once in get_format. This
will reduce the calls to get_language.
- Allow passing the active language to get_format:
def get_format(format_type, lang=None):
    if lang is None:
        lang = get_language()
    ...
And use this in callers that can cache the language, when they know the language will not change between calls to get_format. This will likely get dirty when trying to cache the active language in upper layers.
- Build a cache of (language, format_type) -> format
However, I am not sure if this is valid, as it might be possible that the list of format modules or what a given module will return can be changed dynamically (for example database based format module, although I know that is a stupid thing to do). The intention probably is that format modules will always return the same thing. 
If 1 and 3 will be done, I am sure the performance will be very good. However 3 needs input from somebody with more knowledge about format modules. The main questions are if they are supposed to always return the same language, and if the list of the format modules is supposed to be static.
 
comment:7 by , 15 years ago
Ok, now there is a get_language safe patch for this.
The performance statistics are as follows, using http://localhost:8000/test/10000?timing:
With this patch, 1,5 seconds on average
- get_language calls (10000) use about 0.3 seconds of this.
- smart_str in start of get_format uses 0.15 seconds.
- the new caching mechanism uses about 0.1 seconds of this.
Without this patch, about 9 seconds on average.
With USE_L10N=False, about 0.7 seconds.
I haven't yet ran the tests, will do that tomorrow.
comment:8 by , 15 years ago
| Patch needs improvement: | unset | 
|---|
I have attached two patches, the ticket14290.diff patch does some more optimizations. In particular it makes get_language calls much faster.
The more aggressive version (ticket14290_aggressive.diff) does even more aggressive optimizations, but it is a bit (more) ugly... 
The first patch gets performance from 1.5 seconds down to 1.3 seconds, and the second version gets it down to 1.0 seconds.
Both patches pass tests.
PS: the real problem seems to be that if settings.FOO is called in hot spots, it will cause performance problems. Will look to that next, but that will be another ticket.
comment:9 by , 15 years ago
Thanks so much for the work on this. In the patch, the use of a 1-tuple as a value seems to be both strange and unnecessary - just store the value or None directly, and use a KeyError exception to differentiate between None and the key not being present at all.
comment:10 by , 15 years ago
I will post an updated patch today. I will update the usage of the cache dictionary, but there is some other work to do also. 
Also, my performance numbers for the SVN base case were a bit off. It actually takes 16.5 seconds on my laptop to run this test. With the updated patch and also the ticket #12497 patch it takes 0.95 seconds. There is even more that can be done, but that work might go into premature optimizations category.
follow-up: 13 comment:11 by , 15 years ago
Update patch attached (ticket14290.diff, the aggressive one isn't good). It should be nice and clean now. There are also two related tickets with patches, settings optimization (#14297) and translations cleanup (#14306), witch has better performance as a bonus. Performance numbers for various combinations:
| version | without settings optimization | with settings optimization | with settings + translations optimization | 
| SVN HEAD, L18N=True | 16.5 seconds | 15.5 seconds | 13.0 seconds | 
| get_format cached, L!8N=True | 1.4 | 1.05 | 0.85 | 
| SVN HEAD, L18N=False | 1.08 | 0.85 | 0.82 | 
| get_format cached, L18N=False | 1.4 | 1.02 | 0.82 | 
So, all in all, from 16.5 seconds to about 0.8 seconds. After these patches USE_L18N should have almost no effect on performance. Note that with USE_L18N = False the performance will degrade a little, if other patches are rejected. I am sorry that there are no numbers for with translations optimization + this ticket, that would also be interesting combination...
All patches should pass tests separately. Also when all patches are applied, tests are passed. I haven't tested all the combinations.
I know I have probably done too much optimization (that is, just caching the value would be enough), but in my opinion the related patches are cleaner than the current implementations. Particularly so for the translations optimization patch.
The only question regarding this patch is: Is it considered safe to cache the values, that is, are the format modules allowed to change dynamically which format they return or not?
PS: This patch has a nice side effect: Now it is possible to get the format for other than the active language, too. Should this be tested & documented?
comment:12 by , 15 years ago
Anssi, fantastic work you've done there. Once I get back from my short vacation next week, I'll give an update how your patches affect the performance of my real world spreadsheet scenario. Thanks for diving into this. 
follow-up: 14 comment:13 by , 15 years ago
| Needs documentation: | set | 
|---|---|
| Needs tests: | set | 
Replying to akaariai:
Update patch attached (ticket14290.diff, the aggressive one isn't good). It should be nice and clean now. There are also two related tickets with patches, settings optimization (#14297) and translations cleanup (#14306), witch has better performance as a bonus. Performance numbers for various combinations:
| version | without settings optimization | with settings optimization | with settings + translations optimization | 
| SVN HEAD, L18N=True | 16.5 seconds | 15.5 seconds | 13.0 seconds | 
| get_format cached, L!8N=True | 1.4 | 1.05 | 0.85 | 
| SVN HEAD, L18N=False | 1.08 | 0.85 | 0.82 | 
| get_format cached, L18N=False | 1.4 | 1.02 | 0.82 | 
You probably meant the L10N setting here, which triggers the format localization.
So, all in all, from 16.5 seconds to about 0.8 seconds. After these patches USE_L18N should have almost no effect on performance. Note that with USE_L18N = False the performance will degrade a little, if other patches are rejected. I am sorry that there are no numbers for with translations optimization + this ticket, that would also be interesting combination...
All patches should pass tests separately. Also when all patches are applied, tests are passed. I haven't tested all the combinations.
I know I have probably done too much optimization (that is, just caching the value would be enough), but in my opinion the related patches are cleaner than the current implementations. Particularly so for the translations optimization patch.
Indeed, you did a bit of a yak shaving here.
The only question regarding this patch is: Is it considered safe to cache the values, that is, are the format modules allowed to change dynamically which format they return or not?
Yes, caching the format modules is fine since the formats are startup variables similar to the languages that are cached in a dict, too. In other words, clearing the cache dict to dynamically change the format strings is an an easy option for this edge case. For everything else we provide a way to override the default format modules (FORMAT_MODULE_PATH).
PS: This patch has a nice side effect: Now it is possible to get the format for other than the active language, too. Should this be tested & documented?
Yes, this should be tested and documented. Unfortunately this brings us a bit in a trouble, since we can't add a new feature to the 1.2.X branch, but only in trunk. Can you think of a way to add the caching without changing the API (e.g. function signature), since that seems like a bug fix to me.
comment:14 by , 15 years ago
Replying to jezdez:
Yes, this should be tested and documented. Unfortunately this brings us a bit in a trouble, since we can't add a new feature to the 1.2.X branch, but only in trunk.
I think we can bend the rules here - there isn't an absolutely strict rule on no feature additions, and in fact we've already had some in a similar situation - [13642] which fixed #14116 on 1.2.X, but added a feature to the test client in order to do so.
comment:15 by , 15 years ago
I actually meant that the feature had to be added to fix #14156.
The only slight issue with adding features is that you need to add a 'versionadded' attribute in the docs that refers to '1.2.4', which doesn't exist yet, and isn't automatically converted to 'development version' like '1.3' is. But again there is precedent for that.
comment:16 by , 15 years ago
Another day, another status update:
- I have now patch against 1.2. No API changes, performance when USE_L10N = True: 1.25 seconds, L10N = False, a bit less than 0.8 seconds. So win in both situations. No need for the other patches.
- Updated patch for 1.3. This one gets 0.55 seconds when USE_L10N = True (30ximprovement), and 0.52 seconds when false (2ximprovement). This is with all three patches applied.
- Again all tests passed, both on 1.3 and 1.2.
There is a small modification to django/utils/encoding.py:force_unicode() in the 1.3 version of the patch. A isinstance(s, unicode) check is added in the beginning of the function. The reasoning for the modification is simple: That function gets called _very_ often with strings already unicode (30000 calls in the runserver test case). So, make that case a bit faster. This modification might deserve another ticket.
I will not have time to work on these patches for the next couple of days. But I will try to follow this ticket.
comment:17 by , 15 years ago
There is a different approach to all this. Cache the format string for given (language, format_type) in render context. This approach is even faster than the method used in the current implementation. It gets around 0.43 for the USE_L10N case, and 0.37 if not USE_L10N. This is with the settings optimization and translations optimization patches. The speedup is because we avoid going through a lot of functions when formatting, and we can avoid escaping formatted numbers. The speedup is for integers. Formatting floats, and the various date types is still slow. We could make the USE_L10N case as fast as the USE_L10N=False case, but that would mean that language changes in template tags or filters will not have any effect to localization in the rendering. Making a default tag that changes the language by flushing the cached values in context is easy, though. But this would most likely be backwards incompatible change...
It is of course possible to apply the patch in this ticket AND use caching of formats in render context. This way using xxx_format would be fast if used in python code, and localization would be even faster when used in template rendering.
For the interested, Jinja2 gets around 0.1 seconds for this case, with escaping on.
There is a patch showing how the concept works at http://github.com/akaariai/django. Branch ticket14290_filter is having just this idea, ticket_14290_alt has all the patches (plus a bit more) applied. Try with DEBUG=True, the DEBUG=False renderer is a bit broken at the moment.
comment:18 by , 15 years ago
Just a quick update that the latest patch by akaariai is a great improvement for my real world scenario too.
I've a sparse spreadsheet view for inputting and viewing weekly / monthly sales data. Cells are  either empty, ints, decimals or dates in that order of frequency. Non-trivial database queries are used to construct this view but Django rendering is still the biggest bottleneck.
| x | SVN HEAD | SVN HEAD with patch | Improvement | 
| Monthly (~8000 cells) | 8.5s | 2.6s | 3.3x | 
| Weekly (~2000 cells) | 1.9s | 0.75s | 2.5x | 
Thus, the patch is awesome improvement for real world scenario. After applying the patch, still a good chunk of time is spent in rendering, I think there are opportunities for improvement here, I'll look if I find some low-hanging fruits. 
Profiling data after the patch is applied:
         956888 function calls (867495 primitive calls) in 3.294 CPU seconds
...
  ---- By file ----
      tottime
28.6%   0.871 /Users/teemu/projects/xxxx/django-performance/django/template/__init__.py
 9.0%   0.276 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/cursors.py
 6.1%   0.186 /Users/teemu/projects/xxxx/xxxxsales/sales/spreadsheet.py
 6.0%   0.184 /Users/teemu/projects/xxxx/django-performance/django/utils/safestring.py
 6.0%   0.184 /Users/teemu/projects/xxxx/django-performance/django/db/models/base.py
 5.2%   0.160 /Users/teemu/projects/xxxx/django-performance/django/utils/formats.py
 4.4%   0.133 /Users/teemu/projects/xxxx/django-performance/django/utils/functional.py
 4.2%   0.128 /Users/teemu/projects/xxxx/django-performance/django/template/context.py
 3.7%   0.114 /Users/teemu/projects/xxxx/django-performance/django/template/defaulttags.py
 3.4%   0.103 /Users/teemu/projects/xxxx/django-performance/django/utils/translation/__init__.py
 2.8%   0.086 /Users/teemu/projects/xxxx/django-performance/django/utils/encoding.py
comment:19 by , 15 years ago
Was this with just the latest patch to this ticket, or also with #14297 and #14306 applied? Also, could you try this with the github branch ticket_14290_alt? It should be even faster than the latest patch to this ticket.
But, if you really want to "render" the table fast, use python code directly to do that. It is easy to do:
buffer = []
for obj in queryset:
    buffer.append(u'<tr><td>')
    # escape and localize as needed.
    buffer.append(obj.first_val)
    buffer.append(u'</td><td>')
    buffer.append(obj.second_val)
    ...
u''.join(buffer)
No amount of optimization to the template rendering system can beat the performance of that. I think you will easily get to somewhere around 1s for the monthly one. But going much lower than that seems hard, as then it is the database that will be your problem.
comment:20 by , 15 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | assigned → closed | 
Patch to cache localization formats