Opened 15 years ago
Closed 11 years ago
#11572 closed Cleanup/optimization (worksforme)
Very high memory usage by big sitemaps
Reported by: | Piotr Maliński | Owned by: | nobody |
---|---|---|---|
Component: | contrib.sitemaps | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | simon@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I'm using Python 2.5, Django 1.X, and Nginx/FastCGI hosting at megiteam.pl. The site has a big sitemap: 9K elements, producing a 1.7 MB sitemap.xml file. The sitemap is done by the book with the sitemap framework:
    class MainMap(Sitemap):
        changefreq = "never"
        priority = 0.5

        def items(self):
            return JobOffer.objects.filter(published=True, inactive=False)

        def lastmod(self, obj):
            return obj.published_at
The problem is that memstat -v, run after requesting sitemap.xml, shows memory usage jumping from 6 MB just after a restart to 105 MB, and it stays at that level for every subsequent request of the sitemap (where the file itself is 1.7 MB). If I limit the query to 1000 elements I get ~22 MB memory usage.
Change History (10)
comment:1 by , 15 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 15 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
How about we modify the implementation to stream the response, instead of rendering the template into potentially quite a large string for every request?
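A minimal sketch of what "streaming" could mean here, assuming Python 3 and plain `(location, lastmod)` string pairs rather than model objects. The function name `iter_sitemap_xml` is hypothetical and is not part of Django; it only illustrates yielding the document in pieces instead of building one large string.

```python
from xml.sax.saxutils import escape


def iter_sitemap_xml(entries):
    """Yield a sitemap document chunk by chunk.

    `entries` is any iterable of (location, lastmod) string pairs
    (an assumption made for this sketch).
    """
    yield '<?xml version="1.0" encoding="UTF-8"?>\n'
    yield '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    for location, lastmod in entries:
        # Escape &, < and > so the output stays well-formed XML.
        yield "  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n" % (
            escape(location),
            escape(lastmod),
        )
    yield "</urlset>\n"


# Example: consume the generator one chunk at a time.
for chunk in iter_sitemap_xml([("http://example.com/offers/1/", "2009-07-29")]):
    pass  # a real view would hand each chunk to the response
```

A generator like this only avoids holding the rendered XML in memory; if `entries` is itself a fully loaded list of model objects, that list still dominates memory use.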
comment:3 by , 15 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Your memory problems are mostly due to the overhead of all those model objects; the rendered template won't be nearly as big. At any rate, there are other tickets open regarding streaming HTTP responses, which I'd advise you to look at.
comment:4 by , 14 years ago
@ubernostrum: This issue took down our site and we've been debugging it for a week straight. Who would have thought a sitemap could make the Apache process consume so much memory that it finally gets killed?
Anyway. I believe two things should be done to prevent this from happening to sites that grow:
1) Lower the default number of elements that get inserted into a sitemap. Let people raise this value back to the old default with an override.
2) The memory is not released after the sitemap has been generated (I think). Tested on Windows by looking at the memory used by python.exe before and after generating the sitemap (13 MB before, 87 MB after). Also, looking at the RSS memory on an Ubuntu server, it jumps by 90 MB when accessing the sitemap, and does not go down when it's done.
I think this bug should be opened again.
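To illustrate point 1, here is a rough sketch of how a per-page limit splits a large sitemap into smaller pages. Django's `Sitemap` class exposes a `limit` attribute (default 50000) that plays this role when the sitemap index view is used; the helper `page_bounds` below is hypothetical and only shows the arithmetic.

```python
import math


def page_bounds(total_items, limit):
    """Return (start, end) slice bounds for each sitemap page."""
    pages = math.ceil(total_items / limit)
    return [
        (page * limit, min((page + 1) * limit, total_items))
        for page in range(pages)
    ]


# With the 9000 items from the report and a limit of 1000,
# each request would only need to load one slice of 1000 objects.
bounds = page_bounds(9000, 1000)
```

With the default limit, the same 9000 items would all land on a single page, which is the case described in the original report.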
comment:5 by , 12 years ago
Cc: | added |
---|---|
Component: | Contrib apps → contrib.sitemaps |
Easy pickings: | unset |
Resolution: | invalid |
Severity: | → Normal |
Status: | closed → reopened |
Triage Stage: | Unreviewed → Design decision needed |
Type: | → Cleanup/optimization |
UI/UX: | unset |
Version: | 1.0 → master |
Big sitemaps have crashed my site(s) too. Hate to reopen, but the current approach is inadequate. Sitemaps need to stream by default.
comment:6 by , 12 years ago
Just remembered, I wrote this a while back. I've put it up on GitHub:
https://github.com/s29/django-fastsitemaps
Maybe streaming could be included in core sitemaps as an option.
comment:7 by , 12 years ago
Setting aside building a string vs streaming HTTP, surely it is a bug that the memory is never released for the life of the process, no?
comment:8 by , 12 years ago
Status: | reopened → new |
---|
comment:9 by , 12 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Streaming responses are now supported in core. But as pointed out earlier in the comments, this isn't the problem; the problem is pulling objects from the database. So I'm not sure there's actually much to be gained on this side.
Currently, as soon as you access the first object of a queryset, the entire queryset is brought into memory. This might be optimized with server side cursors, but that's another story (with its own share of problems).
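The difference between loading a whole result set and walking it in batches can be sketched with the standard DB-API `fetchmany` call. This uses an in-memory SQLite database and a hypothetical `job_offer` table purely for illustration; it is not how the ORM is implemented, and in Django the closest tool is `QuerySet.iterator()`.

```python
import sqlite3


def iter_rows(cursor, batch_size=500):
    """Yield rows from a DB-API cursor, holding only one batch at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_offer (id INTEGER PRIMARY KEY, published_at TEXT)")
conn.executemany(
    "INSERT INTO job_offer (published_at) VALUES (?)",
    [("2009-07-29",)] * 9000,
)

cursor = conn.execute("SELECT id, published_at FROM job_offer ORDER BY id")
# Count rows without ever building a 9000-element list.
count = sum(1 for _ in iter_rows(cursor))
conn.close()
```

Whether batching on the client side actually bounds memory depends on the database driver; some drivers buffer the full result unless a server-side cursor is requested.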
I agree that there's some room for optimization here. To move forward, this ticket needs:
- a concrete proposal: a link to a project with similar goals isn't sufficient. Which parts do you want to integrate exactly, and how do you guarantee backwards compatibility?
- a benchmark proving the benefits
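As a starting point for such a benchmark, the following sketch uses the standard `tracemalloc` module to compare peak Python-level allocations when objects are collected into a list versus consumed one at a time. `FakeOffer` is a hypothetical stand-in for a model instance, and the numbers say nothing about process RSS or about Django itself.

```python
import tracemalloc


class FakeOffer:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, pk):
        self.pk = pk
        self.url = "/offers/%d/" % pk


def make_offers(n):
    """Lazily produce n fake objects."""
    for pk in range(n):
        yield FakeOffer(pk)


def peak_bytes(func):
    """Run func and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def render_all_at_once(n=9000):
    offers = list(make_offers(n))  # every object alive at the same time
    return sum(len(offer.url) for offer in offers)


def render_incrementally(n=9000):
    # Each object can be freed as soon as it has been processed.
    return sum(len(offer.url) for offer in make_offers(n))


peak_list = peak_bytes(render_all_at_once)
peak_stream = peak_bytes(render_incrementally)
print("list: %d bytes, incremental: %d bytes" % (peak_list, peak_stream))
```

A benchmark for this ticket would need to replace the fake objects with real querysets and also record process-level memory, since `tracemalloc` only sees allocations made through Python's allocator.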
comment:10 by , 11 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
Reading this ticket again, there's just some handwaving about magical "streaming" that's going to fix everything, but it isn't clear whether that's HTTP streaming or server-side database cursors...
In the absence of a concrete proposal, I'm going to close this ticket as "needsinfo". Please reopen if you can suggest implementation changes or documentation changes.
I'm not sure what the bug is here; querying thousands of objects in one go (as you're doing when you fetch the list of items) should be expected to increase memory use. You may want to look into alternative query methods which allow you to conserve memory, but in general this is going to be more memory-intensive as the number of objects involved grows.