Opened 9 years ago
Closed 9 years ago
#25072 closed Bug (fixed)
GDALRaster Garbage Collection not working
Reported by: | Daniel Wiesmann | Owned by: | nobody |
---|---|---|---|
Component: | GIS | Version: | dev |
Severity: | Normal | Keywords: | memory gis raster |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I noticed that when creating many GDALRasters
in a single context, the memory allocated to GDALRasters does not get released properly. For instance, the following loop as an ever growing memory consumption:
from django.contrib.gis.gdal import GDALRaster n = 1000 for x in range(n): rast = GDALRaster({ 'driver': 'MEM', 'width': 500, 'height': 500, 'nr_of_bands': 1, 'srid': 3086, 'origin': (500000, 400000), 'scale': (1, -1), 'skew': (0, 0), 'bands': [{ 'nodata_value': 10, 'data': range(500**2) }], })
Adding rast.__del__()
to the loop solves the problem, but I guess that should be automatic.
Attachments (2)
Change History (16)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
I don't have much experience with garbage collection, but after doing some reading I think this is expected as there is no guarantee that the garbage collector will run in code like this. If you put that code in a function, is __del__()
called when the function exits (i.e. local variables should be garbage collected at that point)?
comment:3 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|
It might be related to bands. I made some tests, and as soon as I access self.bands
, the raster in not garbage collected. Perhaps because there is a circular reference between raster (bands cached property) and bands (band.source). That's a guess for now...
comment:4 by , 9 years ago
Daniel, could you test the attached patch and see if that changes anything in your experiments?
comment:5 by , 9 years ago
Thanks Claude, I can confirm that your patch solves the problem for my case. I just tried the patch on both the loop posted above and the use case where I spotted the problem in the first place. The memory usage stayed flat in both runs.
comment:6 by , 9 years ago
I just noticed that I started getting segmentation faults when accessing the GDALBand data
method after applying the patch. I will have a closer look at what is going on there.
comment:7 by , 9 years ago
Yes, you'll get segmentation faults if you use a band while its source has been garbage collected. The patch need improvements to safeguard against this, if possible.
comment:8 by , 9 years ago
I tracked down the segmentation fault I was getting, see the diff I attached. Somehow accessing the data directly from a queryset without creating intermediate objects gives me a segmentation fault. With an intermediate object it does not.
comment:9 by , 9 years ago
Here's another possible solution: https://github.com/django/django/pull/4971
That is stop to cache GDALRaster.bands
, which is the cause of the other side of the circular reference.
comment:10 by , 9 years ago
That looks fine to me, I think most rasters have only a few bands.
Maybe we can add a method to get only one specific band by index? Like this single bands can be accessed efficiently. Something along the lines of:
def get_band(self, index): if index < 0 or index >= capi.get_ds_raster_count(self._ptr): raise GDALException('Band index out of range.') return GDALBand(self, index)
comment:11 by , 9 years ago
Has patch: | set |
---|
Your wishes are my orders :-) Please check the latest PR version (while still checking the initial memory issue).
comment:12 by , 9 years ago
Thanks Claude!
I just ran your patch through the loop and applied use case and it worked perfectly, including low memory use.
The issues that came up through this ticket has made me realize that it could be important to only instantiate specific bands (some time series rasters might have lots of bands, its is not common, but happens). That's why I asked for this...
comment:13 by , 9 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Unfortunately this goes beyond my knowledge of how python works. I am happy to work on this if its productive, but would need help.