segfault: django.contrib.gis.gdal.DataSource sub-objects not holding references to parents
|Reported by:||Matthew.D.Hancher@…||Owned by:||jbronn|
|Severity:||Keywords:||datasource layer segfault|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
GeoDjango is setting up its users for segfaults by not being careful with managing references in ctypes-based objects. I've only personally encountered it with gdal.DataSource so far, but a glance at the code makes me suspect it's more rampant, given GeoDjango's extensive use of ctypes.
Consider this toy sequence:
def get_layer(id=0): ds = DataSource('mydataset.shp') return ds[id] layer = get_layer() print layer.num_feat
This innocent-looking snippet will crash Python hard. The problem is that the layer object that is returned by ds[id] does not hold a reference back to the ds object, which in turn is holding the only reference to the underlying ctypes object. The layer object only holds a glorified void*. As soon as you try to use it after the ds object has gone out of scope, blammo.
There are several places this could get fixed by sticking a back-reference into the layer object. You could do it inside DataSource.__getitem__, though that would be a little contrived. Better, you could create variants of django.contrib.gis.gdal.prototypes.generation.voidptr_output and friends that take an extra argument (the parent object) and stuff a reference to it into the result along with all the other stuff they're jamming in there. That's probably what I would do. Whatever solution is chosen, it should be general enough to make it easy to solve this problem in all the places it occurs.
To sum up in a nutshell: every return value from a ctypes-based object needs to either only use copies of data, or it needs to also include (directly or indirectly) a reference back to the underlying ctypes object. Otherwise you're just asking for trouble.
Once you understand it, this is pretty easy to work around --- in this case you just have to hang on to an extra copy of each DataSource object as long as any of its children might continue to exist. It is a tad annoying, though.
Change History (9)
comment:1 Changed 5 years ago by jbronn
- milestone set to post-1.0
- Needs documentation unset
- Needs tests unset
- Owner changed from nobody to jbronn
- Patch needs improvement unset
- Triage Stage changed from Unreviewed to Accepted
Changed 5 years ago by jbronn
comment:7 Changed 5 years ago by jbronn
- Resolution set to fixed
- Status changed from assigned to closed