#9448 closed (fixed)
segfault: django.contrib.gis.gdal.DataSource sub-objects not holding references to parents
Reported by: | Owned by: | jbronn | |
---|---|---|---|
Component: | GIS | Version: | 1.0 |
Severity: | Keywords: | datasource layer segfault | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
GeoDjango is setting up its users for segfaults by not being careful with managing references in ctypes
-based objects. I've only personally encountered it with gdal.DataSource
so far, but a glance at the code makes me suspect it's more rampant, given GeoDjango's extensive use of ctypes
.
Consider this toy sequence:
def get_layer(id=0): ds = DataSource('mydataset.shp') return ds[id] layer = get_layer() print layer.num_feat
This innocent-looking snippet will crash Python hard. The problem is that the layer object that is returned by ds[id]
does not hold a reference back to the ds
object, which in turn is holding the only reference to the underlying ctypes
object. The layer object only holds a glorified void*
. As soon as you try to use it after the ds
object has gone out of scope, blammo.
There are several places this could get fixed by sticking a back-reference into the layer object. You could do it inside DataSource.__getitem__
, though that would be a little contrived. Better, you could create variants of django.contrib.gis.gdal.prototypes.generation.voidptr_output
and friends that take an extra argument (the parent object) and stuff a reference to it into the result along with all the other stuff they're jamming in there. That's probably what I would do. Whatever solution is chosen, it should be general enough to make it easy to solve this problem in all the places it occurs.
To sum up in a nutshell: every return value from a ctypes
-based object needs to either only use copies of data, or it needs to also include (directly or indirectly) a reference back to the underlying ctypes
object. Otherwise you're just asking for trouble.
Once you understand it, this is pretty easy to work around --- in this case you just have to hang on to an extra copy of each DataSource
object as long as any of its children might continue to exist. It is a tad annoying, though.
Attachments (1)
Change History (9)
comment:1 by , 16 years ago
milestone: | → post-1.0 |
---|---|
Owner: | changed from | to
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 16 years ago
Justin,
I'm happy to help; I just didn't want to dive in too deep until someone with a broader understanding than I scoped the problem properly. If you're confident that the problem is GDAL-specific, and that it should be fixed at that level, then I can certainly take a closer look at how to do it.
FWIW, I'm pretty sure that my original comments about how to fix it by modifying voidptr_output
were complete gibberish; I had misunderstood that code at first skim. I'm not convinced that isn't roughly the right place to do it anyway, though. I'll poke around some more and get back to you.
by , 16 years ago
Attachment: | layer_reference_ds.diff added |
---|
Patch that creates a reference to the DataSource inside each Layer
comment:3 by , 16 years ago
Has patch: | set |
---|---|
Status: | new → assigned |
Attached is a patch with tests that keeps a reference of the DataSource
with each Layer
; see if it works for you. Moreover, there doesn't appear to be a problem with OGR features:
def get_shp_feature(shp_file): ds = DataSource(shp_file) lyr = ds[0] return lyr[0] feat = get_shp_feature('/path/to/foo.shp') print feat.geom print feat.fields
As the above code demonstrates, accessing a Feature
still works even though the reference to the DataSource
is no longer active. Thus, the DataSource
->Layer
problem is fixed (with the patch) and Layer
->Feature
references appear fine without further patching.
However, I still want to investigate more on why the Feature
references are OK so I can be positive that its not required to keep a DataSource
reference on each Feature
as well.
follow-up: 6 comment:4 by , 16 years ago
Justin,
Well, you beat me to the patch; I was sleeping on a diff that is basically indistinguishable from yours.
The reason that it's a problem for Layer
s but not for Feature
s turns out to be something of an OGR wart: GetLayer
returns a pointer to a layer object that is owned by the DataSet
, but functions like GetNextFeature
return copies of the individual features, which it's up to you to destroy. You’re already doing that correctly: that's why Feature
needs (and has) a __del__
method but Layer
does not.
I spent some time last night looking through the code in more detail, and I'm now pretty convinced that the Layer
code is the only place that has his issue. I agree that you need to fix it at the level of the Layer
and DataSet
objects, and not lower down in the ctypes
wrapper muck, because resource destruction is handled by the DataSource
's __del__
method and so it's the lifetime of the DataSource
object itself that we need to extend.
This patch looks great to me, and fixes my problem.
comment:6 by , 16 years ago
Keywords: | datasource layer segfault added |
---|
Replying to anonymous:
The reason that it's a problem for
Layer
s but not forFeature
s turns out to be something of an OGR wart:GetLayer
returns a pointer to a layer object that is owned by theDataSet
, but functions likeGetNextFeature
return copies of the individual features, which it's up to you to destroy. You’re already doing that correctly: that's whyFeature
needs (and has) a__del__
method butLayer
does not.
Thanks for the help looking this up, that explains it and reminds me what I was doing when I wrote it.
This patch looks great to me, and fixes my problem.
Great! My apologies for letting a segfault slip through, but thankfully it was found before 1.0.1 -- which this patch will be included in.
comment:7 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
No, it is not more rampant, and is most likely isolated to the
gdal
interface with respect toLayer
s being tied toDataSource
s, and possiblyFeature
s being tied toLayer
s. In the GEOS interface, for example, child objects are always clones of their parents internal pointers. Similarly,SpatialReference
pointers are cloned; this appears to be a situation where the abstraction leaked.I agree this is a problem. However, I'd appreciate help coming up with a patch and tests.