Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#9448 closed (fixed)

segfault: django.contrib.gis.gdal.DataSource sub-objects not holding references to parents

Reported by: Matthew.D.Hancher@… Owned by: jbronn
Component: GIS Version: 1.0
Severity: Keywords: datasource layer segfault
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

GeoDjango is setting up its users for segfaults by not being careful with managing references in ctypes-based objects. I've only personally encountered it with gdal.DataSource so far, but a glance at the code makes me suspect it's more rampant, given GeoDjango's extensive use of ctypes.

Consider this toy sequence:

def get_layer(id=0):
  ds = DataSource('mydataset.shp')
  return ds[id]

layer = get_layer()
print layer.num_feat

This innocent-looking snippet will crash Python hard. The problem is that the layer object that is returned by ds[id] does not hold a reference back to the ds object, which in turn is holding the only reference to the underlying ctypes object. The layer object only holds a glorified void*. As soon as you try to use it after the ds object has gone out of scope, blammo.

There are several places this could get fixed by sticking a back-reference into the layer object. You could do it inside DataSource.__getitem__, though that would be a little contrived. Better, you could create variants of django.contrib.gis.gdal.prototypes.generation.voidptr_output and friends that take an extra argument (the parent object) and stuff a reference to it into the result along with all the other stuff they're jamming in there. That's probably what I would do. Whatever solution is chosen, it should be general enough to make it easy to solve this problem in all the places it occurs.

To sum up in a nutshell: every return value from a ctypes-based object needs to either only use copies of data, or it needs to also include (directly or indirectly) a reference back to the underlying ctypes object. Otherwise you're just asking for trouble.

Once you understand it, this is pretty easy to work around --- in this case you just have to hang on to an extra copy of each DataSource object as long as any of its children might continue to exist. It is a tad annoying, though.

Attachments (1)

layer_reference_ds.diff (2.8 KB) - added by jbronn 8 years ago.
Patch that creates a reference to the DataSource inside each Layer

Download all attachments as: .zip

Change History (9)

comment:1 Changed 8 years ago by jbronn

milestone: post-1.0
Owner: changed from nobody to jbronn
Triage Stage: UnreviewedAccepted

GeoDjango is setting up its users for segfaults by not being careful with managing references in ctypes-based objects. I've only personally encountered it with gdal.DataSource so far, but a glance at the code makes me suspect it's more rampant, given GeoDjango's extensive use of ctypes.

No, it is not more rampant, and is most likely isolated to the gdal interface with respect to Layers being tied to DataSources, and possibly Features being tied to Layers. In the GEOS interface, for example, child objects are always clones of their parents internal pointers. Similarly, SpatialReference pointers are cloned; this appears to be a situation where the abstraction leaked.

I agree this is a problem. However, I'd appreciate help coming up with a patch and tests.

comment:2 Changed 8 years ago by Matthew.D.Hancher@…

Justin,

I'm happy to help; I just didn't want to dive in too deep until someone with a broader understanding than I scoped the problem properly. If you're confident that the problem is GDAL-specific, and that it should be fixed at that level, then I can certainly take a closer look at how to do it.

FWIW, I'm pretty sure that my original comments about how to fix it by modifying voidptr_output were complete gibberish; I had misunderstood that code at first skim. I'm not convinced that isn't roughly the right place to do it anyway, though. I'll poke around some more and get back to you.

Changed 8 years ago by jbronn

Attachment: layer_reference_ds.diff added

Patch that creates a reference to the DataSource inside each Layer

comment:3 Changed 8 years ago by jbronn

Has patch: set
Status: newassigned

Attached is a patch with tests that keeps a reference of the DataSource with each Layer; see if it works for you. Moreover, there doesn't appear to be a problem with OGR features:

def get_shp_feature(shp_file):
    ds = DataSource(shp_file)
    lyr = ds[0]
    return lyr[0]

feat = get_shp_feature('/path/to/foo.shp')
print feat.geom
print feat.fields

As the above code demonstrates, accessing a Feature still works even though the reference to the DataSource is no longer active. Thus, the DataSource->Layer problem is fixed (with the patch) and Layer->Feature references appear fine without further patching.

However, I still want to investigate more on why the Feature references are OK so I can be positive that its not required to keep a DataSource reference on each Feature as well.

comment:4 Changed 8 years ago by anonymous

Justin,

Well, you beat me to the patch; I was sleeping on a diff that is basically indistinguishable from yours.

The reason that it's a problem for Layers but not for Features turns out to be something of an OGR wart: GetLayer returns a pointer to a layer object that is owned by the DataSet, but functions like GetNextFeature return copies of the individual features, which it's up to you to destroy. You’re already doing that correctly: that's why Feature needs (and has) a __del__ method but Layer does not.

I spent some time last night looking through the code in more detail, and I'm now pretty convinced that the Layer code is the only place that has his issue. I agree that you need to fix it at the level of the Layer and DataSet objects, and not lower down in the ctypes wrapper muck, because resource destruction is handled by the DataSource's __del__ method and so it's the lifetime of the DataSource object itself that we need to extend.

This patch looks great to me, and fixes my problem.

comment:5 Changed 8 years ago by Matthew.D.Hancher@…

(Whoops, that last comment was from me, if that wasn't obvious.)

comment:6 in reply to:  4 Changed 8 years ago by jbronn

Keywords: datasource layer segfault added

Replying to anonymous:

The reason that it's a problem for Layers but not for Features turns out to be something of an OGR wart: GetLayer returns a pointer to a layer object that is owned by the DataSet, but functions like GetNextFeature return copies of the individual features, which it's up to you to destroy. You’re already doing that correctly: that's why Feature needs (and has) a __del__ method but Layer does not.

Thanks for the help looking this up, that explains it and reminds me what I was doing when I wrote it.

This patch looks great to me, and fixes my problem.

Great! My apologies for letting a segfault slip through, but thankfully it was found before 1.0.1 -- which this patch will be included in.

comment:7 Changed 8 years ago by jbronn

Resolution: fixed
Status: assignedclosed

(In [9285]) Fixed #9448 -- Layer objects now carry a reference to their parent DataSource. Thanks, Matthew D. Hancher for the bug report.

Backport of r9284 from trunk.

comment:8 Changed 8 years ago by (none)

milestone: post-1.0

Milestone post-1.0 deleted

Note: See TracTickets for help on using tickets.
Back to Top