Opened 5 months ago
Last modified 9 days ago
#36441 assigned Cleanup/optimization
Add lazy loading support for GDAL library in `django.contrib.gis.gdal`
| Reported by: | Josh Thomas | Owned by: | Josh Thomas |
|---|---|---|---|
| Component: | GIS | Version: | dev |
| Severity: | Normal | Keywords: | gdal |
| Cc: | Claude Paroz | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I'd like to propose enabling lazy loading of the GDAL library in django.contrib.gis.gdal, matching the existing behavior of django.contrib.gis.geos with the GEOS library.
Currently, calling django.setup() on projects using GeoDjango fails if GDAL is not installed, even when GDAL functionality is not being used. This creates an inconsistency in the GeoDjango framework where the GEOS library supports lazy loading (implemented in 2015 in 61d09e61f5747d7a70268ca8d5e770486877500b) but GDAL does not. The proposed change would implement lazy loading for the GDAL module similar to the existing GEOS implementation, ensuring the GDAL library is only loaded when actually accessed or used.
I have a working proof-of-concept in a branch on my fork of the Django repo ready for review (diff).
The implementation follows the same approach as the original GEOS lazy loading commit, wrapping the library loading in SimpleLazyObject and converting function prototypes to use lazy loading. The public API remains unchanged (everything imported in django/contrib/gis/gdal/__init__.py continues to work as before), though the imports related to the GDAL version from django.contrib.gis.gdal.libgdal now use lazy evaluation. The existing test suite passes without modification, and I've added new tests specifically for the lazy loading behavior. Documentation updates have not been included yet as I wasn't sure if that was needed for a change like this.
I have created both a forum post and an issue on django/new-features. Both have only been open for a few days, so not a ton of time for people to see them. However, the initial reception has been positive, with the forum post receiving several likes and the new-features issue getting only thumbs-up reactions.
History: GDAL was initially (inadvertently) required before it was made optional in #5433 and then was made required in #26753 in order to simply the code base.
Change History (22)
comment:1 by , 5 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:2 by , 5 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:4 by , 5 months ago
| Needs documentation: | set |
|---|
comment:5 by , 7 weeks ago
| Needs documentation: | unset |
|---|
comment:6 by , 7 weeks ago
I've updated the release note documentation updates from 6.0 to 6.1, as I assume due to the feature freeze this PR will not be making it in to 6.0.
follow-up: 8 comment:7 by , 4 weeks ago
| Patch needs improvement: | set |
|---|
I reviewed PR #19529 and ran the new GDAL lazy-loading tests locally. With PostGIS configured and GDAL missing, django.setup() no longer errors, and the tests in gis_tests.gdal_tests.tests pass.
There is a missing import in django/contrib/gis/gdal/driver.py:
NameError: name 'classproperty' is not defined
This can be fixed by adding:
from django.utils.functional import classproperty
Once that is corrected, the patch should be ready for checkin. Marking as Patch needs improvement until the import is added.
comment:8 by , 3 weeks ago
| Patch needs improvement: | unset |
|---|
Ah! Thanks for the heads up. That came up in a review from @benjaoming and I accepted the suggestion without thinking about the import. I've fixed the import and rebased to the most recent changes on main.
Replying to Brad Stricherz:
I reviewed PR #19529 and ran the new GDAL lazy-loading tests locally. With PostGIS configured and GDAL missing, django.setup() no longer errors, and the tests in gis_tests.gdal_tests.tests pass.
There is a missing import in django/contrib/gis/gdal/driver.py:
NameError: name 'classproperty' is not definedThis can be fixed by adding:
from django.utils.functional import classpropertyOnce that is corrected, the patch should be ready for checkin. Marking as Patch needs improvement until the import is added.
comment:9 by , 3 weeks ago
| Patch needs improvement: | set |
|---|
I think we need to add a system check to ensure misconfiguration is still caught, and let laziness be achieved if desired by skipping said check.
comment:10 by , 3 weeks ago
I'm skeptical that a system check is appropriate. The system check for Pillow only runs if you have an ImageField in your model. Similarly, I think it makes sense that a message about GDAL wouldn't appear until you use a function that requires it.
comment:11 by , 3 weeks ago
The system check for Pillow only runs if you have an ImageField in your model.
I had in mind something just like that, a check that runs only on BaseSpatialField. Would that be appropriate?
comment:12 by , 3 weeks ago
I'm not opposed to a system check. I can see the value from an API stability perspective if people have been relying on eager failure to catch misconfigurations. (Though I gotta admit it makes me think a little bit of this XKCD 😆.)
GEOS has been lazy-loaded for ~10 years without a check, meaning misconfigured GEOS installations have only been caught at runtime when geometry fields are actually used and I haven’t really seen complaints about that (though maybe I’ve just missed them!).
The question is: is a check needed because GDAL genuinely needs different treatment than GEOS, or is this just noticing the gap while touching GDAL code? I admit my ignorance on the details of the differences between the two libraries here.
What would be the preference?
- Add it to
BaseSpatialFieldwhich would check both GEOS + GDAL - Add it to
RasterFieldonly, which uses GDAL but not GEOS - Do nothing and let GDAL follow the same lazy pattern as GEOS
comment:13 by , 3 weeks ago
I naively assumed importing gdal at the top of django.contrib.gis.db.models.fields would raise an exception, but I guess the point of that change is that it doesn't. So at what point would an exception be raised? How does it compare to GEOS? Also consider that it's possible to use aspects of GeoDjango that rely on GEOS and GDAL without using the model fields.
I'm not sure if there's any reason to treat GEOS + GDAL differently. My impression is that this is a new idea that Jacob had. If we do add it, the check for both could be authored as a separate commit.
comment:14 by , 12 days ago
I doubt anything is more logically special about GDAL over GEOS in this regard, but we do have this language from the Django 1.11 release saying it's required to use GeoDjango:
To simplify the codebase and because it’s easier to install than when contrib.gis was first released, GDAL is now a required dependency for GeoDjango. In older versions, it’s only required for SQLite.
The complicated mechanism for detecting whether GDAL was present was removed in #28160.
Is GEOS any different? I don't know. It sounds like it's required from reading GeoDjango requirements. I don't get the sense that either of them are optional.
A system check to catch misconfiguration is in the spirit of the contrib.postgres check from #32770. I still think I would miss it if the current proposal takes away the eager imports.
Also consider that it's possible to use aspects of GeoDjango that rely on GEOS and GDAL without using the model fields.
I agree -- this should probably just be a system check registered at the contrib.gis app level, not field level.
Given that we didn't have a check for GEOS, I can relent on blocking this work over it if no one else thinks it's important. I only have the anecdotal evidence that it was only ever the GDAL installation that gave me trouble. (Thanks, HomeBrew?🍺)
Also anecdotal, but on the project forum for the software I used to work on, the user support requests also seemed to be about 2x for configuring GDAL vs GEOS, but I acknowledge that's a supremely small sample size...
I think a system check at the app level makes sense to ship in the same release we are doing this other work. If Josh is up for it, I think it makes sense to tag along now as a Refs #... commit. Otherwise I can pick it up.
comment:15 by , 12 days ago
(We might even say that not detecting misconfiguration would re-cause #28160.)
comment:16 by , 12 days ago
| Description: | modified (diff) |
|---|
I don't know about that proposal. It seems strange to have to silence warnings for "optional dependencies" that may not be needed. I would like to see my questions in comment:13 answered in order to understand how a "missing GDAL" error would be surfaced to a user.
By the way, despite the lazy loading commit for GEOS, it's still listed as a required dependency. I guess not much of GeoDjango is usable without it.
comment:17 by , 11 days ago
I think I'm missing your meaning, Tim -- why do you refer to GDAL as optional? Your link docs both libraries as required, and I'm not aware of test configurations supporting the idea that it's optional.
Josh is writing a language server where he wants to be able defer the imports. That's not a typical use case, so I think it's fair to ask his project to silence a system check (I'm assuming a language server would skip all checks anyway for speed). I don't think that makes GDAL "optional" for most contrib.gis installs.
I would like to see my questions in comment:13 answered in order to understand how a "missing GDAL" error would be surfaced to a user.
The PR has the effect of deferring any ImproperlyConfigured exception from startup to the first use of GDAL, presumably in a view. For a site-wide context processor, every view would fail, so you'd have a reasonable chance of catching it immediately (indeed, the test project I'm using does this). But where GIS is used more sparingly (e.g. just in an admin widget), you wouldn't. The failure would be deferred until someone logged into the admin. Deferring the error to an indeterminate time seems like an unwelcome change. Am I understanding your question?
comment:18 by , 11 days ago
| Cc: | added |
|---|
comment:19 by , 11 days ago
Here's the deferred traceback when visiting the admin (editing an instance having a value in a geometry field).
comment:20 by , 11 days ago
So the idea is to continue documenting GEOS and GDAL and required dependencies but explaining that for some niche use cases, you may able to use parts of GeoDjango without them? I suppose it's okay.
comment:21 by , 10 days ago
I think I can live with that. But still, having a way to find out if GeoDjango requirements are installed before reaching the loading error would be a nice addition, maybe in the form of a management command? This should be an other ticket, probably.
comment:22 by , 9 days ago
This should be an other ticket, probably.
Happy to manage this however folks like, but my proposal for the system check was intended to meet that need. This would be roughly analogous to what we do for Pillow, just registered for contrib.gis users. We could take it as a Refs #... commit or a new ticket, but I'd want to have it at the same time.
explaining that for some niche use cases, you may able to use parts of GeoDjango without them?
I don't think we need any documentation updates, this is just a cleanup ticket to add flexibility for some tooling.
Thanks Josh, sounds good.
You said something about having a branch, so you can assign yourself and open a PR.