Opened 5 months ago

Last modified 9 days ago

#36441 assigned Cleanup/optimization

Add lazy loading support for GDAL library in `django.contrib.gis.gdal`

Reported by: Josh Thomas Owned by: Josh Thomas
Component: GIS Version: dev
Severity: Normal Keywords: gdal
Cc: Claude Paroz Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

I'd like to propose enabling lazy loading of the GDAL library in django.contrib.gis.gdal, matching the existing behavior of django.contrib.gis.geos with the GEOS library.

Currently, calling django.setup() on projects using GeoDjango fails if GDAL is not installed, even when GDAL functionality is not being used. This creates an inconsistency in the GeoDjango framework where the GEOS library supports lazy loading (implemented in 2015 in 61d09e61f5747d7a70268ca8d5e770486877500b) but GDAL does not. The proposed change would implement lazy loading for the GDAL module similar to the existing GEOS implementation, ensuring the GDAL library is only loaded when actually accessed or used.

I have a working proof-of-concept in a branch on my fork of the Django repo ready for review (diff).

The implementation follows the same approach as the original GEOS lazy loading commit, wrapping the library loading in SimpleLazyObject and converting function prototypes to use lazy loading. The public API remains unchanged (everything imported in django/contrib/gis/gdal/__init__.py continues to work as before), though the imports related to the GDAL version from django.contrib.gis.gdal.libgdal now use lazy evaluation. The existing test suite passes without modification, and I've added new tests specifically for the lazy loading behavior. Documentation updates have not been included yet as I wasn't sure if that was needed for a change like this.

I have created both a forum post and an issue on django/new-features. Both have only been open for a few days, so not a ton of time for people to see them. However, the initial reception has been positive, with the forum post receiving several likes and the new-features issue getting only thumbs-up reactions.

History: GDAL was initially (inadvertently) required before it was made optional in #5433 and then was made required in #26753 in order to simply the code base.

Change History (22)

comment:1 by Carlton Gibson, 5 months ago

Triage Stage: UnreviewedAccepted

Thanks Josh, sounds good.

You said something about having a branch, so you can assign yourself and open a PR.

comment:2 by Josh Thomas, 5 months ago

Owner: set to Josh Thomas
Status: newassigned

comment:3 by Josh Thomas, 5 months ago

Has patch: set

comment:4 by Sarah Boyce, 5 months ago

Needs documentation: set

comment:5 by Josh Thomas, 7 weeks ago

Needs documentation: unset

comment:6 by Josh Thomas, 7 weeks ago

I've updated the release note documentation updates from 6.0 to 6.1, as I assume due to the feature freeze this PR will not be making it in to 6.0.

comment:7 by Brad Stricherz, 4 weeks ago

Patch needs improvement: set

I reviewed PR #19529 and ran the new GDAL lazy-loading tests locally. With PostGIS configured and GDAL missing, django.setup() no longer errors, and the tests in gis_tests.gdal_tests.tests pass.

There is a missing import in django/contrib/gis/gdal/driver.py:

NameError: name 'classproperty' is not defined

This can be fixed by adding:

from django.utils.functional import classproperty

Once that is corrected, the patch should be ready for checkin. Marking as Patch needs improvement until the import is added.

in reply to:  7 comment:8 by Josh Thomas, 3 weeks ago

Patch needs improvement: unset

Ah! Thanks for the heads up. That came up in a review from @benjaoming and I accepted the suggestion without thinking about the import. I've fixed the import and rebased to the most recent changes on main.

Replying to Brad Stricherz:

I reviewed PR #19529 and ran the new GDAL lazy-loading tests locally. With PostGIS configured and GDAL missing, django.setup() no longer errors, and the tests in gis_tests.gdal_tests.tests pass.

There is a missing import in django/contrib/gis/gdal/driver.py:

NameError: name 'classproperty' is not defined

This can be fixed by adding:

from django.utils.functional import classproperty

Once that is corrected, the patch should be ready for checkin. Marking as Patch needs improvement until the import is added.

comment:9 by Jacob Walls, 3 weeks ago

Patch needs improvement: set

I think we need to add a system check to ensure misconfiguration is still caught, and let laziness be achieved if desired by skipping said check.

comment:10 by Tim Graham, 3 weeks ago

I'm skeptical that a system check is appropriate. The system check for Pillow only runs if you have an ImageField in your model. Similarly, I think it makes sense that a message about GDAL wouldn't appear until you use a function that requires it.

comment:11 by Jacob Walls, 3 weeks ago

The system check for Pillow only runs if you have an ImageField in your model.

I had in mind something just like that, a check that runs only on BaseSpatialField. Would that be appropriate?

comment:12 by Josh Thomas, 3 weeks ago

I'm not opposed to a system check. I can see the value from an API stability perspective if people have been relying on eager failure to catch misconfigurations. (Though I gotta admit it makes me think a little bit of this XKCD 😆.)

GEOS has been lazy-loaded for ~10 years without a check, meaning misconfigured GEOS installations have only been caught at runtime when geometry fields are actually used and I haven’t really seen complaints about that (though maybe I’ve just missed them!).

The question is: is a check needed because GDAL genuinely needs different treatment than GEOS, or is this just noticing the gap while touching GDAL code? I admit my ignorance on the details of the differences between the two libraries here.

What would be the preference?

  1. Add it to BaseSpatialField which would check both GEOS + GDAL
  2. Add it to RasterField only, which uses GDAL but not GEOS
  3. Do nothing and let GDAL follow the same lazy pattern as GEOS
Last edited 3 weeks ago by Josh Thomas (previous) (diff)

comment:13 by Tim Graham, 3 weeks ago

I naively assumed importing gdal at the top of django.contrib.gis.db.models.fields would raise an exception, but I guess the point of that change is that it doesn't. So at what point would an exception be raised? How does it compare to GEOS? Also consider that it's possible to use aspects of GeoDjango that rely on GEOS and GDAL without using the model fields.

I'm not sure if there's any reason to treat GEOS + GDAL differently. My impression is that this is a new idea that Jacob had. If we do add it, the check for both could be authored as a separate commit.

comment:14 by Jacob Walls, 11 days ago

I doubt anything is more logically special about GDAL over GEOS in this regard, but we do have this language from the Django 1.11 release saying it's required to use GeoDjango:

To simplify the codebase and because it’s easier to install than when contrib.gis was first released, GDAL is now a required dependency for GeoDjango. In older versions, it’s only required for SQLite.

The complicated mechanism for detecting whether GDAL was present was removed in #28160.

Is GEOS any different? I don't know. It sounds like it's required from reading GeoDjango requirements. I don't get the sense that either of them are optional.

A system check to catch misconfiguration is in the spirit of the contrib.postgres check from #32770. I still think I would miss it if the current proposal takes away the eager imports.

Also consider that it's possible to use aspects of GeoDjango that rely on GEOS and GDAL without using the model fields.

I agree -- this should probably just be a system check registered at the contrib.gis app level, not field level.


Given that we didn't have a check for GEOS, I can relent on blocking this work over it if no one else thinks it's important. I only have the anecdotal evidence that it was only ever the GDAL installation that gave me trouble. (Thanks, HomeBrew?🍺)

Also anecdotal, but on the project forum for the software I used to work on, the user support requests also seemed to be about 2x for configuring GDAL vs GEOS, but I acknowledge that's a supremely small sample size...

I think a system check at the app level makes sense to ship in the same release we are doing this other work. If Josh is up for it, I think it makes sense to tag along now as a Refs #... commit. Otherwise I can pick it up.

Last edited 11 days ago by Tim Graham (previous) (diff)

comment:15 by Jacob Walls, 11 days ago

(We might even say that not detecting misconfiguration would re-cause #28160.)

comment:16 by Tim Graham, 11 days ago

Description: modified (diff)

I don't know about that proposal. It seems strange to have to silence warnings for "optional dependencies" that may not be needed. I would like to see my questions in comment:13 answered in order to understand how a "missing GDAL" error would be surfaced to a user.

By the way, despite the lazy loading commit for GEOS, it's still listed as a required dependency. I guess not much of GeoDjango is usable without it.

comment:17 by Jacob Walls, 11 days ago

I think I'm missing your meaning, Tim -- why do you refer to GDAL as optional? Your link docs both libraries as required, and I'm not aware of test configurations supporting the idea that it's optional.

Josh is writing a language server where he wants to be able defer the imports. That's not a typical use case, so I think it's fair to ask his project to silence a system check (I'm assuming a language server would skip all checks anyway for speed). I don't think that makes GDAL "optional" for most contrib.gis installs.

I would like to see my questions in comment:13 answered in order to understand how a "missing GDAL" error would be surfaced to a user.

The PR has the effect of deferring any ImproperlyConfigured exception from startup to the first use of GDAL, presumably in a view. For a site-wide context processor, every view would fail, so you'd have a reasonable chance of catching it immediately (indeed, the test project I'm using does this). But where GIS is used more sparingly (e.g. just in an admin widget), you wouldn't. The failure would be deferred until someone logged into the admin. Deferring the error to an indeterminate time seems like an unwelcome change. Am I understanding your question?

comment:18 by Jacob Walls, 11 days ago

Cc: Claude Paroz added

comment:19 by Jacob Walls, 11 days ago

Here's the deferred traceback when visiting the admin (editing an instance having a value in a geometry field):

...

  File "/Users/jwalls/django/django/forms/utils.py", line 79, in __str__
    return self.as_widget()
           ^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/forms/boundfield.py", line 108, in as_widget
    return widget.render(
           
  File "/Users/jwalls/django/django/forms/widgets.py", line 330, in render
    context = self.get_context(name, value, attrs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/forms/widgets.py", line 64, in get_context
    context["serialized"] = self.serialize(value)
                            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/forms/widgets.py", line 90, in serialize
    return value.json if value else ""
           ^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/geos/geometry.py", line 429, in json
    return self.ogr.json
           ^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/geos/geometry.py", line 472, in ogr
    return gdal.OGRGeometry(self._ogr_ptr(), self.srs)
                            ^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/geos/geometry.py", line 467, in _ogr_ptr
    return gdal.OGRGeometry._from_wkb(self.wkb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/geometries.py", line 155, in _from_wkb
    return capi.from_wkb(
           
  File "/Users/jwalls/django/django/contrib/gis/gdal/prototypes/geom.py", line 65, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^
  File "/Users/jwalls/django/django/utils/functional.py", line 47, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/prototypes/geom.py", line 69, in func
    if GDAL_VERSION >= (3, 3):
       ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/utils/functional.py", line 149, in __ge__
    return self.__cast() >= other
           ^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/utils/functional.py", line 110, in __cast
    return func(*self._args, **self._kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/libgdal.py", line 172, in gdal_version_info
    ver = gdal_version()
          ^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/libgdal.py", line 163, in gdal_version
    return _version_info(b"RELEASE_NAME")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/libgdal.py", line 137, in __call__
    return self.func(*args)
           ^^^^^^^^^
  File "/Users/jwalls/django/django/utils/functional.py", line 47, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
                                         ^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/libgdal.py", line 144, in func
    func = lgdal[self.func_name]
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/utils/functional.py", line 251, in inner
    self._setup()
    ^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/utils/functional.py", line 404, in _setup
    self._wrapped = self._setupfunc()
                    ^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/django/django/contrib/gis/gdal/libgdal.py", line 80, in load_gdal
    raise ImproperlyConfigured(
    ^

Exception Type: ImproperlyConfigured at /admin/models/geojsongeometry/1/change/
Exception Value: Could not find the GDAL library (tried "gdal", "GDAL", "gdal3.11.0", "gdal3.10.0", "gdal3.9.0", "gdal3.8.0", "gdal3.7.0", "gdal3.6.0", "gdal3.5.0", "gdal3.4.0", "gdal3.3.0", "gdal3.2.0", "gdal3.1.0"). Is GDAL installed? If it is, try setting GDAL_LIBRARY_PATH in your settings.
Last edited 11 days ago by Jacob Walls (previous) (diff)

comment:20 by Tim Graham, 10 days ago

So the idea is to continue documenting GEOS and GDAL and required dependencies but explaining that for some niche use cases, you may able to use parts of GeoDjango without them? I suppose it's okay.

comment:21 by Claude Paroz, 10 days ago

I think I can live with that. But still, having a way to find out if GeoDjango requirements are installed before reaching the loading error would be a nice addition, maybe in the form of a management command? This should be an other ticket, probably.

comment:22 by Jacob Walls, 9 days ago

This should be an other ticket, probably.

Happy to manage this however folks like, but my proposal for the system check was intended to meet that need. This would be roughly analogous to what we do for Pillow, just registered for contrib.gis users. We could take it as a Refs #... commit or a new ticket, but I'd want to have it at the same time.

explaining that for some niche use cases, you may able to use parts of GeoDjango without them?

I don't think we need any documentation updates, this is just a cleanup ticket to add flexibility for some tooling.

Note: See TracTickets for help on using tickets.
Back to Top