Opened 3 years ago

Last modified 3 years ago

#18887 assigned Cleanup/optimization

LineString array method (property) returns different data type without and with NumPy installed

Reported by: mal Owned by: Ubercore
Component: GIS Version: 1.4
Severity: Normal Keywords: GIS, NumPy, LineString
Cc: deprince@…, Ubercore Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Apparently LineString (and probably other geometry types) return data in different types depending on whether NumPy is installed.

Simple test case (reproducible in Django 1.3, 1.3.1 and 1.4.1).

Clean virtual env with only Django installed (and ipython).
First - no NumPy installed.

In [1]: from django.contrib.gis.geos import LineString
In [2]: line = LineString((0, 0), (3, 3))
In [3]: line.array
Out[3]: [(0.0, 0.0), (3.0, 3.0)]

Now - install NumPy and try again.

In [1]: from django.contrib.gis.geos import LineString
In [2]: line = LineString((0, 0), (3, 3))
In [3]: line.array
Out[3]:
array([[ 0.,  0.],
       [ 3.,  3.]])

[(0.0, 0.0), (3.0, 3.0)] =! array( 0., 0.],[ 3., 3.?)

This is rather serious issue.

Change History (15)

comment:1 in reply to: ↑ description ; follow-up: Changed 3 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Replying to mal:

This is rather serious issue.

Maybe you could add a real use case demonstrating why you consider this as a serious issue?

comment:2 in reply to: ↑ 1 Changed 3 years ago by anonymous

Replying to claudep:

Replying to mal:

This is rather serious issue.

Maybe you could add a real use case demonstrating why you consider this as a serious issue?

Because it returns two different values depending on if a module is installed or not? Any conditionals on the result will be dependant on if numpy is installed or not.

comment:3 Changed 3 years ago by claudep

  • Triage Stage changed from Unreviewed to Accepted

Accepting on the base that this should be at least documented.

I see the potential issues, but apart documentation, do you see any possible improvements?

comment:4 Changed 3 years ago by deprince

  • Cc deprince@… added

Would it be an option to make numpy a dependency for the installation of gis?

comment:5 Changed 3 years ago by Adam DePrince <deprince@…>

Scratch that ... you don't want numpy to be required to install django. It seems if the user wants a list they would call list(LineString(...)) instead of LineString(...).array. Perhaps we should issue a warning when the user calls .array without numpy installed?

>>> line = LineString((0,0), (3,3))
>>> line.array
array([[ 0.,  0.],
       [ 3.,  3.]])
>>> list(line) 
[(0.0, 0.0), (3.0, 3.0)]

comment:6 Changed 3 years ago by Ubercore

  • Owner changed from nobody to Ubercore

comment:7 Changed 3 years ago by Ubercore

I've confirmed this, going to look into it a little more closely to see if we can make the behavior predictable somehow, without losing the ability to take advantage of numpy if installed.

comment:8 Changed 3 years ago by Ubercore

  • Cc Ubercore added
  • Triage Stage changed from Accepted to Design decision needed

Yeah, the only reasonable options I see are the ones outlined: warning when array is called, and/or noting this in the docs. The behavior here seems to be intended, not a bug. Marking as design decision needed.

comment:9 Changed 3 years ago by Adam DePrince <deprince@…>

Here's a proposed fix if a warning in the absence of numpy is desired.

https://github.com/adamdeprince/django/commit/d02431ac5f15616df4cb3d39592651e59fdde17f

comment:10 Changed 3 years ago by Ubercore

  • Has patch set
  • Status changed from new to assigned
  • Triage Stage changed from Design decision needed to Accepted

I've submitted a pull request for the following commit, which adds a setting: https://github.com/peterlandry/django/commit/0c9839355a005e8805467aa09d4fcf9795965b76

comment:11 Changed 3 years ago by Ubercore

  • Triage Stage changed from Accepted to Ready for checkin

comment:12 Changed 3 years ago by Adam DePrince <deprince@…>

No, this doesn't capture what is really happening.

There is a bug with this: [(0.0, 0.0), (3.0, 3.0)] =! array( 0., 0.],[ 3., 3.?)

This snippet doesn't mean

{{
not ([(0.0, 0.0), (3.0, 3.0)] =! array( 0., 0.],[ 3., 3.?))
}}

it means these types are completely incomparable.

If you compare these two you get completely different behavior than if you compared two lists.

numpy.array([(0.0, 0.0), (3.0, 3.0)]) == [(0.0, 0.0), (3.0, 3.0)]
array([[ True, True],

[ True, True]], dtype=bool

There will be two populations that use this API. One will have numpy installed on their dev machines and one won't. When they release extensions and share code those they share it with may not have the same numpy status.

It seems pretty clear that if you want a list you should call list(LineString(...)) and if you actually want a numpy array you could call .array and expect it. If the system will silently break that promise you should have an exception, but we don't want to break existing code, so I'm proposing we raise a warning instead. This diff doesn't do this. My proposed code does.

We can't have two APIs that depend on the module you have installed; I really feel strongly that .array without numpy installed should warn.

comment:13 Changed 3 years ago by Adam DePrince <deprince@…>

comment:14 Changed 3 years ago by claudep

  • Triage Stage changed from Ready for checkin to Accepted

Please do not mark your own patch as RFC.

comment:15 Changed 3 years ago by ptone

  • Needs documentation set
  • Patch needs improvement set
  • Type changed from Bug to Cleanup/optimization

I'm inclined to agree with claudep here that this is primarily a documentation issue - an admonition probably.

The Numpy related LineString methods aren't documented at all.

A warning would become annoying if you've accepted the non-numpy behavior in your code, and don't plan on changing it. This is as valid as the Numpy approach as there has been no documentation. So why should that user now suffer eternal warnings?

The comment on the property suggests that .array was to return a numpy array, but all the other properties are commented as returning a "list or numpy array"

If the numpy difference is documented clearly - anyone wanting to right portable/sharable code can check from django.contrib.gis.geos.base import numpy and make sure they standardize the data.

I'm closing the current pull requests as the wrong direction - not RFC.

Note: See TracTickets for help on using tickets.
Back to Top