Opened 8 years ago

Closed 4 days ago

Last modified 3 days ago

#29222 closed Bug (wontfix)

Substr on NULL values returns incorrect results with pattern lookups.

Reported by: Mariusz Felisiak Owned by: Siddharth Panditrao
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: Oracle
Cc: Mariusz Felisiak Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mariusz Felisiak)

feb683c4c2c5ecfb61e4cb490c3e357450c0c0e8 revealed an unrelated issue on Oracle. SUBSTR(NULL, x, y) returns NULL on Oracle which can be concatenate with other strings, hence if we use it with pattern lookups then all rows match an query, e.g.

  • __startswith=Substr(sth, x, y) -> LIKE SUBSTR(sth, x, y) || '%' -> LIKE '%',
  • __endswith=Substr(sth, x, y) -> LIKE '%' || SUBSTR(sth, x, y) -> LIKE '%',
  • __contains=Substr(sth, x, y) -> LIKE '%' || SUBSTR(sth, x, y) || '%' -> LIKE '%%',

which is unexpected.

"Although Oracle treats zero-length character strings as nulls, concatenating a zero-length character string with another operand always results in the other operand, so null can result only from the concatenation of two null strings. However, this may not continue to be true in future versions of Oracle Database. To concatenate an expression that might be null, use the NVL function to explicitly convert the expression to a zero-length string." (see https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Concatenation-Operator.html#GUID-08C10738-706B-4290-B7CD-C279EBC90F7E).

Change History (10)

comment:1 by Mariusz Felisiak, 8 years ago

Cc: Mariusz Felisiak added

comment:2 by Mariusz Felisiak, 8 years ago

Description: modified (diff)

comment:3 by Tim Graham, 8 years ago

Triage Stage: UnreviewedAccepted

comment:4 by Mariusz Felisiak, 8 years ago

Owner: changed from nobody to Mariusz Felisiak
Status: newassigned

comment:5 by Mariusz Felisiak, 4 years ago

Description: modified (diff)
Owner: Mariusz Felisiak removed
Status: assignednew

comment:6 by Siddharth Panditrao, 4 days ago

Owner: set to Siddharth Panditrao
Status: newassigned

Hi! I'd like to work on this issue. This would be my first contribution to Django, so please let me know if I'm on the right track.

What I've Found

When using Substr with pattern lookups like __startswith on Oracle, if the Substr returns NULL, it ends up matching every row instead of none.

The problem is Oracle-specific: NULL || '%' becomes just '%' in Oracle (universal match), but stays NULL in other databases (no match).

My Plan

I'm thinking of wrapping the expression with NVL({}, CHR(0)) in the Oracle backend's _pattern_ops. This would convert NULL to a null byte character instead of letting it vanish during concatenation.

Something like:

Current: "startswith": "{} || '%%'"
Fixed:   "startswith": "NVL({}, CHR(0)) || '%%'"

Testing

I've put together a test case on a branch that reproduces the bug - it passes on Oracle (showing the bug) and fails on other databases (confirming it's Oracle-only).

Does this approach make sense? Happy to submit a PR if this looks good, just want to make sure I'm heading in the right direction before diving deeper.

comment:7 by Simon Charette, 4 days ago

I left a note to this effect on the forum but I'm afraid there isn't much we can do here given __startswith and related lookups have historically accepted an empty strings as their right-side-hand to match all rows.

Because '' and NULL are the same thing on Oracle we have to choose; we can't return all rows in the first case and no rows in the second like on all other backends.

Given these lookups are meant to treat with text fields left-hand-sides in which we document that storing `NULL` is an anti-pattern

Avoid using null on string-based fields such as CharField and TextField. The Django convention is to use an empty string, not NULL, as the “no data” state for string-based fields.

and things have been working this way on Oracle since their inception I'd be more inclined to wont-fix this ticket and accept it as a unfortunate side effect of interprets_empty_strings_as_nulls than change things for an arguably worst default.

Last edited 4 days ago by Simon Charette (previous) (diff)

in reply to:  7 ; comment:8 by Mariusz Felisiak, 4 days ago

Resolution: wontfix
Status: assignedclosed
Triage Stage: AcceptedUnreviewed

Replying to Simon Charette:

I left a note to this effect on the forum but I'm afraid there isn't much we can do here given __startswith and related lookups have historically accepted an empty strings as their right-side-hand to match all rows.

Because '' and NULL are the same thing on Oracle we have to choose; we can't return all rows in the first case and no rows in the second like on all other backends.

Given these lookups are meant to treat with text fields left-hand-sides in which we document that storing `NULL` is an anti-pattern

Avoid using null on string-based fields such as CharField and TextField. The Django convention is to use an empty string, not NULL, as the “no data” state for string-based fields.

and things have been working this way on Oracle since their inception I'd be more inclined to wont-fix this ticket and accept it as a unfortunate side effect of interprets_empty_strings_as_nulls than change things for an arguably worst default.

Yes, exactly, that's why I didn't fix this for so long.

in reply to:  8 comment:9 by Siddharth Panditrao, 4 days ago

Replying to Mariusz Felisiak:

Replying to Simon Charette:

I left a note to this effect on the forum but I'm afraid there isn't much we can do here given __startswith and related lookups have historically accepted an empty strings as their right-side-hand to match all rows.

Because '' and NULL are the same thing on Oracle we have to choose; we can't return all rows in the first case and no rows in the second like on all other backends.

Given these lookups are meant to treat with text fields left-hand-sides in which we document that storing `NULL` is an anti-pattern

Avoid using null on string-based fields such as CharField and TextField. The Django convention is to use an empty string, not NULL, as the “no data” state for string-based fields.

and things have been working this way on Oracle since their inception I'd be more inclined to wont-fix this ticket and accept it as a unfortunate side effect of interprets_empty_strings_as_nulls than change things for an arguably worst default.

Yes, exactly, that's why I didn't fix this for so long.

Thanks for explaining that! Makes sense now why fixing this would break the existing empty-string behavior on Oracle. I agree this should be marked as wontfix given the trade-offs involved. Should I update the ticket status myself, or will a maintainer handle that? And would it be helpful if I added a test or docs clarifying this Oracle-specific behavior?

This was my first attempt at contributing to Django, so I appreciate you taking the time to walk through the reasoning.

comment:10 by Simon Charette, 3 days ago

Hello Siddhahrt,

Should I update the ticket status myself, or will a maintainer handle that?

Mariusz has already taken care of that in comment:8 so you should be all set.

And would it be helpful if I added a test or docs clarifying this Oracle-specific behavior?

I think we're already covered with this documentation. The fact it states

When fetching from the database, it is assumed that a NULL value in one of these fields really means the empty string, and the data is silently converted to reflect this assumption.

strengthen my believe we're making the right call here as we consider that NULL means empty string by continuing to match all rows.

This was my first attempt at contributing to Django, so I appreciate you taking the time to walk through the reasoning.

Happy to help and thanks for the triage and giving it a shot.

Triaging issues and spending time understanding them is often more valuable than landing code itself as it strengthen our common understanding of the problem and allow us to build a consistent experience accross the diverse interfaces the ORM provides.

Note: See TracTickets for help on using tickets.
Back to Top