Opened 2 years ago

Closed 2 years ago

#34160 closed Bug (wontfix)

Django 4.1 Expression contains mixed types for (Big/Small)IntegerFields.

Reported by: Martin Lehoux Owned by: Martin Lehoux
Component: Database layer (models, ORM) Version: 4.1
Severity: Release blocker Keywords:
Cc: Simon Charette, Luke Plant Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi,

During an update from Django 4.0.8 to 4.1.3, an unexpected bug occured in our tests.

django.core.exceptions.FieldError: Expression contains mixed types: IntegerField, SmallIntegerField. You must set output_field.

I understand what it means, but I couldn't find what changed between these two versions that could explain the appearance of this error.

Case(
    When(
        True,
         then=F("inventory_count") + Value(1),
    ),
    default=F("inventory_count"),
)

I could easily fix it with

Case(
    When(
        True,
         then=F("inventory_count") + Value(1),
    ),
    default=F("inventory_count"),
    output_field=IntegerField(),
)

Change History (20)

comment:1 by Martin Lehoux, 2 years ago

Summary: Django 4.1Django 4.1 Expression contains mixed types

comment:2 by Mariusz Felisiak, 2 years ago

Cc: Simon Charette Luke Plant added
Severity: NormalRelease blocker
Summary: Django 4.1 Expression contains mixed typesDjango 4.1 Expression contains mixed types for (Big/Small)IntegerFields.
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Thanks for the report. I think we should automatically resolved output_field for a mix of integer types, e.g.:

  • IntegerField and SmallIntegerField -> IntegerField,
  • IntegerField and BigIntegerField -> BigIntegerField,
  • BigIntegerField and SmallIntegerField -> BigIntegerField.

Regression in 40b8a6174f001a310aa33f7880db0efeeb04d4c4.

comment:3 by Luke Plant, 2 years ago

I agree we should try to automatically resolve the type. Based on my experience with the related patch, I think it would be worth checking what MySQL and Postgres do for cases like this - there could be some unexpected surprises e.g. the database actually casts to the smaller of the type, or the first type. In this case, the user may need to know that it isn't going to do what you expect.

comment:4 by Simon Charette, 2 years ago

I agree with your suggested solution Markusz. Adding entries to _connector_combinations seems like the way to go for a backport.

Based on my experience with the related patch, I think it would be worth checking what MySQL and Postgres do for cases like this - there could be some unexpected surprises

I know that Postgres will silently cast values outside of the bigint range to numeric with its unfortunate side effect so I'm pretty that it has operators defined to make integer comparison as implicit as possible.

comment:5 by Mariusz Felisiak, 2 years ago

Martin, Would you like to prepare a patch?

comment:6 by Martin Lehoux, 2 years ago

Hi, I'd be happy to! I will have a look tonight, and will ask a few questions if I have trouble understanding what to do (but from what I read the infrastructure is already there for this feature)

in reply to:  6 comment:7 by Mariusz Felisiak, 2 years ago

Replying to Martin Lehoux:

Hi, I'd be happy to! I will have a look tonight, and will ask a few questions if I have trouble understanding what to do (but from what I read the infrastructure is already there for this feature)

Thanks, it should be enough to add extra combinations for integer fields (a regression test is also required, see test_resolve_output_field_number()).

comment:8 by Martin Lehoux, 2 years ago

Owner: changed from nobody to Martin Lehoux
Status: newassigned

comment:9 by Martin Lehoux, 2 years ago

Isn't it the Case that has an issue computing it's output_field? If it's the case, I'm not sure that the files you pointed to me were the right ones. (I say that because some of the tests cases, including the one in my example, already worked)
django/db/models/expressions.py

@deconstructible(path="django.db.models.Case")
class Case(SQLiteNumericMixin, Expression):
    template = "CASE %(cases)s ELSE %(default)s END"
    case_joiner = " "

    def __init__(self, *cases, default=None, output_field=None, **extra):
        if not all(isinstance(case, When) for case in cases):
            raise TypeError("Positional arguments must all be When objects.")
        super().__init__(output_field)
        self.cases = list(cases)
        self.default = self._parse_expressions(default)[0]
        self.extra = extra
Last edited 2 years ago by Martin Lehoux (previous) (diff)

comment:10 by Simon Charette, 2 years ago

What I believe is happening here is that F("inventory_count") + Value(1) has its output_field resolved to IntegerField because it's the common base of inventory_count (which I assume is a SmallIntegerField since the models were not provided) and Value(1).

While CombinedExpression._resolve_output_field resolving logic deals with subclassing properly BaseExpression._resolve_output_field isn't smart enough to resolve a mix of IntegerField subclasses to the largest denominator. By smart I mean that there isn't a proper way do this at such a low level for all subclasses of BaseExpression hence why it's considered a bad idea to event try doing so.

The three ways we could be fixing that would be:

  1. Document this requirement the usage of explicit output_field either to Value beyond the 3.2 release notes.
  2. Make Value._resolve_field use different kind of IntegerField subclasses when dealing with smaller/larger values. This could have the unintended side effect of surfacing even more errors for mixed type as Value(n < 255) would resolve to SmallIntegerField so I think this is a bad idea.
  3. Actually make BasExpression._resolve_output_field smarter with the deprecation period it requires.
Last edited 2 years ago by Simon Charette (previous) (diff)

comment:11 by Martin Lehoux, 2 years ago

I can confirm inventory_count = models.SmallIntegerField()

comment:12 by Martin Lehoux, 2 years ago

If I understand well, you mean that the future vision (given the comment you pointed out) is not to try to infer everything, and that I should indeed in my code specify this output_field in this case?

Just trying to understand what I should do on my side, and if there's a way for me to help you with that

comment:13 by Simon Charette, 2 years ago

Martin, the only way I see on how this can be solved is to revisit how BaseExpression._resolve_output_field deals with variadic merge expressions (e.g. Case, Coalesce) in a way that tries to be a bit more clever with regards to type integer types handling. You could give it a shot but be aware that it's going to be exploration work, certainly a nice way to learn more about the ORM works but not necessarily one that will lead to a fool proof solution.

In the mean time, I think that the only solution is effectively to add explicit output_field.

I'd be happy to review your exploration work but just a small disclaimer that I'm not sure what the proper solution might be.

comment:14 by Martin Lehoux, 2 years ago

Alright thank you for you explanation, I'd be happy to have a look. I will get back to you if I find something interesting

comment:15 by Mariusz Felisiak, 2 years ago

Case() is trying to resolve an output type only for specific fields that cannot be implicitly cast on PostgreSQL (namely GenericIPAddressField, IPAddressField, TimeField, and UUIDField) in other cases PostgreSQL does the work, see docs. As far as I'm aware, we should be safe to ignore errors in resolving an output field in other cases:

  • django/db/models/expressions.py

    diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
    index c270ef16c7..575712806f 100644
    a b class Case(SQLiteNumericMixin, Expression):  
    14831483        sql_params.extend(default_params)
    14841484        template = template or template_params.get("template", self.template)
    14851485        sql = template % template_params
    1486         if self._output_field_or_none is not None:
    1487             sql = connection.ops.unification_cast_sql(self.output_field) % sql
     1486        try:
     1487            if self._output_field_or_none is not None:
     1488                sql = connection.ops.unification_cast_sql(self.output_field) % sql
     1489        except FieldError:
     1490            pass
    14881491        return sql, sql_params
    14891492
    14901493    def get_group_by_cols(self):

in reply to:  15 comment:16 by Mariusz Felisiak, 2 years ago

Replying to Mariusz Felisiak:

Case() is trying to resolve an output type only for specific fields that cannot be implicitly cast on PostgreSQL (namely GenericIPAddressField, IPAddressField, TimeField, and UUIDField) in other cases PostgreSQL does the work, see docs. As far as I'm aware, we should be safe to ignore errors in resolving an output field in other cases:

After a deeper investigation, this ☝️ only fixes one case:

Case(
    When(Value(True), then=F("small_integer") + Value(1)),  
    default=F("integer"),
)

the following still crashes:

Case(
    When(Value(True), then=F("small_integer") + Value(1)),  
    default=F("small_integer"),
)

I don't think it's worth changing anymore. I theory, 40b8a6174f001a310aa33f7880db0efeeb04d4c4 is a regression, however the previous behavior was also buggy, e.g. F("small_integer") + Value(100000000) was resolved to SmallIntegerField, so I think it better to raise an exception in all cases. I'm going to add a small release note and close this as "wontfix" when it gets merged.

comment:17 by Mariusz Felisiak, 2 years ago

comment:18 by GitHub <noreply@…>, 2 years ago

In e8dcef15:

Refs #33397, Refs #34160 -- Added release note for resolving output_field changes.

comment:19 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 58156f4e:

[4.1.x] Refs #33397, Refs #34160 -- Added release note for resolving output_field changes.

Backport of e8dcef155c1848ef49e54f787a7d20faf3bf9296 from main

comment:20 by Mariusz Felisiak, 2 years ago

Resolution: wontfix
Status: assignedclosed
Triage Stage: AcceptedUnreviewed
Note: See TracTickets for help on using tickets.
Back to Top