#33647 assigned Bug

bulk_update and bulk_create silently truncating values for size limited fields on postgres

Reported by:	jerch	Owned by:	Rowan Douglas
Component:	Database layer (models, ORM)	Version:	4.0
Severity:	Normal	Keywords:
Cc:	Simon Charette, Lily, Rowan Douglas	Triage Stage:	Accepted
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Jacob Walls)

EDIT: This issue originally only affected bulk_update, then started affecting bulk_create in Django 5.2, but that aspect was fixed in Django 5.2.10 and Django 6.0.1.

Original report follows:

On the postgres backend, bulk_update passes overlong values for size-limited fields through without any notification or exception, silently truncating them instead.

Repro:

# some model to repro
class TestModel(models.Model):
    name = models.CharField(max_length=32)

# in the shell
>>> from bulk_test.models import TestModel
>>> tm=TestModel(name='hello')
>>> tm.save()
>>> tm.name
'hello'
>>> tm.name='m'*100
>>> tm.save()  # good, raises:
...
django.db.utils.DataError: value too long for type character varying(32)

>>> TestModel.objects.all().values('name')
<QuerySet [{'name': 'hello'}]>
>>> TestModel.objects.all().update(name='z'*100)  # good, raises as well:
...
django.db.utils.DataError: value too long for type character varying(32)

>>> TestModel.objects.all().values('name')
<QuerySet [{'name': 'hello'}]>
>>> TestModel.objects.bulk_update([tm], ['name'])  # not raising, instead truncating:
1
>>> TestModel.objects.all().values('name')
<QuerySet [{'name': 'mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm'}]>

Not sure if this is intended/expected behavior; either way it is inconsistent with .save and .update, which both raise here. I only tested the postgres backend; it may apply to other size-limiting databases as well (sqlite itself is not affected, as it does not limit values).

If this is intended, it may be a good idea to at least document the slightly different behavior, so users are aware of it and can prepare their code to avoid silent truncation and follow-up errors. A better way would probably be to fix bulk_update to spot value overflows and raise, but I am not sure if that's feasible.
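Until bulk_update raises on its own, one way to avoid the silent truncation is to check string lengths before calling it. The sketch below is plain Python and does not touch the database; `find_overlong`, `check_before_bulk_update` and the `max_lengths` mapping are hypothetical names. In a real project the limits could be read from `Model._meta.get_field(name).max_length`, and `SimpleNamespace` objects only stand in for model instances here.

```python
from types import SimpleNamespace


def find_overlong(objs, max_lengths):
    """Return (index, field, length) for each string longer than its limit.

    `max_lengths` maps field names to a max_length (a hypothetical input; in a
    Django project it could be built from Model._meta.get_field(name).max_length).
    """
    problems = []
    for index, obj in enumerate(objs):
        for field, limit in max_lengths.items():
            value = getattr(obj, field)
            # Only non-NULL strings with a finite limit can overflow here.
            if limit is not None and isinstance(value, str) and len(value) > limit:
                problems.append((index, field, len(value)))
    return problems


def check_before_bulk_update(objs, max_lengths):
    """Raise ValueError instead of letting the database truncate silently."""
    problems = find_overlong(objs, max_lengths)
    if problems:
        raise ValueError(f"values too long for their columns: {problems}")


# Plain objects stand in for model instances in this sketch.
objs = [SimpleNamespace(name="hello"), SimpleNamespace(name="m" * 100)]
print(find_overlong(objs, {"name": 32}))  # [(1, 'name', 100)]
```

Calling `Model.full_clean()` on every instance is the heavier built-in alternative, since CharField enforces max_length through a MaxLengthValidator, but it also runs all other model validation.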

Change History (43)

follow-up: 2 comment:1 by Simon Charette, 4 years ago

Triage Stage:	Unreviewed → Accepted
Type:	Uncategorized → Bug

I managed to reproduce it; this is due to requires_casted_case_in_updates=True on the Postgres backend, which does a silent ::varchar(2) cast on the CASE statement.

diff --git a/tests/queries/test_bulk_update.py b/tests/queries/test_bulk_update.py
index bc252c21c6..f7244aab72 100644
--- a/tests/queries/test_bulk_update.py
+++ b/tests/queries/test_bulk_update.py
@@ -3,7 +3,7 @@
 from django.core.exceptions import FieldDoesNotExist
 from django.db.models import F
 from django.db.models.functions import Lower
-from django.db.utils import IntegrityError
+from django.db.utils import DataError, IntegrityError
 from django.test import TestCase, override_settings, skipUnlessDBFeature

 from .models import (
@@ -259,6 +259,14 @@ def test_ipaddressfield(self):
                     CustomDbColumn.objects.filter(ip_address=ip), models
                 )

+    def test_charfield_constraint(self):
+        article = Article.objects.create(
+            name="a" * 20, created=datetime.datetime.today()
+        )
+        article.name = "b" * 50
+        with self.assertRaises(DataError):
+            Article.objects.bulk_update([article], ["name"])
+
     def test_datetime_field(self):
         articles = [
             Article.objects.create(name=str(i), created=datetime.datetime.today())

We'll need to find an elegant way to cast to varchar instead of varchar(N)
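The root cause follows from two rules the PostgreSQL documentation gives for character varying(n): storing an over-length string into such a column raises an error (unless the excess characters are all spaces), while an explicit cast to varchar(n) truncates to n characters without error. Because the CASE expression is explicitly cast to the column's db_type, the value is already truncated by the time the assignment check sees it. The Python sketch below only models those two rules; `explicit_cast`, `assign` and the exception class (named after SQLSTATE 22001, string_data_right_truncation) are hypothetical stand-ins, not database or Django APIs.

```python
class StringDataRightTruncation(Exception):
    """Stand-in for PostgreSQL's 'value too long for type character varying(n)'."""


def explicit_cast(value, n=None):
    """Model value::varchar(n): over-length values are truncated, no error.

    n=None models a cast to plain, unparametrized varchar (no limit).
    """
    return value if n is None else value[:n]


def assign(value, n):
    """Model storing a value into a varchar(n) column."""
    if len(value) > n and value[n:].strip(" ") != "":
        # Excess characters that are not all spaces raise an error.
        raise StringDataRightTruncation(
            f"value too long for type character varying({n})"
        )
    return value[:n]


column_limit = 32
new_value = "m" * 100

# Current bulk_update shape: CASE ... END::varchar(32), then assignment.
stored = assign(explicit_cast(new_value, column_limit), column_limit)
print(len(stored))  # 32, silently truncated

# With an unparametrized cast the assignment check sees the full value.
try:
    assign(explicit_cast(new_value), column_limit)
except StringDataRightTruncation as exc:
    print(exc)  # value too long for type character varying(32)
```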

in reply to: 1 comment:2 by jerch, 4 years ago

Replying to Simon Charette:

... this is due to requires_casted_case_in_updates=True on the Postgres backend, which does a silent ::varchar(2) cast on the CASE statement.

Then only postgres is affected here? (From the code it seems that other backends don't set this flag...)

For postgres the next question would be whether other data types with constraints are affected as well (basically any type that allows narrowing by type(???) notation), or if this is a varchar-only edge case. From https://www.postgresql.org/docs/current/datatype.html, possible candidates for type != type(???) behavior are:

bit
bit varying
character
character varying
interval
numeric
time
timestamp

IMHO Django uses most of these for some field type (besides bit/bit varying?).

In general the broader "super" type with no constraints can be derived in postgres like this:

postgres=# select 'varchar(5)'::regtype;
      regtype      
-------------------
 character varying
(1 row)

Maybe it is enough to apply the super type to the cast in that line https://github.com/django/django/blob/a1e4e86f923dc8387b0a9c3025bdd5d096a6ebb8/django/db/models/query.py#L765?

Edit:
Just tested it: affected by silent truncation are bit, bit varying, character and character varying. All others specify a precision, which is either not explicitly set by Django, or would raise anyway for unsuitable values (that's the case for numeric).

Edit2:
ArrayFields might also be affected for these 4 types, as select '{1234567890}'::varchar(6)[] also silently truncates the content.
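On the Python side, a rough approximation of the regtype trick is to strip the type modifier while keeping any array suffix, so that varchar(6)[] becomes varchar[] rather than varchar. This is a string-level sketch with a hypothetical helper name; it does not consult the database. One caveat it encodes explicitly: in PostgreSQL a bare `character` means character(1) and a bare `bit` means bit(1), so simply dropping the modifier for those two types would make truncation worse, and the sketch refuses them instead of guessing an unlimited spelling.

```python
import re

# A type modifier such as "(32)" or "(10, 2)", with any whitespace before it.
_TYPMOD = re.compile(r"\s*\([^)]*\)")
# One or more trailing array suffixes such as "[]" or "[3]".
_ARRAY_SUFFIX = re.compile(r"(\s*\[\d*\])+$")

# On PostgreSQL these names without a length mean length 1, so dropping the
# modifier would make truncation worse instead of removing it.
_BARE_MEANS_LENGTH_ONE = {"char", "character", "bit"}


def unparametrized(db_type):
    """Drop type modifiers but keep array suffixes: 'varchar(6)[]' -> 'varchar[]'."""
    stripped = _TYPMOD.sub("", db_type).strip()
    base = _ARRAY_SUFFIX.sub("", stripped).strip().lower()
    if base in _BARE_MEANS_LENGTH_ONE:
        raise ValueError(f"no unlimited spelling handled for {db_type!r}")
    return stripped


print(unparametrized("varchar(6)[]"))                 # varchar[]
print(unparametrized("timestamp(3) with time zone"))  # timestamp with time zone
```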

Last edited 4 years ago by jerch

comment:3 by Simon Charette, 4 years ago

Cc:	Simon Charette added

Then only postgres is affected here?

Yep. If you set this feature flag to False on the Postgres backend and run queries.test_bulk_update, you'll be able to see why it was added in the first place.

In general the broader "super" type with no constraints can be derived in postgres like this:
...
Maybe it is enough to apply the super type to the cast in that line

That could be one approach, yes; we'd likely need to adapt CAST to allow for such usage though. Not sure what the argument should be named, maybe generic, which defaults to False? Not sure what generic=True would mean in the case of Cast(expr, ArrayField(CharField(max_length=20)), generic=True): would it be ::varchar[] or ::array?

follow-ups: 5 6 comment:4 by Simon Charette, 4 years ago

An alternative approach might be to commit to dropping the whole CASE/WHEN wrapping altogether on backends that support it, as described in #29771.

We know the underlying expression construction approach performs poorly (#31202), and the following form doesn't suffer from the type inference issue we're experiencing here:

UPDATE test_model SET name = v.name
FROM (VALUES
   (1, 'aaaaaaaaaa'),
   (2, 'bbbbbbbbbb')
) AS v(id, name)
WHERE test_model.id = v.id
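A rough sketch of how such a statement could be assembled with placeholders rather than string formatting of values. `build_update_from_values` is a hypothetical helper, not Django or django-fast-update API; identifiers are inserted verbatim (so they must already be trusted or quoted) and values use psycopg's `%s` placeholder style. No cast is applied in the VALUES list, so a too-long string reaching `SET name = v.name` goes through PostgreSQL's normal assignment check. Columns of other types (dates, for instance) may need explicit casts in the VALUES list for type inference, which is where the same unparametrized-type question would come back.

```python
def build_update_from_values(table, pk, columns, rows):
    """Return (sql, params) for UPDATE ... FROM (VALUES ...) matched on pk.

    Each row is (pk_value, *column_values). Identifiers are inserted verbatim,
    so they must already be quoted/trusted; values go through %s placeholders.
    """
    width = 1 + len(columns)
    if not rows or any(len(row) != width for row in rows):
        raise ValueError(f"expected a non-empty list of {width}-tuples")
    row_sql = "(" + ", ".join(["%s"] * width) + ")"
    values_sql = ", ".join([row_sql] * len(rows))
    set_sql = ", ".join(f"{column} = v.{column}" for column in columns)
    alias_sql = ", ".join([pk, *columns])
    sql = (
        f"UPDATE {table} SET {set_sql} "
        f"FROM (VALUES {values_sql}) AS v({alias_sql}) "
        f"WHERE {table}.{pk} = v.{pk}"
    )
    # Flatten rows into one parameter list, in placeholder order.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_update_from_values(
    "test_model", "id", ["name"], [(1, "aaaaaaaaaa"), (2, "bbbbbbbbbb")]
)
print(sql)
print(params)  # [1, 'aaaaaaaaaa', 2, 'bbbbbbbbbb']
```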

in reply to: 4 comment:5 by jerch, 4 years ago

Replying to Simon Charette:

An alternative approach might be to commit to dropping the whole CASE/WHEN wrapping altogether on backends that support it, as described in #29771.

Hmm, yes, performance-wise I totally agree: the UPDATE FROM VALUES variants are much better for pumping tons of individual values. I already tried to address that pattern in https://github.com/netzkolchose/django-fast-update, with string formatting at the moment. The runtime numbers there speak for themselves. But for a serious integration in the ORM there are some obstacles to overcome:

F expressions won't work anymore (at least not without big workarounds)
profound ORM integration needs serious rework of the update SQL compiler (or even a separate one just for the update + values pattern)
depends on recent db engines (plus a distinction between MySQL 8 and MariaDB)

Regarding F expressions, I don't know if that's a big deal: bulk_update always occurred to me as an interface to pump individual values rather than doing column trickery with it, so I would not mind if that interface does not support F expressions anymore. But that's just me; if people insist on using F expressions here as well, this would need a workaround of unknown complexity.

IMHO the second point can be done; it just needs someone with time and enough dedication (well, I can try that, but would need serious help, as I lack deeper knowledge of ORM internals).

The last point is trickier: how to deal with older or incompatible database engines here? Create a fallback (the current implementation)? Or just confront users with "nope, not supported here, get a newer/compatible db engine"? It also raises the question of where to park the actual implementation: while the ORM itself could blueprint the UPDATE FROM VALUES pattern in ORM style, the backends would have to translate it into their very own style, or even substyles for MySQL (MySQL 8 != MariaDB here).

I guess such a ground-shaking change to Django would need some sort of consensus first?

Last edited 4 years ago by jerch

in reply to: 4 comment:6 by jerch, 4 years ago

Replying to Simon Charette:

An alternative approach might be to commit to dropping the whole CASE/WHEN wrapping altogether on backends that support it, as described in #29771.

We know the underlying expression construction approach performs poorly (#31202), and the following form doesn't suffer from the type inference issue we're experiencing here:
UPDATE test_model SET name = v.name
FROM (VALUES
   (1, 'aaaaaaaaaa'),
   (2, 'bbbbbbbbbb')
) AS v(id, name)
WHERE test_model.id = v.id

Can you give me a pointer on how to get a discussion about that rolling? Or whom to contact? I already wrote about my implementation of that idea 3 months ago, with no response from anyone. Then I put everything into a neat package with extensive tests, no response (besides some third person giving really helpful feedback). I even wrote a mailing list request about it yesterday to discuss some details, no response either (though it's still pretty fresh). I really don't know what else to do. Could it be that no one is actually interested in a revamped bulk_update implementation in django? Or is django development known to have a very slow pace / being in maintenance mode mostly? I don't want to blame anyone; it could all be me pushing the wrong buttons, but I've never faced it to that degree in any other OSS project.

comment:7 by Carlton Gibson, 4 years ago

Link to the mailing list thread

Hi Jörg — thanks for the input here. Sorry you're feeling frustrated.

Could it be that no one is actually interested in a revamped bulk_update implementation in django? Or is django development known to have a very slow pace / being in maintenance mode mostly?

So there's three points there:

I suspect it's not lots of people who are directly invested, but there are a number of regular contributors to the ORM (Simon included) and I'd imagine this is a topic of interest. But, as you've already pointed out in your mailing list post, there are several tradeoffs to consider, and it'll need some thought. Folks have limited bandwidth: that doesn't entail no interest. I hope that's clear.
Django does have a slow pace. That's OK. After 16+ years, that's proven to be one of its strengths. It's a big project, with a lot of surface area, and (again) folks have limited bandwidth. It's one reason why third-party packages, such as the one you've done, are a good way to go, as they allow a faster pace, and a sandbox to work on issues.
Despite the slow pace, Django is anything but in maintenance mode: you need only look at the release notes over the last few major releases to see that new features are constantly being worked on and delivered. If you zoom out from any particular issue, I contend the development pace is actually quite rapid for a project of Django's size and maturity (despite being "slow" on the surface).

We're currently heads-down working towards the feature freeze for Django 4.1 — there is no chance (I'd think) of this getting addressed for that. That leaves a realistic opportunity to discuss it for Django 4.2, and if you're keen, and the technical questions can be resolved, there's no reason it couldn't get in for that. If we miss that, then the next one... — Again zooming out, it soon fades that it took x-cycles to get any particular feature work completed.

Looking at the timestamps on the discussion here, not much time has passed between comments. I'd suggest a little patience, and working on the third-party implementation to resolve any outstanding issues in that time. If it's ready™ following up on the mailing list thread may be appropriate to let folks know they can give it a try.

I hope that all makes sense, and helps anchor expectations. There's a nice comic here which I always try to keep in mind.

Kind Regards,

Carlton

Last edited 4 years ago by Carlton Gibson

comment:8 by jerch, 4 years ago

Hey Carlton,

thanks for the heads-up. I didn't mean to sound like a drama queen, sorry if it came across that way. Also, I am not eager to push my ideas through at any price, since I could be totally on the wrong track, simply for reasons I've overlooked. So it is more about getting feedback at all, whether things go in the right direction or not.

What I've learned from more than 20 years of OSS contributions: giving people early feedback helps keep them engaged and lowers the risk of time-consuming dead-end implementations (time on both ends, the implementer's and the reviewers'/maintainers'), especially when the bigger picture needs to be addressed (API changes, bigger codebase changes involved in several places). I know that Django is a very big codebase with lots of legacy, which IMHO makes it even harder for someone from outside to get involved. This is surely a balancing act to maintain. Of course a slow pace is not a bad thing; in fact I like Django for not buying every shiny new idea in town, as it gives a very solid development experience (I've been using Django myself since version 0.8).
At this point I wonder if the separate issue tracker in Trac vs. the repo on GitHub might be part of a communication/transfer issue? (At least for me, as maintainer of several projects, GitHub/GitLab made conceptual discussions and overall communication a lot easier than back in SVN/mailing list times...)

I hope I did not derail this issue too much. :)

follow-up: 10 comment:9 by Carlton Gibson, 4 years ago

Hi Jörg.

...giving people early feedback helps keep them engaged and lowers the risk of time-consuming dead-end implementations (time on both ends, the implementer's and the reviewers'/maintainers'), especially when the bigger picture needs to be addressed (API changes, bigger codebase changes involved in several places)

Sure. I'm not sure the 11 days since you opened the ticket is that much for folks to come to a view. If your workaround is performing well for you, that's good input. Otherwise you may need a little patience, though I see some input on the mailing list... 🙂

...if the separate issue tracker in Trac...

That's out of scope for this one I'm afraid 😅. (There are mailing list discussions about it, but it's not trivial, not least because of the history in Trac... — As I say out of scope for here... 😬)

Thanks.

in reply to: 9 comment:10 by jerch, 4 years ago

Replying to Carlton Gibson:

... I'm not sure the 11 days since you opened the ticket is that much for folks to come to a view. If your workaround is performing well for you, that's good input. Otherwise you may need a little patience, though I see some input on the mailing list... 🙂

Well, this ticket here is only loosely related and much younger (I found the issue while doing some bulk_update tests). Ah whatever, I will just wait on the mailing list input...

comment:11 by Dan Hamilton, 3 years ago

Owner:	changed from nobody to Dan Hamilton
Status:	new → assigned

comment:12 by Dan Hamilton, 3 years ago

Owner:	Dan Hamilton removed
Status:	assigned → new

comment:13 by Priyank Panchal, 3 years ago

Owner:	set to Priyank Panchal
Status:	new → assigned

comment:14 by David Sanders, 3 years ago

Hi Priyank,

Just checking are you still interested in working on this?

comment:15 by Priyank Panchal, 3 years ago

Owner:	Priyank Panchal removed
Status:	assigned → new

comment:16 by Priyank Panchal, 3 years ago

Hello, I have attempted to address this issue and I've identified the problem. It appears that the problem arises whenever the CAST() function is executed, and it seems that changing these parameters to False is not an option. Additionally, the issue with SQLite occurs when all characters are stored in the database. I'm still interested in working on this ticket. What would be the best approach to resolve this problem?

Last edited 3 years ago by Priyank Panchal

follow-up: 18 comment:17 by Akash Kumar Sen, 3 years ago

Has patch:	set
Owner:	set to Akash Kumar Sen
Status:	new → assigned

Sorry, I missed your comment and accidentally created a patch. Let's connect if you need help with some other ORM ticket.

Patch - https://github.com/django/django/pull/17363

in reply to: 17 comment:18 by Mariusz Felisiak, 3 years ago

Has patch:	unset

Replying to Akash Kumar Sen:

Sorry, I missed your comment and accidentally created a patch. Let's connect if you need help with some other ORM ticket.

Patch - https://github.com/django/django/pull/17363

We cannot regress Cast() to avoid truncating values in bulk_update().

follow-up: 20 comment:19 by Akash Kumar Sen, 3 years ago

We cannot regress Cast() to avoid truncating values in bulk_update().

I don't think we are regressing Cast() here; the generated query is:

UPDATE "queries_article" SET "name" = (CASE WHEN ("queries_article"."id" = 1) THEN bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb WHEN ("queries_article"."id" = 2) THEN bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb ELSE NULL END)::varchar WHERE "queries_article"."id" IN (1, 2)

It is just generating the supertype, i.e. varchar instead of varchar(20) in the case of a CharField.

If you can explain a little further, that would be great, Mariusz.

Last edited 3 years ago by Akash Kumar Sen

in reply to: 19 comment:20 by Mariusz Felisiak, 3 years ago

If you can explain a little further, that would be great, Mariusz.

Explained in PR.

comment:21 by Akash Kumar Sen, 3 years ago

Has patch:	set

comment:22 by Mariusz Felisiak, 3 years ago

Patch needs improvement:	set

follow-up: 24 comment:23 by Akash Kumar Sen, 3 years ago

I have checked your GitHub comment. Any suggestions you have in mind?
Initially I went for updating the compiler itself, but that seems to be a much more tedious task. I have one more hacky idea, as follows:

Introduce a new database function named CastSuperType that will always cast the super type for every possible arguments.
Like varchar for varchar(20), and similar equivalents for all the other fields that support casting.

I also thought of having a different SQLUpdateCompiler for PostgreSQL, but as the UpdateQuery (https://github.com/Akash-Kumar-Sen/django/blob/bulk_update/django/db/models/sql/subqueries.py#L48) code is shared between the databases, I am having a hard time doing that.

Last edited 3 years ago by Akash Kumar Sen

in reply to: 23 comment:24 by Mariusz Felisiak, 3 years ago

Has patch:	unset
Patch needs improvement:	unset

Replying to Akash Kumar Sen:

I have checked your GitHub comment. Any suggestions you have in mind?
Initially I went for updating the compiler itself, but that seems to be a much more tedious task. I have one more hacky idea, as follows:

Introduce a new database function named CastSuperType that will always cast the super type for every possible arguments.
Like varchar for varchar(20), and similar equivalents for all the other fields that support casting.
I also thought of having a different SQLUpdateCompiler for PostgreSQL, but as the UpdateQuery (https://github.com/Akash-Kumar-Sen/django/blob/bulk_update/django/db/models/sql/subqueries.py#L48) code is shared between the databases, I am having a hard time doing that.

This is a tricky issue to solve, and we cannot move forward with a stub solution just for this reason. I suspect that we will need to revisit bulk_update() to make it work properly, but I don't have any specific advice.

comment:25 by Akash Kumar Sen, 3 years ago

I think following the approach Simon mentioned in comment:3 would be reasonable:

That could be one approach, yes; we'd likely need to adapt CAST to allow for such usage though. Not sure what the argument should be named, maybe generic, which defaults to False? Not sure what generic=True would mean in the case of Cast(expr, ArrayField(CharField(max_length=20)), generic=True): would it be ::varchar[] or ::array?

comment:26 by Craig de Stigter, 2 years ago

related: #35362

comment:27 by Rowan Douglas, 8 months ago

Hi, I believe this is now also happening with bulk_create after changing the Postgres strategy to use UNNEST. I assume that I should not create a new bug for this behaviour? I would be interested in taking on this issue if that would be appropriate.

comment:28 by Simon Charette, 8 months ago

I believe this is now also happening with bulk_create after changing the Postgres strategy to use UNNEST

That would make sense, as we use Field.db_type in the same way to generate the vector of data. I think this could qualify for a backport to 5.2 given it can result in silent data loss, but I'd like to hear from a fellow on that.

I would be interested in taking on this issue if that would be appropriate.

Sure thing, make sure you're fully up-to-date with the discussions here though. One way to approach the problem would be to change the Field.db_type signature to something like (connection, parametrized=True) and have both bulk_update and the PostgreSQL UNNEST strategy for bulk_create call db_type with parametrized=False.
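To make the proposed signature concrete, here is a toy model of it outside of Django. The classes only imitate CharField and DecimalField, `connection` is unused, and the returned strings follow the PostgreSQL spellings Django uses for those fields (varchar(n), numeric(p, s)); none of this is Django's actual implementation.

```python
class ToyField:
    """Mimics the proposed db_type(connection, parametrized=True) signature."""

    base_type = ""

    def type_modifier(self):
        return ""

    def db_type(self, connection, parametrized=True):
        # `connection` is unused in this toy; real fields consult the backend.
        modifier = self.type_modifier() if parametrized else ""
        return f"{self.base_type}{modifier}"


class ToyCharField(ToyField):
    base_type = "varchar"

    def __init__(self, max_length=None):
        self.max_length = max_length

    def type_modifier(self):
        # max_length=None already maps to plain varchar on PostgreSQL.
        return "" if self.max_length is None else f"({self.max_length})"


class ToyDecimalField(ToyField):
    base_type = "numeric"

    def __init__(self, max_digits, decimal_places):
        self.max_digits = max_digits
        self.decimal_places = decimal_places

    def type_modifier(self):
        return f"({self.max_digits}, {self.decimal_places})"


name = ToyCharField(max_length=32)
print(name.db_type(None))                      # varchar(32): column DDL
print(name.db_type(None, parametrized=False))  # varchar: casts around values
```

Columns would still be created with the parametrized type; only call sites that wrap values in a cast would opt out.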

Obviously we can't backport such a change to 5.2 and 6.0, so what we could do in a potential backport at first is adjust assemble_as_sql to do something like field.db_type(connection).split("(")[0] when building db_types.
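The effect of that stopgap on the generated array casts can be sketched as follows. `unnest_insert_sql` is a hypothetical helper that only approximates the shape of an UNNEST-based INSERT; it is not Django's assemble_as_sql. With the parametrized types, each array is cast to varchar(32)[], which truncates elements silently; after `split("(")[0]` the cast becomes varchar[].

```python
def unnest_insert_sql(table, columns, db_types, strip_params=True):
    """Approximate the shape of an UNNEST-based INSERT for one batch."""
    if strip_params:
        # Stopgap: mangle the SQL type string, "varchar(32)" -> "varchar".
        db_types = [db_type.split("(")[0] for db_type in db_types]
    casts = ", ".join(f"%s::{db_type}[]" for db_type in db_types)
    return f"INSERT INTO {table} ({', '.join(columns)}) SELECT * FROM UNNEST({casts})"


columns, db_types = ["name", "price"], ["varchar(32)", "numeric(10, 2)"]
print(unnest_insert_sql("test_model", columns, db_types, strip_params=False))
# INSERT INTO test_model (name, price) SELECT * FROM UNNEST(%s::varchar(32)[], %s::numeric(10, 2)[])
print(unnest_insert_sql("test_model", columns, db_types))
# INSERT INTO test_model (name, price) SELECT * FROM UNNEST(%s::varchar[], %s::numeric[])
```

String mangling is fragile in general: a hypothetical custom type such as "timestamp(3) with time zone" would lose its "with time zone" part under `split("(")[0]`, which is a good reason to prefer a proper parametrized=False in the follow-up.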

We could then make the adjustment in a follow-up commit targeting main by introducing the parametrized kwarg in db_type with a deprecation period, and using parametrized=False in bulk_update and Postgres bulk_create.

FWIW you might also be interested in #14094 and #34887, which respectively added support for unparametrized varchar (unlimited length) on Postgres and SQLite, and the more recent #24920, which added the same for DecimalField (backed by NUMERIC, which is affected by the same root problem as the one discussed here for CharField) on all backends but MySQL.

I'm bringing up the DecimalField example because on MySQL, DECIMAL without parameters appears to be an alias for DECIMAL(10, 0), so we will likely have to special-case some types here and there to have parametrized=False mean the loosest type each backend supports.

Last edited 8 months ago by Simon Charette

comment:29 by Jacob Walls, 8 months ago

Owner:	changed from Akash Kumar Sen to Rowan Douglas
Severity:	Normal → Release blocker
Summary:	bulk_update silently truncating values for size limited fields → bulk_update and bulk_create silently truncating values for size limited fields on postgres

Simon's plan in comment:28 sounds promising, and I agree that we should backport the bulk_create() aspect (via assemble_as_sql) to 5.2 in a backward-compatible way and then defer any following work to future releases (non-blocking).

Rowan, thanks for the offer to help!

comment:30 by Rowan Douglas, 8 months ago

Thank you for the suggestions. I will make sure to go over the history of this ticket and the related tickets you mentioned before starting.

As a new contributor, I would like to clarify the backport reasoning. Am I correct in understanding that changing the signature of Field.db_type can only happen on the active branch, as it is not purely a fix but also a change to the general interface? Therefore we will make a specific fix for the stable branches first, and then introduce the bigger change as part of the next release.

Let me know if I have misunderstood anything.

comment:31 by Jacob Walls, 8 months ago

In principle that's right, but all contributions go through main, and mergers will handle any backports, so just target main.

We really only need to repair 6.0 and 5.2 to the state before the UNNEST strategy replicated this old issue inside bulk_create. Then the parametrized=False bit to clean up that planned Postgres bulk_create fix as well as finally fix bulk_update would not get backported, so make sure that work is in a separate commit.

Thanks for the report and the offer to contribute, much appreciated!

Last edited 8 months ago by Jacob Walls

comment:32 by Jacob Walls, 7 months ago

Hi Rowan, just a heads up that we're targeting a patch release fixing 6.0 regressions on January 6. I'm likely to assign this to myself around Dec 30 or so if there's not a PR yet. Just wanted to over-communicate with the holidays approaching. Thanks again.

comment:33 by Jacob Walls, 7 months ago

#36823 was a dupe for the 5.2 behavior in bulk_create().

comment:34 by Simon Charette, 7 months ago

Jacob, just to make sure we don't end up racing for an alternative solution during the holidays I've tested out the split("(")[0] approach here and it passes the full suite.

comment:35 by Lily, 7 months ago

Cc:	Lily added

comment:36 by Jacob Walls <jacobtylerwalls@…>, 7 months ago

In d6ae2ed:

Refs #33647 -- Fixed silent data truncation in bulk_create on Postgres.

Regression in a16eedcf9c69d8a11d94cac1811018c5b996d491.

The UNNEST strategy is affected by the same problem bulk_update has wrt/
to silent data truncation due to its usage of db_type which always returns
a parametrized subtype.

comment:37 by Jacob Walls <jacobtylerwalls@…>, 7 months ago

In 764af478:

[6.0.x] Refs #33647 -- Fixed silent data truncation in bulk_create on Postgres.

Regression in a16eedcf9c69d8a11d94cac1811018c5b996d491.

The UNNEST strategy is affected by the same problem bulk_update has wrt/
to silent data truncation due to its usage of db_type which always returns
a parametrized subtype.

Backport of d6ae2ed868e43671afc4d433c3d8f4d27f7eb555 from main.

comment:38 by Jacob Walls <jacobtylerwalls@…>, 7 months ago

In 2ca2afdf:

[5.2.x] Refs #33647 -- Fixed silent data truncation in bulk_create on Postgres.

Regression in a16eedcf9c69d8a11d94cac1811018c5b996d491.

The UNNEST strategy is affected by the same problem bulk_update has wrt/
to silent data truncation due to its usage of db_type which always returns
a parametrized subtype.

Backport of d6ae2ed868e43671afc4d433c3d8f4d27f7eb555 from main.

comment:39 by Jacob Walls, 7 months ago

Description:	modified (diff)
Severity:	Release blocker → Normal

Downgrading from release blocker now that we've backported the fix to bulk_create(). Only bulk_update() is still affected (as before). Clarified the ticket description, but leaving both methods in the ticket title so that the reference from the release notes is understandable.

comment:40 by Simon Charette, 6 months ago

When reviewing the currently proposed patch for #23902, I was reminded that we also use Field.db_type-based casting when performing field type alteration on Postgres since #25002.

We ran into the exact same data truncation problem discussed here in #28816 by naively using db_type casting via USING, and I only now realize that we only fixed it in 1378d665a1c85897d951f2ca9618b848fdbba2e7 for a subset of the cases.

We fixed the problem by not casting when the internal type remains the same (e.g. varchar(255) -> varchar(254)), but the data loss problem persists when moving from something like integer to varchar(2), for example, as len(str(999)) > 2 obviously.

It struck me as yet another location where Field.db_type(parametrized=False) would be useful, as it would allow us to:

Keep not casting via USING when the unparametrized type remains the same (what #28816 started doing)
In other cases, make sure to use USING %(unparametrized_type)s to avoid silent data truncation

Here's a DryORM demonstrating the issue. Not sure if we want a standalone ticket or if we should fix all the db_type based casting issues (bulk_update, schema editor) here.
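A rough sketch of those two rules as a pure function over type strings. `alter_column_type_sql` and `_unparametrized` are hypothetical names (Django's schema editor assembles this SQL differently), the unparametrized type is approximated by dropping the parenthesized modifier, and the sketch assumes, as the comment implies, that PostgreSQL still applies its usual length check when converting the USING result into the new column type.

```python
import re


def _unparametrized(db_type):
    # Rough approximation: drop "(...)" modifiers, e.g. "varchar(2)" -> "varchar".
    return re.sub(r"\s*\([^)]*\)", "", db_type).strip()


def alter_column_type_sql(table, column, old_type, new_type):
    """Build ALTER COLUMN ... TYPE following the two rules above."""
    sql = f"ALTER TABLE {table} ALTER COLUMN {column} TYPE {new_type}"
    if _unparametrized(old_type) == _unparametrized(new_type):
        # Same underlying type (varchar(255) -> varchar(254)): no USING, so
        # PostgreSQL's default assignment conversion rejects over-length data.
        return sql
    # Different types: cast only to the unparametrized target; the conversion
    # of the USING result into new_type is left to enforce the length limit.
    return f"{sql} USING {column}::{_unparametrized(new_type)}"


print(alter_column_type_sql("test_model", "code", "integer", "varchar(2)"))
# ALTER TABLE test_model ALTER COLUMN code TYPE varchar(2) USING code::varchar
```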

comment:41 by Jacob Walls, 6 months ago

As part of this work we should opportunistically look for a way to delegate the dynamic provision of ArrayField sizes to compilation time, or else open a new ticket for it.

comment:42 by Rowan Douglas, 5 months ago

Cc:	Rowan Douglas added

Hi Everyone, sorry for the radio silence on my part. As you suspected I was away during the holidays and then making up for that time away at work. I also wasn't receiving notifications, but I have CC'ed my github username now, so hopefully that is fixed.

To check that I am up-to-date with the progress made:

Jacob has fixed the immediate issue with bulk_create and it was backported.
The remaining work is to implement the parametrized=False option for db_type and use it to fix the issue for bulk_update and replace the quick fix for bulk_create.

I am also happy to use it to fix the issues Simon mentioned with the schema editor as part of this ticket.

I am not sure how this applies to the ArrayField as mentioned by Jacob, so I will leave that for another ticket, unless you feel strongly that it should be part of this ticket. In that case I will need to read up on that issue and then may get back to you with questions.

Sorry again for the lack of communication on my part, let me know if I have missed anything.

Last edited 5 months ago by Rowan Douglas

comment:43 by Simon Charette, 5 months ago

Hello Rowan, absolutely no worries about the delayed answer, glad to see you are still interested in working on this ticket!

You're fully caught up, yes. The bulk_create data loss issue introduced in 5.2 was fixed by d6ae2ed868e43671afc4d433c3d8f4d27f7eb555, and the parametrized solution should be explored to resolve bulk_update (and other problems in distinct tickets). I'll provide you with some context, but it is not meant to be prescriptive, just some verbose pointers to reduce the feedback cycles and get you started.

Here's what a commit breakdown could look like:

1. Introduce Field.db_type(parametrized=True). This will likely require a deprecation shim in Field.__init_subclass__ that warns if an override doesn't conform to the new signature and wraps the override with a function that accepts parametrized and ignores it, as well as documentation and tests for the adjusted Field subclasses (e.g. here's how something like that can be achieved). I think a definition-time warning is preferable, to avoid a costly signature check and warnings at query compilation time.
2. Adjust the logic added in d6ae2ed868e43671afc4d433c3d8f4d27f7eb555 to make use of this new parametrized=False parameter instead of mangling the SQL returned by Field.db_type.
3. Introduce Cast(parametrized=True), document and test it.
4. Adjust bulk_update's Cast to pass parametrized=False.
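A minimal sketch of the definition-time shim described in step 1, on a toy base class rather than Django's Field. `ToyField`, the warning class `RemovedInFutureWarning` and the message text are hypothetical. The signature of each subclass's own `db_type` is inspected once, when the class is created, and overrides that accept neither `parametrized` nor `**kwargs` are wrapped so callers can always pass the new argument.

```python
import functools
import inspect
import warnings


class RemovedInFutureWarning(DeprecationWarning):
    """Hypothetical stand-in for Django's versioned deprecation warnings."""


class ToyField:
    def db_type(self, connection, parametrized=True):
        return "varchar(32)" if parametrized else "varchar"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        override = cls.__dict__.get("db_type")
        if override is None:
            return  # Inherits an already conforming db_type.
        params = inspect.signature(override).parameters
        if "parametrized" in params or any(
            p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
        ):
            return  # Already accepts the new argument.
        warnings.warn(
            f"{cls.__qualname__}.db_type() should accept a 'parametrized' argument.",
            RemovedInFutureWarning,
            stacklevel=2,
        )

        @functools.wraps(override)
        def db_type(self, connection, parametrized=True):
            # Legacy overrides ignore the flag and keep returning their type.
            return override(self, connection)

        cls.db_type = db_type


class LegacyField(ToyField):
    def db_type(self, connection):  # Old signature: warns at class creation.
        return "char(10)"


print(LegacyField().db_type(None, parametrized=False))  # char(10)
```

Doing the check in `__init_subclass__` means the cost is paid once per class, not on every query compilation.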

That's already a significant chunk of work, but if you fancy it, once 2. lands you could also work on a more adequate data type over 1378d665a1c85897d951f2ca9618b848fdbba2e7 for Postgres field alterations.

I think Jacob's comment was referring to the last part of comment:2 regarding ArrayField. I think this case will be fixed by 1. as long as we make sure to percolate parametrized to self.base_field.db_type in its db_type implementation.
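Percolating the flag could look roughly like this. The toy classes are hypothetical and only mirror the shape of the idea; `size` is included because an array size can also be rendered inside the type (e.g. varchar(20)[3]). PostgreSQL does not enforce declared array sizes, so this sketch only renders the size in the parametrized form.

```python
class ToyCharField:
    def __init__(self, max_length):
        self.max_length = max_length

    def db_type(self, connection, parametrized=True):
        return f"varchar({self.max_length})" if parametrized else "varchar"


class ToyArrayField:
    def __init__(self, base_field, size=None):
        self.base_field = base_field
        self.size = size

    def db_type(self, connection, parametrized=True):
        # Forward the flag so varchar(20)[] becomes varchar[] rather than
        # keeping the length-limited element type.
        base = self.base_field.db_type(connection, parametrized=parametrized)
        size = self.size if (parametrized and self.size) else ""
        return f"{base}[{size}]"


tags = ToyArrayField(ToyCharField(max_length=20), size=3)
print(tags.db_type(None))                      # varchar(20)[3]
print(tags.db_type(None, parametrized=False))  # varchar[]
```

Because the flag is forwarded recursively, nested arrays such as ArrayField(ArrayField(CharField(...))) would get the same treatment at every level.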

Lastly I'll include a few extra findings I stumbled upon while reviewing my comment.

While Field.cast_db_type is not documented, I think it should get the same treatment as .db_type regarding a deprecation shim when introducing parametrized.
It looks like RangeContainedBy.process_rhs could also benefit from parametrized=False
