Opened 4 years ago
Last modified 9 months ago
#30224 assigned Bug
Mysql Datetime value is string instead of datetime object
Reported by: | Martin Kuhn | Owned by: | Rohit Jha |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | Mysql |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
The Problem
Datetimes are sometimes returned as objects or strings. It depends on the value. E.g. '2019-03-01 00:00:00.000000' is returned as a string, while2018-12-12 22:02:53.134000
is provided as a datetime object in the sql/compiler.py.
My Table
CREATE TABLE
myTable
(
id
char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
created_on
datetime(6) NOT NULL,
modified_on
datetime(6) NOT NULL,
first_spot_broadcast
datetime(6) DEFAULT NULL,
expiry_date
datetime(6) DEFAULT NULL
)
DB driver: django.db.backends.mysql
Django version: 2.0.4 tried 2.2 as well
DRF 3.8.2
mysqlclient = "==1.4.2"
This issue only occurred for me when I execute this query. I tried upgrading to 2.2 and the issue persisted.
.annotate( minExpiryDate=Min('buyout__expiry_date', filter=Q(buyout__is_archived=False, buyout__is_deleted=False))) \ .filter(Q(minExpiryDate__gte=(now + expiryInterval)) & Q(minExpiryDate__lte=(now + expiryInterval + timedelta(days=1))))
My stack trace is:
Traceback (most recent call last): File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1741, in <module> main() File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1735, in main globals = debugger.run(setup['file'], None, None, is_module) File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1135, in run pydev_imports.execfile(file, globals, locals) # execute the script File "/usr/share/pycharm/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile exec(compile(contents+"\n", file, 'exec'), glob, loc) File "/server/manage.py", line 15, in <module> execute_from_command_line(sys.argv) File "site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line utility.execute() File "site-packages/django/core/management/__init__.py", line 375, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "site-packages/django/core/management/base.py", line 316, in run_from_argv self.execute(*args, **cmd_options) File "site-packages/django/core/management/base.py", line 353, in execute output = self.handle(*args, **options) File "/server/buyout/management/commands/send_expiry_reminder_email.py", line 8, in handle send_expiry_reminder_summary() File "site-packages/celery/local.py", line 191, in __call__ return self._get_current_object()(*a, **kw) File "site-packages/celery/app/task.py", line 375, in __call__ return self.run(*args, **kwargs) File "/server/buyout/tasks/expiry_reminder.py", line 46, in send_expiry_reminder_summary if len(projects_expiring) > 0: File "site-packages/django/db/models/query.py", line 250, in __len__ self._fetch_all() File "site-packages/django/db/models/query.py", line 1186, in _fetch_all self._result_cache = list(self._iterable_class(self)) File "site-packages/django/db/models/query.py", line 63, in __iter__ for row in compiler.results_iter(results): File "site-packages/django/db/models/sql/compiler.py", line 1009, in apply_converters value = converter(value, expression, connection) File "site-packages/django/db/backends/mysql/operations.py", line 252, in convert_datetimefield_value value = timezone.make_aware(value, self.connection.timezone) File "site-packages/django/utils/timezone.py", line 264, in make_aware return timezone.localize(value, is_dst=is_dst) File "site-packages/pytz/__init__.py", line 222, in localize if dt.tzinfo is not None: AttributeError: 'str' object has no attribute 'tzinfo' Exception ignored in: <generator object cursor_iter at 0x7f7ab3f88390> Traceback (most recent call last): File "site-packages/django/db/models/sql/compiler.py", line 1469, in cursor_iter cursor.close() File "site-packages/MySQLdb/cursors.py", line 86, in close while self.nextset(): File "site-packages/MySQLdb/cursors.py", line 136, in nextset nr = db.next_result() MySQLdb._exceptions.OperationalError: (2006, '')
Change History (18)
comment:1 Changed 4 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 4 years ago by
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:3 Changed 4 years ago by
Type: | Uncategorized → Bug |
---|
comment:4 Changed 4 years ago by
comment:5 Changed 4 years ago by
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Hello Martin,
There's effectively something broken here but nothing seems to prove Django is at fault; as you've noticed it looks like it's mysqlclient
that returns str
for some datetimes.
What I suggest you do to confirm the origin of the bug is to use mysqlclient
directly to connect to your database and retrieve the problematic rows. If the returned data contains string for datetimes columns then it's likely a mysqlclient
bug and there's not much Django can do.
I'll close the ticket for now but please reopen if you can provide more details about how Django is at fault.
comment:6 Changed 4 years ago by
Ok, I managed to narrow down the problem. My Query is using annotations to use the aggregation function MIN.
Project.objects \ .prefetch_related('buyout_set') \ .prefetch_related('teammember_set') \ .annotate(min_expiry_date=Min('buyout__expiry_date', filter=Q(buyout__is_archived=False, buyout__is_deleted=False)) ) \ .filter(Q(min_expiry_date__gte=(now + expiryInterval)) & Q(min_expiry_date__lte=(now + expiryInterval + timedelta(days=1))))
I tested the generated query below using mysqlclient
and found that it does return the annotated value as a string. The problem is a custom annotation "MIN(Date)". Probably mysqlclient has no type info for this custom value, since it could be any/multiple types. (E.g. using an IF statement.)
However this auto generated query could be rewritten to use CAST( "annotation" as DateTime).
The resulting query is
SELECT `advertisement_project`.`id`, `advertisement_project`.`created_on`, `advertisement_project`.`modified_on`, `advertisement_project`.`is_deleted`, `advertisement_project`.`created_by_id`, `advertisement_project`.`modified_by_id`, `advertisement_project`.`name`, `advertisement_project`.`code`, `advertisement_project`.`spot_length`, `advertisement_project`.`eta`, `advertisement_project`.`spot_id`, `advertisement_project`.`spot_id_suffix`, `advertisement_project`.`folder_id`, `advertisement_project`.`system_id`, `advertisement_project`.`status`, `advertisement_project`.`type`, `advertisement_project`.`ad_type`, `advertisement_project`.`parent_project_id`, `advertisement_project`.`original_project_id`, `advertisement_project`.`territory_id`, `advertisement_project`.`language_id`, `advertisement_project`.`campaign_id`, `advertisement_project`.`preview_id`, `advertisement_project`.`links_id`, `advertisement_project`.`producer_names`, MIN(CASE WHEN (`buyout_buyout`.`is_archived` = 0 AND `buyout_buyout`.`is_deleted` = 0) THEN `buyout_buyout`.`expiry_date` ELSE NULL END) AS `minExpiryDate` FROM `advertisement_project` LEFT OUTER JOIN `buyout_buyout` ON (`advertisement_project`.`id` = `buyout_buyout`.`project_id`) GROUP BY `advertisement_project`.`id` HAVING (MIN(CASE WHEN (`buyout_buyout`.`is_archived` = 0 AND `buyout_buyout`.`is_deleted` = 0) THEN `buyout_buyout`.`expiry_date` ELSE NULL END) >= '2019-03-01 00:00:00' AND MIN(CASE WHEN (`buyout_buyout`.`is_archived` = 0 AND `buyout_buyout`.`is_deleted` = 0) THEN `buyout_buyout`.`expiry_date` ELSE NULL END) <= '2019-03-02 00:00:00') ORDER BY NULL;
comment:7 Changed 4 years ago by
I think the most straightforward and graciously solution is to patch Django's django/db/backends/mysql/operations.py
convert_datetimefield_value function. It should simply check for type string and try to convert it manually.
comment:8 Changed 4 years ago by
Resolution: | needsinfo |
---|---|
Status: | closed → new |
comment:9 Changed 4 years ago by
Triage Stage: | Unreviewed → Accepted |
---|
Thanks for the investigation Martin, it's greatly appreciated.
I agree that the most straightforward solution for the datetime would be to adapt convert_datetimefield_value
but it'd be great to figure out if other types are affected as well (e.g. decimal, floats, ...) so we can figure out if the solution needs to live at the Aggregate(filter).as_mysql
level as well.
Could you possible try to figure if it's the case?
I also wonder if it could be possible to let mysqlclient
know what the expected type is for a specific cursor instead though. We have this information in the compiler (through .output_field
) so if we can provide per-cursor converter hints I assume it would be even more efficient because it would be performed in C. Another option is figuring out if mysqlclient
can somehow figure out how to introspect such return value on it's own (e.g. maybe MySQL returns a different field type in this case) but in all cases we'll have to ship a Django workaround anyway.
comment:11 Changed 4 years ago by
The problem seems to be Aggregation functions have a beef with Datetime in general. They convert them into a string. :(
So truly the best workaround would be to Wrap these functions with a CAST operation.
c.execute(""" SELECT MIN(CAST('2018-01-01 00:00:00' AS datetime)); """) 1 c.fetchall()[0][0] '2018-01-01 00:00:00'
By the way mysqlclient's response was "The answer is noooo!" Defiantly not their problem.
We could ask Mysql to fix it, but the same seems to be the case for mariaDB. I experimented with different MySql flavours and versions, before opening this bug. That means a improvement on the server would probably be around 2055.
comment:12 Changed 4 years ago by
Version: | 2.2 → master |
---|
Ah that's too bad. I guess we'll have to fix it on the Django side in this case.
Does it only happen for Min
and Max
functions? If it's the case I'm tempted to suggest we address the issue by implementing as_mysql
methods on these two functions that wrap the inner expression with a CAST
like you did here when self.output_field.internal_type() == 'DateTimeField
. The easiest way is probably to provide a different template
to the super functioné
Here's an example of how this was done for a duration handling of Avg
on MySQL
- https://github.com/django/django/blob/846624ed0858aec0e51baebaa5b397e135c6d1dc/django/db/models/functions/mixins.py#L25-L29
- https://github.com/django/django/blob/65858119d23e37872505a4476e7141c33981fb50/django/db/models/aggregates.py#L99
Happy to review your PR if you're interested in submitting one.
comment:13 Changed 4 years ago by
Yes, I can take a loop in my free time. I just need to make the following correction.
Seems like this is only a problem for MySql and Percona. Looks like mariadb is working fine. I just checked against their corresponding latest versions.
comment:14 Changed 3 years ago by
Owner: | changed from nobody to Rohit Jha |
---|---|
Status: | new → assigned |
comment:15 Changed 3 years ago by
I am trying to recreate this issue, I have written a (test case)https://github.com/django/django/compare/master...rohitjha941:ticket_30224 which generates the following query:
SELECT "expressions_experiment"."id", "expressions_experiment"."name", "expressions_experiment"."assigned", "expressions_experiment"."completed", "expressions_experiment"."estimated_time", "expressions_experiment"."start", "expressions_experiment"."end", Min(CASE WHEN "expressions_result"."experiment_id" = 1 THEN "expressions_result"."result_time" ELSE NULL END) AS "min_expiry_date" FROM "expressions_experiment" LEFT OUTER JOIN "expressions_result" ON ( "expressions_experiment"."id" = "expressions_result"."experiment_id" ) GROUP BY "expressions_experiment"."id", "expressions_experiment"."name", "expressions_experiment"."assigned", "expressions_experiment"."completed", "expressions_experiment"."estimated_time", "expressions_experiment"."start", "expressions_experiment"."end"
and output is
[{'id': 1, 'name': 'Rohit', 'assigned': datetime.date(2020, 3, 1), 'completed': datetime.date(2020, 3, 21), 'estimated_time': datetime.timedelta(days=20, seconds=36000), 'start': datetime.datetime(2020, 3, 1, 11, 23, 36, 34995), 'end': datetime.datetime(2020, 3, 21, 21, 23, 36, 34998), 'min_expiry_date': datetime.datetime(2020, 4, 20, 21, 23, 36, 35778)}]
comment:16 Changed 3 years ago by
Cc: | Sergey Fedoseev added |
---|
comment:17 Changed 9 months ago by
Could we proceed with this issue with this very minimalistic failing example?
Failing with AttributeError
>>> SomeModel.objects.annotate(bla=Value(timezone.now(), models.DateTimeField())).first().bla AttributeError: 'str' object has no attribute 'utcoffset'
Working with Cast
>>> SomeModel.objects.annotate(bla=Cast(Value(timezone.now(), models.DateTimeField()), models.DateTimeField())).first().bla datetime.datetime(2022, 6, 21, 8, 25, 39, 678166, tzinfo=datetime.timezone.utc)
Are we asking everyone to always wrap all kinds of expressions involving timestamps with Cast
? There is no such hint in https://docs.djangoproject.com/en/4.0/ref/models/expressions/#value-expressions. We use Value(<some datetime>)
a lot as fallbacks in When
, Coalesce
, Least
, ... expressions.
comment:18 Changed 9 months ago by
Cc: | Sergey Fedoseev removed |
---|
Could it be related to this bug ?
https://sourceforge.net/p/mysql-python/bugs/325/