Opened 4 years ago

Last modified 9 months ago

#30224 assigned Bug

Mysql Datetime value is string instead of datetime object

Reported by: Martin Kuhn Owned by: Rohit Jha
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: Mysql
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Martin Kuhn)

The Problem
Datetimes are sometimes returned as objects or strings. It depends on the value. E.g. '2019-03-01 00:00:00.000000' is returned as a string, while2018-12-12 22:02:53.134000 is provided as a datetime object in the sql/compiler.py.

My Table

CREATE TABLE myTable (

id char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
created_on datetime(6) NOT NULL,
modified_on datetime(6) NOT NULL,
first_spot_broadcast datetime(6) DEFAULT NULL,
expiry_date datetime(6) DEFAULT NULL

)

DB driver: django.db.backends.mysql
Django version: 2.0.4 tried 2.2 as well
DRF 3.8.2
mysqlclient = "==1.4.2"

This issue only occurred for me when I execute this query. I tried upgrading to 2.2 and the issue persisted.

        .annotate(
        minExpiryDate=Min('buyout__expiry_date', filter=Q(buyout__is_archived=False, buyout__is_deleted=False))) \
        .filter(Q(minExpiryDate__gte=(now + expiryInterval))
                & Q(minExpiryDate__lte=(now + expiryInterval + timedelta(days=1))))

My stack trace is:

Traceback (most recent call last):
  File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/usr/share/pycharm/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/server/manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/server/buyout/management/commands/send_expiry_reminder_email.py", line 8, in handle
    send_expiry_reminder_summary()
  File "site-packages/celery/local.py", line 191, in __call__
    return self._get_current_object()(*a, **kw)
  File "site-packages/celery/app/task.py", line 375, in __call__
    return self.run(*args, **kwargs)
  File "/server/buyout/tasks/expiry_reminder.py", line 46, in send_expiry_reminder_summary
    if len(projects_expiring) > 0:
  File "site-packages/django/db/models/query.py", line 250, in __len__
    self._fetch_all()
  File "site-packages/django/db/models/query.py", line 1186, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "site-packages/django/db/models/query.py", line 63, in __iter__
    for row in compiler.results_iter(results):
  File "site-packages/django/db/models/sql/compiler.py", line 1009, in apply_converters
    value = converter(value, expression, connection)
  File "site-packages/django/db/backends/mysql/operations.py", line 252, in convert_datetimefield_value
    value = timezone.make_aware(value, self.connection.timezone)
  File "site-packages/django/utils/timezone.py", line 264, in make_aware
    return timezone.localize(value, is_dst=is_dst)
  File "site-packages/pytz/__init__.py", line 222, in localize
    if dt.tzinfo is not None:
AttributeError: 'str' object has no attribute 'tzinfo'
Exception ignored in: <generator object cursor_iter at 0x7f7ab3f88390>
Traceback (most recent call last):
  File "site-packages/django/db/models/sql/compiler.py", line 1469, in cursor_iter
    cursor.close()
  File "site-packages/MySQLdb/cursors.py", line 86, in close
    while self.nextset():
  File "site-packages/MySQLdb/cursors.py", line 136, in nextset
    nr = db.next_result()
MySQLdb._exceptions.OperationalError: (2006, '')

Change History (18)

comment:1 Changed 4 years ago by Martin Kuhn

Description: modified (diff)

comment:2 Changed 4 years ago by Martin Kuhn

Component: UncategorizedDatabase layer (models, ORM)

comment:3 Changed 4 years ago by Martin Kuhn

Type: UncategorizedBug

comment:4 Changed 4 years ago by Martin Kuhn

Could it be related to this bug ?

https://sourceforge.net/p/mysql-python/bugs/325/

comment:5 Changed 4 years ago by Simon Charette

Resolution: needsinfo
Status: newclosed

Hello Martin,

There's effectively something broken here but nothing seems to prove Django is at fault; as you've noticed it looks like it's mysqlclient that returns str for some datetimes.

What I suggest you do to confirm the origin of the bug is to use mysqlclient directly to connect to your database and retrieve the problematic rows. If the returned data contains string for datetimes columns then it's likely a mysqlclient bug and there's not much Django can do.

I'll close the ticket for now but please reopen if you can provide more details about how Django is at fault.

comment:6 Changed 4 years ago by Martin Kuhn

Ok, I managed to narrow down the problem. My Query is using annotations to use the aggregation function MIN.

Project.objects \
        .prefetch_related('buyout_set') \
        .prefetch_related('teammember_set') \
        .annotate(min_expiry_date=Min('buyout__expiry_date',
                                      filter=Q(buyout__is_archived=False, buyout__is_deleted=False))
                  ) \
        .filter(Q(min_expiry_date__gte=(now + expiryInterval))
                & Q(min_expiry_date__lte=(now + expiryInterval + timedelta(days=1))))

I tested the generated query below using mysqlclient and found that it does return the annotated value as a string. The problem is a custom annotation "MIN(Date)". Probably mysqlclient has no type info for this custom value, since it could be any/multiple types. (E.g. using an IF statement.)
However this auto generated query could be rewritten to use CAST( "annotation" as DateTime).

The resulting query is


SELECT 
    `advertisement_project`.`id`,
    `advertisement_project`.`created_on`,
    `advertisement_project`.`modified_on`,
    `advertisement_project`.`is_deleted`,
    `advertisement_project`.`created_by_id`,
    `advertisement_project`.`modified_by_id`,
    `advertisement_project`.`name`,
    `advertisement_project`.`code`,
    `advertisement_project`.`spot_length`,
    `advertisement_project`.`eta`,
    `advertisement_project`.`spot_id`,
    `advertisement_project`.`spot_id_suffix`,
    `advertisement_project`.`folder_id`,
    `advertisement_project`.`system_id`,
    `advertisement_project`.`status`,
    `advertisement_project`.`type`,
    `advertisement_project`.`ad_type`,
    `advertisement_project`.`parent_project_id`,
    `advertisement_project`.`original_project_id`,
    `advertisement_project`.`territory_id`,
    `advertisement_project`.`language_id`,
    `advertisement_project`.`campaign_id`,
    `advertisement_project`.`preview_id`,
    `advertisement_project`.`links_id`,
    `advertisement_project`.`producer_names`,
    MIN(CASE
        WHEN
            (`buyout_buyout`.`is_archived` = 0
                AND `buyout_buyout`.`is_deleted` = 0)
        THEN
            `buyout_buyout`.`expiry_date`
        ELSE NULL
    END) AS `minExpiryDate`
FROM
    `advertisement_project`
        LEFT OUTER JOIN
    `buyout_buyout` ON (`advertisement_project`.`id` = `buyout_buyout`.`project_id`)
GROUP BY `advertisement_project`.`id`
HAVING (MIN(CASE
    WHEN
        (`buyout_buyout`.`is_archived` = 0
            AND `buyout_buyout`.`is_deleted` = 0)
    THEN
        `buyout_buyout`.`expiry_date`
    ELSE NULL
END) >= '2019-03-01 00:00:00'
    AND MIN(CASE
    WHEN
        (`buyout_buyout`.`is_archived` = 0
            AND `buyout_buyout`.`is_deleted` = 0)
    THEN
        `buyout_buyout`.`expiry_date`
    ELSE NULL
END) <= '2019-03-02 00:00:00')
ORDER BY NULL;

comment:7 Changed 4 years ago by Martin Kuhn

I think the most straightforward and graciously solution is to patch Django's django/db/backends/mysql/operations.py convert_datetimefield_value function. It should simply check for type string and try to convert it manually.

comment:8 Changed 4 years ago by Martin Kuhn

Resolution: needsinfo
Status: closednew

comment:9 Changed 4 years ago by Simon Charette

Triage Stage: UnreviewedAccepted

Thanks for the investigation Martin, it's greatly appreciated.

I agree that the most straightforward solution for the datetime would be to adapt convert_datetimefield_value but it'd be great to figure out if other types are affected as well (e.g. decimal, floats, ...) so we can figure out if the solution needs to live at the Aggregate(filter).as_mysql level as well.

Could you possible try to figure if it's the case?

I also wonder if it could be possible to let mysqlclient know what the expected type is for a specific cursor instead though. We have this information in the compiler (through .output_field) so if we can provide per-cursor converter hints I assume it would be even more efficient because it would be performed in C. Another option is figuring out if mysqlclient can somehow figure out how to introspect such return value on it's own (e.g. maybe MySQL returns a different field type in this case) but in all cases we'll have to ship a Django workaround anyway.

comment:10 Changed 4 years ago by Martin Kuhn

comment:11 Changed 4 years ago by Martin Kuhn

The problem seems to be Aggregation functions have a beef with Datetime in general. They convert them into a string. :(
So truly the best workaround would be to Wrap these functions with a CAST operation.

c.execute(""" SELECT MIN(CAST('2018-01-01 00:00:00' AS datetime)); """)
1
c.fetchall()[0][0]
'2018-01-01 00:00:00'

By the way mysqlclient's response was "The answer is noooo!" Defiantly not their problem.

We could ask Mysql to fix it, but the same seems to be the case for mariaDB. I experimented with different MySql flavours and versions, before opening this bug. That means a improvement on the server would probably be around 2055.

comment:12 Changed 4 years ago by Simon Charette

Version: 2.2master

Ah that's too bad. I guess we'll have to fix it on the Django side in this case.

Does it only happen for Min and Max functions? If it's the case I'm tempted to suggest we address the issue by implementing as_mysql methods on these two functions that wrap the inner expression with a CAST like you did here when self.output_field.internal_type() == 'DateTimeField. The easiest way is probably to provide a different template to the super functioné

Here's an example of how this was done for a duration handling of Avg on MySQL

Happy to review your PR if you're interested in submitting one.

comment:13 Changed 4 years ago by Martin Kuhn

Yes, I can take a loop in my free time. I just need to make the following correction.
Seems like this is only a problem for MySql and Percona. Looks like mariadb is working fine. I just checked against their corresponding latest versions.

comment:14 Changed 3 years ago by Rohit Jha

Owner: changed from nobody to Rohit Jha
Status: newassigned

comment:15 Changed 3 years ago by Rohit Jha

I am trying to recreate this issue, I have written a (test case)https://github.com/django/django/compare/master...rohitjha941:ticket_30224 which generates the following query:

SELECT "expressions_experiment"."id", 
       "expressions_experiment"."name", 
       "expressions_experiment"."assigned", 
       "expressions_experiment"."completed", 
       "expressions_experiment"."estimated_time", 
       "expressions_experiment"."start", 
       "expressions_experiment"."end", 
       Min(CASE 
             WHEN "expressions_result"."experiment_id" = 1 THEN 
             "expressions_result"."result_time" 
             ELSE NULL 
           END) AS "min_expiry_date" 
FROM   "expressions_experiment" 
       LEFT OUTER JOIN "expressions_result" 
                    ON ( "expressions_experiment"."id" = 
                         "expressions_result"."experiment_id" ) 
GROUP  BY "expressions_experiment"."id", 
          "expressions_experiment"."name", 
          "expressions_experiment"."assigned", 
          "expressions_experiment"."completed", 
          "expressions_experiment"."estimated_time", 
          "expressions_experiment"."start", 
          "expressions_experiment"."end" 

and output is

  [{'id': 1, 'name': 'Rohit', 'assigned': datetime.date(2020, 3, 1), 'completed': datetime.date(2020, 3, 21), 'estimated_time': datetime.timedelta(days=20, seconds=36000), 'start': datetime.datetime(2020, 3, 1, 11, 23, 36, 34995), 'end': datetime.datetime(2020, 3, 21, 21, 23, 36, 34998), 'min_expiry_date': datetime.datetime(2020, 4, 20, 21, 23, 36, 35778)}] 

comment:16 Changed 3 years ago by Sergey Fedoseev

Cc: Sergey Fedoseev added

comment:17 Changed 9 months ago by Tobias Krönke

Could we proceed with this issue with this very minimalistic failing example?

Failing with AttributeError

>>> SomeModel.objects.annotate(bla=Value(timezone.now(), models.DateTimeField())).first().bla
AttributeError: 'str' object has no attribute 'utcoffset'

Working with Cast

>>> SomeModel.objects.annotate(bla=Cast(Value(timezone.now(), models.DateTimeField()), models.DateTimeField())).first().bla
datetime.datetime(2022, 6, 21, 8, 25, 39, 678166, tzinfo=datetime.timezone.utc)

Are we asking everyone to always wrap all kinds of expressions involving timestamps with Cast? There is no such hint in https://docs.djangoproject.com/en/4.0/ref/models/expressions/#value-expressions. We use Value(<some datetime>) a lot as fallbacks in When, Coalesce, Least, ... expressions.

comment:18 Changed 9 months ago by Sergey Fedoseev

Cc: Sergey Fedoseev removed
Note: See TracTickets for help on using tickets.
Back to Top