Opened 3 years ago

Last modified 3 months ago

#22669 assigned Bug

bulk_create with empty model fields fails on oracle

Reported by: sns1081@… Owned by: Michael Nacharov
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: QuerySet.bulk_create
Cc: mnach@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I did the minimal example for reproducing this bug. Model looks very simple:

class ProbaModel(models.Model):
    num = models.IntegerField(null=True, blank=True)

When i tried to run the following test code:

class ProbaTestCase(TestCase):
    def test_bulk_create(self):
        ProbaModel.objects.bulk_create(
            [
                ProbaModel(num=1),
                ProbaModel()
            ]
        )

I got an

Traceback (most recent call last):
  File "/home/sns/devel/proba/proba/app1/tests.py", line 10, in test_bulk_create
    ProbaModel()
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/models/manager.py", line 160, in bulk_create
    return self.get_queryset().bulk_create(*args, **kwargs)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/models/query.py", line 359, in bulk_create
    self._batched_insert(objs_without_pk, fields, batch_size)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/models/query.py", line 838, in _batched_insert
    using=self.db)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql
    cursor.execute(sql, params)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/home/sns/devel/proba/local/lib/python2.7/site-packages/django/db/backends/oracle/base.py", line 816, in execute
    return self.cursor.execute(query, self._param_generator(params))
DatabaseError: ORA-01790: expression must have same datatype as corresponding expression

This bug confirmed on Oracle 11.2.0.4 and 12. Complete project located on my GitHub: https://github.com/nsadovskiy/bulk_create_fail

Solution that works fo me is patch on django/db/backends/oracle/base.py:

*** base.py	2014-05-20 18:37:07.000000000 +0700
--- base.py	2014-05-20 18:41:45.000000000 +0700
***************
*** 467,509 ****
          return '%s_TR' % util.truncate_name(table, name_length).upper()
  
      def bulk_insert_sql(self, fields, num_values):
!         MAPPER = {
!             'CharField': 'to_nchar(%s)',
!             'CommaSeparatedIntegerField': 'to_nchar(%s)',
!             'EmailField': 'to_nchar(%s)',
!             'FileField': 'to_nchar(%s)',
!             'FilePathField': 'to_nchar(%s)',
!             'ImageField': 'to_nchar(%s)',
!             'SlugField': 'to_nchar(%s)',
!             'URLField': 'to_nchar(%s)',
! 
!             'IPAddressField': 'to_char(%s)',
!             'GenericIPAddressField': 'to_char(%s)',
! 
!             'AutoField': 'to_number(%s)',
!             'BigIntegerField': 'to_number(%s)',
!             'BooleanField': 'to_number(%s)',
!             'DecimalField': 'to_number(%s)',
!             'FloatField': 'to_number(%s)',
!             'IntegerField': 'to_number(%s)',
!             'NullBooleanField': 'to_number(%s)',
!             'PositiveIntegerField': 'to_number(%s)',
!             'PositiveSmallIntegerField': 'to_number(%s)',
!             'SmallIntegerField': 'to_number(%s)',
!             'ForeignKey': 'to_number(%s)',
!             'ManyToManyField': 'to_number(%s)',
!             'OneToOneField': 'to_number(%s)',
! 
!             'DateField': 'to_date(%s)',
! 
!             'DateTimeField': 'to_timestamp(%s)',
!             'TimeField': 'to_timestamp(%s)',
! 
!             'BinaryField': 'to_blob(%s)',
! 
!             'TextField': 'to_nclob(%s)'
!         }
!         items_sql = "SELECT %s FROM DUAL" % ", ".join([MAPPER.get(field.get_internal_type(), '%s') for field in fields])
          return " UNION ALL ".join([items_sql] * num_values)
  
  
--- 467,473 ----
          return '%s_TR' % util.truncate_name(table, name_length).upper()
  
      def bulk_insert_sql(self, fields, num_values):
!         items_sql = "SELECT %s FROM DUAL" % ", ".join(["%s"] * len(fields))
          return " UNION ALL ".join([items_sql] * num_values)
  

Attachments (2)

Screenshot from 2016-04-15 18:15:04.png (10.6 KB) - added by Michael Nacharov 8 months ago.
Current "Execution plan"
Screenshot from 2016-04-15 18:13:22.png (9.7 KB) - added by Michael Nacharov 8 months ago.
INSET ALL execution plan

Download all attachments as: .zip

Change History (16)

comment:1 Changed 2 years ago by Tim Graham

Needs tests: set
Triage Stage: UnreviewedAccepted

I'm not sure about the patch, but it at least needs tests.

comment:2 Changed 2 years ago by Michael Nacharov

Hi folks!

I faced with this issue too. As I understand this error happen because django used empty string(") to represent NULL value. It's OK when we work with single row. However in current bulk_create implementation oracle tries to do UNION ALL first and if there is no-string values in the same column there will be an error as described above.
I can see two approaches here:

  • We can change representation of NULL value in django-oracle backend to "NULL" from empty string
  • We can change insertion method from
    INSERT INTO mytable (column1, column2, column3) 
    SELECT 'val1.1', 'val1.2', 'val1.3' FROM dual
    UNION ALL
    SELECT 'val2.1', 'val2.2', 'val2.3' FROM dual
    UNION ALL
    SELECT 'val3.1', 'val3.2', 'val3.3' FROM dual
    
    to
    INSERT ALL
      INTO mytable (column1, column2, column3) VALUES ('val1.1', 'val1.2', 'val1.3')
      INTO mytable (column1, column2, column3) VALUES ('val2.1', 'val2.2', 'val2.3')
      INTO mytable (column1, column2, column3) VALUES ('val3.1', 'val3.2', 'val3.3')
    SELECT * FROM dual;
    
    http://www.techonthenet.com/oracle/questions/insert_rows.php

What do you think?

comment:3 Changed 16 months ago by Tim Graham

Keywords: QuerySet.bulk_create added

comment:4 Changed 11 months ago by Mohsen Bande

@mnach is there any way one can patch your suggested approaches, as a workaround for now?
@devs any progress on resolving this issue? really needed

Last edited 11 months ago by Mohsen Bande (previous) (diff)

comment:5 Changed 11 months ago by Tim Graham

As indicated by the flags on the ticket, we are waiting for a patch that includes tests.

comment:6 Changed 8 months ago by Michael Nacharov

Owner: changed from nobody to Michael Nacharov
Status: newassigned
Version: 1.6master

Changed 8 months ago by Michael Nacharov

Current "Execution plan"

Changed 8 months ago by Michael Nacharov

INSET ALL execution plan

comment:7 Changed 8 months ago by Michael Nacharov

Hi all!

I need a review! It's first time I did a pull request to django, sorry if I missed something..

Tested in Oracle 12c Enterprise Edition Release 12.1.0.2.0. Thanks to OracleTestSetup!


I replace single_table_insert clause to multi_table_insert in terms used in oracle documentation because it is more efficient acording to execution plan

single_table_insert
Current "Execution plan"

multi_table_insert
INSET ALL execution plan

but solution suggested by @sns1081 also works, and I am ready to do another PR if this will be better for future purposes

comment:8 Changed 8 months ago by Michael Nacharov

My investigation of this subject came to this approach:

cursor.setinputsizes(arg0 = cx_Oracle.NUMBER)
cursor.execute("select 1 from dual union all select :arg0 from dual", arg0
= None)

by Anthony Tuininga at cx_Oracle mailing list

method setinputsizes is a part of Python Database API Specification v2.0 and Python Database API Specification v1.0. Backends which support it:

So, I guess that SQLCompiller classes can contain method like "prepare_cursor" which gets arguments from, for example, a database wrapper and pass them to setinputsizes method (and maybe to setoutputsize if it will be needed)

I need to ask @devs: Is this approach better than using "cast(:var as number)" inside query ?

comment:9 Changed 6 months ago by Michael Nacharov

Owner: Michael Nacharov deleted
Status: assignednew

comment:10 Changed 5 months ago by Tim Court

This bug just bit me. I am working around it by replacing the bulk_create with ~20,000 individual inserts. It'll be slow but it's just a migration so I can be patient.

Still, +1 for getting this fixed! :)

comment:11 Changed 4 months ago by Michael Nacharov

Cc: mnach@… added
Owner: set to Michael Nacharov
Status: newassigned

comment:12 Changed 4 months ago by Tim Graham

Needs tests: unset
Patch needs improvement: set

A PR includes tests but some failures on Oracle remain.

comment:13 Changed 4 months ago by Tim Graham

Patch needs improvement: unset

Tests are passing now and the patch is ready for some feedback from Oracle users.

comment:14 Changed 3 months ago by Tim Graham

Patch needs improvement: set

I left some comments for improvement.

Note: See TracTickets for help on using tickets.
Back to Top