Opened 3 years ago

Last modified 3 years ago

#23964 new Bug

MySQL: BaseModelFormSet.validate_unique() fails for mixed case;

Reported by: Jon Dufresne Owned by: nobody
Component: Forms Version: 1.7
Severity: Normal Keywords:
Cc: jon.dufresne@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Due to MySQL collation rules, two strings differing only by case are considered equal ("foo" is equal to "FOO"). If a database field is unique, upon inserting two rows where the unique field differs only by case, there will be a unique constraint violation.

If a modelformset_factory() is used to create a form set for a model with a unique field, the user can enter two strings differing only by case. The BaseModelFormSet.validate_unique() does not consider MySQL's collation rules and considers these two strings different. Upon insertion in the database, there is a constraint violation. The following new unit test demonstrates how this could happen. Code: <https://github.com/jdufresne/django/tree/mysql-formset-duplicates>.

Not sure the correct approach to fix this as other database backends will happily accept strings differing only by case. So the formset needs to have some sense of the database backend or whether or not unique fields should be case sensitive.

diff --git a/tests/forms_tests/models.py b/tests/forms_tests/models.py
index 82fa005..45d9680 100644
--- a/tests/forms_tests/models.py
+++ b/tests/forms_tests/models.py
@@ -110,3 +110,7 @@ class Cheese(models.Model):
 
 class Article(models.Model):
     content = models.TextField()
+
+
+class Tag(models.Model):
+    name = models.CharField(max_length=100, unique=True)
diff --git a/tests/forms_tests/tests/test_formsets.py b/tests/forms_tests/tests/test_formsets.py
index 94e2704..2e77afd 100644
--- a/tests/forms_tests/tests/test_formsets.py
+++ b/tests/forms_tests/tests/test_formsets.py
@@ -6,8 +6,10 @@ import datetime
 from django.forms import (CharField, DateField, FileField, Form, IntegerField,
     SplitDateTimeField, ValidationError, formsets)
 from django.forms.formsets import BaseFormSet, formset_factory
+from django.forms.models import modelformset_factory
 from django.forms.utils import ErrorList
 from django.test import TestCase
+from ..models import Tag
 
 
 class Choice(Form):
@@ -1213,3 +1215,22 @@ class TestEmptyFormSet(TestCase):
         class FileForm(Form):
             file = FileField()
         self.assertTrue(formset_factory(FileForm, extra=0)().is_multipart())
+
+
+TagFormSet = modelformset_factory(Tag, fields=['name'])
+
+
+class ModelFormSetTestCase(TestCase):
+    def test_duplicates_mixed_case(self):
+        formset = TagFormSet({
+            'form-TOTAL_FORMS': 2,
+            'form-INITIAL_FORMS': 0,
+            'form-0-name': 'TAG',
+            'form-1-name': 'tag',
+        })
+        self.assertTrue(formset.is_valid())
+        formset.save()
+        self.assertQuerysetEqual(
+            Tag.objects.all(),
+            ['TAG'],
+            lambda o: o.name)

When running this test with a MySQL database the test fails with:

======================================================================
ERROR: test_duplicates_mixed_case (forms_tests.tests.test_formsets.ModelFormSetTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jon/devel/django/tests/forms_tests/tests/test_formsets.py", line 1232, in test_duplicates_mixed_case
    formset.save()
  File "/home/jon/devel/django/django/forms/models.py", line 638, in save
    return self.save_existing_objects(commit) + self.save_new_objects(commit)
  File "/home/jon/devel/django/django/forms/models.py", line 769, in save_new_objects
    self.new_objects.append(self.save_new(form, commit=commit))
  File "/home/jon/devel/django/django/forms/models.py", line 621, in save_new
    return form.save(commit=commit)
  File "/home/jon/devel/django/django/forms/models.py", line 461, in save
    construct=False)
  File "/home/jon/devel/django/django/forms/models.py", line 103, in save_instance
    instance.save()
  File "/home/jon/devel/django/django/db/models/base.py", line 694, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/jon/devel/django/django/db/models/base.py", line 722, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/home/jon/devel/django/django/db/models/base.py", line 803, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/home/jon/devel/django/django/db/models/base.py", line 842, in _do_insert
    using=using, raw=raw)
  File "/home/jon/devel/django/django/db/models/manager.py", line 86, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/jon/devel/django/django/db/models/query.py", line 952, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/jon/devel/django/django/db/models/sql/compiler.py", line 930, in execute_sql
    cursor.execute(sql, params)
  File "/home/jon/devel/django/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/home/jon/devel/django/django/db/utils.py", line 95, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/jon/devel/django/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/home/jon/devel/django/django/db/backends/mysql/base.py", line 126, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry 'tag' for key 'name'")

----------------------------------------------------------------------

Change History (7)

comment:1 Changed 3 years ago by Jon Dufresne

I am able to workaround this in my project by changing the line in BaseModelFormSet.validate_unique() from:

row_data = tuple([form.cleaned_data[field] for field in unique_check if field in form.cleaned_data])

To (adding .lower()):

row_data = tuple([form.cleaned_data[field].lower() for field in unique_check if field in form.cleaned_data])

I feel this is a hack and wrongly assumes that the database is case-insensitive. This is OK for my project, but not as a proper fix.

I'd appreciate some guidance on how this should be fixed in the general case to handle case-insensitive databases such as MySQL as well as case-sensitive databases such as PostgreSQL.

comment:2 Changed 3 years ago by Tim Graham

Do you really want that collation? Have you read the Django docs on the topic? I am not sure a specific mention of this problem is needed, but if you care to draft something, I'll happily review it.

It seems like a lot of complexity for Django to try to inspect MySQL's collation and alter its behavior based on that, and I'm not sure it's worth it.

comment:3 Changed 3 years ago by Jon Dufresne

Cc: jon.dufresne@… added

Do you really want that collation? Have you read the ​Django docs on the topic?

Thank you for pointing me to that documentation. I have read through it.

By "that collation" do you mean the default of utf8_general_ci?

To be honest, I find it a bit strange that Django is saying it can't work properly with MySQL--a supported database bacekend--in the default MySQL configuration. It seems to be suggesting I change my database collation to utf8_bin (which I can do if I must) but changing to that collation creates different set of issues (bytestrings vs Unicode strings). So I guess I have to choose which issues I can tolerate most.

I am not sure a specific mention of this problem is needed, but if you care to draft something, I'll happily review it.

Sure, I'll take a stab at it.

It seems like a lot of complexity for Django to try to inspect MySQL's collation and alter its behavior based on that, and I'm not sure it's worth it.

I agree we want to avoid inspecting MySQL's collation.

This is why I was wondering if there was any guidance to fix this in a general way that didn't look like a complete hack around MySQL specific behavior. Basically, I'm asking for some ideas. Once those ideas are presented, I don't mind doing the work to get them in place.

If it isn't worth it, that means this default database backend will knowingly break when used with useful Django features. Those useful features: model formsets and unique constraints.

comment:4 Changed 3 years ago by Tim Graham <timograham@…>

In 4d27d72d149b714431b77f2f15bad1591a9602b7:

Refs #23964 -- Added warning about case-insensitive, unique fields used with formsets

comment:5 Changed 3 years ago by Tim Graham <timograham@…>

In 4862eb6e5b6dae63e5c156fa1095c5b87503620b:

[1.7.x] Refs #23964 -- Added warning about case-insensitive, unique fields used with formsets

Backport of 4d27d72d149b714431b77f2f15bad1591a9602b7 from master

comment:6 Changed 3 years ago by Tim Graham <timograham@…>

In 30a12d6ca66ae2558740ef05f4a0b44566cfefa3:

[1.6.x] Refs #23964 -- Added warning about case-insensitive, unique fields used with formsets

Backport of 4d27d72d149b714431b77f2f15bad1591a9602b7 from master

comment:7 Changed 3 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

Thanks for the documentation patch. I don't see an easy way to solve the problem generally. An option on the formset won't work if multiple databases are in use and third-party apps cannot know what database they are running on. A model method like get_normalized_unique_value() could inspect the database indicated by the write router and handle database/table specific behavior. The formset could call that then. Perhaps try raising the idea on the DevelopersMailingList to get other opinions. I'll mark the ticket accepted for now, and we can close it later if we decide it isn't worthwhile.

Note: See TracTickets for help on using tickets.
Back to Top