Opened 14 months ago

Closed 9 months ago

Last modified 9 months ago

#22305 closed Cleanup/optimization (fixed)

MaxLengthValidator doesn't take database encoding into account

Reported by: joeri Owned by: nobody
Component: Documentation Version: master
Severity: Normal Keywords:
Cc: joeri Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Interesting issue we came across today. Consider the following:

>>> from django.db import models
>>> from django.forms.models import modelform_factory

>>> class Pizza(models.Model):
>>>    name = models.CharField(max_length=10)  # Short pizza names ftw.

>>> form = modelform_factory(Pizza)

>>> pizza_name = u'mozzarélla'  # Note the special character.
>>> f = form(data={'name': pizza_name})
>>> f.is_valid()
True

However, when form is saved to the database you will see: DataError: value too long for type character varying(10).

Why? Because the form's MaxLengthValidator sees:

>>> len(pizza_name)  # Woop, it fits!
10

And the database sees (assuming it uses UTF-8 encoding) this:

>>> len(pizza_name.encode('utf-8'))  # Oops, does not fit!
11

A solution would be to replace len(x) in the MaxLengthValidator to len(x.encode('utf-8')). This however does not take the actual database encoding into account...

Change History (8)

comment:1 Changed 14 months ago by bmispelon

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to worksforme
  • Status changed from new to closed

Hi,

Your database should validate the length of the text, not of the corresponding bytes.

I tried your code and it works fine for me (on sqlite and postgres) so there must be something else at play.

Could you show the code you're using to trigger the error and the corresponding traceback (and reopen this ticket when you do so)?

Thanks.

comment:2 Changed 13 months ago by joeri

  • Cc joeri added
  • Resolution worksforme deleted
  • Status changed from closed to new

Hi,

I was a bit incomplete in my ticket. We are using PostgreSQL 9.1 and we traced the problem to the database charset, which was set to SQL_ASCII (for some reason template0 charset was set to this and used to create the db). You would still expect everything to work on the db end, or that your ModelForm catches the error.

Edge case, so maybe a note in the docs would suffice.

Here's the unit tests that I ran against a PostgreSQL db with SQL_ASCII encoding:

# -*- coding: utf-8 -*-
from django.test import TestCase
from django.db import models
from django.forms.models import modelform_factory


class Pizza(models.Model):
    name = models.CharField(max_length=10)  # Short pizza names ftw.


class MyTests(TestCase):
    def test_form_to_db(self):
        form = modelform_factory(Pizza)
        pizza_name = u'mozzarélla'
        f = form(data={'name': pizza_name})

        self.assertTrue(f.is_valid())
        f.save()  # Gives an error

And here's the result of the test:

$ python src/manage.py test
Creating test database for alias 'default'...
E
======================================================================
ERROR: test_form_to_db (tests.MyTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/joeri/playground/django-1.6/ticket22305/tests.py", line 18, in test_form
    f.save()
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/forms/models.py", line 446, in save
    construct=False)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/forms/models.py", line 99, in save_instance
    instance.save()
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/base.py", line 654, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/base.py", line 687, in _do_insert
    using=using, raw=raw)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/query.py", line 1511, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
    cursor.execute(sql, params)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/joeri/playground/django-1.6/env/local/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
DataError: value too long for type character varying(10)

Reopening the ticket.

comment:3 Changed 13 months ago by mjtamlyn

  • Component changed from Forms to Documentation
  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Bug to Cleanup/optimization

Marking this as a documentation issue. Django assumes UTF8 everywhere and takes great pains to ensure you can assume this. All we need to do is add a small note that your database should be configured to use UTF8 as well.

Despite anything else, if a user enters mozzarélla and gets an error saying they must only type 10 characters that is horrible UX.

comment:4 Changed 12 months ago by shai

I just closed ticket 17202, which (in a comment) complains about wrong length repored in introspection when a MySQL table is defined to use latin1, as a duplicate of this bug.

comment:5 Changed 9 months ago by nip3o

  • Has patch set
  • Version changed from 1.6 to master

comment:6 Changed 9 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 08b85de9b7a8940702dba9348b642538da888c6c:

Fixed #22305 -- Added note to docs about UTF8 database requirement.

comment:7 Changed 9 months ago by Tim Graham <timograham@…>

In 1c714c18d22a38ad2dca696ca8bc4201cc3da131:

[1.6.x] Fixed #22305 -- Added note to docs about UTF8 database requirement.

Backport of 08b85de9b7 from master

comment:8 Changed 9 months ago by Tim Graham <timograham@…>

In ddf2b7d96b1048b1210d2315d92275496352ccaf:

[1.7.x] Fixed #22305 -- Added note to docs about UTF8 database requirement.

Backport of 08b85de9b7 from master

Note: See TracTickets for help on using tickets.
Back to Top