Opened 3 years ago

Closed 3 years ago

#29693 closed Bug (invalid)

capfirst() corrupts text for languages without capital letters

Reported by: irakli khitarishvili
Owned by: nobody
Component: Utilities
Version: 2.1
Severity: Normal
Keywords: translation localization corrupted character
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

When using the Django admin with the ka-ge language, the first character of a label gets corrupted, because the language does not have capital letters.
For example, "username" in Georgian is "მომხმარებლის სახელი", and when the first letter is capitalized it's displayed as "Სომხმარებლის სახელი".

You can reproduce this on the admin "log in" page.

In django.contrib.auth.forms.AuthenticationForm:

    def __init__(self, request=None, *args, **kwargs):
        self.request = request
        self.user_cache = None
        super().__init__(*args, **kwargs)
        self.username_field = UserModel._meta.get_field(UserModel.USERNAME_FIELD)
        self.fields['username'].max_length = self.username_field.max_length or 254
        if self.fields['username'].label is None:
            # This is the line where the first character gets corrupted
            self.fields['username'].label = capfirst(self.username_field.verbose_name)
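
The effect can be checked outside the admin with a small stand-in for capfirst(). This is only a sketch: capfirst_sketch is a hypothetical helper that uppercases the first character and omits the lazy-string handling of the real django.utils.text.capfirst(). Which code point it prints after the arrow depends on the Unicode data bundled with the Python interpreter.

```python
def capfirst_sketch(text):
    """Uppercase only the first character, leaving the rest untouched."""
    return text and text[0].upper() + text[1:]


label = "მომხმარებლის სახელი"
result = capfirst_sketch(label)

# Compare the first code point before and after the call.
print(hex(ord(label[0])), "->", hex(ord(result[0])))
# Everything after the first character is preserved as-is.
print(result[1:] == label[1:])
```

If the two code points differ, the first letter was mapped to a different character, and the font then decides whether that character renders correctly.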

Attachments (1)

log_in_page.png (13.1 KB) - added by irakli khitarishvili 3 years ago.
screen of the page


Change History (8)

Changed 3 years ago by irakli khitarishvili

Attachment: log_in_page.png added

screen of the page

comment:1 Changed 3 years ago by Tim Graham

Component: contrib.admin → Utilities
Summary: Django admin corrupted first character in verbose_name for languages without capital letters → capfirst() corrupts text for languages without capital letters
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug

Any idea how django.utils.text.capfirst() could be adapted to fix the issue? I was thinking maybe we could check ord() of the first character's upper(), but I'm not sure whether that's reliable.
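
One possible direction, offered only as a sketch and not something proposed in this ticket: titlecase the first character instead of uppercasing it. The helper name capfirst_titlecase is hypothetical. The approach assumes that the interpreter's Unicode data gives Georgian Mkhedruli letters no separate titlecase form, so str.title() should leave them alone while still capitalizing scripts that do have capitals.

```python
def capfirst_titlecase(text):
    """Hypothetical variant: titlecase, rather than uppercase, the first character."""
    # str.title() on a single character applies the Unicode titlecase mapping.
    return text and text[0].title() + text[1:]


for label in ("username", "მომხმარებლის სახელი"):
    result = capfirst_titlecase(label)
    print(repr(label), "->", repr(result), "changed:", result != label)
```

Whether this is reliable across Python versions and scripts would need checking before relying on it.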

comment:2 Changed 3 years ago by Claude Paroz

I cannot reproduce this issue on a Linux/Python3.5 system.

>>> ord('მ')
4315
>>> ord('მ'.upper())
4315

What system are you using?

comment:3 Changed 3 years ago by irakli khitarishvili

Python : 3.7.0
System : Windows 7, Linux (Ubuntu 17.10)

comment:4 Changed 3 years ago by Claude Paroz

That's weird. However, I don't think that's something that Django can change. It must be related to the underlying Unicode library mappings.
Maybe other people can test to see if the issue you are seeing is the exception or the rule!

comment:5 Changed 3 years ago by Carlton Gibson

Repeating Claude's test on macOS with Python 3.6 and 3.7 and on Windows 10 (with cmd + Python 3.6 and WSL bash + Python 3.5) I get the same result:

>>> ord('მ')
4315
>>> ord('მ'.upper())
4315

comment:6 Changed 3 years ago by Tim Graham

Looks like Python 3.7 is required to reproduce. Bisected to https://github.com/python/cpython/commit/4705ea38c900f068fd262aca02943896d1123544 (update from Unicode 10 to 11).

On Ubuntu 18.04 with a manually installed Python 3.7:

>>> ord('მ'.upper())
7323
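
To see which case applies on a given interpreter, the bundled Unicode data version can be printed next to the result. This is a minimal sketch that assumes only the standard unicodedata module. The threshold of Unicode 11 comes from the bisected commit above.

```python
import unicodedata

lower = "\u10db"  # the letter მ, ord 4315
upper = lower.upper()

print("Unicode data version:", unicodedata.unidata_version)
print("ord(upper):", ord(upper))
# name() accepts a default for code points that are unassigned in older data.
print("name(U+1C9B):", unicodedata.name("\u1c9b", "<unassigned>"))
```

With Unicode data older than 11, ord(upper) should stay at 4315. With 11 or newer, it should be 7323.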

comment:7 Changed 3 years ago by Claude Paroz

Resolution: invalid
Status: new → closed

Maybe a Unicode bug, then? Looks like Მ / \u1C9B / https://unicode-table.com/en/1C9B/ has no information in Unicode. (Note that Unicode does have capitals for Georgian, see e.g. https://unicode-table.com/en/10AB/.)

Anyway, I fear the resolution will not take place in Django, but in Python or in Unicode data. I'd suggest trying to report the issue in Python.
