Opened 7 years ago
Closed 7 years ago
#29693 closed Bug (invalid)
capfirst() corrupts text for languages without capital letters
| Reported by: | irakli khitarishvili | Owned by: | nobody | 
|---|---|---|---|
| Component: | Utilities | Version: | 2.1 | 
| Severity: | Normal | Keywords: | translation localization corrupted character | 
| Cc: | Triage Stage: | Accepted | |
| Has patch: | no | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
When using the Django admin with the ka-ge language, the first character of a label gets corrupted, because the language does not have capital letters.
For example, "username" in Georgian is "მომხმარებლის სახელი", and when the first letter is capitalized it is displayed as "Სომხმარებლის სახელი".
You can reproduce this on the admin "log in" page.
In django.contrib.auth.forms.AuthenticationForm, which the admin login form subclasses:
    def __init__(self, request=None, *args, **kwargs):
        self.request = request
        self.user_cache = None
        super().__init__(*args, **kwargs)
        self.username_field = UserModel._meta.get_field(UserModel.USERNAME_FIELD)
        self.fields['username'].max_length = self.username_field.max_length or 254
        if self.fields['username'].label is None:
            # This is the line where the character gets corrupted
            self.fields['username'].label = capfirst(self.username_field.verbose_name)
Attachments (1)
Change History (8)
by , 7 years ago
| Attachment: | log_in_page.png added | 
|---|
comment:1 by , 7 years ago
| Component: | contrib.admin → Utilities | 
|---|---|
| Summary: | Django admin corrupted first character in verbose_name for languages without capital letters → capfirst() corrupts text for languages without capital letters | 
| Triage Stage: | Unreviewed → Accepted | 
| Type: | Uncategorized → Bug | 
Any idea how django.utils.text.capfirst() could be adapted to fix the issue? I was thinking maybe we could check ord() of the first character's upper(), but I'm not sure whether that's reliable.
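One possible direction, sketched outside Django: leave the first character untouched when it falls in the Georgian Mkhedruli range (U+10D0 to U+10FF), and otherwise uppercase it as usual. The name `capfirst_safe` and the hard-coded range are assumptions made for illustration; they are not part of Django, and a fixed range would not cover other scripts with similar mappings.

```python
# Hypothetical helper, not part of django.utils.text.
# Mkhedruli letters fall within U+10D0..U+10FF; with Unicode 11+ data,
# str.upper() maps them into the Mtavruli block (U+1C90..U+1CBF).
MKHEDRULI_FIRST = 0x10D0
MKHEDRULI_LAST = 0x10FF


def capfirst_safe(text):
    """Uppercase the first character unless it lies in the Mkhedruli range."""
    if not text:
        return text
    first = text[0]
    if MKHEDRULI_FIRST <= ord(first) <= MKHEDRULI_LAST:
        # Georgian running text is written without capitals; keep it as-is.
        return text
    return first.upper() + text[1:]


print(capfirst_safe("username"))
print(capfirst_safe("მომხმარებლის სახელი"))
```

Because the check looks only at the code point of the original character, the result does not depend on which Unicode data the interpreter ships with.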
comment:2 by , 7 years ago
I cannot reproduce this issue on a Linux/Python3.5 system.
>>> ord('მ')
4315
>>> ord('მ'.upper())
4315
What system are you using?
comment:4 by , 7 years ago
That's weird. However, I don't think that's something that Django can change. It must be related to the underlying Unicode library mappings.
Maybe other people can test to see if the issue you are seeing is the exception or the rule!
comment:5 by , 7 years ago
Repeating Claude's test on macOS with Python 3.6 and 3.7 and on Windows 10 (with cmd + Python 3.6 and WSL bash + Python 3.5) I get the same result: 
>>> ord('მ')
4315
>>> ord('მ'.upper())
4315
comment:6 by , 7 years ago
Looks like Python 3.7 is required to reproduce. Bisected to https://github.com/python/cpython/commit/4705ea38c900f068fd262aca02943896d1123544 (update from Unicode 10 to 11).
On Ubuntu 18.04 with a manually installed Python 3.7:
>>> ord('მ'.upper())
7323
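The dependence on the Unicode data version can be checked from the interpreter itself. A small sketch using the standard-library `unicodedata.unidata_version`; the variable names are only illustrative.

```python
import unicodedata

# Major version of the Unicode database bundled with this interpreter.
unicode_major = int(unicodedata.unidata_version.split(".")[0])

lower_man = "\u10db"  # 'მ', code point 4315
upper_code = ord(lower_man.upper())

# Per the bisect above, the new mapping arrived with the Unicode 10 -> 11 update,
# so the printed code point should differ between older and newer interpreters.
print(unicodedata.unidata_version, upper_code)
```

Running this on each of the systems mentioned in the earlier comments would show whether the differing results line up with the bundled Unicode version rather than with the operating system.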
comment:7 by , 7 years ago
| Resolution: | → invalid | 
|---|---|
| Status: | new → closed | 
Maybe a Unicode bug, then? Looks like Მ / \u1C9B / https://unicode-table.com/en/1C9B/ has no information in Unicode. (Note that Unicode does have capitals for Georgian, see e.g. https://unicode-table.com/en/10AB/.)
Anyway, I fear the resolution will not take place in Django, but in Python or in Unicode data. I'd suggest trying to report the issue in Python.
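For anyone following up with Python, a short sketch that asks the interpreter's own Unicode data about the code points mentioned in this thread. It uses only `unicodedata.name` from the standard library; the fallback string is an arbitrary placeholder. With Unicode 11 or later data, U+1C9B should be reported as a named Georgian Mtavruli capital letter, which would suggest the mapping is intentional in the Unicode data rather than an accident.

```python
import unicodedata

# U+10AB: the Georgian capital linked above; U+10DB: 'მ'; U+1C9B: result of upper().
for char in ("\u10ab", "\u10db", "\u1c9b"):
    # The second argument is returned when the bundled data has no name
    # for the code point (e.g. U+1C9B on data older than Unicode 11).
    name = unicodedata.name(char, "<no name in this Unicode version>")
    print(f"U+{ord(char):04X} {name}")
```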
Attachment log_in_page.png: screenshot of the page