Opened 5 years ago

Last modified 15 months ago

#13339 new Bug

Date(Time)Field.to_python() fails to parse localized month names

Reported by: UloPe Owned by: nobody
Component: Forms Version: 1.1
Severity: Normal Keywords: i18n l10n
Cc: Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Date(Time)Field.to_python() uses time.strptime to try and parse the input values.

If the input string contains a localized month name (e.g. "Dezember" for December in German) validation fails. This probably is due to strptime relying on the current locale to parse month names.

Test showing this behaviour is attached.
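
A minimal reproduction outside Django, as a sketch only: it assumes the process runs with the default "C" (or an English) LC_TIME locale, and the helper name parses() is made up for illustration. Under that assumption the English month name is accepted and the German one is rejected.

```python
import time


def parses(value, fmt):
    """Return True if time.strptime() accepts value under the process locale."""
    try:
        time.strptime(value, fmt)
    except ValueError:
        return False
    return True


# Assumption: LC_TIME is "C" or an English locale for this process.
print(parses("December", "%B"))  # month name known to the process locale
print(parses("Dezember", "%B"))  # German month name, unknown to that locale
```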

(N.B.: I'm not really clear on why the reverse operation in formats.localize_input() works correctly)

Attachments (1)

ticket_13339_l10n_month_tests.diff (2.1 KB) - added by UloPe 5 years ago.


Change History (22)

Changed 5 years ago by UloPe

comment:1 Changed 5 years ago by russellm

  • milestone 1.2 deleted
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to wontfix
  • Status changed from new to closed

It isn't clear to me that Date(Time)Field *should* be using localization in to_python(). L10N belongs in the forms framework because that is the user-facing interface that should be used to gather data. The to_python() method accepts strings for historical purposes, and to support serialization; neither of these uses supports the idea of using localization.

comment:2 Changed 5 years ago by UloPe

  • milestone set to 1.2
  • Resolution wontfix deleted
  • Status changed from closed to reopened

I guess there's a misunderstanding here.

What I'm talking about *is* forms.fields.DateField

comment:3 follow-up: Changed 5 years ago by russellm

  • milestone 1.2 deleted
  • Resolution set to invalid
  • Status changed from reopened to closed

Well, then there is a *big* misunderstanding - because forms.DateField doesn't have a to_python() method.

comment:4 in reply to: ↑ 3 Changed 5 years ago by kmtracey

Replying to russellm:

Well, then there is a *big* misunderstanding - because forms.DateField doesn't have a to_python() method.

??? Yes, it does: http://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L320

comment:5 Changed 5 years ago by russellm

  • milestone set to 1.2
  • Resolution invalid deleted
  • Status changed from closed to reopened

For my next trick, I will attempt to put my entire lower leg into my mouth.

My apologies - there is a to_python() method in trunk; I was checking against v1.1 source code.

comment:6 Changed 5 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

Ok - This is closely related to #12986, but it appears to be a separate problem.

comment:7 Changed 5 years ago by russellm

Ok - to fill in some blanks that have been discussed on IRC:

The issue is that strptime() uses the locale to perform translations. The locale operates across threads (so it isn't threadsafe), and it is expensive to set and unset. Therefore, Python doesn't provide a way to parse dates that is sensitive to a locale of choice - it assumes that an application will require a single locale, not that different threads in a single application will need different locales.

strptime() doesn't provide any hooks to control the locale, and the implementation relies on all sorts of global variables; the only options are:

  1. Reimplement strptime so that it is locale-sensitive
  2. Provide a translation layer to convert a user-specified date into a language parseable by strptime()
  3. Call this a known limitation of the L10N implementation (i.e., that you can't parse dates with %B or any of the text-based date format specifiers).

1 will be a lot of effort, especially at this late stage in development.

2 is very messy, and is also quite complex (since you need to provide LANGUAGE_CODE->Locale translations, not LANGUAGE_CODE->EN translations)

This leaves 3 as the only viable option at this point. Other suggestions welcome, but barring a better suggestion, we'll just have to document the limitation and call this a known issue.
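
For reference, option 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed patch: GERMAN_MONTHS, to_process_locale() and parse_date() are hypothetical names, and the table would have to exist for every supported language. The localized name is mapped to a month number and then to whatever name the process locale uses (via calendar.month_name), which is the LANGUAGE_CODE->Locale translation mentioned above.

```python
import calendar
import re
import time

# Hypothetical table: lowercase German month names mapped to month numbers.
GERMAN_MONTHS = {
    "januar": 1, "februar": 2, "märz": 3, "april": 4,
    "mai": 5, "juni": 6, "juli": 7, "august": 8,
    "september": 9, "oktober": 10, "november": 11, "dezember": 12,
}


def to_process_locale(value, months):
    """Replace localized month names with the names the process locale uses."""
    # Longest names first so a short name never shadows a longer one.
    names = sorted(months, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    # calendar.month_name reflects the current LC_TIME locale of the process.
    return pattern.sub(
        lambda m: calendar.month_name[months[m.group(1).lower()]], value
    )


def parse_date(value, fmt, months=GERMAN_MONTHS):
    """Translate month names, then hand the string to time.strptime()."""
    return time.strptime(to_process_locale(value, months), fmt)


parsed = parse_date("12. Dezember 2009", "%d. %B %Y")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)
```

The sketch only covers %B; %b, %a, %A and %p would each need their own tables, which is part of why this option gets messy.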

comment:8 Changed 5 years ago by bcurtu

I ran into this problem when dealing with PayPal's date format ("%H:%M:%S %b %d, %Y PDT" => 12:34:32 Apr 12, 2010 PDT). That's actually a stupid format, but it's PayPal, so... Anyway, I work with a Spanish locale, so it crashed. My workaround was really dirty, but it's the only way I got it to work (for Django 1.1):

In django/forms/fields.py, in the DateTimeField clean method, line 387:

        for format in self.input_formats:
            tmp_value = value
            try:
                try:
                    return datetime.datetime(*time.strptime(tmp_value, format)[:6])
                except ValueError:
                    # First attempt failed: swap English month abbreviations for
                    # the Spanish ones and retry (translate your months here :( ).
                    if '%b' in format:
                        tmp_value = tmp_value.replace('Apr', 'Abr').replace('Aug', 'Ago').replace('Dec', 'Dic').replace('Jan', 'Ene')
                    return datetime.datetime(*time.strptime(tmp_value, format)[:6])
            except ValueError:
                continue

comment:9 Changed 5 years ago by RoySmith

Just want to point out that bcurtu's workaround has a potential problem. If your locale is such that the set of month names and the set of time zone names intersect, you'll replace the time zone as well as the month. That would not be good.
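
One way to avoid that collision, sketched under the assumption that the input is whitespace-separated in the same way as the format string: translate only the token that sits where %b appears in the format. The function name translate_month_token() and the table are hypothetical.

```python
def translate_month_token(value, fmt, table):
    """Translate only the token at the position of %b in the format.

    Assumes value and fmt split into the same number of whitespace-separated
    tokens; if they do not, the value is returned unchanged.
    """
    fmt_tokens = fmt.split()
    value_tokens = value.split()
    if len(fmt_tokens) != len(value_tokens):
        return value
    translated = [
        table.get(token, token) if directive == "%b" else token
        for directive, token in zip(fmt_tokens, value_tokens)
    ]
    return " ".join(translated)


# Hypothetical table; the "PDT" entry simulates a month name that collides
# with a time zone name.
table = {"Apr": "Abr", "PDT": "XXX"}
print(translate_month_token(
    "12:34:32 Apr 12, 2010 PDT", "%H:%M:%S %b %d, %Y PDT", table))
```

Only the month token is rewritten here; the trailing PDT is left alone even though the table has an entry for it.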

comment:10 Changed 5 years ago by Reflejo

I don't know exactly what the approach is here, but I assume that the same locale is used across all threads.

As russellm says, strptime uses some global variables as a cache. But actually, there is a dirty hack that you can use if you don't want to set the locale:

<dirty alert>

>>> import time
>>> import _strptime

# English here
>>> time.strptime('December', '%B')
time.struct_time(tm_year=1900, tm_mon=12, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=335, tm_isdst=-1)

>>> _strptime._TimeRE_cache.locale_time.f_month = ['', 'enero', 'febrero', 'marzo', 'abril', 'mayo', 'junio', 'julio', 'agosto', 'septiembre', 'octubre', 'noviembre', 'diciembre']
>>> _strptime._TimeRE_cache.locale_time.a_month = ['', 'jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
>>> _strptime._TimeRE_cache = _strptime.TimeRE(_strptime._TimeRE_cache.locale_time)
>>> _strptime._regex_cache = {}

# Spanish here

>>> time.strptime('Diciembre', '%B')
time.struct_time(tm_year=1900, tm_mon=12, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=335, tm_isdst=-1)

</dirty alert>

comment:11 Changed 5 years ago by brettcannon

As the original author of strptime() in Python, Alex Gaynor asked me to comment here. Basically the comments about strptime() are accurate; it assumes a single locale for the entire process (which can be changed, but that isn't useful in a threaded situation) just as Python does thanks to using C-level locale functions.

Allowing strptime() to accept a specific locale to parse against could possibly get fixed in a future version of Python if strptime() on the datetime module accepted a locale argument. Then it could (in a non-thread-safe way) change the locale, calculate everything it needs, cache it, switch back to the proper locale, and then do the processing. The problem is it wouldn't be thread-safe, but since locale stuff isn't thread-safe already, maybe that is not such a big deal. But at best that would be a Python 3.2 or later feature.
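
The "change the locale, parse, switch back" idea can be approximated in user code today. The following is a rough sketch, not something strptime() itself offers: temporary_lc_time() and strptime_in_locale() are hypothetical helpers, and the lock only serializes callers that go through them. Any other thread that formats or parses dates while the locale is switched will still see the temporary locale, which is exactly the thread-safety problem described above.

```python
import contextlib
import locale
import threading
import time

# Serializes locale switches made through this helper only.
_locale_lock = threading.Lock()


@contextlib.contextmanager
def temporary_lc_time(name):
    """Switch LC_TIME to name for the duration of the block, then restore it."""
    with _locale_lock:
        saved = locale.setlocale(locale.LC_TIME)  # query the current setting
        try:
            locale.setlocale(locale.LC_TIME, name)
            yield
        finally:
            locale.setlocale(locale.LC_TIME, saved)


def strptime_in_locale(value, fmt, name):
    """Parse value with time.strptime() while LC_TIME is set to name."""
    with temporary_lc_time(name):
        return time.strptime(value, fmt)


# "C" is always available; a name such as "de_DE.UTF-8" works only if that
# locale is installed on the system, otherwise setlocale() raises locale.Error.
print(strptime_in_locale("December", "%B", "C").tm_mon)
```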

comment:12 Changed 5 years ago by CollinAnderson

  • Cc CollinAnderson added

comment:13 Changed 5 years ago by russellm

(In [13039]) Refs #13339 -- Disable %b/%B-based locale datetime input formats, and document that they are problematic.

comment:14 Changed 5 years ago by russellm

  • milestone 1.2 deleted

Moving off the 1.2 milestone; a fix to allow %B/%b (as well as %a/%A and %p for that matter) will require a whole lot more work, but isn't critical for 1.2

comment:15 Changed 5 years ago by russellm

Related issue: #13437, which also requires a reimplementation of strptime to handle L10N edge cases.

comment:16 Changed 5 years ago by CollinAnderson

  • Cc CollinAnderson removed

comment:17 Changed 4 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:18 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:19 Changed 3 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:20 Changed 2 years ago by aaugustin

  • Status changed from reopened to new

comment:21 Changed 15 months ago by claudep

  • Triage Stage changed from Accepted to Someday/Maybe