Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#16878 closed New feature (fixed)

Improve the "intword" filter

Reported by: crodjer Owned by: crodjer
Component: contrib.humanize Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The current code for intword function is repetitive. For each name,
same code block is re-written.

The patch from this pull request,
makes the code more DRY and also allowes the use words which
represent much larger numbers then trillion. In the code, if the lots of
zeroes don't look too good, they can be replaced to exponent form and used
as 10**n. I used zeroes thinking that might save some compution. The
test in other commit can be neglected if not needed.

Attachments (1)

16878.diff (3.3 KB) - added by julien 3 years ago.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 3 years ago by carljm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.3 to SVN

Accepting, though I think this is pretty low priority.

There should be tests added to cover the newly-supported bigger numbers, but they should go in regressiontests/humanize/tests.py with the existing intword tests.

comment:2 Changed 3 years ago by crodjer

I understand this is not of much importance. I was going through the code and found this so wrote a quick patch.

I have fixed the code to pass the translation regression tests and updated the tests to cover some new larger numbers. Also to make it look cleaner updated the code so that in the conversions list the exponent to 10 is specified instead of numbers with large number of zeroes.

comment:3 Changed 3 years ago by jezdez

  • Needs documentation set
  • Needs tests set

The proposed patch (on Github) is incorrect in a number of ways:

  1. even if they are very big numbers, they still have a singular and a plural in some languages (e.g, "Septillion" and "Septillionen" in German)
  1. it doesn't actually mark the number words for translation since you are using string concatenation to create the strings to be passed to ungettext
  1. this needs more thorough tests with another (non-English) language
Last edited 3 years ago by jezdez (previous) (diff)

comment:4 Changed 3 years ago by jezdez

  • Type changed from Cleanup/optimization to New feature

comment:5 Changed 3 years ago by crodjer

Updated the code to user lamda functions for each large number from the list.
Commit (on gihub): [d81c5566e](https://github.com/crodjer/django/commit/d81c5566e28dba8330798bed7f8f4ce83001c738)

comment:6 Changed 3 years ago by jezdez

The patch still needs improvement:

  • The ungettext call shouldn't only contain the number word (e.g. ungettext('billion', 'billion', n)) but also a placeholder for the value (e.g.ungettext('%(value)s billion', '%(value)s billion', n). The reason is simple: there are languages which don't follow the same order of words of "<value> <numberword>". That's why It should be left to the translator to decide it.
  • You don't have to update all the po files manually, updating the base translation file (the English po file) is enough since that's what the Django project on transifex.net will pickup automatically.

comment:7 follow-up: Changed 3 years ago by crodjer

To take care of the "<value> <numberword>" issue exactly, I have kept another two ungettext calls '%(value)s %(suffix)s', '%(value)s %(suffix)s' (lines 98,99) which generate a translation for specifying that in the translation files. So apart from the definations of large number words, the format can also be specified with it.

In the commit, I have manually edited only one language translation file (de) for use in the i18n regressions tests. Rest are all autogenerated.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 3 years ago by jezdez

Replying to crodjer:

To take care of the "<value> <numberword>" issue exactly, I have kept another two ungettext calls '%(value)s %(suffix)s', '%(value)s %(suffix)s' (lines 98,99) which generate a translation for specifying that in the translation files. So apart from the definations of large number words, the format can also be specified with it.

That's not enough, the string needs to be one and only one string to be translated, not something that consists of two separate strings.

In the commit, I have manually edited only one language translation file (de) for use in the i18n regressions tests. Rest are all autogenerated.

Right, please revert the changes done to the other po files. You can prevent updating all those files by passing the specific language you want makemessages to update with the -l option.

comment:9 in reply to: ↑ 8 Changed 3 years ago by crodjer

Replying to jezdez:

That's not enough, the string needs to be one and only one string to be translated, not something that consists of two separate strings.

In the commit, I have manually edited only one language translation file (de) for use in the i18n regressions tests. Rest are all autogenerated.

Right, please revert the changes done to the other po files. You can prevent updating all those files by passing the specific language you want makemessages to update with the -l option.

Updated the code according to your suggestions. Now no changes in existing translations would be needed, just additions maybe for largeer number strings.

comment:10 Changed 3 years ago by jezdez

  • Resolution set to fixed
  • Status changed from new to closed

In [16897]:

(The changeset message doesn't reference this ticket)

comment:11 Changed 3 years ago by julien

  • Resolution fixed deleted
  • Status changed from closed to reopened

The humanize tests are failing, most likely because intword_converters.items() returns the items in the wrong order. The attached patch makes used of SortedDict to enforce a deterministic order.

Changed 3 years ago by julien

comment:12 Changed 3 years ago by jezdez

  • Resolution set to worksforme
  • Status changed from reopened to closed

I don't see how this patch would fix the error we see on the buildbout: http://ci.djangoproject.com/job/Django/296/database=sqlite3,python=python2.6/console

I think this is related to stale files on the buildbot due to moving the test directory, not the order. At least I can't reproduce a situation in which the order (as tried to be fixed by @julienphalip's patch) would be of importance to the intword filter.

comment:13 Changed 3 years ago by jezdez

  • Resolution worksforme deleted
  • Status changed from closed to reopened

So yeah, after having tested this on Python 2.5 I see the error after all:

Creating test database for alias 'default'...
Creating test database for alias 'other'...
..F.F.....
======================================================================
FAIL: test_i18n_intword (django.contrib.humanize.tests.HumanizeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 82, in test_i18n_intword
    self.humanize_tester(test_list, result_list, 'intword')
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 22, in humanize_tester
    msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
AssertionError: u'0,0 decillion' != u'1,0 Million' : intword test failed, produced '0,0 decillion', should've produced '1,0 Million'

======================================================================
FAIL: test_intword (django.contrib.humanize.tests.HumanizeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 64, in test_intword
    self.humanize_tester(test_list, result_list, 'intword')
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 22, in humanize_tester
    msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
AssertionError: u'0.0 decillion' != u'1.0 million' : intword test failed, produced '0.0 decillion', should've produced '1.0 million'

----------------------------------------------------------------------
Ran 10 tests in 0.043s

FAILED (failures=2)

Re-re-opening :)

comment:14 Changed 3 years ago by julien

  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed in r16900, thanks! :)

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.