Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#16878 closed New feature (fixed)

Improve the "intword" filter

Reported by: Rohan Jain Owned by: Rohan Jain
Component: contrib.humanize Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The current code for intword function is repetitive. For each name,
same code block is re-written.

The patch from this pull request,
makes the code more DRY and also allowes the use words which
represent much larger numbers then trillion. In the code, if the lots of
zeroes don't look too good, they can be replaced to exponent form and used
as 10**n. I used zeroes thinking that might save some compution. The
test in other commit can be neglected if not needed.

Attachments (1)

16878.diff (3.3 KB) - added by Julien Phalip 5 years ago.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 5 years ago by Carl Meyer

Patch needs improvement: set
Triage Stage: UnreviewedAccepted
Version: 1.3SVN

Accepting, though I think this is pretty low priority.

There should be tests added to cover the newly-supported bigger numbers, but they should go in regressiontests/humanize/tests.py with the existing intword tests.

comment:2 Changed 5 years ago by Rohan Jain

I understand this is not of much importance. I was going through the code and found this so wrote a quick patch.

I have fixed the code to pass the translation regression tests and updated the tests to cover some new larger numbers. Also to make it look cleaner updated the code so that in the conversions list the exponent to 10 is specified instead of numbers with large number of zeroes.

comment:3 Changed 5 years ago by Jannis Leidel

Needs documentation: set
Needs tests: set

The proposed patch (on Github) is incorrect in a number of ways:

  1. even if they are very big numbers, they still have a singular and a plural in some languages (e.g, "Septillion" and "Septillionen" in German)
  1. it doesn't actually mark the number words for translation since you are using string concatenation to create the strings to be passed to ungettext
  1. this needs more thorough tests with another (non-English) language
Last edited 5 years ago by Jannis Leidel (previous) (diff)

comment:4 Changed 5 years ago by Jannis Leidel

Type: Cleanup/optimizationNew feature

comment:5 Changed 5 years ago by Rohan Jain

Updated the code to user lamda functions for each large number from the list.
Commit (on gihub): [d81c5566e](https://github.com/crodjer/django/commit/d81c5566e28dba8330798bed7f8f4ce83001c738)

comment:6 Changed 5 years ago by Jannis Leidel

The patch still needs improvement:

  • The ungettext call shouldn't only contain the number word (e.g. ungettext('billion', 'billion', n)) but also a placeholder for the value (e.g.ungettext('%(value)s billion', '%(value)s billion', n). The reason is simple: there are languages which don't follow the same order of words of "<value> <numberword>". That's why It should be left to the translator to decide it.
  • You don't have to update all the po files manually, updating the base translation file (the English po file) is enough since that's what the Django project on transifex.net will pickup automatically.

comment:7 Changed 5 years ago by Rohan Jain

To take care of the "<value> <numberword>" issue exactly, I have kept another two ungettext calls '%(value)s %(suffix)s', '%(value)s %(suffix)s' (lines 98,99) which generate a translation for specifying that in the translation files. So apart from the definations of large number words, the format can also be specified with it.

In the commit, I have manually edited only one language translation file (de) for use in the i18n regressions tests. Rest are all autogenerated.

comment:8 in reply to:  7 ; Changed 5 years ago by Jannis Leidel

Replying to crodjer:

To take care of the "<value> <numberword>" issue exactly, I have kept another two ungettext calls '%(value)s %(suffix)s', '%(value)s %(suffix)s' (lines 98,99) which generate a translation for specifying that in the translation files. So apart from the definations of large number words, the format can also be specified with it.

That's not enough, the string needs to be one and only one string to be translated, not something that consists of two separate strings.

In the commit, I have manually edited only one language translation file (de) for use in the i18n regressions tests. Rest are all autogenerated.

Right, please revert the changes done to the other po files. You can prevent updating all those files by passing the specific language you want makemessages to update with the -l option.

comment:9 in reply to:  8 Changed 5 years ago by Rohan Jain

Replying to jezdez:

That's not enough, the string needs to be one and only one string to be translated, not something that consists of two separate strings.

In the commit, I have manually edited only one language translation file (de) for use in the i18n regressions tests. Rest are all autogenerated.

Right, please revert the changes done to the other po files. You can prevent updating all those files by passing the specific language you want makemessages to update with the -l option.

Updated the code according to your suggestions. Now no changes in existing translations would be needed, just additions maybe for largeer number strings.

comment:10 Changed 5 years ago by Jannis Leidel

Resolution: fixed
Status: newclosed

In [16897]:

(The changeset message doesn't reference this ticket)

comment:11 Changed 5 years ago by Julien Phalip

Resolution: fixed
Status: closedreopened

The humanize tests are failing, most likely because intword_converters.items() returns the items in the wrong order. The attached patch makes used of SortedDict to enforce a deterministic order.

Changed 5 years ago by Julien Phalip

Attachment: 16878.diff added

comment:12 Changed 5 years ago by Jannis Leidel

Resolution: worksforme
Status: reopenedclosed

I don't see how this patch would fix the error we see on the buildbout: http://ci.djangoproject.com/job/Django/296/database=sqlite3,python=python2.6/console

I think this is related to stale files on the buildbot due to moving the test directory, not the order. At least I can't reproduce a situation in which the order (as tried to be fixed by @julienphalip's patch) would be of importance to the intword filter.

comment:13 Changed 5 years ago by Jannis Leidel

Resolution: worksforme
Status: closedreopened

So yeah, after having tested this on Python 2.5 I see the error after all:

Creating test database for alias 'default'...
Creating test database for alias 'other'...
..F.F.....
======================================================================
FAIL: test_i18n_intword (django.contrib.humanize.tests.HumanizeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 82, in test_i18n_intword
    self.humanize_tester(test_list, result_list, 'intword')
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 22, in humanize_tester
    msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
AssertionError: u'0,0 decillion' != u'1,0 Million' : intword test failed, produced '0,0 decillion', should've produced '1,0 Million'

======================================================================
FAIL: test_intword (django.contrib.humanize.tests.HumanizeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 64, in test_intword
    self.humanize_tester(test_list, result_list, 'intword')
  File "/Users/jezdez/Code/git/django/django/contrib/humanize/tests.py", line 22, in humanize_tester
    msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
AssertionError: u'0.0 decillion' != u'1.0 million' : intword test failed, produced '0.0 decillion', should've produced '1.0 million'

----------------------------------------------------------------------
Ran 10 tests in 0.043s

FAILED (failures=2)

Re-re-opening :)

comment:14 Changed 5 years ago by Julien Phalip

Resolution: fixed
Status: reopenedclosed

Fixed in r16900, thanks! :)

Note: See TracTickets for help on using tickets.
Back to Top