Opened 10 years ago

Last modified 3 months ago

#23321 new Cleanup/optimization

Remove .mo files from the Django Git repository

Reported by: Claude Paroz Owned by: nobody
Component: Internationalization Version: dev
Severity: Normal Keywords:
Cc: slav0nic@…, Maciej Olko, Calidae Developers, Ningú Triage Stage: Someday/Maybe
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Binary/generated files are not good candidates for inclusion in a Git repository. They bloat the repository unnecessarily without adding value.
It would be nice to compile those .mo files at package build time.

Change History (15)

comment:1 by Sergey Maranchuk, 10 years ago

Cc: slav0nic@… added

comment:2 by Aymeric Augustin, 10 years ago

That change would make it a bit more error-prone to work on i18n'd projects with the development version of Django.

I'm not saying we can't remove the .mo files, but we need to think about the consequences. They add some value.

comment:3 by Claude Paroz, 10 years ago

I think it would be possible to check for the presence of .mo files in runserver and output an appropriate warning. I understand the convenience of having .mo files in the repo, but I don't think this justifies having generated binary files in a VCS.
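A minimal sketch of such a presence check, assuming catalogs follow the usual layout where each .po file has a sibling .mo file with the same base name. The function names are hypothetical and this is not taken from the linked branch; the hook into runserver is left out.

```python
import warnings
from pathlib import Path


def find_uncompiled_catalogs(locale_root):
    """Return the .po files under locale_root that have no sibling .mo file."""
    root = Path(locale_root)
    return sorted(
        po for po in root.rglob("*.po") if not po.with_suffix(".mo").exists()
    )


def warn_if_uncompiled(locale_root):
    """Emit a single warning if any catalog still needs compiling."""
    missing = find_uncompiled_catalogs(locale_root)
    if missing:
        warnings.warn(
            f"{len(missing)} translation catalog(s) under {locale_root} are "
            "not compiled; run compilemessages to enable translations.",
            RuntimeWarning,
            stacklevel=2,
        )
    return missing
```

The check only looks at file presence, so it stays cheap enough to run at startup; it does not detect .mo files that are out of date.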

comment:4 by Claude Paroz, 10 years ago

Here's a branch where I started working on this: https://github.com/claudep/django/tree/23321

comment:6 by Tim Graham, 9 years ago

Triage Stage: Accepted → Ready for checkin

Code looks fine to me, but it would be good to get an opinion from another person familiar with translations too.

comment:7 by Jannis Leidel, 9 years ago

Patch needs improvement: set
Triage Stage: Ready for checkin → Someday/Maybe

I don't think we should go that route as it would introduce a couple of issues that make it harder for our users and from a maintenance standpoint:

  • The most pressing issues IMO will show up for users that are using not-yet-released versions of Django, e.g. translators and contributors.
    • there are differences in gettext versions that we would not be able to fix
    • Windows users don't usually have gettext installed
  • The test system would have to compile the po files on every test run to make sure to have a consistent set to base tests on
  • Users on systems with a non-writable file system may have problems with the subprocess call as part of trans_real.py
  • The Django release manager would have to have gettext installed and run an additional command to build the tarball, something that I think is better suited for the translation manager (who has to pull files from Transifex anyways)

I understand that having compiled files in a VCS isn't good, but the proposed plan doesn't convince me to drop the .mo files.

If only we used Babel instead... it can compile .po files to .mo files without depending on gettext.

comment:8 by Carlton Gibson, 4 years ago

On the repo size issue, I've taken to cloning using the depth option, which restricts the fetched history. e.g. --depth=1000 is more than enough for a lot of cases. Perhaps we could add that as an example to the docs, so that folks don't need to clone the whole history. (?)


comment:9 by Maciej Olko, 8 months ago

Cc: Maciej Olko added

comment:10 by Calidae Developers, 8 months ago

Cc: Calidae Developers added

comment:11 by Ningú, 8 months ago

Cc: Ningú added

comment:12 by Ningú, 8 months ago

If one reasons about this as if we were speaking about a C extension, I think all those points made by Jannis Leidel do fall pretty short:

  • Yes, people working on a repository checkout instead of a public release will need the compilation toolchain. Yes, there will be sharp edges on certain platforms because of this and that is out of reach for the Django project.
  • Yes, the test system ought to compile those binaries each time. If that ever had a significant impact on CI times, just engineer a cache for both those files and the toolchain setup.
  • Yes, you need a writable filesystem to develop on a project. Whoever ships a Django checkout on a read-only FS should be responsible for compiling *.mo files before turning the FS read-only.
  • Yes, the release manager also needs the compilation toolchain. If that is cumbersome, just produce the packages on a CI pipeline; the release manager can then download, verify, sign and publish those if your workflow requires that. Otherwise just publish them from the CI as well!

Replacing gettext with Babel might alleviate some of this, but IMHO it should be exclusively a build-time dependency and never a run-time dependency, just like gettext. A lot has been going on in the packaging scene since Claude's PR, and now I'd express this as a build-system requirement
```
[build-system]
requires = ['setuptools>=40.8.0', 'babel>=2']
build-backend = 'setuptools.build_meta'
```
and then tell the build backend (not necessarily setuptools) to produce *.mo files when building a wheel distribution. Either gettext or Babel would be required to build from a Django checkout or a source distribution. This would be a better fit for PEP 517 and would require less documentation than reminding people to run compilemessages before installing or packaging Django, while tox could be responsible for producing *.mo files in CI. But maybe this is an over-engineered idea.
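As a rough sketch of what such a build step could run, assuming GNU gettext's msgfmt is on the PATH: the helper names below are hypothetical, and the wiring into a particular build backend is deliberately left out. The `--check-format` and `-o` options are standard msgfmt options.

```python
import shutil
import subprocess
from pathlib import Path


def msgfmt_command(po_path):
    """Build the msgfmt invocation that writes a .mo file next to po_path."""
    po_path = Path(po_path)
    mo_path = po_path.with_suffix(".mo")
    # --check-format validates format strings; -o names the output file.
    return ["msgfmt", "--check-format", "-o", str(mo_path), str(po_path)]


def compile_catalogs(locale_root):
    """Compile every .po file under locale_root; return the .mo paths written."""
    if shutil.which("msgfmt") is None:
        raise RuntimeError("msgfmt (GNU gettext) is required at build time")
    written = []
    for po_path in sorted(Path(locale_root).rglob("*.po")):
        # check=True aborts the build on the first catalog that fails.
        subprocess.run(msgfmt_command(po_path), check=True)
        written.append(po_path.with_suffix(".mo"))
    return written
```

Failing loudly when msgfmt is missing keeps the dependency a build-time concern: an installed wheel would already contain the compiled catalogs and never reach this code.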

I have a sense this is not being addressed because of a certain amount of FUD, while overlooking the real, recurring "mo and po files out of sync" issues across the whole Django ecosystem, e.g. https://code.djangoproject.com/ticket/8732 . Yes, contributors will have a new build-time dependency pushed on them if they expect their non-wheel installs to be localized. As it always should have been! Translators should be familiar with gettext anyway, irrespective of their platform.


comment:13 by Natalia Bidart, 4 months ago

Hello everyone!

As a Django release manager, and as someone who went through the super painful process of incorporating translations from Transifex into Django 5.0, I'd like to express a big +1 for removing the .mo files from the Django source. I agree with Ningú's counterpoints to Jannis Leidel's comment:7, and I'd also like to add:

  • Automatically compiling .po files when running tests would not add noticeable overhead, since we could add a flag to reuse the existing .mo files if available (like keep-db, perhaps on by default)
  • Manually compiling .po files when developing Django and/or using a from-repo version of Django *and* working on i18n-related issues feels natural to me (as long as we properly document this)
  • I haven't used Babel before, but even assuming it is a superior library, I think that migration should be treated and pushed forward as an orthogonal issue and should not block improvements to our current (sometimes painful) translations machinery.
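One way the "reuse existing files" flag from the first point could behave, as a sketch only: the `keep_existing` parameter and the function name are hypothetical, and comparing modification times is an added assumption (the comment only asks that existing files be reused when available).

```python
from pathlib import Path


def catalogs_to_compile(locale_root, keep_existing=True):
    """List the .po files that need compiling before a test run.

    With keep_existing=True, a .po file is skipped when its sibling .mo file
    exists and is at least as new. With keep_existing=False, every .po file
    is returned so that all catalogs are rebuilt.
    """
    pending = []
    for po in sorted(Path(locale_root).rglob("*.po")):
        mo = po.with_suffix(".mo")
        up_to_date = mo.exists() and mo.stat().st_mtime >= po.stat().st_mtime
        if not (keep_existing and up_to_date):
            pending.append(po)
    return pending
```

With the flag on, a second test run over an unchanged checkout returns an empty list, so the compile step costs only a directory walk.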

Claude, question: in your PR, why are you favoring using msgfmt directly instead of using the Django compilemessages command (perhaps its internal compile_messages helper)?

comment:14 by Claude Paroz, 4 months ago

> Claude, question: in your PR, why are you favoring using msgfmt directly instead of using the Django compilemessages command (perhaps its internal compile_messages helper)?

Probably because I judged at the time that we didn't need all the bells and whistles of compilemessages. This could be re-tested.

comment:15 by Ningú, 3 months ago (in reply to: 13)

Replying to Natalia Bidart:

> Claude, question: in your PR, why are you favoring using msgfmt directly instead of using the Django compilemessages command (perhaps its internal compile_messages helper)?

At the time I experimented with my build-system requirement idea on a third-party package (but didn't push it forward): https://github.com/farridav/django-jazzmin/pull/526/commits/d0ff328a46b21410c491a7daf6d92c0c44c88543

While using Django's compilemessages was convenient, it already felt weird to use Django as a build dependency, and porting this approach to Django itself looked like a red flag (I couldn't build Django because that would require Django to have been built already?). Claude's msgfmt approach felt unfamiliar but the right way nonetheless.

I also agree with all the points made by Natalia Bidart, especially Babel adoption being an orthogonal issue.
