Opened 9 years ago

Last modified 3 months ago

#23321 new Cleanup/optimization

Remove .mo files from the Django Git repository

Reported by: Claude Paroz Owned by: nobody
Component: Internationalization Version: dev
Severity: Normal Keywords:
Cc: slav0nic@…, Maciej Olko, Calidae Developers, Ningú Triage Stage: Someday/Maybe
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Binary/generated files are not good candidates for inclusion in a Git repository. They bloat the repository unnecessarily without adding value.
It would be nice to compile those .mo files at package build time.

Change History (12)

comment:1 Changed 9 years ago by Sergey Maranchuk

Cc: slav0nic@… added

comment:2 Changed 9 years ago by Aymeric Augustin

That change would make it a bit more error-prone to work on i18n'd projects with the development version of Django.

I'm not saying we can't remove the .mo files, but we need to think about the consequences. They add some value.

comment:3 Changed 9 years ago by Claude Paroz

I think it would be possible to check for the presence of .mo files in runserver and output an appropriate warning. I understand the convenience of having .mo files in the repo, but I don't think this justifies keeping generated binary files in a VCS.
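
As a rough illustration of the check suggested above (this is not code from any Django branch), a Python sketch could look for .po files whose compiled .mo sibling is missing or older than the source. The function name `find_uncompiled_catalogs` and the modification-time comparison are assumptions made for this example only.

```python
from pathlib import Path


def find_uncompiled_catalogs(locale_root):
    """Return (po_path, reason) pairs for catalogs that need compiling.

    Hypothetical helper: reason is "missing" when no .mo file sits next to
    the .po file, and "stale" when the .mo file is older than the .po file.
    """
    problems = []
    for po_path in sorted(Path(locale_root).rglob("*.po")):
        mo_path = po_path.with_suffix(".mo")
        if not mo_path.exists():
            problems.append((po_path, "missing"))
        elif mo_path.stat().st_mtime < po_path.stat().st_mtime:
            problems.append((po_path, "stale"))
    return problems
```

A startup hook could then print one warning per returned pair instead of failing. Modification times are only a heuristic, since a fresh checkout may reset them.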

comment:4 Changed 9 years ago by Claude Paroz

Here's a branch where I started working on this: https://github.com/claudep/django/tree/23321

comment:6 Changed 9 years ago by Tim Graham

Triage Stage: Accepted → Ready for checkin

Code looks fine to me, but would be good to get an opinion from another person familiar with translations too.

comment:7 Changed 9 years ago by Jannis Leidel

Patch needs improvement: set
Triage Stage: Ready for checkin → Someday/Maybe

I don't think we should go that route, as it would introduce a couple of issues that make things harder both for our users and from a maintenance standpoint:

  • The most pressing issues IMO will show up for users that are using not-yet-released versions of Django, e.g. translators and contributors.
    • there are differences in gettext versions that we would not be able to fix
    • Windows users don't usually have gettext installed
  • The test system would have to compile the po files on every test run to make sure to have a consistent set to base tests on
  • Users on systems with a non-writable file system may have problems with the subprocess call as part of trans_real.py
  • The Django release manager would have to have gettext installed and run an additional command to build the tarball, something that I think is better suited for the translation manager (who has to pull files from Transifex anyways)

I understand that having compiled files in a VCS isn't good, but the proposed plan doesn't convince me to drop the .mo files.

If only we used Babel instead. It is able to compile .po files to .mo files without depending on gettext.
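
To illustrate why a gettext-free compile step is feasible: the .mo format is a small binary table that can be written with the Python standard library alone. The sketch below is not Babel's implementation and does not parse .po files. `build_mo` is a hypothetical helper that serializes an already-parsed {msgid: msgstr} mapping and ignores plurals, contexts and fuzzy flags. Python's own `gettext.GNUTranslations` is used to read the result back.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize a {msgid: msgstr} dict into little-endian GNU .mo bytes.

    Hypothetical helper for illustration; no hash table is written.
    """
    # The empty msgid carries the catalog metadata, including the charset.
    catalog = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    catalog.update(messages)
    # The format requires the original strings to be sorted.
    items = sorted(
        (key.encode("utf-8"), value.encode("utf-8"))
        for key, value in catalog.items()
    )
    count = len(items)
    ids_table_offset = 28  # the header is seven 32-bit integers
    strs_table_offset = ids_table_offset + 8 * count
    data_offset = strs_table_offset + 8 * count

    # Every string is stored NUL-terminated; lengths exclude the NUL.
    ids_blob = b"".join(key + b"\0" for key, _ in items)
    strs_blob = b"".join(value + b"\0" for _, value in items)

    id_entries, str_entries = [], []
    id_pos = data_offset
    str_pos = data_offset + len(ids_blob)
    for key, value in items:
        id_entries.append(struct.pack("<II", len(key), id_pos))
        str_entries.append(struct.pack("<II", len(value), str_pos))
        id_pos += len(key) + 1
        str_pos += len(value) + 1

    header = struct.pack(
        "<7I",
        0x950412DE,         # magic number
        0,                  # format revision
        count,              # number of strings
        ids_table_offset,   # offset of the original-strings table
        strs_table_offset,  # offset of the translated-strings table
        0,                  # hash table size (none)
        0,                  # hash table offset (unused)
    )
    return header + b"".join(id_entries) + b"".join(str_entries) + ids_blob + strs_blob


# Read the bytes back with the standard library to check the round trip.
translations = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Bonjour"})))
print(translations.gettext("Hello"))
```

A real build step would still need a .po parser and plural handling, which is the part Babel or gettext provides.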

comment:8 Changed 4 years ago by Carlton Gibson

On the repo size issue, on some occasions I've taken to cloning using the depth option, which restricts the fetched history. For example, --depth=1000 is more than enough for a lot of cases. Perhaps we could add that as an example to the docs, so that folks don't need to clone the whole history. (?)

Last edited 4 years ago by Carlton Gibson

comment:9 Changed 3 months ago by Maciej Olko

Cc: Maciej Olko added

comment:10 Changed 3 months ago by Calidae Developers

Cc: Calidae Developers added

comment:11 Changed 3 months ago by Ningú

Cc: Ningú added

comment:12 Changed 3 months ago by Ningú

If one reasons about this as if we were speaking about a C extension, I think all those points made by Jannis Leidel do fall pretty short:

  • Yes, people working on a repository checkout instead of a public release will need the compilation toolchain. Yes, there will be sharp edges on certain platforms because of this, and that is out of reach for the Django project.
  • Yes, the test system ought to compile those binaries each time. If that ever had a significant impact on CI times, just engineer a cache for both those files and the toolchain setup.
  • Yes, you need a writable filesystem to develop on a project. Whoever ships a Django checkout on a read-only FS should be responsible for compiling *.mo files before turning the FS read-only.
  • Yes, the release manager also needs the compilation toolchain. If that is cumbersome, just produce the packages on a CI pipeline; the release manager can then download, verify, sign and publish those if your workflow requires that. Otherwise just publish them from the CI as well!

Replacing gettext with Babel might alleviate some of this, but IMHO that should exclusively be a build-time dependency and never a run-time dependency, just like gettext. A lot has been going on in the packaging scene since Claude's PR, but now I'd depict this as a build-system requirement:
```toml
[build-system]
requires = ['setuptools>=40.8.0', 'babel>=2']
build-backend = 'setuptools.build_meta'
```
and then tell the build backend (not necessarily setuptools) to produce *.mo files when building a wheel distribution. Either gettext or Babel would be a requirement to build from either a Django checkout or a source distribution. This would be a better fit for PEP 517 and would require less documentation than reminding people to run compilemessages before installing or packaging Django, while tox could be responsible for producing *.mo files in the CI. But maybe this is an over-engineered idea.

I have a sense this is not being addressed because of certain FUD, while real, recurring "mo and po files out of sync" issues across the whole Django ecosystem are being overlooked: https://code.djangoproject.com/ticket/8732 . Yes, contributors will have a new build-time dependency pushed on them if they expect their non-wheel installs to be localized. As it should always have been! Translators should be familiar with gettext anyway, irrespective of their platform.

Last edited 3 months ago by Ningú