Opened 10 years ago
Last modified 10 months ago
#23321 new Cleanup/optimization
Remove .mo files from the Django Git repository
Reported by: | Claude Paroz | Owned by: | nobody |
---|---|---|---|
Component: | Internationalization | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | slav0nic@…, Maciej Olko, Calidae Developers, Ningú | Triage Stage: | Someday/Maybe |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Binary/generated files are not good candidates for inclusion in a Git repository. They bloat the repository without adding value.
It would be nice to compile those .mo files at package build time.
Change History (15)
comment:1 by , 10 years ago
Cc: | added |
---|
comment:2 by , 10 years ago
comment:3 by , 10 years ago
I think it would be possible to check for the presence of .mo files in `runserver` and output an appropriate warning. I understand the convenience of having .mo files in the repo, but I don't think this justifies having generated binary files in a VCS.
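A minimal sketch of such a check, assuming the usual `locale/<lang>/LC_MESSAGES/*.po` layout. The function names `find_uncompiled_catalogs` and `warn_if_uncompiled` are hypothetical and not part of Django; how this would be hooked into `runserver` is left open.

```python
import warnings
from pathlib import Path


def find_uncompiled_catalogs(locale_root):
    """Return the .po files under locale_root that have no sibling .mo file."""
    root = Path(locale_root)
    return sorted(
        po for po in root.rglob("*.po") if not po.with_suffix(".mo").exists()
    )


def warn_if_uncompiled(locale_root):
    """Emit a single warning listing uncompiled catalogs and return them."""
    missing = find_uncompiled_catalogs(locale_root)
    if missing:
        names = ", ".join(str(path) for path in missing)
        warnings.warn(
            f"Translations are not compiled (run compilemessages): {names}",
            RuntimeWarning,
            stacklevel=2,
        )
    return missing
```

A startup hook could call `warn_if_uncompiled()` once per locale directory, so the cost is a directory walk rather than a compilation.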
comment:4 by , 10 years ago
Here's a branch where I started working on this: https://github.com/claudep/django/tree/23321
comment:6 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Code looks fine to me, but would be good to get an opinion from another person familiar with translations too.
comment:7 by , 10 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Someday/Maybe |
I don't think we should go that route as it would introduce a couple of issues that make it harder for our users and from a maintenance standpoint:
- The most pressing issues IMO will show up for users that are using not-yet-released versions of Django, e.g. translators and contributors.
- there are differences in gettext versions that we would not be able to fix
- Windows users don't usually have gettext installed
- The test system would have to compile the po files on every test run to make sure to have a consistent set to base tests on
- Users on system with a non-writable file system may have problems with the subprocess call as part of trans_real.py
- The Django release manager would have to have gettext installed and run an additional command to build the tarball, something that I think is better suited for the translation manager (who has to pull files from Transifex anyways)
I understand that having compiled files in a VCS isn't good, but the proposed plan doesn't convince me to drop the mo files.
If only we'd use Babel instead... it does have the ability to compile po files to mo files without a dependency on gettext.
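To illustrate why compiling without the gettext binaries is feasible at all, here is a standard-library-only sketch that writes the binary .mo layout by hand. It is not Babel's API and not what Django does; `build_mo` is a hypothetical helper, and it handles only plain singular `msgid`/`msgstr` pairs (no plurals, no contexts, no .po parsing).

```python
import gettext
import io
import struct


def build_mo(messages):
    """Return .mo file bytes for a {msgid: msgstr} mapping."""
    # The empty msgid carries the catalog header; the charset declared here
    # tells readers how to decode every other entry.
    catalog = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    catalog.update(messages)

    # Entries are sorted by msgid, so the header entry comes first.
    keys = sorted(catalog, key=lambda key: key.encode("utf-8"))
    ids = [key.encode("utf-8") for key in keys]
    strs = [catalog[key].encode("utf-8") for key in keys]

    count = len(keys)
    ids_table_offset = 7 * 4  # the header is seven 32-bit integers
    strs_table_offset = ids_table_offset + 8 * count
    data_offset = strs_table_offset + 8 * count

    # Each table entry is (length, absolute offset); strings are NUL-terminated.
    entries = []
    blob = b""
    for raw in ids + strs:
        entries.append((len(raw), data_offset + len(blob)))
        blob += raw + b"\0"

    out = struct.pack(
        "<7I",
        0x950412DE,  # little-endian magic number
        0,  # format revision
        count,
        ids_table_offset,
        strs_table_offset,
        0,  # hash table size (no hash table)
        data_offset,  # hash table offset, unused when the size is 0
    )
    for length, offset in entries:
        out += struct.pack("<2I", length, offset)
    return out + blob


# Round-trip through Python's own reader to check the result.
translations = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Bonjour"})))
print(translations.gettext("Hello"))
```

A real replacement would still need a .po parser and plural handling, which is the part a library such as Babel provides.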
comment:8 by , 5 years ago
On the repo size issue, on some occasions I've taken to cloning using the `depth` option, which restricts the fetched history. e.g. `--depth=1000` is more than enough for a lot of cases. Perhaps we could add that as an example to the docs, so that folks don't need to clone the whole history. (?)
comment:9 by , 15 months ago
Cc: | added |
---|
comment:10 by , 15 months ago
Cc: | added |
---|
comment:11 by , 15 months ago
Cc: | added |
---|
comment:12 by , 15 months ago
If one reasons about this as if we were speaking about a C extension, I think all those points made by Jannis Leidel do fall pretty short:
- Yes, people working on a repository checkout instead of a public release will need the compilation toolchain. Yes, there will be sharp edges on certain platforms because of this and that is out of reach for the Django project.
- Yes, the test system ought to compile those binaries each time. If that ever had a significant impact on CI times, just engineer a cache for both those files and the toolchain setup.
- Yes, you need a writable filesystem to develop on a project. Whoever ships a Django checkout on a read-only FS should be responsible for compiling *.mo files before turning the FS read-only.
- Yes, the release manager also needs the compilation toolchain. If that is cumbersome, just produce the packages on a CI pipeline; the release manager can then download, verify, sign and publish those if your workflow requires that. Otherwise just publish them from the CI as well!
Replacing gettext with babel might alleviate some of this but IMHO that should exclusively be a build-time dependency and never a run-time dependency, just as gettext. A lot has been going on in the packaging scene since Claude's PR, but now I'd depict this as a build-system requirement:
```
[build-system]
requires = ['setuptools>=40.8.0', 'babel>=2']
build-backend = 'setuptools.build_meta'
```
and then tell the build backend (not necessarily setuptools) to produce *.mo files when building a wheel distribution. Either gettext or babel would be a requirement to build either a Django checkout or a source distribution. This would be a better fit for PEP-517 and require less documentation than reminding people to compilemessages before installing or packaging Django while tox could be responsible for producing *.mo files in the CI. But maybe this is an over-engineered idea.
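The compile step that a build backend would run could look roughly like the sketch below. `compile_catalogs` is a hypothetical helper, and it assumes the `msgfmt` executable is on `PATH`. Wiring it into a specific backend is omitted because that depends on the backend chosen.

```python
import subprocess
from pathlib import Path


def compile_catalogs(locale_root, runner=subprocess.run):
    """Compile every .po file under locale_root and return the .mo paths.

    `runner` is injectable so the function can be exercised without gettext.
    """
    compiled = []
    for po in sorted(Path(locale_root).rglob("*.po")):
        mo = po.with_suffix(".mo")
        # "msgfmt -o OUTPUT INPUT" writes the binary catalog next to its source.
        runner(["msgfmt", "-o", str(mo), str(po)], check=True)
        compiled.append(mo)
    return compiled
```

Because the step only runs while building a wheel, neither gettext nor Babel would become a run-time dependency of the installed package.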
I have a sense this is not addressed because of certain FUD while obviating real recurring "mo and po files out of sync" issues in the whole django ecosystem https://code.djangoproject.com/ticket/8732 . Yes, contributors will be pushed a new build-time dependency if they expect their non-wheel installs to be localized. As it should have always been! Translators should be familiar with gettext anyway, irrespective of their platform.
follow-up: 15 comment:13 by , 10 months ago
Hello everyone!
As a Django release manager, and as someone who went thru the super painful process of incorporating translations from Transifex into Django 5.0, I'd like to express a big +1 to remove the .mo files from the Django source. I agree with Ningú's counterpoints to Jannis Leidel's comment:7, and also I'd like to add:
- Automatic compiling .po files when running tests would not add noticeable overhead since we could add a flag to use the existing ones if available (like `keep-db`, perhaps `on` by default)
- Manually compiling .po files when developing Django and/or a from-repo version of Django *and* working on i18n related issues, feels natural to me (as long as we properly document this)
- I haven't used Babel before but even assuming that this is a superior lib, I think that that migration should be treated and pushed forward as an orthogonal issue and not block improvements to our current (sometimes painful) translations machinery.
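The "reuse existing files" flag from the first point could be sketched as a staleness check on modification times. `needs_compiling`, `catalogs_to_compile`, and the `keep_existing` parameter are hypothetical names, and this is not how the Django test runner currently behaves.

```python
from pathlib import Path


def needs_compiling(po_path):
    """Return True if the sibling .mo file is missing or older than the .po."""
    po = Path(po_path)
    mo = po.with_suffix(".mo")
    return not mo.exists() or mo.stat().st_mtime < po.stat().st_mtime


def catalogs_to_compile(locale_root, keep_existing=True):
    """List the .po files a test run would have to compile.

    With keep_existing=True (the assumed default), up-to-date .mo files
    are reused and only missing or stale catalogs are returned.
    """
    catalogs = sorted(Path(locale_root).rglob("*.po"))
    if not keep_existing:
        return catalogs
    return [po for po in catalogs if needs_compiling(po)]
```

With this in place, a second test run on an unchanged checkout would compile nothing.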
Claude, question: in your PR, why are you favoring using `msgfmt` directly instead of using the Django `compilemessages` command (perhaps its internal `compile_messages` helper)?
comment:14 by , 10 months ago
Claude, question: in your PR, why are you favoring using msgfmt directly instead of using the Django compilemessages command (perhaps its internal compile_messages helper)?
Probably because I estimated at the time we didn't need all the bells and whistles from `compilemessages`. May be re-tested.
comment:15 by , 10 months ago
Replying to Natalia Bidart:
Claude, question: in your PR, why are you favoring using `msgfmt` directly instead of using the Django `compilemessages` command (perhaps its internal `compile_messages` helper)?
At the time I experimented with my build-system requirement idea on a third party package (but didn't push it forward): https://github.com/farridav/django-jazzmin/pull/526/commits/d0ff328a46b21410c491a7daf6d92c0c44c88543
While using Django's compilemessages was convenient, it already felt weird to use django as a build dependency because I felt porting this approach to Django itself would be sort of a red flag (I couldn't build django because I'd require django having been built?). Claude's msgfmt felt unfamiliar but the right way nonetheless.
Also agree on all points made by Natalia Bidart, especially Babel adoption being an orthogonal issue.
That change would make it a bit more error-prone to work on i18n'd projects with the development version of Django.
I'm not saying we can't remove the .mo files, but we need to think about the consequences. They add some value.