#30190 closed New feature (fixed)
json lines (jsonl) serializer
Reported by: | aliva | Owned by: | nobody |
---|---|---|---|
Component: | Core (Serialization) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Francisco Javier | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have create a patch to add jsonl serializer/deserializer to json,
Issue with json serializer is when using dumpdata/loaddata json deserializer tries to load the whole file which causes MemoryError, but with jsonl loaddata will read file line by line which so you won't see memory overflow
note that I will add docs when I get the green light for this patch
Attachments (2)
Change History (16)
by , 6 years ago
Attachment: | patch.diff added |
---|
comment:1 by , 6 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Hi. Thanks for the report.
This seems like a reasonable idea for a serializer. It also though seems perfectly suited to be wrapped up as a third-party application. In general we tend to prefer those, rather than increasing the surface area of the code in the core framework. As such, we'd need to see if there was a consensus on mailing list to add the code. I wouldn't be surprised if a third-party app didn't already exist, so we should look into that too.
You'd need to add test cases as well as docs. I'd suggest putting that together as a third-party app on GitHub say, and then seeing if there was a willingness to move that into core, if that's what you want to do at that point.
comment:2 by , 6 years ago
See #22259 where you'll find a link to an existing third-party library.
Personally I would be open to integrate this into Django, as several tickets were open in the past wrt memory issues with big files. However, this should indeed be discussed on the django-developers mailing list first.
comment:3 by , 6 years ago
I've been using the library linked in the other ticket with good success. I'd be in favour of adding something to core.
comment:5 by , 6 years ago
Hi
I have started writing tests for jsonl, (I simply copied test_json
to test_jsonl
) and trying to fix issues. there is a problem I'm facing, when using jsonl output lines can be very long, so what is the preferred way to pass flake for test data?
- should I add
NOQA: E501
for these lines? - use this style
a = """{"key": value",""" \ """"key2": value"}"""
- use this style
a = """{ "key": "value" "ley2: "value2", }.replace("\n", "")
also I have added patch 2 which is a work in progress and some tests fail
as a sample here is a sample test data from test_json.py
mapping_ordering_str = """[ { "model": "serializers.article", "pk": %(article_pk)s, "fields": { "author": %(author_pk)s, "headline": "Poker has no place on ESPN", "pub_date": "2006-06-16T11:00:00", "categories": [ %(first_category_pk)s, %(second_category_pk)s ], "meta_data": [] } } ]
It should be like this for jsonl
mapping_ordering_str = """{"model": "serializers.article", "pk": %(article_pk)s, fields": {"author": %(author_pk)s, "headline": "Poker has no place on ESPN", pub_date": "2006-06-16T11:00:00", categories": [%(first_category_pk)s, %(second_category_pk)s], "meta_data": []}}\n"""
comment:6 by , 5 years ago
The Django Core MUST solve his own things by himself, and not relaying to a third party solution.
If dumpdata/loaddata is broken for very big datasets, the issue must be addresed and solved as soon as posible.
In my particular case, I have a Sqlite3 DataBase of 800MB, and want to migrate to PostgreSQL. The Dumpdata/Loaddata combo is the only straight way to do it.
If the django-mljson (or the patch submited in the ticket) solves the MemoryError of Loaddata in big datasets, Django-Core must integrate the mljson (or the patch submited) as the default serializer/deserializer for dumpdata/loaddata process.
Actually, my issue is solved by using django-mljson, but lost 2 days figuring 'what the heck' was going wrong.
comment:7 by , 5 years ago
Cc: | added |
---|---|
Needs tests: | set |
Resolution: | needsinfo |
Status: | closed → new |
Type: | New feature → Bug |
comment:8 by , 5 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Type: | Bug → New feature |
Please don't reopen closed ticket. This ticket should be discussed in the DevelopersMailingList before reopening. We can reconsider it when we reach a strong consensus on the mailing list.
Also please try to be more polite, your comment sounds like a demand. This is not a good way to be heard.
comment:9 by , 5 years ago
Hi!
I have talked about it on mailing list and here is my PR
comment:10 by , 5 years ago
Needs tests: | unset |
---|---|
Resolution: | needsinfo |
Status: | closed → new |
comment:11 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|
OK, thanks for updating the ticket here.
comment:12 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
patch 1