Opened 17 years ago
Closed 10 years ago
#5253 closed New feature (wontfix)
serializer for csv format for loaddata and dumpdata
Reported by: | Owned by: Etienne Robillard
Component: Core (Serialization) | Version: dev
Severity: Normal | Keywords: csv, foreign keys, feature
Cc: erob@…, tim.babych@…, cmawebsite@… | Triage Stage: Accepted
Has patch: yes | Needs documentation: yes
Needs tests: yes | Patch needs improvement: yes
Easy pickings: no | UI/UX: no
Description
Currently supported serialization formats are too verbose to be easily edited by hand. A CSV format would support a more fluent development cycle: click together data, dump data, edit data, load data.
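To make the verbosity claim concrete, here is a rough comparison of one record in Django's JSON fixture format against the equivalent CSV row (the model and field names are invented for illustration):

```python
import csv
import io
import json

# One record in Django's JSON fixture format (model/field names invented).
record = {"model": "news.reporter", "pk": 1,
          "fields": {"name": "John Doe", "email": "john@example.com"}}
as_json = json.dumps([record], indent=2)

# The same record as a CSV row; the header is written once per table,
# so its cost is amortised over every row that follows.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["pk", "name", "email"])
writer.writerow([1, "John Doe", "john@example.com"])
as_csv = buf.getvalue()

print(len(as_csv) < len(as_json))  # the CSV form is markedly shorter
```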
Attachments (6)
Change History (31)
by , 17 years ago
Attachment: csvserializer.diff added
comment:1 by , 17 years ago
Needs documentation: set
Needs tests: set
Patch needs improvement: set
Triage Stage: Unreviewed → Accepted
A CSV serializer is an interesting proposal, and certainly one I'm amenable to. If you're serious about this, the patch needs to be integrated with Django itself - i.e., a patch that is complete and requires no modifications to apply against the current trunk source. This will also integrate the serializer with the serialization test suite, providing a much more comprehensive set of tests than your current test suite (all registered serializers will be tested by the regressiontest/serializers_regress suite, for a spanning set of data types and foreign key data). Documentation updates are also required.
On top of this - one question, and one comment.
A question: is the 'model identification' syntax (--- article: id,owner,text) in any way standardized? I can't say I've ever seen it before.
A comment: extra configuration options (CSV_SERIALIZER_OPTIONS) are probably a non-starter, unless you can make a _very_ strong case for them. Passing options to individual serialization instances would be vastly preferable.
follow-up: 3 comment:2 by , 17 years ago
Thanks for your feedback. I'm serious enough to integrate it with the current source/tests.
And an answer and a comment,
I haven't seen the syntax elsewhere, I haven't even seen putting different table data into a single csv file. I just needed a header that stands out.
I can't make a strong case for the configuration options, I'm probably missing the extra command line arguments that get passed to loaddata/dumpdata.
follow-up: 4 comment:3 by , 17 years ago
Replying to Adam Schmideg:
I haven't seen the syntax elsewhere, I haven't even seen putting different table data into a single csv file. I just needed a header that stands out.
Ok. In this case, you will need to make a case for why your header is the right one. Other serializers (YAML, XML, JSON) are unambiguous, and use the syntax as-published. If you are going to invent a syntax, you will need to convince us that your chosen syntax is the right one (i.e., it is a common 'comment' format, or it matches the headers produced by Excel, or some such thing).
I can't make a strong case for the configuration options, I'm probably missing the extra command line arguments that get passed to loaddata/dumpdata.
I think you missed my point. Ideally, you want to avoid configuration options at all. Some are unavoidable (such as the unicode handling options for the JSON backend), and some are common (such as indent). Ideally, you only want options that are 'common'; a small number of other options may be acceptable, but you would need to make a good case for them.
To me, having an option like format=Excel suggests that at a user level, you actually have two serializers. They may share a lot of implementation, but the public interface should be separate.
I haven't dug in detail to work out why 'nice_foreign_keys' exists, but by the looks of it, it isn't required functionality at all. Foreign keys should be integers (ignoring non-numerical keys for the moment) - not strings containing some magic syntax.
follow-up: 5 comment:4 by , 17 years ago
Replying to russellm:
I haven't seen the syntax elsewhere, I haven't even seen putting different table data into a single csv file. I just needed a header that stands out.
Ok. In this case, you will need to make a case for why your header is the right one. Other serializers (YAML, XML, JSON) are unambiguous, and use the syntax as-published. If you are going to invent a syntax, you will need to convince us that your chosen syntax is the right one (i.e., it is a common 'comment' format, or it matches the headers produced by Excel, or some such thing).
After some googling I still couldn't find a reference to putting multiple table data into a single csv file, so I'm going to invent a syntax for it, keeping in mind that it should
- be handled by spreadsheet applications
- be friendly to processing by grep, sed, etc.
- look human.
I see two locations for table metadata:
- right before the corresponding table data
- at the beginning of the csv file
I would go with the 1st option, to be able to split the file easily into separate csv files for each table. My suggestion is:
news_reporter:id,name
1,John Doe
2,Mary Poppins

news_article:id,reporter_id,text
1,1,Being And Time
2,1,Life After Life
Note the empty line between the tables. The format has a number of advantages:
- headers behave almost like normal csv headers with the exception that the first column will be called <tablename>:id
- table boundaries can be spotted visually
- the table name is separated from the first column by a colon
- which is a usual Unix separator
- it doesn't conflict with other separators in this context, it can't be part of the table name
- looks human to me
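A minimal parser for the layout proposed above, sketched to show the header rule in action (the function name and sample data are hypothetical, not taken from the attached patch):

```python
import csv
import io

def parse_multi_table_csv(text):
    """Parse the proposed format: tables separated by blank lines,
    each starting with a 'tablename:pkcolumn,field,...' header row.
    (A sketch only; the real patch on this ticket may differ.)"""
    tables = {}
    for block in text.strip().split("\n\n"):
        rows = list(csv.reader(io.StringIO(block)))
        header = rows[0]
        table, first_col = header[0].split(":", 1)   # e.g. "news_reporter:id"
        columns = [first_col] + header[1:]
        tables[table] = [dict(zip(columns, r)) for r in rows[1:]]
    return tables

sample = """news_reporter:id,name
1,John Doe
2,Mary Poppins

news_article:id,reporter_id,text
1,1,Being And Time
2,1,Life After Life
"""
data = parse_multi_table_csv(sample)
print(data["news_article"][0]["text"])  # -> Being And Time
```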
To me, having an option like format=Excel suggests that at a user level, you actually have two serializers. They may share a lot of implementation, but the public interface should be separate.
I'm going to drop that option
I haven't dug in detail to work out why 'nice_foreign_keys' exists, but by the looks of it, it isn't required functionality at all. Foreign keys should be integers (ignoring non-numerical keys for the moment) - not strings containing some magic syntax.
I found this feature very useful in the following situation. The first version of my initial_data.csv looked something like this
auth.group:id,name,permissions
1,Reporters,"(1,5,8)"
where the last column is a list of foreign keys to contenttypes. When I did a flush to my database, the records of the contenttypes table were created with a different id, changing the permissions of my Reporters group. It might be due to the fact that I added other installed apps, or that the content types are created in a random order. Anyway, this problem could be solved using the nice_foreign_keys feature, resulting in this file
auth.group:id,name,permissions
1,Reporters,"({'codename': u'add_article', 'content_type': {'model': 'article', 'app_label': u'news'}}, {'codename': u'add_article', 'content_type': {'model': 'article', 'app_label': u'news'}})"
It may look like a longish example, but I find it a mixture of the good parts of the JSON and CSV worlds. But I agree it's a different serializer rather than a plain CSV one with the nice-and-tricky feature set.
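The nice_foreign_keys idea above amounts to referencing a row by a natural key instead of a numeric pk. A minimal sketch of resolving such a reference at load time (the lookup table here is invented; a real implementation would query the contenttypes model):

```python
import ast

# Stand-in for a database lookup of content types by natural key
# (the real serializer would query django.contrib.contenttypes).
CONTENT_TYPES = {("news", "article"): 7, ("news", "reporter"): 8}

def resolve_fk(field):
    """Turn a nice_foreign_keys-style dict literal into a concrete pk."""
    ref = ast.literal_eval(field)
    return CONTENT_TYPES[(ref["app_label"], ref["model"])]

print(resolve_fk("{'app_label': 'news', 'model': 'article'}"))  # -> 7
```

This is why the approach survives a database flush: the reference is stable even when the numeric ids are reassigned.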
follow-ups: 6 7 comment:5 by , 17 years ago
Replying to Adam Schmideg <adam@schmideg.net>:
After some googling I still couldn't find a reference to putting multiple table data into a single csv file.
Before this gets committed, I'll need to see some consensus amongst others that this is indeed the case. "I couldn't find it" isn't a particularly compelling argument by itself :-)
Once the patch is ready for inclusion, we will need to raise the design for public comment. This should shake out any objections or counterexamples for the design you are proposing.
- headers behave almost like normal csv headers with the exception that the first column will be called <tablename>:id
<tablename>:pk would be better here; both for consistency with other backends, and because the primary key isn't always called id. pk is the reserved name for the primary key in searches, etc, which is why it is used in the JSON and YAML serializers.
I haven't dug in detail to work out why 'nice_foreign_keys' exists, but by the looks of it, it isn't required functionality at all. Foreign keys should be integers (ignoring non-numerical keys for the moment) - not strings containing some magic syntax.
I found this feature very useful in the following situation. The first version of my initial_data.csv looked something like this
1) CSV is a terse syntax by design. This is a very pythonic extension.
2) Your original complaint was that JSON syntax was too verbose - and then you introduce the most verbose and repetitive component of JSON syntax into your CSV syntax? Am I the only one who thinks this is a little strange?
3) You are attempting to solve a problem that isn't part of what a serializer should be doing. If you require contenttypes as part of your fixtures, serialize (and deserialize) the contenttypes table. That way you can be guaranteed the IDs for the contenttypes you use.
comment:6 by , 17 years ago
Before this gets committed, I'll need to see some consensus amongst others that this is indeed the case. "I couldn't find it" isn't a particularly compelling argument by itself :-)
I agree. But it raises the philosophical problem of how to prove that something doesn't exist. Anyway, I included a reference in the module doc to the only similar thing I found.
Once the patch is ready for inclusion, we will need to raise the design for public comment. This should shake out any objections or counterexamples for the design you are proposing.
Sounds scary but the patch is ready now.
<tablename>:pk would be better here; both for consistency with other backends, and because the primary key isn't always called id. pk is the reserved name for the primary key in searches, etc, which is why it is used in the JSON and YAML serializers.
Well, I was thinking about this proposal, but then I went with the actual pk column name, which is id in many cases but not necessarily. I included my arguments in the module doc.
I haven't dug in detail to work out why 'nice_foreign_keys' exists, but by the looks of it, it isn't required functionality at all. Foreign keys should be integers (ignoring non-numerical keys for the moment) - not strings containing some magic syntax.
I found this feature very useful in the following situation. The first version of my initial_data.csv looked something like this
1) CSV is a terse syntax by design. This is a very pythonic extension.
Agree.
2) Your original complaint was that JSON syntax was too verbose - and then you introduce the most verbose and repetitive component of JSON syntax into your CSV syntax? Am I the only one who thinks this is a little strange?
It would be terser than JSON in most cases and as verbose as JSON in some cases. But your reasoning made me think, and I came up with a simpler format. I also realized it wouldn't fit here, so if I have enough time, I'll open a new ticket for the CSO (comma-separated objects) format.
3) You are attempting to solve a problem that isn't part of what a serializer should be doing. If you require contenttypes as part of your fixtures, serialize (and deserialize) the contenttypes table. That way you can be guaranteed the IDs for the contenttypes you use.
In my experience, contenttypes creation is hooked on the post_syncdb signal, which runs before loading initial_data, so when I want to load my data, the contenttypes are already created with messed-up ids.
by , 17 years ago
Attachment: csv_serializer.diff added
An implementation that works with the serializers_regress test, ready for inclusion.
comment:7 by , 17 years ago
Replying to russellm:
The triage process is not quite clear to me. Is there anything else you expect me to do to complete this patch? I think it's documented enough, and it passes the tests. This is my first code contribution ever to open source, so I'm excited about it.
comment:8 by , 17 years ago
The triage process calls for exactly what I said in my last message -
Once the patch is ready for inclusion, we will need to raise the design for public comment. This should shake out any objections or counterexamples for the design you are proposing.
That is the next step - getting feedback on django-dev. Your patch contains several issues (header format, model separation, pk naming, handling of empty strings) that are value judgements rather than literal interpretations of a clear specification. If these issues didn't exist, I would commit this straight away. However, the fact that they do exist means that the patch needs to be sanity checked by the community.
You already know my position on the pk naming - IMHO consistency with other serializers is more important than consistency with the SQLite database dumper. I'm not wild about the empty string proposal, but I acknowledge the issue. You might want to look at how the Oracle backend handles the null problem - Oracle doesn't differentiate between NULL and "", and I seem to recall there was some special handling for that case.
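For background, the NULL-versus-empty-string problem exists because CSV has no native null. One common convention (a sketch only, not what any attached patch does) is a sentinel token, similar to the \N marker MySQL's dump format uses:

```python
import csv
import io

NULL = r"\N"  # MySQL-style sentinel; a convention assumed for this sketch.
              # A real serializer would also need to escape literal "\N" values.

def encode(value):
    return NULL if value is None else value

def decode(field):
    return None if field == NULL else field

buf = io.StringIO()
csv.writer(buf).writerow([encode(v) for v in [None, "", "text"]])
row = next(csv.reader(io.StringIO(buf.getvalue())))
print([decode(f) for f in row])  # -> [None, '', 'text']
```

The round trip preserves the distinction that plain CSV loses: an absent value comes back as None, while an empty string stays an empty string.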
Now, that's my opinion, and I'm willing to be overruled. However, nothing will happen unless you push it. Start a new thread on django-developers, and deal with the responses. The sprint that is currently underway is another means to get your idea approved. Essentially, if you want it, you have to push it.
comment:9 by , 17 years ago
Keywords: feature added
comment:10 by , 16 years ago
Owner: changed
Status: new → assigned
I am adding a version of the current patch that works with current trunk. I need csv serialization for a vendor and would like to see this get into django at some point, so I will accept it, create some tests, address russellm's concerns and bring it up when all those things are ready to see if anyone else likes the idea.
by , 16 years ago
Attachment: csv_serializer_5253.diff added
version of old (incomplete) patch that works with r9232
comment:11 by , 16 years ago
I just hacked a mini CSV serializer together and would like to propose an idea: serialize different models to different CSV files, and then zip them up... but maybe that would be more of a zip serializer? The one I made creates a CSV file for every model and then also stores any files/images in the zip file too.
Works for me.
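The one-CSV-per-model-in-a-zip idea from the previous comment can be sketched with the standard library alone (a sketch of the idea as described; the commenter's implementation was never attached to the ticket):

```python
import csv
import io
import zipfile

def dump_models_to_zip(tables, stream):
    """Write one CSV file per model into a single zip archive,
    roughly as the comment above describes (sketch only)."""
    with zipfile.ZipFile(stream, "w") as zf:
        for name, (columns, rows) in tables.items():
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(columns)
            writer.writerows(rows)
            zf.writestr(f"{name}.csv", buf.getvalue())

archive = io.BytesIO()
dump_models_to_zip(
    {"news_reporter": (["id", "name"], [[1, "John Doe"]])}, archive)
print(zipfile.ZipFile(archive).namelist())  # -> ['news_reporter.csv']
```

Keeping one file per model sidesteps the multi-table-header question entirely, at the cost of the fixture no longer being a single hand-editable text file.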
comment:12 by , 16 years ago
Owner: changed
Status: assigned → new
Backing out of this. Sorry.
comment:13 by , 16 years ago
Cc: added
Owner: changed
I propose to take charge of this ticket until someone else wants to resume work on it.
comment:14 by , 16 years ago
Made an updated patch that seems to work with Django 0.96.3. I also changed the is_string_field function so that it doesn't require importing model definitions, by using slightly improved type introspection.
by , 16 years ago
Attachment: csv_serializer.patch added
Patch based on the previous ones and adding support for 0.96.3
follow-up: 18 comment:15 by , 16 years ago
Django 0.96 is not eligible for new features; this patch should be written against the latest trunk.
by , 16 years ago
Attachment: csv_serializer-0.96.3.diff added
comment:16 by , 16 years ago
Giving up trying to upload the patch... something seems broken with the file uploader.
comment:17 by , 16 years ago
The patch has been uploaded fine; the preview just doesn't always show correctly.
comment:18 by , 16 years ago
Replying to Alex:
Django 0.96 is not eligible for new features; this patch should be written against the latest trunk.
I agree, except that my client requires that it works with Django 0.96.3.
comment:19 by , 15 years ago
Cc: added
comment:20 by , 15 years ago
Cc: added; removed
comment:21 by , 14 years ago
Severity: → Normal
Type: → New feature
comment:24 by , 10 years ago
Cc: added
comment:25 by , 10 years ago
Resolution: → wontfix
Status: new → closed
I'm going to make an executive decision and close this as a wontfix.
This specific ticket is asking for a piecemeal addition to Django's serializers; the real underlying problem is that Django's serializers aren't as flexible as they need to be for real world problems.
Once upon a time, I harboured dreams of writing a fully flexible serialisation framework for Django; luckily, that is no longer required, as Tom Christie has written a very robust and flexible set of serializers for Django REST Framework. If we want this in Django, we should maybe approach Tom to pull that piece of REST Framework into Django; but frankly, I'm not convinced this is worth the code churn (and it's one more thing that the core team doesn't have to worry about, which is a good thing).
A working implementation with minor limitations