Ticket #7052 (new)

Opened 8 months ago

Last modified 1 week ago

auth fixture fails to import when running test server

Reported by: jb0t
Assigned to: nobody
Milestone:
Component: Serialization
Version: SVN
Keywords: auth_permission auth_content fixture import
Cc: eallik@gmail.com, daevaorn@gmail.com, bsn.dev@gmail.com
Triage Stage: Accepted
Has patch: 0
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 0

Description

If you dump the auth data to a fixture, so as to have users in your test database for unit testing, and then run the test server telling it to load that fixture, the load can fail. I turned on logging of every query in PostgreSQL and found that a foreign key violation was causing the failure. Further investigation seemed to indicate that the cause was that I had altered data in auth_permission and django_content_type by hand, which left the primary keys in those tables non-sequential.

When these tables are rebuilt during test database creation, the rows do not end up with the same ids as they had in the database that produced the fixture. I don't know if anything else is involved, but as far as I could tell it was simply a matter of the ids in these two tables not being 100% sequential. I ended up dumping all the data out, then creating the database and repopulating it in a linear fashion (with respect specifically to those two tables). This fixed the problem and the test server happily created and populated those tables.

If you never alter these tables by hand, the ids should always remain sequential and this problem will not occur.

Attachments

Change History

(follow-up: ↓ 5 ) 04/21/08 17:18:15 changed by russellm

  • needs_better_patch changed.
  • stage changed from Unreviewed to Accepted.
  • needs_tests changed.
  • needs_docs changed.

This is a problem I have known about for a while, although I can't recall ever seeing a ticket describing it. Strictly, it's not a bug - the serializers work fine, it's an issue with the consistency between your fixture and the content types that are auto-created in your database. However, it's a pretty common problem and an easy mistake to make, so it deserves a solution.

The best proposal I have thought of is to modify the syntax for foreign key references in serialized data to include a query capability; i.e., instead of just having a JSON fixture containing the literal foreign key content_type: 3, we would allow content_type: { class:'content_type', name:'blah' }, where the inner braces would be interpreted as a query and would resolve to the appropriate foreign key at import time. Similar syntax would be required for other serializers. Any other suggestions are also welcome.
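To make the proposal concrete, here is a minimal sketch of how a query-style reference might be resolved at import time. It is not Django code and not an agreed design: the in-memory CONTENT_TYPES table, the resolve_reference helper, and the lookup keys app_label and model are all assumptions made for illustration.

```python
# Hypothetical stand-in for the content type rows that exist in the
# target database at import time (not Django's API).
CONTENT_TYPES = [
    {"pk": 7, "app_label": "auth", "model": "user"},
    {"pk": 12, "app_label": "blog", "model": "entry"},
]


def resolve_reference(value, table):
    """Return a primary key for a literal or a query-style reference."""
    if not isinstance(value, dict):
        # Literal foreign key, used unchanged (today's behaviour).
        return value
    # Query-style reference: every key/value pair must match the row.
    matches = [
        row for row in table
        if all(row.get(key) == wanted for key, wanted in value.items())
    ]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one match for %r, found %d" % (value, len(matches))
        )
    return matches[0]["pk"]
```

With this sketch, a fixture entry carrying {"app_label": "blog", "model": "entry"} would resolve to whatever pk that content type has in the target database, rather than to the pk it had in the database that produced the dump.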

07/01/08 18:54:27 changed by russellm

  • component changed from Unit test system to Serialization.

This is a larger serialization issue, not just unit tests.

07/09/08 09:35:03 changed by jb0t

This bug is still around and bites me often, but I have figured out that I can get it to work properly as long as I start any schema change by importing a clean working fixture, making the changes, exporting the fixture (I use XML), and then importing it again to ensure that it will work the next time around.

(follow-up: ↓ 6 ) 07/18/08 01:11:21 changed by jb0t

Tomorrow I am removing my custom permissions http://www.djangoproject.com/documentation/authentication/#custom-permissions

I can't reliably import fixtures and dealing with this bug has grown tiresome. I am guessing that very few people are using custom permissions or this bug would not have existed this long.

(in reply to: ↑ 1 ) 07/18/08 01:19:12 changed by anonymous

Replying to russellm:

...it's an issue with the consistency between your fixture and the content types that are auto-created in your database. However, it's a pretty common problem and an easy mistake to make, so it deserves a solution.

Actually, Django is creating the fixtures so any inconsistency would be the fault of the serializers. I have since learned that having a custom permission is the sole cause of the problem. My comment about 'if you never edit the data by hand' is NOT actually true.

(in reply to: ↑ 4 ; follow-up: ↓ 7 ) 07/18/08 01:24:10 changed by russellm

Replying to jb0t:

I can't reliably import fixtures and dealing with this bug has grown tiresome. I am guessing that very few people are using custom permissions or this bug would not have existed this long.

Have you considered working on a fix yourself? This is an open source project - no problem gets fixed unless someone volunteers to fix it. Given enough time, I will probably get around to fixing this problem myself, but at the moment I'm a little preoccupied with fixing other bugs and getting v1.0 out the door. However if a suitable fix were to materialize, I would commit it very quickly.

This ticket contains a description of the problem, and a description of a fix that would be accepted. All that is left is for someone to do the work.

If you can't do the coding yourself, but you _REALLY_ need a fix, how about paying someone to do the work for you?

The only course of action that is _guaranteed_ to have no effect is complaining.

(in reply to: ↑ 6 ) 07/18/08 18:20:01 changed by anonymous

Replying to russellm:

Have you considered working on a fix yourself? This is an open source project - no problem gets fixed unless someone volunteers to fix it. Given enough time, I will probably get around to fixing this problem myself, but at the moment I'm a little preoccupied with fixing other bugs and getting v1.0 out the door. However if a suitable fix were to materialize, I would commit it very quickly. This ticket contains a description of the problem, and a description of a fix that would be accepted. All that is left is for someone to do the work. If you can't do the coding yourself, but you _REALLY_ need a fix, how about paying someone to do the work for you? The only course of action that is _guaranteed_ to have no effect is complaining.


I would love to fix this problem myself, but it is over my head, and it is important enough that someone who knows more than I do should be the one to correct it.

After numerous exports and imports, all starting from an original export created by Django, I have found that even when the import succeeds, it (potentially) does not properly match content type ids to permissions.
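One way to spot this kind of mismatch before loading is to check the dump itself. The sketch below assumes a JSON dump in the usual list-of-objects layout ("model", "pk", "fields") that also includes the contenttypes app; the dangling_permissions helper is hypothetical. It only checks that the fixture is internally consistent, and cannot tell whether the content type rows auto-created in the target database will receive the same ids.

```python
import json


def dangling_permissions(fixture_text):
    """Return pks of auth.permission objects whose content_type is not in the dump."""
    objects = json.loads(fixture_text)
    # Primary keys of every content type serialized in this fixture.
    known = {
        obj["pk"] for obj in objects
        if obj["model"] == "contenttypes.contenttype"
    }
    # Permissions that point at a content type pk the fixture does not contain.
    return [
        obj["pk"] for obj in objects
        if obj["model"] == "auth.permission"
        and obj["fields"]["content_type"] not in known
    ]
```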

07/25/08 16:02:15 changed by jb0t

I managed to figure out a suitable way to deal with this bug until it gets fixed (I wish I could fix it, but it's just beyond me).

I perform a data dump with a usable indent size:

python manage.py dumpdata auth app1 app2 --indent=4 --format=xml > initial_data.xml

Then I manually edit the XML file and remove all serialized permission objects. This forces them to be recreated when the content types data is created, but keeps all user objects, which much of the rest of the data is ultimately related to.
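The manual edit could be scripted along these lines. This is only a sketch: it assumes the dump has a single root element whose direct children are object elements carrying a model attribute such as auth.permission, and strip_permissions is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_permissions(xml_text, model="auth.permission"):
    """Return the fixture XML with every serialized object of `model` removed."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the children, since we remove while looping.
    for obj in list(root):
        if obj.tag == "object" and obj.get("model") == model:
            root.remove(obj)
    return ET.tostring(root, encoding="unicode")
```

To use it, read initial_data.xml into a string, pass it through strip_permissions, and write the result back. Note that ElementTree does not preserve the original indentation or XML declaration exactly.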

08/24/08 12:40:07 changed by Erik Allik <eallik@gmail.com>

  • cc set to eallik@gmail.com.

(follow-up: ↓ 11 ) 09/08/08 01:43:50 changed by shai

I think I have a usable workaround. The idea is simple: Allow each class to set its own content_type id.

To do this, we connect a function to the pre_save signal of ContentType; this allows us to modify ContentType instances before they are saved. We make the function take an id for the content_type from the model class.

At the top of the models.py file, after imports, add these lines:

# Make content_type ids consistent
from django.contrib.contenttypes.models import ContentType
from django.db.models.signals import pre_save

def set_content_type_id(sender, **kwargs):
    # pre_save passes the object about to be saved as 'instance'
    content_type = kwargs.get('instance')
    if content_type is None:
        raise Exception("pre-save signal API changed -- fix me")
    cls = content_type.model_class()
    # Only override the pk for models that declare content_type_id
    if getattr(cls, 'content_type_id', False):
        content_type.pk = cls.content_type_id

pre_save.connect(set_content_type_id, sender=ContentType)

Now, in each model class, you can add an attribute like so

class MyModel(models.Model):
    ...
    content_type_id = 2001
    ...

class MyOtherModel(models.Model):
    ...
    content_type_id = 2002
    ...

Classes which do not set the content_type_id will have one assigned to them automatically (in sequence).

This is still not perfect -- it places the burden of allocating unique ids on the programmers, rather than the system, but it is still way better than having to manually edit dump files every time.

A note about picking ids: In most cases you do not need to set ids for all models, and you have fewer than, say, 200 models, so ids over 1000 should be safe (you can use ids up to a billion or two, so there's really no problem there). Just be careful if your id sequence is not reset when you recreate the database -- then your ids will just grow and grow. In such a case, I would allocate low numbers and define the sequence to start higher.

Another note: This workaround could lead to a real solution if the ids, instead of being set by the user, were derived from some hash of the model name. But there are several problems in this direction: one is finding a hash function that's platform-independent, and a much harder one is finding a conflict-resolution method that guarantees no change when models are added. These are hard problems, and for me the benefit doesn't seem to be worth the effort.
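For illustration only, a hash-derived id might look like the sketch below. The function name, the base of 1000, and the choice of CRC-32 are assumptions; CRC-32 is used because it yields the same value for the same bytes on any platform. The sketch makes no attempt at the conflict resolution described above, so two model names can still map to the same id.

```python
import zlib

MAX_ID = 2 ** 31 - 1  # largest value of a signed 32-bit integer column


def hashed_content_type_id(app_label, model_name, base=1000):
    """Derive a repeatable id from the model's full name (no collision handling)."""
    key = ("%s.%s" % (app_label, model_name.lower())).encode("utf-8")
    # Mask to an unsigned 32-bit value so the result is stable everywhere.
    checksum = zlib.crc32(key) & 0xFFFFFFFF
    # Fold into the range [base, MAX_ID - 1], leaving low ids for the sequence.
    return base + checksum % (MAX_ID - base)
```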

A final note: My first instinct was to add the content_type_id to a Meta nested class, rather than the class itself. Alas, Meta does not allow addition of arbitrary attributes. Perhaps the content_type framework needs a helper-class system like the admin framework's, but for just one attribute, it (again) isn't worth it.

Hope this helps,

Shai.

(in reply to: ↑ 10 ; follow-up: ↓ 12 ) 09/08/08 01:50:54 changed by russellm

Replying to shai:

I think I have a usable workaround. The idea is simple: Allow each class to set its own content_type id.

Ok - so what content type should django-tagging use for a tag? How does the author of django-tagging prevent a clash with the content type for a blog entry in coltrane?

You have to remember that if you allow/force manual specification of content types, you have just created a global namespace where clashes are neither desirable, nor particularly resolvable (since there is no way to fix clashes without modifying the code for the app). We don't want to have to resort to the IANA to fix this sort of problem.

(in reply to: ↑ 11 ; follow-up: ↓ 13 ) 09/08/08 07:48:11 changed by shai

Replying to russellm:

Replying to shai:

I think I have a usable workaround. The idea is simple: Allow each class to set its own content_type id.

Ok - so what content type should django-tagging use for a tag? How does the author of django-tagging prevent a clash with the content type for a blog entry in coltrane?

I said "workaround", not "solution". You are right; this is not particularly useful for reusable, "component" applications (I can think of ways -- like using a "base value" for the app, configured at the server level, and setting the model content_type ids as offsets from this base value -- but this is still not a good way to do things).

However, for "system" applications -- applications that are not intended to be reused as components -- this is workable. It will make jb0t's life a lot easier.

you have just created a global namespace

Em, no. content_types did that. And while it did take care to avoid clashes, it didn't make the namespace properly (reproducibly) serializable. I'm just taking the other side of the trade-off -- clash avoidance responsibility to the user, in return for reproducible serializability.

But yes, I agree; this is still quite far from a proper, complete solution. It is a partial workaround only.

(It does hint at the real solution: Make the model name -- probably fully prefixed -- the pk of content_type. Yes, a string pk. Blasphemy. I know).

(in reply to: ↑ 12 ; follow-up: ↓ 14 ) 09/08/08 10:27:21 changed by russellm

Replying to shai:

I said "workaround", not "solution". You are right; this is not particularly useful for reusable, "component" applications (I can think of ways -- like using a "base value" for the app, configured at the server level, and setting the model content_type ids as offsets from this base value -- but this is still not a good way to do things).

I have an idea how the solution will work, and I've articulated it before. All I need is the time to implement it. Implementing a problematic workaround will only serve to consume time that could be spent fixing the problem.

you have just created a global namespace

Em, no. content_types did that. And while it did take care to avoid clashes, it didn't make the namespace properly (reproducibly) serializable. I'm just taking the other side of the trade-off -- clash avoidance responsibility to the user, in return for reproducible serializability.

content_types namespaces are currently global in the per-app sense. This proposal makes them global in the 'whole globe' sense. Per-app, we can programmatically avoid collisions. We can't do that over the entire globe without the IANA or something similar. String PKs are a non-starter, if only because it means a massive backwards-incompatible change.

(in reply to: ↑ 13 ) 09/08/08 12:02:15 changed by shai

Replying to russellm:

I have an idea how the solution will work, and I've articulated it before. All I need is the time to implement it. Implementing a problematic workaround will only serve to consume time that could be spent fixing the problem.

I must be missing something; with this workaround, I haven't asked you, or anyone else, to implement anything -- it's just a way for a substantial part of the developers to solve their day-to-day problems, until a real fix is done.

you have just created a global namespace

Em, no. content_types did that. And while it did take care to avoid clashes, it didn't make the namespace properly (reproducibly) serializable. I'm just taking the other side of the trade-off -- clash avoidance responsibility to the user, in return for reproducible serializability.

content_types namespaces are currently global in the per-app sense. This proposal makes them global in the 'whole globe' sense.

content_types ids are currently global in the global sense. That is what I meant.

Per-app, we can programmatically avoid collisions. We can't do that over the entire globe without the IANA or something similar.

Yes, I agreed that people who build their apps to be used as components shouldn't be doing this. You're rejecting my suggestion on the grounds that it doesn't solve the problem for all django apps, ignoring the fact that it is a good workaround for many. My own application, which triggered my need for this, is not going to be part of any system but my own; this is the situation with a substantial part of the django apps -- where IANA etc. are simply not an issue.

Again: I'm not asking you to do anything about this workaround except, well, stop opposing it, or explain why it's bad *for a single-system app*.

11/06/08 18:28:01 changed by alexkoshelev

  • cc changed from eallik@gmail.com to eallik@gmail.com, daevaorn@gmail.com.

11/24/08 18:13:27 changed by bsndev

  • cc changed from eallik@gmail.com, daevaorn@gmail.com to eallik@gmail.com, daevaorn@gmail.com, bsn.dev@gmail.com.
