Opened 5 years ago

Last modified 7 months ago

#16508 new New feature

Provide real support for virtual fields

Reported by: Vlastimil Zíma Owned by:
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: pirosb3, michal.petrucha@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have created a virtual field based on GenericForeignKey, but I found out that it can not be properly used for model initiation.

Altered example from my test:

class Managing(models.Model):
    shortcut = models.CharField(max_length=20)
    name_cs = TranslationProxyField('name', 'cs', False)


class ManagingTranslation(models.Model):
    name = models.CharField(max_length=20)


class TranslationProxyField(object):
    """
    Descriptor that is a virtual field that provides access to translations.
    """
    def __init__(self, field_name, language_code=None, fallback=False):
        self._field_name = field_name
        self._language_code = language_code
        self._fallback = fallback
        names = [field_name]
        if language_code is not None:
            names.append(language_code.replace('-', '_'))
        if fallback:
            names.append(FALLBACK_FIELD_SUFFIX)
        self.name = '_'.join(names)

    def contribute_to_class(self, cls, name):
        if self.name != name:
            raise ValueError('Field proxy %s is added to class under bad attribute name.' % self)
        self.model = cls
        cls._meta.add_virtual_field(self)

        # Connect myself as the descriptor for this field
        setattr(cls, name, self)

# This results in Type error from Model.__init__, because of extra kwarg
Managing(shortcut='name', name_cs='jméno')

I tried to use pre_init signal, but I found out that I will not get instance argument, so I can not use that. This way is truly usable only for virtual fields which are like GenericForeignKey built on real model fields of object. When it gets more complicated, it can not be used. Confusing on this path is that instance is in providing_args of pre_init signal.

Currently I solve this by quite a hack which is deriving of TranslationProxyField from property instead of object. This works because of setting properties is allowed in Model.__init__.

In my opinion, a stable solution would be either providing an instance argument in pre_init signal or enable setting of virtual field in model initiation. Second option would also enable removal of pre_init connection from GenericForeignKey.

Exact examples can be found at github: https://github.com/vzima/django-multilingual-ds9

Change History (21)

comment:1 Changed 5 years ago by Bas Peschier

Triage Stage: UnreviewedAccepted

Virtual fields are also discussed shortly by Michal Petrucha for the Composite fields API [1] and in further responses; it seems like there are more use cases, but I would strongly suggest taking it up on the django-developers mailing list to further discuss it.

[1]: https://groups.google.com/d/msg/django-developers/rF79c8z65cQ/D2ZM1yZPX4cJ

comment:2 Changed 4 years ago by Anssi Kääriäinen

Type: UncategorizedNew feature

comment:3 Changed 3 years ago by pirosb3

Cc: pirosb3 added
Owner: changed from nobody to pirosb3
Status: newassigned

Hi,

I will be taking this ticked under consideration as we are currently planning a refactor of _meta.

Dan

comment:5 Changed 22 months ago by Anssi Kääriäinen

I'm planning on tackling this issue. The initial plan is to:

  • Change ForeignObjectRel subclasses to real field instances. (For example, ForeignKey generates a ManyToOneRel in the related model). The Rel instances are already returned from get_field(), but they aren't yet field subclasses.
  • Allow direct usage of ForeignObjectRel subclasses. In certain cases it can be advantageous to be able to define reverse relations directly. For example, see https://github.com/akaariai/django-reverse-unique.
  • Partition ForeignKey to virtual relation field, and concrete data field. The former is the model.author, the latter model.author_id's backing implementation.
  • Consider other cases where true virtual fields are needed.

My initial try will be a class structure like:

    Field
        RelationField (base class for ForeignKey, ManyToOneRel, maybe also ManyToManyField)
        ConcreteField (base class for concrete data fields, for example IntegerField, ...)

I expect this to cause a good amount of breakage in 3rd party apps. I'm going to use the old good "lets first break things, then fix those cases that are complained about".

comment:6 Changed 22 months ago by pirosb3

Thanks akaariai,

If you need any help on my side (given the work I did on Meta) please let me know.

Daniel

comment:7 Changed 22 months ago by Thomas Stephenson

Could I propose the following class structure:

Field
    VirtualField
        RelationField
    ConcreteField

The addition of an intermediary VirtualField class would be for future support of a CompositeField class, which isn't logically a RelationField, but is logically virtual (and subfields of the CompositeField would be limited to concrete fields for simplicity of implementation).

Last edited 22 months ago by Thomas Stephenson (previous) (diff)

comment:8 Changed 20 months ago by Federico Capoano

I am also interested in this ticket.

See this other discussion on Django Developers: https://groups.google.com/forum/#!topic/django-developers/_FmJRK3sJGs].

What's the proposed definition of VirtualField?

I have a proposal:

"A virtual field is a model field which it correlates to one or multiple concrete fields, but doesn't add or alter columns in the database."

Best regards
Federico

comment:9 Changed 20 months ago by Collin Anderson

I believe in the _meta refactor we simply defined it as "doesn't add or alter columns in the database.", something like field.column = None. It probably does correlates to one or multiple concrete fields, but it doesn't have to.

Also, one thing to note is that we don't want to distinguish (too much) between virtual and non virtual on the higher levels of the API (like Model Forms). Model Forms shouldn't care whether a field has a column in the database or not. So in that case, it doesn't make sense to have a _meta.virtual_fields like we used to do. But of course the ORM and migrations are going to want to know that they should ignore these fields.

comment:10 Changed 9 months ago by Michal Petrucha

Cc: michal.petrucha@… added

I have recently started to dust off my old code from GSoC 2013, where my first step was pretty much this. Let me summarize my take on this. (I'm sorry, this is going to be a bit long, but I'm trying to organize my thoughts on how to approach this entire thing.)

My implementation from a few years ago matches the description from comment:5 (as far as the conceptual handling of ForeignKey goes), and also comment:7 (when it comes to the actual class hierarchy). And yeah, comment:9 is also mostly spot-on, except for the last sentence – the ORM or migrations certainly can't ignore ForeignKey once it becomes virtual; instead, migrations will have to hide any auto-generated auxiliary concrete fields to make migrations backwards-compatible.

I think that before we start working on ForeignKey and break the entire ORM, it would be kind of nice to just get the more general VirtualField in. The code for just the VirtualField class without any changes to ForeignKey and friends can be seen in this Github comparison. After that I implemented the virtual ForeignKey change, but up to that point, it was pretty isolated and self-contained.

Key observations from my implementation: GenericForeignKey and GenericRelation are handled specially in

  1. model instance initialization (using the pre_init signal),
  2. they are kind of special cased in deletion cascading, and
  3. they are cloned in every model subclass.

Number 1 was fixed by making Model.__init__ consider virtual fields when processing keyword arguments. This made it possible to remove the usage of pre_init from GenericForeignKey.

Number 2 I'm not entirely certain about right now; I haven't yet found the time to fully understand how this is handled after Daniel's _meta refactor, but there used to be a few caches of arbitrary collections of relationship objects. I handled this by including GenericRelation in those collections, which meant it was no longer necessary to special-case _meta.virtual_fields just to reach GenericRelations. I guess these days this could be achieved by making sure GenericRelation appears in _meta.related_objects, but I'd need to study this a bit further to be certain. (Daniel, do you have any insight here? Can we even to that now without breaking backwards compatibility?)

Number 3 is a peculiarity of GenericRelation. The problem here is that it needs a reference to the current model in order to use the correct content type, which means it really does need to be cloned in every subclass, even in concrete model inheritance (otherwise it would use the parent's content type).

Right now, all virtual fields are cloned in concrete inheritance, but that doesn't make any sense for other virtual fields, like ForeignKey, or CompositeField, or whatever. Those should be handled just like any other field, as far as inheritance is concerned. This doesn't really have anything to do with GenericRelation being a virtual field, instead, a more fitting name for this particular concept would be something like “private” field (or “local private”, as I named it in my 2013's branch). After all, we already do have support for other virtual fields that do not land in _meta.virtual_fields. (ForeignObject, although AFAIK it is not a public API, but still.)

With all that being said, I would like to get the ball rolling by offering PR 6329 which implements number 3 above, the rename of _meta.virtual_fields to _meta.private_fields.

In the meantime I'll go investigate what to do about the rest of the changes in that range of commits I linked above.

comment:11 Changed 9 months ago by Anssi Kääriäinen

Big +1 to introducing these changes in smaller chunks.

For composite fields we are going to have a design problem for mutable composite values. For example, assume User.full_name is a FullName instance. What should happen in the following cases, and how to implement the wanted behavior?

u = User()
u.full_name = FullName('Anssi', 'Kääriäinen')
u.full_name.first_name = 'Matti'
u.first_name == u.full_name.first_name
OUT: ???

u1 = User()
u2 = User()
u1.full_name = FullName('Anssi', 'Kääriäinen')
u2.full_name = u1.full_name
u2.full_name.first_name = 'Matti'
u1.first_name == u2.first_name
OUT: ???

This is a nasty issue, and I think we should avoid it by saying that for now Django doesn't support mutable composite field values (that is, the composite field's value should be a tuple). Later on we can then decide how to tackle this issue properly.

comment:12 Changed 9 months ago by Michal Petrucha

Heh, this is skipping a bit further ahead; I would prefer to finish the work on ForeignKey before moving on to CompositeField, and ForeignKey itself will take quite some time. However, since you have already brought it up...

As far as I recall, I implemented CompositeField as a descriptor that dynamically constructed a CompositeValue on read, and unpacked any tuple / iterable into individual values on write, but that was it; the CompositeValue itself was just a namedtuple, so it didn't act as a proxy for the value on the model instance itself. So in your example, u1.full_name.first_name would be a read-only item.

I think that is a reasonable restriction; trying to turn a composite value into some kind of proxy object might be a fun exercise, but it would require some amount of descriptor hacking, and I honestly don't see enough value in it that would make it worth the complexity.

comment:13 Changed 9 months ago by Anssi Kääriäinen

Sorry for bringing this up :)

What you describe seems like a sane default. We can offer hooks for users to do all the dirty work in case they want real classes as composite values, but I don't see a need for Django to include this stuff out of the box.

comment:14 Changed 8 months ago by Tim Graham <timograham@…>

In c339a5a:

Refs #16508 -- Renamed the current "virtual" fields to "private".

The only reason why GenericForeignKey and GenericRelation are stored
separately inside _meta is that they need to be cloned for every model
subclass, but that's not true for any other virtual field. Actually,
it's only true for GenericRelation.

comment:15 Changed 8 months ago by Michal Petrucha

I tried to dig into item number 1 from comment:10, but I've hit a snag. I created PR 6477 with the work-in-progress patch.

The patch breaks the tests for GenericRelatedObjectManager.add, and the reason is that before the change, instance_pre_init only sets the values of the content type and primary key, but it does not cache the target instance in cache_attr. However, with the change, the regular __set__ is called, which caches the instance as well.

The problem with this is that GenericRelatedObjectManager.add only assigns the values of the content type, and the primary key, but as far as I can see, GenericRelatedObjectManager does not hold any reference to the GenericForeignKey field on the other side of the relationship, not even its name (either of the two would make this problem go away). It does not really need to, as long as it only operates with the basic values, but it makes it impossible for me to invalidate the cached instance.

Arguably, this might be considered a bug in the current implementation, since it is always possible to change the raw primary key of the target, but any subsequent reads of the target object will still return the old instance.

I can see the following ways out:

  1. Make GenericRelation hold a reference to the corresponding GenericForeignKey. This would mean that a lack of GenericForeignKey on the other side would become a hard failure; right now, there's a check that reports this situation as an error, but in theory, the GenericRelation should work just fine anyway, which would no longer be the case.
  2. Reorganize GenericForeignKey.__get__ a little bit to check that the cached instance matches the raw values. This would mean that even if there is already a cached instance, the content type would have to be looked up on every read. This adds a bit of overhead, although at least content types are cached, so it shouldn't actually hit the database most of the time.

Presonally, I would probably prefer option 2, because it solves the more general problem of stale cached instances, but if anyone disagrees, I'd be happy to implement the first option, too.

comment:16 Changed 8 months ago by Michal Petrucha

I implemented option 2, and pushed it to the PR; it seems to work all right. All existing tests pass, but I'll have to write some additional tests specifically for the cached object invalidation, as well as a mention of the changes in the release notes.

comment:17 Changed 7 months ago by Michal Petrucha

I'm now happy with the state of PR 6477; I rebased and squashed it into two commits. I consider it ready for final review.

comment:18 Changed 7 months ago by Tim Graham <timograham@…>

In 8a47ba67:

Refs #16508 -- Made Model.init() aware of virtual fields.

It's no longer necessary for GenericForeignKey (and any other virtual fields)
to intercept the field's values using the pre_init signal.

comment:19 Changed 7 months ago by Tim Graham <timograham@…>

In b9f8635:

Refs #16508 -- Added invalidation of stale cached instances of GenericForeignKey targets.

comment:20 Changed 7 months ago by Tim Graham

Owner: pirosb3 deleted
Status: assignednew

comment:21 Changed 7 months ago by Tim Graham <timograham@…>

In 31501fb:

Refs #18599 -- Added a test for assigning a GenericForeignKey in Model.init().

The issue was fixed by 8a47ba679d2da0dee74671a53ba0cd918b433e34
(refs #16508).

Note: See TracTickets for help on using tickets.
Back to Top