Changes between Version 18 and Version 19 of new_meta_api


Timestamp:
Jul 11, 2014, 9:41:03 AM (10 years ago)
Author:
pirosb3
Comment:

--

  • new_meta_api

    v18 v19  
    33
    44As of my 2014 Summer of Code project, my second deliverable is a refactored working implementation of the Options API.
    5 The Options API is at the core of Django, it enables introspection of Django Models with the rest of the system. This includes lookups, queries, forms, admin to understand the capabilities of every model. The Options API is hidden under the _meta attribute of each model class.
    6 Options has always been a private API, but Django developers have always been using it in their projects in a non-official way. This is obviously very dangerous because, as there are no official endpoints, Options could change breaking other people's implementation. Options did not have any unit-tests, but the entire system uses it and relies on it to work correctly.
    7 My Summer of Code project is all about understanding and refactoring Options to make it a testable and official API that Django and any other developer can use.
     5The Options API is at the core of Django: it lets the rest of the system introspect Django models. This allows lookups, queries, forms and the admin to understand the capabilities of every model. The Options API is exposed through the _meta attribute of each model class.
     6Options has always been a private API, but Django developers have long used it in their projects unofficially. This is risky because, with no official API, Options could change and break other people's implementations.
     7Options also had no unit tests, even though the entire system relies on it to work correctly.
     8
     9My Summer of Code project is all about understanding and refactoring Options to make it a testable and official API that Django and any other developers can use.
    810
    911=== Current state of the API
    10 I now have a working and tested implementation of Options, I have managed to simplify 20+ functions and reduce them to 2 main endpoints, that are the main API. Because Options needs to be very fast, I necessarily had to add some accessors on Options for the most common calls (although both endpoints are cached, we can increase speed by avoiding function calls). Each accessor is a cached property and is computed, using the new API, on first access.
    11 
    12 For this reason, I am planning to release in attached PR:
     12I now have a working and tested implementation of Options, and I have managed to reduce it to 2 main endpoints.
     13Because Options needs to be very fast, I had to add some accessors for the most common calls (both endpoints are cached, but avoiding function calls increases speed further). Each accessor is a cached property and is computed, using the new API, on first access.
     14
     15I am planning to release in the attached PR:
    1316 - Unit tests for the new Meta API
    1417 - The new Meta API
    1518 - The implementation of the new API throughout django and django.contrib
     19 - Documentation
     20
    1621
    1722=== Concepts
    18 
    1923
    2024==== Field types
     
    2731{{{
    2832class Person(models.Model):
    29     # DATA field
    3033    data_abstract = models.CharField(max_length=10)
    3134}}}
     
    3740{{{
    3841class Person(models.Model):
    39     # M2M fields
    4042    friends = models.ManyToManyField('self', related_name='friends', symmetrical=True)
    4143}}}
     
    5254    city = models.ForeignKey(City)
    5355}}}
    54 In this case, City has a related object from Person (as you can access person_set)
     56In this case, City has a related object from Person
    5557
    5658===== Related M2M
     
    6870
    6971===== Virtual
    70 Virtual fields do not necessarily have an entry on the database, they are "Django fields" such as a GenericRelation
     72Virtual fields do not necessarily have an entry in the database; they are "Django fields" such as a GenericForeignKey.
     73
    7174{{{
    7275class Person(models.Model):
     
    7578    item = GenericForeignKey('content_type', 'object_id')
    7679}}}
    77 GenericForeignKey uses content_type and object_id to keep track of what model type and id is set by item, but item itself does not have a concrete presence on the database.
     80
     81GenericForeignKey uses 'content_type' and 'object_id' to keep track of the model type and id that item points to, but item itself does not have a concrete presence in the database.
    7882In this case, item is a virtual field.
    7983
     
    8286
    8387===== Local
    84 A local field is one that is defined on the queries model and is not derived from inheritance.
    85 Fields from models that directly inherit from abstract models or proxy classes are still local
     88A local field is one that is defined on the model itself and is not derived from inheritance. Fields from models that directly inherit from abstract models or proxy classes are still local.
    8689
    8790{{{
     
    97100===== Hidden
    98101Hidden fields only apply to related objects and related m2m. When a relational field (such as ManyToManyField or ForeignKey) specifies a related_name that ends with a "+", it tells Django not to create a reverse relation.
     102
    99103{{{
    100104class City(models.Model):
     
    105109}}}
    106110
    107 In this case, City has a related hidden object from Person (as you can't access person_set)
     111City has a related hidden object from Person (as you can't access person_set)
    108112
    109113===== Concrete
     
    111115
    112116===== Proxied relations
    113 Proxied relations are when concrete models inherit all related from their proxies. 
     117Proxied relations are relations that point to a proxy of a model.
    114118
    115119{{{
     
    137141}}}
    138142
    139 get_fields takes a set of flags as parameters, and returns a tuple of field instances that match those parameters. All possible combinations of
     143get_fields takes a set of flags as parameters and returns a tuple of field instances. Any combination of
    140144options is allowed, although some will have no effect (such as include_proxy combined with data or m2m by itself).
    141 get_fields is internally cached for speed and a recursive function that collects fields from each parent of the model.
     145get_fields is internally cached for speed and it is a recursive function that collects fields from each parent of the model.
    142146An example of every (sane) combination of flags will be available in the model_meta test suite that I will ship with the new API.
    143 The 'export_map' key is only used internally (by get_field) and is not part of the public API. 'export_map=True' will return an OrderedDict with fields
    144 as keys and a tuple of strings as values. While the keys map exactly to the same output as 'export_map=False', the tuple of values will contain all
    145 possible lookup names for that field. This is used to build a fast lookup table for get_field and to avoid re-iterating over every field to pull
    146 out every possible name.
     147The 'export_map' key is only used internally (by get_field) and is not part of the public API. 'export_map=True' will return an OrderedDict with fields as keys and a tuple of strings as values. While the keys map exactly to the same output as 'export_map=False', the tuple of values will contain all possible lookup names for that field. This is used to build a fast lookup table for get_field and to avoid re-iterating over every field to pull out every possible name.
    147148
    148149{{{
     
    176177}}}
    177178
    178 'get_field' returns a field_instance from a given field name. field_name can be anything from name, attname and related_query name.
    179 get_field is recursive by default and does not include any hidden or proxied relations. There has still not been any reason to add these
    180 and they can be derived from 'get_fields'.
     179'get_field' returns a field instance for a given field name. field_name can be any of name, attname or related_query_name.
     180get_field is recursive by default and does not include any hidden or proxied relations.
    181181If a given name is not found, it will raise a FieldDoesNotExist error.
    182182'get_field' is internally cached and gets all field information from 'get_fields' internally.
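
The lookup-table idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in the branch: `Field`, `build_lookup_table` and the module-level `get_field` below are hypothetical stand-ins, and `FieldDoesNotExist` is redefined locally so the snippet runs without Django. It inverts a map of field to lookup names (the shape described for 'export_map=True') into a name to field table, so a lookup is a single dictionary access.

```python
from collections import OrderedDict


class FieldDoesNotExist(Exception):
    """Local stand-in for Django's exception of the same name."""


class Field:
    """Hypothetical, minimal field: a name and an optional attname."""

    def __init__(self, name, attname=None):
        self.name = name
        self.attname = attname or name


def build_lookup_table(export_map):
    """Invert {field: (lookup names, ...)} into {lookup name: field}."""
    table = {}
    for field, names in export_map.items():
        for name in names:
            table[name] = field
    return table


def get_field(table, field_name):
    """Return the field registered under field_name, or raise."""
    try:
        return table[field_name]
    except KeyError:
        raise FieldDoesNotExist("no field named %r" % field_name)


city = Field("city", attname="city_id")
age = Field("age")
# Same shape as the export_map output: fields as keys, name tuples as values.
export_map = OrderedDict([(city, ("city", "city_id")), (age, ("age",))])
table = build_lookup_table(export_map)
```

With this table, `get_field(table, "city")` and `get_field(table, "city_id")` resolve to the same field object without iterating over every field.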
    183183
    184184NOTE: There is an inconsistency between the defaults of get_field and get_fields. 'get_fields' by default enables only data fields
    185 while 'get_field' by default enables data and m2m. This is because of backwards-compatibility issues (get_field already existed).
     185while 'get_field' by default enables data and m2m. This is because of backwards-compatibility issues (read more below).
    186186
    187187{{{
     
    209209==== Using bitfields as flags
    210210
    211 get_field and get_fields were originally designed to work with bits. The main choice for this decision was because there were many options and,
    212 in order to avoid providing multiple flags, it would be better to provide bits.
     211get_field and get_fields were originally designed to work with bits. The main reason for this decision was that there were many options, and bits avoided providing too many flags.
    213212The original API for bits is:
    214213
     
    239238
    240239The decision taken was to port 'get_field' and 'get_fields' to flags.
    241 A port of the old implementation lies here if you are interested: https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade/django/db/models/options.py
     240A port of the old implementation still lies here if you are interested: https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade/django/db/models/options.py
    242241
    243242==== Removed direct, m2m, model
     
    246245the attributes (there is only 1 place where m2m is used).
    247246
    248 The decision taken was to drop direct, m2m, model in the return type and only keep field_instance. All the rest will be derived.
     247The decision taken was to drop direct, m2m, model in the return type and only keep field_instance. All the rest will be derived if needed.
    249248
    250249==== Removed all calls "with_model"
     
    252251
    253252==== Removed the need of multiple maps
    254 The previous implementation relied on many different cache maps internally. This is somewhat necessary, but tends to increase bug-risk
    255 when cache-expiry happens. For this reason, my implementation relies only on 2 cache tables, and I have added a specific function to do
    256 cache expiry (called _expire_cache) that will wipe out all memory.
    257 The downsides if this aspect is that we cache a bit more naively (there are less layers of caching) but benchmarks show this does not
    258 decrease performance.
     253The previous implementation relied on many different cache maps internally. Some caching is necessary, but many maps tend to increase the risk of bugs when cache expiry happens. For this reason, my implementation relies on only 2 cache tables, and I have added a specific function to do
     254cache expiry easily (_expire_cache). The downside is that we cache a bit more naively (there are fewer layers of caching), but benchmarks show no real decrease in performance.
    259255
    260256==== Used internal caching instead of lru_cache
    261 Our first approach to caching was to use functools.lru_cache. lru_cache is a simple decorator that provides cache and an expiry function
    262 built-it. It worked correctly with the new API but cProfile quickly showed how a lot of computing time was done inside lru_cache itself.
    263 
    264 The decision taken was to do very caching with simple try / catch and a dictionary for memoizing. This is also because we really don't need
    265 the 'lru' part of 'lru_caching': there are only a finite number of combinations that can be called.
    266 
    267 ==== Used internal caching instead of lru_cache
    268 Our first approach to caching was to use functools.lru_cache. lru_cache is a simple decorator that provides cache and an expiry function
    269 built-it. It worked correctly with the new API but cProfile quickly showed how a lot of computing time was done inside lru_cache itself.
    270 
    271 The decision taken was to do very caching with simple try / catch and a dictionary for memoizing. This is also because we really don't need
    272 the 'lru' part of 'lru_caching': there are only a finite number of combinations that can be called.
     257Our first approach to caching was to use 'functools.lru_cache'. 'lru_cache' is a simple decorator that provides a cache and a built-in expiry function. It worked correctly with the new API, but cProfile quickly showed that a lot of computing time was spent inside lru_cache itself.
     258
     259The decision taken was to drop 'lru_cache' in favour of a simpler caching strategy. This is also because we really don't need the lru part of 'lru_cache': there are only a finite number of combinations that can be called.
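
A minimal sketch of such a simpler strategy, assuming a plain dictionary keyed by the flag combination and a try/except around the lookup. `FieldCache` and `compute_fields` are hypothetical names for illustration and do not come from the branch; the point is only that with a finite set of flag combinations no eviction policy is needed, and expiry is a single `clear()`.

```python
class FieldCache:
    """Hypothetical memoizer keyed by the combination of flags passed in."""

    def __init__(self, compute):
        self._compute = compute  # the expensive function being memoized
        self._cache = {}
        self.misses = 0          # counts how often we had to recompute

    def get(self, **flags):
        # Sort the items so keyword order does not change the cache key.
        key = tuple(sorted(flags.items()))
        try:
            return self._cache[key]
        except KeyError:
            self.misses += 1
            result = self._cache[key] = self._compute(**flags)
            return result

    def expire(self):
        """Wipe every cached result at once."""
        self._cache.clear()


def compute_fields(data=True, m2m=False):
    """Toy replacement for the real field collection."""
    names = []
    if data:
        names.append("name")
    if m2m:
        names.append("friends")
    return tuple(names)


cache = FieldCache(compute_fields)
first = cache.get(data=True, m2m=True)
second = cache.get(m2m=True, data=True)  # same key, served from the cache
```

The second call returns the very same tuple object as the first, and only an explicit `expire()` forces a recomputation.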
    273260
    274261==== Use cached_properties when possible
    275262Function calls are expensive in Python. All sensible attributes with no arguments have been transformed into cached properties.
    276 A cached property is a read-only property that is calculated on demand and automatically cached. If the value has already been calculated,
    277 the cached value is returned. Cached properties avoid a new stack and are used for fast-access to fields, concrete_fields,
     263A cached property is a read-only property that is calculated on demand and automatically cached. If the value has already been calculated, the cached value is returned. Cached properties avoid a new stack and are used for fast-access to fields, concrete_fields,
    278264local_concrete_fields, many_to_many, field_names
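
As a rough illustration of the pattern, the sketch below uses the standard library's `functools.cached_property` (Python 3.8+) as a stand-in for whichever cached property decorator the branch uses. The `Options` class here is a hypothetical, stripped-down object, not Django's.

```python
from functools import cached_property  # standard library, Python 3.8+


class Options:
    """Hypothetical, stripped-down stand-in for a model's _meta object."""

    def __init__(self, field_names):
        self._field_names = field_names
        self.computations = 0  # how many times the slow path ran

    def get_fields(self):
        """Stand-in for the real, more expensive endpoint."""
        self.computations += 1
        return tuple(self._field_names)

    @cached_property
    def fields(self):
        # Runs once; later accesses read the stored value directly from the
        # instance and do not call this function again.
        return self.get_fields()


opts = Options(["id", "name"])
a = opts.fields
b = opts.fields
```

After the first access the value lives on the instance, so `opts.fields` costs an attribute lookup rather than a function call.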
    279265
     
    292278
    293279This was done for 2 reasons:
    294 1) We managed to squash 2 functions (get_field and get_field_by_name) in 1 single call
    295 2) I could not find any reason for the many_to_many flag to exist! there can never be data and m2m fields with the same name. So this looked
    296 like a legacy parameter that didn't have any effect (because turning it off did not break any tests)
    297 
    298 Finally, the reason the many_to_many flag existed was for a special validation case that was not documented anywhere. Russell helped me in
    299 looking for edge cases and finally I came up with a failing test case: https://github.com/django/django/pull/2893. The test case would fail on the
    300 new API but succeed on master.
    301 
    302 Our final iteration was to add all the field types as flags to get_field. By making m2m as first parameter, we avoid breaking existing implementations
    303 and maintain a similarity with the 'get_fields' API.
     280- 1) We managed to squash 2 functions (get_field and get_field_by_name) into a single call.
     281- 2) I could not find any reason for the many_to_many flag to exist: there can never be data and m2m fields with the same name. So this looked like a legacy parameter that was never removed (because turning it off did not break any tests).
     282
     283The reason the many_to_many flag existed was for a special validation case that was not documented anywhere. Russell helped me in looking for edge cases and finally I came up with a failing test case: https://github.com/django/django/pull/2893. The test case would fail on the new API but succeed on master.
     284
     285Our final iteration was to add all the field types as flags to get_field. By making m2m the first parameter, we avoid breaking existing implementations and maintain a similarity with the 'get_fields' API.
    304286
    305287=== Performance
    306 Throughout my project I have always kept an eye on performance. Throughout the development of my API I have refactored often and always looked for
    307 bottlenecks using cProfile. I am happy to say no major decrease in speed has happened, and the new implementation does a couple of optimizations
    308 that were not present in the old system. Said this, I prefer to not comment on performance but just to show the benchmarks. It will be the core
    309 team to decide if this is feasible or not.
     288Throughout my project I have always kept an eye on performance. I have always looked for bottlenecks using cProfile and other benchmarking tools. I am happy to say no major decrease in speed has happened; in fact, the new implementation does a couple of optimizations that were not present in the old system. That said, I prefer not to comment on performance but just to show the benchmarks. It will be up to the core team to decide whether this is feasible.
    310289
    311290=== Main optimization points
    312291
    313292==== Compute inverse relation map on first access
    314 In order to find related objects, the current implementation does the following
     293In order to find related objects, the current implementation does the following:
    315294
    316295{{{
     
    323302REF: https://github.com/django/django/blob/master/django/db/models/options.py#L488
    324303
    325 This tends to be expensive depending on the setup, but results in a O(models * fields) complexity. We can increase performance by
    326 computing a inverse relation map on first access. This is done only **once**, not once per model
    327 
    328 REF: https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/apps/registry.py#L176
    329 
    330 In this way we have a map of model -> [related_object, related_object, ..] and computing a hash lookup is O(1).
    331 
    332 https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/db/models/options.py#L423
    333 
    334 Now, only 1 much smaller loop is needed.
     304This tends to be expensive: it results in O(models * fields) complexity. We can increase performance by computing an inverse relation map on first access. This is done only **once**, not once per model (https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/apps/registry.py#L176).
     305
     306In this way we have a map of { model : [related_object, related_object, ..] } and computing a hash lookup is O(1) (https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/db/models/options.py#L423).
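
The inverse relation map can be sketched without Django. Everything below is an assumption made for illustration: `Model` and `build_inverse_relation_map` are hypothetical, and models are identified by plain names. The expensive pass over every field of every model runs once, and each model then finds its related objects with one dictionary lookup.

```python
from collections import defaultdict


class Model:
    """Hypothetical model: a name plus the targets of its relational fields."""

    def __init__(self, name, relations=()):
        self.name = name
        self.relations = list(relations)  # models this one points at


def build_inverse_relation_map(models):
    """Single O(models * fields) pass, meant to run once for all models."""
    inverse = defaultdict(list)
    for model in models:
        for target in model.relations:
            # Record that `model` holds a relation pointing at `target`.
            inverse[target.name].append(model.name)
    return dict(inverse)


city = Model("City")
person = Model("Person", relations=[city])
company = Model("Company", relations=[city])
inverse_map = build_inverse_relation_map([city, person, company])
```

Looking up `inverse_map["City"]` then yields the models that point at City, without looping over every other model again.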
    335307
    336308
    337309==== Benchmarks
    338 Here is a benchmarks table. It is benchmarking soc2014_meta_refactor_upgrade_flags_get_field (68dc11708eb2170540729b71db6bcaf4c46d6504)
    339 against django/master
     310Here is a benchmark results table. It is benchmarking soc2014_meta_refactor_upgrade_flags_get_field (68dc11708eb2170540729b71db6bcaf4c46d6504) against django/master.
    340311
    341312Djangobench: each number was picked as median of 2000 trials.
     
    344315==== Backwards compatibility
    345316All previous _meta functions will be backwards-compatible, with a DeprecationWarning.
    346 
    347317
    348318==== Next Steps