Opened 16 years ago

Closed 9 years ago

#9368 closed Cleanup/optimization (fixed)

Clean up code for getting columns for select query

Reported by: Jesse Young Owned by:
Component: Database layer (models, ORM) Version:
Severity: Normal Keywords:
Cc: Malcolm Tredinnick, Jesse Young Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I cleaned up some code in db/models/sql/query.py (see mtredinnick's suggestion on http://code.djangoproject.com/ticket/5420)

In particular:

  • The code to get a column's SQL alias was duplicated in 6 places, so I factored that out into a function
  • Depending on whether the get_default_columns function was called for the base model or a related model, it took different parameters, did different things, and returned different values. So I split it into two separate functions, get_base_columns and get_related_columns. This refactoring makes the Query class easier to understand and easier to subclass with less code duplication.
  • I added a function Query.get_base_fields, which the QuerySet class uses to determine which fields correspond to the values from the result set. Currently, get_base_fields returns all fields of the base model. However, Query subclasses could potentially override get_base_fields (as well as get_base_columns) so that only a subset of the model's fields are returned (e.g. if you want to lazy-load some large fields).

A large motivation for this refactoring is that functional decomposition of get_columns made it difficult to subclass Query to provide custom behavior. This can be seen especially well in contrib.gis.db.models.query, where it needed to override a small bit of logic in the middle of get_columns and get_default_columns, but ended up having to copy and paste those entire functions (which were both quite large) as a result. My change at least makes it so gis doesn't have to copy and paste quite as much. I would have liked to not copy and paste anything in gis, but I didn't quite figure out how to do that without breaking the regression tests. (Also, I haven't even run the code in gis because I don't have the necessary libraries installed. So somebody else might want to test that part...)

Attachments (1)

query_cleanup.diff (19.8 KB ) - added by Jesse Young 16 years ago.
clean up the code for getting a query's columns/fields v1.02

Download all attachments as: .zip

Change History (11)

by Jesse Young, 16 years ago

Attachment: query_cleanup.diff added

clean up the code for getting a query's columns/fields v1.02

comment:1 by Malcolm Tredinnick, 16 years ago

Owner: changed from nobody to Malcolm Tredinnick
Status: newassigned
Triage Stage: UnreviewedAccepted

Looks like a good set of changes, thanks. A few points.

  1. You can't use decorators in Django core code (must be Python 2.3 compatible).
  2. The extra classmethod you added to models.base.Model can't actually be used. I've already made that mistake once. The problem is that entirely avoids __init__ and there is lots of code that extends __init__ to do other stuff on initialisation. The QuerySet code should be a good citizen and use the public interface for constructing a model instance. Zipping a parallel list of field names and values into a dictionary isn't hard when it's required. It's tempting to try and shortcut __init__, but it's backwards compatible and, even when it wasn't, it turned out to be very inconvenient for other code when something tried to come in via the back door. So we'll stick to initialising a model with a list of values when we have all the fields and use the keyword argument approach when necessary (it's pretty easy to tell the two cases apart).
  3. As a stylistic issue, all the methods inside Query have the same level of visibility. It's a non-public class, so we don't need to worry about adding leading underscores to new methods.

Most of these I'll probably just fix when I commit this (I want to think about parts of the changes a bit more, so it'll be a few days before I commit it). So no need to write an updated patch; these are more notes for next time and so you understand why some of the changes won't go in as is.

comment:2 by Jesse Young, 16 years ago

Cc: Jesse Young added

Great, thanks for the feedback.

As for 2, _new_from_select_values does actually call __init__ on the model (I realize my docstring was confusing about that). It makes sense though if you want to put that functionality inline in QuerySet, since it's only used twice and (as you mentioned) isn't very complicated.

Last edited 9 years ago by Tim Graham (previous) (diff)

comment:3 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:4 by patchhammer, 14 years ago

Easy pickings: unset
Patch needs improvement: set
Severity: Normal
Type: Uncategorized

query_cleanup.diff fails to apply cleanly on to trunk

comment:5 by Luke Plant, 14 years ago

Type: UncategorizedCleanup/optimization

comment:6 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:7 by Anssi Kääriäinen, 12 years ago

I did some investigation some time ago for rewriting the cols output code - the code is available from https://github.com/akaariai/django/compare/dedupe_cols.

The original idea was to have just "col" syntax instead of "tbl_alias"."col" syntax unless the col is duplicated in some other table. This would result in much nicer looking SQL. However, this approach has a fundamental problem: We don't actually know all the cols in a table, just those the user has in the models. At the least we would need some way for the user to tell that this table has these extra cols, even if they are not part of the model definition. Otherwise we will break existing code with no upgrade path at all.

After the dedupe cols experiment I hacked a bit more, and the changes ended happening all-around. So, the branch isn't meant for inclusion. Still, I think the code could be useful as a reference point if somebody decides to tackle this issue.

comment:8 by Claude Paroz, 12 years ago

Owner: Malcolm Tredinnick removed
Status: assignednew

comment:9 by Tim Graham, 9 years ago

It seems that get_columns() is no longer overridden in GIS after 0c7633178fa9410f102e4708cef979b873bccb76. Is it time to close this ticket or is there more work to do?

comment:10 by Anssi Kääriäinen, 9 years ago

Resolution: fixed
Status: newclosed

There is likely more we can clean in select clause generation, but the main issue of this ticket (duplication of login) has been fixed.

Note: See TracTickets for help on using tickets.
Back to Top