Version 21 (modified by russellm, 3 years ago)

Added notes about Lazy FKs with auth.User

Google's Summer of Code 2011

Django is once again a mentoring organisation for the 2011 Google Summer of Code. (Read Google's page for more information on how the program works.)

Django's GSoC program is being run by Andrew Godwin (andrew at aeracode.org)

Mentors

If you're interested in mentoring -- supervising a student in work on Django-related activities -- add your name, email, and the sort of projects you're interested in mentoring here:

  • Arthur Koziel (arthur@…) -- Interested in "Enhanced auth.user"
  • Alex Gaynor (alex.gaynor@…) -- Interested in Py3K and general awesomeness.
  • Russell Keith-Magee (russell@…) -- serialization, ORM improvements, testing
  • Jannis Leidel (jannis@…) -- admin, forms, i18n
  • Andrew Godwin (andrew@…) -- schema alteration

Students

Student application period opens March 28 and ends on April 8.

If you'd like to get started on your proposal early, we'll be looking for a few things.

  • You'll need to have a concrete task in mind (some ideas are below) along with a solid idea of what will constitute "success" (you tell us).
  • If your proposal is a single large feature, you'll need to present a detailed design specification. This proposal should be posted to django-developers, where it can be refined until it is accepted by the developer community.
  • We'll want to know a bit about you -- links to previous work are great, if any. If you're proposing something ambitious, you'll need to convince us that you're up to the task.
  • You'll also need to provide us with a schedule, including a detailed work breakdown and major milestones so your mentor can know if and when to nag you :)

Note that none of the ideas below are good enough to be submissions in their own right (so don't copy and paste)! We'll want to know not just what you want to do but how you plan to pull it off.

Don't feel limited to the ideas below -- if you've got a cool project you want to work on, we'll probably be able to find you a mentor. We plan on approving as many projects as we possibly can.

Note: we're looking for projects that add value to Django itself - not application/CMS projects that use Django.

You should also note that as far as proposals go, we don't make a distinction between a GSoC project and any other proposal for a new feature. When you contribute code, you will be expected to adhere to the same contribution guidelines as any other code contributor. This means you will be expected to provide extensive tests and documentation for any feature you add, and to participate in discussion on django-developers whenever your topic of interest is raised. If you're not already familiar with Django's contribution guidelines, now would be a good time to read them.

Communication

This year we're doing all GSoC-related communication via the django-developers mailing list. GSoC proposals should be submitted there, and discussion of the proposed projects, as well as any updates that students post, should take place there too.

Please keep your posts to the list clear and purposeful. If you have an idea, update, or criticism, describe it in detail; it is tedious to have to ask people to clarify vague statements, or to have vital information drip-fed.

Ideas

Here are some suggestions for projects students may want to propose (please feel free to add to this list!). This isn't by any means the be-all and end-all of ideas; please feel free to submit proposals for things not on this list. Remember, we'd much prefer that you posted a draft proposal and your rough timeline / success conditions to the django-developers list, even if it's already on the list below; it will help you get feedback on choosing the right part of a problem, as well as helping to see if there is any interest before you start drafting a full proposal.

When developing your proposal, try to scope ideas/proposals to the 4-month timeline -- simply proposing to fix a ticket or two will probably result in your proposal being rejected in favor of a more ambitious one. The GSoC does not cover activities other than coding, so certain ideas ("Write a more detailed tutorial" or "Create demonstration screencasts" or "Add a pony?") are not suitable for inclusion here.

On the other side, though, be sure to be concrete in your proposal. We'll want to know what your goals are, and how you plan to accomplish them.

In no particular order:

Template compilation

  • Complexity: High

A common criticism of Django's template language is that it is too slow. One reason for this is that rendering is handled at a very high level: the renderer interprets a tree of template nodes that has been generated by parsing the template source file.

Other Python-based template languages gain significant speedups by compiling templates directly to Python bytecode. A Django template compiler would allow for similar templating speedups.
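
A toy sketch of the difference, assuming a hypothetical mini-language that supports only literal text and {{ variable }} substitution (`Text`, `Var`, `parse` and `compile_template` are illustrative names, not Django's template API). The same parsed node list is rendered two ways: by walking the nodes on every render, and by generating Python source once and turning it into a code object with the built-in compile(), so later renders run as plain bytecode.

```python
import re


# Hypothetical node types for a tiny language: literal text and {{ name }}.
class Text:
    def __init__(self, s):
        self.s = s

    def render(self, context):
        return self.s


class Var:
    def __init__(self, name):
        self.name = name

    def render(self, context):
        return str(context.get(self.name, ""))


def parse(source):
    """Split template source into Text and Var nodes."""
    nodes = []
    # With one capturing group, re.split alternates: text, var name, text, ...
    for i, part in enumerate(re.split(r"\{\{\s*(\w+)\s*\}\}", source)):
        if i % 2:
            nodes.append(Var(part))
        elif part:
            nodes.append(Text(part))
    return nodes


def render_interpreted(nodes, context):
    # Interpretation: every render walks the node list, one method call per node.
    return "".join(node.render(context) for node in nodes)


def compile_template(nodes):
    # Compilation: emit Python source for a render function once,
    # then compile it into bytecode with the built-in compile().
    lines = ["def render(context):", "    parts = []"]
    for node in nodes:
        if isinstance(node, Text):
            lines.append(f"    parts.append({node.s!r})")
        else:
            lines.append(f"    parts.append(str(context.get({node.name!r}, '')))")
    lines.append("    return ''.join(parts)")
    namespace = {}
    exec(compile("\n".join(lines), "<template>", "exec"), namespace)
    return namespace["render"]


nodes = parse("Hello, {{ name }}! You have {{ count }} messages.")
render = compile_template(nodes)
ctx = {"name": "Ada", "count": 3}
print(render_interpreted(nodes, ctx))
print(render(ctx))
```

The toy deliberately ignores the hard parts listed below: tags that modify the context during rendering, and Django's variable resolution (dictionary, attribute and list-index lookups, and calling callables).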

Issues to consider:

  • How do Django's template variable scoping rules map to a compilation scheme?
  • Django template tags are able to modify the context as the template is rendered. How does this affect the compilation process?
  • How should we handle the upgrade path when compiled templates are added to trunk?

Enhanced auth.user

  • Complexity: High

One of the most common classes of questions on django-users concerns customizing Django's User model. For example:

  • How can I use an email address as a username?
  • I want to use Twitter/OAuth/Facebook to login - why can't I leave the username field empty?
  • How can I make the username field N characters longer/shorter?
  • How can I allow [insert random character] in usernames?
  • How can I have a single "name" field instead of "first_name"/"last_name"?

At present, there is no easy answer to these questions. Use of Django's User model is not mandatory, but many Django applications depend on it. Some of these customizations are possible using tricks or by manually modifying the contrib.auth source code, but these are not good solutions for novice users.

Ticket #3011 describes one approach that has been rejected - the idea of a 'pluggable' User model.

Note: This isn't a problem with an existing worked solution. A successful proposal on this project will require extensive discussion on django-developers.

Note: No, really -- this isn't a problem with a worked solution -- and the great solution you just thought of? It's been proposed before and rejected. This is a *HARD* project.

Issues to consider:

  • How can we represent the generic idea of a User without reducing the user table to little more than an identifying primary key?
  • How can we differentiate the ideas of identity, permission and authentication?
  • How can we manage the dependencies in contrib.admin (and other parts of Django core and django.contrib) that rely on the internals of auth.User as currently implemented?
  • How can we roll out a new/modified User model without requiring almost every Django application on the planet to undergo a complex database modification?

Improved error reporting

  • Complexity: Medium

The error messages raised by Django can sometimes be confusing or misleading. This is sometimes due to Django wrapping and re-raising errors when it shouldn't. Sometimes it's due to Django not displaying error information effectively. Sometimes it's simply a matter of not catching the right errors.

This should be fixed. Error messages are just as important to the development process as good documentation. This project would address the error reporting issues in Django to ensure that the errors reported by a Django project are as good as they can be.

Issues to consider:

  • Import errors discovered during application loading can be masked under certain circumstances.
  • Errors in template tags and filters rarely produce helpful error messages.
  • Mistakes in ModelForm and ModelAdmin definitions can raise errors that don't indicate the real problem.
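
A blanket "except ImportError" that re-raises a generic "app not found" error hides the case where the app exists but something it imports is broken. A minimal sketch of telling the two apart, using the `name` attribute that ImportError carries in Python 3.3+ and exception chaining to keep the original traceback; `AppLoadError` and `load_app` are hypothetical names, not Django's app-loading code.

```python
import importlib


class AppLoadError(Exception):
    """Hypothetical error for an installed app that cannot be loaded."""


def load_app(name):
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        missing = exc.name  # the module Python failed to find (may be None)
        if missing and (name == missing or name.startswith(missing + ".")):
            # The app itself (or its parent package) really does not exist.
            raise AppLoadError(f"No installed app named {name!r}") from exc
        # The app exists, but something it imports is broken: say so, and
        # chain the original error so its traceback is preserved.
        raise AppLoadError(f"App {name!r} exists but failed to import: {exc}") from exc
```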

Improve annotation and aggregation

  • Complexity: Medium

The 2009 Summer of Code added the annotate() and aggregate() calls to Django's query arsenal. While these tools work well for simple arithmetic aggregates, they don't work well for date and string based queries. There are also use cases where you may want to annotate data onto a model that *isn't* an aggregate (for example, annotating the sum of two other aggregates).

This project would continue where the 2009 GSoC aggregation project left off. This would be an excellent project for anyone wishing to gain an intimate understanding of Django's Query infrastructure.

Issues to consider:

  • String concatenation and manipulation (e.g., annotate a model with the uppercase version of the first 5 characters of someone's name)
  • Grouping of results by date (e.g., show me a count of articles, grouped by day)
  • Allowing non-null defaults in aggregation (e.g., when a model has no related objects, use 0 not NULL)
  • Aggregates involving generic relations
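
Whatever the eventual ORM API looks like, these features ultimately have to generate SQL along the following lines. This sketch uses the standard-library sqlite3 module and a hypothetical author/article schema; it shows target SQL for three of the cases above, not a proposed annotate()/aggregate() syntax. Date truncation and string functions are spelled differently on different databases (SQLite's date() is used here), which is part of what makes the ORM side hard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        published TEXT,  -- ISO 8601 timestamp
        words INTEGER
    );
    INSERT INTO author VALUES (1, 'alexandra'), (2, 'bo');
    INSERT INTO article VALUES
        (1, 1, '2011-03-01 09:15:00', 500),
        (2, 1, '2011-03-01 17:40:00', 800),
        (3, 1, '2011-03-02 08:00:00', 300);
""")

# Grouping by date: count articles per day by truncating the timestamp.
per_day = conn.execute(
    "SELECT date(published) AS day, COUNT(*) FROM article "
    "GROUP BY day ORDER BY day"
).fetchall()

# Non-null default: an author with no articles gets 0 rather than NULL.
totals = conn.execute(
    "SELECT author.name, COALESCE(SUM(article.words), 0) FROM author "
    "LEFT JOIN article ON article.author_id = author.id "
    "GROUP BY author.id ORDER BY author.id"
).fetchall()

# String manipulation: uppercase version of the first 5 characters of a name.
short_names = conn.execute(
    "SELECT UPPER(SUBSTR(name, 1, 5)) FROM author ORDER BY id"
).fetchall()

print(per_day)      # [('2011-03-01', 2), ('2011-03-02', 1)]
print(totals)       # [('alexandra', 1600), ('bo', 0)]
print(short_names)  # [('ALEXA',), ('BO',)]
```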

Multiple timezone support for datetime representation

  • Complexity: Medium

Currently, the TIME_ZONE setting allows PostgreSQL-backed installations to run projects/applications in time zones that differ from each other and from the server's system time zone. DateTime field values are retrieved from the database as naïve Python datetime instances; when the backend is PostgreSQL, the data sent to and retrieved from the database is adjusted according to the TIME_ZONE value.

But if you need to have:

  • date+time data to be entered in different locations using the local time
  • such data to be displayed in the local time of locations other than the one where it was originally entered,

then more granularity is needed, so that each date+time value inside one application can be handled in a way that takes its time zone into account.

An additional possibility would be to create an extra presentation layer, where a user's location/time zone preference can influence and personalize the display of date+time values (see the Django template filter idea in one of the related threads).
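
One common approach, sketched here only to illustrate the two requirements above: normalize every value to UTC when it is entered, and convert it to the viewer's preferred zone only when it is displayed. The helper names are hypothetical, and fixed offsets (UTC-3 and UTC+9) stand in for real zones so the example doesn't depend on a time zone database; a real implementation would need full zone rules, including daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets chosen for illustration only; real zones have DST rules.
BUENOS_AIRES = timezone(timedelta(hours=-3), "UTC-3")
TOKYO = timezone(timedelta(hours=9), "UTC+9")


def to_storage(local_naive, tz):
    """Interpret a naive time entered in zone `tz` and normalize it to UTC."""
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


def for_display(stored_utc, user_tz):
    """Convert a stored UTC value to a user's preferred zone for display."""
    return stored_utc.astimezone(user_tz)


# Entered at 09:00 local time in Buenos Aires on 1 March 2011...
stored = to_storage(datetime(2011, 3, 1, 9, 0), BUENOS_AIRES)
print(stored.isoformat())                      # 2011-03-01T12:00:00+00:00
# ...and shown to a user whose preference is Tokyo.
print(for_display(stored, TOKYO).isoformat())  # 2011-03-01T21:00:00+09:00
```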

Other advantages of a solution to this problem could be isolation from political changes to daylight saving time policy, and isolation from time zone changes should the hosting of a production application be moved from one geographical location to another.

Issues to consider:

  • Compatibility with all the DB backends officially supported by Django
  • Backwards compatibility: Existing installations shouldn't be affected at all regarding the storage/interpretation of DateTime model fields values

Customizable serialization

  • Complexity: Minor

Django's current serializer implementation imposes some restrictions that limit the usefulness of the serializers outside of fixture loading. The basic serialization format, for example, can't be changed.

The aim of this project would be to deliver a fully customizable serialization framework. Ideally, this would be a class-based structure that allows users to define their own serialization format (e.g., a different output structure, inclusion of non-model fields, etc.). The end goal is that you should be able to output any object (or list of objects), in any format, to any depth, with any additional information that might be relevant in a serialization context.

In short, anywhere we have made an arbitrary design decision with Django's existing serializers, that decision should be customizable as an end user.

When developing your proposal, the proof of concept is that you should be able to re-implement Django's existing serialization formats using your new serialization framework.
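
A minimal sketch of that proof of concept, assuming a hypothetical `Serializer` base class whose methods are override points and a plain-Python `User` stand-in for a model instance. `DjangoStyleUserSerializer` re-expresses the existing {"pk", "model", "fields"} JSON shape, while `CompactUserSerializer` emits a flat structure with a subset of attributes plus a non-database property.

```python
import json


class Serializer:
    """Hypothetical class-based serializer; each method is an override point."""

    fields = ()  # attribute names to include, database-backed or not

    def get_fields(self, obj):
        return {name: getattr(obj, name) for name in self.fields}

    def serialize_object(self, obj):
        return self.get_fields(obj)

    def serialize(self, objects):
        return json.dumps([self.serialize_object(obj) for obj in objects])


class User:
    """Plain-Python stand-in for a model instance."""

    def __init__(self, pk, username, first_name, last_name):
        self.pk = pk
        self.username = username
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):  # a non-database attribute
        return f"{self.first_name} {self.last_name}"


class DjangoStyleUserSerializer(Serializer):
    # Re-expresses the shape of Django's existing JSON format.
    fields = ("username", "first_name", "last_name")

    def serialize_object(self, obj):
        return {"pk": obj.pk, "model": "auth.user", "fields": self.get_fields(obj)}


class CompactUserSerializer(Serializer):
    # A different shape: flat, a subset of fields plus a property.
    fields = ("username", "full_name")


users = [User(1, "ada", "Ada", "Lovelace")]
print(DjangoStyleUserSerializer().serialize(users))
print(CompactUserSerializer().serialize(users))
```

Nested and context-dependent output (the User-within-model-X versus model-Y case) could follow the same pattern, with a serializer delegating a related object to a different serializer class depending on where it appears in the output tree.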

Issues to consider:

  • Serializing nested structures (of arbitrary depth)
  • Serializing subsets of model attributes
  • Serializing non-database attributes/properties
  • Serialized output that doesn't match the current default output format (i.e., a model in JSON doesn't have to be {"pk": XX, "model": "myapp.foo", "fields": {...}} )
  • Serialized output format that can change on a per-model basis
  • Serialized output format that can change based on where in the output tree the object is located (e.g., output the full User object if it's included from within model X, but only output the username if it's included from within model Y)
  • In an XML context, control over the tags, namespaces, attributes and nesting structures in the final XML
  • In a JSON/YAML context, control over the use of lists, dictionaries etc, as well as the choice of key names for dictionaries.

IPv6 support

  • Complexity: Minor

Django doesn't currently provide support for IPv6. This project would update Django to provide support for IPv6 wherever Django currently uses IPv4 addresses.

Issues to consider:

  • Can IPv6 support be added to model fields without adding a new field type? Add ipv6=False kwarg to IPAddressField?
  • Is there anywhere in the WSGI/FCGI interface where IPv6 issues exist but are currently unreported?
  • Can IPv6 support be added to configuration files (e.g., to specify memcache interfaces) in a transparent fashion?
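
A sketch of what the `ipv6=False` keyword option might mean for validation, using Python's standard ipaddress module (Python 3.3+); `validate_ip` is a hypothetical helper, not an existing Django validator. Normalizing to a canonical form matters because the same IPv6 address has many textual spellings (2001:DB8:0:0:0:0:0:1 and 2001:db8::1 are the same address), which affects lookups and uniqueness in the database.

```python
import ipaddress


def validate_ip(value, ipv6=False):
    """Hypothetical validator for an IPAddressField(ipv6=...) option.

    Returns the address in canonical form, or raises ValueError.
    """
    addr = ipaddress.ip_address(value)  # parses IPv4 or IPv6 text
    if addr.version == 6 and not ipv6:
        raise ValueError(f"{value!r} is an IPv6 address, but ipv6=False")
    return str(addr)  # IPv6 comes back lowercased and compressed


print(validate_ip("192.168.0.1"))
print(validate_ip("2001:DB8:0:0:0:0:0:1", ipv6=True))  # 2001:db8::1
```

Storage size is another consideration: a dotted-quad IPv4 address is at most 15 characters, while a fully written-out IPv6 address is 39.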

Best practices updates

  • Complexity: Moderate

Over the years, as Django has evolved, the idea of what constitutes "best practice" has also evolved. However, some parts of Django haven't kept up with those best practices. For example, contrib.comments and contrib.databrowse aren't deployable apps in the same sense as contrib.admin. As a result, these apps can't be (easily) deployed multiple times, and they can't use URL namespacing.

In addition, some features of Django's core have grown and evolved, and need refactoring. For example, validation is now performed in several places, but these checks don't hook into the core 'validate' command. In addition, many aspects of the core validate command should be farmed out to the things being validated (e.g., the max/min conditions on a field should be validated by the field, not by a third-party validator).

In short, Django has been bad at eating its own dogfood. The contents of contrib should be audited and updated to make sure they meet current best practices.

Issues to consider:

  • What components need to be updated, and why?
  • How to do this update while maintaining backwards compatibility?

Validation functionality revamping

  • Complexity: Moderate

This idea has some overlap with the previous one.

Django currently has a validation framework: A static, monolithic collection of checks implemented in Python code that is automatically executed before the syncdb or runserver commands and whose functionality is available through the validate management command. It is given the chance to inspect the model definitions of installed apps and can flag errors to the developer during the development phase.

There is, however, room to expand it and increase its usefulness. These are some ideas that have been proposed so far:

  • Add the concept of warnings, as opposed to the current hard errors. This would mean refactoring the code into a more generic framework, so that validation can be deferred to individual fields or to the database backend as required. Some scenarios where this would help developers by pointing out potential but non-fatal problems:
    • Some database backends reserve certain names for database columns (e.g. Oracle doesn't accept columns named date or number).
    • Some field names chosen by the developer can clash with names of ORM query lookups.
  • Provide a mechanism so applications can hook in and have their own validation code run at this point. See for example ticket #8579.
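
A minimal sketch of the warnings-plus-hooks idea, in plain Python with hypothetical names throughout (`Message`, `register`, `run_checks`, `OracleBackend`, `QUERY_LOOKUPS`); none of this is Django's current validation API. Fields check their own options, the backend flags reserved column names, and an app-registered check flags lookup-name clashes. Only ERROR-level messages would need to stop syncdb or runserver.

```python
from dataclasses import dataclass

ERROR, WARNING = "error", "warning"


@dataclass
class Message:
    level: str  # ERROR or WARNING
    text: str


# Hypothetical registry that third-party apps append their own checks to.
registered_checks = []


def register(check):
    registered_checks.append(check)
    return check


class Field:
    def __init__(self, name, max_length=None):
        self.name = name
        self.max_length = max_length

    def check(self):
        # The field validates its own options, not a central validator.
        if self.max_length is not None and self.max_length <= 0:
            return [Message(ERROR, f"{self.name}: max_length must be positive")]
        return []


class OracleBackend:
    # The reserved names mentioned above; a real list would be longer.
    reserved = {"date", "number"}

    def check_field(self, field):
        if field.name.lower() in self.reserved:
            return [Message(WARNING, f"{field.name}: reserved column name on Oracle")]
        return []


# Illustrative subset of ORM lookup names.
QUERY_LOOKUPS = {"exact", "iexact", "contains", "gt", "lt", "in"}


@register
def check_lookup_clashes(fields):
    # An app-supplied check, run after the built-in ones.
    return [Message(WARNING, f"{f.name}: clashes with a query lookup name")
            for f in fields if f.name in QUERY_LOOKUPS]


def run_checks(fields, backend):
    messages = []
    for field in fields:
        messages += field.check()
        messages += backend.check_field(field)
    for check in registered_checks:
        messages += check(fields)
    return messages


fields = [Field("title", max_length=0), Field("date"), Field("exact")]
messages = run_checks(fields, OracleBackend())
for m in messages:
    print(m.level, m.text)
has_errors = any(m.level == ERROR for m in messages)  # only errors would abort
```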

Javascript test framework

  • Complexity: Low

Django has an extensive test framework for Python code, a suite of tools to make server-side testing easier, and a project policy that no new code is added without tests. This has been a significant contributor to the stability of Django as a project.

However, Django also has client-side components, and these are not tested. Django doesn't currently have any systematic way to test Javascript. As a result, large parts of Django's public-facing code are not tested, and are prone to regressions and failures -- most notably, Django's admin, and the handling of inline form elements.

We need a set of tools that allow us to test the Javascript code that forms part of Django's codebase, and a set of tests to validate the behavior of contrib.admin's widgets (and other admin components).

Issues to consider:

  • How to handle cross-browser differences? Should we use a tool like Selenium to do live tests, or write genuine unit tests of Javascript as a scripting language?
  • How to clearly identify javascript tests at runtime? It should be possible to "just run the GUI tests" or "just run the code tests". This may tie into a broader requirement to differentiate "integration tests" (which validate that an app is installed correctly) from "system tests" (which validate that an app works correctly internally)
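
For the second question, one possibility is to label test classes and filter the suite before running it. This sketch uses only the standard unittest module; the `tag` decorator, the `select` helper and the test class names are hypothetical. The same labels could separate integration tests from system tests.

```python
import unittest


def tag(*tags):
    """Hypothetical decorator that labels a TestCase class with categories."""
    def decorate(cls):
        cls.tags = frozenset(tags)
        return cls
    return decorate


@tag("gui")
class InlineFormsetWidgetTests(unittest.TestCase):
    def test_add_another_row(self):
        # A real GUI test would drive a browser here (e.g. with Selenium).
        pass


@tag("code")
class SlugifyTests(unittest.TestCase):
    def test_lowercases(self):
        self.assertEqual("Hello".lower(), "hello")


def iter_tests(suite):
    """Flatten nested TestSuites into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select(suite, wanted):
    """Build a new suite with only the tests whose class carries `wanted`."""
    return unittest.TestSuite(
        t for t in iter_tests(suite) if wanted in getattr(type(t), "tags", ())
    )


loader = unittest.TestLoader()
everything = unittest.TestSuite(
    loader.loadTestsFromTestCase(cls)
    for cls in (InlineFormsetWidgetTests, SlugifyTests)
)
gui_only = select(everything, "gui")
print(gui_only.countTestCases())  # 1
```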

Schema Alteration

  • Complexity: Medium

Django has, for many years, lacked any kind of schema alteration (an idea fundamental to database migrations) in core. Projects like South have become very popular as they fill this gap, and so we're looking to try and bridge the gap and start merging some relevant functionality into Django.

In particular, schema alteration backends are the first step. Each database has different methods of changing tables, indexes, and constraints; South has code for the five most popular databases, but it's entirely separate from Django. The idea is to merge these backends into the core Django code (and the concept of a Django database backend), supplementing or replacing the "creation" modules with an "alteration" module.

Once these backends are merged in, the South codebase can be heavily simplified (leaving just features like autodetection and ORM versioning), and other migration frameworks become a lot easier to write, since the hard work of coping with missing features in MySQL and SQLite, and with each database's differing syntax, is already done.
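
A sketch of the per-backend idea, with hypothetical class and method names (this is not South's or Django's API): each backend returns the SQL for an operation and overrides only what differs. PostgreSQL can change a column type in place with ALTER COLUMN ... TYPE, MySQL uses MODIFY COLUMN, and SQLite cannot alter a column's type at all, so its editor rebuilds the table. The SQLite path is exercised against an in-memory database via the standard sqlite3 module.

```python
import sqlite3


class BaseSchemaEditor:
    """Hypothetical backend-neutral alteration API returning SQL statements."""

    def quote(self, name):
        return f'"{name}"'

    def add_column(self, table, column, col_type):
        # ADD COLUMN is spelled the same way on the backends shown here.
        return [f"ALTER TABLE {self.quote(table)} ADD COLUMN {self.quote(column)} {col_type}"]

    def alter_column_type(self, table, column, new_type, columns):
        raise NotImplementedError


class PostgresSchemaEditor(BaseSchemaEditor):
    def alter_column_type(self, table, column, new_type, columns):
        q = self.quote
        return [f"ALTER TABLE {q(table)} ALTER COLUMN {q(column)} TYPE {new_type}"]


class MySQLSchemaEditor(BaseSchemaEditor):
    def quote(self, name):
        return f"`{name}`"

    def alter_column_type(self, table, column, new_type, columns):
        # MODIFY restates the whole column definition.
        q = self.quote
        return [f"ALTER TABLE {q(table)} MODIFY COLUMN {q(column)} {new_type}"]


class SQLiteSchemaEditor(BaseSchemaEditor):
    def alter_column_type(self, table, column, new_type, columns):
        # No in-place type change: create a new table, copy rows, swap names.
        # `columns` maps every column name to its current definition.
        q = self.quote
        tmp = f"{table}__new"
        new_columns = {**columns, column: new_type}
        defs = ", ".join(f"{q(c)} {t}" for c, t in new_columns.items())
        names = ", ".join(q(c) for c in new_columns)
        return [
            f"CREATE TABLE {q(tmp)} ({defs})",
            f"INSERT INTO {q(tmp)} ({names}) SELECT {names} FROM {q(table)}",
            f"DROP TABLE {q(table)}",
            f"ALTER TABLE {q(tmp)} RENAME TO {q(table)}",
        ]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "book" ("id" INTEGER PRIMARY KEY, "pages" TEXT)')
conn.execute("INSERT INTO book VALUES (1, '320')")
columns = {"id": "INTEGER PRIMARY KEY", "pages": "TEXT"}
for sql in SQLiteSchemaEditor().alter_column_type("book", "pages", "INTEGER", columns):
    conn.execute(sql)
print(conn.execute("SELECT pages, typeof(pages) FROM book").fetchall())
```

A real rebuild would also have to recreate indexes, constraints and triggers, which is the kind of backend gap the issues below ask about.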

Issues to consider:

  • How would the current creation module in database backends be affected? Would you leave it as-is, or refactor it to use the new alteration code?
  • How will you deal with a lack of features in various backends? South has workarounds for some, but others, such as properly managing indexes, are very difficult.
  • How will you make sure the new API is flexible enough to work with not just South, but other current and future migration frameworks?

Integrate databrowse into the admin

  • Complexity: Medium

The BDFLs of Django suggested merging databrowse into the admin two years ago: see #8936. Jacob stated the goals clearly on #django-dev today: <jacobkm> Personally, I think merging the functionality into the admin would be pretty awesome -- it'd get rid of mostly vestigal code, and it'd also provide a feature ("read permissions") that people have been clamoring for for ages.

The first lines of the docs for databrowse highlight the current situation:

  • the second sentence, while attempting to illustrate the difference between the purposes of the admin and of databrowse, actually shows that they are very similar,
  • the app is described as new and unstable, and at the same time it suffers from several long-standing, sometimes trivial, bugs, showing a lack of maintenance.

Given this situation, there is a consensus, at least among those on #django-dev today, to pull databrowse out of Django.

Merging databrowse into the admin would also resolve a common complaint: currently, there is no easy way to give read-only access to the admin. Available permissions include "add", "change" and "delete", but not "view".

The first step is to evaluate precisely the functionality that databrowse provides. The docs only show how to configure it, so reading the source and experimenting is important. Then you will see what is missing from the current admin to reach equivalent functionality. The most obvious points are:

  • The "automatic browsability": date-based filters, automatic linking to related models, etc. These are possible in the admin but require code.
  • The read-only permissions. The concept is fairly simple. A good implementation should take care to hide all editing-related UI elements from users who do not have write permissions. Expressing this cleanly within the current admin app will require some work.
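
A sketch of how a view permission might decide which admin UI elements to render. The `User` stand-in, `visible_actions` and the element names are hypothetical; the "app_label.action_modelname" permission strings follow the format Django's has_perm() already uses, with "view" as the proposed new action.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Stand-in for auth.User: just a set of permission strings."""
    perms: set = field(default_factory=set)

    def has_perm(self, perm):
        return perm in self.perms


def visible_actions(user, app_label, model_name):
    """Return the admin UI elements to render for one model (hypothetical)."""
    def perm(action):
        return f"{app_label}.{action}_{model_name}"

    can_change = user.has_perm(perm("change"))
    # Users who can change a model can implicitly view it.
    can_view = can_change or user.has_perm(perm("view"))
    if not can_view:
        return set()  # the model is hidden entirely
    elements = {"changelist", "detail_page"}
    if user.has_perm(perm("add")):
        elements.add("add_button")
    if can_change:
        elements |= {"editable_widgets", "save_buttons"}
    if user.has_perm(perm("delete")):
        elements.add("delete_link")
    return elements


reader = User({"books.view_book"})
editor = User({"books.change_book", "books.add_book"})
print(sorted(visible_actions(reader, "books", "book")))
# ['changelist', 'detail_page']
```

Treating the change permission as implying view is one way to keep existing installations working unchanged: anyone who can edit today can still see everything.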

The following issues should be considered:

  • How can you improve the "browsability" of the admin (explore/show mode) while keeping it efficient as an editing interface (find/edit mode)?
  • How can you implement the "automatic browsability" described above while preserving the possibility to customize the display, like the current admin (fields, fieldsets, list_display, list_filter and friends)?
  • How will you provide a simple migration path for existing databrowse users?