Version 2 (modified by jacob, 3 years ago)

Django's contribution data

Inspired by David Eaves, here's some information on accessing Django's contribution data.

Please take it, mash it up, and show us the results!

If there's other data you'd like to see, please get in touch (jacob -at- jacobian.org) and let me know. I'll do my best!

Trac's database

This data comes from Trac, our ticket-tracking software.

There are two ways to access the data: the daily data dumps and Trac's RPC interface.

Daily data dumps

These are direct dumps of the Trac database, collected nightly, in various formats. They're sanitized to remove tables containing sensitive information (session data, etc.) but are otherwise complete.

Dumps are currently available in the following formats:

  • CSV (tar'd & bzipped directory; one CSV file per table; ~35MB).
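As a starting point, here's a minimal Python 3 sketch for reading one table straight out of the compressed archive without unpacking it to disk. It assumes that each table is stored as a file named after the table (for example ticket.csv) somewhere inside the archive, that each CSV file starts with a header row of column names, and that the text is UTF-8; check these assumptions against the actual dump. The function name read_dump_table and the archive filename in the usage comment are hypothetical.

```python
import csv
import io
import tarfile


def read_dump_table(archive_file, table_name):
    """Return the rows of one dumped table as a list of dicts.

    `archive_file` is a binary file object opened on the downloaded .tar.bz2.
    Assumption: the table lives in a member named "<table_name>.csv", and its
    first row is a header with the column names.
    """
    wanted = f"{table_name}.csv"
    with tarfile.open(fileobj=archive_file, mode="r:bz2") as archive:
        for member in archive.getmembers():
            # Compare only the file name, ignoring the enclosing directory.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == wanted:
                raw = archive.extractfile(member)
                # Assumption: the CSV files are UTF-8 encoded.
                with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                    return list(csv.DictReader(text))
    raise KeyError(f"no {wanted} found in the archive")


# Example usage (the filename is hypothetical):
# with open("trac-dump.tar.bz2", "rb") as f:
#     tickets = read_dump_table(f, "ticket")
```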

The database schema is documented at http://trac.edgewall.org/wiki/TracDev/DatabaseSchema. The most interesting tables are probably ticket and ticket_change. ticket_change, in particular, records every change ever made to a ticket, so it probably holds some of the most interesting data available.
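For instance, once ticket_change is loaded as a list of dicts (e.g. with csv.DictReader), a few lines of Python can summarize it. This sketch assumes the column names from the Trac schema page above (ticket, time, author, field, oldvalue, newvalue). In that schema a single saved edit that touches several fields writes one row per field, all sharing the same ticket and time, so the per-author count de-duplicates on (ticket, time, author) to count edits rather than rows. The function names are illustrative, not part of any Trac API.

```python
from collections import Counter


def count_new_values(rows, field):
    """Count how often each value was set for one ticket field.

    With field="resolution" this tallies how tickets were closed
    (fixed, wontfix, duplicate and so on).
    """
    return Counter(row["newvalue"] for row in rows if row["field"] == field)


def edits_per_author(rows):
    """Count distinct edits per author.

    One save can write several rows (one per changed field) that share the
    same ticket and time, so de-duplicate on (ticket, time, author) first.
    """
    edits = {(row["ticket"], row["time"], row["author"]) for row in rows}
    return Counter(author for _ticket, _time, author in edits)


# Example usage (assumes `changes` holds the ticket_change rows):
# print(count_new_values(changes, "resolution").most_common(5))
# print(edits_per_author(changes).most_common(10))
```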

Trac's RPC interface

Trac has XML-RPC and JSON-RPC interfaces. You can view some documentation of these APIs at:

https://code.djangoproject.com/xmlrpc

Note

You'll need to be logged in to access this page and to access the data. If you need to create an account, the sign-up page is at https://www.djangoproject.com/accounts/register/.

The base URL for both the XML-RPC and JSON-RPC APIs is:

https://{username}:{password}@code.djangoproject.com/login/rpc
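The JSON-RPC side can be reached with nothing but Python's standard library. Here's a minimal sketch; it assumes the request shape used by Trac's XmlRpcPlugin (a POST with Content-Type: application/json and a body of {"method": ..., "params": [...], "id": ...}) and a reply carrying "result" and "error" members, so check both against the documentation page above. Rather than embedding USERNAME:PASSWORD in the URL, it sends the credentials in an HTTP Basic Authorization header, which also avoids having to escape characters such as @ in passwords. The helper names build_jsonrpc_request and call are hypothetical.

```python
import base64
import itertools
import json
import urllib.request

# The base URL from above, without credentials embedded in it.
RPC_URL = "https://code.djangoproject.com/login/rpc"

# Each JSON-RPC request carries an id so replies can be matched to requests.
_request_ids = itertools.count(1)


def build_jsonrpc_request(method, params, username, password, url=RPC_URL):
    """Build (but do not send) a JSON-RPC POST request.

    Assumption: the body shape follows Trac's XmlRpcPlugin conventions.
    """
    body = json.dumps({
        "method": method,
        "params": list(params),
        "id": next(_request_ids),
    }).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def call(method, *params, username, password):
    """Send one request and return the "result" member of the reply."""
    request = build_jsonrpc_request(method, params, username, password)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]


# Example usage (needs a real account; read-only call):
# ticket = call("ticket.get", 1337, username="USERNAME", password="PASSWORD")
```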

The easiest way to access these APIs is with Python's built-in XML-RPC client (the xmlrpc.client module in Python 3, xmlrpclib in Python 2). Here's a quick example:

>>> import xmlrpc.client  # Python 3; on Python 2 this module is called xmlrpclib
>>> rpc_url = "https://USERNAME:PASSWORD@code.djangoproject.com/login/rpc"
>>> trac = xmlrpc.client.ServerProxy(rpc_url)

# Get a single ticket's info.
>>> ticket, time_created, time_changed, attributes = trac.ticket.get(1337)
>>> attributes['resolution']
'wontfix'

# Perform a search: count the open (i.e. not closed) tickets.
# The query syntax is documented at http://trac.edgewall.org/wiki/TracQuery#QueryLanguage
>>> not_closed = trac.ticket.query('status=!closed&max=5000')
>>> len(not_closed)  # the count at the time of writing
1850
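Calling trac.ticket.get() once per ticket costs one HTTP round trip per ticket, which gets slow for a query that returns well over a thousand IDs. Here's a minimal Python 3 sketch that batches the calls with xmlrpc.client.MultiCall; it assumes the server supports the standard system.multicall method. The fetch_tickets name and the default batch size of 100 are arbitrary choices for illustration, not part of Trac's API.

```python
import xmlrpc.client


def fetch_tickets(trac, ticket_ids, batch_size=100):
    """Fetch attributes for many tickets, batching calls via system.multicall.

    `trac` is an xmlrpc.client.ServerProxy for the RPC URL above.
    Returns a dict mapping ticket id -> attributes dict.
    """
    tickets = {}
    ids = list(ticket_ids)
    for start in range(0, len(ids), batch_size):
        batch = xmlrpc.client.MultiCall(trac)
        for ticket_id in ids[start:start + batch_size]:
            batch.ticket.get(ticket_id)  # queued locally, not sent yet
        # Calling the batch sends one request. Each result has the same
        # shape as trac.ticket.get(): [id, time_created, time_changed, attributes].
        for ticket_id, _created, _changed, attributes in batch():
            tickets[ticket_id] = attributes
    return tickets


# Example usage (needs a real account):
# trac = xmlrpc.client.ServerProxy(
#     "https://USERNAME:PASSWORD@code.djangoproject.com/login/rpc")
# open_ids = trac.ticket.query("status=!closed&max=5000")
# open_tickets = fetch_tickets(trac, open_ids)
```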

Please be careful here. Some of these APIs write data, and using them could look like spam, so please ask me (jacob -at- jacobian.org) for permission before using them!

Mashups

If you create a mashup, please add it here!

Questions?

If you've got questions, please contact Jacob Kaplan-Moss (jacob -at- jacobian.org).