| 84 | Repository data/dumps |
| 85 | ===================== |
| 86 | |
| 87 | Data and dumps from our source control repository. You could use this to mine |
| 88 | information about who's committing code, when, etc. |
| 89 | |
| 90 | There are a few ways of accessing this data: `Querying the SVN repo`_, |
| 91 | `the GitHub API`_, and `SVN data dumps`_ in a variety of formats. |
| 92 | |
| 93 | Querying the SVN repo |
| 94 | --------------------- |
| 95 | |
| 96 | Django's SVN repository is at http://code.djangoproject.com/svn/django/; |
| 97 | you can use the ``svn`` client binary to interact with this as a sort of "API". |
| 98 | In particular, most ``svn`` commands take a ``--xml`` argument to return data |
| 99 | in XML. For example, to get information about a particular commit you might |
| 100 | do something like:: |
| 101 | |
| 102 | $ svn log http://code.djangoproject.com/svn/django/trunk -r1234 --xml |
| 103 | <?xml version="1.0"?> |
| 104 | <log> |
| 105 | <logentry |
| 106 | revision="1234"> |
| 107 | <author>jacob</author> |
| 108 | <date>2005-11-14T18:50:13.298556Z</date> |
| 109 | <msg>Added NOINDEX tag to debug 500 page (for robots)</msg> |
| 110 | </logentry> |
| 111 | </log> |
| 112 | |
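The XML output is straightforward to consume from a script. The following is a minimal sketch, assuming Python's standard ``xml.etree.ElementTree`` module and the sample output shown above; the helper name ``parse_svn_log`` is hypothetical and not part of any Django or Subversion tooling.

```python
import xml.etree.ElementTree as ET

# The sample output shown above, pasted in as a string for illustration.
SAMPLE_LOG = """<?xml version="1.0"?>
<log>
<logentry
   revision="1234">
<author>jacob</author>
<date>2005-11-14T18:50:13.298556Z</date>
<msg>Added NOINDEX tag to debug 500 page (for robots)</msg>
</logentry>
</log>
"""


def parse_svn_log(xml_text):
    """Return one dict per <logentry> in ``svn log --xml`` output.

    Hypothetical helper, written for this example only.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("logentry"):
        entries.append({
            "revision": int(entry.get("revision")),
            # findtext() returns None when a child element is absent.
            "author": entry.findtext("author"),
            "date": entry.findtext("date"),
            "msg": entry.findtext("msg"),
        })
    return entries


if __name__ == "__main__":
    for item in parse_svn_log(SAMPLE_LOG):
        print(item["revision"], item["author"], item["msg"])
```

To run this against live data rather than the pasted sample, one option is to capture the output of the ``svn log ... --xml`` command above with ``subprocess.run(..., capture_output=True, text=True)`` and pass its ``stdout`` to the helper; that assumes the ``svn`` client is installed and on your path.
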
| 113 | There are also a number of libraries in Python (and other languages) that can |
| 114 | access SVN directly. `pysvn`_ seems to be a popular choice. |
| 115 | |
| 116 | .. _pysvn: http://pysvn.tigris.org/ |
| 117 | |
| 118 | The GitHub API |
| 119 | -------------- |
| 120 | |
| 121 | Django's repository is mirrored onto GitHub (http://github.com/django/django), |
| 122 | which means you can use `GitHub's API`_ to pull commit data. For example:: |
| 123 | |
| 124 | $ curl -i https://api.github.com/repos/django/django/git/commits/a0d59b49019d65b38c5612eb0b4fab0bb37271ae |
| 125 | HTTP/1.1 200 OK |
| 126 | Server: nginx/1.0.4 |
| 127 | Date: Wed, 07 Sep 2011 16:38:12 GMT |
| 128 | Content-Type: application/json |
| 129 | Connection: keep-alive |
| 130 | Status: 200 OK |
| 131 | X-RateLimit-Limit: 5000 |
| 132 | X-RateLimit-Remaining: 4994 |
| 133 | Content-Length: 995 |
| 134 | |
| 135 | { |
| 136 | "parents": [ |
| 137 | { |
| 138 | "url": "https://api.github.com/repos/django/django/git/commits/6465e005fd564bd75ba64f2f09d5824ed2455c9c", |
| 139 | "sha": "6465e005fd564bd75ba64f2f09d5824ed2455c9c" |
| 140 | } |
| 141 | ], |
| 142 | "committer": { |
| 143 | "date": "2005-11-14T10:50:13-08:00", |
| 144 | "name": "jacob", |
| 145 | "email": "jacob@bcc190cf-cafb-0310-a4f2-bffc1f526a37" |
| 146 | }, |
| 147 | "author": { |
| 148 | "date": "2005-11-14T10:50:13-08:00", |
| 149 | "name": "jacob", |
| 150 | "email": "jacob@bcc190cf-cafb-0310-a4f2-bffc1f526a37" |
| 151 | }, |
| 152 | "message": "Added NOINDEX tag to debug 500 page (for robots)\n\ngit-svn-id: http://code.djangoproject.com/svn/django/trunk@1234 bcc190cf-cafb-0310-a4f2-bffc1f526a37\n", |
| 153 | "url": "https://api.github.com/repos/django/django/git/commits/a0d59b49019d65b38c5612eb0b4fab0bb37271ae", |
| 154 | "sha": "a0d59b49019d65b38c5612eb0b4fab0bb37271ae", |
| 155 | "tree": { |
| 156 | "url": "https://api.github.com/repos/django/django/git/trees/a5d296a396f5bbf70d074ce09fa947f95cd91523", |
| 157 | "sha": "a5d296a396f5bbf70d074ce09fa947f95cd91523" |
| 158 | } |
| 159 | } |
| 160 | |
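Because the mirror is produced by ``git-svn``, each commit message ends with a ``git-svn-id`` line that records the original SVN revision, as the ``message`` field above shows. The following is a rough sketch of how a decoded response might be reduced to a few fields and mapped back to its SVN revision. It assumes the response shape shown above; ``summarize_commit`` and the regular expression are illustrative, not part of GitHub's API. The sample is a trimmed copy of the response so that the example runs without network access.

```python
import json
import re

# Matches the trailer that git-svn appends to a message, for example
# "git-svn-id: http://.../trunk@1234 <repository-uuid>".
GIT_SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) ", re.MULTILINE)


def summarize_commit(payload):
    """Pick out a few fields from a decoded commit object.

    Hypothetical helper; the keys follow the sample response above.
    """
    message = payload["message"]
    match = GIT_SVN_ID.search(message)
    return {
        "sha": payload["sha"],
        "author": payload["author"]["name"],
        "date": payload["author"]["date"],
        # First line of the commit message only.
        "subject": message.splitlines()[0],
        # None when the message carries no git-svn-id trailer.
        "svn_revision": int(match.group(1)) if match else None,
    }


# Trimmed copy of the response shown above (only the fields used here).
SAMPLE = json.loads(r"""
{
  "author": {"date": "2005-11-14T10:50:13-08:00", "name": "jacob"},
  "message": "Added NOINDEX tag to debug 500 page (for robots)\n\ngit-svn-id: http://code.djangoproject.com/svn/django/trunk@1234 bcc190cf-cafb-0310-a4f2-bffc1f526a37\n",
  "sha": "a0d59b49019d65b38c5612eb0b4fab0bb37271ae"
}
""")

if __name__ == "__main__":
    print(summarize_commit(SAMPLE))
```
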
| 161 | .. _github's api: http://developer.github.com/v3/ |
| 162 | |
| 163 | SVN data dumps |
| 164 | -------------- |
| 165 | |
| 166 | Finally, for convenience, we provide a couple of full dumps of repository data |
| 167 | for off-line processing: |
| 168 | |
| 169 | * `Complete SVN log`_ (bzipped XML; ~1 MB). This is the complete output of |
| 170 | ``svn log --xml``. |
| 171 | |
| 172 | * `Full SVN dump`_ (bzipped SVN dump; ~200 MB, expands to ~1.8 GB). This |
| 173 | is the output of ``svnadmin dump``. |
| 174 | |
| 175 | Each dump is updated nightly. |
| 176 | |
| 177 | .. _complete svn log: https://www.djangoproject.com/m/data/django-svn-log.xml.bz2 |
| 178 | .. _full svn dump: https://www.djangoproject.com/m/data/django-svn.svndump.bz2 |
| 179 | |
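As an illustration of off-line processing, the complete SVN log can be read without decompressing it to disk first. The following is a minimal sketch, assuming the file has already been downloaded and contains ``svn log --xml`` output as described above; ``commits_per_author`` is a hypothetical helper name.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def commits_per_author(path):
    """Count <logentry> elements per author in a bzipped ``svn log --xml`` file.

    Hypothetical helper, written for this example only.
    """
    counts = Counter()
    # bz2.open() decompresses on the fly; iterparse() reads incrementally.
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "logentry":
                # Entries without an <author> child are grouped under "(none)".
                counts[elem.findtext("author") or "(none)"] += 1
                # Drop the entry's children and text once it has been counted.
                elem.clear()
    return counts
```

With the log saved locally as ``django-svn-log.xml.bz2``, a call such as ``commits_per_author("django-svn-log.xml.bz2").most_common(10)`` would list the ten most frequent committers.
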