Django

Ticket #6965 (closed: fixed)

Opened 8 months ago

Last modified 5 months ago

urlize should be faster

Reported by: floguy
Assigned to: andrewbadr
Milestone:
Component: Template system
Version: SVN
Keywords:
Cc: floguy@gmail.com, andrew@disqus.com
Triage Stage: Ready for checkin
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 0

Description

Running the urlize or urlizetrunc filter on very large text, or running it many times, seems to slow the system down more than it should.

Attachments

faster_urlize.diff (4.9 kB) - added by floguy on 04/05/08 15:18:04.
Improved speed to about 50x faster. Tests still pass.
faster_urlize_r7936_t6965.diff (1.3 kB) - added by andrewbadr on 07/16/08 15:49:12.
Makes urlize faster

Change History

04/05/08 05:00:56 changed by mrts

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

I'd be +1 on no email-to-mailto by default and adding an extra filter urlize_with_emails that supports email-to-mailto.

04/05/08 15:13:53 changed by floguy

  • cc set to floguy@gmail.com.
  • owner changed from nobody to floguy.
  • has_patch set to 1.
  • status changed from new to assigned.

04/05/08 15:18:04 changed by floguy

  • attachment faster_urlize.diff added.

Improved speed to about 50x faster. Tests still pass.

04/06/08 20:32:26 changed by SmileyChris

From a quick skim of your code, it seems like you're changing how the escaping works. What if a URL has an ampersand in it? Couldn't you potentially break the HTML if it trimmed to a limit halfway through an &amp;?
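The concern above can be illustrated with a small sketch. It uses Python's standard `html.escape` rather than Django's own escaping helpers, and the URL and the `limit` value are made up for the example; it only shows why the order of escaping and truncating matters, not what either patch actually does.

```python
from html import escape

# Hypothetical URL containing an ampersand.
url = "http://example.com/?a=1&b=2"
limit = 25  # hypothetical truncation limit, as urlizetrunc would take

# Escape first, then truncate: the cut can land inside "&amp;".
escaped = escape(url)
broken = escaped[:limit]

# Truncate first, then escape: every "&" is escaped whole.
safe = escape(url[:limit])

print(broken)  # ends with a dangling "&a"
print(safe)    # ends with a complete "&amp;b"
```

Truncating the raw text before escaping keeps entities intact, at the cost of the visible length being counted in unescaped characters.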

04/07/08 01:19:11 changed by floguy

  • needs_better_patch set to 1.

Great point! I'm going to have to think about how to overcome that, since it'll probably foil this whole current scheme.

05/02/08 11:17:30 changed by dnaquin@gmail.com

floguy, are you still working on this? Otherwise, I can take a stab at it.

05/06/08 02:58:48 changed by floguy

No, go ahead. I implemented it another way but it made the code almost as slow as the old version :-(

05/22/08 18:57:51 changed by devin

  • owner changed from floguy to devin.
  • status changed from assigned to new.

06/03/08 21:07:02 changed by devin

  • owner deleted.

I took a shot at this over the last couple of days, but the overhead seems to be largely in the regular expressions. punctuation_re especially seems clunky. However, I don't know enough about regular expressions to optimize them.

07/15/08 18:27:40 changed by andrewbadr

  • owner set to andrewbadr.
  • status changed from new to assigned.

07/16/08 15:49:12 changed by andrewbadr

  • attachment faster_urlize_r7936_t6965.diff added.

Makes urlize faster

07/16/08 15:54:45 changed by andrewbadr

  • cc changed from floguy@gmail.com to floguy@gmail.com, andrew@disqus.com.
  • needs_better_patch deleted.
  • component changed from Uncategorized to Template system.

I created a patch that skips the regex match for most words. This resulted in a 10x speed improvement on a test set of real posts. The reason for checking for '@' and ':' on line 98 is that I don't want to change the behavior for e.g. http://localhost/, even though it's not clear whether this should be supported or not (I don't think so). The len call was removed for good measure.
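A minimal sketch of the idea described above, not the attached patch itself: a cheap substring test runs before the expensive regex, so ordinary words never reach the regex engine. The names `might_be_link`, `find_links` and `SIMPLE_URL_RE` are hypothetical, the pattern is a deliberately simplified stand-in for the regexes urlize actually uses, and the '.' check is an assumption of this sketch (the comment only mentions '@' and ':').

```python
import re

# Hypothetical, simplified pattern; the real filter uses several regexes.
SIMPLE_URL_RE = re.compile(r'^(https?://\S+|www\.\S+|\S+@\S+\.\S+)$')


def might_be_link(word):
    """Cheap pre-check: only words with one of these characters can match."""
    # ':' keeps http://localhost/ eligible even though it contains no '.'.
    return '.' in word or '@' in word or ':' in word


def find_links(text):
    """Return the words in text that look like URLs or email addresses."""
    links = []
    for word in text.split():
        if not might_be_link(word):
            continue  # skip the regex entirely for plain words
        if SIMPLE_URL_RE.match(word):
            links.append(word)
    return links


print(find_links("see http://localhost/ or www.example.com, mail me@example.com"))
```

Since most words in ordinary prose contain none of these characters, the regex runs on only a small fraction of the input, which is where a speedup of this kind would come from.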

07/16/08 18:24:13 changed by SmileyChris

  • stage changed from Unreviewed to Ready for checkin.

Patch looks good. Existing tests will be enough, so I think it's ready to rock.

07/19/08 13:05:22 changed by mtredinnick

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [7985]) Fixed #6965 -- Sped up the urlize and urlizetrunc filters. A nice patch from Andrew Badr.

