Opened 6 years ago

Closed 6 years ago

#6965 closed (fixed)

urlize should be faster

Reported by: floguy
Owned by: andrewbadr
Component: Template system
Version: master
Severity:
Keywords:
Cc: floguy@…, andrew@…
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

Running the urlize or urlizetrunc filter on very large text, or running it many times, slows the system down more than it should.

Attachments (2)

faster_urlize.diff (4.9 KB) - added by floguy 6 years ago.
Improved speed to about 50x faster. Tests still pass.
faster_urlize_r7936_t6965.diff (1.3 KB) - added by andrewbadr 6 years ago.
Makes urlize faster

Change History (14)

comment:1 Changed 6 years ago by mrts

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I'd be +1 on no email-to-mailto by default and adding an extra filter urlize_with_emails that supports email-to-mailto.

comment:2 Changed 6 years ago by floguy

  • Cc floguy@… added
  • Has patch set
  • Owner changed from nobody to floguy
  • Status changed from new to assigned

Changed 6 years ago by floguy

Improved speed to about 50x faster. Tests still pass.

comment:3 Changed 6 years ago by SmileyChris

From a quick skim of your code, it seems like you're changing how the escaping works. What if a URL has an ampersand in it? Couldn't you break the HTML if it trimmed to a limit halfway through an escaped ampersand (&amp;)?
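
To illustrate the concern, here is a minimal sketch, not taken from either attached patch. The helper names and the example URL are hypothetical, and it uses the standard library's html.escape rather than Django's own escaping. It compares trimming after escaping with trimming before escaping:

```python
from html import escape


def trim_after_escaping(url, limit):
    """Escape first, then cut: the cut can land inside an entity such as &amp;."""
    return escape(url)[:limit]


def trim_before_escaping(url, limit):
    """Cut the raw text first, then escape: every entity stays whole."""
    return escape(url[:limit])


if __name__ == "__main__":
    url = "http://example.com/?a=1&b=2"  # hypothetical URL with an ampersand
    print(trim_after_escaping(url, 25))
    print(trim_before_escaping(url, 25))
```

With a limit that falls just after the ampersand, the first helper returns text ending in a partial entity, while the second always returns well-formed escaped text.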

comment:4 Changed 6 years ago by floguy

  • Patch needs improvement set

Great point! I'm going to have to think about how to overcome that, since it'll probably foil this whole current scheme.

comment:5 Changed 6 years ago by dnaquin@…

floguy, are you still working on this? Otherwise, I can take a stab at it.

comment:6 Changed 6 years ago by floguy

No, go ahead. I implemented it another way but it made the code almost as slow as the old version :-(

comment:7 Changed 6 years ago by devin

  • Owner changed from floguy to devin
  • Status changed from assigned to new

comment:8 Changed 6 years ago by devin

  • Owner devin deleted

I took a shot at this over the last couple of days, but the overhead seems to be largely in the regular expressions. punctuation_re especially seems clunky. However, I don't know enough about regular expressions to optimize them.

comment:9 Changed 6 years ago by andrewbadr

  • Owner set to andrewbadr
  • Status changed from new to assigned

Changed 6 years ago by andrewbadr

Makes urlize faster

comment:10 Changed 6 years ago by andrewbadr

  • Cc andrew@… added
  • Component changed from Uncategorized to Template system
  • Patch needs improvement unset

I created a patch that skips the regex match for most words. This resulted in a 10x speed improvement on a test set of real posts. The reason for checking for '@' and ':' on line 98 is that I don't want to change the behavior for e.g. http://localhost/, even though it's not clear whether this should be supported or not (I don't think so). The len call was removed for good measure.
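
The idea of skipping the regex for most words can be sketched roughly as follows. This is not the attached patch: URL_LIKE_RE is a simplified, hypothetical stand-in for the real patterns, the function names are made up, and the '.' check is an assumption added alongside the '@' and ':' checks mentioned above.

```python
import re

# Hypothetical, simplified stand-in for the expensive per-word pattern.
URL_LIKE_RE = re.compile(
    r"^(?:https?://\S+|www\.\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+)$"
)


def might_be_link(word):
    """Cheap substring test; only words that pass it reach the regex."""
    # ':' keeps dotless URLs such as http://localhost/ on the slow path,
    # and '@' does the same for email addresses.
    return "." in word or "@" in word or ":" in word


def find_links(text):
    """Return the words in text that look like URLs or email addresses."""
    links = []
    for word in text.split():
        if not might_be_link(word):
            continue  # fast path: ordinary words never touch the regex
        if URL_LIKE_RE.match(word):
            links.append(word)
    return links
```

Because most words in ordinary prose contain none of these characters, the regex runs on only a small fraction of the input.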

comment:11 Changed 6 years ago by SmileyChris

  • Triage Stage changed from Unreviewed to Ready for checkin

Patch looks good. Existing tests will be enough, so I think it's ready to rock.

comment:12 Changed 6 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [7985]) Fixed #6965 -- Sped up the urlize and urlizetrunc filters. A nice patch from Andrew Badr.
