PerlMonks  

Re^3: Imploding URLs

by TilRMan (Friar)
on Jun 10, 2005 at 04:44 UTC (#465414)


in reply to Re^2: Imploding URLs
in thread Imploding URLs

So though I have a large collection of URLs (from my logs), I need to "implode" URLs on a one-by-one basis.

Why? Are the space savings that significant?

If all you have is a handful of substitutions, you can probably hand-pick the strings:

http www. .com .org .net :// index .htm .jpg .gif google yahoo mail news ebay
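As a rough illustration of what imploding with a hand-picked list might look like, here is a minimal Python sketch (not from the thread) that maps each string above to a one-character token and back. The token alphabet, ASCII control characters 0x01 to 0x0F, is an assumption: it only works if those characters never appear in the URLs. Strings are substituted in list order, so the order of the list matters.

```python
# Minimal sketch: "implode" URLs by replacing hand-picked substrings with
# one-character tokens, then "explode" them back.
# ASSUMPTION: the tokens (ASCII control chars 0x01-0x0F) never occur in URLs.

SUBSTRINGS = [
    "http", "www.", ".com", ".org", ".net", "://", "index",
    ".htm", ".jpg", ".gif", "google", "yahoo", "mail", "news", "ebay",
]

# Pair each substring with its own single-character token.
TABLE = [(s, chr(i + 1)) for i, s in enumerate(SUBSTRINGS)]


def implode(url: str) -> str:
    """Replace each known substring with its token, in table order."""
    for sub, tok in TABLE:
        url = url.replace(sub, tok)
    return url


def explode(packed: str) -> str:
    """Expand every token back into the substring it stands for."""
    for sub, tok in TABLE:
        packed = packed.replace(tok, sub)
    return packed


if __name__ == "__main__":
    url = "http://www.google.com/index.htm"
    packed = implode(url)
    print(len(url), "->", len(packed))  # prints: 31 -> 8
    assert explode(packed) == url
```

Because no substring in the list contains a token character, expansion cannot create new tokens, so `explode` is an exact inverse for any URL free of control characters.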

Re^4: Imploding URLs
by mobiGeek (Beadle) on Jun 11, 2005 at 03:54 UTC
    Yes, the savings are quite significant. Using a hand-selected list based on one user's habits, I was able to reduce the size of some pages by more than 20%.

    The other point is that this is not a collection of URLs from across the entire web. The URLs being crawled vary, but the proxy is part of a kind of "portal", so although there are potentially thousands of URLs, they come from a select list of sites. That is why I am looking for a way to weight substrings by how often they occur.

    If one URL or one particular site (i.e., a particular substring) is crawled extremely frequently, then imploding that string might save much more bandwidth than simply imploding "http://" on all URLs.

    mG.
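The frequency weighting described in this reply could be approximated greedily: collect every candidate substring from the logged URLs, weight each URL by how often it was served, score each candidate by the characters it would save, keep the best one, rewrite the log with its token, and repeat. The Python sketch below is one possible approach, not the method used in the thread. It assumes a log given as a list of served URLs (with repeats), single-character control-character tokens that never occur in URLs, and hypothetical names (`pick_substrings`, `implode`, `explode`) and sample data. It also ignores the cost of sending the substitution table itself.

```python
# Minimal sketch of frequency-weighted substring selection for imploding URLs.
# ASSUMPTIONS: the log lists every served URL (repeats = popularity), tokens
# are ASCII control chars absent from URLs, and each replaced occurrence
# shrinks to exactly one character.
from collections import Counter


def saving(sub, weighted):
    """Characters saved if every occurrence of sub becomes a 1-char token."""
    hits = sum(n * url.count(sub) for url, n in weighted.items())
    return hits * (len(sub) - 1)


def pick_substrings(log, k, min_len=3, max_len=20):
    """Greedily choose up to k (substring, token) pairs, best saving first."""
    weighted = dict(Counter(log))  # url -> times it was served
    table, used = [], set()
    for i in range(k):
        tok = chr(i + 1)
        cands = {
            url[s:s + n]
            for url in weighted
            for s in range(len(url))
            for n in range(min_len, max_len + 1)
            if s + n <= len(url)
        }
        # Never pick a substring that contains an already-assigned token.
        cands = [c for c in cands if used.isdisjoint(c)]
        if not cands:
            break
        best = max(cands, key=lambda c: (saving(c, weighted), c))
        if saving(best, weighted) <= 0:
            break
        table.append((best, tok))
        used.add(tok)
        # Rewrite the log so the next round only scores what is left.
        weighted = {url.replace(best, tok): n for url, n in weighted.items()}
    return table


def implode(url, table):
    """Apply substitutions in the order they were selected."""
    for sub, tok in table:
        url = url.replace(sub, tok)
    return url


def explode(packed, table):
    """Undo substitutions, last one first."""
    for sub, tok in reversed(table):
        packed = packed.replace(tok, sub)
    return packed


if __name__ == "__main__":
    # Hypothetical portal log: few sites, some pages hit far more often.
    log = (["http://portal.example.com/news/today.htm"] * 50
           + ["http://portal.example.com/mail/inbox.htm"] * 30
           + ["http://www.example.org/index.htm"] * 5)
    table = pick_substrings(log, k=4)
    for sub, tok in table:
        print(repr(tok), sub)
    before = sum(len(u) for u in log)
    after = sum(len(implode(u, table)) for u in log)
    print(before, "->", after)
    assert all(explode(implode(u, table), table) == u for u in log)
```

Because each round rescores against the already-rewritten log, characters claimed by an earlier pick are not counted again for overlapping candidates, which is why `implode` must apply the table in selection order. The length cap (`max_len`) and the number of tokens (`k`) are arbitrary knobs in this sketch.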
