Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Re: Long url to tiny url.

by RMGir (Prior)
on Jul 07, 2007 at 09:38 UTC ( #625401=note: print w/replies, xml ) Need Help??

in reply to Long url to tiny url.

There are a lot of different ways it could be done.

But mainly, all you need is a database with 2 fields: realURL and tinyURL.

When someone submits a realURL asking for a tinyURL, you check to see if that realURL is already there.

If so, you give them the tinyURL that already exists for it.

If not, you use some random method to generate tinyURLs until you get one that's not already in your database, insert (realURL, tinyURL) in the database, and give them the new tinyURL.

When a tinyURL request comes in to your website, you generate an http-redirect request to the realURL.

Actually generating random strings is pretty easy -- you could just have an array of "printable" characters like

my @printable=('a'..'z','A'..'Z','0'..'9');
, and use
my $tiny=join "",map {$printable[rand(scalar @printable)]} 1..10;
to generate "tiny" strings until you get one that's not in your table already...


Replies are listed 'Best First'.
Re^2: Long url to tiny url.
by almut (Canon) on Jul 07, 2007 at 10:03 UTC
    If not, you use some random method to generate tinyURLs until you get one that's not already in your database ...

    Personally (i.e. I don't know if they are doing that way), I would just count issued tinyURLs (the count might come directly from the database), and represent that number in some number base, e.g. 62 for the printable chars you've specified.

    The advantage would be that it's simpler in the following ways: (1) I wouldn't have to check whether some randomly generated number is already in use, and (2) would be able to have the number of required digits grow just as needed, so the tinyURL is always as compact as possible. (With a random number, I'd have to choose ahead what width to use. Also, as full utilisation of the set is being approached, searching for a remaining free number would require more and more iterations...).

      If you just issue the next available alphanumeric sequence, as you say, the URL is always as small as possible. But the problem with that, or any other predictable sequence, is that it's open to abuse. Conversely, if you use a random number generator with a short period, as you say you'll get lots more collisions, while a long period means the URLs will be longer than necessary.


        Good point! (It always surprises me how you monks can turn anything into an interesting discussion :)

        Another way to deal with the problem of predictability would be to apply some 'salting' mechanism. For example, in its most simple case, prepend (or append) a random digit to the number, and use that to XOR the other digits (many other algorithms are conceivable, of course). That would considerably reduce predictability, while retaining most of the desirable properties of the simple counting approach.

        Yet another way would be to just check every tinyURL against a list of potentially vulgar or otherwise abusive words, and, if found, just skip that particular tinyURL. The advantage of that would also be that innocent users of the service wouldn't accidentally be assigned URLs, which might require them to add apologies or further explanations when sending out such an URL...

      Very nice. That would work quite well, yes.

      A better variant might be to seed some random number generator with a very long period, then generate the output strings -- that way, the consecutive tinyurl's wouldn't be predictable, and it would still be just as cheap to pre-generate them.

      If the generator can take its output as its seed, then all you need to do is save the last output value so you can restart the generator when it's time to generate the next set...


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://625401]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2021-06-13 09:36 GMT
Find Nodes?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)

    Results (54 votes). Check out past polls.