valavanp has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks,
I need to convert a long url into tiny url.
For example in http://tinyurl.com, when you enter www.yahoo.com it makes into tinyurl as http://tinyurl.com/1hm. when you click this link it automatically goes to yahoo.com.
I want to know how this task is done. Did they do it by some algorithms or random number generation.
I searched in cpan, but i couldn't get. So monks, any of you had come across this kind of situation, please tell how to approach this.
Thanks monks for all your valuable time.
Re: Long url to tiny url.
by RMGir (Prior) on Jul 07, 2007 at 09:38 UTC
|
There are a lot of different ways it could be done.
But mainly, all you need is a database with 2 fields:
realURL and tinyURL.
When someone submits a realURL asking for a tinyURL, you
check to see if that realURL is already there.
If so, you give them the tinyURL that already exists for it.
If not, you use some random method to generate tinyURLs until you get one that's not already in your database,
insert (realURL, tinyURL) in the database, and give them the new tinyURL.
When a tinyURL request comes in to your website, you generate an http-redirect request to the realURL.
Actually generating random strings is pretty easy -- you could just have an array of "printable" characters like
my @printable=('a'..'z','A'..'Z','0'..'9');
, and use
my $tiny=join "",map {$printable[rand(scalar @printable)]} 1..10;
to generate "tiny" strings until you get one that's not in your table already...
| [reply] [d/l] [select] |
|
If not, you use some random method to generate tinyURLs until you get one that's not already in your database ...
Personally (i.e. I don't know if they are doing that way), I would
just count issued tinyURLs (the count might come directly from the
database), and represent that number in some number base, e.g. 62 for
the printable chars you've specified.
The advantage would be that it's
simpler in the following ways: (1) I wouldn't have to check whether
some randomly generated number is already in use, and (2) would be able
to have the number of required digits grow just as needed, so the
tinyURL is always as compact as possible. (With a random number, I'd have
to choose ahead what width to use. Also, as full utilisation of the set
is being approached, searching for a remaining free number would
require more and more iterations...).
| [reply] |
|
If you just issue the next available alphanumeric sequence, as you say, the URL is always as small as possible. But the problem with that, or any other predictable sequence, is that it's open to abuse. Conversely, if you use a random number generator with a short period, as you say you'll get lots more collisions, while a long period means the URLs will be longer than necessary.
| [reply] |
|
|
Very nice. That would work quite well, yes.
A better variant might be to seed some random number generator with a very long period, then generate the output strings -- that way, the consecutive tinyurl's wouldn't be predictable, and it would still be just as cheap to pre-generate them.
If the generator can take its output as its seed, then all you need to do is save the last output value so you can restart the generator when it's time to generate the next set...
| [reply] |
Re: Long url to tiny url.
by Cody Pendant (Prior) on Jul 07, 2007 at 11:27 UTC
|
I always just assumed the TinyURL people were generating URLs, as the requests came in, in some base where you don't have to add a column any time too soon, as in, the first 16 million TinyURLs are in the space 000000 .. ffffff and when you run out you move to the 00000000 .. ffffffff space, which should keep you going for a while. I mean, they're not actual numbers, so 00000000 isn't the same as 000000.
Nobody says perl looks like line-noise any more
kids today don't know what line-noise IS ...
| [reply] |
Re: Long url to tiny url.
by lima1 (Curate) on Jul 07, 2007 at 10:36 UTC
|
UPDATE: Reusing such URLs does not make any sense, so ignore this posting :(
I think the easiest solution is just to generate a list with all possible strings - at least for the short ones. Advantages are that you can block strings easily for n days (just add another col in your list or database) and that you always generate the shortest possible string (sort list by string length and availability). If strings should not be reused, then almut's solution is the best.
#!/usr/bin/perl
use strict;
use warnings;
my @alphabet = ((1..9) , qw(a b c));
sub enum_words {
my ($word, $maxlength) = @_;
if (length $word < $maxlength) {
for my $char (@alphabet) {
enum_words($word . $char, $maxlength);
}
}
else {
print "$word\n";
}
}
for my $length (1 .. 4) {
enum_words('', $length);
}
| [reply] [d/l] |
Re: Long url to tiny url.
by aquarium (Curate) on Jul 09, 2007 at 00:49 UTC
|
| [reply] |
|
|