http://www.perlmonks.org?node_id=625396

valavanp has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I need to convert a long url into tiny url.

For example in http://tinyurl.com, when you enter www.yahoo.com it makes into tinyurl as http://tinyurl.com/1hm. when you click this link it automatically goes to yahoo.com.

I want to know how this task is done. Did they do it by some algorithms or random number generation.

I searched in cpan, but i couldn't get. So monks, any of you had come across this kind of situation, please tell how to approach this.

Thanks monks for all your valuable time.

Replies are listed 'Best First'.
Re: Long url to tiny url.
by RMGir (Prior) on Jul 07, 2007 at 09:38 UTC
    There are a lot of different ways it could be done.

    But mainly, all you need is a database with 2 fields: realURL and tinyURL.

    When someone submits a realURL asking for a tinyURL, you check to see if that realURL is already there.

    If so, you give them the tinyURL that already exists for it.

    If not, you use some random method to generate tinyURLs until you get one that's not already in your database, insert (realURL, tinyURL) in the database, and give them the new tinyURL.

    When a tinyURL request comes in to your website, you generate an http-redirect request to the realURL.

    Actually generating random strings is pretty easy -- you could just have an array of "printable" characters like

    my @printable=('a'..'z','A'..'Z','0'..'9');
    , and use
    my $tiny=join "",map {$printable[rand(scalar @printable)]} 1..10;
    to generate "tiny" strings until you get one that's not in your table already...

    Mike
      If not, you use some random method to generate tinyURLs until you get one that's not already in your database ...

      Personally (i.e. I don't know if they are doing that way), I would just count issued tinyURLs (the count might come directly from the database), and represent that number in some number base, e.g. 62 for the printable chars you've specified.

      The advantage would be that it's simpler in the following ways: (1) I wouldn't have to check whether some randomly generated number is already in use, and (2) would be able to have the number of required digits grow just as needed, so the tinyURL is always as compact as possible. (With a random number, I'd have to choose ahead what width to use. Also, as full utilisation of the set is being approached, searching for a remaining free number would require more and more iterations...).

        If you just issue the next available alphanumeric sequence, as you say, the URL is always as small as possible. But the problem with that, or any other predictable sequence, is that it's open to abuse. Conversely, if you use a random number generator with a short period, as you say you'll get lots more collisions, while a long period means the URLs will be longer than necessary.

        Polonius
        Very nice. That would work quite well, yes.

        A better variant might be to seed some random number generator with a very long period, then generate the output strings -- that way, the consecutive tinyurl's wouldn't be predictable, and it would still be just as cheap to pre-generate them.

        If the generator can take its output as its seed, then all you need to do is save the last output value so you can restart the generator when it's time to generate the next set...


        Mike
Re: Long url to tiny url.
by Cody Pendant (Prior) on Jul 07, 2007 at 11:27 UTC
    I always just assumed the TinyURL people were generating URLs, as the requests came in, in some base where you don't have to add a column any time too soon, as in, the first 16 million TinyURLs are in the space 000000 .. ffffff and when you run out you move to the 00000000 .. ffffffff space, which should keep you going for a while. I mean, they're not actual numbers, so 00000000 isn't the same as 000000.


    Nobody says perl looks like line-noise any more
    kids today don't know what line-noise IS ...
Re: Long url to tiny url.
by lima1 (Curate) on Jul 07, 2007 at 10:36 UTC
    UPDATE: Reusing such URLs does not make any sense, so ignore this posting :(

    I think the easiest solution is just to generate a list with all possible strings - at least for the short ones. Advantages are that you can block strings easily for n days (just add another col in your list or database) and that you always generate the shortest possible string (sort list by string length and availability). If strings should not be reused, then almut's solution is the best.

    #!/usr/bin/perl use strict; use warnings; my @alphabet = ((1..9) , qw(a b c)); sub enum_words { my ($word, $maxlength) = @_; if (length $word < $maxlength) { for my $char (@alphabet) { enum_words($word . $char, $maxlength); } } else { print "$word\n"; } } for my $length (1 .. 4) { enum_words('', $length); }
Re: Long url to tiny url.
by aquarium (Curate) on Jul 09, 2007 at 00:49 UTC
    Personally I don't like the idea of some third party having control of major entry points to your website.
    Such services can have hidden costs: tracking users who visit your website, introduce advertising, etc. Long urls are a sign of "bad" site layout, as important sections of a site should have a short url already, or even a short url re-direct link controlled by yourself.
    The only reasonable use of these services is to hide the ugly urls for personal web pages hosted by ISPs. If you have any control over the DNS (create A record entries like "this_section.bla.com" rather than "www.bla.com/this_section") or use webserver static or even regexp re-direction.
    i think that if you have a website receiving decent traffic, you can spend a few bucks a year to get a decent domain name.
    the hardest line to type correctly is: stty erase ^H