PerlMonks  

Web Address Validation

by scuttsuk (Initiate)
on Apr 22, 2013 at 11:31 UTC (#1029846=perlquestion)
scuttsuk has asked for the wisdom of the Perl Monks concerning the following question:

Write a script which will read a server log file and return the percentage of addresses which are businesses. Can you modify the script so that it will print out the name of the business (e.g. ikea, tesco) and how often they were hit? You will have to make some assumptions about how the name of the company is stored in its web address.

http://www.tesco.com
http://www.mmu.ac.uk
http://www.amazon.com
http://www.ikea.co.uk
http://plus.maths.org

Any help would be greatly appreciated.

Re: Web Address Validation
by Anonymous Monk on Apr 22, 2013 at 11:38 UTC

    Any help would be greatly appreciated.

    Hire a programmer :)

Re: Web Address Validation
by Corion (Pope) on Apr 22, 2013 at 11:43 UTC

    As you seem to have simply pasted your programming assignment here without showing where you have the actual problem, here are some interesting modules that will mostly solve this problem when put together in the correct sequence. Finding that sequence is called "programming" and something you have to do yourself.

Re: Web Address Validation
by hdb (Prior) on Apr 22, 2013 at 13:24 UTC

    If this assignment were part of an obfuscation contest, I would say this:

    $_=$,;$,=$/;$/=$_;$_=<DATA>;print/([-_a-z_-]+)\x2eco(?:m|m?\x2e[a-z][a-z])\s/gi;
    __DATA__
    http://www.tesco.com http://www.mmu.ac.uk http://www.amazon.com http://www.ikea.co.uk http://plus.maths.org
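Unpacked, the one-liner swaps `$,` and `$/` so that the whole DATA section is slurped in one read and the printed matches are separated by newlines. It then captures the label that sits directly before `.com`, `.co.xx` or `.com.xx`. A readable sketch of the same idea follows; the variable names `$text` and `@names` are illustrative and not hdb's, and the heredoc stands in for the DATA section.

```perl
use strict;
use warnings;

# The sample addresses from the question, on one line as in the DATA section.
my $text = <<'END';
http://www.tesco.com http://www.mmu.ac.uk http://www.amazon.com http://www.ikea.co.uk http://plus.maths.org
END

# Capture the label immediately before ".com", ".co.xx" or ".com.xx"
# (xx = any two letters), which must be followed by whitespace.
my @names = $text =~ / ([-_a-z]+) \. co (?: m | m? \. [a-z][a-z] ) \s /gix;

# One name per line.
print join("\n", @names), "\n";
```

With the five sample addresses this prints tesco, amazon and ikea. The `.ac.uk` and `.org` addresses are skipped.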
Re: Web Address Validation
by CountZero (Bishop) on Apr 22, 2013 at 15:51 UTC
    I am most interested to see who will find a good rule to check which URL links to a business.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      m#\.co(?:m|m?\.[a-z][a-z])/?$# is the first thing that comes to mind without actually hitting the network. .com stands for commercial ... well, once stood for. Nobody seems to remember what it is supposed to be used for anymore :(
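As a rough sketch of how that pattern could drive the counting the question asks for: the example below assumes each log line holds nothing but a bare address, and that the company name is the label directly before `.com` / `.co.xx` / `.com.xx`. The `@log` contents are hypothetical, built from the question's sample addresses plus one repeated hit to show the tally. The capture group is an addition to the pattern above.

```perl
use strict;
use warnings;

# Hypothetical log contents: one bare address per entry.
my @log = qw(
    http://www.tesco.com
    http://www.mmu.ac.uk
    http://www.amazon.com
    http://www.ikea.co.uk
    http://plus.maths.org
    http://www.tesco.com/
);

my %hits;
my ($total, $business) = (0, 0);

for my $url (@log) {
    $total++;
    # Assumption: a business address ends in .com, .co.xx or .com.xx,
    # and the label just before that suffix is the company name.
    if ($url =~ m#([-a-z0-9]+)\.co(?:m|m?\.[a-z][a-z])/?$#i) {
        $business++;
        $hits{lc $1}++;
    }
}

# Guard against an empty log before dividing.
my $percent = $total ? 100 * $business / $total : 0;
printf "%.1f%% of %d addresses look like businesses\n", $percent, $total;

# Most-hit names first, ties broken alphabetically.
for my $name (sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits) {
    printf "%-10s %d\n", $name, $hits{$name};
}
```

Because the pattern is anchored at the end of the string, any path or query string after the host name makes it miss. A real server log would need the host extracted first.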
