PerlMonks  

Re: parsing html

by kennethk (Monsignor)
on Mar 21, 2013 at 21:31 UTC (#1024822)


in reply to parsing html

McA's solution will answer your immediate question, but if you are parsing HTML in anything other than an educational or 1-off context, I would suggest using a CPAN module rather than reinventing the wheel; HTML::Parser or Mojo::DOM might be helpful. HTML in the wild is notoriously hard to handle in a general way.
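
To illustrate that last point, here is a minimal sketch using only core Perl. The three snippets are made up for the example; a browser would treat all of them as the same table cell, but a regex written against the first form misses the other two.

```perl
use strict;
use warnings;

# Three hypothetical snippets that a browser would treat as the same cell.
my @snippets = (
    '<td>42</td>',
    '<TD class="num">42</TD>',
    "<td\n>42</td >",
);

# A naive pattern written with only the first snippet in mind.
my $naive = qr{<td>(.*?)</td>};

# grep in scalar context counts how many snippets the pattern matches.
my $matched = grep { $_ =~ $naive } @snippets;

printf "naive pattern matched %d of %d snippets\n", $matched, scalar @snippets;
```

You can keep patching the pattern for case, attributes, and whitespace, but a real parser already handles those variations, which is why a module is the safer choice for anything beyond a single known data set.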


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re^2: parsing html
by qingxia (Novice) on Mar 21, 2013 at 23:06 UTC

    To toyyink and kennethk: it is actually a dataset that I need to prepare for the next stage of analysis. It comes in as several HTML files, each of which follows a fairly stable pattern like:

    id xxx borrower xxx date xxx ...

    I want to convert them into a standard format that commercial statistical software such as Stata can read, e.g.

    id    borrower    date    ...
    xxx   xxxx        xxxx

    Doing it in Excel is a little too time-consuming, so I switched to Perl, which I would really like to learn anyway; learning by doing is more fun. You could call it a one-off project, because I (hopefully) will not be parsing HTML frequently, but thank you anyway for the suggestion; I completely agree with you. Best regards, sh

      When I said "1-off context", this is exactly what I meant: a quick script to process one set of data. I wholly support your choice of regex for this task.
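
For a one-off job like this, a rough sketch of the regex approach might look like the following. It assumes the HTML tags have already been stripped, that each record sits on one line in the "id xxx borrower xxx date xxx" shape described above, and that values contain no whitespace. The field list and the sample records are hypothetical; adjust them to the real files.

```perl
use strict;
use warnings;

# Hypothetical field names, taken from the pattern described above.
my @fields = qw(id borrower date);

# Hypothetical sample records, with HTML tags already stripped.
my @records = (
    'id 101 borrower Smith date 2013-03-21',
    'id 102 borrower Jones date 2013-03-22',
);

# Build one regex: each field label followed by a captured non-space value.
my $pattern = join '\s+', map { quotemeta($_) . '\s+(\S+)' } @fields;
my $re = qr/^\s*$pattern/;

# Header row first, then one tab-separated row per record.
my @rows = (join "\t", @fields);
for my $record (@records) {
    if (my @values = $record =~ $re) {
        push @rows, join "\t", @values;
    }
    else {
        # Report anything that does not fit instead of dropping it silently.
        warn "skipping unrecognized record: $record\n";
    }
}

print "$_\n" for @rows;
```

The output is tab-separated text with a header line, which most statistics packages can import. If borrower names can contain spaces, the `(\S+)` capture is too strict and would need to be loosened for that field.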


