
Re^2: parsing html

by qingxia (Novice)
on Mar 21, 2013 at 23:06 UTC

in reply to Re: parsing html
in thread parsing html

To toyyink and kennethk: it is actually a dataset that I need to prepare for the next stage of analysis. It comes as several HTML files, each of which follows a rather stable pattern like:

id xxx borrower xxx date xxx ...

and I want to recode them into a standard format that can be read by commercial statistical software such as Stata, e.g.:

id borrower date ...
xxx xxxx xxxx

It is a little too time-consuming to do in Excel, so I switched to Perl, as I would really like to learn it; learning by doing is more fun. You could call it a one-off project, because I will (I hope) not be parsing HTML frequently, but thank you anyway for the suggestion; I totally agree with you. Best regards, sh

Re^3: parsing html
by kennethk (Abbot) on Mar 22, 2013 at 14:20 UTC

    When I said "1-off context", this is exactly what I meant: a quick script to process one set of data. I wholly support your choice of regex for this task.
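For a one-off job like this, a regex-based pass might look roughly like the sketch below. It is only an illustration under assumptions: the real HTML is not shown in the thread, so the sketch assumes that, once tags are stripped, each record reads "id ... borrower ... date ..." with exactly those three labels, and the helper names (extract_records, csv_field, to_csv) are made up for this example. It writes comma-separated text, which statistical packages such as Stata can generally import.

```perl
use strict;
use warnings;

# Hypothetical field labels, taken from the pattern quoted in the thread.
my @fields = qw(id borrower date);

# Turn one chunk of HTML into a list of records (array refs of values).
sub extract_records {
    my ($html) = @_;

    # Crude tag stripping: acceptable for a stable one-off layout,
    # not a general-purpose HTML parser.
    (my $text = $html) =~ s/<[^>]*>/ /g;
    $text =~ s/&nbsp;/ /g;
    $text =~ s/\s+/ /g;

    my @records;
    # Assumes id and date contain no spaces; borrower may contain spaces.
    while ($text =~ /\bid\s+(\S+)\s+borrower\s+(.+?)\s+date\s+(\S+)/g) {
        push @records, [$1, $2, $3];
    }
    return @records;
}

# Quote one value for CSV output, doubling any embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Build the CSV text: one header line, then one line per record.
sub to_csv {
    my (@records) = @_;
    my @lines = (join ',', @fields);
    for my $rec (@records) {
        push @lines, join ',', map { csv_field($_) } @$rec;
    }
    return join "\n", @lines, '';
}

# When file names are given, read each one whole and print a single CSV.
if (@ARGV) {
    my @all;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $html = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        push @all, extract_records($html) if defined $html;
    }
    print to_csv(@all);
}
```

Saved under a name of your choosing, it could be run as perl yourscript.pl file1.html file2.html > loans.csv (the script and file names here are placeholders). If the layout turns out to be less regular than assumed, the regex in extract_records is the one place that needs adjusting.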

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
