PerlMonks  

Re: Regular Expression: I need a regex to fetch data from an html file

by CountZero (Chancellor)
on Feb 27, 2012 at 09:57 UTC (#956411)


in reply to Regular Expression: I need a regex to fetch data from an html file

Actually, I think it would be handy if you told us what data you wish to extract. Is it only the numeric parameter of openInvoice[0]?

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics


Re^2: Regular Expression: I need a regex to fetch data from an html file
by Anonymous Monk on Feb 27, 2012 at 10:27 UTC
    <tr><td id='Auf'>50956866</td> <td id='Ku'>D510848</td> <td id='Rec'>18.10.2011</td> <td id='Re'>EUR 118,95</td> <td id='Za'>EUR 0,00</td> <td id='Off'>EUR 118,95</td>
    This was the HTML file I wanted to extract the data from. I finally have a solution and just wanted to share it with you monks. The regex:
    <td id='AuftragsId'>(.*)</td>\s*<td id='KundenNr'>(.*)</td>\s*<td id='RechnungsDatum'>(.*)</td>\s*<td id='RechnungsBetragAktuell'>(.*)</td>\s*<td id='ZahlungsBetragAktuell'>(.*)</td>\s*<td id='OffenePosten'>(.*)</td>
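
For readers who want to try the approach, here is a minimal sketch of how a regex of this shape could be applied in Perl. It is not the poster's actual script. It assumes the abbreviated ids from the sample row (Auf, Ku, Rec, Re, Za, Off) rather than the longer ids in the posted regex, it swaps the greedy (.*) for [^<]* so a capture cannot run past its own cell, and extract_row is a hypothetical helper name.

```perl
use strict;
use warnings;

# Cell ids as they appear in the sample row above (an assumption: the real
# page may use the longer ids from the posted regex instead).
my @IDS = qw(Auf Ku Rec Re Za Off);

# Hypothetical helper: returns a hash ref of id => cell text, or nothing
# if the row does not match.
sub extract_row {
    my ($html) = @_;

    # One capture group per cell, joined by optional whitespace.
    # [^<]* stops at the next tag, unlike the greedy (.*) in the post.
    my $pattern = join '\s*', map { "<td id='\Q$_\E'>([^<]*)</td>" } @IDS;

    my @fields = $html =~ /$pattern/;
    return unless @fields;

    my %row;
    @row{@IDS} = @fields;
    return \%row;
}

# Usage with the sample row from the post.
my $sample = q{<tr><td id='Auf'>50956866</td> <td id='Ku'>D510848</td> <td id='Rec'>18.10.2011</td> <td id='Re'>EUR 118,95</td> <td id='Za'>EUR 0,00</td> <td id='Off'>EUR 118,95</td>};

my $row = extract_row($sample);
if ($row) {
    print "$_ = $row->{$_}\n" for @IDS;
}
```

Note that this still depends on the exact attribute quoting and cell order, so any change in the markup makes the match fail.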

      There are a lot of nice modules on CPAN that will do your extraction in a more robust way, i.e. they won't break if the maker of the table makes small changes to the markup.

      Some places to start:
      HTML::TableExtract
      HTML::TreeBuilder
      HTML::TokeParser

      Unless you're trying to do something really out there (and maybe even then), someone has probably already solved more than half of your problem and posted a module that does it reliably.
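
As an illustration of the module route, here is a small sketch using HTML::TokeParser, one of the modules listed above. It assumes the module is installed from CPAN and that the cells carry id attributes as in the sample row. cells_by_id is a hypothetical helper name, and the extra class attribute and whitespace in the usage example are invented to show the kind of markup change a fixed regex would trip over.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collect the text of every <td> that has an id
# attribute, keyed by that id.
sub cells_by_id {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my %cell;
    while (my $tag = $parser->get_tag('td')) {
        my $id = $tag->[1]{id};    # $tag->[1] is the attribute hash ref
        next unless defined $id;
        # Text up to the closing </td>, with surrounding whitespace removed.
        $cell{$id} = $parser->get_trimmed_text('/td');
    }
    return \%cell;
}

# Usage: attribute order, quoting and whitespace differ from the sample
# row, but lookup by id still works.
my $html = q{<tr><td class="n" id="Auf"> 50956866 </td>
<td id='Re'>EUR 118,95</td></tr>};

my $cells = cells_by_id($html);
print "$_ = $cells->{$_}\n" for sort keys %$cells;
```

Because the parser handles tokenizing, the code only states which cells it wants and not how the surrounding markup is laid out.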
