Re^2: How to grab a portion of file with regex

http://www.perlmonks.org?node_id=1023593

in reply to Re: How to grab a portion of file with regex
in thread How to grab a portion of file with regex

Instead, as swkronenfeld pointed out, it's better to use the CPAN module HTML::Parser

Not by much. HTML::Parser is very low-level; use a DOM parser that supports XPath instead.

Re^3: How to grab a portion of file with regex by 7stud (Deacon) on Mar 15, 2013 at 03:37 UTC
And for HTML files that are 9,000 GB in size?
Re^4: How to grab a portion of file with regex by kielstirling (Scribe) on Mar 15, 2013 at 03:53 UTC
There are always limits to everything. I must remind you that I am not the one wanting to parse HTML; I am simply trying to offer guidance. I understand that HTML parsing is a hot topic. However, as a solution to the question asked, HTML::Parser works fine.
Re^4: How to grab a portion of file with regex by Anonymous Monk on Mar 15, 2013 at 03:58 UTC
And for html files that are 9,000 GB's in size?

Never mind that 9,000 GB HTML files don't exist; you can still use XML::Twig, naturally.
Re^3: How to grab a portion of file with regex by kielstirling (Scribe) on Mar 15, 2013 at 02:46 UTC
Well, instead of trolling, why not supply a working example to help? It's always the Anonymous Monk lacking the courage to put a name to a comment.
Re^4: How to grab a portion of file with regex by Anonymous Monk on Mar 15, 2013 at 04:00 UTC
Well instead of trolling why not supply a working example to help ?? Its always the Anonymous Monk lacking courage to put a name to a comment

How is it trolling to point out the shortcomings of a "solution"? Maybe you should look up the definition of troll.

What courage is required to point out a simple fact about HTML::Parser? Are you under the impression that HTML::Parser is a high-level parser?

Your "solution" doesn't fetch the portion of the page from class = lastUnit to class = line margin10; it's incomplete. It is a lot easier, shorter, and simpler to use `m{\Q$start\E(.+?)\Q$end\E}i` than that low-level HTML::Parser code.

Have you seen Re: How to grab a portion of file with regex (don't)? It's not unlike at least three different tutorials, walkthroughs, and step-by-step instructions on extracting from and XPathing the DOM; some even compare and contrast with HTML::Parser.
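For readers who want to see that one-liner in context, here is a minimal sketch of the regex approach. The sample page, the exact delimiter strings in `$start` and `$end`, and the helper name `grab_portion` are all hypothetical; only the class names and the shape of the pattern come from the thread. The `/s` modifier is an addition (an assumption, not part of the pattern quoted above) so that the capture can span several lines.

```perl
use strict;
use warnings;

# Hypothetical sample page; the class names come from the thread,
# everything else is made up for illustration.
my $html = <<'HTML';
<div class="header">skip me</div>
<div class="lastUnit">
  <p>wanted text</p>
</div>
<div class="line margin10">footer</div>
HTML

# Hypothetical delimiters: literal strings marking the start and
# end of the portion to grab.
my $start = '<div class="lastUnit">';
my $end   = '<div class="line margin10">';

# \Q...\E quotes regex metacharacters in the delimiters, and .+? is
# non-greedy so the match stops at the first end marker.
# /s is added here (not in the thread's pattern) so that . also
# matches newlines and the capture can span several lines.
sub grab_portion {
    my ($text, $from, $to) = @_;
    return $text =~ m{\Q$from\E(.+?)\Q$to\E}is ? $1 : undef;
}

my $portion = grab_portion($html, $start, $end);
print defined $portion ? $portion : "no match\n";
```

This only works while the markup around the delimiters stays exactly as written; a change in attribute order or whitespace breaks the match, which is the usual argument for a DOM parser.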
Re^5: How to grab a portion of file with regex by kielstirling (Scribe) on Mar 15, 2013 at 04:22 UTC
You make some valid points. The example in the question didn't seem to need the content of the div. I do agree that working with the DOM is a much better way to parse HTML.
Re^6: How to grab a portion of file with regex by Anonymous Monk on Mar 15, 2013 at 06:39 UTC
