multi line regex

metalfan has asked for the wisdom of the Perl Monks concerning the following question:

Re: multi line regex by Happy-the-monk (Canon) on Jan 09, 2006 at 14:39 UTC
You read the file into an array of lines, then compare single lines against something that looks suspiciously like a multi-line structure. You probably have to slurp the whole file into a scalar and drop the for loop: `my $entry = do { local $/; <INFO> }; # slurp the whole file` To which Corion adds: with the `/x` modifier you will have to match all the whitespace explicitly, meaning you put `\s+` here and there and everywhere you might expect whitespace. Cheers, Sören
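To make the difference concrete, here is a minimal sketch of line-by-line matching versus slurping. The original file and regex are not shown in the thread, so the sample data, the pattern, and the variable names below are all made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for the original poster's file.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "<td>\n  english word\n</td>\n";
close $fh or die "Cannot close $filename: $!";

# A pattern describing a structure that spans several lines. Under /x,
# literal whitespace in the pattern is ignored, so whitespace in the data
# has to be matched explicitly with \s* or \s+.
my $pattern = qr{ <td> \s* (\w+) \s+ (\w+) \s* </td> }x;

# Line by line: no single line contains the whole structure.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $line_hits = grep { $_ =~ $pattern } <$in>;
close $in;

# Slurped: the whole file is one string, so the pattern can cross newlines.
open $in, '<', $filename or die "Cannot open $filename: $!";
my $entry = do { local $/; <$in> };
close $in;
my @words = $entry =~ $pattern;

print "line-by-line matches: $line_hits\n";
print "slurped captures: @words\n";
```

The `local $/` inside the `do` block keeps the change to the input record separator from leaking into the rest of the program.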
Re: multi line regex by matija (Priest) on Jan 09, 2006 at 14:54 UTC
This is wrong in so many ways. First of all, you're parsing HTML with a regex. Don't do that. Use HTML::Parser instead. Otherwise, there are just too many ways in which you can be tripped up: tags with extra white space, tags with newlines, quotes missing or present in unexpected places, escaping of this, that or the other thing, JavaScript code fooling you into thinking you're in another tag when you really aren't, etc. Second, you're trying to extract data from an HTML table using a regex. Don't do that. Use HTML::TableExtract instead. It will save you a LOT of hairpulling.
Re^2: multi line regex by metalfan (Novice) on Jan 18, 2006 at 17:51 UTC
Looks good. Sorry for this question, but how can I use this to get the word in the first column?

1. column | 2. column
english word | german word
....

Thanks for the help.
Re^3: multi line regex by matija (Priest) on Jan 18, 2006 at 23:02 UTC
Read the manual pages for HTML::TableExtract. Once it has parsed the table, the first column will be the first element of each row array.
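A minimal sketch of that approach, assuming HTML::TableExtract is installed from CPAN. The HTML below is invented to mirror the two-column layout described in the question; it is not the original poster's file.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical HTML standing in for the original poster's file.
my $html = <<'HTML';
<table>
  <tr><td>english word</td><td>german word</td></tr>
  <tr><td>house</td><td>Haus</td></tr>
</table>
HTML

# No constraints are passed to new(), so every table found is extracted.
my $te = HTML::TableExtract->new;
$te->parse($html);
$te->eof;

my @first_column;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        # Each row is an array reference; element 0 is the first column.
        push @first_column, $row->[0];
    }
}

print "$_\n" for @first_column;
```

For a page with several tables, the `headers`, `depth`, and `count` options described in the module's documentation can narrow the extraction to the table you want.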
Re: multi line regex by murugu (Curate) on Jan 09, 2006 at 15:00 UTC
Hi, It won't match, because you have the contents of the HTML file in an array where each element holds a single line of the file you read, but you are matching against something else. It's better to use HTML::TokeParser to parse the HTML file and get the attribute values. Regards, Murugesan Kandasamy use perl for(;;);
Re^2: multi line regex by ishnid (Monk) on Jan 09, 2006 at 15:46 UTC
Another alternative is HTML::TokeParser::Simple. Whichever HTML parsing module you eventually choose is obviously up to you. The main point is that you should definitely, definitely use one of them, rather than trying to create custom regexps.
Re: multi line regex by ptum (Priest) on Jan 09, 2006 at 14:41 UTC
Hi, metalfan. It looks to me as though you're reading the contents of your file into an array (@file) and then processing the file one line at a time. If you want the whole file to be read in by <INFO> into a single scalar variable, then I think you need to set $/ (the input record separator) to undef rather than the default \n (setting it to '' selects paragraph mode instead), and replace @file with $file.
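The effect of `$/` on how much each read returns can be checked with a small sketch. The data and the `count_records` helper are hypothetical, made up for this illustration. Per perlvar, an undefined `$/` slurps the whole file, while the empty string selects paragraph mode (records separated by blank lines).

```perl
use strict;
use warnings;

# Hypothetical data: two blank-line-separated paragraphs of two lines each.
my $data = "a\nb\n\nc\nd\n";

# Count the records read from $data with a given value of $/.
sub count_records {
    my ($separator) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

my $by_line      = count_records("\n");   # default: one record per line
my $by_paragraph = count_records('');     # paragraph mode
my $slurped      = count_records(undef);  # whole file as one record

print "by line: $by_line, by paragraph: $by_paragraph, slurped: $slurped\n";
```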
Re: multi line regex by Perl Mouse (Chaplain) on Jan 09, 2006 at 14:40 UTC
Because either the regex matches something different than you expect, or the line you match against contains something different than you expect. What does the line you match against contain, and do you expect the regexp to match, or to fail? `Perl --((8:>*`

