http://www.perlmonks.org?node_id=1026848


in reply to Regex Split and Formatting

Read perlre. If you're in a rush, read perlrequick. And if you find the learning curve too steep, take a step back and start with perlretut.

Note that this isn't perfect. It assumes your end-of-line character is a linefeed, and it doesn't handle linefeeds in the 4th field (or commas inside any of the first three quoted fields). In the sample provided, though, neither was significant.

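```perl
use strict;
use warnings;

my $data;
localscope: {
    local $/;    # slurp mode: read all of DATA at once
    $data = <DATA>;
    my $i = 1;
    # \G resumes at the end of the previous match; a record is three
    # "non-commas then comma" fields plus the rest of the line.
    while ($data =~ /\G((?:(?:[^,]*),){3}(?:[^\n]*\n))/g) {
        my $entry = $1;
        print "$i $entry";
        $i++;
    }
}
1;
__DATA__
"123", "DEF123","this is test","C:\Abhinav\test.jpg"
"456", "DEF456","this is test","C:\Matt\test.jpg"
"726", "DEF726","this is test","C:\Matt\test.jpg"
```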
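The note above lists where this regex falls short. Here is a minimal sketch of one way around those limits. It assumes every field is double-quoted and no field contains a double quote of its own. Each field is matched as a quoted run (`"[^"]*"`), so commas and linefeeds inside a field are harmless, and a record may end with either CRLF or LF. The sample data, with a CRLF ending, a comma in the 3rd field and a linefeed in the 4th, is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: CRLF line endings, a comma inside the 3rd field,
# and a linefeed inside the 4th field.
my $data = join "\r\n",
    q{"123", "DEF123","this, is test","C:\Abhinav\test.jpg"},
    qq{"456", "DEF456","this is test","C:\\Matt\ntest.jpg"},
    '';

# A field is a double-quoted run of non-quote characters; [^"]* may
# contain commas and newlines. Assumes no embedded double quotes.
my $field = qr/"([^"]*)"/;

# Four comma-separated fields (optional spaces around commas), then a
# CRLF, a bare LF, or the end of the data. \G anchors each record where
# the previous one ended, so a malformed record stops the loop.
my $record = qr/\G$field *, *$field *, *$field *, *$field(?:\r?\n|\z)/;

my @records;
while ($data =~ /$record/g) {
    push @records, [ $1, $2, $3, $4 ];
}

for my $i (0 .. $#records) {
    printf "%d %s\n", $i + 1, join ' | ', @{ $records[$i] };
}
```

Because of `\G`, parsing stops at the first record that doesn't fit the pattern rather than silently skipping ahead. This sketch does not cover real CSV with doubled `""` quotes or unquoted fields. A module such as Text::CSV from CPAN is the safer route there.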

Re^2: Regex Split and Formatting
by thirdm (Sexton) on Apr 03, 2013 at 17:39 UTC
    I would also recommend Jeffrey Friedl's Mastering Regular Expressions book. I haven't read all those pods to the end yet, so perhaps they cover everything that's needed, but I can say his book helped me a great deal and touches specifically on some of the issues here (assuming you can't be convinced to use a module). For instance, I didn't properly understand how to match across newlines (or, sadly, even that newline is one of the characters \s matches; yikes), or what exactly the /m and /s (and /ms) modifiers do, until I read his book. The information was probably there in the Perl pods, but my eye must have glazed over it or struggled with the wording.
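The newline points above are easy to check directly. The small sketch below uses an arbitrary sample string. It shows that `\s` matches `\n`, that `/s` lets `.` cross newlines, and that `/m` lets `^` match at the start of each line. It also shows what `/ms` does when both apply.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# \s is a character class that includes the newline character.
my $ws_matches_newline = ("a\nb" =~ /a\sb/) ? 1 : 0;

# . skips \n by default; /s lets it match \n too.
my $dot_plain = ($text =~ /line.second/)  ? 1 : 0;
my $dot_s     = ($text =~ /line.second/s) ? 1 : 0;

# ^ anchors only at the start of the string by default;
# /m lets it also match just after each embedded \n.
my @starts_plain = $text =~ /^(\w+)/g;
my @starts_m     = $text =~ /^(\w+)/mg;

# /ms combines both: ^ per line, and . runs across newlines.
my ($rest_m)  = $text =~ /^(second.*)/m;
my ($rest_ms) = $text =~ /^(second.*)/ms;

# Show the trailing newline captured under /ms as a visible "\n".
(my $shown_ms = $rest_ms) =~ s/\n/\\n/g;

printf "\\s matches newline: %d\n", $ws_matches_newline;
printf "dot without /s: %d, with /s: %d\n", $dot_plain, $dot_s;
print  "line starts without /m: @starts_plain; with /m: @starts_m\n";
printf "with /m: [%s]  with /ms: [%s]\n", $rest_m, $shown_ms;
```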

    It also deals with how to match within quotes (and how to do so efficiently), to the point where Damian Conway's Perl 6 Exegesis 5 document even refers to a certain kind of regex as being "Friedl style".
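For matching within quotes, the technique Friedl's book is best known for is "unrolling the loop". Below is a minimal sketch of that idea, applied to a double-quoted string with backslash escapes. It is written from memory rather than copied from the book, and it is an assumption that this is exactly what Exegesis 5 means by "Friedl style". The input string and the backslash-escape convention are hypothetical. The thread's sample data uses literal backslashes in Windows paths, where this escape rule would not apply.

```perl
use strict;
use warnings;

# "Unrolled loop" for a double-quoted string with backslash escapes:
#   "  normal*  ( special normal* )*  "
# where normal = [^"\\] and special = \\. (a backslash plus any char).
my $quoted = qr/"([^"\\]*(?:\\.[^"\\]*)*)"/s;

# Hypothetical input: the first field holds escaped quotes and a comma.
my $line = q{"say \"hi\", then leave","plain"};

my @naive  = split /,/, $line;       # also splits inside the quotes
my @fields = $line =~ /$quoted/g;    # one capture per quoted string

# Drop the escaping backslashes from each captured field.
my @clean = map { (my $s = $_) =~ s/\\(.)/$1/gs; $s } @fields;

print scalar(@naive), " naive pieces vs ", scalar(@fields), " quoted fields\n";
print "$_\n" for @clean;
```

The unrolled form `"[^"\\]*(?:\\.[^"\\]*)*"` matches the same strings as the more obvious `"(?:[^"\\]|\\.)*"`. The difference is that it avoids trying an alternation at every character, which is where the efficiency comes from.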

    If the latest edition is too long for you, note that the one I read (recently) was the 1st edition, and I can say it's still valuable, even if it leaves out some newer Perl regex features. Just in case, I read parts of the chapters on DFAs and NFAs in a newer edition from the library, and I think the coverage there was fleshed out and improved somewhat. (I have to say the car analogy is the one thing I dislike about the book; maybe I'd be less bothered if it were a bicycle analogy, I dunno.) The bulk of the size increase, though, seemed to come from covering more regex flavours from more languages, ones I don't personally care about as much as the ones in the 1st edition. Someone correct me if this is a poor impression to put out into the world.