PerlMonks  

non-greedy piecewise matching

by mifflin (Curate)
on Aug 02, 2007 at 16:50 UTC (#630328)
mifflin has asked for the wisdom of the Perl Monks concerning the following question:

I have some data files that were created without newlines that need to be fixed.
The files contain a bunch of records with an xml file name at the end.
They look like...

somedata file.xmlsomedata file.xmlsomedata file.xml ....

what I want them to be is ...

somedata file.xml
somedata file.xml
somedata file.xml
...

So, I thought I could use a piecewise regex like so...
    pos $data = 0;
    my $len = length $data;
    while (pos $data < $len) {
        if ( my ($line) = $data =~ m{ \G ( .+ \. xml ) }gcxms ) {
            print "$line\n";
        }
    }
The problem is I cannot figure out how to make the regex non-greedy. My capturing portion matches the full string, all the way to the last xml file. How do I change the regex to be non-greedy and match up to the first xml file?

Re: non-greedy piecewise matching
by NetWallah (Abbot) on Aug 02, 2007 at 16:56 UTC
    The non-greedy version of "+" is "+?". (perlreref).

         "An undefined problem has an infinite number of solutions." - Robert A. Humphrey         "If you're not part of the solution, you're part of the precipitate." - Henry J. Tillman

      thanks, that worked...
      > cat x
      my $data = 'jlasflsf.xmljlasjlkjlasjflsdf.xmlklajlajlsdfjkl.xml';
      while (pos $data < length $data) {
          if ( $data =~ m{ \G ( .+? \. xml) }gcxms ) {
              print "$1\n";
          }
      }
      > perl x
      jlasflsf.xml
      jlasjlkjlasjflsdf.xml
      klajlajlsdfjkl.xml
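To see the two quantifiers side by side, here is a minimal sketch. The sample string is made up for illustration, and the patterns are anchored with \A rather than \G so that each match is independent of pos.

```perl
use strict;
use warnings;

# Made-up sample data: three records with no separators.
my $data = 'a.xmlb.xmlc.xml';

# Greedy: .+ takes as much as it can, so the capture runs to the last ".xml".
my ($greedy) = $data =~ m{ \A ( .+ \. xml ) }xms;

# Non-greedy: .+? takes as little as it can, so the capture stops at the first ".xml".
my ($lazy) = $data =~ m{ \A ( .+? \. xml ) }xms;

print "greedy: $greedy\n";   # greedy: a.xmlb.xmlc.xml
print "lazy:   $lazy\n";     # lazy:   a.xml
```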
Re: non-greedy piecewise matching
by FunkyMonk (Canon) on Aug 02, 2007 at 16:58 UTC

    What's wrong with the much simpler s/\.xml/.xml\n/g?

      Nothing; in fact, that's the way I did it, because I needed to get the files fixed now. I was just trying out piecewise matching because I've never done it before.
      I've been reading "Perl Best Practices" and was seeing if I could implement something like what was shown on pages 257-258.
Re: non-greedy piecewise matching
by prasadbabu (Prior) on Aug 02, 2007 at 17:03 UTC

    Hi mifflin,

    You have to use '.+?' instead of '.+' to make the match non-greedy. Take a look at perlre.

    As you said, if .xml is present after each record, then we can also use a substitution or the split function.

    $file =~ s/(\.xml)/$1\n/g;

    Prasad
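The split alternative mentioned in the post above is not spelled out in the thread. One way it might look is sketched here, assuming a zero-width lookbehind as the separator so that the ".xml" suffix stays attached to each record. The sample data and the @records name are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample data; real records would come from the damaged file.
my $data = 'somedata file1.xmlsomedata file2.xmlsomedata file3.xml';

# Split at every position that is immediately preceded by ".xml".
# The lookbehind is zero-width, so no characters are consumed or lost.
my @records = split /(?<=\.xml)/, $data;

print "$_\n" for @records;
```

Unlike the substitution, this leaves the original string untouched and yields the records as a list, which may be convenient if each one needs further processing.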

Re: non-greedy piecewise matching
by ikegami (Pope) on Aug 02, 2007 at 17:40 UTC

    The greediness is just your first problem.

    Problem #2: You're using the g modifier in list context, causing all the matches to be returned at once. You'll never print anything other than the first file name.

    pos $data = 0;
    my $len = length $data;
    while (pos $data < $len) {
        if ( $data =~ m{ \G ( .+? \. xml ) }gcxms ) {
            print "$1\n";
        }
    }
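The context issue in Problem #2 can be seen in isolation with a small sketch. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $data = 'a.xmlb.xmlc.xml';

# List context: /g returns every capture at once, so assigning to a
# one-element list keeps the first capture and silently drops the rest.
my ($first) = $data =~ m{ \G ( .+? \. xml ) }gcxms;

# Scalar context: /g finds one match per evaluation and advances pos,
# so each record is visible in turn through $1.
pos($data) = 0;
my @seen;
while ( $data =~ m{ \G ( .+? \. xml ) }gcxms ) {
    push @seen, $1;
}

print "list context kept:  $first\n";
print "scalar context saw: @seen\n";
```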

    Problem #3: If there's anything after the last .xml, you have yourself an infinite loop. Checking if pos is less than length is a bad idea when using the c modifier. Fix:

    pos $data = 0;
    for (;;) {
        $data =~ m{ \G ( .+? \. xml ) }gcxms
            or last;
        print "$1\n";
    }

    Finally: Using the c modifier is rather useless and ugly if you only have one regexp, and it's rather complex (as shown by the number of errors). Fix:

    while ( $data =~ m{ \G ( .+? \. xml ) }gxms ) { print "$1\n"; }
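If the input might carry extra text after the last .xml (the case from Problem #3), the simple while loop still terminates, but the tail goes unreported. A sketch of one way to catch it, assuming you want to know about unmatched trailing data; the $end and $leftover names and the sample input are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input with junk after the last ".xml".
my $data = 'a.xmlb.xmltrailing junk';

my @records;
my $end = 0;                       # offset just past the last successful match
while ( $data =~ m{ \G ( .+? \. xml ) }gxms ) {
    push @records, $1;
    $end = pos $data;              # pos is still defined here, inside the loop
}

# Without /c a failed match resets pos, so rely on the saved offset instead.
my $leftover = substr $data, $end;

print "$_\n" for @records;
print "unmatched tail: '$leftover'\n" if length $leftover;
```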

    Tip: If you really did have a use for c (e.g. if you were writing a lexer), then you'd have multiple regexps, and aliasing $_ to the variable containing the text would be worthwhile.

    for ($data) {
        pos() = 0;
        for (;;) {
            /\G ... /xgc && do { ...; next };
            /\G ... /xgc && do { ...; next };
            /\G ... /xgc && do { ...; next };
            last;
        }
    }
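The skeleton above leaves its patterns and actions elided. To show the shape in a runnable form, here is a small sketch of a lexer built the same way. The token set (integers, identifiers, a few operators), the tokenize name, and the error handling are all assumptions made for this example, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical token set for illustration: integers, identifiers, operators.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    for ($text) {                  # alias $_ to the text being scanned
        pos() = 0;
        for (;;) {
            /\G \s+ /xgc            && do { next };    # skip whitespace
            /\G (\d+) /xgc          && do { push @tokens, [ NUM => $1 ]; next };
            /\G ([A-Za-z_]\w*) /xgc && do { push @tokens, [ ID  => $1 ]; next };
            /\G ([-+*\/=]) /xgc     && do { push @tokens, [ OP  => $1 ]; next };
            last;                  # no rule matched: end of input or bad character
        }
        # Because of /c, a failed match leaves pos where scanning stopped.
        die "Unexpected character at offset " . pos() . "\n"
            if pos() < length($_);
    }
    return @tokens;
}

print join( ' ', map { sprintf '%s(%s)', @$_ } tokenize('x = 42 + y1') ), "\n";
# prints: ID(x) OP(=) NUM(42) OP(+) ID(y1)
```

Here the c modifier earns its keep: when one rule fails, pos stays put so the next rule can try at the same offset.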
      gads!
      Now I know why the Damian put the following quote at the beginning of his chapter...

      Some people, when confronted with a problem, think:
      "I know, I'll use regular expressions".
      Now they have two problems.
      -- Jamie Zawinski

      Thanks.

Re: non-greedy piecewise matching
by roboticus (Chancellor) on Aug 02, 2007 at 23:04 UTC
