PerlMonks  

Re: Help with a regular expression for file name parsing

by TJPride (Pilgrim)
on Dec 07, 2011 at 14:12 UTC (node #942252)


in reply to Help with a regular expression for file name parsing

There are really two parts to this. The first is to match the three patterns (a double-quoted name, a single-quoted name, and a bare name whose spaces are escaped with backslashes); the second is to strip the unwanted wrapping quotes or backslashes. I tried to come up with a regex that would do both at once, but either it's impossible or my knowledge of regex isn't up to the task. So I cheated.

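    use strict;
    use warnings;

    # Slurp everything after __DATA__ into a single string
    my $data = join '', <DATA>;
    my $file;

    # Each match captures one of: a double-quoted name, a single-quoted
    # name, or a bare name whose spaces are escaped as "\ "
    while ($data =~ m/\@include (".*?"|'.*?'|(?:[^\s\\]|\\ )+)/g) {
        $file = $1;
        # Second pass: delete every quote and backslash in the capture
        $file =~ s/["'\\]+//g;
        print "$file\n";
    }

    __DATA__
    #some "random stuff" @include "some file" did you parse that?
    #more 'random' stuff @include 'another file' you sure?
    #and more random stuff @include yet\ another\ file positive?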

CAVEAT: Assumes that ", ', and \ will never appear within filenames themselves. If they can, this gets much more complex.
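On the question of doing both parts in one regex: one possible sketch (not code from this thread) gives each quoting style its own capture group, so the wrapping quotes are never captured and only the backslash escapes are left to undo. Because the quoted alternatives accept `\"` and `\'` inside the name, it also relaxes the caveat above for escaped quotes. `include_name` is a hypothetical helper name, the `//` operator assumes Perl 5.10 or later, and the `/i` flag is added so that `@Include` matches too.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file name after "@include", or undef.
# Each quoting style has its own capture group, so the wrapping quotes
# are never captured and only the backslash escapes remain to be undone.
sub include_name {
    my ($line) = @_;
    return unless $line =~ m{
        \@include \s+
        (?:
            "((?:[^"\\]|\\.)*)"     # 1: double-quoted, may contain \"
          | '((?:[^'\\]|\\.)*)'     # 2: single-quoted, may contain \'
          | ((?:[^\s\\]|\\.)+)      # 3: bare name, spaces written as "\ "
        )
    }xi;
    my $name = $1 // $2 // $3;      # whichever alternative matched
    $name =~ s/\\(.)/$1/gs;         # drop each escaping backslash
    return $name;
}

for my $line (
    q{#some "random stuff" @include "some file" did you parse that?},
    q{#more 'random' stuff @include 'another file' you sure?},
    q{#and more random stuff @include yet\ another\ file positive?},
    q{#@Include file},
) {
    print "File name: <", include_name($line), ">\n";
}
```

The final substitution only drops each escaping backslash; it does not translate `\n` or `\t` into control characters.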


Re^2: Help with a regular expression for file name parsing
by bontchev (Sexton) on Dec 09, 2011 at 08:22 UTC

    Thanks, you've been the most helpful so far. Sadly, the above solution doesn't fully solve the problem either. However, I managed to combine it with another of the proposed regular expressions, plus some code that resolves the escape sequences in the string more thoroughly, plus a better way of removing the quotes (only from the ends of the string, not from everywhere).

    Here is what I managed to come up with:

    use strict;
    use warnings;
    
    while (my $data = <DATA>)
    {
    	if ($data =~ /\@include/i)
    	{
    		# Capture a single-quoted name, a double-quoted name, or a bare
    		# name running up to the first whitespace not preceded by a backslash
    		if ($data =~ m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi)
    		{
    			my $fname = $1;
    			# Turn \r, \n, \t, \', \", \\ and "\ " into the characters they
    			# stand for (the double eval builds e.g. qq|\n| and evaluates it)
    			$fname =~ s/\\([rnt'"\\ ])/"qq|\\$1|"/gee;
    			# Remove one pair of matching quotes, only at the very ends
    			$fname =~ s/^"(.*)"$/$1/s or
    			$fname =~ s/^'(.*)'$/$1/s;
    			print "File name: <$fname>\n";
    		}
    	}
    }
    
    __DATA__
    #some "random stuff" @include 	"some file" did you parse that?
    #more 'random' stuff @include 'another file' you sure?
    #and more random stuff @include yet\ another\ file positive?
    #@Include file
    #	@include		"\"another one\""	hmmm...
    # some stuff

    The outer "if" is there because, as I mentioned above, I also have to do some other processing of the lines. This code mostly works, although, as you say, it does not properly handle file names containing escaped quotes.

    Perhaps I should give up the idea of parsing this in some clever way and just process the part after the "@include" character by character?
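The character-by-character approach asked about above might look like the following minimal sketch. `scan_include` is a hypothetical name, and the sketch assumes every backslash means "take the next character literally", so unlike the `s///gee` line it does not turn `\n` or `\t` into control characters. Because it remembers which quote opened the name, an escaped quote inside the name no longer ends it.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the text after "@include" one character at a
# time. A leading ' or " opens a quoted name that ends at the matching
# unescaped quote; otherwise the name ends at the first unescaped
# whitespace. A backslash always takes the next character literally.
sub scan_include {
    my ($line) = @_;
    return unless $line =~ /\@include\s+/gi;    # pos() is now after the spaces
    my @chars = split //, substr($line, pos($line));
    return unless @chars;
    my $quote = ($chars[0] eq '"' || $chars[0] eq "'") ? shift @chars : undef;
    my $name  = '';
    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\') {
            $name .= shift @chars if @chars;     # escaped: keep next char as-is
        }
        elsif (defined($quote) ? $c eq $quote : $c =~ /\s/) {
            last;                                # end of the file name
        }
        else {
            $name .= $c;
        }
    }
    return $name;
}

for my $line (q{#and more random stuff @include yet\ another\ file positive?},
              q{# @include "\"another one\"" hmmm...}) {
    my $name = scan_include($line);
    print "File name: <$name>\n" if defined $name;
}
```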
        m/\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s/gi;

        Ah. So my regex wasn't so useless to you after all.
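For comparison with the `s///gee` line in the code above (which builds a string such as `qq|\n|` and evaluates it), here is a sketch that looks the same escapes up in a hash instead, then strips one pair of matching quotes from the ends only. `%ESCAPE` and `unescape_name` are hypothetical names; the escape set is copied from the `[rnt'"\\ ]` character class, and nothing else is translated. The last lines show why stripping only at the ends matters: the first reply's global `s/["'\\]+//g` also deletes an apostrophe inside the name.

```perl
use strict;
use warnings;

# Hypothetical table: the escapes matched by [rnt'"\\ ] and what they mean
my %ESCAPE = (
    r     => "\r",
    n     => "\n",
    t     => "\t",
    q{'}  => q{'},
    q{"}  => q{"},
    q{\\} => q{\\},
    q{ }  => q{ },
);

# Resolve escapes via the table, then remove one pair of matching quotes,
# but only when they wrap the whole name
sub unescape_name {
    my ($fname) = @_;
    $fname =~ s/\\([rnt'"\\ ])/$ESCAPE{$1}/g;
    $fname =~ s/^"(.*)"$/$1/s or $fname =~ s/^'(.*)'$/$1/s;
    return $fname;
}

my $raw = q{"it's notes"};
(my $everywhere = $raw) =~ s/["'\\]+//g;    # global strip gives: its notes
print "Everywhere: $everywhere\n";
print "Ends only:  ", unescape_name($raw), "\n";    # gives: it's notes
```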


