PerlMonks  

Re: Help with a regular expression for file name parsing

by BrowserUk (Pope)
on Dec 07, 2011 at 07:11 UTC (#942168)


in reply to Help with a regular expression for file name parsing

This works with the samples supplied:

print $data;;
#some "random stuff" @include "some file" did you parse that?
#more 'random' stuff @include 'another file' you sure?
#and more random stuff @include yet\ another\ file positive?

print for $data =~ m[\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s]g;;
"some file"
'another file'
yet\ another\ file

Spreading that out a bit:

m[
    \@include \s        ## the introducer followed by a space
    (                   ## capture
        '[^']+'         ## A single quoted string with no embedded single quotes
        |               ## or
        "[^"]+"         ## a double quoted string with no embedded double quotes
        |               ## or
        .+? (?<!\\)     ## a min length string that ends in a space that isn't escaped
    )
    \s
]gx;;

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


Re^2: Help with a regular expression for file name parsing
by TJPride (Pilgrim) on Dec 07, 2011 at 11:30 UTC
    Your regular expression works, but the code is rather a muddle. Here's a version that he can use to test with:

    $data = join '', <DATA>;
    print "$_\n" for $data =~ m[\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s]g;

    __DATA__
    #some "random stuff" @include "some file" did you parse that?
    #more 'random' stuff @include 'another file' you sure?
    #and more random stuff @include yet\ another\ file positive?
      When tested with this version, the output is just 1
        Are you sure you copied it correctly? I get this:

        "some file"
        'another file'
        yet\ another\ file
Re^2: Help with a regular expression for file name parsing
by bontchev (Sexton) on Dec 07, 2011 at 12:01 UTC

    I am sorry, but I can't make sense of your answer. :-( If the part marked as "code" is supposed to be a script that works - well, it doesn't; it just produces a bunch of errors.

    But let's concentrate just on the regular expression, because this is what I asked for. Sadly, that doesn't work, either. :-(

    Let's start with something easy:

    my $data = "\@include test";
    if ($data =~ /\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s+/g) {
        print "File name: \"$1\"\n";
    }

    This doesn't output anything at all, meaning that the parsing fails.

    If we set

    my $data = "\@include \'test test\'";

    this outputs

    File name: "'test"

    which is totally wrong. It should output

    File name: "test test"

    If we try

    my $data = "\@include \"test test\"";

    this produces the similarly wrong

    File name: ""test"

    And finally, if we try

    my $data = "\@include test\\ test";

    it also produces no output, meaning that the matching fails

    Any better suggestions?
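The four test strings above all end immediately after the file name, while the posted pattern requires a whitespace character after the name. The sketch below illustrates that difference under stated assumptions: `capture` is a hypothetical helper, and `$relaxed` is an assumed variant (not posted by anyone in the thread) that swaps the final `\s` for a lookahead that also accepts the end of the string.

```perl
use strict;
use warnings;

# The pattern as posted: one whitespace character must follow the name.
my $posted = qr/\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s/;

# Assumed variant (not from the thread): the lookahead also accepts end of string.
my $relaxed = qr/\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))(?=\s|\z)/;

# Hypothetical helper: return the capture, or the string 'no match'.
sub capture {
    my ( $re, $text ) = @_;
    return $text =~ $re ? $1 : 'no match';
}

for my $text ( '@include test', q{@include 'test test'}, '@include test\ test' ) {
    printf "%-22s bare: %-10s newline: %-14s relaxed: %s\n",
        $text,
        capture( $posted,  $text ),        # string ends right after the name
        capture( $posted,  "$text\n" ),    # as read from a file without chomp
        capture( $relaxed, $text );        # lookahead tolerates end of string
}
```

With a trailing newline, as a line read from a file would have, the posted pattern captures the whole name in each case. Without it, the quoted alternative cannot be followed by whitespace, so the match falls through to the third alternative and stops at the first unescaped space.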

      Any better suggestions?

      Learn to copy and paste better :) because the regex you're using isn't the same one BrowserUk posted.

      His regex works, despite him posting the code in the context of his REPL (Read Eval Print Loop), see RFC: IPerl - Interactive Perl ( read-eval-print loop ), Re^6: RFC: IPerl - Interactive Perl ( read-eval-print loop ) (x)

      I checked

      #!/usr/bin/perl --
      #~ 2011-12-07-04:10:56PDT by Anonymous Monk
      #~ perltidy -csc -otr -opr -ce -nibc -i=4
      use strict;
      use warnings;
      use autodie;    # dies if open/close... fail

      Main( @ARGV );
      exit( 0 );

      sub Main {
          if ( @_ == 2 ) {
              NotDemoMeaningfulName(@_);
          } else {
              Demo();
              print '#' x 33, "\n", Usage();
          }
      } ## end sub Main

      sub NotDemoMeaningfulName {
          my ( $inputFile, $outputFile ) = @_;
          open my ($inFh),  '<', $inputFile;
          open my ($outFh), '>', $outputFile;
          while ( defined( my $data = <$inFh> ) ) {
              print $outFh "$_\n"
                for $data =~ m[\@include\s('[^']+'|"[^"]+"|.+?(?<!\\))\s]g;
              # /\@include\s+('[^']+'|"[^"]+"|.+?(?<!\\))\s+/g
          }
          close $inFh;
          close $outFh;
      } ## end sub NotDemoMeaningfulName

      sub Usage {
          <<"__USAGE__";
      $0
      $0 dataFile newDataFile
      __USAGE__
      } ## end sub Usage

      sub Demo {
          my ( $Input, $WantedOutput ) = DemoData();
          NotDemoMeaningfulName( \$Input, \my $Output );
          require Test::More;
          Test::More::is( $Output, $WantedOutput,
              ' NotDemoMeaningfulName Works Aas Designed' );
          Test::More::done_testing();
          print "\n$Output\n";
      } ## end sub Demo

      sub DemoData {
          #~ http://perlmonks...
          my $One = <<'__One__';
      @include test
      #some "random stuff" @include "some file" did you parse that?
      #more 'random' stuff @include 'another file' you sure?
      #and more random stuff @include yet\ another\ file positive?
      __One__

          #~ http://perlmonks...
          my $Two = <<'__Two__';
      test
      "some file"
      'another file'
      yet\ another\ file
      __Two__

          return $One, $Two;
      } ## end sub DemoData
      __END__
      $ perl pm.re.942167.pl
      ok 1 - NotDemoMeaningfulName Works Aas Designed
      1..1

      test
      "some file"
      'another file'
      yet\ another\ file

      #################################
      pm.re.942167.pl
      pm.re.942167.pl dataFile newDataFile

        You checked what? I asked for a regular expression - not for three pages of code and a link to somebody's totally irrelevant custom module!

        Meanwhile, I figured out that the horrible BBoard software which this site uses simply mangles the posted code, and to get the real stuff you have to click on the "download" link (which doesn't download!) and cut-and-paste from the page that opens.

        So, I managed to make TJPride's script output something meaningful:

        "some file"
        'another file'
        yet\ another\ file

        Unfortunately, it is also wrong. This is not the proper output. The proper output, with such data, would be:

        some file
        another file
        yet another file

        Furthermore, I don't want all the stuff loaded into some kind of an array and everything in that array matched simultaneously. I have to do other stuff with each line, you know? I want to process the file line-by-line and, for this particular problem, I need a regular expression that fetches the file name after the "@include" keyword.

        Let's make this very simple, shall we? Consider this snippet of code:

        my $data = SOME STRING;
        if ($data =~ SOME EXPRESSION) {
            print "File name: \"$_\"\n";
        }

        Question: What should SOME EXPRESSION be, so that if SOME STRING is

        "\@include file"

        or

        "\@include \"some file\""

        or

        "\@include another\\ file"

        the output is respectively

        File name: "file"

        or

        File name: "some file"

        or

        File name: "another file"

        Please do not answer, unless you have tested that your answer actually produces the desired output.
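One possible way to meet the requirements stated above is sketched here. It is not taken from any reply in the thread, and `include_name` is a hypothetical helper name. A successful match leaves its captures in `$1`, `$2`, and so on rather than in `$_`, and a single pattern cannot both capture a name and rewrite it, so the sketch captures the name without its quotes and then removes the escaping backslashes from an unquoted name in a second step.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the file name given by an
# include directive in one line of input, or nothing if the line has none.
sub include_name {
    my ($line) = @_;
    return unless $line =~ m{
        \@include \s+
        (?:
            '([^']*)'         # single-quoted name, quotes outside the capture
          | "([^"]*)"         # double-quoted name, quotes outside the capture
          | ((?:\\.|\S)+)     # bare name; a backslash keeps the next character
        )
    }x;
    return $1 if defined $1;
    return $2 if defined $2;
    ( my $name = $3 ) =~ s/\\(.)/$1/g;    # drop the escaping backslashes
    return $name;
}

# The three strings from the question, processed one at a time.
for my $data ( "\@include file", "\@include \"some file\"", "\@include another\\ file" ) {
    my $name = include_name($data);
    print "File name: \"$name\"\n" if defined $name;
}
```

For the three strings in the question this prints `File name: "file"`, `File name: "some file"`, and `File name: "another file"`. Because the helper takes one line at a time, it fits a `while (<$fh>)` loop that also does other work on each line. How an unterminated quote should be treated is not specified in the thread; here it simply falls through to the bare-name branch.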

      Any better suggestions?

      For you, no. At least none that would be considered polite.


        For you, no. At least none that would be considered polite.

        Too bad. Then either you or I have come to the wrong place.
