http://www.perlmonks.org?node_id=632064


in reply to Re: Problem in pattern matching with alternation
in thread Problem in pattern matching with alternation

I have to match and extract the following storage paths from lines like the ones below:
/usr/add-on/puccase_vob01/ccvob01/bt_rel.vbs public
/usr/add-on/puccase_vob01/ccvob01/scm.vbs
/usr/add-on/puccase_vob01/ccvob01/v_dialerclient_rel.vbs
The arguments (that is, the vobtags) are as follows:
/vobs/bt_rel
/scm
/vobs/bt_rel
/v_dialerclient_rel
The script is working fine for the tags "/vobs/bt_rel" and "/vobs/UMTools", but it's not working for tags such as /scm and /v_dialer, which have only one slash(\) in them.
I am using the alternation operator in the match, but it's not working, and I have been unable to find the mistake after investing an hour in this task.
I think with this detail you may be able to help me find the mistake in my script.

Re^3: Problem in pattern matching with alternation
by graff (Chancellor) on Aug 12, 2007 at 15:34 UTC
    I appreciate that you are trying, but you are still not making any sense. You are not using "code" tags enough in your posts, you are not giving us anything we can try to run ourselves to demonstrate your problem, and you keep confusing "slash" with "\".

    I'll propose the following, which is based on code and data in the original post at the top of this thread. Please try this out, tell us whether it works for you, and if it doesn't (and you are still stumped about how to make it work), tell us exactly how it should work.

    #!/usr/bin/perl
    use strict;
    use warnings;
    while (<DATA>) {
        print "$1\n" if ( m{^\*\s+(?:/\w+)+\s+(.+?)\s+} );
    }
    __DATA__
    * /vobs/bt_rel /usr/add-on/puccase_vob01/ccvob01/bt_rel.vbs public (replicated)
    * /scm /usr/add-on/puccase_vob01/ccvob01/scm.vbs public (replicated)
    * /v_dialermidtier /usr/addon/puccase_vob01/ccvob01/v_dialermidtier.vbs public (replicated)
    * /v_dialer /usr/add-on/puccase_vob01/ccvob01/v_dialer.vbs public (replicated)
    * /vobs/UMTools /user/addon/puccase_vob01/ccvob01/UMtools.vbs replicated)
    The differences between that and the OP code are:
    • it loops over each line of input, instead of reporting only a single output from all lines of input
    • it escapes the initial "*" character in the regex, to avoid a syntax error
    • it puts curlies around the regex, to avoid \/ (toothpick syndrome)
    • it allows the first string following "*" to contain any number (1 or more) of adjacent "/word" patterns (but does not capture this string)

    If you are just trying to get the third space-delimited token from each line of input, you could do this for each line, instead of the regex match:

    print +(split /\s+/)[2]; # print third "word" of line
    (updated to include "+" outside the parens -- thanks, naikonta!)

    If the data shown above is not correct, show us the actual data (inside <code> tags, please). If the output is not what you want, show us exactly what you want (based on the correct input data, again using <code> tags).

      print (split /\s+/)[2];  # print third "word" of line
      Ooops, you forgot the + :-) It gives me,
      syntax error at __FILE__ line __LINE__, near ")["
      Execution of -e aborted due to compilation errors.
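For readers wondering why the "+" matters, here is a minimal sketch. When an opening paren directly follows print, Perl takes the parenthesized list as print's whole argument list, so the trailing [2] is left dangling. The sample line and the helper name third_field are made up for illustration; they are not from the posts above.

```perl
use strict;
use warnings;

# Hypothetical sample line in the format shown in this thread.
my $line = "* /scm /usr/add-on/puccase_vob01/ccvob01/scm.vbs public (replicated)";

# Return the third whitespace-separated field of a line.
# (third_field is an illustrative name, not something from the thread.)
sub third_field {
    my ($text) = @_;
    # A list slice of split's result; print is not involved here,
    # so there is no parsing trap.
    return ( split /\s+/, $text )[2];
}

# "print (LIST)[2]" is parsed as "print(LIST)" followed by a stray "[2]".
# Each of the following avoids that:
print +( split /\s+/, $line )[2], "\n";    # unary plus: parens are no longer print's argument list
print( ( split /\s+/, $line )[2], "\n" );  # explicit outer parens for print itself
print third_field($line), "\n";            # slice first, print afterwards
```

All three statements print the same field; which form to use is mostly a matter of taste.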

      Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      Hi graff,
      Here I am explaining in detail what I am actually trying to extract.
      I am working on a task to get the VOB storage path for each vobtag in a list. Below is the command I am running at the command prompt:
      bash-3.00$ cleartool lsvob -s    # gives the list of all vobtags, as below
      /vobs/cs_test_scripts
      /vobs/openwall_pam_userpass_rel
      /vobs/cs_cim
      /vobs/PIP_PVOB
      /vobs/pvob_ic
      /v_dialer3rdparty
      /v_dialerclient
      /v_dialer_rel
      /scm
      #!/usr/bin/perl -w
      $arg=$ARGV[0];
      $cmd="cleartool lsvob $arg";
      $arr=`$cmd`;
      print "$arr\n";
      I run this script with a vobtag as its argument, taken from the list that "cleartool lsvob -s" printed.
      For example, running bash-3.0$ cleartool lsvob /vobs/cs_test_scripts from the command prompt gives the result below:
      * /vobs/cs_test_scripts /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs public (replicated)
      * /scm /usr/add-on/puccase_vob01/ccvob01/scm.test.vbs
      The output contains three parts: "* /vobs/cs_test_script" and "* /scm" are the vobtags,
      /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs is the VOB storage directory path,
      and the last part is the type, "public (replicated)".
      From this I want to extract the VOB storage path of whichever vobtag (for example /vobs/cs_test_script) I supply as the argument to the script below. In it I am matching the output for tags with two slashes in them (e.g. /vobs/cs_test_script) and for vobtags with one slash in them (e.g. /scm), with a regular expression that has to match the output for both types of vobtag (/vobs/cs_test_script, /scm) by using the alternation operator in the pattern match, as in the script below:
      #!/usr/bin/perl
      $arg=$ARGV[0];
      $cmd1="cleartool lsvob $arg";
      $arr=`$cmd1`;
      print "$arr\n";
      $storage1=$1 if($arr=~/^*\s+\/\w+\/\w+\s+(.+)\s+\w+/);
      print "$storage1\n";
      $storage2=$1 if($arr=~/^*\s+\/\w+\s+(.+)\s+\w+/);
      print "$storage2\n";
      $storage3=$1 if($arr=~/^*\s+\/\w+\/\w+\s+(.+)\s+\w+|^*\s+\/\w+\s+(.+)\s+\w+/);
      print "$storage3\n";
      I am running the script with both types of vobtag, /vobs/cs_test_scripts and /scm, as arguments, so it prints the match for both $storage1 and $storage2, as below:
      bash-3.00$ perl scriptname.pl /vobs/cs_test_script
      bash-3.00$ perl scriptname.pl /scm
      * /vobs/cs_test_scripts /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs public (replicated)
      #match storage1
      /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs
      * /scm /usr/add-on/puccase_vob01/ccvob01/scm.vbs public (replicated)
      #storage2
      /usr/add-on/puccase_vob01/ccvob01/scm.vbs
      In the $storage3 variable I am combining the pattern matches of storage1 and storage2 with the alternation operator. It should work for both types of vobtag argument, but I can't see where I am going wrong...
      Monks, sorry for posting the same question again and again... the problem must seem simple to you people...
        Here I am explaining in detail what I am actually trying to extract.

        And you seem to be completely ignoring (or not understanding) the code sample and suggestions that I posted. If you don't understand something I've said or something in my code, it's okay to quote me in a reply and say that you don't understand.

        I am matching the output for tags with two slashes in them (e.g. /vobs/cs_test_script) and for vobtags with one slash in them (e.g. /scm), with a regular expression that has to match the output for both types of vobtag (/vobs/cs_test_script, /scm) by using the alternation operator in the pattern match, as in the script below

        Checking your code, I have learned something. I expected that a regex starting like this:  /^*\s+/  would be a syntax error, and that the script would not run at all. But having tried it, I see that it does run (it's not an error), and it even seems to work: $_="* foo"; m{^*\s+foo} returns true.

        Still, I prefer using a backslash when I want to match a literal "*" character. Note that  m{*\s+foo} is a syntax error.

        (And make sure you understand the distinction: slash is "/", backslash is "\", and the two have very different meanings and uses in perl.)

        I am combining the pattern matches of storage1 and storage2 with the alternation operator

        You don't need to use an alternation (|) in the regex. The code that I suggested above uses a quantifier, so that one, two, three or more slashes in the path string can be treated by the single expression -- shown here with commentary:

        m{^        # at the start of the string
          \*       # match a literal asterisk character
          \s+      # then one or more whitespace characters
          (?:      # begin a non-capturing group expression
            /      # match a literal slash character
            \w+    # then one or more alphanumeric_word characters
          )+       # close the group, match 1 or more instances of that expression
          \s+      # then one or more whitespace characters
          (\S+)    # capture a group of non-whitespace characters
        }x         # end of regex (x modifier lets comments and spacing be ignored)
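A possible reason the combined pattern in the question leaves $storage3 empty for /scm, offered here as an editorial reading rather than something stated in the thread: capture groups are numbered left to right across the whole pattern, so the group in the second branch of an alternation is $2, not $1. The sketch below contrasts the two approaches. The sub names and sample lines are hypothetical, and the patterns are simplified versions that escape the asterisk and capture with \S+.

```perl
use strict;
use warnings;

# Alternation: each branch has its own capture group, numbered left to
# right across the whole pattern, so the second branch fills $2, not $1.
sub storage_by_alternation {
    my ($line) = @_;
    if ( $line =~ m{^\*\s+/\w+/\w+\s+(\S+)|^\*\s+/\w+\s+(\S+)} ) {
        return defined $1 ? $1 : $2;    # both groups have to be checked
    }
    return;
}

# Quantified group: one branch and one capture, so the result is always $1.
sub storage_by_quantifier {
    my ($line) = @_;
    return $line =~ m{^\*\s+(?:/\w+)+\s+(\S+)} ? $1 : undef;
}

# Hypothetical sample lines in the format shown in this thread.
my @lines = (
    "* /vobs/bt_rel /usr/add-on/puccase_vob01/ccvob01/bt_rel.vbs public (replicated)",
    "* /scm /usr/add-on/puccase_vob01/ccvob01/scm.vbs public (replicated)",
);

for my $line (@lines) {
    print storage_by_alternation($line), "\n";
    print storage_by_quantifier($line),  "\n";
}
```

Both subs return the same path for each sample line, but only the alternation version needs the extra check on $2.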

        The last couple of things that you are not paying attention to are: whether you need to be processing your data line by line, rather than slurping all the lines into a single "$arr" variable, and whether you would be better off using "split" instead of regex matches.
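A minimal sketch of both suggestions together: break the captured output into lines and take the fields with split. The sub name storage_paths and the here-document standing in for the cleartool output are hypothetical, and the sketch assumes that every line of interest starts with a "*" marker, as in the samples posted in this thread.

```perl
use strict;
use warnings;

# Return (tag => storage path) pairs from lsvob-style output that was
# captured as one multi-line string.
# Assumption: lines of interest look like "* TAG PATH ...", as in this thread.
sub storage_paths {
    my ($output) = @_;
    my %storage_for;
    for my $line ( split /\n/, $output ) {
        # split ' ' splits on runs of whitespace and ignores leading
        # whitespace: marker, tag, storage path (the rest is discarded).
        my ( $marker, $tag, $path ) = split ' ', $line;
        next unless defined $path && $marker eq '*';
        $storage_for{$tag} = $path;
    }
    return %storage_for;
}

# Hypothetical stand-in for what the command printed in this thread.
my $output = <<'END';
* /vobs/cs_test_scripts /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs public (replicated)
* /scm /usr/add-on/puccase_vob01/ccvob01/scm.test.vbs
END

my %storage_for = storage_paths($output);
print "$_ => $storage_for{$_}\n" for sort keys %storage_for;
```

In a real script the string would come from the backticks. Backticks in list context return one element per line, which would make the split on newlines unnecessary.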

        That's better (I think). Your input data look like this:

        * /vobs/cs_test_scripts /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs public (replicated)
        * /scm /usr/add-on/puccase_vob01/ccvob01/scm.test.vbs

        And your output should be:

        /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs
        /usr/add-on/puccase_vob01/ccvob01/scm.test.vbs

        I wouldn't do this with regular expressions, I'd use split:

        my @data = (
            "* /vobs/cs_test_scripts /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs public (replicated)",
            "* /scm /usr/add-on/puccase_vob01/ccvob01/scm.test.vbs"
        );
        for ( @data ) {
            my $path = (split)[2];
            print "$path\n";
        }
        Outputs:

        /usr/add-on/puccase_vob01/ccvob01/cs_test_scripts.vbs
        /usr/add-on/puccase_vob01/ccvob01/scm.test.vbs
        Is that what you're after?
Re^3: Problem in pattern matching with alternation
by naikonta (Curate) on Aug 12, 2007 at 15:14 UTC
    First of all, "/" is slash, "\" is backslash. Having two slashes in path (for Unix like) means that the path has two parts. Saying "2 '\\'" means (at least to me) there are two pairs of adjacent '\', that is "\\" and "\\" :-)

    OK, back to the problem...
    I tend to think that you want the last part of the path, for which you can use basename-like functionality. So,

    while (<DATA>) {
        chomp;
        (my $tag = (split)[0]) =~ s!.*(/.*)\.vbs$!$1!;
        print $tag, "\n";
    }
    __DATA__
    /usr/add-on/puccase_vob01/ccvob01/bt_rel.vbs public
    /usr/add-on/puccase_vob01/ccvob01/scm.vbs
    /usr/add-on/puccase_vob01/ccvob01/v_dialerclient_rel.vbs
    would result in
    /bt_rel
    /scm
    /v_dialerclient_rel
    But I'm confused by /vobs/bt_rel. Where does the /vobs part come from?

    Update: The problem with your code is that you are trying to track the path level manually, and using regex complicates the situation.

    Update2: I just noticed that the leading asterisks are part of the lines, followed by some space(s) then the tags. Here is my modified code:

    $ cat extract-vobtags.pl
    while (<DATA>) {
        chomp;
        my($tag, $storage) = (split)[1,2];
        printf "%20s: %s\n", $tag, $storage;
    }
    __DATA__
    * /vobs/bt_rel /usr/add-on/puccase_vob01/ccvob01/bt_rel.vbs public (replicated)
    * /scm /usr/add-on/puccase_vob01/ccvob01/scm.vbs public (replicated)
    * /v_dialermidtier /usr/addon/puccase_vob01/ccvob01/v_dialermidtier.vbs public (replicated)
    * /v_dialer /usr/add-on/puccase_vob01/ccvob01/v_dialer.vbs public (replicated)
    * /vobs/UMTools /user/addon/puccase_vob01/ccvob01/UMtools.vbs replicated)
    $ perl extract-vobtags.pl
            /vobs/bt_rel: /usr/add-on/puccase_vob01/ccvob01/bt_rel.vbs
                    /scm: /usr/add-on/puccase_vob01/ccvob01/scm.vbs
        /v_dialermidtier: /usr/addon/puccase_vob01/ccvob01/v_dialermidtier.vbs
               /v_dialer: /usr/add-on/puccase_vob01/ccvob01/v_dialer.vbs
           /vobs/UMTools: /user/addon/puccase_vob01/ccvob01/UMtools.vbs
