http://www.perlmonks.org?node_id=1225519

nursyza has asked for the wisdom of the Perl Monks concerning the following question:

I'm a newbie in Perl and need some help with this code.

MODULE C17 (N1, N2, N3, N6, N7, N22, N23);

Given the line above, I want my program to print only the name "MODULE C17".

Here is the code that I tried, but it prints out the whole line.

my $MODULE_NAME = $_;
if (defined($MODULE_NAME) && ($MODULE_NAME =~ /MODULE (.*);/)) {
    my $module_name = $1;
    print "Module name = $module_name\n";
}

Replies are listed 'Best First'.
Re: Pattern matching
by Athanasius (Archbishop) on Nov 10, 2018 at 08:54 UTC

    Hello nursyza, and welcome to the Monastery!

    Parentheses within a regex are for capturing a match. To match a literal left parenthesis, you have to escape it:

    use strict;
    use warnings;

    my $MODULE_NAME = 'MODULE C17 (N1, N2, N3, N6, N7, N22, N23)';

    if (defined($MODULE_NAME) && ($MODULE_NAME =~ / ^ (.+) \s+ \( /x)) {
        my $module_name = $1;
        print "Module name = >$module_name<\n";
    }

    Output:

    18:49 >perl 1939_SoPW.pl
    Module name = >MODULE C17<
    18:49 >

    Note: The /x modifier is used here to make the regex more readable. The regex says: match the beginning of a line (^), then capture as many characters as possible, providing that these captured characters are followed by (a) one or more spaces, then (b) an opening (left) parenthesis.

    Update: My use of \s+ above is sub-optimal, because it matches only the last whitespace character following the module name. Better would be:

    if (defined($MODULE_NAME) && ($MODULE_NAME =~ / ^ (.+) \( /x)) {
        my $module_name = $1;
        $module_name =~ s/ \s+ $ //x;
        print "Module name = >$module_name<\n";
    }

    which explicitly removes trailing whitespace from the module name.
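A minimal sketch of the difference described in the update, assuming a hypothetical input line with three spaces before the parenthesis. The variable names `$greedy` and `$trimmed` are illustrative only and do not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical input: three spaces between the name and the parenthesis.
my $line = 'MODULE C17   (N1, N2, N3, N6, N7, N22, N23);';

# Greedy (.+) keeps as much as it can, so \s+ is left with a single space
# and the capture still ends in two spaces.
my ($greedy) = $line =~ / ^ (.+) \s+ \( /x;

# Capture up to the parenthesis, then strip all trailing whitespace.
my ($trimmed) = $line =~ / ^ (.+) \( /x;
$trimmed =~ s/ \s+ $ //x;

print "greedy  = >$greedy<\n";
print "trimmed = >$trimmed<\n";
```

With a single space in the input the two approaches give the same result; they only differ when more than one whitespace character precedes the parenthesis.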

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Sorry, I forgot to mention that the line MODULE C17 (N1, N2, N3, N6, N7, N22, N23) is actually read from a text file.

      MODULE C17 (N1, N2, N3, N6, N7, N22, N23);
      INPUT N1, N2, N3, N6, N7;
      OUTPUT N22, N23;
      WIRE N10, N11, N16, N19;
      NAND NAND2_1 (N10, N1, N3);
      NAND NAND2_2 (N11, N3, N6);
      NAND NAND2_3 (N16, N2, N11);
      NAND NAND2_4 (N19, N11, N7);
      NAND NAND2_5 (N22, N10, N16);
      NAND NAND2_6 (N23, N16, N19);
      ENDMODULE

      I tried your code, but it also prints the NAND lines.

        Unfortunately, what you have provided now is still not a complete spec. You might want any of these things:

        • Just process the first line of the file.
        • Just process the first line which has a bracket.
        • Process every line which starts with an M.
        • Process every line which starts with an M and contains a bracket.
        • ...

        So, your first task is to create a tight specification. After that you can devise an algorithm and only then start to consider the coding.
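As an illustration only, here is a sketch for one possible specification: report the first line that starts with MODULE and contains an opening parenthesis, then stop. That choice of spec is an assumption, not something nursyza confirmed. The sample text is the file content quoted earlier in the thread, read through an in-memory filehandle so the example is self-contained.

```perl
use strict;
use warnings;

# Sample input from the thread, held in a string instead of a real file.
my $text = <<'END';
MODULE C17 (N1, N2, N3, N6, N7, N22, N23);
INPUT N1, N2, N3, N6, N7;
OUTPUT N22, N23;
WIRE N10, N11, N16, N19;
NAND NAND2_1 (N10, N1, N3);
NAND NAND2_2 (N11, N3, N6);
ENDMODULE
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";

my $module_name;
while (my $line = <$in>) {
    # Assumed spec: the line must begin with MODULE, so NAND lines never match.
    if ($line =~ / ^ \s* (MODULE \s+ \w+) \s* \( /x) {
        $module_name = $1;
        last;    # stop at the first matching line
    }
}
close $in;

print "Module name = $module_name\n" if defined $module_name;
```

A different specification (for example, every MODULE line in a file with several modules) would drop the `last` and collect the captures in an array instead.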

Re: Pattern matching
by parv (Parson) on Nov 10, 2018 at 09:02 UTC

    The code above cannot possibly "print the whole line", because "MODULE " will be missing. You are capturing anything and everything after "MODULE " with "(.*)", so you get the string "C17 (N...)". Capture what you actually want.

    Surely there must be some other patterns besides one specific case? What are they?

    In this specific case, try ...

    m{ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] }x
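To make the difference concrete, the following sketch runs the original pattern and the pattern above against the sample line from the question. The variable names `$original` and `$narrow` are made up for this example.

```perl
use strict;
use warnings;

my $line = 'MODULE C17 (N1, N2, N3, N6, N7, N22, N23);';

# Original pattern: captures everything between "MODULE " and the semicolon.
my ($original) = $line =~ /MODULE (.*);/;

# Narrower pattern: captures only the keyword and the name.
my ($narrow) = $line =~ m{ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] }x;

print "original = >$original<\n";
print "narrow   = >$narrow<\n";
```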

      Thank you. And btw can you explain to me how to read the patterns you wrote?

        m{
            # Word boundary.
            \b
            # Start capture of pattern matched;
            (
            # literal string "MODULE",
            MODULE
            # one or more space characters,
            \s+
            # one or more A-Z letters (represented as character class),
            [A-Z]+
            # one or more 0-9 digits,
            [0-9]+
            # stop capture.
            )
            # Zero or more space characters.
            \s*
            # 1-element character class, or "escaped" "(" (not start of capture);
            [(]
            # any & everything until ...
            .+?
            # ... literal ")".
            [)]
        }x
        # /x flag allows to expand the regex as you see above & mentioned elsewhere.

        ... can you explain to me how to read the patterns ...

        Because parv's regex contains nothing that is not supported by Perl version 5.6, the YAPE::Regex::Explain module can help.

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use YAPE::Regex::Explain;
         ;;
         my $rx = qr{ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] }x;
         ;;
         print YAPE::Regex::Explain->new($rx)->explain;
        "
        The regular expression:

        (?x-ims: \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] )

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?x-ims:                 group, but do not capture (disregarding
                                 whitespace and comments) (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n):
        ----------------------------------------------------------------------
          \b                       the boundary between a word char (\w) and
                                   something that is not a word char
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            MODULE                   'MODULE'
        ----------------------------------------------------------------------
            \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                     more times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
            [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                                     times (matching the most amount possible))
        ----------------------------------------------------------------------
            [0-9]+                   any character of: '0' to '9' (1 or more
                                     times (matching the most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                   more times (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          [(]                      any character of: '('
        ----------------------------------------------------------------------
          .+?                      any character except \n (1 or more times
                                   (matching the least amount possible))
        ----------------------------------------------------------------------
          [)]                      any character of: ')'
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------


        Give a man a fish:  <%-{-{-{-<

        Hi nursyza,

        I see that parv has already explained the regex pattern for you. I wanted to let you know that you can use the YAPE::Regex::Explain module to get an explanation of a regular expression pattern. Once you have the package installed, you can do something like this at the command line to get the explanation for your pattern.

        You may also want to look at perlre to get more familiar with regular expressions.

        UPDATE: As parv, soonix, and AnomalousMonk pointed out (in the replies to this node), the above usage of YAPE::Regex::Explain is not correct. Passing the regex as a double-quoted string caused problems.

        The following code gives the correct output:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use YAPE::Regex::Explain;

        my $re = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;
        my $exp = YAPE::Regex::Explain->new($re)->explain;
        print $exp;
        exit;

        Here is the output:

        The regular expression:

        (?x-ims: \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] )

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?x-ims:                 group, but do not capture (disregarding
                                 whitespace and comments) (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n):
        ----------------------------------------------------------------------
          \b                       the boundary between a word char (\w) and
                                   something that is not a word char
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            MODULE                   'MODULE'
        ----------------------------------------------------------------------
            \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                     more times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
            [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                                     times (matching the most amount possible))
        ----------------------------------------------------------------------
            [0-9]+                   any character of: '0' to '9' (1 or more
                                     times (matching the most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                                   more times (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          [(]                      any character of: '('
        ----------------------------------------------------------------------
          .+?                      any character except \n (1 or more times
                                   (matching the least amount possible))
        ----------------------------------------------------------------------
          [)]                      any character of: ')'
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
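The thread does not show exactly what went wrong with the double-quoted string, so the following is only a guess at the kind of problem involved: in a double-quoted Perl string, backslash escapes are processed before the regex engine ever sees them. The sketch uses a shortened, hypothetical pattern and does not need YAPE::Regex::Explain.

```perl
use strict;
use warnings;

my $line = 'MODULE C17 (N1, N2, N3, N6, N7, N22, N23);';

# Single quotes keep the backslashes, so the regex engine sees \b, \s and \w.
my $single = '\b(MODULE\s+\w+)';

my $double;
{
    # Silence "Unrecognized escape \s passed through" for this demonstration.
    no warnings 'misc';
    # Double quotes turn \b into a backspace character and reduce \s and \w
    # to plain "s" and "w".
    $double = "\b(MODULE\s+\w+)";
}

my $single_ok = $line =~ /$single/ ? 1 : 0;
my $double_ok = $line =~ /$double/ ? 1 : 0;

print "single-quoted pattern matched: $single_ok\n";
print "double-quoted pattern matched: $double_ok\n";
```

Building the pattern with `qr//`, as in the corrected code above, avoids the issue because the text is parsed as a regex from the start.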

      It works now. Thank you!

Re: Pattern matching
by haj (Vicar) on Nov 10, 2018 at 09:03 UTC

    Welcome to the monastery!

    In Perl, the parentheses define what you are capturing. With (.*) you are capturing everything up to the semicolon, therefore you get everything except the leading MODULE into $1. Depending on how your input varies, you could use one of the following:

    my $MODULE_NAME = 'MODULE C17 (N1, N2, N3, N6, N7, N22, N23);';

    # Capture MODULE and the first "word" after MODULE
    if (defined($MODULE_NAME) && ($MODULE_NAME =~ /(MODULE \w+) /))
    # Capture everything before the opening parenthesis, without spaces
    # if (defined($MODULE_NAME) && ($MODULE_NAME =~ /^(.*?)\s+\(/))
    {
        my $module_name = $1;
        print "Module name = $module_name\n";
    }

    Note that in the second regex I had to escape the opening parenthesis to tell Perl that this is the character ( and not the start of a new capturing group.
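How the two patterns differ depends on the input. As a hedged illustration, the sketch below feeds both of them a made-up line whose module name contains a hyphen; nothing in the thread says such names occur in the real file.

```perl
use strict;
use warnings;

# Hypothetical input: the hyphen is not a \w character.
my $line = 'MODULE C17-A (N1, N2);';

# First pattern: \w+ stops at the hyphen and no space follows, so no match.
my ($first) = $line =~ /(MODULE \w+) /;

# Second pattern: takes everything before the whitespace and parenthesis.
my ($second) = $line =~ /^(.*?)\s+\(/;

print "first  = ", (defined $first  ? ">$first<"  : 'no match'), "\n";
print "second = ", (defined $second ? ">$second<" : 'no match'), "\n";
```

For the sample line from the question, both patterns capture "MODULE C17".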

Re: Pattern matching
by Laurent_R (Canon) on Nov 10, 2018 at 10:59 UTC
    Hi nursyza,

    You've received good answers already, but I would add that the syntax could be simplified. Assuming that your target string is in the $_ default variable because you're reading the file in a while loop (for example a while (<$IN>) { # ... construct, assuming $IN is your filehandle), you don't need to test for "definedness," because the while loop already does that check. You also don't need the extra $MODULE_NAME variable; using the $_ variable directly makes the regex syntax simpler (if you don't specify a target string, a regex matches against $_ by default). Putting this together, you could have something like this:

    print "Module name = $1\n" if /(MODULE\s+\w+)\s*\(/; # Prints: Module name = MODULE C17
    Or, using the /x modifier already described by other monks:
    print "Module name = $1\n" if /( MODULE \s+ \w+ ) \s* \(/x;
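Put into a complete loop, the idea might look like the sketch below. It assumes the lines come from a filehandle named `$IN`; here that handle is opened on an in-memory string holding a shortened copy of the sample file, and the `@names` array is added only so the result can be inspected afterwards.

```perl
use strict;
use warnings;

# Shortened sample input; a real script would open the text file instead.
my $text = "MODULE C17 (N1, N2, N3, N6, N7, N22, N23);\n"
         . "NAND NAND2_1 (N10, N1, N3);\n"
         . "ENDMODULE\n";

open my $IN, '<', \$text or die "Cannot open in-memory file: $!";

my @names;
while (<$IN>) {
    # The match runs against $_ implicitly; no defined() test is needed
    # because while (<$IN>) already stops at end of file.
    next unless /( MODULE \s+ \w+ ) \s* \(/x;
    push @names, $1;
    print "Module name = $1\n";
}
close $IN;
```

The NAND and ENDMODULE lines are skipped because neither contains "MODULE" followed by whitespace, a word, and an opening parenthesis.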