Regular Expression

by pramod (Initiate)
on Jun 27, 2012 at 11:37 UTC (#978622=perlquestion)
pramod has asked for the wisdom of the Perl Monks concerning the following question:


I am trying to read a file containing the following strings: "Width = 32" and "Descr - This is Register1 comment"

and extract the Width = 32 from file, but my code fails to match.

File contains:

Width = 32
<uart0_rx_data : 16'h0000>
Descr - "This is Register1 comment"
# f_name bit_pos
RESERVED 31:8
RXDATA 5:0
</uart0_rx_data>

Code snippet:

if($line =~ m/^Width[\s]*\=[\s]*\d+$/) # to match Width
{
    update_curs ($line);
}
elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) # to match Descr
{
    update_curs ($line);
}
else
{
    printf "Garbage found: \"%s\" \n",$line;
}

Output: Garbage found: "Width = 32"

Kindly help.

UPDATE: Thanks. I have removed the extra spaces and the newline character in my code. When I run the code on its own it works, but when it is integrated with the main package it doesn't. As tobyink mentioned, "Descr" is not matched.

sub validate_save_regs {
    my $line = $_;
    chomp $line;
    if($line =~ m/^Width[\s]*\=[\s]*\d+$/){
        $reg_width_found = 1;
        update_curs ($line);
    }
    elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) {
        printf "inside elsif, line is %s\n",$line;
        if(!$reg_offset_found) {
            printf "Garbage found at: %d \n", $line_no;
            exit(0);
        }
        $name_found = 1;
        update_curs ($line);
    }
    else {
        printf "Garbage found %s\n", $line;
    }
}

In this integrated code, "Descr - This is Register1 comment" doesn't match and falls through to the Garbage branch.

Replies are listed 'Best First'.
Re: Regular Expression
by moritz (Cardinal) on Jun 27, 2012 at 11:52 UTC

    I cannot reproduce your problem here. On my machine I ran this code:

    use strict;
    use warnings;
    my $line = 'Width = 32';
    if($line =~ m/^Width[\s]*\=[\s]*(\d+)$/) # to match Width
    {
        print "Width found: $1\n"
    }
    elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) # to match Descr
    {
        print 'Descr found'
    }
    else
    {
        printf "Garbage found: \"%s\" \n",$line;
    }

    And it prints out Width found: 32

    If it doesn't match for you, maybe $line contains some non-printable characters (maybe because the file is stored in UTF-16?)

    To find out, you can use something like

    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    print Dumper $line;
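As a minimal sketch of that diagnostic, the snippet below wraps Data::Dumper in a small helper and applies it to a line that is assumed to come from a file with Windows (CRLF) line endings. The helper name `visible` and the sample input are hypothetical and not part of the original post; `$Data::Dumper::Useqq` and `$Data::Dumper::Terse` are standard Data::Dumper settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: return a double-quoted, escaped rendering of a string
# so that invisible characters such as "\r" or "\t" become visible.
sub visible {
    my ($text) = @_;
    local $Data::Dumper::Useqq = 1;   # escape non-printable characters
    local $Data::Dumper::Terse = 1;   # omit the "$VAR1 = " prefix
    my $dump = Dumper($text);
    chomp $dump;                      # drop the newline Dumper appends
    return $dump;
}

# Assumed input: a line read from a file with CRLF line endings.
my $line = "Width = 32\r\n";
chomp $line;                  # removes "\n" only; the "\r" stays behind
print visible($line), "\n";   # the leftover carriage return shows up as \r
```

If the dump shows a trailing `\r`, a space, or a tab after the number, that would explain why a pattern ending in `\d+$` rejects a line that looks correct on screen.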
Re: Regular Expression
by tobyink (Abbot) on Jun 27, 2012 at 11:48 UTC

    The data you posted does correctly match your regular expression for Width. So if it's not working for you, then there's probably something else important in your code that you've neglected to tell us.

    One possibility is that your input data actually contains trailing whitespace on the Width line. You don't account for leading or trailing whitespace in your regular expression.

    The Descr regular expression isn't going to match though.

    Example follows. Don't copy and paste because it may be whitespace-sensitive. Use the "download" link which PerlMonks provides underneath the code...

    while (<DATA>) {
        chomp;
        my $line = $_;
        if ($line =~ m/^Width[\s]*\=[\s]*\d+$/) {
            printf "Found width: \"%s\"\n", $line;
        }
        elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) {
            printf "Found descr: \"%s\"\n", $line;
        }
        else {
            printf "Garbage found: \"%s\" \n",$line;
        }
    }
    __DATA__
    Width = 32
    <uart0_rx_data : 16'h0000>
    Descr - "This is Register1 comment"
    # f_name bit_pos
    RESERVED 31:8
    RXDATA 5:0
    </uart0_rx_data>
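The Descr pattern fails because `[\w]+` matches only word characters, while the description contains spaces and double quotes. One possible way to relax both patterns is sketched below; the helper name `parse_line` and the choice to strip the surrounding quotes are assumptions for illustration, not the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one line and return (kind, value),
# or an empty list when the line is neither a Width nor a Descr line.
sub parse_line {
    my ($line) = @_;

    # \s* at both ends tolerates leading/trailing whitespace, including "\r".
    if ($line =~ /^\s*Width\s*=\s*(\d+)\s*$/) {
        return ('Width', $1);
    }
    # "? makes the surrounding double quotes optional;
    # .*? accepts spaces, which [\w]+ does not.
    if ($line =~ /^\s*Descr\s*-\s*"?(.*?)"?\s*$/) {
        return ('Descr', $1);
    }
    return;
}

for my $line ('Width = 32 ', 'Descr - "This is Register1 comment"', 'RXDATA 5:0') {
    my ($kind, $value) = parse_line($line);
    if (defined $kind) { print "$kind: $value\n" }
    else               { print "Garbage found: \"$line\"\n" }
}
```

Capturing the value in `$1` also means the caller does not have to re-parse the line after it has been classified.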
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Regular Expression
by roboticus (Chancellor) on Jun 27, 2012 at 13:26 UTC


    I notice your regex includes '$' with no '\s*' before it. Perhaps you're forgetting to chomp your input lines? That could easily make your regular expression fail.
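A small sketch of how the end anchor behaves may help narrow this down. Without the /m modifier, Perl's `$` matches at the very end of the string or just before a final newline, so a plain unchomped "\n" is still accepted; a leftover "\r" or a trailing space is not. The sample inputs and the `$lenient` variant below are assumptions for illustration.

```perl
use strict;
use warnings;

my $strict  = qr/^Width[\s]*\=[\s]*\d+$/;   # pattern from the question
my $lenient = qr/^Width\s*=\s*\d+\s*$/;     # assumed variant: allows trailing whitespace

# Hypothetical sample inputs, as they might arrive from different files.
my %samples = (
    'chomped'        => "Width = 32",
    'unchomped LF'   => "Width = 32\n",
    'CR left over'   => "Width = 32\r",
    'trailing blank' => "Width = 32 ",
);

for my $name (sort keys %samples) {
    my $line = $samples{$name};
    printf "%-15s strict=%-8s lenient=%s\n", $name,
        ($line =~ $strict  ? 'match' : 'no match'),
        ($line =~ $lenient ? 'match' : 'no match');
}
```

The strict pattern accepts the first two samples and rejects the last two, while the lenient one accepts all four, which is why adding `\s*` before `$` is a common defensive choice when the origin of the file is unknown.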


    When your only tool is a hammer, all problems look like your thumb.

      awesome sir

Re: Regular Expression
by ckj (Chaplain) on Jun 27, 2012 at 12:00 UTC
    Well you can go for this, this change is required in your first line of code:
    if($line =~ m/^Width(\s*)\=(\s*)(\d*)/)#to match width
    Please go through the Regular Expressions tutorials for a better understanding; these are very simple search techniques. This is my program:
    $line = 'Width = 32 <uart0_rx_data : 16\'h0000> Descr - "This is Register1 comment" f_name bit_pos RESERVED 31:8 RXDATA 5:0 </uart0_rx_data>';
    if($line =~ m/^Width(\s*)\=(\s*)(\d*)/) # to match Width
    {
        print "something";
    }
    elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) # to match Descr
    {
        print "noone";
    }
    else
    {
        printf "Garbage found: \"%s\" \n",$line;
    }
    UPDATE: So, WW, I haven't said that this is the only way; there are lots of ways to do things in Perl. Secondly, I didn't say what you need in order to print the width value. Yes, $3 will give the exact output, while you are completely wrong on the statement that it will not match ) after equal to. Check it and then say.
      Capturing the zero-or-more-spaces surrounding the equals symbol is NOT helpful... and means OP, were your suggestion adopted, would have to use $3 for the value of width.

      Moreover, if there were zero-digits after the equals symbol, the data would not match OP's explicit example and implicit spec.

      Update re ckj's update which is MOSTLY WRONG. Reread the above.

      (\d+) captures one or more digits. (\d*) also matches ZERO digits, which is useless.

      perl -e "use 5.014; my $foo= 'bc'; if ($foo =~ /(\d*)/ ) {say 'match';}else{ say 'Duh!';}"
      match

      OP does not ask to capture the spaces. There is no need to capture the spaces to account for their quantity; 0, 1, 2 or whatever.
      Your parens are capturing. Grouping (aka 'non-capturing') parens are (?:...).

      Nothing was said about print.
      And "not helpful" ne "not match."

      Please read carefully before unburdening yourself of mis-statements.

      Despite its irrelevance here, though, you did get one aspect of your reply partially right: "TIMTOWTDI." But, as the rest of that wisdom goes, 'and most of them are wrong.'
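The points about capturing groups and quantifiers can be checked with a short sketch like the one below. The variable names and the digit-less sample line are assumptions for illustration only.

```perl
use strict;
use warnings;

my $line = 'Width = 32';

# Capturing the whitespace pushes the number into $3.
if ($line =~ /^Width(\s*)=(\s*)(\d*)/) {
    print "three groups: width is in \$3 = '$3'\n";
}

# Non-capturing groups (?:...), or no groups at all, keep the number in $1.
if ($line =~ /^Width(?:\s*)=(?:\s*)(\d+)/) {
    print "one group: width is in \$1 = '$1'\n";
}

# \d* also accepts a line with no digits at all; \d+ rejects it.
my $empty = 'Width = ';
my $star  = $empty =~ /^Width\s*=\s*(\d*)/ ? 1 : 0;
my $plus  = $empty =~ /^Width\s*=\s*(\d+)/ ? 1 : 0;
print "no digits: \\d* matched=$star, \\d+ matched=$plus\n";
```

Both patterns extract 32 from well-formed input; the difference only shows on a line with no digits, which `\d*` lets through with an empty capture.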


Re: Regular Expression
by prashantktyagi (Scribe) on Jun 27, 2012 at 11:47 UTC
    Width=`grep ^Width <filename> | sed 's/^Width *= *//'`
    Descr=`grep ^Descr <filename> | sed 's/^Descr *- *//'`

Node Type: perlquestion [id://978622]
Approved by moritz
Front-paged by Corion