Regular Expression

pramod has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to read a file containing the following strings: "Width = 32" and "Descr - This is Register1 comment"

and extract the "Width = 32" line from the file, but my code fails to match.

File contains:

Width = 32
<uart0_rx_data : 16'h0000>
Descr - "This is Register1 comment"
# f_name    bit_pos            
RESERVED    31:8    
RXDATA        5:0        
</uart0_rx_data>

Code snippet:


if($line =~ m/^Width[\s]*\=[\s]*\d+$/) # to match Width
{
  update_curs    ($line);
}
elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) # to match Descr
{
  update_curs    ($line);
}
else
{
  printf "Garbage found: \"%s\" \n",$line;
}

Output: Garbage found: "Width = 32"

Kindly help.

UPDATE: Thanks. I have removed the extra spaces and the newline character in my code. When I run the code on its own it works, but when it is integrated with the main package it doesn't. As tobyink mentioned, "Descr" is not matched.

sub validate_save_regs 
{
    my $line = $_;
    chomp $line;
    if($line =~ m/^Width[\s]*\=[\s]*\d+$/){
        $reg_width_found = 1;
        update_curs    ($line);
    }
    elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/)
    {
        printf "inside elsif, line is %s\n",$line;
        if(!$reg_offset_found)
        {
            printf    "Garbage found at: %d \n", $line_no;
            exit(0);
        }
        $name_found     = 1;
        update_curs    ($line);
    }
    else 
    {
        printf    "Garbage found %s\n", $line;
    }
}

In this integrated code, "Descr - This is Register1 comment" doesn't match and falls through to the Garbage branch.


Replies are listed 'Best First'.
Re: Regular Expression by moritz (Cardinal) on Jun 27, 2012 at 11:52 UTC
I cannot reproduce your problem here. On my machine I ran this code:

    use strict;
    use warnings;

    my $line = 'Width = 32';
    if($line =~ m/^Width[\s]*\=[\s]*(\d+)$/) # to match Width
    {
        print "Width found: $1\n"
    }
    elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) # to match Descr
    {
        print 'Descr found'
    }
    else
    {
        printf "Garbage found: \"%s\" \n",$line;
    }

And it prints out `Width found: 32`.

If it doesn't match for you, maybe $line contains some non-printable characters (maybe because the file is stored in UTF-16?). To find out, you can use something like

    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    print Dumper $line;

Perl 6 - the future is here, just unevenly distributed
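The UTF-16 possibility raised above can be illustrated with a minimal sketch. It assumes the file is UTF-16LE and was read without a decoding layer, which the thread does not confirm; `is_width_line` is a hypothetical helper, and the bytes are simulated with `Encode::encode` rather than read from a real file.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: true if the line looks like "Width = <digits>".
sub is_width_line {
    my ($line) = @_;
    return $line =~ /^Width\s*=\s*\d+$/ ? 1 : 0;
}

# Simulate the raw bytes of a line read from a UTF-16LE file
# without a decoding layer (assumption: this is how the file was read).
my $raw = encode('UTF-16LE', 'Width = 32');

# Each character is followed by a NUL byte, so the pattern cannot match.
printf "raw bytes match:    %s\n", is_width_line($raw) ? 'yes' : 'no';

# Decoding first restores the text the regex expects.
my $text = decode('UTF-16LE', $raw);
printf "decoded text match: %s\n", is_width_line($text) ? 'yes' : 'no';
```

If this turns out to be the cause, opening the file with a matching `:encoding(...)` layer is the usual alternative to decoding each line by hand.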
Re: Regular Expression by tobyink (Canon) on Jun 27, 2012 at 11:48 UTC
The data you posted does correctly match your regular expression for width. So if it's not working for you, then there's probably something else important in your code that you've neglected to tell us.

One possibility is that your input data actually contains trailing whitespace on the Width line. You don't account for leading or trailing whitespace in your regular expression.

The Descr regular expression isn't going to match though.

Example follows. Don't copy and paste because it may be whitespace-sensitive. Use the "download" link which PerlMonks provides underneath the code...

    while (<DATA>)
    {
        chomp;
        my $line = $_;

        if ($line =~ m/^Width[\s]*\=[\s]*\d+$/)
        {
            printf "Found width: \"%s\"\n", $line;
        }
        elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/)
        {
            printf "Found descr: \"%s\"\n", $line;
        }
        else
        {
            printf "Garbage found: \"%s\" \n",$line;
        }
    }

    __DATA__
    Width = 32
    <uart0_rx_data : 16'h0000>
    Descr - "This is Register1 comment"
    # f_name    bit_pos
    RESERVED    31:8
    RXDATA        5:0
    </uart0_rx_data>

`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
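The Descr pattern fails because `[\w]+` accepts only a single run of word characters, while the data line contains double quotes and spaces. One possible fix, as a sketch rather than the only option: `parse_descr` is a hypothetical helper, and treating the quotes as optional and ignoring surrounding whitespace are assumptions about the file format.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the comment text of a Descr line,
# or undef if the line is not a Descr line.
sub parse_descr {
    my ($line) = @_;
    # Optional surrounding whitespace, optional double quotes,
    # and any text (including spaces) in between.
    if ($line =~ /^\s*Descr\s*-\s*"?(.*?)"?\s*$/) {
        return $1;
    }
    return;
}

for my $line ('Descr - "This is Register1 comment"',
              'Descr - This is Register1 comment',
              'Width = 32') {
    my $text = parse_descr($line);
    print defined $text
        ? "Descr found: $text\n"
        : "Not a Descr line: \"$line\"\n";
}
```

The lazy `(.*?)` followed by `"?\s*$` keeps a closing quote and trailing whitespace out of the captured text.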
Re: Regular Expression by roboticus (Chancellor) on Jun 27, 2012 at 13:26 UTC
pramod:

I notice your regex includes '$' with no '\s' before it. Perhaps you're forgetting to chomp your input lines? That could easily make your regular expression fail.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
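As a qualification to the chomp suggestion: without the `/m` modifier, Perl's `$` also matches just before a final newline, so an unchomped `"\n"` alone does not defeat the pattern. A carriage return from a CRLF file or trailing spaces will. The sketch below compares the question's pattern with a more tolerant variant; `$tolerant` is an assumed alternative, not code from the thread.

```perl
use strict;
use warnings;

# Pattern from the question, written without the redundant brackets.
my $strict = qr/^Width\s*=\s*\d+$/;
# Assumed variant: allow whitespace (including "\r") at either end.
my $tolerant = qr/^\s*Width\s*=\s*(\d+)\s*$/;

my @samples = (
    [ 'plain'           => "Width = 32"     ],
    [ 'trailing LF'     => "Width = 32\n"   ],
    [ 'trailing CRLF'   => "Width = 32\r\n" ],
    [ 'trailing spaces' => "Width = 32  "   ],
);

for my $sample (@samples) {
    my ($label, $line) = @$sample;
    printf "%-16s strict: %-3s tolerant: %s\n",
        $label,
        ($line =~ $strict   ? 'yes' : 'no'),
        ($line =~ $tolerant ? 'yes' : 'no');
}
```

The strict pattern accepts the first two samples and rejects the last two; the tolerant one accepts all four.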
Re^2: Regular Expression by raaj (Acolyte) on Jun 28, 2012 at 09:32 UTC
awesome sir
Re: Regular Expression by ckj (Chaplain) on Jun 27, 2012 at 12:00 UTC
Well, you can go for this; this change is required in your first line of code:

    if($line =~ m/^Width(\s*)\=(\s*)(\d*)/)#to match width

Please go through Regular Expressions tutorials for a better understanding; these are very simple search techniques.

This is my program:

    $line = 'Width = 32 <uart0_rx_data : 16\'h0000> Descr - "This is Register1 comment" f_name bit_pos RESERVED 31:8 RXDATA 5:0 </uart0_rx_data>';
    if($line =~ m/^Width(\s*)\=(\s*)(\d*)/) # to match Width
    {
        print "something";
    }
    elsif ($line =~ m/^Descr[\s]*-[\s]*[\w]+$/) # to match Descr
    {
        print "noone";
    }
    else
    {
        printf "Garbage found: \"%s\" \n",$line;
    }

UPDATE:: So, ww, I haven't said that this is the only way; there are lots of ways to do things in Perl. Secondly, I didn't say that what you need to print the width value. Yes, $3 will give the exact output, while you are completely wrong on the statement that it will not match ) after equal to. Check it and then say.
Re^2: Regular Expression by ww (Archbishop) on Jun 27, 2012 at 14:15 UTC
Capturing the zero-or-more-spaces surrounding the equals symbol is NOT helpful... and means OP, were your suggestion adopted, would have to use $3 for the value of width. Moreover, if there were zero-digits after the equals symbol, the data would not match OP's explicit example and implicit spec.

Update re ckj's update which is MOSTLY WRONG.

Reread the above. `(\d+)` captures one or more digits. `(\d*)` matches ZERO digits which is useless.

    perl -e "use 5.014; my $foo= 'bc'; if ($foo =~ /(\d*)/ ) {say 'match'; }else{ say 'Duh!';}"
    match

OP does not ask to capture the spaces. There is no need to capture the spaces to account for their quantity; 0, 1, 2 or whatever. Your parens are capturing. Grouping (aka 'non-capturing') parens are `(?:...)`.

Nothing was said about `print`. And "not helpful" `ne` "not match." Please read carefully before unburdening yourself of mis-statements.

Despite its irrelevance here, though, you did get one aspect of your reply partially right: "TIMTOWTDI." But -- as the rest of that wisdom goes - 'and most of them are wrong.' </Update>
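The difference between `\d*` and `\d+`, and the effect of non-capturing groups, can be checked with a short sketch. `width_value` is a hypothetical helper, and the `(?:\s*)` groups are there only to show that grouping without capturing leaves the digits in `$1`; a bare `\s*` would do the same job.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits of a Width line, or undef.
# The (?:...) groups do not capture, so the value lands in $1, not $3.
sub width_value {
    my ($line) = @_;
    return $line =~ /^Width(?:\s*)=(?:\s*)(\d+)$/ ? $1 : undef;
}

for my $line ('Width = 32', 'Width = ') {
    # \d* accepts zero digits, so a line with no value still matches.
    my $star = $line =~ /^Width\s*=\s*(\d*)$/ ? "matches, \$1 = '$1'" : 'no match';
    # \d+ requires at least one digit.
    my $plus = $line =~ /^Width\s*=\s*(\d+)$/ ? "matches, \$1 = '$1'" : 'no match';
    print "'$line': \\d* $star; \\d+ $plus\n";
}

my $value = width_value('Width = 32');
print "width_value: $value\n";
```

With `\d*` the line `Width = ` matches and captures an empty string, which is why `\d+` is the safer choice when a value is required.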
Re: Regular Expression by prashantktyagi (Scribe) on Jun 27, 2012 at 11:47 UTC
    Width=`grep ^Width <filename> | sed 's/^Width *= *//'`
    Descr=`grep ^Descr <filename> | sed 's/^Descr *- *//'`
