PerlMonks  

Bug in script, regex help req extreme urgent

by sid.verycool (Novice)
on Mar 09, 2013 at 10:05 UTC
sid.verycool has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I need to debug this script and make it work according to my requirement. This is an urgent requirement; I need to fix this ASAP.

The concept of the script is to replace "oldname" written after the word "module" with "newname". This worked well until the word "module" appeared in a comment. Content of the script:
#!/usr/bin/perl -w
BEGIN {undef $/;}
# I TRIED
# 1ST my $match = "^module.*?$ARGV[2].*?([\\(;])";
# 2ND my $match = "\^module.*?$ARGV[2].*?([\\(;])";
# 3RD my $match = "\\^module.*?$ARGV[2].*?([\\(;])"
my $match = "module.*?$ARGV[2].*?([\\(;])";
#print "$match";
my $filename = $ARGV[0];
open (INFILE, "<", $filename) or die "Failed to read file $filename: $! \n";
$string = <INFILE>;
close INFILE;
#I ALSO TRIED "$string =~ s/^$match/module $ARGV[1]$1/sg;";
$string =~ s/$match/module $ARGV[1]$1/sg;
open OUTFILE, ">$ARGV[0]" || die "Failed to create $ARGV[0]\n";
print OUTFILE ($string);
close OUTFILE;
Here is what the script does to the input file. Content of the file before the script is run:
//Verilog HDL for "tt", "hh" "functional"
// if i write the word here the script goofs up
`timescale 1ps/10fs
module OLD(Y, A, B );
output Y;
input A;
input B;
endmodule
Now I run script.pl FILE NEW OLD, and the content of the file becomes:
//Verilog HDL for "tt", "hh" "functional"
// if i write the word here the script goofs up
`timescale 1ps/10fs
module NEW(Y, A, B );
output Y;
input A;
input B;
endmodule
That is good. But if I write the word "module" in the comment line, the script goofs up. Content of the file before the script is run:
//Verilog HDL for "tt", "hh" "functional"
// if i write the word module here the script goofs up
`timescale 1ps/10fs
module OLD(Y, A, B );
output Y;
input A;
input B;
endmodule

Now the contents become:
//Verilog HDL for "tt", "hh" "functional"
// if i write the word module NEW(Y, A, B );
output Y;
input A;
input B;
endmodule
which is unacceptable to me. Please help, guys.

Re: Bug in script, regex help req extreme urgent
by choroba (Abbot) on Mar 09, 2013 at 10:38 UTC
    Why do you use the /s modifier on the substitution? It makes . match a newline, which is the cause of your problem. Is it possible in your input file for module to be separated by a newline from the module name or arguments?
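A minimal sketch of the effect described here, assuming a shortened version of the question's input file. The helper name rename_with is hypothetical, and \Q...\E and the simplified character class [(;] are additions that are not in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: run the thread's pattern with or without the /s modifier.
sub rename_with {
    my ($text, $old, $new, $dot_matches_newline) = @_;
    if ($dot_matches_newline) {
        # With /s, .*? may run across newlines, so a match can start in the comment.
        $text =~ s/module.*?\Q$old\E.*?([(;])/module $new$1/sg;
    }
    else {
        # Without /s, the whole match has to fit on a single line.
        $text =~ s/module.*?\Q$old\E.*?([(;])/module $new$1/g;
    }
    return $text;
}

my $text = "// the word module here\n`timescale 1ps/10fs\nmodule OLD(Y, A, B );\n";

print "with /s:\n",    rename_with($text, 'OLD', 'NEW', 1);
print "without /s:\n", rename_with($text, 'OLD', 'NEW', 0);
```

With /s the comment and the timescale line are swallowed, as in the question; without it only the real module line changes.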
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Bug in script, regex help req extreme urgent
by Don Coyote (Monk) on Mar 09, 2013 at 10:45 UTC

    Use a look-behind assertion. Rather than looking for the string 'module ' then testing if the $ARGV[2] string follows, look for the $ARGV[2] string and see if it is preceded by the string 'module '.

    my $match = "(?<=module )$ARGV[2]"

    Using Look-ahead and Look-behind
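A small sketch of the look-behind idea on the sample data, for illustration only. The helper name rename_after_module is hypothetical, and \Q...\E plus the trailing \b are assumptions added so that the name is taken literally and OLDER is not touched.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $old only where it directly follows "module ".
sub rename_after_module {
    my ($text, $old, $new) = @_;
    # (?<=module ) is zero-width: it has to match just before $old, but it is
    # not part of the matched text, so "module " need not be written back.
    $text =~ s/(?<=module )\Q$old\E\b/$new/g;
    return $text;
}

print rename_after_module(
    "// if i write the word module here\nmodule OLD(Y, A, B );\n",
    'OLD', 'NEW',
);
```

The comment line is left alone because the word after "module " there is "here", not the old name.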

      If I do
      my $match = "(?<=module ).*?$ARGV[2].*?([\\(;])";
      then the output is
      //Verilog HDL for "tt", "hh" "functional"
      // if i write the word module module NEW(Y, A, B );
      output Y;
      input A;
      input B;
      endmodule
      This does not solve the issue :(
      Thanks Don Coyote, actually I tried what you suggested again, and fortunately it's doing what I want. But I don't have any idea how it's doing its job, especially the substitution part: we tell it to replace the match in $string with $ARGV[1]$1.
      my $match = "(?<=module )$ARGV[2].*?([\\(;])";
      print "$match";
      my $filename = $ARGV[0];
      open (INFILE, "<", $filename) or die "Failed to read file $filename: $! \n";
      $string = <INFILE>;
      close INFILE;
      $string =~ s/$match/$ARGV[1]$1/sg;
      print "$1";
      The output is just what I want, i.e.
      script.pl f5 NEW OLD
      (?<=module )OLD.*?([\(;])
      #> cat f5
      //Verilog HDL for "tt", "hh" "functional"
      // if i write the word module here the script goofs up
      `timescale 1ps/10fs
      module NEW(Y, A, B );
      output Y;
      input A;
      input B;
      endmodule
      Please help me understand this.
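One way to see what the substitution does is to print the text the pattern actually matches and the text it captures. This is only a sketch: match_parts is a hypothetical helper, and the character class is simplified to [(;].

```perl
use strict;
use warnings;

# Hypothetical helper: report what the pattern matched and what it captured.
sub match_parts {
    my ($line, $old) = @_;
    if ($line =~ /(?<=module )\Q$old\E.*?([(;])/) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my $matched = substr($line, $-[0], $+[0] - $-[0]);
        return ($matched, $1);
    }
    return;
}

my ($matched, $captured) = match_parts('module OLD(Y, A, B );', 'OLD');
printf "matched:  %s\n", $matched;     # OLD(
printf "captured: %s\n", $captured;    # (
```

The look-behind text "module " is not part of the match, so only "OLD(" is replaced. The replacement $ARGV[1]$1 is the new name followed by the captured "(", which gives "NEW(".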
        It solves my issue but fails when I have
        module this is in between OLD(A, B, Y);
        which I also want my script to change, to
        module this is in between NEW(A, B, Y);
        So I tried "(?<=module .*)$ARGV[2].*?([\\(;])"; which doesn't work. Perl says: Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=module .*)NEW.*?(\(;) <-- HERE / at script.pl line 23. How can I implement this?
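One possible way around the variable-length look-behind limit is \K, available in Perl 5.10 and later, which keeps everything matched before it out of the replaced text. The sketch below assumes the module keyword starts its line. The helper name rename_with_keep is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rename the module even when other words sit between
# the keyword and the name. Needs Perl 5.10 or later for \K.
sub rename_with_keep {
    my ($text, $old, $new) = @_;
    # ^\s*module\b only matches lines that begin with the keyword, so lines
    # starting with a comment marker are skipped. \K drops everything matched
    # so far from the part that gets replaced.
    $text =~ s/^\s*module\b.*?\K\b\Q$old\E\b/$new/mg;
    return $text;
}

print rename_with_keep(
    "// module OLD( in a comment\nmodule this is in between OLD(A, B, Y);\n",
    'OLD', 'NEW',
);
```

Because /s is not used, .*? cannot run past the end of the line that holds the keyword.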
Re: Bug in script, regex help req extreme urgent
by CountZero (Bishop) on Mar 09, 2013 at 12:49 UTC
    Or go through the file line by line and skip the lines that start with //.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      This should work:
      use Modern::Perl;
      my $filename = $ARGV[0];
      die "Usage: modulenamechanger.pl filename newmodulename oldmodulename\n" unless @ARGV == 3;
      open my $INFILE, "<", $filename or die "Failed to read file $filename: $!";
      say "Now parsing $filename";
      my @file = <$INFILE>;
      close $INFILE;
      open my $OUTFILE, '>', $filename or die "Failed to create $filename";
      my $match = qr "module\s*($ARGV[2])";
      say "Matching: $match";
      for my $line (@file) {
          if ( $line =~ m(^//) ) {
              print $OUTFILE $line;
              next;
          }
          $line =~ s/$match/module $ARGV[1]/;
          print $OUTFILE $line;
      }

        So now I have two scripts. The first does all the work correctly but does not match line by line. The code is as follows:
        #!/usr/bin/perl -w
        BEGIN {undef $/;}
        # I TRIED
        # 1ST my $match = "^module.*?$ARGV[2].*?([\\(;])";
        # 2ND my $match = "\^module.*?$ARGV[2].*?([\\(;])";
        # 3RD my $match = "\\^module.*?$ARGV[2].*?([\\(;])"
        #"(?<=module )$ARGV[2]"
        my $match = "(?<=module )$ARGV[2].*?([\\(;])";
        #print "$match";
        my $filename = $ARGV[0];
        open (INFILE, "<", $filename) or die "Failed to read file $filename: $! \n";
        $string = <INFILE>;
        close INFILE;
        #I TRIED "$string =~ s/^$match/module $ARGV[1]$1/sg;";
        $string =~ s/$match/$ARGV[1]$1/sg;
        #print "$1";
        open OUTFILE, ">$ARGV[0]" || die "Failed to create $ARGV[0]\n";
        print OUTFILE ($string);
        close OUTFILE;
        This will do well until someone writes our pattern in a // comment. Now, to come up with a new script which goes line by line, courtesy of CountZero, I have:
        use Modern::Perl;
        my $filename = $ARGV[0];
        die "Usage: modulenamechanger.pl filename newmodulename oldmodulename\n" unless @ARGV == 3;
        open my $INFILE, "<", $filename or die "Failed to read file $filename: $!";
        say "Now parsing $filename";
        my @file = <$INFILE>;
        close $INFILE;
        open my $OUTFILE, '>', $filename or die "Failed to create $filename";
        my $match = qr "module\s*($ARGV[2])";
        say "Matching: $match";
        for my $line (@file) {
            if ( $line =~ m(^//) ) {
                print $OUTFILE $line;
                next;
            }
            $line =~ s/$match/module $ARGV[1]/;
            print $OUTFILE $line;
        }
        But my problems with this are as follows:
        1. No Modern::Perl.
        2. This code didn't work for me.
        3. I don't want to change the pattern regex my $match = "(?<=module )$ARGV[2].*?([\\(;])"; (changing it may have some other effect, because I'm not 100% sure about its intent or why it was written that way, so let's keep it as it is).
        4. I didn't find any else branch in your code, CountZero.
        5. There is also some problem in parsing. I have modified your code but I'm still not getting the desired output. Please see the code below; I can't find out what's wrong. I tried various print statements as well, but in vain.
        #!/usr/bin/perl -w
        BEGIN {undef $/;}
        #use Modern::Perl;
        my $filename = $ARGV[0];
        die "Usage: modulenamechanger.pl filename newmodulename oldmodulename\n" unless @ARGV == 3;
        open my $INFILE, "<", $filename or die "Failed to read file $filename: $!";
        print "Now parsing $filename\n";
        my @file = <$INFILE>;
        close $INFILE;
        open my $OUTFILE, '>', $filename or die "Failed to create $filename";
        # WITHOUT ABANSAL CONCEPT my $match = qr "module\s*($ARGV[2])";
        my $match = "(?<=module )$ARGV[2].*?([\\(;])";
        print "Matching: $match\n";
        for my $line (@file) {
            print "READING LINE $line \n";
            if ( $line =~ m(^//) ) {
                print "INSIDE IF\n";
                print $OUTFILE $line;
                next;
            }
            else {
                print "INSIDE ELSE \n BEFORE change:\t $line\n";
                $line =~ s/$match/moduleXXX $ARGV[1]/;
                print "AFTER :\t $line\n";
                print $OUTFILE $line;
                next;
            }
        }
        The output is
        my.pl f7 NEW OLD
        Now parsing f7
        Matching: (?<=module )OLD.*?([\(;])
        READING LINE //Verilog HDL for "tt", "hh" "functional"
        // if i write the word module OLD(Y, A, B ); here the script goofs up
        `timescale 1ps/10fs
        module OLD(Y, A, B );
        output Y;
        input A;
        input B;
        endmodule
        INSIDE IF
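One plausible reading of this output: "READING LINE" is printed only once and is followed by the whole file, which is what would happen if BEGIN {undef $/;} were still in effect, so the array holds a single record that starts with // and is skipped. The sketch below illustrates that behaviour of $/ on an in-memory string. The helper name read_records is hypothetical.

```perl
use strict;
use warnings;

my $content = "// comment\nmodule OLD(Y, A, B );\nendmodule\n";

# Hypothetical helper: read $content through an in-memory filehandle,
# with or without slurp mode.
sub read_records {
    my ($slurp) = @_;
    # An undefined $/ makes readline return the whole file as one record.
    local $/ = $slurp ? undef : "\n";
    open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @slurped = read_records(1);
my @by_line = read_records(0);
printf "slurp mode: %d record(s)\n", scalar @slurped;   # 1
printf "line mode:  %d record(s)\n", scalar @by_line;   # 3
```

If that is the cause, dropping the BEGIN {undef $/;} line from the line-by-line script would let the loop see one line per iteration.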
Re: Bug in script, regex help req extreme urgent
by pvaldes (Chaplain) on Mar 09, 2013 at 13:14 UTC

    I need to debug this script. This is an urgent requirement

    Type: use strict; very fast

    --> Global symbol "$string" requires explicit package name at bugscript.pl line 18.
    Execution of bugscript.pl aborted due to compilation errors.

    Ok, we got one!

    Can't see the strings "newname" or "oldname" in your script. Can we have an example of the matching line?

    (...few minutes later...)

    Ok, i see... Try to be as clear as you can.

    my $filename = $ARGV[0];
    my $oldname = $ARGV[2];
    my $newname = $ARGV[1];

    oldname = 2, newname = 1... you want to replace 2 with 1. Could I suggest replacing 1 with 2 instead, to avoid unnecessary obfuscation?

    $string =~ s/^module.*?$oldname.*?([\\(;])/module $newname$1/sg;

    Mmmh, I'm not very comfortable with the idea of using ";" or "(" as the last character of a filename...
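A side note on anchoring with ^ when the whole file is held in one string: without the /m modifier, ^ matches only at the very start of the string, which may be why the earlier ^module attempts changed nothing. A minimal sketch, assuming the module keyword starts its line. The helper name rename_anchored is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: anchor the keyword at the start of a line in a slurped string.
sub rename_anchored {
    my ($text, $old, $new, $multiline) = @_;
    if ($multiline) {
        # With /m, ^ matches at the start of the string and after every newline.
        $text =~ s/^(\s*module\s+)\Q$old\E\b/$1$new/mg;
    }
    else {
        # Without /m, ^ matches only at the very start of the string.
        $text =~ s/^(\s*module\s+)\Q$old\E\b/$1$new/g;
    }
    return $text;
}

my $text = "// the word module OLD here\n`timescale 1ps/10fs\nmodule OLD(Y, A, B );\n";

print "without /m:\n", rename_anchored($text, 'OLD', 'NEW', 0);
print "with /m:\n",    rename_anchored($text, 'OLD', 'NEW', 1);
```

Only the /m version renames the module, and it leaves "module OLD" inside the comment untouched because that text does not begin a line.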

      Thanks for the reply, pvaldes. newname is NEW (module NEW); oldname is OLD (module OLD).
Re: Bug in script, regex help req extreme urgent
by 2teez (Priest) on Mar 09, 2013 at 13:47 UTC

    Hi sid.verycool
    Using the data set you provided,

    • you could use lookahead assertion like so:
      use warnings;
      use strict;

      while (<DATA>) {
          chomp;
          s/(.+?)OLD(?=\()/$1NEW/;
          print $_, $/;
      }
      __DATA__
      //Verilog HDL for "tt", "hh" "functional"
      // if i write the word module here the script goofs up
      `timescale 1ps/10fs
      module OLD(Y, A, B );
      output Y;
      input A;
      input B;
      endmodule
    • you can also use the wisdom of CountZero
      while (<DATA>) {
          chomp;
          s/(.+?)OLD/$1NEW/ unless m{//};
          print $_, $/;
      }
      __DATA__
      //Verilog HDL for "tt", "hh" "functional"
      // if i write the word module here the script goofs up
      `timescale 1ps/10fs
      module OLD(Y, A, B );
      output Y;
      input A;
      input B;
      endmodule
    Output:
    //Verilog HDL for "tt", "hh" "functional"
    // if i write the word module here the script goofs up
    `timescale 1ps/10fs
    module NEW(Y, A, B );
    output Y;
    input A;
    input B;
    endmodule

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Bug in script, regex help req extreme urgent
by karlgoethebier (Priest) on Mar 09, 2013 at 15:15 UTC

    Perhaps this isn't my very best day or I'm missing something essential - please feel free to correct me. But this seems to work...

    #!/usr/bin/perl
    use strict;
    use warnings;

    undef $/;
    my $file = shift;
    my $module = shift;
    my $replace = shift;
    open ( my $fh, "<", $file);
    my $data = <$fh>;
    close $fh;
    print $data;
    print qq(\n\n\n);
    if ( $data =~ m/module $module/ ) {
        $data =~ s/$module/$replace/;
    }
    # ...
    # $data =~ s/$module/$replace/;
    print $data;
    __END__
    Karls-Mac-mini:monks karl$ ./new.pl test.txt OLD "What are you doing?"
    //Verilog HDL for "tt", "hh" "functional"
    // if i write the word module here the script goofs up
    `timescale 1ps/10fs
    module OLD(Y, A, B );
    output Y;
    input A;
    input B;
    endmodule



    //Verilog HDL for "tt", "hh" "functional"
    // if i write the word module here the script goofs up
    `timescale 1ps/10fs
    module What are you doing?(Y, A, B );
    output Y;
    input A;
    input B;
    endmodule

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Bug in script, regex help req extreme urgent
by jaredor (Curate) on Mar 09, 2013 at 16:22 UTC

    I like the test harness of 2teez, it is easily adapted for a perl -p command line. So I'll reuse it for an old sed-der comment/observation:

    For mass transformations of text, trade off flexible regex for static strings when possible.

    while (<DATA>) {
        s/\b module \s+ OLD \b/module NEW/xms;
        print $_;
    }

    Of course, this means that whatever whitespace separation you used to have between the keyword "module" and the module name is gone, but whitespace niceties usually aren't a big deal with mechanically gunkulated code. On the other hand, you don't have to ignore any lines that have commenting on them.

    This is a line oriented solution, so slurping in a file and transforming that long string will require using the "g" flag and listening to sages such as choroba. For line-oriented text, I believe line oriented processing gives the fewest surprises (e.g., a surprise such as having a comment change affect code manipulation). However if the two adjacent words you are keying off of can be separated by a newline, then you have to use something like the fancier regex solutions suggested.

    One thing left to worry about is whether "module OLD" in a comment should be transformed to "module NEW". But if you keep in mind the old maxim, "All comments lie," then you shouldn't worry too much ;-)
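If the original spacing between the keyword and the name matters, one possible variant is to capture "module" plus its whitespace and put it back. This is a sketch, not part of the suggestion above. The helper name rename_keep_spacing is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rename the module on one line, keeping the original
# whitespace between the keyword and the name.
sub rename_keep_spacing {
    my ($line, $old, $new) = @_;
    # Group 1 holds "module" plus whatever whitespace followed it.
    $line =~ s/\b(module\s+)\Q$old\E\b/$1$new/;
    return $line;
}

print rename_keep_spacing("module    OLD(Y, A, B );\n", 'OLD', 'NEW');
print rename_keep_spacing("// module OLD in a comment\n", 'OLD', 'NEW');
```

As with the static-string version, "module OLD" inside a comment is renamed too, and names such as OLDER are left alone because of the trailing \b.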

      One confession: this code was written by one of my colleagues who has left, and I have to fix it. I'm trying to understand exactly what this script is meant to do (especially the regex applied to $string), so that my changes don't break the original intent of the script. Please help me understand its objective and how I can transform it to read through the file line by line, which I think is the easier way (at least for a newbie like me). And yes, if in the comment we change OLD to new it wud b gr8 because that is also wrong (although comments lie ;)
      shell#> script.pl file NEW OLD
      #!/usr/bin/perl -w
      BEGIN {undef $/;}
      my $match = "module.*?$ARGV[2].*?([\\(;])";
      my $filename = $ARGV[0];
      open (INFILE, "<", $filename) or die "Failed to read file $filename: $! \n";
      $string = <INFILE>;
      close INFILE;
      $string =~ s/$match/module $ARGV[1]$1/sg;
      open OUTFILE, ">$ARGV[0]" || die "Failed to create $ARGV[0]\n";
      print OUTFILE ($string);
      close OUTFILE;
        we change OLD to new it wud b gr8 because that is also wrong

        Cannot understand this language, sorry. Maybe you need to provide a better example?

        If you want a likely story to explain things, then here's mine: Your colleague wanted to change an entire module definition file "in place" but didn't want to use a temporary file.

        On a Unix-type command line, you can do the same thing with

        perl -pi -e 's/\b module \s+ OLD \b/module NEW/xms;' file_name

        But back to your script. Your colleague didn't have to slurp the file into a single variable, he or she could have read the file into an array. The amount of memory taken up would have roughly been the same and the text processing could then have been done line-by-line.

        I don't know what relevant CPAN modules are out there to make in-place text file processing easier, but I'd bet there are some. For your emergency need right now the other excellent answers in this thread should give you enough to deliver something.
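For what it's worth, core Perl can also do the in-place, line-by-line rewrite from inside a script, without any CPAN module, by setting $^I and @ARGV before a <> loop. The following is only a sketch. The helper name rename_in_file and the '.bak' backup suffix are assumptions, and the demonstration runs on a throwaway temporary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite $file in place, one line at a time.
sub rename_in_file {
    my ($file, $old, $new) = @_;
    # Setting $^I switches on in-place editing for the <> loop (what -i does
    # on the command line); '.bak' keeps a backup copy next to the file.
    local ($^I, @ARGV) = ('.bak', $file);
    while (my $line = <>) {
        $line =~ s/\b(module\s+)\Q$old\E\b/$1$new/;
        print $line;    # goes to the rewritten file, not to the terminal
    }
    return;
}

# Demonstration on a throwaway file.
my ($tmp_fh, $tmp_path) = tempfile();
print {$tmp_fh} "// a comment\nmodule OLD(Y, A, B );\nendmodule\n";
close $tmp_fh;

rename_in_file($tmp_path, 'OLD', 'NEW');

open my $in, '<', $tmp_path or die "Cannot read $tmp_path: $!";
print while <$in>;
close $in;
unlink $tmp_path, "$tmp_path.bak";
```

Because the file is processed one line at a time, a comment line can never be merged with the module line.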

Re: Bug in script, regex help req extreme urgent
by Athanasius (Monsignor) on Mar 10, 2013 at 04:27 UTC

    Hello sid.verycool, and welcome to the Monastery!

    Please note that when a regular expression is assigned to a string prior to its use in a match, the string should be quoted using Perl’s qr// operator. That is, a line such as:

    my $match = "(?<=module )$ARGV[2].*?([\\(;])";

    is better written:

    my $match = qr/(?<=module )$ARGV[2].*?([\\(;])/;

    See Regexp Quote Like Operators in perlop for the advantages of using this form of quoting.
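A short sketch of what qr// gives you, plus one related precaution that is an addition here, not part of the advice above: wrapping the interpolated name in \Q...\E so that it is matched literally. The helper name build_match and the name 'top.v1' are purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pattern once as a compiled regex object.
sub build_match {
    my ($old, $quote) = @_;
    return $quote
        ? qr/(?<=module )\Q$old\E\b/    # name taken literally
        : qr/(?<=module )$old\b/;       # name interpreted as regex syntax
}

my $loose   = build_match('top.v1', 0);
my $literal = build_match('top.v1', 1);

print ref($literal), "\n";    # Regexp
print "loose matches topXv1\n"   if 'module topXv1(' =~ $loose;
print "literal matches topXv1\n" if 'module topXv1(' =~ $literal;
print "literal matches top.v1\n" if 'module top.v1(' =~ $literal;
```

In the unquoted pattern the "." in the name matches any character, so it also matches topXv1. The quoted pattern does not.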

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thanks Athanasius, this monastery is indeed a great place to come to (given the number of patient and helpful people I have come across in only one day). Actually I'm not sure what $ARGV[2].*?([\\(;])/ is doing. $ARGV[2] is the argument, I know; suppose I passed OLD, i.e. $ARGV[2]=OLD. Can anyone explain the intent of (?<=module )$ARGV[2].*?([\\(;])? I know . means any character and * means 0 or more occurrences.

        Tip #9 from the Basic debugging checklist:

        Demystify regular expressions by installing and using the CPAN module YAPE::Regex::Explain

        So:

        #! perl
        use strict;
        use warnings;
        use YAPE::Regex::Explain;

        my $re = qr/(?<=module )$ARGV[2].*?([\\(;])/;
        print YAPE::Regex::Explain->new($re)->explain();

        Output:

        15:40 >perl 566_SoPW.pl FILE OLD NEW
        The regular expression:

        (?-imsx:(?<=module )NEW.*?([\\(;]))

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          (?<=                     look behind to see if there is:
        ----------------------------------------------------------------------
            module                   'module '
        ----------------------------------------------------------------------
          )                        end of look-behind
        ----------------------------------------------------------------------
          NEW                      'NEW'
        ----------------------------------------------------------------------
          .*?                      any character except \n (0 or more times
                                   (matching the least amount possible))
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            [\\(;]                   any character of: '\\', '(', ';'
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
        15:40 >

        See also perlretut and perlre.
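The same breakdown can also be written into the pattern itself with the /x modifier, which lets a regex carry its own comments. This is a sketch under two assumptions: the character class is simplified to [(;], and the name is quoted with \Q...\E. The helper name explained_pattern is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the thread's pattern, spelled out piece by piece.
sub explained_pattern {
    my ($old) = @_;
    return qr{
        (?<= module[ ] )   # look-behind: the match must be preceded by "module "
        \Q$old\E           # the old module name, taken literally
        .*?                # as few characters as possible, never a newline
        ( [(;] )           # capture the first "(" or ";" as group 1
    }x;
}

my $re   = explained_pattern('OLD');
my $line = 'module OLD(Y, A, B );';
(my $renamed = $line) =~ s/$re/NEW$1/;
print "$renamed\n";    # module NEW(Y, A, B );
```

Under /x, whitespace in the pattern is ignored, which is why the literal space after "module" is written as [ ].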

        Hope that helps,

