http://www.perlmonks.org?node_id=862201


in reply to Re: regular expessions question: (replacing words)
in thread regular expessions question: (replacing words)

Hi Jethro, thanks a lot for the useful reply. I will follow your advice and also try your script. I also just came up with the script below, and it worked. Any comments on it? (I would really appreciate it!)
#!usr/bin/perl
my $FILENAME4 = "organized.txt";
open(DATA2, $FILENAME4);

#remove all previous re-organized files
my $remove_reorganized = "re-organized.txt";
if (unlink($remove_reorganized) == 1) {
    print "Existing \"re-organized.txt\" file was removed\n";
}

#now make a file for the ouput
my $outputfile = "re-organized.txt";
if (! open(POS, ">>$outputfile") ) {
    print "Cannot open file \"$outputfile\" to write to!!\n\n";
    exit;
}

while (my $organized = <DATA2>) {
    #do some re-organizing
    #sort out the group numbers first
    $organized =~ s/(\w+)[^(^\d+)(\s)]/z/g;
    my $organized2 = $organized;
    $organized2 =~ s/z(\d+)/z/g;
    print POS $organized2;
}
Thanks, $new_guy

Re^3: regular expessions question: (replacing words)
by jethro (Monsignor) on Sep 27, 2010 at 14:07 UTC
    $organized =~ s/(\w+)[^(^\d+)(\s)]/z/g;

    In a regular expression, [ ... ] is a character class. What you told perl to look for is a word followed by ONE character that is not '(', ')', '^' or '+', and is neither a digit nor a whitespace character.
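
    To see this in isolation, here is a small sketch that tests the character class against single characters. The sample characters are chosen purely for illustration and are not from the original data.

```perl
use strict;
use warnings;

# The character class from the substitution above, on its own.
my $class = qr/[^(^\d+)(\s)]/;

# Characters the class rejects: the literals ( ) ^ +, any digit, any whitespace.
my @rejected = grep { $_ !~ /\A$class\z/ } ('(', ')', '^', '+', '7', ' ');

# Characters the class accepts: anything else, whether a word character or not.
my @accepted = grep { $_ =~ /\A$class\z/ } ('a', 'z', '_', '-', '.');

print "rejected: @rejected\n";
print "accepted: @accepted\n";
```

    Only the first ^ negates the class; everything after it, including the second ^ and the parentheses, is taken as a literal member of the set.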

    Since the first word in your lines seems to be a single-digit number (at least in your sample data), it is just coincidence that it isn't replaced. Any word of length 1 will not be replaced. Likewise, any word (element) whose last character is a digit or one of the other characters listed above will not be replaced as a whole.

    In short, if these lines work for you, it is probably just a coincidence.

    Maybe you should use more varied test data to check for edge cases, for example try:

    5 suf 6 va7 7dra de) e+f ed ed 5z5 nu3 b +4 s 5 + 33 44 55 z5 zb zzz zb z5 4zz
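
    One way to check such edge cases is to run both substitutions over individual elements and look at each result. The sketch below does that; the helper name reorganize is made up for this example, and the words are a small selection in the spirit of the test line above.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the two substitutions from the posted script.
sub reorganize {
    my ($text) = @_;
    $text =~ s/(\w+)[^(^\d+)(\s)]/z/g;   # first pass from the script
    $text =~ s/z(\d+)/z/g;               # second pass removes digits after z
    return $text;
}

# Print each element next to what the substitutions turn it into.
for my $word ('suf', 'va7', '5', 'a7', 'e+f', 'de)') {
    printf "%-4s -> %s\n", $word, reorganize($word);
}
```

    With these inputs, suf and va7 both end up as z, while 5, a7 and e+f come back unchanged and de) turns into z).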

    PS: Please reread my first post; I had to correct an error in my regex.

Re^3: regular expessions question: (replacing words)
by jwkrahn (Abbot) on Sep 27, 2010 at 14:34 UTC
    my $FILENAME4 = "organized.txt";
    open(DATA2, $FILENAME4);

    You should always verify that the file opened correctly.

    my $FILENAME4 = "organized.txt";
    open DATA2, '<', $FILENAME4 or die "Cannot open '$FILENAME4' $!";

    #remove all previous re-organized files
    my $remove_reorganized = "re-organized.txt";
    if (unlink($remove_reorganized) == 1) {
        print "Existing \"re-organized.txt\" file was removed\n";
    }

    #now make a file for the ouput
    my $outputfile = "re-organized.txt";
    if (! open(POS, ">>$outputfile") ) {
        print "Cannot open file \"$outputfile\" to write to!!\n\n";
        exit;
    }

    If you open the file for output ('>') instead of append ('>>'), you don't have to delete the file first, because opening for output truncates an existing file as a side effect.

    # make a file for the output
    my $outputfile = "re-organized.txt";
    open POS, '>', $outputfile or die "Cannot open file '$outputfile' to write to because: $!";
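
    The truncating behaviour can be checked with a short sketch. It assumes a throwaway directory from the core File::Temp module so that no real file is touched, and it uses lexical filehandles rather than the bareword handles of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory that is removed when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/re-organized.txt";

# First run: create the file with some old content.
open my $first, '>', $file or die "Cannot open '$file' for writing: $!";
print {$first} "old line\n";
close $first or die "Cannot close '$file': $!";

# Second run: '>' empties the existing file, so no unlink is needed.
open my $second, '>', $file or die "Cannot open '$file' for writing: $!";
print {$second} "new line\n";
close $second or die "Cannot close '$file': $!";

# Read the file back to see what survived.
open my $in, '<', $file or die "Cannot open '$file' for reading: $!";
my @lines = <$in>;
close $in;

print "lines in file: ", scalar(@lines), "\n";
```

    Had the second open used '>>', both lines would still be in the file.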

    $organized =~ s/(\w+)[^(^\d+)(\s)]/z/g;

    You are using a regular expression that says: match one or more word characters followed by a single character that is not '(', '^', '+', ')', a digit, or whitespace, which does not make sense. It could be that you do not understand how character classes work?

Re^3: regular expessions question: (replacing words)
by toolic (Bishop) on Sep 27, 2010 at 12:59 UTC
Re^3: regular expessions question: (replacing words)
by JavaFan (Canon) on Sep 27, 2010 at 12:51 UTC
    Any comments on it?
    Yeah, use an indentation style that makes sense. Indentation is there for *humans* only. It isn't some sort of magical lube that makes your program run faster, where all that matters is having some of it.
      Dear Perl monks,

      I have a follow-up question. How do I select two columns at random and count ONLY the z's common to both columns?

      I would like to repeat this, say, 10 times and finally get the mean of all counts (i.e. over 10 random selections).

      It gets more complicated. In the next round of random selection, I want to pick 3 columns, count the z's common to all of them, and repeat this ten times. Then continue like this until, say, n = 18 columns, getting the mean at the end of each round. At the moment I have no idea how to go about it! A hint would be really appreciated.

      Thanks

        Does your data fit into memory? If not, it gets more complicated (or you just have to wait a long time for the data file to get read dozens of times). You would either have to store it in a database or compress it (i.e. 'z' is 1, not-z is 0, so that every element uses just one bit).
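
        As a rough illustration of the one-bit-per-element idea, the sketch below keeps one bit string per column using Perl's built-in vec and counts common z's with a bitwise AND. The sample rows are made up, and this is only one possible way to do the compression.

```perl
use strict;
use warnings;

# Made-up sample rows; each element is either 'z' or something else.
my @rows = (
    [qw(z z 5 z)],
    [qw(z 3 z z)],
    [qw(z z z 7)],
);

# One bit string per column: bit $r is 1 when row $r holds a 'z' there.
my @column_bits;
for my $r (0 .. $#rows) {
    for my $c (0 .. $#{ $rows[$r] }) {
        $column_bits[$c] = '' unless defined $column_bits[$c];
        vec($column_bits[$c], $r, 1) = $rows[$r][$c] eq 'z' ? 1 : 0;
    }
}

# Rows with a 'z' in both column 0 and column 1: AND the two bit strings,
# then count the set bits with unpack's checksum form.
my $common = unpack '%32b*', $column_bits[0] & $column_bits[1];
print "rows with z in columns 0 and 1: $common\n";
```

        For more than two columns, the AND is simply repeated over all selected bit strings before counting.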

        If yes, read the file into an Array of Arrays:

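        my @data;
        my $n = 0;
        while (my $organized = <DATA2>) {
            chomp $organized;
            # replace every word that follows whitespace with 'z'
            $organized =~ s/(\s)\w+/$1z/g;
            # store the fields of this line as row $n
            push @{ $data[$n++] }, split /\s+/, $organized;
        }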

        Accessing column 5 of line 2 is then simply $data[2][5] (both indices count from 0).

        To make this easier, split your problem into smaller parts. Create a subroutine that takes an arbitrary number of column indices as parameters. The subroutine just counts all rows that have a 'z' in all of these columns. You can do that with a loop (over the selected columns) inside a loop (over all rows).
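
        A minimal sketch of such a subroutine might look like this. The name count_common_z and the sample data are invented for the example; the data uses the array-of-arrays layout with the group number in column 0.

```perl
use strict;
use warnings;

# Made-up sample data: column 0 is the group number, columns 1..3 hold elements.
my @data = (
    [1, 'z', 'z', 4],
    [2, 'z', 7,   'z'],
    [3, 'z', 'z', 'z'],
);

# Hypothetical helper: counts the rows that have a 'z' in every given column.
sub count_common_z {
    my ($rows, @columns) = @_;
    my $count = 0;
    ROW: for my $row (@$rows) {       # outer loop over all rows
        for my $col (@columns) {      # inner loop over the selected columns
            next ROW unless defined $row->[$col] && $row->[$col] eq 'z';
        }
        $count++;                     # every selected column held a 'z'
    }
    return $count;
}

print "columns 1 and 2: ", count_common_z(\@data, 1, 2), "\n";
print "columns 1, 2, 3: ", count_common_z(\@data, 1, 2, 3), "\n";
```

        Passing the data as a reference keeps the column list free to be of any length.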

        Once you have that working (test it with some simple data), create another array and add a random column number to it. Then repeatedly add another random column number (one that is not already in the array) and call the subroutine with the array. Do that 18 times.
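
        Putting the pieces together, the following sketch picks n distinct random columns, repeats that a number of times, and reports the mean count for each n. Note that it uses shuffle from the core List::Util module to pick distinct columns instead of adding random numbers one at a time; the helper names, the sample data and the choice of 10 trials are assumptions for illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Made-up sample data: column 0 is the group number, columns 1..3 hold elements.
my @data = (
    [1, 'z', 'z', 4],
    [2, 'z', 7,   'z'],
    [3, 'z', 'z', 'z'],
);

# Hypothetical helper: counts the rows that have a 'z' in every given column.
sub count_common_z {
    my ($rows, @columns) = @_;
    my $count = 0;
    ROW: for my $row (@$rows) {
        for my $col (@columns) {
            next ROW unless defined $row->[$col] && $row->[$col] eq 'z';
        }
        $count++;
    }
    return $count;
}

# Hypothetical helper: mean count over $trials random picks of $n distinct columns.
sub mean_common_z {
    my ($rows, $n, $trials, $candidates) = @_;
    my @counts;
    for (1 .. $trials) {
        # Shuffle the candidate columns and keep the first $n of them.
        my @picked = (shuffle @$candidates)[0 .. $n - 1];
        push @counts, count_common_z($rows, @picked);
    }
    return sum(@counts) / $trials;
}

# Columns that may be selected (everything except the group number).
my @candidates = (1 .. 3);
for my $n (2 .. scalar @candidates) {
    printf "n=%d mean=%.2f\n", $n, mean_common_z(\@data, $n, 10, \@candidates);
}
```

        For the real data, the candidate list and the upper bound of n (18 in the question) would replace the small values used here.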