Re: Extracting string from a file
by RMGir (Prior) on Nov 11, 2013 at 13:02 UTC
As it stands, I don't think your code even compiles, since you've got an unbalanced '(' in your regular expression. I'd start by fixing that.
Second, '|' is a special character in regular expressions, meaning "this OR that". Since you want a literal '|', you need to precede it with a '\'.
Next, you're using '|' again between TOTAL and your digits, while your example has spaces there. Which is it?
'\d' is already a character class by itself, so it doesn't need square brackets. But since you also need to match '.', use '[0-9.]'. Do you also need to match negative numbers? If so, that would be '[-0-9.]'.
Finally, you're only capturing the first number, and you said you need both.
Putting all of that together, please try this regular expression:
/\|TOTAL\s+([0-9.]+)%\s+([0-9.]+)/
I think that might work better for you....
Test case:
perl -e'$_="~|TOTAL 24.1% 0.4%"; /\|TOTAL\s+([0-9.]+)%\s+([0-9.]+)/ && print "Matched, $1 $2\n"'
Matched, 24.1 0.4
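As a slightly fuller sketch of the same idea, the one-liner can be wrapped in a small sub that returns both captures. The name extract_totals and the optional leading minus sign are assumptions added for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two numbers from a "|TOTAL" line,
# or an empty list if the line does not match.
sub extract_totals {
    my ($line) = @_;
    # \| is a literal pipe; each (-?[0-9.]+) captures one number,
    # allowing an optional leading minus sign (an assumption).
    if ($line =~ /\|TOTAL\s+(-?[0-9.]+)%\s+(-?[0-9.]+)/) {
        return ($1, $2);
    }
    return;
}

my ($first, $second) = extract_totals("~|TOTAL 24.1% 0.4%");
print "Matched, $first $second\n";    # prints "Matched, 24.1 0.4"
```

Returning an empty list on failure lets the caller test the result in a plain if before using the values.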
Not to quarrel, because your explanation of the regex problems is exemplary, but OP is clearly dealing with a multi-line logfile in which some lines begin with ~|TOTAL. Hence, an array better matches the SOPW spec than the string in your "Test case."
Aside to Bindo: Your spec comes up a little short of perfection because (very strictly speaking and very nitpicky) there's no requirement -- merely a single illustration -- that what's captured be numeric followed by a percent sign. What if the notation were hex, binary or some sort of non-Arabic numbers?
In any case, I've treated your spec as "any line that begins with tilde, pipe, 'TOTAL' followed by anything," which is the only reason my regex differs from RMGir's:
#!/usr/bin/perl
use 5.016;
use warnings;
#1061986
my @logfile = ("~|TOTAL 24.1% 0.4%",
               "~|not a total 11%",
               "~|TOTAL 21.0% 0.7%",
               "FOOBAR",
               "~|TOTAL 13.7% 10.2%",
               "~|TOTAL last5 6",
              );
my @FIGURE;
for my $logentry (@logfile) {
    if ($logentry =~ /~\|(TOTAL.*)/ ) {
        push @FIGURE, $1;
    } else {
        say "\t \$logentry, $logentry, does not match pattern";
    }
}
for (@FIGURE) {
    say $_;
}
=head execution:
C:\>1061986.pl
$logentry, ~|not a total 11%, does not match pattern
$logentry, FOOBAR, does not match pattern
TOTAL 24.1% 0.4%
TOTAL 21.0% 0.7%
TOTAL 13.7% 10.2%
TOTAL last5 6
=cut
Re: Extracting string from a file
by builat (Monk) on Nov 11, 2013 at 16:35 UTC
#!/usr/bin/perl
use strict;
use warnings;
my @dump;
my $false_count = 0;
open (FH, "<file_name") || die "Cant open : $!";
while (<FH>){
    if (/^.\|TOTAL.*$/i){
        my @tmp = $_ =~ /([0-9\.\-]+)/g;
        push @dump, "@tmp";
    }else{
        $false_count++;
    }
}
print 'Counted matches-> '.$#dump."\tUnmatched lines-> ".$false_count."\n";
foreach (@dump){print $_."\n";}
builat: Close but 'no cigar.' No downvote, but pls test your code before posting and thereby implying that it constitutes a correct answer.
Sorry, but numerous minor problems, including unnecessary complication of your code and (not exactly minor) your use -- in your Ln 6, open (FH, "<file_name")... -- of data not shown or referenced in your post. I realize it may be the same as OP's, or mine, but if you don't show it or otherwise make it unambiguous, future readers can't be sure.
Then there's an actual code problem: $#array does NOT count the elements in the array; it returns the last element's index. Since array indices start at 0, $#array is 1 less than the count of elements (or count of indices, if you prefer to think of it that way).
#!/usr/bin/perl
use 5.016;
use warnings;
# 1062018 builat in same thread as #1061986
my @dump;
my $false_count = 0;
while (<DATA>){
    chomp ($_);
    if ($_ =~ /~\|(TOTAL.*)/ ) {
        my $tmp = $1;
        push @dump, $tmp;
    } else {
        say "False: |\"$_\"| does not match pattern";
        $false_count++;
    }
}
say "\n\t DEBUG \$#dump: $#dump";
say "\t NB: last index of the array and thus 1 less than the count of array elements!\n";
say 'Counted matches-> '. ($#dump + 1) . "\tUnmatched lines-> " . $false_count;
for (@dump){
    say $_."\n";
}
=head execution
C:\> 1062018.pl
False: |"~|first"| does not match pattern
False: |"~|not a total 11%, "| does not match pattern
False: |"FOOBAR, "| does not match pattern
DEBUG $#dump: 3
NB: last index of the array and thus 1 less than the count of array elements!
Counted matches-> 4 Unmatched lines-> 3
TOTAL 24.1% 0.4%,
TOTAL 21.0% 0.7%,
TOTAL 13.7% 10.2%,
TOTAL last5 6
=cut
__DATA__
~|first
~|TOTAL 24.1% 0.4%,
~|not a total 11%,
~|TOTAL 21.0% 0.7%,
FOOBAR,
~|TOTAL 13.7% 10.2%,
~|TOTAL last5 6
I have a problem with your regex /^.\|TOTAL.*$/i, more specifically with the .*$ part of it. That part actually says "anything (even nothing) until the end of the string" and is therefore superfluous. Worse, /^.\|TOTAL.*$/i lets through a line without any digits in it; such a line pushes only an empty string onto the @dump array, yet the $false_count variable is not incremented either. Of course it is quite possible that the file will always have digits on its "TOTAL" lines, but IMHO that is a dangerous assumption to make.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics
You are absolutely right. And yes, I really did start from the premise that any string beginning with ~|TOTAL also contains a set of numbers.
It would be better to add a check for the presence of numbers on the right-hand side of the expression.
Thank you.
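One way such a check might look is sketched below. The sub name collect_totals, the sample lines, and the stricter number pattern are assumptions for illustration, not code from the posts above. A TOTAL line with no digits is counted as unmatched instead of adding an empty entry.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a reference to the extracted figures
# and the number of lines that were rejected.
sub collect_totals {
    my @lines = @_;
    my @dump;
    my $false_count = 0;
    for my $line (@lines) {
        # Require the TOTAL prefix AND at least one digit after it.
        if ($line =~ /^.\|TOTAL.*\d/i) {
            # Optional minus sign, digits, optional fractional part.
            my @tmp = $line =~ /(-?[0-9]+(?:\.[0-9]+)?)/g;
            push @dump, "@tmp";
        }
        else {
            $false_count++;
        }
    }
    return (\@dump, $false_count);
}

my ($dump, $false_count) =
    collect_totals("~|TOTAL 24.1% 0.4%", "~|TOTAL n/a", "FOOBAR");
# scalar(@$dump) is the element count; $#$dump would be the last index.
print 'Counted matches-> ' . scalar(@$dump)
    . "\tUnmatched lines-> $false_count\n";
print "$_\n" for @$dump;
```

With the three sample lines this reports one match and two unmatched lines, because "~|TOTAL n/a" has the prefix but no digits.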
Re: Extracting string from a file
by sundialsvc4 (Abbot) on Nov 12, 2013 at 16:23 UTC
There is, of course, “more than one way to do it,™” but I think that the way that I would do it is to use the /g modifier as discussed in perldoc perlretut.
Something like ... (caution... extemporaneous code; your syntax may vary)
while (my $line = <FH>) {
    next unless $line =~ /^\~\|TOTAL/;
    my @percents = ( $line =~ /([\d\.]+)/g );
    # ... do something with @percents ...
}
First, we ignore any lines outright which do not begin with the proper string ... notice the use of the "^" symbol to anchor to start-of-line, and the backslash-escaping of special symbols that otherwise would be taken as part of (ill-formed) regular expression syntax.
Then, “the interesting bits” in the string are groupings of digits-and-decimal-points, so we gather up as many of them as are present anywhere in the line. In list context, a match with the /g modifier returns a list of all of the values found, without using a loop to do so, although we certainly could have done so with a loop in scalar context. Notice the use of parentheses to indicate a substring that we wish to extract.
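To make the two contexts concrete, here is a minimal sketch that runs the same /g match both ways on one sample line. The sample string and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $line = "~|TOTAL 24.1% 0.4%";

# List context: the match returns every captured substring at once.
my @percents = ( $line =~ /([\d.]+)/g );
print "list context:   @percents\n";

# Scalar context: each iteration finds the next match, resuming
# where the previous one left off, with the capture in $1.
my @found;
while ( $line =~ /([\d.]+)/g ) {
    push @found, $1;
}
print "scalar context: @found\n";
```

Both print lines show 24.1 and 0.4. The list form is shorter; the loop form is handy when each match needs its own processing as it is found.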
Re: Extracting string from a file
by Bindo (Acolyte) on Nov 19, 2013 at 02:11 UTC
Thank you very much for all the good advice, gentlemen. I guess I owe you all a big apology since I have failed to reply. I was in the hospital due to a small accident and was only discharged last night. Now I'm back on my feet :)
I tried the following code, but the program won't give any output or any errors. Could one of you please correct this code for me? Please, gentlemen, I'm a beginner who is trying to understand the whole concept of regexes, more specifically with files, so be gentle.
my $SYS_HOME = $ENV{'SYSTEM_HOME'};
my $GD_FILE = $SYS_HOME."/GD.log";
my $FH;
my @DUMP;
open ($FH, '<', $GD_FILE) || die "Cant open : $!";
while (my $line = $FH) {
    if ($line =~ /~\|(TOTAL.*)/){
        #my $tmp = $1;
        push @DUMP, $1;
        foreach (@DUMP) {
            print "$_\n";
        }
    }
}
Many thanks in advance!
/Bindo
It's a simple bug - you're copying the file handle rather than reading from it.
while (my $line = $FH) {
should instead be
while (my $line = <$FH>) {
The "<>" around $FH reads from the filehandle (a line at a time, in this context).
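For reference, a self-contained sketch of the corrected loop follows. It reads from an in-memory string instead of GD.log so it can be run anywhere; the sample data is an assumption. It also moves the printing out of the loop, which is a change beyond the one-line fix, so that each match is printed once.

```perl
use strict;
use warnings;

# Stand-in for GD.log (assumed sample data) so the sketch runs as-is.
my $sample = "~|TOTAL 24.1% 0.4%\nFOOBAR\n~|TOTAL 21.0% 0.7%\n";
open(my $FH, '<', \$sample) or die "Can't open: $!";

my @DUMP;
while (my $line = <$FH>) {    # <$FH> reads one line per iteration
    chomp $line;
    if ($line =~ /~\|(TOTAL.*)/) {
        push @DUMP, $1;
    }
}
close $FH;

# Print once, after the whole input has been read.
print "$_\n" for @DUMP;
```

With a real file, the open line would take the path in place of \$sample. Leaving the foreach inside the if, as in the original, would reprint the whole array on every match.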
Re: Extracting string from a file
by pvaldes (Chaplain) on Nov 23, 2013 at 02:00 UTC
"I have a log file with more than 10000 lines"
That's not too many lines, but your goal should probably be to discard the unnecessary lines as soon as you can. Right now your loop amounts to "check every line against every regex, and discard it only if all of them fail".
You could consider this instead: "next unless the first character is '|', or whatever I'm expecting"; only for lines that pass that test do you take a closer look at the rest of the line. If you are looking for exactly "horse in a meadow" and the first letter is a "p", you don't need to look any further. Next line.
You can also weed out your file with grep first. Handle the most common group of expected lines first (positives or negatives for your match), then use regexes for the difficult and rare cases. Complicated regexes are expensive.
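A rough sketch of that "reject early" idea is below, assuming the wanted lines start with the literal prefix ~|TOTAL as in the earlier examples. The sub name totals_only is hypothetical. Whether the fixed-string test is actually faster than an anchored regex has not been measured here, so treat it as an illustration of the structure, not a benchmark result.

```perl
use strict;
use warnings;

# Hypothetical helper: discards non-candidate lines with a cheap
# fixed-string comparison, then runs the regex on the survivors.
sub totals_only {
    my @lines = @_;
    my @figures;
    for my $line (@lines) {
        # Compare only the first 7 characters against the prefix.
        next unless substr($line, 0, 7) eq '~|TOTAL';
        # The regex runs only on lines that passed the prefix test.
        push @figures, [ $line =~ /([0-9.]+)/g ];
    }
    return @figures;
}

my @figures = totals_only("~|TOTAL 24.1% 0.4%", "FOOBAR", "~|not a total 11%");
print "@$_\n" for @figures;
```

Each returned element is an array reference holding the numbers found on one TOTAL line.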