Re: Extracting string from a file

by RMGir (Prior)
on Nov 11, 2013 at 13:02 UTC (#1061987)

in reply to Extracting string from a file

As it stands, I don't think your code even compiles, since you've got an unbalanced '(' in your regular expression. I'd start by fixing that.

Second, '|' is a special character in regular expressions, meaning "this OR that". Since you want a literal '|', you need to precede it with a '\'.
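As a minimal sketch of the difference (the "FOOBAR TOTAL" string is made up for illustration and is not from the original question), an unescaped '|' lets either side of it match on its own, while '\|' requires a real pipe character:

```perl
use strict;
use warnings;

my $line = "~|TOTAL 24.1% 0.4%";

# Unescaped: '|' splits the pattern into two alternatives, "~" OR "TOTAL",
# so a string containing only "TOTAL" still matches.
my $unescaped_hit = "FOOBAR TOTAL" =~ /~|TOTAL/ ? 1 : 0;

# Escaped: '\|' matches only a literal pipe, so "~|TOTAL" must appear as a unit.
my $escaped_hit  = $line          =~ /~\|TOTAL/ ? 1 : 0;
my $escaped_miss = "FOOBAR TOTAL" =~ /~\|TOTAL/ ? 1 : 0;

print "unescaped vs 'FOOBAR TOTAL': $unescaped_hit\n";
print "escaped   vs log line:       $escaped_hit\n";
print "escaped   vs 'FOOBAR TOTAL': $escaped_miss\n";
```

The unescaped pattern accepts far more than intended, which is why the escape matters here.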

Next, you're using '|' again between TOTAL and your digits, while your example has spaces there. Which is it?

'\d' is a character class by itself - it doesn't go in square brackets. But since you also need to match '.', use '0-9.'. Do you also need to match negative numbers? If so, that would be '-0-9.'.
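A small sketch of the two classes mentioned above, assuming a sample string with one negative value (the helper name numeric_tokens is hypothetical, introduced only for this illustration):

```perl
use strict;
use warnings;

# Return every run of digits and dots; optionally allow '-' as well.
# A '-' placed first inside the brackets is a literal minus, not a range.
sub numeric_tokens {
    my ($text, $allow_negative) = @_;
    my $class = $allow_negative ? qr/[-0-9.]+/ : qr/[0-9.]+/;
    return $text =~ /($class)/g;
}

my @plain  = numeric_tokens("24.1% -0.4%", 0);
my @signed = numeric_tokens("24.1% -0.4%", 1);

print "@plain\n";    # 24.1 0.4
print "@signed\n";   # 24.1 -0.4
```

Note that both classes are deliberately loose: they would also accept strings such as "1.2.3" or "...", which may or may not matter for your log format.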

Finally, you're only capturing the first number, and you said you need both.

Putting all of that together, please try this regular expression:

/\|TOTAL\s+([0-9.]+)%\s+([0-9.]+)/

I think that might work better for you....

Test case:

perl -e'$_="~|TOTAL 24.1% 0.4%"; /\|TOTAL\s+([0-9.]+)%\s+([0-9.]+)/ && print "Matched, $1 $2\n"'
Matched, 24.1 0.4


Re^2: Extracting string from a file
by ww (Archbishop) on Nov 11, 2013 at 13:50 UTC
    Not to quarrel, because your explanation of the regex problems is exemplary, but OP is clearly dealing with a multi-line logfile, in which some lines begin with ~|TOTAL. Hence, an array better matches the SOPW spec than the string in your "Test Case."

    Aside to Bindo: Your spec comes up a little short of perfection because (very strictly speaking and very nitpicky) there's no requirement -- merely a single illustration -- that what's captured be numeric followed by a percent sign. What if the notation were hex, binary or some sort of non-Arabic numbers? In any case, I've treated your spec as "any line that begins with tilde, pipe, 'TOTAL' followed by anything" which is the only reason my regex differs from RMGir's:

    #!/usr/bin/perl
    use 5.016;
    use warnings;

    #1061986
    my @logfile = ("~|TOTAL 24.1% 0.4%",
                   "~|not a total 11%",
                   "~|TOTAL 21.0% 0.7%",
                   "FOOBAR",
                   "~|TOTAL 13.7% 10.2%",
                   "~|TOTAL last5 6",
                  );
    my @FIGURE;

    for my $logentry(@logfile) {
        if ($logentry =~ /~\|(TOTAL.*)/ ) {
            push @FIGURE, $1;
        } else {
            say "\t \$logentry, $logentry, does not match pattern";
        }
    }

    for (@FIGURE) {
        say $_;
    }

    =head execution:
    C:\>
             $logentry, ~|not a total 11%, does not match pattern
             $logentry, FOOBAR, does not match pattern
    TOTAL 24.1% 0.4%
    TOTAL 21.0% 0.7%
    TOTAL 13.7% 10.2%
    TOTAL last5 6
    =cut

      Thank you very much for all the good advice, gentlemen. I guess I owe you all a big apology since I failed to reply. I was in the hospital due to a small accident and was only discharged last night. Now back on my feet :)

      I tried the following code, but the program won't give any output or any errors. Could one of you please correct this code for me? Please, gentlemen, I'm a beginner who is trying to understand the whole concept of regexes, more specifically with files, so be gentle.

      my $SYS_HOME = $ENV{'SYSTEM_HOME'};
      my $GD_FILE = $SYS_HOME."/GD.log";
      my $FH;
      my @DUMP;
      open ($FH, '<', $GD_FILE) || die "Cant open : $!";
      while (my $line = $FH) {
          if ($line =~ /~\|(TOTAL.*)/){
              #my $tmp = $1;
              push @DUMP, $1;
              foreach (@DUMP) {
                  print "$_\n";
              }
          }
      }

      Many thanks in advance! /Bindo

      Sir, can you please go through my latest reply at the end of the thread and advise? For some reason no one is replying. Thanks.

        The "reason(s)" you're getting no replies may include:

        1. The thread has grown elderly (and deeper than the max depth to which some Monks read).
        2. The latest code you've posted shows little evidence that you've worked to understand and implement prior replies.
        3. The form of your question amounts to a 'gimme', AKA, a 'do it for me' which is not an approach approved here.

        Still, in the spirit of 'help, but don't do the (whole) job':

        • Your use of while is not doing what you want. See perldoc -f while (which refers you to a specific section of perlsyn, a document available on your own computer) and compare to the docs re for.
          open (my $FH, '<', "bindo1061990A.txt") || die "Cannot open bindo1061990A.txt: $!";
          my @lines = <$FH>;
          for my $line (@lines) {
              say $line;
          }
        • Your foreach (@DUMP) { at Ln 10 should NOT be inside the loops beginning at Lns 6 and 7 as that will produce repetitive output for each (new) match.
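A minimal sketch of the two points above, offered as one possible shape rather than anyone's actual code. An in-memory string stands in for the log file (an assumption, so the example runs without GD.log); the key details are reading through the handle with <$fh> and printing only after the read loop has finished:

```perl
use strict;
use warnings;

# Sample data standing in for the real log file (assumed for this sketch).
my $sample = "~|TOTAL 24.1% 0.4%\nFOOBAR\n~|TOTAL 21.0% 0.7%\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";

my @dump;
while (my $line = <$fh>) {    # <$fh> reads one line per iteration
    chomp $line;
    push @dump, $1 if $line =~ /~\|(TOTAL.*)/;
}
close $fh;

# Print once, after the loop, so each match appears a single time.
print "$_\n" for @dump;
```

Without the angle brackets, $line is assigned the filehandle itself rather than a line of text, so the loop never advances through the file.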

        I hope this helps... and also clarifies that the saying "Heaven helps those who help themselves" can be paraphrased to apply here.

Re^2: Extracting string from a file
by oiskuu (Hermit) on Nov 12, 2013 at 03:19 UTC
    A nitpick. The \d is a character class, and it MAY go into brackets, to be combined with the rest.
    The following are equivalent: /[0-9]/, /[[:digit:]]/, /[\d]/, /\d/.

    Your regex could also be written as: /\|TOTAL\s+([\d.]+)%\s+([\d.]+)/.
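A quick sketch to check the equivalence on the sample line from this thread. The third pattern, using [[:digit:].], is an extra variant added here for illustration:

```perl
use strict;
use warnings;

my $line = "~|TOTAL 24.1% 0.4%";

# Three spellings of "digits or dot" inside a bracketed class.
my @patterns = (
    qr/\|TOTAL\s+([0-9.]+)%\s+([0-9.]+)/,
    qr/\|TOTAL\s+([\d.]+)%\s+([\d.]+)/,
    qr/\|TOTAL\s+([[:digit:].]+)%\s+([[:digit:].]+)/,
);

# In list context a successful match returns the captured groups.
my @results = map { join ' ', $line =~ $_ } @patterns;
print "$_\n" for @results;    # each line: 24.1 0.4
```

On ASCII input such as this log line the three behave identically. On Unicode text, \d and [[:digit:]] can match digits beyond 0-9 unless the /a modifier is used, so [0-9] is the stricter choice if that matters.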

Re^2: Extracting string from a file
by Bindo (Acolyte) on Nov 20, 2013 at 05:13 UTC

    Sir, can you please go through my latest reply at the end of the thread and advise? For some reason no one is replying. Thanks.
