Re: How to print the lines immediately above and below a matching line?
by toolic (Bishop) on Nov 25, 2012 at 13:23 UTC
if ($this_line =~ /<DATA>/) {
Are you trying to read from the special DATA handle, or are you trying to match the exact string <DATA>? The debug output of the re pragma indicates you are doing the latter:
perl -Mre=debug mycode.pl
Compiling REx "<DATA>"
Final program:
1: EXACT <<DATA>> (4)
4: END (0)
anchored "<DATA>" at 0 (checking anchored isall) minlen 6
Error opening file - No such file or directory
Freeing REx: "<DATA>"
Then you need to change your code. Something like:
my $data = <DATA>;
chomp $data;
if ($this_line =~ /\Q$data/)
Maybe you should read a line from that handle instead ... don't skip the basics, read perlintro
Re: How to print the lines immediately above and below a matching line?
by karlgoethebier (Abbot) on Nov 25, 2012 at 20:35 UTC
As far as I understood, the basic theme is: "...match a specific part of a line and print the line above and below it, in full." Please correct me if I'm wrong.
I would do it like this:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

# Uncomment exactly one pattern to try a different match position.
# my $pattern = qr/(^4000.+)/;
# my $pattern = qr/(^4001.+)/;
my $pattern = qr/(^4002.+)/;
# my $pattern = qr/(4003.+)/;
# my $pattern = qr/(^4004.+)/;
# my $pattern = qr/(^4005.+)/;

my $file = shift or die "Usage: $0 FILE\n";
tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!\n";

my $idx = 0;
for my $line (@lines) {
    print qq($idx $line\n);
    if ( $line =~ m/($pattern)/ ) {
        if ( $idx == 0 ) {    # first line: there is no previous line
            print qq(Heuraka: $1 next: $lines[ $idx + 1 ]\n);
        }
        if ( $idx == $#lines ) {    # last line: there is no next line
            print qq(Heuraka: $1 previous: $lines[ $idx - 1 ] \n);
        }
        # Plain comparison instead of the deprecated smartmatch operator (~~).
        if ( $idx > 0 && $idx < $#lines ) {
            print qq(Heuraka: $1 previous: $lines[ $idx - 1 ] next: $lines[ $idx + 1 ]\n);
        }
    }
    ++$idx;
}
untie @lines;
__END__
Karls-Mac-mini:Desktop karl$ cat MyData.txt
4000_1#0
4001_1#1
4002_1#2
4003_1#3
4004_1#4
4005_1#5
Karls-Mac-mini:Desktop karl$ ./test.pl MyData.txt
0 4000_1#0
1 4001_1#1
2 4002_1#2
Heuraka: 4002_1#2 previous: 4001_1#1 next: 4003_1#3
3 4003_1#3
4 4004_1#4
5 4005_1#5
See also: Tie::File
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
Re: How to print the lines immediately above and below a matching line?
by afoken (Chancellor) on Nov 25, 2012 at 16:56 UTC
/tmp>grep -C1 halt /etc/passwd
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/:/bin/false
/tmp>ack -C1 halt /etc/passwd
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/:/bin/false
/tmp>
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Karls-Mac-mini:Desktop karl$ grep -C1 4003 MyData.txt
4002_1#2
4003_1#3
4004_1#4
...but do you really want to qx this on a 4 GByte file?
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
but do you really want to qx this on a 4 GByte file?
No. I would not use Perl at all just to call grep. My shell can start grep fine without needing Perl.
4 GByte should be no problem for grep, at least not for GNU grep. Actually, I expect grep to be at least as fast as a perl script, and I expect it to use less memory. Simply because grep is optimized for exactly that job.
By the way: grep has lots of other useful options, like showing line numbers and/or file names, again no need to write Perl code.
A quite useful alternative to grep is ack. It shares many features with GNU grep, and does some things better. By default, ack ignores files and directories you typically do not want to search; it uses Perl regexp syntax instead of "basic" or "extended" regexp syntax; and it has a configuration file.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
shh! don't give away the secrets
Re: How to print the lines immediately above and below a matching line?
by Kenosis (Priest) on Nov 25, 2012 at 18:37 UTC
You mention wanting to print the lines above and below a matching line, but your code prints the matching line, too. In case you wanted to print all (two or) three lines, consider the following that you can adapt for files:
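use strict;
use warnings;

my ( $prevLine, $nextLine );
for ( ; ; ) {
    # Stop only when no look-ahead line is buffered and the input is exhausted.
    last if !defined $nextLine and eof DATA;

    # Reuse the look-ahead line if the previous iteration already read it.
    chomp( my $currLine = defined $nextLine ? $nextLine : <DATA> );
    undef $nextLine;

    if ( $currLine =~ /match this/ ) {
        print '-' x 25, "\n";
        # Read one line ahead so the line below the match can be printed.
        chomp( $nextLine = <DATA> ) if !eof DATA;
        print $prevLine, "\n" if defined $prevLine;
        print $currLine, "\n";
        print $nextLine, "\n" if defined $nextLine;
        print '-' x 25, "\n";
    }
    $prevLine = $currLine;
}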
__DATA__
The first line match this
Not this
abcdefg
The one above
Another match this 1
Another match this 2
the one below match this 2
zxcvbnn
Another match this blank above
Second to the last line
The last line match this
Output:
-------------------------
The first line match this
Not this
-------------------------
-------------------------
The one above
Another match this 1
Another match this 2
-------------------------
-------------------------
Another match this 1
Another match this 2
the one below match this 2
-------------------------
-------------------------
Another match this 2
the one below match this 2
zxcvbnn
-------------------------
-------------------------
Another match this blank above
Second to the last line
-------------------------
-------------------------
Second to the last line
The last line match this
-------------------------
The dashes are printed to show the desired output. If the first or last line is a match, only two lines are printed. If you only want the lines above and below a matching line, delete print $currLine, "\n";.
Hope this helps!
Addition: If you want to avoid printing the same line more than once--like in the example above--and have output that more closely resembles grepping the file, you can do the following:
Re: How to print the lines immediately above and below a matching line?
by space_monk (Chaplain) on Nov 25, 2012 at 16:29 UTC
TMTOWTDI answer. :-)
Instead of reading the file line by line as suggested in the other answers, you could also read the entire file into a scalar and use a multi-line regexp to do it. Look up the use of the /m option in regexp matching.
Any Monk who wishes to extend this thread with a complete answer using this method is more than welcome to do so (I'm a bit short of time)
A Monk aims to give answers to those who have none, and to learn from those who know more.
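A minimal sketch of the slurp-and-regexp idea, not taken from any reply in this thread. The helper name context_of_matches and the sample text are made up for illustration. It assumes "\n" line endings, a file small enough to hold in memory, and a pattern without ^ or $ anchors. Only the previous line is consumed by each match; the matching line and the line after it are captured inside a lookahead, so consecutive matching lines are all reported.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return one hash per matching
# line, holding that line (curr) and its neighbours (prev, next).
sub context_of_matches {
    my ( $text, $re ) = @_;

    # Sentinel newline so that even the first real line has something before it.
    my $padded = "\n" . $text;

    # /m makes ^ and $ match at every line boundary; /x allows the layout.
    my $scan = qr{
        ^ (?<prev> .* ) \n
        (?=
            (?<curr> .* $re .* ) $
            (?: \n (?!\z) (?<next> .* ) )?
        )
    }mx;

    my @hits;
    while ( $padded =~ /$scan/g ) {
        push @hits, {
            prev => ( $-[0] == 0 ? undef : $+{prev} ),    # offset 0 is the sentinel
            curr => $+{curr},
            next => $+{next},    # undef when the match is on the last line
        };
    }
    return @hits;
}

# Usage with made-up sample text.
for my $hit ( context_of_matches( "alpha\nbeta\ngamma\n", qr/beta/ ) ) {
    print join( ' | ', map { $_ // '(none)' } @{$hit}{qw(prev curr next)} ), "\n";
}
# prints: alpha | beta | gamma
```

For a real file, the text would first be slurped, for example with local $/ set to undef before reading from the handle. The sentinel copy doubles the memory needed, which is part of the trade-off discussed in the replies that follow.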
If at most three lines ever need to be printed, reading the whole file is a waste of memory.
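A rough sketch of the constant-memory alternative, again not code from the thread. The helper name print_with_neighbours and its argument order are assumptions made for illustration. It holds at most one unprinted line at a time and emits each line once, so the selection resembles grep -C1, although it does not print grep's "--" group separators.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy every matching line and its
# immediate neighbours from $in to $out, emitting each line at most once.
sub print_with_neighbours {
    my ( $in, $out, $re ) = @_;
    my $held;               # most recent line not yet printed (candidate "line above")
    my $after_match = 0;    # true while the line below a match is still owed

    while ( my $line = <$in> ) {
        if ( $line =~ $re ) {
            print {$out} $held if defined $held;    # line above, unless already printed
            print {$out} $line;
            undef $held;
            $after_match = 1;
        }
        elsif ($after_match) {
            print {$out} $line;                     # line below the previous match
            $after_match = 0;
        }
        else {
            $held = $line;                          # remember in case the next line matches
        }
    }
    return;
}
```

It could be called as print_with_neighbours( \*STDIN, \*STDOUT, qr/halt/ ), or with any opened file handle in place of STDIN.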
Waste of memory? Memory is there to be used ... what are you saving it for?
Sure, for a long-lived application you may want to be leery of using a large quantity of RAM. But if it solves the immediate problem at hand, then using memory isn't really a problem. Also, memory is so plentiful these days that you need to work with *big* files before memory usage becomes a problem. The ordinary file isn't really going to be one.
For example, here's a histogram of file sizes on a couple of my machines--my work laptop (LT0186) and my goofing off computer (Boink):
Files smaller than | LT0186 | Boink |
1 | 2852 | 122103 |
10 | 701 | 5777 |
25 | 3920 | 30988 |
50 | 7793 | 31843 |
100 | 4501 | 41932 |
250 | 10112 | 128614 |
500 | 14385 | 119923 |
1k | 31564 | 192614 |
2.5k | 40133 | 275173 |
5k | 33471 | 218245 |
10k | 34710 | 233223 |
25k | 27628 | 211316 |
50k | 14394 | 100595 |
100k | 12579 | 71556 |
250k | 9674 | 61003 |
500k | 4961 | 22754 |
1M | 3508 | 13800 |
2.5M | 2013 | 8325 |
5M | 852 | 4279 |
10M | 738 | 4586 |
25M | 365 | 1958 |
50M | 223 | 634 |
100M | 54 | 372 |
250M | 52 | 129 |
500M | 22 | 63 |
1G | 16 | 55 |
2.5G | 9 | 32 |
5G | 0 | 3 |
10G | 0 | 1 |
I wouldn't take a second thought about just loading a file under 500M into RAM, and as you can see, I have *very few* files larger than that. And for a simple task like the one presented, I'd probably go ahead and try it on larger files (swap space permitting) and go take a break.
...roboticus
When your only tool is a hammer, all problems look like your thumb.