Re: Matching over multiple lines in a scalar
by Enlil (Parson) on Oct 27, 2002 at 22:18 UTC
I am assuming you want to slurp the whole file into the variable $info. It might be easier to do this as follows:
{
local $/ = undef;
$info = <DATA>;
}
For more info on $\ look at perlvar
Apart from this, I have changed the regex a little bit to do more of what I think you want it to do. Here is the code.
use strict;
use warnings;
use Data::Dumper;
my $info;
{
local $/ = undef;
$info = <DATA>;
}
my %lines;
while ($info =~ m/(\d+)\: (.+?)\n(?=\d)/gs) {
$lines{$1} = $2;
}
print Dumper (%lines);
__DATA__
3: Tag <test> found
Tag <test> found
5: Tag <test> found
7: Tag <test> found
14: Tag <test> found
16: Tag <test> found
18: Tag <test> found
21: Tag <test> found
25: Tag <test> found
27: Tag <test> found
29: Tag <test> found
32: Tag <test> found
34: Tag <test> found
49: Tag <test> found
80: Tag <test> found
98: Tag <test> found
Tag <test> found
and here is the output:
$VAR1 = '29';
$VAR2 = 'Tag <test> found';
$VAR3 = '21';
$VAR4 = 'Tag <test> found';
$VAR5 = '7';
$VAR6 = 'Tag <test> found';
$VAR7 = '14';
$VAR8 = 'Tag <test> found';
$VAR9 = '80';
$VAR10 = 'Tag <test> found';
$VAR11 = '32';
$VAR12 = 'Tag <test> found';
$VAR13 = '16';
$VAR14 = 'Tag <test> found';
$VAR15 = '49';
$VAR16 = 'Tag <test> found';
$VAR17 = '25';
$VAR18 = 'Tag <test> found';
$VAR19 = '3';
$VAR20 = 'Tag <test> found
Tag <test> found';
$VAR21 = '34';
$VAR22 = 'Tag <test> found';
$VAR23 = '18';
$VAR24 = 'Tag <test> found';
$VAR25 = '27';
$VAR26 = 'Tag <test> found';
$VAR27 = '5';
$VAR28 = 'Tag <test> found';
.
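The regex m/(\d+)\: (.+?)\n(?=\d)/gs
looks for a number followed by a colon and a space, then lazily matches up to a newline, but only a newline whose next character is a digit. The (?=\d) lookahead checks for that digit without consuming it, so the digit is still there for the next match, and /s lets . match the newlines inside multi-line entries. Well, I think this is what you want.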
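A small self-contained sketch of why the lookahead matters, using a made-up short string instead of the DATA section: if the regex consumes the digit with a plain \n\d, the next //g match starts one character too late, and every other record is skipped.

```perl
use strict;
use warnings;

my $info = "3: a\nb\n5: c\n7: d\n9: e\n";

# Consuming version: "\n\d" swallows the first digit of the next record,
# so the following match has to skip ahead and every other record is lost.
my %consumed;
while ($info =~ m/(\d+): (.+?)\n\d/gs) {
    $consumed{$1} = $2;
}

# Lookahead version: (?=\d) only peeks at the digit, leaving it in place
# for the next iteration of the //g loop.
my %peeked;
while ($info =~ m/(\d+): (.+?)\n(?=\d)/gs) {
    $peeked{$1} = $2;
}

print join(',', sort { $a <=> $b } keys %consumed), "\n";
print join(',', sort { $a <=> $b } keys %peeked), "\n";
```

The consuming version keeps only records 3 and 7; the lookahead version keeps 3, 5 and 7. It still misses the final record (9), because nothing follows it.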
UPDATE: I meant to use Dumper(\%lines), as BrowserUk has done below, instead of just Dumper(%lines). Don't know what I was thinking.
-enlil
my $info;
{
local $/ = undef;
$info = <DATA>;
}
can be reduced to:
my $info = do {local $/;<DATA>};
UPDATE: Well shucks... chromatic has already said that in this thread (and thanks for the doobie doobie do, Aristotle).
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
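A sketch of what jeffa's do { local $/; <DATA> } idiom does, using an in-memory filehandle (Perl 5.8 or later) in place of DATA so it runs on its own:

```perl
use strict;
use warnings;

my $text = "3: Tag <test> found\nTag <test> found\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

# local $/ with no assignment sets $/ to undef ("slurp" mode) for the rest
# of the enclosing block; do { } returns the value of its last expression,
# here the whole contents returned by <$fh>.
my $info = do { local $/; <$fh> };
close $fh;

# On leaving the do block, local restores $/ to its previous value ("\n").
print length($info), " characters slurped\n";
```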
Thanks very much - that's exactly what I was looking for. I need to study up on ?=...
The only problem with that regex is that it doesn't capture the last record of input, because that record isn't followed by a newline and a digit. The easy way around that was to append a sentinel with $info .= '00:';.
«Rich36»
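A sketch of an alternative to the '00:' sentinel, assuming the same record layout: let the lookahead also accept the end of the string. BrowserUk's regex below does the same thing with |$.

```perl
use strict;
use warnings;

my $info = "3: Tag <test> found\nTag <test> found\n5: Tag <test> found\n"
         . "98: Tag <test> found\nTag <test> found\n";

my %lines;
# The lookahead accepts either a newline followed by a digit (the next
# record) or an optional final newline followed by the true end of the
# string (\z), so the last record no longer needs a digit after it.
while ($info =~ m/(\d+): (.+?)(?=\n\d|\n?\z)/gs) {
    $lines{$1} = $2;
}

print "$_ => [$lines{$_}]\n" for sort { $a <=> $b } keys %lines;
```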
{
local $/ = undef;
$info = <DATA>;
}
For more info on $\ look at perlvar
I think you mean "info on $/ look ...", since $/ is the variable your code actually uses. ($\ does exist too, but it is the output record separator.) :)
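To make the difference concrete, a small sketch using in-memory filehandles (Perl 5.8 or later): $/ controls where readline splits the input, while $\ is appended to the output of every print.

```perl
use strict;
use warnings;

# Input side: $/ (input record separator) defaults to "\n",
# so one readline returns just the first line.
my $input = "a\nb\n";
open my $in, '<', \$input or die "cannot open: $!";
my $first = <$in>;
close $in;

# Output side: $\ (output record separator) is appended after each print.
my $output = '';
open my $out, '>', \$output or die "cannot open: $!";
{
    local $\ = "!\n";
    print {$out} 'hello';
}
print {$out} 'bye';    # outside the block $\ is back to its default (undef)
close $out;

print "first line: $first";
print "written: $output\n";
```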
Re: Matching over multiple lines in a scalar
by BrowserUk (Pope) on Oct 27, 2002 at 22:31 UTC
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $info;
$info .= $_ while(<DATA>);
my %lines;
while($info =~ m/(?:^|\n)(\d+)\:(.+?)(?=(?:\n\d)|$)/gs) {
$lines{$1} = $2;
}
print Dumper (\%lines);
__DATA__
3: Tag <test> found
Tag <test> found
5: Tag <test> found
7: Tag <test> found
14: Tag <test> found
16: Tag <test> found
18: Tag <test> found
21: Tag <test> found
25: Tag <test> found
27: Tag <test> found
29: Tag <test> found
32: Tag <test> found
34: Tag <test> found
49: Tag <test> found
80: Tag <test> found
98: Tag <test> found
Tag <test> found
Gives
c:\test>208384
$VAR1 = {
'29' => ' Tag <test> found',
'21' => ' Tag <test> found',
'7' => ' Tag <test> found',
'14' => ' Tag <test> found',
'80' => ' Tag <test> found',
'32' => ' Tag <test> found',
'16' => ' Tag <test> found',
'49' => ' Tag <test> found',
'25' => ' Tag <test> found',
'3' => ' Tag <test> found
Tag <test> found',
'98' => ' Tag <test> found
Tag <test> found',
'34' => ' Tag <test> found',
'18' => ' Tag <test> found',
'27' => ' Tag <test> found',
'5' => ' Tag <test> found'
};
c:\test>
Nah! You're thinking of Simon Templar, originally played by Roger Moore and later by Ian Ogilvy
If you pass the hash, it gets flattened and passed as a list of separate scalars ($VAR1 to $VAR28 in Enlil's output above), and the association between key=>value pairs is lost. By passing a reference, Data::Dumper knows it is a hash and outputs it as such, with the key=>value pairs clearly associated and shown as part of a single compound structure.
In fact, you can write this output to a file, read it back later and eval the string, and it will reconstruct the hash in memory. It is often used as a poor man's database.
Run the program both ways to see the difference.
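A small sketch of both calls plus the eval round trip, using a made-up two-entry hash:

```perl
use strict;
use warnings;
use Data::Dumper;

my %lines = (3 => 'Tag a', 5 => 'Tag b');

# Passing the hash flattens it to a list: one $VARn per key and per value,
# with nothing tying each key to its value.
my $flat = Dumper(%lines);

# Passing a reference gives Dumper a single structure: one $VAR1 = { ... }.
my $dump = Dumper(\%lines);

# Round trip: the dump is Perl source that assigns to $VAR1. Under strict
# that variable has to be declared before the string is evaled.
my $VAR1;
eval $dump;
die $@ if $@;

print $flat, $dump;
print "copy of key 3: $VAR1->{3}\n";
```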
Re: Matching over multiple lines in a scalar
by gjb (Vicar) on Oct 27, 2002 at 22:23 UTC
I'd suggest a slightly different approach that has the advantage of reading the input line by line, so there's no need to hold all the data in memory (which is nice if you have a lot of data).
#!perl
use strict;
my %data;
my ($key, $data);
while (<DATA>) {
chomp($_);
if (/^(\d+):\s*(.+)$/) {
$data{$key} = $data if defined $key;
$key = $1;
$data = $2;
} else {
$data .= " $_";
}
}
$data{$key} = $data if defined $key;
foreach my $key (sort {$a <=> $b} keys %data) {
print "$key: '$data{$key}'\n";
}
__DATA__
3: Tag <test> found 1
Tag <test> found 2
5: Tag <test> found 3
7: Tag <test> found 4
14: Tag <test> found 5
16: Tag <test> found 6
18: Tag <test> found 7
21: Tag <test> found 8
25: Tag <test> found 9
27: Tag <test> found 10
29: Tag <test> found 11
32: Tag <test> found 12
34: Tag <test> found 13
49: Tag <test> found 14
80: Tag <test> found 15
98: Tag <test> found 16
Tag <test> found 17
[Struck out in the original post: Essentially, this is a finite state machine with two states, new-line and continue-line, represented by the if and the else part with the variable $key playing the role of state variable.]
Essentially, this is a finite state machine with three states: initial, new-line and continue-line. The last two are represented by the if and else branches, and the variable $key plays the role of state variable, distinguishing the initial state (undef) from the other two.
(I modified the data slightly to be able to check that the data actually ends up with the right key in the hash.)
Hope this helps, -gjb-
Update: this explanation is more precise than the version I struck out.
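Because the loop only ever looks at the current line, it works on any filehandle. A sketch wrapping it in a sub (parse_records is a hypothetical name, not from the original post) and feeding it an in-memory handle, assuming Perl 5.8 or later. It explicitly ignores lines that arrive before the first numbered one, which the original loop also ends up doing, since $data is overwritten when the first key appears.

```perl
use strict;
use warnings;

# Hypothetical helper: the same two-branch state machine, reading one line
# at a time from whatever filehandle it is given.
sub parse_records {
    my ($fh) = @_;
    my (%data, $key, $text);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(\d+):\s*(.+)$/) {        # new-line state: a numbered record starts
            $data{$key} = $text if defined $key; # flush the previous record
            ($key, $text) = ($1, $2);
        } elsif (defined $key) {                 # continue-line state
            $text .= " $line";
        }                                        # initial state: stray lines are ignored
    }
    $data{$key} = $text if defined $key;         # flush the last record
    return \%data;
}

my $input = "3: Tag <test> found 1\nTag <test> found 2\n5: Tag <test> found 3\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $records = parse_records($fh);
close $fh;

print "$_: '$records->{$_}'\n" for sort { $a <=> $b } keys %$records;
```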
Re: Matching over multiple lines in a scalar
by chromatic (Archbishop) on Oct 27, 2002 at 22:48 UTC
Why use negative-width assertions, when you're already using the /m flag? I like this:
use Data::Dumper;
my $info = do { local $/; <DATA> };
my %lines;
while($info =~ m/(\d+)\: (.+?)$/gm) {
$lines{$1} = $2;
}
print Dumper (\%lines);
Of course, this also has an appeal:
my %lines = $info =~ /(\d+): (.+?)$/gm;
Passing a reference to Dumper allows Data::Dumper to dump the entire data structure without listifying it first.
Update: I miscopied the test data. Oops. Negative-width assertions are the way to go. :)
Why use negative-width assertions, when you're already using the /m flag?
To catch the broken lines. Yours is a very elegant construction, which I intend to steal, but I don't think that snippet meets the original requirements as it stands. If you knew the tags wouldn't contain numerals, which I doubt, you could change it to:
my $info = do { local $/; <DATA> };
my %lines = $info =~ /(\d+): ([^\d]+)/gs;
but otherwise I can't see an alternative to the (?:^|\n).
BTW, is there any way to capture the matched values during a split? It would make this nice and tidy.
update: damnation. redundant again.
another update: I couldn't resist shrinking gjb's cheaper version and introducing a useful but quite unrequested array reference:
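my ($key, %data);
while (<DATA>) {    # while reads one line at a time; for (<DATA>) would read the whole file first
    next unless /^(?:(\d+):\s*)?(.+)$/;    # skip lines that don't match, e.g. blank ones
    push @{ $data{ $key = $1 || $key } }, $2;
}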
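On the split question: yes. If the split pattern contains capturing parentheses, split also returns the captured text, so splitting at each run of digits and a colon at the start of a line gives a key/value list directly. A sketch, assuming every record starts at the beginning of a line:

```perl
use strict;
use warnings;

my $info = "3: Tag <test> found\nTag <test> found\n5: Tag <test> found\n"
         . "98: Tag <test> found\nTag <test> found\n";

# The captured (\d+) is returned between the fields, giving:
# empty leading field, key, value, key, value, ...
# /m is needed so that ^ matches after every newline.
my @parts = split /^(\d+):\s*/m, $info;
shift @parts;                 # drop the empty field before the first key
my %lines = @parts;

# Each value still ends with the newline before the next key; strip it.
# (values() returns aliases, so this edits the hash in place.)
s/\s+\z// for values %lines;

print "$_ => [$lines{$_}]\n" for sort { $a <=> $b } keys %lines;
```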
my %lines = do { local $/; <DATA> =~ m/(?:^|\n)(\d+)\:(.+?)(?=(?:\n\d)|$)/gs };
Now I'll wait for sauoq to reduce the regex to 3 chars and a twiddle and we've got a golf solution.:^)