
Re: Extract Tags between Two strings

by Athanasius (Chancellor)
on Jun 11, 2012 at 14:58 UTC (#975586)

in reply to Extract Tags between Two strings

The following makes a single pass over the input file. I have used the in-file DATA section for convenience; you will, of course, have to change this to read from your data file:

use strict;
use warnings;
use feature 'say';    # say() is not available without this

my (%tags_to_match, @extracted_tags);
my $in_matching = 0;
my $tag_prefix  = 'bbc_';
my $tag_regex   = qr{ ( $tag_prefix \w+ _ \d+ ) }x;

while (my $line = <DATA>) {
    if ($in_matching) {
        # Inside a start/end block: record each tag until [end] is seen
        if ($line =~ / ^ \s* \[ end \] \s* $ /x) {
            $in_matching = 0;
        }
        elsif ($line =~ $tag_regex) {
            $tags_to_match{ $1 }++;
        }
    }
    elsif ($line =~ / ^ \s* \[ start \] \s* $ /x) {
        $in_matching = 1;
    }
    elsif ($line =~ $tag_regex) {
        # Outside a block: keep the tag only if the block listed it
        my $tag = $1;
        foreach (keys %tags_to_match) {
            if ($tag eq $_) {
                push @extracted_tags, $tag;
                last;
            }
        }
    }
}

say "\@extracted_tags = ", join(', ', @extracted_tags);

__DATA__
[start]
bbc_arc_001
bbc_arc_002
abc_arc_001
[end]
bbc_arc_001
bbc_arc_002
bbc_arc_003
bbc_arc_004

This should work provided the tags to be extracted always appear after the start/end block in which they are specified. If this is not the case for your input file, you will need to make two passes over the file: the first to read the contents of the start/end block(s), the second to extract the specified tags.
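If you do need the two-pass version, here is a minimal sketch of how it might look. The sub name extract_tags_two_pass and the sample input are my own inventions for illustration, and I read from an in-memory filehandle only so the example is self-contained; with a real file you would open it twice (or seek back to the start) instead. It also uses a direct hash lookup with exists rather than looping over the keys.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tags found outside [start]/[end] blocks
# that are also listed inside such a block, wherever the block appears.
sub extract_tags_two_pass {
    my ($text) = @_;
    my $tag_regex = qr{ ( bbc_ \w+ _ \d+ ) }x;
    my (%wanted, @extracted);

    for my $pass (1, 2) {
        # Re-open the input for each pass (an in-memory file here)
        open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
        my $in_block = 0;

        while (my $line = <$fh>) {
            if ($line =~ / ^ \s* \[ start \] \s* $ /x) {
                $in_block = 1;
            }
            elsif ($line =~ / ^ \s* \[ end \] \s* $ /x) {
                $in_block = 0;
            }
            elsif ($line =~ $tag_regex) {
                my $tag = $1;
                if ($pass == 1) {
                    # Pass 1: collect the tags named inside a block
                    $wanted{$tag} = 1 if $in_block;
                }
                elsif (!$in_block && exists $wanted{$tag}) {
                    # Pass 2: keep matching tags found outside any block
                    push @extracted, $tag;
                }
            }
        }
        close $fh;
    }
    return @extracted;
}

# Made-up sample in which a wanted tag appears BEFORE the block
my $sample = <<'END_SAMPLE';
bbc_arc_003
bbc_arc_001
[start]
bbc_arc_001
bbc_arc_002
abc_arc_001
[end]
bbc_arc_002
bbc_arc_004
END_SAMPLE

print join(', ', extract_tags_two_pass($sample)), "\n";
```

With this sample, bbc_arc_001 is picked up even though it precedes the block, which the single-pass version would miss.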

Also note that your regex may not be doing what you wanted. [a-zA-Z]*[0-9]*_* means: zero or more letters, followed by zero or more digits, followed by zero or more underscores. Since every part is optional, it can match the empty string, so it succeeds on any input. In my code I use a regex which is a guess at what was intended.
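To see the difference, here is a small comparison you could run. The helper first_match is hypothetical, and the stricter pattern is still only my guess at the intended tag format:

```perl
use strict;
use warnings;

my $loose  = qr/[a-zA-Z]*[0-9]*_*/;   # the pattern from the original question
my $strict = qr/bbc_\w+_\d+/;         # assumption: my guess at the intent

# Hypothetical helper: returns the first match of $regex in $input,
# or undef if the regex does not match at all.
sub first_match {
    my ($regex, $input) = @_;
    return $input =~ /($regex)/ ? $1 : undef;
}

for my $input ('bbc_arc_001', '!!!') {
    my $loose_match  = first_match($loose,  $input);
    my $strict_match = first_match($strict, $input);
    printf "%-12s loose='%s' strict='%s'\n",
        $input,
        defined $loose_match  ? $loose_match  : '(no match)',
        defined $strict_match ? $strict_match : '(no match)';
}
```

On 'bbc_arc_001' the loose pattern stops at 'bbc_' (letters, no digits, one underscore), and on '!!!' it still "matches" by consuming nothing, whereas the stricter pattern takes the whole tag in the first case and fails in the second.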


Athanasius <°(((><contra mundum
