PerlMonks  

Re: Extract Tags between Two strings

by Athanasius (Monsignor)
on Jun 11, 2012 at 14:58 UTC (#975586)


in reply to Extract Tags between Two strings

The following makes a single pass over the input file. I have used the in-file DATA section for convenience; you will, of course, have to change this to read from your data file:

use strict;
use warnings;
use feature 'say';    # say() is not enabled by default

my (%tags_to_match, @extracted_tags);
my $in_matching = 0;
my $tag_prefix  = 'bbc_';
my $tag_regex   = qr{ ( $tag_prefix \w+ _ \d+ ) }x;

while (my $line = <DATA>)
{
    if ($in_matching)
    {
        # Inside a [start]/[end] block: record each tag listed there
        if ($line =~ / ^ \s* \[ end \] \s* $ /x)
        {
            $in_matching = 0;
        }
        elsif ($line =~ $tag_regex)
        {
            $tags_to_match{ $1 }++;
        }
    }
    elsif ($line =~ / ^ \s* \[ start \] \s* $ /x)
    {
        $in_matching = 1;
    }
    elsif ($line =~ $tag_regex)
    {
        # Outside a block: keep the tag only if a block has already named it
        my $tag = $1;
        push @extracted_tags, $tag if exists $tags_to_match{ $tag };
    }
}

say "\@extracted_tags = ", join(', ', @extracted_tags);

__DATA__
[start]
bbc_arc_001
bbc_arc_002
abc_arc_001
[end]
bbc_arc_001
bbc_arc_002
bbc_arc_003
bbc_arc_004

This should work provided the tags to be extracted always appear after the start/end block in which they are specified. If this is not the case for your input file, you will need to make two passes over the file: the first to read the contents of the start/end block(s), the second to extract the specified tags.
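For the two-pass case, here is a minimal sketch rather than a finished solution. It assumes the whole input fits in memory as a list of lines, and the sub name extract_tags_two_pass is my own invention. With a real filehandle you could instead read the file once, rewind with seek($fh, 0, 0), and read it again.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns, in input order,
# the tags found outside [start]/[end] blocks that some block names,
# regardless of whether the block comes before or after them.
sub extract_tags_two_pass {
    my @lines     = @_;
    my $tag_regex = qr{ ( bbc_ \w+ _ \d+ ) }x;
    my $start_re  = qr{ ^ \s* \[ start \] \s* $ }x;
    my $end_re    = qr{ ^ \s* \[ end \]   \s* $ }x;

    # Pass 1: collect every tag listed inside a [start]/[end] block.
    my %wanted;
    my $in_block = 0;
    for my $line (@lines) {
        if    ($line =~ $start_re) { $in_block = 1 }
        elsif ($line =~ $end_re)   { $in_block = 0 }
        elsif ($in_block && $line =~ $tag_regex) { $wanted{$1} = 1 }
    }

    # Pass 2: keep the wanted tags that occur outside the blocks.
    my @extracted;
    $in_block = 0;
    for my $line (@lines) {
        if    ($line =~ $start_re) { $in_block = 1 }
        elsif ($line =~ $end_re)   { $in_block = 0 }
        elsif (!$in_block && $line =~ $tag_regex) {
            my $tag = $1;
            push @extracted, $tag if $wanted{$tag};
        }
    }
    return @extracted;
}

# Usage: one wanted tag appears before the block that names it.
my @lines = (
    "bbc_arc_002\n", "bbc_arc_003\n",
    "[start]\n", "bbc_arc_001\n", "bbc_arc_002\n", "[end]\n",
    "bbc_arc_001\n",
);
print join(', ', extract_tags_two_pass(@lines)), "\n";   # bbc_arc_002, bbc_arc_001
```

The single-pass version would miss the first bbc_arc_002 here, because nothing has been recorded in the hash when that line is read.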

Also note that your regex may not be doing what you wanted. [a-zA-Z]*[0-9]*_* means: zero or more letters, followed by zero or more digits, followed by zero or more underscores. Because every part is optional, it can match the empty string, and so it succeeds on any line at all. In my code I use a regex which is a guess at what was intended.
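To see the difference, here is a small sketch comparing the two patterns. The helper matched_text is hypothetical, and the stricter pattern is only my guess from the code above, written without /x.

```perl
use strict;
use warnings;

# The pattern quoted from the question, and the stricter guess used in the reply.
my $loose  = qr/[a-zA-Z]*[0-9]*_*/;
my $strict = qr/bbc_\w+_\d+/;

# Hypothetical helper: returns the text that $regex matches in $string,
# or undef if there is no match.
sub matched_text {
    my ($regex, $string) = @_;
    return $string =~ /($regex)/ ? $1 : undef;
}

for my $s ('bbc_arc_001', '!!!') {
    my $loose_match  = matched_text($loose,  $s);
    my $strict_match = matched_text($strict, $s);
    printf "%-13s loose: '%s'  strict: %s\n",
        "'$s'",
        $loose_match,    # always defined: the loose pattern cannot fail
        defined $strict_match ? "'$strict_match'" : 'no match';
}
```

On 'bbc_arc_001' the loose pattern stops after 'bbc_' (letters, no digits, one underscore), while on '!!!' it matches the empty string at the start. The stricter pattern matches the whole tag in the first case and fails in the second.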

HTH,

Athanasius <°(((><contra mundum

