PerlMonks  

regex to match the following

by Anonymous Monk
on May 08, 2007 at 08:10 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

hi all,

My input file contains:

<meta-parameter type="stored-procedure" id="rep_vfnz_gsm_bss_tch_network_report_spr">\
<meta-parameter id="rep_vnz_gprs_quality_report_daily_spr" type="stored-procedure">\
<meta-parameter type="stored-procedure" id="rep_vfnz_gsm_nss_benchmarking_report_spr">\
<meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\
<meta-parameter id="rep_vnz_cchnwpossumreport_spr1" type="stored-procedure">\
<meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\
<meta-parameter type="stored-procedure" id="rep_vfnz_gsm_bss_tch_network_report_spr">\
<meta-parameter type="stored-procedure" id="rep_kit">\
<meta-parameter type="stored-procedure" id="rep_kit">\
<meta-parameter type="stored-procedure" id="rep_kit">\
<meta-parameter type="stored-procedure" id="rep_kit">\
<meta-parameter type="stored-procedure" id="rep_vnz_cchnwpossumreport_spr1">\
<meta-parameter type="stored-procedure" id="rep_data_availability_spr">\
<meta-parameter type="stored-procedure" id="rep_vnz_optregionnwperfreport_spr">\
<meta-parameter type="stored-procedure" id="rep_vnz_optregionnwperfreport_spr">\
<meta-parameter type="stored-procedure" id="rep_vnz_attach_success_ratio_report_spr">\
<meta-parameter type="stored-procedure" id="rep_vnz_attach_success_ratio_report_spr">\
<meta-parameter id="rep_vnz_cchnwpossumreport_spr1" type="stored-procedure">\
<meta-parameter id="rep_vnz_cchnwpossumreport_spr1" type="stored-procedure">\
<meta-parameter id="rep_vfnz_gsm_bss_tch_network_report_spr" type="stored-procedure">\
If a line in the input file contains type="stored-procedure", I want to extract the value of its id attribute.
How can I achieve this?

Re: regex to match the following
by Krambambuli (Deacon) on May 08, 2007 at 08:50 UTC
    One possible way:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>) {
        my $id;
        if (m/type="stored-procedure"/) {
            $id = $1 if m/id="([^"]+)"/;
        }
        print "ID: $id\n" if defined $id;
    }

    __DATA__
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_bss_tch_network_report_spr">\
    <meta-parameter id="rep_vnz_gprs_quality_report_daily_spr" type="stored-procedure">\
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_nss_benchmarking_report_spr">\
    <meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\
    <meta-parameter id="rep_vfnz_gsm_bss_tch_network_report_spr" type="stored-procedure">
    However, you'll probably need to know more about the input: can you rely on always getting the attributes in lowercase? Do you get exactly one meta-parameter per input line? And so on.
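    Here is a minimal core-Perl sketch of how those two caveats could be handled. It assumes attribute values are always double-quoted. The sub name stored_procedure_ids and the sample text are hypothetical. The /i flag covers the possibility that case varies, and the m//g loop covers the possibility that several tags share a line. Neither has been confirmed for the real input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: return the id of every <meta-parameter ...> tag whose type is
# "stored-procedure". Assumptions: values are double-quoted, names and
# values may differ in case, and a single line may hold several tags.
sub stored_procedure_ids {
    my ($text) = @_;
    my @ids;

    # m//g in scalar context visits the tags one at a time, even on one line.
    while ( $text =~ /<meta-parameter\b([^>]*)>/gi ) {
        my $attrs = $1;
        next unless $attrs =~ /\btype\s*=\s*"stored-procedure"/i;
        push @ids, $1 if $attrs =~ /\bid\s*=\s*"([^"]*)"/i;
    }
    return @ids;
}

# Hypothetical sample: two tags on the first line, one of them in mixed case.
my $sample = <<'END';
<meta-parameter TYPE="Stored-Procedure" id="rep_kit">\ <meta-parameter id="rep_other" type="table">\
<meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\
END

print "ID: $_\n" for stored_procedure_ids($sample);
```

The table-typed tag is skipped. The mixed-case tag still matches because of /i.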
Re: regex to match the following
by jesuashok (Curate) on May 08, 2007 at 08:57 UTC
    perl -nle 'm/type="stored-procedure"/ and m/id="([^"]+)"/ and print "$1"' {input_file}
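    For readers unfamiliar with the switches, perlrun documents what they do. -n wraps the code in a loop over the input lines. -l chomps each line and ends each print with a newline. Below is a rough long-hand equivalent of the one-liner. It is sketched as a hypothetical sub ids_from_handle that reads from any filehandle, and it is demonstrated on an in-memory file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Long-hand form of the one-liner: the while loop plays the role of -n,
# and chomp plus the explicit "\n" on output play the role of -l.
sub ids_from_handle {
    my ($fh) = @_;
    my @ids;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Same chain as the one-liner: each step runs only if the previous one matched.
        $line =~ m/type="stored-procedure"/
            and $line =~ m/id="([^"]+)"/
            and push @ids, $1;
    }
    return @ids;
}

# Demo on an in-memory file (hypothetical sample lines).
my $text = <<'END';
<meta-parameter id="rep_vnz_gprs_quality_report_daily_spr" type="stored-procedure">\
<meta-parameter type="stored-procedure" id="rep_kit">\
END
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n" for ids_from_handle($in);
close $in;
```

To process a real file, open it (for example open my $fh, '<', $path or die $!) and pass $fh instead.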
Re: regex to match the following
by cool (Scribe) on May 08, 2007 at 09:20 UTC

    Hope this small piece is of some help. Because your file has a uniform pattern, this can also be tackled without a regex match:

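    #!/usr/bin/perl -w
    use strict;

    open( my $fh, '<', 'infile' ) or die "Can't open infile: $!\n";
    while (<$fh>) {
        my @temp = split /"/;
        # Relies on type="..." being the first quoted value and id="..." the second
        if ( defined $temp[1] and $temp[1] eq 'stored-procedure' ) {
            print "$temp[3]\n";
        }
    }
    close($fh);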

    But as always, "There's more than one way to do it", and probably better ones too :)

      Sadly, the file does not have a uniform pattern. Several lines have the id="..." before the type="stored-procedure", so your method will silently miss those lines.

      Cheers,

      JohnGG
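    One way to make the split() idea independent of attribute order is to collect every name="value" pair into a hash and then look the attributes up by name. This is a sketch that assumes double-quoted values with no embedded quotes. The sub attrs_of and the sample lines are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn every name="value" pair on a line into a hash entry, so it no
# longer matters whether type= or id= comes first.
sub attrs_of {
    my ($line) = @_;
    # In list context, m//g returns all captures: name1, value1, name2, value2, ...
    my %attr = $line =~ /([\w-]+)="([^"]*)"/g;
    return \%attr;
}

my @lines = (
    '<meta-parameter type="stored-procedure" id="rep_kit">\\',
    '<meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\\',
);

for my $line (@lines) {
    my $attr = attrs_of($line);
    next unless defined $attr->{type} and $attr->{type} eq 'stored-procedure';
    print "$attr->{id}\n" if defined $attr->{id};
}
```

Both lines are reported, whichever order their attributes appear in.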

Re: regex to match the following
by wfsp (Abbot) on May 08, 2007 at 10:59 UTC
    Your data has tags and attributes, so out of curiosity I parsed it with HTML::TokeParser::Simple.

    #!C:/Perl/bin/perl.exe
    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $data = do { local $/; <DATA> };
    my $p = HTML::TokeParser::Simple->new( \$data );

    my @table;
    while ( my $t = $p->get_tag('meta-parameter') ) {
        if (    $t->get_attr('type')
            and $t->get_attr('type') eq q{stored-procedure}
            and $t->get_attr('id') )
        {
            push @table, $t->get_attr('id');
        }
    }
    print "->$_<-\n" for @table;

    __DATA__
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_bss_tch_network_report_spr">\
    <meta-parameter id="rep_vnz_gprs_quality_report_daily_spr" type="stored-procedure">\
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_nss_benchmarking_report_spr">\
    <meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\
    <meta-parameter id="rep_vnz_cchnwpossumreport_spr1" type="stored-procedure">\
    <meta-parameter id="rep_vnz_cchnwpossumreport2_spr" type="stored-procedure">\
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_bss_tch_network_report_spr">\
    <meta-parameter type="stored-procedure" id="rep_kit">\
    <meta-parameter type="stored-procedure" id="rep_kit">\
    <meta-parameter type="stored-procedure" id="rep_kit">\
    <meta-parameter type="stored-procedure" id="rep_kit">\
    <meta-parameter type="stored-procedure" id="rep_vnz_cchnwpossumreport_spr1">\
    <meta-parameter type="stored-procedure" id="rep_data_availability_spr">\
    <meta-parameter type="stored-procedure" id="rep_vnz_optregionnwperfreport_spr">\
    <meta-parameter type="stored-procedure" id="rep_vnz_optregionnwperfreport_spr">\
    <meta-parameter type="stored-procedure" id="rep_vnz_attach_success_ratio_report_spr">\
    <meta-parameter type="stored-procedure" id="rep_vnz_attach_success_ratio_report_spr">\
    <meta-parameter id="rep_vnz_cchnwpossumreport_spr1" type="stored-procedure">\
    <meta-parameter id="rep_vnz_cchnwpossumreport_spr1" type="stored-procedure">\
    <meta-parameter id="rep_vfnz_gsm_bss_tch_network_report_spr" type="stored-procedure">\
    Output:

    ->rep_vfnz_gsm_bss_tch_network_report_spr<-
    ->rep_vnz_gprs_quality_report_daily_spr<-
    ->rep_vfnz_gsm_nss_benchmarking_report_spr<-
    ->rep_vnz_cchnwpossumreport2_spr<-
    ->rep_vnz_cchnwpossumreport_spr1<-
    ->rep_vnz_cchnwpossumreport2_spr<-
    ->rep_vfnz_gsm_bss_tch_network_report_spr<-
    ->rep_kit<-
    ->rep_kit<-
    ->rep_kit<-
    ->rep_kit<-
    ->rep_vnz_cchnwpossumreport_spr1<-
    ->rep_data_availability_spr<-
    ->rep_vnz_optregionnwperfreport_spr<-
    ->rep_vnz_optregionnwperfreport_spr<-
    ->rep_vnz_attach_success_ratio_report_spr<-
    ->rep_vnz_attach_success_ratio_report_spr<-
    ->rep_vnz_cchnwpossumreport_spr1<-
    ->rep_vnz_cchnwpossumreport_spr1<-
    ->rep_vfnz_gsm_bss_tch_network_report_spr<-

    Also out of curiosity, do monks really prefer having the 'and' at the beginning of the line?

      Also out of curiosity, do monks really prefer having the 'and' at the beginning of the line?

      I do. Those boolean operators are a lot like control flow statements (e.g., next) because they control whether different parts of the expression are evaluated. I like to have them out front, clearly visible. That way I can more easily tell what might run and what might not.

      Also out of curiosity, do monks really prefer having the 'and' at the beginning of the line?

      Most definitely. It's easier to comment out a conditional to flip it off for debugging purposes, or if you just decide you don't want it any more. One comment at the start of the line and I'm finished.

      The only remotely tricky case is the last conditional in a block, because that one carries the semicolon at the end. Solution in that case? Start the line with ;# so the semicolon still ends the statement and the # comments out the rest. Not quite as clear for the poor sap who maintains it after it's commented that way, but still easy to type.
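    Below is a small sketch of the leading-operator layout discussed above, using hypothetical subs wanted and first_id. In wanted, one condition is switched off by commenting out its line. The comment block above first_id shows where the ;# trick would go if its final clause were disabled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Leading-operator layout: every condition starts its own line, so one can
# be switched off with a single '#'. Here the id test is commented out.
sub wanted {
    my ($line) = @_;
    my $ok = (
            $line =~ /<meta-parameter\b/
        and $line =~ /type="stored-procedure"/
    #   and $line =~ /id="[^"]+"/
    );
    return $ok ? 1 : 0;
}

# Statement form: the last clause carries the semicolon. Disabling it would
# mean starting its line with ';#': the ';' ends the statement and the '#'
# hides the rest, like this:
#         $line =~ /type="stored-procedure"/
#             and $line =~ /id="([^"]+)"/
#     ;#      and $id = $1;
sub first_id {
    my ($line) = @_;
    my $id;
    $line =~ /type="stored-procedure"/
        and $line =~ /id="([^"]+)"/
        and $id = $1;
    return $id;
}

my $demo = '<meta-parameter type="stored-procedure" id="rep_kit">';
print "wanted: ", wanted($demo), "\n";
print "id:     ", first_id($demo), "\n";
```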

Re: regex to match the following
by jettero (Monsignor) on May 08, 2007 at 11:04 UTC

    Honestly, I think the simplest solution is XML::XPath.

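    use XML::XPath;

    # Assumes $input is a filehandle on well-formed XML; the sample in the
    # question would first need a root element and closed meta-parameter tags.
    my $xp = XML::XPath->new( ioref => $input );
    my $mp = $xp->find('//meta-parameter');    # '//' matches at any depth, not just the root
    for my $n ( $mp->get_nodelist ) {
        if ( $xp->findvalue( '@type' => $n ) eq "stored-procedure" ) {
            warn "found: " . $xp->findvalue( '@id' => $n );
        }
    }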

    For regexes of this kind, I usually do it in multiple stages so I can be sure I matched the right thing.

    if ( m/<([^>]*meta-parameter[^>]*stored-procedure[^>]*)>/ ) {
        my $tag = $1;
        if ( $tag =~ m/id=['"](.+?)['"]/ ) {
            my $id = $1;
            # arguably not the most robust thing in the whole wide world,
            # but you get the idea
        }
    }

    -Paul
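    Here is a runnable version of the two-stage match above, wrapped in a hypothetical sub two_stage_ids. The backreference \1, which forces the closing quote to match the opening one, is an addition. Stage 1 still accepts a tag where stored-procedure appears inside some other attribute's value, so treat this as a sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stage 1 isolates a whole tag mentioning both meta-parameter and
# stored-procedure; stage 2 looks for the id inside that tag only.
sub two_stage_ids {
    my ($text) = @_;
    my @ids;
    # [^>]* cannot cross a '>', so each match stays inside a single tag.
    while ( $text =~ m/<([^>]*meta-parameter[^>]*stored-procedure[^>]*)>/g ) {
        my $tag = $1;
        # (['"]) remembers the opening quote; \1 requires the same closing quote.
        push @ids, $2 if $tag =~ m/\bid=(['"])(.*?)\1/;
    }
    return @ids;
}

my $line = q{<meta-parameter type="stored-procedure" id='rep_kit'> }
         . q{<meta-parameter id="rep_data_availability_spr" type="stored-procedure">};
print "$_\n" for two_stage_ids($line);
```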

Re: regex to match the following
by Jenda (Abbot) on May 09, 2007 at 13:41 UTC

    Assuming the file is XML, what about something like:

    use XML::Rules;

    my %seen;
    my $parser = XML::Rules->new(
        rules => [
            '_default'       => '',
            'meta-parameter' => sub {
                return unless $_[1]->{type} eq "stored-procedure";
                $seen{ $_[1]->{id} }++;
                return;
            },
        ]
    );
    $parser->parse( \*DATA );

    print "There were those stored procedures:\n";
    foreach ( sort keys %seen ) {
        print "  $_ ($seen{$_} times)\n";
    }

    __DATA__
    <some>
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_bss_tch_network_report_spr"/>\
    <meta-parameter id="rep_vnz_gprs_quality_report_daily_spr" type="stored-procedure"/>\
    <meta-parameter type="stored-procedure" id="rep_vfnz_gsm_nss_benchmarking_report_spr"/>\
    </some>

    This is much safer than trying to parse the file with regexps.
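    If XML::Rules is not available, the %seen bookkeeping from the example above still works on its own for reporting repeated ids such as rep_kit. This sketch uses a hypothetical count_ids sub and made-up input. In practice the ids would come from whichever extraction method you use, and that method's caveats still apply:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each id occurs, so repeats are reported once with a tally.
sub count_ids {
    my %seen;
    $seen{$_}++ for @_;
    return \%seen;
}

# Hypothetical input: ids as some extractor might have returned them.
my @ids  = qw(rep_kit rep_data_availability_spr rep_kit rep_kit);
my $seen = count_ids(@ids);

print "There were these stored procedures:\n";
print "  $_ ($seen->{$_} times)\n" for sort keys %$seen;
```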

Node Type: perlquestion [id://614098]
Approved by Corion