Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

tag a matched text

by aakikce (Acolyte)
on Apr 16, 2007 at 11:06 UTC ( #610318=perlquestion: print w/replies, xml ) Need Help??

aakikce has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I want to tag separte authors from the following author list. Authos may come in any order in input(surname, givenname).

I have updated the input and output

Input:

<surname>Coff</surname>, <givenname>D.,</givenname>, <surname>Coff</surname>, <givenname>D.,</givenname>

<givenname>D.,</givenname>, <surname>Coff</surname>, <givenname>D.,</givenname>, <surname>Coff</surname>

Output:

<au><surname>Coff</surname>, <givenname>D.,</givenname></au>, <au><surname>Coff</surname>, <givenname>D.,</givenname></au>

<au><givenname>D.,</givenname>, <surname>Coff</surname></au>, <au><givenname>D.,</givenname>, <surname>Coff</surname></au>

I tried like below. But I got wrong once givenname comes first.

undef $/; $content = <DATA>; $content =~s#([^<]*)(<(?:sur|given)name>.*</(?:sur|given)name>)#$1."<a +uthorfield>".&indivauth($2)."</authorfield>"#egi; sub indivauth { $au{sur} = 'given'; $au{given} = 'sur'; ($inline) = @_; $inline =~s#(<surname>(?:(?!<sur|</au>).)*?</givenname>)#<au>$1</a +u>#gsi; return "$inline"; } open (FOUT, "> out.txt"); print FOUT $content;

Can you help me

Thanks

aakikce

Replies are listed 'Best First'.
Re: tag a matched text
by GrandFather (Sage) on Apr 16, 2007 at 13:20 UTC

    indivauth's regex doesn't allow opposite ordering of given and surnames. Consider instead:

    use strict; use warnings; undef $/; my $alt = '(?:sur|given)name'; my $content = <DATA>; $content =~ s#([^<]*)(<$alt>.*</$alt>)#$1."<authorfield>".&indivauth($ +2)."</authorfield>\n"#egi; sub indivauth { my ($inline) = @_; $inline =~s#(<$alt>(?:(?!$alt|</au>).)*?</$alt>)#<au>$1</au>#gsi; return "$inline"; } print $content; __DATA__ <surname>Coff</surname>, <givenname>D.,</givenname>, <surname>Coff</su +rname>, <givenname>D.,</givenname> <givenname>D.,</givenname>, <surname>Coff</surname>, <givenname>D.,</g +ivenname>, <surname>Coff</surname>

    Prints:

    <authorfield><au><surname>Coff</surname></au>, <au><givenname>D.,</giv +enname></au>, <au><surname>Coff</surname></au>, <au><givenname>D.,</g +ivenname></au></authorfield> <authorfield><au><givenname>D.,</givenname></au>, <au><surname>Coff</s +urname></au>, <au><givenname>D.,</givenname></au>, <au><surname>Coff< +/surname></au></authorfield>

    DWIM is Perl's answer to Gödel
    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: tag a matched text
by wfsp (Abbot) on Apr 16, 2007 at 15:44 UTC
    Here's my stab using a parser.
    #!/usr/local/bin/perl use strict; use warnings; use HTML::TokeParser::Simple; my $data = do{local $/; <DATA>}; my $p = HTML::TokeParser::Simple->new(\$data); my ($in_tag, $in_g, @AoH, $tag, $txt); while (my $t = $p->get_token){ $in_tag++, next if $t->is_start_tag; $txt = $t->as_is if $t->is_text; if ($t->is_end_tag){ push @AoH, { tag => $t->get_tag, txt => $txt, }; $in_tag--; } } while (@AoH){ my @rec = splice @AoH, 0, 2; print '<au>'; printf "<%s>%s</%s>", $rec[0]->{tag}, $rec[0]->{txt}, $rec[0]->{tag} +; print ', '; printf "<%s>%s</%s>", $rec[1]->{tag}, $rec[1]->{txt}, $rec[1]->{tag} +; print '</au>,', "\n"; } __DATA__ <surname>Coff</surname>, <givenname>D.,</givenname>, <surname>Coff</su +rname>, <givenname>D.,</givenname> <givenname>D.,</givenname>, <surname>Coff</surname>, <givenname>D.,</g +ivenname>, <surname>Coff</surname>
    outputs:
    <au><surname>Coff</surname>, <givenname>D.,</givenname></au>, <au><surname>Coff</surname>, <givenname>D.,</givenname></au>, <au><givenname>D.,</givenname>, <surname>Coff</surname></au>, <au><givenname>D.,</givenname>, <surname>Coff</surname></au>,
Re: tag a matched text
by Samy_rio (Vicar) on Apr 16, 2007 at 11:58 UTC

    Hi aakikce, try like this,

    use strict; use warnings; my $content = do{local $/; <DATA>}; $content =~ s/(<surname>(?:(?:(?!<\/surname>).)*)<\/surname>\,?\s*<giv +enname>(?:(?:(?!<\/givenname>).)*)<\/givenname>)/<au>$1<\/au>/gsi; $content =~ s/(<\/givenname>\,\s*)<au>(\s*<surname>)/$1$2/gis; $content =~ s/(<\/givenname>\s*)<\/au>(\,\s*<surname>)/$1$2/gis; $content =~ s/(<givenname>(?:(?:(?!<\/givenname>).)*)<\/givenname>\,?\ +s*<surname>(?:(?:(?!<\/surname>).)*)<\/surname>)/<au>$1<\/au>/gsi; print $content; __DATA__ <surname>Coff</surname>, <givenname>D.,</givenname> et al., <givenname +>J.</givenname> <surname>Amat</surname> <surname>Coffman</surname>, < +givenname>D.D.,</givenname> <givenname>D.,</givenname>, <surname>Coff</surname> et al., <givenname +>J.</givenname> <surname>Amat</surname> <surname>Coffman</surname>, < +givenname>D.D.,</givenname> output: ------- <au><surname>Coff</surname>, <givenname>D.,</givenname></au>, <au><sur +name>Coff</surname>, <givenname>D.,</givenname></au> <au><givenname>D.,</givenname>, <surname>Coff</surname></au>, <au><giv +enname>D.,</givenname>, <surname>Coff</surname></au>

    In your expected output, second line is same as input. I think it's typo.

    Updated: aakikce, now you can try with your sample.

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

    A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://610318]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2020-07-02 09:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?