PerlMonks  

create an xml file from column file

by lakssreedhar (Acolyte)
on Jul 24, 2013 at 10:52 UTC ( [id://1046069]=perlquestion )

lakssreedhar has asked for the wisdom of the Perl Monks concerning the following question:

I want to create an XML file from the following input:

how B-NP
are I-NP
you I-NP

The output should be as follows:

<text> how are you</text> <annotation> <type> NP </type> <text> how</text></annotation>

Replies are listed 'Best First'.
Re: create an xml file from column file
by choroba (Cardinal) on Jul 24, 2013 at 11:24 UTC
    If your input does not contain <, &, or ]]>, you do not need anything special:
    #!/usr/bin/perl
    use warnings;
    use strict;

    # Named indices into each [ word, type ] pair.
    use constant {
        WORD => 0,
        TYPE => 1,
    };

    my @annotations;
    while (<DATA>) {
        my ($word, $type) = split;
        $type =~ s/.-*//;    # drop the leading chunk marker: "B-NP" becomes "NP"
        push @annotations, [ $word, $type ];
    }

    # The full sentence first ...
    print '<text>';
    print join ' ', map $_->[WORD], @annotations;
    print '</text>';

    # ... then one annotation element per word.
    for my $annotation (@annotations) {
        print '<annotation>';
        print '<type>', $annotation->[TYPE], '</type>';
        print '<text>', $annotation->[WORD], '</text>';
        print '</annotation>';
    }

    __DATA__
    how B-NP
    are I-NP
    you I-NP
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
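For input that does contain those characters, the values would need escaping before printing. A minimal sketch, assuming plain entity escaping is acceptable for the element content; `xml_escape` is a hypothetical helper name, not something from the reply above:

```perl
use strict;
use warnings;

# xml_escape is a hypothetical helper, introduced only for illustration.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so the entities added below are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;     # also breaks up a literal "]]>" sequence
    return $s;
}

# prints: <text>AT&amp;T &lt;b&gt;</text>
print '<text>', xml_escape('AT&T <b>'), "</text>\n";
```

In the script above this would wrap each word and type as it is printed. A dedicated XML module is likely the safer choice once the input is not fully under your control.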

      How would it look if my input was

      how o B-NP
      are o o
      you o I-NP

      Now I need annotation tags only for those words that have B-NP or I-NP in the third column.

        I pressed the wrong button and shrunk the code accidentally. It should still do what you want.

        use strict; use warnings;
        print "<text>",join(" ",map{/^(\w+).*?(B-NP|I-NP)?$/; $a.="<annotation><type>NP</type><text>$1</text></annotation>\n"if$2;$1}<DATA>),"</text>\n$a";
        __DATA__
        how B-NP
        are
        you I-NP
        really
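For readers who prefer it unshrunk, here is a more readable sketch of the same idea. It assumes whitespace-separated columns with the chunk tag in the third one, as in the follow-up question; `annotate_np` is a hypothetical name, not something from the reply.

```perl
use strict;
use warnings;

# annotate_np is a hypothetical helper. Assumption: columns are separated
# by whitespace and the chunk tag sits in the third column.
sub annotate_np {
    my @lines = @_;
    my (@words, @annotations);
    for my $line (@lines) {
        my @cols = split ' ', $line;
        next unless @cols;                  # skip blank lines
        push @words, $cols[0];              # every word goes into <text>
        # Only B-NP or I-NP in the third column produces an annotation.
        if (defined $cols[2] && $cols[2] =~ /^[BI]-(NP)$/) {
            push @annotations,
                "<annotation><type>$1</type><text>$cols[0]</text></annotation>";
        }
    }
    return '<text>' . join(' ', @words) . "</text>\n"
         . join("\n", @annotations) . "\n";
}

print annotate_np("how o B-NP\n", "are o o\n", "you o I-NP\n");
```

With that sample input, all three words end up in the `<text>` element, while only "how" and "you" get an `<annotation>` element.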
Re: create an xml file from column file
by hdb (Monsignor) on Jul 24, 2013 at 10:55 UTC

    The "NP" in your desired output is taken from which line of your input?

      The full output I want is as shown:

      <text> how are you </text> <annotation><type> NP</type><text>how</text></annotation><annotation><type>NP</type><text>are</text></annotation><annotation><type>NP</type><text>you</text></annotation>

      So the type should contain the 2nd column of the corresponding text in the tab-separated input file.

        Ah! Then this is how I would do it:

        1. Loop over the lines of your input.
        2. Extract all word characters from the beginning of the line for the text field.
        3. Extract all word characters at the end of the line for the type field.
        4. Independently build the initial <text>...</text> and the <annotation>...</annotation> parts by adding them to two strings.
        5. After the loop print the two strings into an xml file.
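The steps above could be sketched roughly as follows. This is only one way to read them, assuming two-column input as in the original question; `build_xml` is a hypothetical name, and the result is printed to standard output rather than to a named file, so redirect it to create the XML file.

```perl
use strict;
use warnings;

# build_xml is a hypothetical helper. Assumption: every line has a word
# followed by a tag such as B-NP or I-NP.
sub build_xml {
    my @lines = @_;
    my ($text, $annotations) = ('', '');
    for my $line (@lines) {                          # step 1
        # Step 2: word characters at the beginning of the line.
        my ($word) = $line =~ /^(\w+)/ or next;
        # Step 3: word characters at the end of the line ("NP" from "B-NP").
        my ($type) = $line =~ /(\w+)\s*$/;
        # Step 4: grow the two strings independently.
        $text .= ($text eq '' ? '' : ' ') . $word;
        $annotations .= "<annotation><type>$type</type>"
                      . "<text>$word</text></annotation>";
    }
    return "<text>$text</text>$annotations";
}

# Step 5: print the combined result.
print build_xml("how\tB-NP\n", "are\tI-NP\n", "you\tI-NP\n"), "\n";
```

Because "-" is not a word character, step 3 picks up only "NP" from "B-NP". On a line with a single column the same pattern would return the word itself, so such lines would need separate handling.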

Node Type: perlquestion [id://1046069]
Approved by Corion