PerlMonks  

extract tag form xml file

by Murugan (Initiate)
on Dec 03, 2018 at 10:46 UTC ( [id://1226654]=perlquestion )

Murugan has asked for the wisdom of the Perl Monks concerning the following question:

The three lines below are in one file, data.xml:

<inline-formula id="ieqn-1"><alternatives><mml:math display="inline"><mml:mi>τ</mml:mi></mml:math></alternatives></inline-formula>

<inline-formula id="ieqn-2"><alternatives><mml:math display="inline"><mml:mi>τ</mml:mi></mml:math></alternatives></inline-formula>

<inline-formula id="ieqn-3"><alternatives><mml:math display="inline"><mml:mi>τ</mml:mi></mml:math></alternatives></inline-formula>

Now i need this lines as separate file like below:

ieqn-1.mml

ieqn-2.mml

ieqn-3.mml

perl script needed.

Replies are listed 'Best First'.
Re: extract tag form xml file
by marto (Cardinal) on Dec 03, 2018 at 10:53 UTC

    "perl script needed."

    Welcome. This isn't a code writing service. What have you tried, and how did it fail? Each time you post, lots of helpful information and pointers are displayed; you should take the time to read them. In addition, see Tutorials->PerlMonks for the Absolute Beginner.

    Update: Please don't downvote the heck out of a new user's first post.

Re: extract tag form xml file
by hippo (Bishop) on Dec 03, 2018 at 11:19 UTC
    perl script needed.

    One liner provided.

    $ perl -F\" -ne 'print "$F[1].mml\n"' foo.xml
    ieqn-1.mml
    ieqn-2.mml
    ieqn-3.mml

    Of course, this is tremendously fragile and does no XML parsing and therefore should never be used in production.

      "Now i need this lines as separate file like below:"

      I read this as wanting the matching tag saved to individual files.

        It could certainly be interpreted that way too. Who doesn't love a gimme with a loose spec?

      I just realized that xml is the mark-up language of the monastery. Am I replicating your result?

      $ cat 1.murugan.xml
      <inline-formula id="ieqn-1"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>

      <inline-formula id="ieqn-2"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>

      <inline-formula id="ieqn-3"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
      $ perl -F\" -ne 'print "$F[1].mml\n"' 1.murugan.xml
      ieqn-1.mml
      .mml
      ieqn-2.mml
      .mml
      ieqn-3.mml
      $

        Indeed you are. The difference in output is due to a difference in input only. You have assumed that every other line of input is blank whereas I assumed those were just attempts by Murugan to separate the lines in markup since for some reason they (unlike you) did not enclose their data in <code> tags.

        I did say it was tremendously fragile.
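
To make the fragility concrete, here is a minimal sketch of what the one-liner does to each line. The helper name `naive_name` and the abbreviated sample lines are made up for illustration; the third sample line, with an extra attribute before the id, is an assumption about what real-world input might look like and does not come from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mimic `perl -F\" -ne 'print "$F[1].mml\n"'`
# for a single line. It splits on double quotes and blindly takes
# the second field, whatever that field happens to be.
sub naive_name {
    my ($line) = @_;
    my @F = split /"/, $line;
    return ( $F[1] // '' ) . '.mml';
}

my @lines = (
    qq{<inline-formula id="ieqn-1">x</inline-formula>\n},           # the expected shape
    qq{\n},                                                         # a blank separator line
    qq{<inline-formula content-type="math" id="ieqn-3">x</inline-formula>\n}, # id is not first
);

print naive_name($_), "\n" for @lines;
# prints:
#   ieqn-1.mml
#   .mml
#   math.mml
```

A blank line yields a bare ".mml", and any attribute placed before the id silently wins, which is why a real parser is the safer choice.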

Re: extract tag form xml file
by NetWallah (Canon) on Dec 04, 2018 at 00:55 UTC
    Here is a one-liner for marto's use case (Windows cmd version):

    perl -F\^" -ne "$a=$F[1] or next;open $x,qq|>> $a.mml|;print $x $_;close $x" test1.xml

    For Linux, remove the "^" and use single quotes instead of double quotes around the code after -ne, so that the shell does not expand the Perl variables.

    This would be easier with the non-core module Path::Tiny and the like.
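
Without reaching for Path::Tiny, the same quote-splitting approach can also be unrolled into a core-only script with error checking. This is only a sketch: the helper name `split_by_id`, the temporary directory, and the abbreviated sample lines are assumptions for illustration, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: append each line of $in to "<id>.mml" inside $dir,
# where <id> is the first double-quoted value on the line.
# Returns the list of file names written, in input order.
sub split_by_id {
    my ( $in, $dir ) = @_;
    my @written;
    open my $fh, '<', $in or die "Cannot read $in: $!";
    while ( my $line = <$fh> ) {
        # Same trick as the one-liner: field 1 after splitting on '"'.
        # Lines with no quoted value (e.g. blank lines) are skipped.
        my $id = ( split /"/, $line )[1] or next;
        my $out = File::Spec->catfile( $dir, "$id.mml" );
        open my $x, '>>', $out or die "Cannot append to $out: $!";
        print {$x} $line;
        close $x or die "Cannot close $out: $!";
        push @written, "$id.mml";
    }
    close $fh;
    return @written;
}

# Demo in a throwaway directory, using abbreviated sample lines.
my $dir = tempdir( CLEANUP => 1 );
my $in  = File::Spec->catfile( $dir, 'test1.xml' );
open my $t, '>', $in or die "Cannot write $in: $!";
print {$t} qq{<inline-formula id="ieqn-1">a</inline-formula>\n\n},
           qq{<inline-formula id="ieqn-2">b</inline-formula>\n};
close $t or die "Cannot close $in: $!";

print "$_\n" for split_by_id( $in, $dir );
```

Like the one-liner, this appends rather than overwrites, so running it twice against the same directory duplicates the content of each output file.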

    "Fragile" warnings apply.

                    Memory fault   --   brain fried

Re: extract tag form xml file
by marto (Cardinal) on Dec 04, 2018 at 10:28 UTC

    Since various other solutions have been posted, and you haven't been exactly clear in your requirements, here I use a parser, Mojo::DOM, and Mojo::File for reading/writing. There's more than one way to do it, other parsers exist (XML::Twig etc)...

    source.xml

    <inline-formula id="ieqn-1"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
    <inline-formula id="ieqn-2"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
    <inline-formula id="ieqn-3"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>

    1226654.pl

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Mojo::DOM;
    use Mojo::File;

    # read the xml file
    my $source = Mojo::File->new( 'source.xml' );
    my $xml    = $source->slurp;

    # use Mojo::DOM to parse the XML
    my $dom = Mojo::DOM->new->xml(1)->parse( $xml );

    # for each inline-formula id in the file
    for my $e ( $dom->find('inline-formula[id]')->each ){
        # create file named value_of_id.mml
        my $file = "$e->{id}.mml";
        my $path = Mojo::File->new( $file );

        # write contents to file
        $path->spurt( $e->to_string );
    }

    output

    -rw-rw-r-- 1 marto marto 131 Dec 4 10:17 ieqn-3.mml
    -rw-rw-r-- 1 marto marto 131 Dec 4 10:17 ieqn-2.mml
    -rw-rw-r-- 1 marto marto 131 Dec 4 10:17 ieqn-1.mml
    marto@Shemp:~/code/perlmonks$ cat ieqn-1.mml
    <inline-formula id="ieqn-1"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
    marto@Shemp:~/code/perlmonks$ cat ieqn-2.mml
    <inline-formula id="ieqn-2"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>

Re: extract tag form xml file
by karlgoethebier (Abbot) on Dec 04, 2018 at 11:09 UTC

    See XML::LibXML::Simple.

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: extract tag form xml file
by Jenda (Abbot) on Mar 20, 2019 at 15:28 UTC

    Just for future reference, here's a solution (proper one), that uses XML::Rules.

    use strict;
    use warnings;
    use XML::Rules;

    my $rules = XML::Rules->new(
        style => 'parser',
        rules => {
            'inline-formula' => sub { # (tag, attrs, context, parents, parser)
                open my $OUT, '>:utf8', $_[1]{id} . ".xml"; # open file named after the contents of the id attribute
                print $OUT $_[4]->toXML($_[0], $_[1]); # serialize the tag with attributes
                close $OUT;
            },
            '_default' => 'raw', # make sure we keep everything else intact
        }
    );

    $rules->parse( *DATA );

    __DATA__
    <data>
    <inline-formula id="ieqn-1"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
    <inline-formula id="ieqn-2"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
    <inline-formula id="ieqn-3"><alternatives><mml:math display="inline"><mml:mi>&#964;</mml:mi></mml:math></alternatives></inline-formula>
    </data>

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.
