PerlMonks  

perl substitute

by ambrill (Novice)
on Nov 03, 2013 at 21:04 UTC ( [id://1061075]=perlquestion )

ambrill has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, thanks for your help. I was closer than I thought. However, I am struggling to keep the <infoTable> tag, which sits on the same line as the <informationTable…> tag that I am essentially deleting. Suggestions on how to retain the <infoTable> text would be appreciated. Also, the <?xml line is a bear to get rid of. Thanks.

#!/usr/local/bin/perl
#
$file = 'test.xml';      # Name the file
open(INFO, $file);       # Open the file
@lines = <INFO>;         # Read it into an array
close(INFO);             # Close the file
open STDOUT, ">$file" or die "cannot open file $!\n" ;    # Open the file
for (@lines) {
    s#<informationTable.*##;
}
print @lines;
close(STDOUT);

#sample xml file
<?xml version="1.0" encoding="UTF-8"?>
<informationTable xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <infoTable>
</infoTable>
</informationTable>
<?xml version="1.0" encoding="UTF-8"?>

Replies are listed 'Best First'.
Re: perl substitute
by hippo (Bishop) on Nov 03, 2013 at 21:27 UTC

    Some thoughts:

    • s/foore/barstring/ works on $_ in isolation, but you're not setting $_, even implicitly.
    • You set $doc but never subsequently refer to it.
    • You use IO::File but never subsequently refer to it.
    • You alternate between slashes and hashes for regexp delimiters for no apparent reason.
    • It looks like your input XML is not valid XML.
    • and finally, you both use an XML parser and then attempt to just regexp your way through the string. Pick one, not both.
      > You alternate between slashes and hashes for regexp delimiters for no apparent reason

      Well, he needs a different regex delimiter in substitutions with </end> tags, to avoid escaping the slash.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        Sure, but that's no reason to alternate - just use hashes throughout. Inconsistency is the bug's friend. :)

        Thank you. This helps.
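
To illustrate the delimiter point in this sub-thread, here is a minimal sketch. The helper names `strip_close_hash` and `strip_close_slash` are made up for this example and are not from the original post. Both subs do the same job; the only difference is whether the slash in the closing tag has to be escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: '#' as the delimiter, so the '/' inside the
# closing tag needs no escaping.
sub strip_close_hash {
    my ($text) = @_;
    $text =~ s#</informationTable>##g;
    return $text;
}

# Same substitution with '/' delimiters: the slash in the tag must be escaped.
sub strip_close_slash {
    my ($text) = @_;
    $text =~ s/<\/informationTable>//g;
    return $text;
}

my $sample = "</infoTable></informationTable>";
print strip_close_hash($sample),  "\n";
print strip_close_slash($sample), "\n";
```

Either form works. Sticking to one delimiter throughout a script is what keeps the patterns easy to compare.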
Re: perl substitute
by LanX (Saint) on Nov 03, 2013 at 21:18 UTC
    Why the hell do you try to combine XML::LibXML with regexes?

    (update: deleted some phrases about random coding)

    To answer your question: your substitutions operate only on $_, not on $doc.

    Cheers Rolf

    ( addicted to the Perl Programming Language)
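
The `$_` versus `$doc` point can be sketched as follows. The sub names and the sample string are assumptions made for illustration only. The pattern `[^>]*>` is also an assumption: it stops at the first `>`, which leaves a following `<infoTable>` on the same line intact, provided no attribute value contains a `>`.

```perl
use strict;
use warnings;

# Hypothetical helper: an unbound s### acts on $_, so $doc is returned unchanged.
sub unbound_substitute {
    my ($doc) = @_;
    local $_ = '';                      # stands in for whatever $_ happens to hold
    s#<informationTable[^>]*>##;        # operates on $_, not on $doc
    return $doc;
}

# Hypothetical helper: the =~ operator binds the substitution to $doc.
sub bound_substitute {
    my ($doc) = @_;
    $doc =~ s#<informationTable[^>]*>##;
    return $doc;
}

my $input = '<informationTable xmlns="http://example.invalid/ns"> <infoTable>';
print unbound_substitute($input), "\n";   # still contains <informationTable ...>
print bound_substitute($input),   "\n";   # only " <infoTable>" is left
```

Inside `for (@lines) { ... }` each element is aliased to `$_`, which is why the unbound form works there and nowhere else.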

Re: perl substitute
by sundialsvc4 (Abbot) on Nov 04, 2013 at 16:48 UTC

    Having started with (very good!) XML::LibXML, now you need to finish with it.   You are, in fact, much closer to your goal than you dream.

    Having parsed the file successfully, now you can use XPath expressions to “query” the file for the nodes that you want to, in this case, rename.   Then, by iterating through each returned list of nodes, you can change the content of the node including its name.   That is, if you want to do this using Perl . . .

    You might also be interested to know that a technology known as XSLT can do this sort of thing without programming ... neither in Perl nor anything else.   (Well, except for the fact that “XSL” is very-much a programming tool unto itself, as any sort of tool meant to solve that particular problem must be.)   If you want a crystalline example of just how powerful this technology is, and how available it is, surf here:   http://excelhero.com/periodic-table/.

    (No, I am not going to spell-out for you all of the Magickal Mysteries that you will uncover here when you diligently use the “View Source” option of your browser . . .   Keep Digging.)

    What you see here is an XHTML document that directs your web browser to transform, on the fly(!), an external XML document (containing “information about all of the elements”) into the HTML representation that you now see displayed.   (A small amount of JavaScript drives the interaction with the finished page, but not the construction of the page from its XML sources.   I repeat... JavaScript did not do this!)

    It appears to me that what you ultimately want to do to this file is to re-name a few of the nodes, wherever they might be found.   XSLT can do that, without Perl or anything else.

    This being said, XML::LibXML gives Perl access to the same underlying functionality that you see your web-browser (or most XSLT processors) using.   (So, your web browser is not making Perl “look bad.”   “The full power of LibXML” is at your/Perl’s fingertips, too.)   In all cases, regardless of tool, you/they approach the problem in fundamentally the same way:   parse the XML document to a set of nodes, transform the nodes, then emit the node-structure as XML.   At no point do you/they attempt to treat the XML document as text.
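
The parse / transform / emit cycle described in this reply might look roughly like the sketch below. It assumes XML::LibXML is installed. The sub name `rename_nodes`, the prefix `t`, the new element name `entry` and the sample content are invented for illustration. The namespace URI is the one from the question's sample file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URI as it appears in the question's sample file.
my $NS = 'http://www.sec.gov/edgar/document/thirteenf/informationtable';

# Hypothetical helper: rename every element called $old (in namespace $NS) to $new.
sub rename_nodes {
    my ($xml, $old, $new) = @_;

    # 1. Parse the text into a node tree.
    my $doc = XML::LibXML->load_xml(string => $xml);

    # 2. Query with XPath. The default namespace needs a registered prefix;
    #    't' is an arbitrary choice.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(t => $NS);

    # 3. Transform the matching nodes.
    for my $node ($xpc->findnodes("//t:$old")) {
        $node->setNodeName($new);
    }

    # 4. Emit the tree as XML again.
    return $doc->toString;
}

my $xml = <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<informationTable xmlns="$NS">
  <infoTable>first</infoTable>
  <infoTable>second</infoTable>
</informationTable>
XML

print rename_nodes($xml, 'infoTable', 'entry');
```

At no point is the document handled as a string of characters, which is the point the reply is making.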

Re: perl substitute..try this
by stylechief (Sexton) on Nov 07, 2013 at 00:17 UTC

    It looks like you want to remove namespaces and don't want to mess with validating the XML against a DTD.

    # suppressing DTD validation and errors due to no DTD validation
    my $parser = XML::LibXML->new(load_ext_dtd => 0, expand_entities => 0, suppress_errors => 1);

    From XML::LibXML::Element:

    setAttributeNS

      $node->setAttributeNS( $nsURI, $aname, $avalue );

    Namespace-aware version of setAttribute, where $nsURI is a namespace URI, $aname is a qualified name, and $avalue is the value. The namespace URI may be null (empty or undefined) in order to create an attribute which has no namespace.

    or...

    removeAttribute

      $node->removeAttribute( $aname );

    The method removes the attribute $aname from the node's attribute list, if the attribute can be found.
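
As a small illustration of `removeAttribute`, here is a sketch that assumes XML::LibXML is installed. The helper name `drop_attribute` and the attributes `id` and `status` are made up for the example. It only covers ordinary attributes. Whether the same call affects `xmlns` declarations is not assumed here and should be checked against the XML::LibXML documentation.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse a fragment, remove one attribute from the
# root element, and return the serialized element.
sub drop_attribute {
    my ($xml, $aname) = @_;
    my $doc  = XML::LibXML->load_xml(string => $xml);
    my $root = $doc->documentElement;
    $root->removeAttribute($aname);    # does nothing if the attribute is absent
    return $root->toString;
}

print drop_attribute('<infoTable id="1" status="draft"/>', 'status'), "\n";
```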

    SC
Re: perl substitute
by pvaldes (Chaplain) on Nov 04, 2013 at 17:53 UTC

    mmmh....

    s#</informationTable>'##g;

    Maybe not relevant, but check the use of the "'" here
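
A quick way to see what the stray apostrophe does, as a sketch with a made-up sample line: the pattern then requires a literal `'` straight after the closing tag, so on an ordinary line nothing matches.

```perl
use strict;
use warnings;

my $line = "</informationTable>\n";

# With the stray apostrophe the pattern needs a literal ' after the tag,
# so there is nothing to replace on this line.
(my $with_quote = $line) =~ s#</informationTable>'##g;

# Without it, the closing tag is removed and only the newline remains.
(my $without_quote = $line) =~ s#</informationTable>##g;

print $with_quote eq $line    ? "with quote: unchanged\n"      : "with quote: changed\n";
print $without_quote eq "\n"  ? "without quote: tag removed\n" : "without quote: tag still present\n";
```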

Front-paged by Arunbear