http://www.perlmonks.org?node_id=181603

bob has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that calls down an xml file from the web. The file has lots of content, but I only want certain stuff. I use a substitute expression to change that pattern of content into html... Like so.... (simplified example)



$page2 =~ s/<title>(.*)<\/title>/<li>A$1C<\/li>/gi;

However, when I go to write out $page2 to a file, the substitution output is there all right, and fine, but the saved data also includes everything else in the original file too; in other words, all the material which didn't match the substitution pattern.

I want to be able to save just the string resulting from the substitution. This sounds like it ought to be a simple thing to me, but I've sought the answer for days without luck.

Thanks in advance, of course....

Replies are listed 'Best First'.
Re: Storing Substitution Output String
by little (Curate) on Jul 14, 2002 at 17:46 UTC

    You'd be better off using $1, since that's the match you were looking for, rather than a string that you unnecessarily altered. So use a match

    $page2 =~ m|<title>(.*)</title>|gi; print $1;
    instead. You might also consider not using dot-star; see Ovid's Death to Dot Star! :-)

    Have a nice day
    All decision is left to your taste

    Update

    So, like this?
    print '<li>'.$1.'</li>';

      Ahhhh.... your update has me thinking... Let me try to fit it like that and see if it'll work. As mentioned above, the sub is really somewhat more complex than drawn in my first post....
        Printing the scalars (and adding the add'l text and tags at that point) allows me to print only the first instance of each match, Little. How would you suggest printing all of the matches made?
      The string is altered on purpose--that's the whole point, in this case, of the expression.
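One way to reach every match rather than only the first is to run the match in a loop: in scalar context, m//g picks up where the previous match ended. The following is a minimal sketch, not the poster's actual script; the sample contents of $page2 and the @items array are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real $page2 would hold the downloaded XML.
my $page2 = "<root>\n<title>First title</title>\n<other>Other stuff</other>\n"
          . "<title>Second title</title>\n</root>\n";

my @items;

# In scalar context, m//g resumes after the previous match on each pass,
# so the loop body runs once per <title> element.
while ( $page2 =~ m{<title>(.*?)</title>}gis ) {
    push @items, "<li>A${1}C</li>";
}

# Only the transformed matches are printed; the rest of $page2 is ignored.
print "$_\n" for @items;
```

Because $page2 itself is never modified, the unmatched material stays out of the output.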
Re: Storing Substitution Output String
by rattusillegitimus (Friar) on Jul 14, 2002 at 19:07 UTC

    Perhaps it's because now that I have a hammer, everything looks like a nail, but that sounds like an ideal use for XSLT. I've had a great deal of luck using XML::LibXML and XML::LibXSLT in conjunction to do things very like what you're describing.

    Your script would be something along the lines of:

    #!/usr/bin/perl -wT
    use strict;
    use XML::LibXSLT;
    use XML::LibXML;

    my $page2 = "<root>
      <title>First title</title>
      <othertag>Other stuff</othertag>
      <title>Second title</title>
      <othertag>More other stuff</othertag>
    </root>";

    my $parser = XML::LibXML->new();
    my $xslt   = XML::LibXSLT->new();

    # assumes you've got the XML doc as a string in $page2
    my $source = $parser->parse_string($page2);

    # assumes your XSL file is convert.xsl
    my $style_doc  = $parser->parse_file('convert.xsl');
    my $stylesheet = $xslt->parse_stylesheet($style_doc);

    my $results = $stylesheet->transform($source);

    print $stylesheet->output_string($results);

    * Code above adapted from the XML::LibXSLT docs

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

      <xsl:template match="/">
        <root>
          <xsl:apply-templates select="//root/title"/>
        </root>
      </xsl:template>

      <xsl:template match="title">
        <li>
          <xsl:text>A</xsl:text>
          <xsl:value-of select="."/>
          <xsl:text>C</xsl:text>
        </li>
      </xsl:template>

    </xsl:stylesheet>

    The above script and XSL file together yield the following output:

    <?xml version="1.0"?>
    <root><li>AFirst titleC</li><li>ASecond titleC</li></root>

    This may well be too heavy handed if your project is relatively small, but if there are more chunks you are trying to capture in similar ways, you might consider such an approach.

    __________
    He seemed like such a nice guy to his neighbors / Kept to himself and never bothered them with favors
    - Jefferson Airplane, "Assassin"

    Update: Whoops, in other words, what trs80 said above ;)

      Thanks to everyone suggesting xslt--you've guessed part of what's happening here, but I'm trying to go at it slightly differently. You might like one of my sites... Daily Rotation
Re: Storing Substitution Output String
by Courage (Parson) on Jul 14, 2002 at 17:45 UTC
    1. use non-greedy regexp: (.*?) instead of (.*)
    2. add the "s" modifier to the end of the regexp so that . also matches newlines

    Courage, the Cowardly Dog
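The effect of both suggestions can be seen in a small comparison. This is only a sketch; the two sample strings and all variable names are hypothetical and not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical inputs: two titles on one line, and one title split across lines.
my $one_line   = '<title>one</title><title>two</title>';
my $multi_line = "<title>split\nacross lines</title>";

# Greedy: (.*) runs to the LAST </title>, swallowing the tags in between.
my ($greedy) = $one_line =~ m{<title>(.*)</title>}i;

# Non-greedy: (.*?) stops at the FIRST </title>.
my ($lazy) = $one_line =~ m{<title>(.*?)</title>}i;

# Without /s, "." does not match "\n", so the split title is missed.
my $without_s = $multi_line =~ m{<title>(.*?)</title>}i  ? 'match' : 'no match';
my $with_s    = $multi_line =~ m{<title>(.*?)</title>}is ? 'match' : 'no match';

print "greedy:    $greedy\n";
print "lazy:      $lazy\n";
print "without s: $without_s\n";
print "with s:    $with_s\n";
```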

Re: Storing Substitution Output String
by mrbbking (Hermit) on Jul 14, 2002 at 17:49 UTC
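    s/// leaves the string it was applied to unchanged if no substitution actually happened. Its return value is the number of substitutions made, which is false when nothing matched.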

    You might try something like this:

    #!/usr/bin/perl -w
    use strict;

    my $page2 = '';
    while( <DATA> ){
        # keep the line only if the substitution actually changed it
        if( s!<title>(.*)</title>!<li>A$1C</li>!gi ){
            $page2 .= $_;
        }
    }
    print $page2;

    __DATA__
    <title>first story</title>
    This story is short.
    <title>second story</title>
    This story is longer.
    But not much longer.
    I used a ! instead of the / as the delimiters, so I could leave the literal slashes alone in the regex.
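The filtering above relies on the return value of s///, which can be checked in isolation. This is a minimal sketch; the sample lines and the names @lines, @kept, $copy and $count are assumptions introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one that matches and one that does not.
my @lines = ( '<title>first story</title>', 'This story is short.' );

my @kept;
for my $line (@lines) {
    my $copy = $line;    # work on a copy so @lines itself is left untouched

    # s/// returns the number of replacements made, or false if none were made.
    my $count = ( $copy =~ s!<title>(.*?)</title>!<li>A${1}C</li>!gi );

    push @kept, $copy if $count;
}

print "$_\n" for @kept;
```

Only lines where at least one replacement happened end up in @kept.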
Re: Storing Substitution Output String
by Abigail-II (Bishop) on Jul 15, 2002 at 10:06 UTC
    Well, if you want a match, you ought to use a match, and not a substitution ;-).
    $page2 = "<li>A$1C</li>" if $page =~ m{<title>(.*)</title>};

    Abigail
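The same idea can be extended to collect every title at once: in list context, m//g returns all captures from all matches. The following is a sketch under assumptions; the sample $page string and the @titles array are hypothetical, and the non-greedy (.*?) with /s is swapped in for the (.*) used above.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $page = '<root><title>First title</title><x>junk</x>'
         . '<title>Second title</title></root>';

# In list context, m//g returns every capture from every match.
my @titles = $page =~ m{<title>(.*?)</title>}gis;

# Build the output from the captures only; nothing else from $page is copied.
my $page2 = join '', map( "<li>A${_}C</li>", @titles );

print "$page2\n";
```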

Re: Storing Substitution Output String
by trs80 (Priest) on Jul 14, 2002 at 18:34 UTC