<?xml version="1.0" encoding="windows-1252"?>
<node id="981643" title="Re: new to perl" created="2012-07-13 11:12:35" updated="2012-07-13 11:12:35">
<type id="11">
note</type>
<author id="131741">
zentara</author>
<data>
<field name="doctext">
I don't know if anyone noticed, but in the sample file given, there was a space between the last word and the bracketed br in the first line, and no space in the second. I don't know if it was a typo or an intentional fine point. Either way, it threw off the split on space into an array, leaving 1 extra element on the line that had the space.
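&lt;p&gt;To make the difference concrete, here is a small sketch. The two sample lines and the sub name words_of are made up for illustration (they are not the data from the original question); the lines differ only in the space before the bracketed br.
&lt;c&gt;
#!/usr/bin/perl
use warnings;
use strict;

# hypothetical helper: split one line on single spaces
sub words_of {
    my ($line) = @_;
    my @words = split / /, $line;
    return @words;
}

# hypothetical sample lines, not the real input file
my @lines = (
    'run the alpha &lt;br&gt;',   # space before the tag
    'run the beta&lt;br&gt;',     # no space before the tag
);

for my $line (@lines) {
    my @words = words_of($line);
    print scalar(@words), " elements, last is '$words[-1]'\n";
}
&lt;/c&gt;
&lt;p&gt;The first line splits into four elements with the tag standing alone at the end, while the second splits into three with the tag still glued to the last word.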
&lt;p&gt;There may be a clever regex to do this, but here is a simple way a beginner can understand.
&lt;c&gt;
#!/usr/bin/perl
use warnings;
use strict;

open (my $fh, '&lt;', 'test.txt') or die "Cannot open test.txt: $!\n";  #input file
open (my $oh, '&gt;', "$0-out.txt") or die "Cannot open $0-out.txt: $!\n"; #output file

my $script = '/home/whoever/bin/myscript.pl';

while (&lt;$fh&gt;){

my $string = $_;

# strip off trailing &lt;br&gt; and anything after it
$string =~ s/&lt;br&gt;.*$//;

#strip whitespace at end in case space preceded the &lt;br&gt;
$string =~  s/\s+$//;

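## split on whitespace (the special ' ' pattern also copes with runs of spaces)
my @words = split ' ', $string;

#print join "\n",@words,"\n";

# skip blank lines, which would leave no last word
next unless @words;

my $lastword = $words[-1];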

print $oh "$script $lastword\n";

}
&lt;/c&gt;
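&lt;p&gt;For comparison, here is a rough sketch of the single-regex route. The sub name last_word and the sample lines are made up for illustration, and it assumes every line of interest contains a lowercase bracketed br; lines without one are simply skipped.
&lt;c&gt;
#!/usr/bin/perl
use warnings;
use strict;

# hypothetical helper: return the word just before the tag, or undef
sub last_word {
    my ($line) = @_;
    # \S+ grabs a run of non-space characters,
    # \s* allows for the optional space before the tag
    return $line =~ /(\S+)\s*&lt;br&gt;/ ? $1 : undef;
}

# hypothetical sample lines, not the real input file
for my $line ('run the alpha &lt;br&gt;', 'run the beta&lt;br&gt;') {
    my $word = last_word($line);
    print "$word\n" if defined $word;
}
&lt;/c&gt;
&lt;p&gt;Whether this is clearer than strip-then-split is a matter of taste; the step-by-step version is easier to debug with the commented-out print.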

</field>
<field name="root_node">
981621</field>
<field name="parent_node">
981621</field>
</data>
</node>
