http://www.perlmonks.org?node_id=930168

tej has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I have data like

<i><ce:bib-reference id="bib1"> <ce:other-ref> <ce:textref>Adolophs R (2002), Trust in the brain. Nat Neurosci 5(3):1 +92-193.</ce:textref></ce:other-ref> </ce:bib-reference> <ce:bib-reference id="bib2"> <ce:other-ref> <ce:textref>American Psychiatric Association. Diagnostic and Statistic +al Manual of Mental Disorders, (4<ce:sup>th</ce:sup> ed., text revisi +on). Washington, DC: Author, 2000.</ce:textref></ce:other-ref> </ce:bib-reference> <ce:bib-reference id="bib3"> <ce:other-ref> <ce:textref>Amici R, Zamboni G, Perez E, Jones CA, Parmeggiani PL (199 +8), The influence of a heavy thermal load on REM sleep in the rat. Br +ain Res 781:252-258.</ce:textref></ce:other-ref> </ce:bib-reference> </i>

What I want to do is to add "<ce:label>Data before first comma</ce:label> after each "<ce:bib-reference id="bib\d">"

I have written a script for this, but finally when i try to replace the data in file script freezes

my code is like:
use warnings; use strict; use Cwd; my $dir=getcwd; print "$dir\n"; opendir DIR, $dir or die "cant open dir"; my @files = grep /mlx/,(readdir DIR); closedir DIR; foreach my $file (@files) { open(INF, $file) or die "cannot open $file: $!\n"; my $fdata=join("", <INF>); close INF; ##<ce:bib-reference id="bib2">... </ce:bib-reference> while ($fdata=~m{(<ce:bib\-reference id\=\"bib[0-9]+\">\n)(.*?)(<\ +/ce:bib\-reference>)}sg){ my $bib=$1; my $label=ExtractAuth($&); $fdata=~s/(\Q$bib\E)/$1$label/; } print "$fdata"; } sub ExtractAuth{ my $bib=shift; my $auth=""; my $label=""; ##<ce:textref>...</ce:textref> while ($bib=~m{<ce:textref>(.*?)<\/ce:textref>}sg){ $auth=$1; if ($auth=~m{\(\d{4}[A-z]?\)}){ my $count= ($auth =~ tr/\,//); $label=AddLabel($auth,$count); #$bib=~s/(<ce:bib\-reference id\=\"bib[0-9]+\">\n)/$1$labe +l/; }else { $label="Year Not Found\n"; #$bib=~s/(<ce:bib\-reference id\=\"bib[0-9]+\">\n)/$1$labe +l/; } } #$bib=~s/(<ce:bib\-reference id\=\"bib[0-9]+\">\n)/$1$label/g; return ($label); } sub AddLabel{ my $auth=shift; my $count=shift; my $fauth=""; my $year; my $labelst=""; if ($count<=1){ if ($auth=~m{(.*?)(\(\d{4}[A-z]?\))}){ $fauth=$1; $year=$2; $fauth=~s/\s$//; $labelst="<ce:label>$fauth\, $year<\/ce:label>\n"; } } return ($labelst); }

Whats going wrong in this code?

Replies are listed 'Best First'.
Re: Problem in String Replacement
by moritz (Cardinal) on Oct 07, 2011 at 12:26 UTC

    With the example input data you gave us, nothing much happens, neither do <ce:label> tags get added, nor does the code loop forever. So we can only guess what happens with your real data.

    My guess is that it is related to the fact that you conduct two matches/substitutions on the same string; the substitution might reset pos, so that the match starts from the beginning again.

    Update:

    Here's a short demonstration of what the problem might be:

    use strict; use warnings; my $str = 'aa'; while ($str =~ /a/g) { $str =~ s/a/ba/g; print $str, "\n"; last if length($str)> 10; }

    Without the last if..., this would loop infinitely. A possible fix is to look only for a's that don't come before a b:

    while ($str =~ /(?<!b)a/g) { ...
Re: Problem in String Replacement
by DanielSpaniel (Scribe) on Oct 07, 2011 at 12:13 UTC

    Without reading through your code (because I'm too tired right now), what happens when you try to step through the code from the command line? i.e. where does it fail.

    For instance, you could step through it with debug, such as:

    $ perl -d yourscript.pl

    ... and entering "n" to step through each line (and "s" to go into any sub-routines) ... and then you should see where it is having a problem.

      Thanks, I'l try this..

      where does it fail. : Script goes in infinite loop..

      Thanks Mortiz.. This really resolved my issue..
Re: Problem in String Replacement
by choroba (Cardinal) on Oct 07, 2011 at 13:15 UTC
    I usually use XML::XSH2 for XML manipulation. In this case, something like this might work for you:
    open 930168.xml ; register-namespace ce 'http://put-your-namespace-URI-here' ; for /ce:bib/ce:bib-reference/ce:other-ref/ce:textref { my $label = substring-before(.,'),') ; if ($label and xsh:matches($label,'\d{4}$')) { insert element ce:label prepend ../.. ; insert text concat($label, ')') into ../../ce:label[1] ; } } save :b ;
    I wrapped the XML chunk into ce:bib, your root element might have a different name. Also do not forget to change the namespace URI.