Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: appending a unique marker to each url in a file

by thatguy (Parson)
on Aug 08, 2001 at 07:54 UTC ( #102967=note: print w/ replies, xml ) Need Help??


in reply to appending a unique marker to each url in a file

I think using a regex on the entire file may get a little complicated.

I would use HTML::TokeParser to pull the links out of your file and then modify them from there, like so

#!/usr/bin/perl -w use HTML::TokeParser; use strict; my $i=0; ## set marker definintions my @markers = qw/ anfnf11 iopi1p83288 9032-jjjf /; my $htmlfile = "index.html"; my $content; ## get contents of your html file open (FILE," $htmlfile") || die "Cannot open HTML file for parsing!: $ +!\n"; while(<FILE>) { $content .= $_; } close(FILE); my $parse = HTML::TokeParser->new(\$content); while (my $token = $parse->get_tag("a")) { my $url = $token->[1]{href} || "-"; ## put link into $url my $text = $parse->get_trimmed_text("/a"); ## put link de +sc into $text if ($markers[$i]) { print "<a href=$url/$markers[$i]>$text</a>\n"; } else { ## no more markers... } $i++; } exit;

Update: fixed the way data was put into $content courtesy of Hofmator.

-p


Comment on Re: appending a unique marker to each url in a file
Download Code
Replies are listed 'Best First'.
Re: Re: appending a unique marker to each url in a file
by Hofmator (Curate) on Aug 08, 2001 at 14:14 UTC

    open (FILE," $htmlfile") || die "Cannot open HTML file for parsing!: $ +!\n"; while(<FILE>) { $content="$content$_\n"; } close(FILE);
    This construct is not ideal. You are interpolating the variable $content into a new string for every line. You should use concatenation and just append to the string:
    while(<FILE>) { $content .= "$_\n"; }
    But what you are doing now is slurping in the whole file and adding an extra newline at the end of each line (for which I see absolutely no reason). The same thing can be achieved by undefing $/ like this:
    { local $/; # undefs $/ for this block of code only open (FILE," $htmlfile") || die "Cannot open HTML file for parsing!: + $!\n"; $content = <FILE>; # reads in whole file $content =~ s/\n/\n\n/g; # if really necessary to duplicate newlines close(FILE); }

    -- Hofmator

      How about just $content = join'', <FILE>; No need to mess with $/ and have to remember to localise it. Wo betide he who forgets to localise $, $" $/ $\

      Update

      As chipmunk points out this is slower than undef $/ for the gory details see Re: Re: Re: Re: Re: appending a unique marker to each url in a file. For big files the difference is significant, for small ones it is negligible but who wants to paint themselves into a scaling corner? It is better to undef $/, just remember to localise it.

      Ugh posted bad code again.

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Using $content = join '', <FILE>; rather than local $/; $content = <FILE> is fine, as long as you don't care that the latter is roughly an order of magnitude faster.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://102967]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (15)
As of 2015-07-31 21:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (282 votes), past polls