
Re: appending a unique marker to each url in a file

by thatguy (Parson)
on Aug 08, 2001 at 07:54 UTC (#102967)

in reply to appending a unique marker to each url in a file

I think using a regex on the entire file may get a little complicated.

I would use HTML::TokeParser to pull the links out of your file and then modify them from there, like so:

#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

my $i = 0;

## set marker definitions
my @markers = qw/ anfnf11 iopi1p83288 9032-jjjf /;

my $htmlfile = "index.html";
my $content;

## get contents of your html file
open(FILE, "<", $htmlfile) || die "Cannot open HTML file for parsing!: $!\n";
while (<FILE>) {
    $content .= $_;
}
close(FILE);

my $parse = HTML::TokeParser->new(\$content);

while (my $token = $parse->get_tag("a")) {
    my $url  = $token->[1]{href} || "-";          ## put link into $url
    my $text = $parse->get_trimmed_text("/a");    ## put link desc into $text
    if ($markers[$i]) {
        ## quote the attribute so URLs with odd characters stay valid HTML
        print "<a href=\"$url/$markers[$i]\">$text</a>\n";
    }
    else {
        ## no more markers...
    }
    $i++;
}

exit;

Update: fixed the way data was put into $content courtesy of Hofmator.


Replies are listed 'Best First'.
Re: Re: appending a unique marker to each url in a file
by Hofmator (Curate) on Aug 08, 2001 at 14:14 UTC

    open (FILE," $htmlfile") || die "Cannot open HTML file for parsing!: $!\n";
    while(<FILE>) {
        $content="$content$_\n";
    }
    close(FILE);
    This construct is not ideal. You are interpolating the variable $content into a new string for every line. You should use concatenation and just append to the string:
    while(<FILE>) { $content .= "$_\n"; }
    But what you are doing now is slurping in the whole file and adding an extra newline at the end of each line (for which I see absolutely no reason). The same slurp can be achieved by undefining $/ like this:
    {
        local $/; # undefs $/ for this block of code only
        open (FILE," $htmlfile") || die "Cannot open HTML file for parsing!: $!\n";
        $content = <FILE>; # reads in whole file
        $content =~ s/\n/\n\n/g; # if really necessary to duplicate newlines
        close(FILE);
    }

    -- Hofmator
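The scoping that makes local $/ safe can be checked in isolation. What follows is a minimal sketch, not code from the thread: the helper name slurp_file and the temporary sample file are assumptions made so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only so the sketch is self-contained.
my ($tmp, $htmlfile) = tempfile(UNLINK => 1);
print {$tmp} "<a href=\"one.html\">one</a>\n<a href=\"two.html\">two</a>\n";
close($tmp) or die "Cannot close $htmlfile: $!\n";

# slurp_file is a hypothetical helper name, not part of the original post.
sub slurp_file {
    my ($path) = @_;
    local $/;                 # $/ is undef only until this sub returns
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my $content = <$fh>;      # with $/ undefined, one read returns the whole file
    close($fh);
    return $content;
}

my $content = slurp_file($htmlfile);
my $lines   = () = $content =~ /\n/g;    # count the newlines that were read
printf "read %d bytes, %d lines\n", length($content), $lines;

# Back outside the sub, $/ has its default value again.
print "separator restored\n" if defined $/ && $/ eq "\n";
```

Wrapping the slurp in a sub (or a bare block, as above) keeps the change to $/ from leaking into any other code that reads line by line.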

      How about just $content = join '', <FILE>; There is no need to mess with $/ and then have to remember to localise it. Woe betide he who forgets to localise $, $" $/ or $\


      As chipmunk points out, this is slower than undefining $/; for the gory details see Re: Re: Re: Re: Re: appending a unique marker to each url in a file. For big files the difference is significant. For small ones it is negligible, but who wants to paint themselves into a scaling corner? It is better to undef $/; just remember to localise it.

      Ugh, posted bad code again.
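To make the "remember to localise it" warning concrete, here is a small sketch of what goes wrong when $/ is undefined globally. It is an illustration only: count_reads is a hypothetical helper and the data is an in-memory string, neither of which appears in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory "file" so the sketch needs nothing on disk.
my $data = "line 1\nline 2\nline 3\n";

# count_reads is a hypothetical helper: it reports how many reads it
# takes to exhaust the data under whatever $/ is currently in effect.
sub count_reads {
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
    my $reads = 0;
    while (defined(my $chunk = <$fh>)) {
        $reads++;
    }
    close($fh);
    return $reads;
}

my $before = count_reads();          # default $/ is "\n", so 3 reads

my $inside;
{
    local $/;                        # undefined for this block only
    $inside = count_reads();         # 1 read: the whole string at once
}
my $after_local = count_reads();     # 3 again: the block restored $/

undef $/;                            # NOT localised: affects all later reads
my $after_global = count_reads();    # 1: unrelated line-by-line code now slurps
$/ = "\n";                           # manual repair, easy to forget

print "before=$before inside=$inside ",
      "after_local=$after_local after_global=$after_global\n";
```

The last count is the trap: any loop elsewhere in the program that expects one line per read silently gets the entire file in a single iteration.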




        Using $content = join '', <FILE>; rather than local $/; $content = <FILE> is fine, as long as you don't care that the latter is roughly an order of magnitude faster.
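The speed claim is easy to measure locally with the core Benchmark module. The sketch below is an assumption-laden test harness, not the benchmark from the linked node: the file size, the line contents, and the sub names slurp_join and slurp_undef are all made up here, and the ratio you see will depend on the perl version and the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical test file: 20,000 short lines.
my ($tmp, $file) = tempfile(UNLINK => 1);
print {$tmp} "line $_ of some sample text\n" for 1 .. 20_000;
close($tmp) or die "Cannot close $file: $!\n";

# Read every line into a list, then join the list into one string.
sub slurp_join {
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $content = join '', <$fh>;
    close($fh);
    return $content;
}

# Undefine $/ locally so a single read returns the whole file.
sub slurp_undef {
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    local $/;
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Both approaches should yield exactly the same string.
my $same = slurp_join() eq slurp_undef();
print "identical content: ", ($same ? "yes" : "no"), "\n";

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'join'     => \&slurp_join,
    'local $/' => \&slurp_undef,
});
```

The join version has to split the input into a list of lines only to glue them back together, which is the extra work the slurp avoids.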
