Re-write all internal links on a web page.

by ehdonhon (Curate)
on Jun 26, 2003 at 18:53 UTC (#269365=snippet)

The problem is to maintain persistent session information without using cookies. The solution is to encode the session as a GET parameter on every link that points to another internal page.

The second problem is that most of the web pages are maintained by somebody who just barely understands HTML. We don't want to have to teach them about sessions and stuff.

The solution is to take the html content and re-write all of the links prior to displaying the page. I'm using HTML::TreeBuilder to solve this. There are probably other ways.

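use strict;
use warnings;
use HTML::TreeBuilder;

# Hosts we consider "internal"; only links to these get a session.
my $owned_sites = qr/mysite\.(com|net|org)/i;

sub add_sessions {
    my $root    = HTML::TreeBuilder->new_from_content( shift() );
    my $session = shift;

    foreach my $link ( $root->look_down( '_tag', 'a' ) ) {
        next unless my $url = $link->attr('href');

        if ( $url =~ m|://([^/]*)/| ) {
            # Absolute URL: skip it unless the host is one of ours.
            next if ( $1 !~ $owned_sites );
        }
        else {
            # Look for mailto: links (or any other scheme without a host).
            next if ( $url =~ m|^[^/]*:| );
        }

        my ( $path, $params ) = split /\?/, $url, 2;
        $params = '' unless defined $params;    # link had no query string

        # Parse existing parameters; a bare "flag" gets an empty value.
        my %params = map {
            my ( $k, $v ) = split( /=/, $_, 2 );
            ( $k => defined $v ? $v : '' )
        } split( /&/, $params );
        $params{session} ||= $session;

        $url = join( '?', $path,
            join( '&', map { "$_=$params{$_}" } keys(%params) ) );
        $link->attr( 'href', $url );
    }

    my $html = $root->as_HTML;
    $root->delete;    # free the parse tree

    return $html;
}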

Now, I just know somebody is going to tell me that I should be using URI::URL and that my session info is not going to be escaped, etc. But let's just consider that an exercise for another day. The point here is mainly to provide an example where HTML::TreeBuilder saves the day.
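
For anyone who wants a head start on that exercise, here is a rough sketch of the per-URL step with the session value escaped. It uses core Perl only and is not a replacement for a real URI module. The helper names escape_param and add_session_to_url are made up for this illustration, and it assumes the session value is a plain byte string. Unlike the snippet above, it keeps existing parameters in their original order, leaves any #fragment at the end of the URL, and does not touch a URL that already has a session parameter (even an empty one).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original snippet): percent-encode
# every character except the RFC 3986 "unreserved" set.
# Assumes $value is a byte string (no characters above 0xFF).
sub escape_param {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf( '%%%02X', ord($1) )/ge;
    return $value;
}

# Hypothetical helper: append session=... to a single URL.
sub add_session_to_url {
    my ( $url, $session ) = @_;

    # Split off the fragment first so it does not end up inside the query.
    my ( $rest, $fragment ) = split /#/, $url, 2;
    $rest = '' unless defined $rest;

    my ( $path, $query ) = split /\?/, $rest, 2;
    $path = '' unless defined $path;    # e.g. a bare "#top" link

    my @pairs = defined $query ? split( /&/, $query ) : ();

    # Leave the parameters alone if a session parameter is already there.
    unless ( grep { /^session(?:=|$)/ } @pairs ) {
        push @pairs, 'session=' . escape_param($session);
    }

    my $new = $path . '?' . join( '&', @pairs );
    $new .= "#$fragment" if defined $fragment;
    return $new;
}

print add_session_to_url( 'page.html?a=1#top', 'abc 123&x' ), "\n";
# prints page.html?a=1&session=abc%20123%26x#top
```

Inside add_sessions, something like this could stand in for the split/map/join lines, with the host and mailto: checks left as they are.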
