Xanga Archive

by MistaMuShu (Beadle)
on Aug 09, 2004 at 20:53 UTC ( [id://381358]=CUFP )

My friends and I all like to keep online journals. I would have liked to try something other than Xanga, but all my friends recommended it. The problem is that the "archive" feature is a paid service, so whenever I felt like going back to read some old entries, I had to keep clicking the "Next 5" link. I'm still relatively new to Perl (about 3 weeks or so into it), but I came up with something I find genuinely useful! Thanks Monks! Hopefully someone else can use this...

#!/usr/bin/perl
# Usage: archive.pl USERNAME
#
# Description: Saves all entries of USERNAME's xanga to "archive.html"
# in the working directory

use strict;
use warnings;
use LWP::UserAgent;

my $base = 'http://www.xanga.com/';

my $uid = shift;
unless (defined $uid) {
    print "What is your username? ";
    $uid = <STDIN>;
    chomp $uid;    # chomp strips only the trailing newline
}

my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");

my $next_page = $base . 'home.aspx?user=' . $uid;
print "Connecting to $uid's Xanga...\n";

# save() returns the url to Next 5, or $base alone when there is none
while ($next_page ne $base) {
    grab($next_page);
    $next_page = save();
    print "\$next_page is $next_page\n";
}

print "\n\n\nCompleted Archiving\n\n\n";

# Usage: grab(url)
#
# Description: sub grab fetches the given url and writes the html
# to "tmp.html"
sub grab {
    my $url = shift;
    print "grabbing $url\n";

    # Be nice to Xanga servers ;-)
    sleep 5;

    # A plain GET needs no content type or request body
    my $res = $ua->get($url);

    # Check the outcome of the response
    die $res->status_line, "\n" unless $res->is_success;

    open my $tmp, '>', 'tmp.html' or die "Can't write tmp.html: $!";
    print $tmp $res->content;
    close $tmp;
    print "Successfully grabbed html...\n";
}

# Usage: save()
#
# Description: sub save parses "tmp.html" and appends all found entries
# of that page to "archive.html". It also finds the url of the next
# page to grab
sub save {
    open my $in,  '<',  'tmp.html'     or die "Can't read tmp.html: $!";
    open my $out, '>>', 'archive.html' or die "Can't append to archive.html: $!";
    print "Saving...\n";

    # Skip ahead to the first entry on the page
    my $line;
    while ($line = <$in>) {
        last if $line =~ /<div class="blogheader">/;
    }

    # Copy everything up to and including the "Next 5" line
    while (defined $line) {
        print $out $line;
        last if $line =~ /Next 5 &gt;&gt;/;
        $line = <$in>;
    }
    close $in;
    close $out;
    print "Saved\n";

    # The last quoted string on that line is the link: home.aspx?user=....
    my $href = '';
    $href = $1 if defined $line && $line =~ /.*"(.*?)"/;
    return $base . $href;
}
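The step in save() that pulls the "Next 5" url out of the last quoted string on a line can be tried on its own, without touching the network. What follows is only a minimal sketch: next_url is a hypothetical helper name, and the sample line (including its page=2 parameter) is made up for illustration rather than copied from real Xanga markup.

```perl
use strict;
use warnings;

my $base = 'http://www.xanga.com/';

# Hypothetical helper: return the url held in the last double-quoted
# string of $line, or $base alone when the line has no quoted string.
sub next_url {
    my ($line) = @_;
    return $base unless defined $line;
    # The greedy .* pushes the match to the last "..." pair on the line
    return $line =~ /.*"(.*?)"/ ? $base . $1 : $base;
}

# Made-up sample line, not real Xanga markup
my $sample = qq{<a class="nav" href="home.aspx?user=someone&page=2">Next 5 &gt;&gt;</a>\n};

print next_url($sample), "\n";
print next_url("no link here\n"), "\n";
```

Getting back the bare base url is what tells the main loop there are no more pages to fetch.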

I know it's a bit crude, but it works! ;-) For now I'm too lazy to clean it up properly, but suggestions would be great! When I feel like it, I think I'll add incremental archiving (instead of going through the entire Xanga), a GUI, saving images and comments to the hard drive, etc...
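One possible building block for the incremental archiving idea, as a rough sketch only: remember the first line saved by the previous run, and on the next run keep just the lines that appear before it. The helper name lines_before_marker and the sample data are hypothetical, and this assumes entries are listed newest first and that the remembered line still appears verbatim on the page.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a freshly fetched page that
# come before $marker, the first line saved by the previous run.
# If $marker is undef or never found, every line counts as new.
sub lines_before_marker {
    my ($lines, $marker) = @_;
    my @new;
    for my $line (@$lines) {
        last if defined $marker && $line eq $marker;
        push @new, $line;
    }
    return @new;
}

# Made-up page content, newest entry first
my @page = ("entry 3\n", "entry 2\n", "entry 1\n");
my @new  = lines_before_marker(\@page, "entry 2\n");
print @new;
```

The new lines would still have to be placed ahead of the existing archive rather than appended, which this sketch does not attempt.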

janitored by ybiC: Balanced <readmore> tags around longish codeblock, to reduce scrolling

Replies are listed 'Best First'.
Re: Xanga Archive
by gaal (Parson) on Aug 10, 2004 at 05:41 UTC
    LJ give great service, are open source, are Perl, are quite un-evil, and allow proper programmatic journal archiving. Try them too, perhaps?
      But all the cute asian chicks are at xanga!
        All of them? Sounds crowded.
Re: Xanga Archive
by Anonymous Monk on Aug 10, 2004 at 21:54 UTC
    Oops, I was doing some blog software research and I realized that there's a much better version of what I did out there already. Credit definitely goes to Mark Wang, and it looks way prettier too! I have yet to try it...
