My friends and I all like to keep online journals. I would've liked to try something other than Xanga, but all my friends recommended it. The problem was that the "archive" feature is a paid service, so whenever I felt like going back to read some old entries, I had to keep clicking the "Next 5" link. I'm still relatively new to Perl (about 3 weeks or so into it), but I came up with something I find genuinely useful! Thanks Monks! Hopefully someone else can use this...

    # Usage: archive.pl USERNAME
    #
    # Description: Saves all entries of USERNAME's Xanga to "archive.html"
    # in the working directory
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $base = 'http://www.xanga.com/';

    my $uid = shift;
    unless (defined $uid) {
        print "What is your username? ";
        $uid = <STDIN>;
        chomp $uid;    # chomp strips only the trailing newline
    }

    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");

    my $next_page = $base . 'home.aspx?user=' . $uid;
    print "Connecting to $uid's Xanga...\n";

    # save() returns just $base once a page has no "Next 5" link
    while ($next_page ne $base) {
        grab($next_page);
        $next_page = save();    # save() returns the url to Next 5
        print "\$next_page is $next_page\n";
    }
    print "\n\n\nCompleted Archiving\n\n\n";

    # Usage: grab(url)
    #
    # Description: sub grab fetches url and writes the html to "tmp.html"
    sub grab {
        my ($url) = @_;
        print "grabbing $url\n";

        # Be nice to Xanga servers ;-)
        sleep 5;

        # A plain GET needs no content type or request body
        my $res = $ua->get($url);

        # Check the outcome of the response
        die "Could not grab $url: ", $res->status_line, "\n"
            unless $res->is_success;

        open my $tmp, '>', 'tmp.html' or die "Cannot write tmp.html: $!";
        print $tmp $res->content;
        close $tmp;
        print "Successfully grabbed html...\n";
    }

    # Usage: save()
    #
    # Description: sub save parses "tmp.html" and appends all found entries
    # of that page to "archive.html". It also finds the url of the next
    # page to grab
    sub save {
        open my $in,  '<',  'tmp.html'     or die "Cannot read tmp.html: $!";
        open my $out, '>>', 'archive.html' or die "Cannot append to archive.html: $!";
        print "Saving...\n";

        # Skip everything before the first entry
        my $line;
        while ($line = <$in>) {
            last if $line =~ /<div class="blogheader">/;
        }
        print $out $line if defined $line;

        # Copy entries up to and including the "Next 5" line
        my $next = '';
        while ($line = <$in>) {
            print $out $line;
            next unless $line =~ /Next 5 &gt;&gt;/;
            # Reversing the line makes its last quoted string (the link
            # target) the first one the regex finds
            my $rev = reverse $line;
            $next = reverse $1 if $rev =~ /"(.*?)"/;
            last;
        }
        print "Saved\n";
        close $in;
        close $out;

        return $base . $next;    # home.aspx?user=....
    }

I know it's a bit crude, but it works! ;-) For now I'm too lazy to clean it up properly, but suggestions would be great! When I feel like it, I think I'd add incremental archiving (instead of going through the entire Xanga), a GUI, saving images and comments to the hard drive, etc...
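
One possible cleanup, offered as a sketch rather than a tested change: the double reverse in save() is there to pick out the last quoted string on the "Next 5" line, and a greedy match should get the same string without reversing anything. The helper name last_quoted and the sample line are made up for illustration; the real Xanga markup may look different.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the last
# double-quoted string on a line, or undef if the line has none.
sub last_quoted {
    my ($line) = @_;
    # The greedy .* swallows as much of the line as it can, so the
    # capture ends up on the final "..." pair.
    return $line =~ /.*"([^"]*)"/ ? $1 : undef;
}

# Made-up sample line, only for illustration.
my $sample = '<a class="nav" href="home.aspx?user=someone">Next 5 &gt;&gt;</a>';
my $href   = last_quoted($sample);
print defined $href ? "http://www.xanga.com/$href\n" : "no next link\n";
```

Returning undef when nothing matches lets the caller decide what "no next page" means, instead of relying on an empty capture to end the loop.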

janitored by ybiC: Balanced <readmore> tags around longish codeblock, to reduce scrolling


In reply to Xanga Archive by MistaMuShu
