PerlMonks  

Re: parsing CSV

by GrandFather (Saint)
on Oct 09, 2016 at 10:04 UTC ( [id://1173577] )


in reply to parsing CSV

This doesn't seem to be getting very far very fast. The following puts all the pieces together, albeit using the data from Re^2: parsing CSV rather than the real data, so the parsing and clean up will no doubt need to be different for the real data. It just pulls the first two pre tags out of one page rather than fetching two pages and doing whatever is needed to pull out the interesting content.

    use strict;
    use warnings;
    use MIME::Lite;
    use LWP::Simple;
    use Text::CSV;
    use HTML::TreeBuilder;

    # Fetch the "pages"
    my $content = get("http://perlmonks.org/?node_id=1173447");
    die "Couldn't get it!" unless defined $content;

    # Parse pages and clean up content
    my $root = HTML::TreeBuilder->new_from_content($content);
    my ($page1, $page2) = map {$_->as_text()} $root->find_by_tag_name('pre');

    s/\[download\]//g for $page1, $page2;
    s/\n\+//g for $page1, $page2;

    # Process page 1
    my $csv = Text::CSV->new();
    my %idData;

    open my $pg1In, '<', \$page1;
    while (my $row = $csv->getline($pg1In)) {
        s/^\s+|\s+$//g for @$row;
        $idData{$row->[1]}{size} = $row->[0];
        $idData{$row->[1]}{name} = '-- missing --';
    }
    close $pg1In;

    # Process page 2
    $page2 =~ s/\b(?=\w+,)/\n/g;    # Insert newlines in front of id codes
    open my $pg2In, '<', \$page2;
    while (my $row = $csv->getline($pg2In)) {
        next if !$row->[0];    # Skip blank lines
        s/^\s+|\s+$//g for @$row;
        $idData{$row->[0]}{name} = $row->[1];
        $idData{$row->[0]}{size} //= '-- missing --';
    }
    close $pg2In;

    # Generate output string
    my $output;

    for my $id (sort keys %idData) {
        $output .= "$id: $idData{$id}{name} size $idData{$id}{size}\n";
    }

    # Build the email
    my $msg = MIME::Lite->new(
        From    => 'me@myhost.com',
        To      => 'you@yourhost.com',
        Cc      => 'some@other.com, some@more.com',
        Subject => "Here's the data you wanted",
        Data    => $output
    );

    # and "send" it (just '$msg->send()' in the next line to really send it)
    print $msg->as_string();

Prints:

    Content-Disposition: inline
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain
    MIME-Version: 1.0
    X-Mailer: MIME::Lite 3.030 (F2.85; T2.13; A2.16; B3.15; Q3.13)
    Date: Sun, 9 Oct 2016 22:55:39 +1300
    From: me@myhost.com
    To: you@yourhost.com
    Cc: some@other.com, some@more.com
    Subject: Here's the data you wanted

    c100: Joe Shmo size 512.45
    c200: Jack Black size 6734
    c300: Cinderella size 5653.2
    c400: Barack Obama size -- missing --
    c500: Cruella Deville size -- missing --

I suggest you leave the print line in until the body of the email looks right, and only then change it to the send line.
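If you want to check the merge logic on its own before involving the fetch and the email, something along these lines may help. It is only a sketch: the sample data and the merge_by_id name are made up for illustration, and it splits on commas with plain split instead of Text::CSV so that it runs with core Perl only, which means quoted fields containing commas are not handled. It reads each string through an in-memory filehandle, as the script above does, and fills in '-- missing --' for ids that appear in only one of the two inputs.

```perl
use strict;
use warnings;

# Merge "size, id" lines and "id, name" lines into one report keyed by id.
# Hypothetical helper for illustration; plain split stands in for Text::CSV.
sub merge_by_id {
    my ($sizeText, $nameText) = @_;
    my %idData;

    # First input: size, id
    open my $sizeIn, '<', \$sizeText or die "Can't open in-memory file: $!";
    while (my $line = <$sizeIn>) {
        $line =~ s/^\s+|\s+$//g;    # Trim, which also removes the newline
        next if $line eq '';        # Skip blank lines
        my ($size, $id) = split /\s*,\s*/, $line, 2;
        $idData{$id}{size} = $size;
        $idData{$id}{name} = '-- missing --';
    }
    close $sizeIn;

    # Second input: id, name
    open my $nameIn, '<', \$nameText or die "Can't open in-memory file: $!";
    while (my $line = <$nameIn>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        my ($id, $name) = split /\s*,\s*/, $line, 2;
        $idData{$id}{name} = $name;
        $idData{$id}{size} //= '-- missing --';    # Only set if no size was seen
    }
    close $nameIn;

    return join '',
        map {"$_: $idData{$_}{name} size $idData{$_}{size}\n"} sort keys %idData;
}

# Made-up sample data, not the data from the thread
my $sizes = "10.5, a1\n20, a2\n";
my $names = "a2, Alice\na3, Bob\n";

print merge_by_id($sizes, $names);
```

With that sample data, a1 has a size but no name, a3 has a name but no size, and a2 has both, so all three cases show up in the report.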

Premature optimization is the root of all job security
