Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

bobf's scratchpad

by bobf (Monsignor)
on Aug 12, 2004 at 05:12 UTC ( [id://382174]=scratchpad: print w/replies, xml ) Need Help??

For Lady_Aleena: Working example based on my understanding of what you were trying to do. Adding tied hashes, opening files, etc is left to you. Movies can be done the same was as for tv. The sub could be tightened up but I don't see a need for it. Hope this helps.

use strict; use warnings; use Data::Dumper; my @television_headings = qw{ title alt_title start_year end_year wikipedia allmovie imdb tvcom genre }; my @csv_television; # creating dummy data for the example foreach my $n ( 1 .. 3 ) { push( @csv_television, join( '|', map { $_ . $n } @television_head +ings ) ); } my $tv_hash = {}; load_h( $tv_hash, \@television_headings, \@csv_television ); print Dumper $tv_hash; sub load_h { my ( $h, $headings, $csv_data ) = @_; foreach my $line ( @$csv_data ) { my @vals = split( /\Q|\E/, $line ); # might want a real CSV pa +rser instead my %data; @data{ @$headings } = @vals; # hash slice $h->{ $vals[0] } = \%data; } }

The latest example I found of Really Ugly Code:

open(SHELL,">shell.sh"); print SHELL "cd $dir","\n"; print SHELL "cp $file $configParams{BASEDIR}"; system ("bash shell.sh") ; close(SHELL);


Quick links:

PerlMonks::Mechanized - revision ideas for davido
PerlMonks::Mechanized - comments to davido
PerlMonks::Mechanized test script
PerlMonks::Mechanized test script for retrieving all private msgs - for atcroft
PerlMonks::Mechanized::private_message
PerlMonks::Mechanized::Data - proposed
Sudoku solver


Revision of PerlMonks::Mechanized

Just a bunch of ideas rolling around my head - please let me know what you think. Thanks!

# my original idea was to have a getter/setter for each "extended info +" item: sub pass_ticker_id{ my( $self, $value ) = @_; if( defined $value ){ $self->{pass_ticker_id} = $value ? 1 : 0; } return $self->{pass_ticker_id}; } # we could lump them together: sub pass_extended_info{ my( $self, $value ) = @_; if( defined $value ){ $self->{pass_ticker_id} = $value ? 1 : 0; $self->{kitchen_sink} = $value ? 1 : 0; } return $value; } # the ticker methods could then be something like: sub threaded_ids { my( $self, $base ) = @_; $self->{agent}->get( $self->{site} . "?node_id=180684;id=$base" ); $self->{agent}->success() or die "Unable to fetch thread ticker for id = $base.\n"; my $struct = XMLin( $self->{agent}->content(), ForceArray => 1, KeepRoot => 1 ); my $ref = $self->pass_extended_info ? { 'ticker_id' => 180684, 'data' => $struct } : $struct; return $ref; } # if you wanted to lump the extended info into a single hash: my $ref = $self->pass_extended_info ? { 'info' => { 'ticker_id' => 180684, 'kitchen_sink' => $self->{kitchen_sink} }, 'data' => $struct } : $struct; # if the extended_info part was sub-ified: sub threaded_ids { my( $self, $base ) = @_; $self->{agent}->get( $self->{site} . "?node_id=180684;id=$base" ); $self->{agent}->success() or die "Unable to fetch thread ticker for id = $base.\n"; my $struct = XMLin( $self->{agent}->content(), ForceArray => 1, KeepRoot => 1 ); return $self->_format_struct( '180684', $struct ); } sub _format_struct{ my( $self, $tickerid, $struct ) = @_; my $ref = $self->pass_extended_info ? { 'ticker_id' => $tickerid, 'data' => $struct } : $struct; return $ref; } # or, to really make a mess of things: sub threaded_ids { my( $self, $base ) = @_; my $url = $self->{site} . "?node_id=180684;id=$base"; $self->_get_ticker( $url ) or return; return $self->_parse_struct( 180684 ); } sub _get_ticker{ my( $self, $url ) = @_; $self->{agent}->get( $url ); $self->{agent}->success() or do{ warn "Unable to fetch ticker for $url\n"; return; }; } sub _parse_struct{ my( $self, $tickerid ) = @_; my $struct = XMLin( $self->{agent}->content(), ForceArray => 1, KeepRoot => 1 ); return $self->_format_struct( $tickerid, $struct ); }


PerlMonks::Mechanized (PerlMonks::Mechanized (beta))

davido:

I added this method to PM::Mech to obtain a list of all nodes created by a given user. It is based on the user_stats method.

# Grab info (title, date, reputation, etc) for a user's nodes. # Uses the user node info XML Ticker (id://32704). If the # "reputation" field is not required, foruser=userNameOrID can # be used, which avoids requiring a login. Otherwise returns # information about nodes by the logged-in user (or by Anonymous # Monk if no login). sub user_nodes { my( $self, %params ) = @_; my $parameters = ''; foreach( ( 'for_user', 'for_userid' ) ) { $parameters .= exists( $params{$_} ) ? ";$_=$params{$_}" : ''; } $self->{agent}->get( $self->{site} . "?node_id=32704;" . $parameters . $self->_login_URI() ); $self->{agent}->success() or die "Unable to fetch user nodes ticker.\n"; return XMLin( $self->{agent}->content(), ForceArray => 1 ); }
In addition, since the methods are so similar to each other I started pulling out common elements. I ended up with something that is probably much less readable, but I'd still like your thoughts on it as it gives me a better feel for what is Good Design. (I don't think the code below is any easier to read than the original, so I question if it was the right move.) Using the new method from above as an example (untested):
sub user_nodes { my( $self, %params ) = @_; my @allowed_params = ( 'for_user', 'for_userid' ); my $url = join( '', '?node_id=32704;', get_param_list( \@allowed_params, %params ), $self->_login_URI() ); get_ticker( $self, $url, 'user nodes' ); return XMLin( $self->{agent}->content(), ForceArray => 1 ); } sub get_param_list { my ( $ref2allowed, %params ) = @_; my $parameters = ''; foreach( @$ref2allowed ) { $parameters .= exists( $params{$_} ) ? ";$_=$params{$_}" : ''; } return $parameters; } sub get_ticker { my ( $self, $url, $tickertype ) = @_; $self->{agent}->get( $self->{site} . $url ); $self->{agent}->success() or die "Unable to fetch $tickertype ticker.\n"; }

I'm also kicking around the idea of adding PerlMonks::Mechanized::Data (or something similar). The idea would be to provide an OO interface to the data elements returned by Mech so the user wouldn't have to know anything about the data structure that is returned, and to insulate Mech users from future changes to the XML generators (either in format or content).

I thought one WTDI would be to change Mech to return ::Data objects, then call a parse method on them (which could be in the Data parent class and, at least initially, the parser could just pass the data to XML::Simple). Each ticker could have subclassed methods, if necessary, but some of them (e.g., those that deal with node writeups) could utilize the same methods. In the case of node writeups, a class object/datastructure could be used to store the data for future use. For a given node data object, accessor methods could include 'title', 'rep', 'author_id', 'author_name', 'date_created', etc. I'm very new to OO, so this design is probably not optimal.

Finally, thanks for writing PM::Mech. I tried it out (after upgrading WWW::Mech and friends) and it works great!


PerlMonks::Mechanized test script

use strict; use warnings; use Data::Dumper; use PerlMonks::Mechanized; # pass username and password to 'new' my $pm_obj = PerlMonks::Mechanized->new( 'xxxx', 'xxxx' ); my $root_node_id = 466016; # node used for example my @node_ids = ( $root_node_id, $root_node_id + 1 ); #*********************************************************** print "PM object:\n"; print Dumper( $pm_obj ); { print "user_stats:\n"; my $data = $pm_obj->user_stats( showall => 1 ); print Dumper( $data ); } print "PM object:\n"; print Dumper( $pm_obj ); foreach my $method qw( threaded_ids thread_list node_info node_content ) { print "$method:\n"; my $data = $pm_obj->$method( $root_node_id ); print Dumper( $data ); } foreach my $method qw( node_info node_titles ) { print "$method:\n"; my $data = $pm_obj->$method( @node_ids ); print Dumper( $data ); }

PerlMonks::Mechanized test script for retrieving all private msgs

atcroft:

use strict; use warnings; use PerlMonks::Mechanized; my $timeformat = '%Y-%m-%d %H:%M:%S'; # format for strftime # should put username and password into ENV vars instead my $pm_obj = PerlMonks::Mechanized->new( 'username', 'passwd' ); #*********************************************************** my @msgs; my $since_id = 0; # get all messages my $delay = 10; # num secs between page requests (init value) my $max_recs = 100; # max # records to return at a time (init value) while( 1 ) { my ( $data, $info ) = $pm_obj->private_message( archived => 'both', xmlstyle => 'clean', since_id => $since_id, max_recs => $max_recs, min_poll_seconds => $delay ); last if( not defined $data ); # update the max_recs and min_poll_seconds params # based on the values in INFO $delay = $info->[0]->{min_poll_seconds} || $delay; $max_recs = $info->[0]->{max_recs} || $max_recs; # save the msgs for processing (could just print now instead) push( @msgs, @{ $data } ); # msgs are returned in ascending id order, so the # last msg is the most recent $since_id = $data->[-1]->{message_id}; print 'retrieved ', scalar @{ $data }, ' messages'; if( scalar @{ $data } <= $max_recs ) { print "\n"; last; } print ", sleeping $delay secs\n"; sleep( $delay ); } print "\nprivate messages:\n"; foreach my $msg ( @msgs ) { my $datetime = format_datetime_string( $msg->{time}, $timeformat ); print "$datetime - $msg->{author}: $msg->{content}\n\n"; } #*********************************************************** sub format_datetime_string { my ( $string, $format ) = @_; # $string is of the format: YYYYMMDDhhmmss # YYYY = 4 digit year # MM = month, 1-12 # DD = day, 1-31 # hh = hour (24 hr scale, EST) # mm = min # ss = sec my $year = substr( $string, 0, 4 ); my $month = substr( $string, 4, 2 ); my $day = substr( $string, 6, 2 ); my $hour = substr( $string, 8, 2 ); my $min = substr( $string, 10, 2 ); my $sec = substr( $string, 12, 2 ); # strftime expects $month to be 0..11 and # $year to be num yrs since 1900 return POSIX::strftime( $format, $sec, $min, $hour, $day, $month-1, $year-1900 ); }

...and the updated code for PerlMonks::Mechanized::private_message

sub private_message { my( $self, %params ) = @_; my $parameters = ''; foreach( ( 'max_recs', 'since_id', 'prior_to', 'archived' ) ) { $parameters .= exists( $params{$_} ) ? ";$_=$params{$_}" : ''; } $self->{agent}->get( $self->{site} . "?node_id=15848;xmlstyle=clean" . $parameters . $self->_login_URI() ); $self->{agent}->success() or die "Unable to fetch Private Message ticker.\n"; # Return values modified by bobf 6-20-05. # Was just the {message} block, now is ( {message}, {info} ). # This allows the caller to obtain params from the INFO section, # including the max_recs and min_poll_seconds params set by PM. # This change is not necessary if the while() loop to get all # msgs is included in this method rather than in the caller. my $data = XMLin( $self->{agent}->content(), ForceArray => 1 ); return( $data->{message}, $data->{INFO} ); }

PerlMonks::Mechanized::Data - ideas and rough code

davido:

The following module was written as a first bash at standardizing the data returned by the PM::Mech methods. It is still very rough, poorly documented, and minimally tested, but it serves as an example for what I was starting to think about. First, some comments:

  • The purpose of this module is to standardize the data returned by PM::Mech's node methods so it can be accessed uniformly. The module should be renamed to reflect this.
  • Since PM::Mech also handles other types of data (chatter, messages, thread data, user stats, other users, scratchpad, newest nodes), it might make sense to create a larger "PM::Data" class with ::Node, ::Msg, etc subclasses. For this reason, this module might be better named "PM::Mech::Data::Node", or something along those lines.
  • If this were converted to OO, $datatype could be determined by the class (provided the PM::Mech methods (user_nodes, etc) returned a data object rather than a ref from XML::Simple), which would eliminate _determine_data_type(). The integration subroutine could then be an object method that is called directly from add_node_data()'s caller, and all of the node data currently stored in %$alldata_ref could be a class variable with accessor methods.
  • One way to structure this is to have the PM::Mech methods get the ticker data, then pass it to ::Data->new to create a new data object. ::Data->new could determine what subclass to use (::Node, ::Msg, etc), and call the appropriate parser. The parser could be a class method consisting only of XML::Simple, or it could be a subclass-specific method that standardizes the data structure (like the subs in this module). In any case, a Data object (class singleton?) could be returned to the code that called the original PM::Mech method, and (sub?)class-specific accessor methods could take over from there. This is probably not the best design, but it's a starting point. I'm open to suggestions and would appreciate feedback.
  • As you mentioned, converting PM::Mech to return objects would require updating PM::Mech::Janitor accordingly. Alternatively, we could leave PM::Mech as-is, then have the user pass the $data ref from the PM::Mech method to PM::Mech::Data->new(). That wouldn't break Janitor and it would be more flexible (the user could choose whether or not to use Data objects), but it would require extra steps from the user if they did want to use ::Data.

I envision being able to do something like this:

my $pm_obj = PerlMonks::Mechanized->new(); my $node_data = $pm_obj->node_info( @node_ids ); my $title = $node_data->title( $node_ids[3] );

With PM::Mech::Data, we can get this far:

my $pm_obj = PerlMonks::Mechanized->new(); my $node_data = $pm_obj->node_info( @node_ids ); add_node_data( $node_data, \%alldata ); my $title = $alldata{ $node_ids[3] }{title};

Example for PerlMonks::Mechanized::Data

my %nodedata; my $pm_obj = PerlMonks::Mechanized->new(); my $data = $pm_obj->node_info( @node_ids ); add_node_data( $data, \%nodedata );

And now for the code:

package PerlMonks::Mechanized::Data; # PerlMonks::Mechanized::Data standardizes the data structure returned # by the PM::Mech user_nodes, node_info, node_content, and node_titles # methods. The add_node_data routine takes as input the ref returned # from those methods (output from XML::Simple) and a ref # to a master hash. # The structure of the master hash is shown below. #$alldata_ref: # $node_id => # { # 'node_id' => '466017', # 'root_node' => '466016', # 'parent_node' => '466016', # # 'author_user' => '333489', # 'author_name' => 'muba', # # 'title' => 'Re: regex for word puzzle', # 'content' => (node text), # 'reputation' => '17', # 'nodetype' => 'note', # # 'createtime' => '20050612204931', # 'created' => '2005-06-12 20:49:31', # 'lastupdate' => '', # 'lastedit' => '20050407145724' # } use strict; use warnings; use Carp qw( carp ); use Data::Dumper; use Exporter; our @ISA = ("Exporter"); #our @EXPORT = (); our @EXPORT_OK = qw( add_node_data ); our $VERSION = 0.01; #********************************************************************* my %integrate = ( user_nodes => \&_add_user_nodes, node_info => \&_add_node_info, node_content => \&_add_node_content, node_titles => \&_add_node_titles ); sub add_node_data { my ( $newdata_ref, $alldata_ref ) = @_; # $newdata_ref = ref to the output from the PM::Mech methods # $alldata_ref = ref to the master hash containing all node data my $datatype = _determine_data_type( $newdata_ref ); if( not defined $datatype ) { return 1; } $integrate{$datatype}->( $newdata_ref, $alldata_ref ); return 0; } #********************************************************************* sub _determine_data_type { my ( $newdata_ref ) = @_; if( ref( $newdata_ref ) eq 'HASH' && exists $newdata_ref->{INFO} && exists $newdata_ref->{NODE} ) { return 'user_nodes'; } elsif( ref( $newdata_ref ) eq 'HASH' && exists $newdata_ref->{title} && exists $newdata_ref->{author} ) { return 'node_content'; } elsif( ref( $newdata_ref ) eq 'ARRAY' && ref( $newdata_ref->[0] ) eq 'HASH' ) { return 'node_info'; } elsif( ref( $newdata_ref ) eq 'ARRAY' && ref( $newdata_ref->[0] ) eq 'ARRAY' ) { return 'node_titles'; } else { Carp::carp "\nUnrecognized data type"; print "\n"; return undef; } } sub _add_node_info { my ( $newdata_ref, $alldata_ref ) = @_; foreach my $src_ref ( @{ $newdata_ref } ) { my $node_id = $src_ref->{node_id}; my $dest_ref = \%{ $alldata_ref->{$node_id} }; my %data = ( lastupdate => $src_ref->{lastupdate}, nodetype => $src_ref->{nodetype}, root_node => $src_ref->{root_node}, title => $src_ref->{content}, createtime => $src_ref->{createtime}, node_id => $src_ref->{node_id}, author_user => $src_ref->{author_user}, author_name => $src_ref->{author_name}, parent_node => $src_ref->{parent_node} ); foreach my $key qw( root_node parent_node ) { if( not defined $data{$key} ) { # The root and parent nodes are undef if this node is # not a reply, so skip them. # We could set root and parent = $node_id in %data if # undef, instead (but it may not be expected behavior) delete $data{$key}; } } foreach my $key ( keys %data ) { if( exists $dest_ref->{$key} && $data{$key} ne $dest_ref->{$key} ) { _print_warning( $node_id, $key, $dest_ref->{$key}, $data{$key} ); } $dest_ref->{$key} = $data{$key}; } } } sub _add_node_titles { my ( $newdata_ref, $alldata_ref ) = @_; foreach my $src_ref ( @{ $newdata_ref } ) { my $node_id = $src_ref->[0]; my $title = $src_ref->[1]; my $dest_ref = \%{ $alldata_ref->{$node_id} }; if( exists $dest_ref->{title} && $title ne $dest_ref->{title} ) { _print_warning( $node_id, 'title', $dest_ref->{title}, $title ); } $dest_ref->{title} = $title; } } sub _add_node_content { my ( $newdata_ref, $alldata_ref ) = @_; # is the data in 'updated' in the same format as for 'lastupdate'? # retain the 'created' key (in a diff format than 'createtime') my $node_id = $newdata_ref->{id}; my %data = ( title => $newdata_ref->{title}, lastupdate => $newdata_ref->{updated}, created => $newdata_ref->{created}, content => $newdata_ref->{doctext}{content}, nodetype => $newdata_ref->{type}{content}, author_name => $newdata_ref->{author}{content}, author_user => $newdata_ref->{author}{id} ); my $dest_ref = \%{ $alldata_ref->{$node_id} }; foreach my $key ( keys %data ) { if( exists $dest_ref->{$key} && $data{$key} ne $dest_ref->{$key} ) { _print_warning( $node_id, $key, $dest_ref->{$key}, $data{$key} ); } $dest_ref->{$key} = $data{$key}; } } sub _add_user_nodes { my ( $newdata_ref, $alldata_ref ) = @_; my $author_name = $newdata_ref->{INFO}->[0]->{foruser}; while( my( $node_id, $noderef ) = each %{ $newdata_ref->{NODE} } ) { my %data = ( reputation => $noderef->{reputation}, created => $noderef->{createtime}, title => $noderef->{content}, lastupdate => $noderef->{lastupdate}, lastedit => $noderef->{lastedit} ); my $dest_ref = \%{ $alldata_ref->{$node_id} }; foreach my $key ( keys %data ) { if( exists $dest_ref->{$key} && $data{$key} ne $dest_ref->{$key} ) { _print_warning( $node_id, $key, $dest_ref->{$key}, $data{$key} ); } $dest_ref->{$key} = $data{$key}; } } } sub _print_warning { my ( $node_id, $key, $dest_val, $src_val ) = @_; print "Warning - data discrepancy for node ID $node_id:\n"; print " current $key = $dest_val\n"; print " new $key = $src_val\n"; print " The new data will replace the current data\n"; }

Sudoku solver (my own feeble attempt)

# Each row must contain the digits 1 through 9 in any order. # Each column must contain the digits 1 through 9 in any order. # The 9x9 grid holds nine 3x3 grids. Each of those 3x3 grids must # contain the digits 1 through 9 in any order. use strict; use warnings; use Data::Dumper; use POSIX qw( ceil ); use List::Compare; # $solution[$row][$col], 0 = unknown # a dummy row and col will be added to @solution to allow indeces 1..9 my @solution = ( [ qw( 8 5 0 1 0 2 7 0 3 ) ], [ qw( 3 0 0 0 4 0 0 0 0 ) ], [ qw( 0 0 4 7 3 0 0 0 0 ) ], [ qw( 4 0 0 0 0 0 8 5 0 ) ], [ qw( 0 2 0 0 0 0 0 1 0 ) ], [ qw( 0 1 5 0 0 0 0 0 4 ) ], [ qw( 0 0 0 0 1 7 4 0 5 ) ], [ qw( 0 0 0 0 2 0 0 0 1 ) ], [ qw( 7 0 0 9 0 5 0 2 6 ) ] ); unshift( @solution, [] ); foreach my $row ( 1..9 ) { unshift( @{ $solution[$row] }, 0 ); } # the 3x3 grids are arranged into a 9x9 cell table as follows # 1 2 3 # 4 5 6 # 7 8 9 # the 9x9 cell table has rows 1..9 and cols 1..9 # translate row and col indeces (concatenated) of the upper left cell +in each 3x3 grid into an index my %gridnum = ( 11 => 1, 12 => 2, 13 => 3, 21 => 4, 22 => 5, 23 => 6, 31 => 7, 32 => 8, 33 => 9 ); # $grid[1..9] = [ options left for this 3x3 grid ] my @grids; initialize_grids(); # $rows[1..9] = [ options left for this row ] # $cols[1..9] = [ options left for this col ] my @rows; my @cols; initialize_rows(); initialize_cols(); my $unsolved = ( 9 * 9 ) - num_hints(); while( $unsolved ) { foreach my $row ( 1..9 ) { foreach my $col ( 1..9 ) { next if $solution[$row][$col]; # find intersection of @row and @col for this cell, and in +tersection with @grid = options left for this 3x3 grid # if only 1 left, assign to @solution and subtract from @r +ow and @col and @grid, $unsolved-- my $gridkey = join( '', POSIX::ceil( $row/3 ), POSIX::ceil +( $col/3 ) ); my $gridnum = $gridnum{$gridkey}; my $lc_obj = List::Compare->new( $rows[$row], $cols[$col], + $grids[$gridnum] ); my @options = $lc_obj->get_intersection(); if( scalar @options == 1 ) { $solution[$row][$col] = $options[0]; foreach my $a_ref ( $rows[$row], $cols[$col], $grids[$ +gridnum] ) { @{ $a_ref } = grep{ $_ != $options[0] } @{ $a_ref +}; } $unsolved--; } } } } foreach my $row ( 1..9 ) { print join( ' ', @{ $solution[$row] }[1..9] ), "\n"; } sub initialize_grids { # determine what numbers are available as options for each 3x3 gri +d # initialize each 3x3 grid foreach my $gridnum ( 1..9 ) { @{ $grids[$gridnum] } = ( 1..9 ); } # filter out hints already in @solution foreach my $row ( 1..9 ) { foreach my $col ( 1..9 ) { my $gridkey = join( '', POSIX::ceil( $row/3 ), POSIX::ceil +( $col/3 ) ); my $gridnum = $gridnum{$gridkey}; @{ $grids[$gridnum] } = grep{ $_ != $solution[$row][$col] +} @{ $grids[$gridnum] }; } } } sub initialize_rows { # determine what numbers are available as options for each row foreach my $row ( 1..9 ) { @{ $rows[$row] } = ( 1..9 ); foreach my $col ( 1..9 ) { @{ $rows[$row] } = grep{ $_ != $solution[$row][$col] } @{ +$rows[$row] }; } } } sub initialize_cols { # determine what numbers are available as options for each col foreach my $col ( 1..9 ) { @{ $cols[$col] } = ( 1..9 ); foreach my $row ( 1..9 ) { @{ $cols[$col] } = grep{ $_ != $solution[$row][$col] } @{ +$cols[$col] }; } } } sub num_hints { my $hints = 0; foreach my $row( 1..9 ) { $hints += scalar grep{ $_ != 0 } @{ $solution[$row] }; } return $hints; }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (7)
As of 2024-04-19 09:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found