PerlMonks  

using Getopt::Long to modify the default values in new

by Aldebaran (Chaplain)
on Jul 09, 2019 at 23:18 UTC (#11102616)

Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

I present again in the hope of creating an authentic distribution along the lines described in _Intermediate Perl_. bliako and I have been taking turns as each other's sounding board. I have vivisected his original script from Re^4: chunking up texts correctly for online translation. I present the invocation, the output, then the source.

$ ./5.MY.translate.pl --configfile C --from tja --infile /home/bob/Documents/meditations/Algorithm-Markov-Multiorder-Learner-master/data/2.short.shelley.txt --outfile /home/bob/Desktop/1.state

I'll put output in pre tags, because I'm never sure what code tags are gonna do with it:

input  is “The ancient teachers of this science,” said he,
“promised impossibilities and performed nothing. The modern masters
...
unfold to the world the deepest mysteries of creation.


----------------
in new, param_hr is
{
  CONTENT => "\x{201C}The ancient teachers of this science,\x{201D} said he,\n\x{201C}promised 
... the steps\nalready marked, I will pioneer a new way, explore unknown powers, and\nunfold to the world the deepest mysteries of creation.\n\n",
  FROM => "tja",
  key => 123,
  TO => undef,
}
in new, self is 
bless({ format => 5.23, FROM => "en", key => 321, START => 1562613962, TO => "ru" }, "My::Module")
in sub key
akey is  123 
you called key() method on object 'My::Module=HASH(0x5638464c0768)'
key() : changing key to '123'
in sub key
Use of uninitialized value $akey in concatenation (.) or string at ./5.MY.translate.pl line 38.
akey is   
you called key() method on object 'My::Module=HASH(0x5638464c0768)'
my key: 123
--- mod is
bless({ format => 5.23, FROM => "en", key => 123, START => 1562613962, TO => "ru" }, "My::Module")
$ 

Source:

#!/usr/bin/perl -w
use 5.011;
binmode STDOUT, ":utf8";
use open IN  => ':crlf';
use open OUT => ':utf8';

package My::Module;

sub new {
  my ( $class, $param_hr ) = @_;
  $param_hr = {} unless defined $param_hr;

  my $self = {    # hashref or arrayref
    key    => 321,
    format => 5.23,
    START  => time(),
    FROM   => 'en',
    TO     => 'ru',
  };
  bless $self, $class;    # now your hash is an object of class $class.
  if ( exists $param_hr->{'key'} ) {
    say "in new, param_hr is";
    use Data::Dump;
    dd $param_hr;
    say "in new, self is ";
    dd $self;
  }
  if ( exists $param_hr->{'key'} ) { $self->key( $param_hr->{'key'} ) }
  else { warn "param 'key' is required."; return undef }
  return $self;    # return hash, now blessed into a class instance, hallelujah
}

# get or set the key
sub key {
  say "in sub key";
  my $self = $_[0];
  my $akey = $_[1];    # optional key
  say "akey is  $akey ";
  print "you called key() method on object '$self'\n";
  if ( defined $akey ) {
    print "key() : changing key to '$akey'\n";
    $self->{'key'} = $akey;
  }
  return $self->{'key'};
}
1;

package main;

use Getopt::Long;

my $outfile    = undef;
my $configfile = undef;
my $infile     = undef;
my $from       = undef;
my $to         = undef;

if (
  !Getopt::Long::GetOptions(
    "outfile=s",    \$outfile,
    "infile=s",     \$infile,
    "configfile=s", \$configfile,
    "from=s",       \$from,
    "to=s",         \$to,
    "help",
    sub {
      print "Usage : $0 --configfile C [--outfile O] [--infile I] [--help]\n";
      exit 0;
    },
  )
  )
{
  die "error, commandline";
}
die "configfile is needed (via --configfile)" unless defined $configfile;

my $inFH;
if ( defined($infile) ) {
  open( $inFH, '<:crlf:encoding(UTF-8)', $infile )
    or die "opening input file $infile, $!";
}
my $instr;
{ local $/ = undef; $instr = <$inFH> }
close $inFH;
if ( defined($instr) ) {
  say "input  is $instr";
}
say "----------------";

# uncomment only if My::Module is in separate file:
#use My::Module;
my $mod = My::Module->new(
  {
    'key'     => 123,
    'CONTENT' => $instr,
    FROM      => $from,
    TO        => $to,
  }
);
die unless defined $mod;
print "my key: " . $mod->key() . "\n";
say "--- mod is";
dd $mod;
__END__

I thought that I would emulate the syntax in source listing for Translate.pm. (Is this well-written?) With this syntax, I believe that $param_hr is to be understood as an existing reference to a hash of parameters. I believe that this is the main data structure for this object.

sub new {
    my ( $class, $param_hr ) = @_;

    my %self = (
        key            => 0,
        format         => 0,
        model          => 0,
        prettyprint    => 0,
        default_source => 0,
        default_target => 0,
        data_format    => 'perl',
        timeout        => 60,
        force_post     => 0,
        rest_url       => $REST_URL,
        agent          => ( sprintf '%s/%s', __PACKAGE__, $VERSION ),
        cache_file     => 0,
        headers        => {},
    );

I've been looking at the rest of the code for new here. What does this do?

for my $property ( keys %self ) {
    if ( exists $param_hr->{$property} ) {
        my $type          = ref $param_hr->{$property} || 'String';
        my $expected_type = ref $self{$property}       || 'String';
        croak "$property should be a $expected_type"
            if $expected_type ne $type;
        $self{$property} = delete $param_hr->{$property};
    }
}

It would seem to check whether certain values are strings, but I don't understand the right hand side with

ref something  || something else

How do I get the supplied value from the command line to overwrite the default value from %self ? In this example, I would want to see the value from

--from tja

in the final output for $mod .

Thanks for your comment.

Replies are listed 'Best First'.
Re: using Getopt::Long to modify the default values in new
by Athanasius (Bishop) on Jul 10, 2019 at 06:53 UTC

    Hello Aldebaran,

    I thought that I would emulate the syntax in source listing for Translate.pm. (Is this well-written?)

    Not entirely — see below.

    With this syntax, I believe that $param_hr is to be understood as an existing reference to a hash of parameters. I believe that this is the main data structure for this object.

    Correct in both cases.

    I've been looking at the rest of the code for new here. What does this do?

    The for loop iterates through the keys of %self (in no particular order) — i.e., "key", "format", "model", etc. — assigning each in turn to $property. The if condition checks whether the current property is an entry in the user-supplied hash referenced by $param_hr:

    • If not, the property keeps its default value assigned previously in my %self = (...);
    • If so, the line $self{$property} = delete $param_hr->{$property}; deletes the entry from the user-supplied hash and assigns its value to $self{$property}, overwriting the default.
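A minimal sketch of the same override pattern, cut down to three properties. The class name My::Sketch and its defaults are made up for illustration, and the loop deliberately differs from the Translate.pm excerpt in two ways: it copies instead of deleting, and it skips undefined values, so an option that was not given on the command line (such as TO => undef in the dump above) leaves the default in place.

```perl
use strict;
use warnings;

package My::Sketch;    # hypothetical class, for illustration only

sub new {
    my ( $class, $param_hr ) = @_;
    $param_hr = {} unless defined $param_hr;

    # defaults, using the key names from the original post
    my %self = (
        key  => 321,
        FROM => 'en',
        TO   => 'ru',
    );

    # overwrite a default only when the caller supplied a defined value
    for my $property ( keys %self ) {
        next unless defined $param_hr->{$property};
        $self{$property} = $param_hr->{$property};
    }
    return bless \%self, $class;
}

package main;

my $mod = My::Sketch->new( { key => 123, FROM => 'tja', TO => undef } );
print join( ', ', map { "$_ => $mod->{$_}" } sort keys %$mod ), "\n";
# prints: FROM => tja, TO => ru, key => 123
```

Applied to the script in the question, this would make --from tja show up as FROM => "tja" in the final dump, assuming the constructor is allowed to ignore undefined parameters.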

    But first, there is a check to ensure that the data type of the user-supplied entry matches the type expected by new(). This check is performed using the built-in function ref, which is documented as follows:

    ref EXPR
    ...
    Examines the value of EXPR, expecting it to be a reference, and returns a string giving information about the reference and the type of referent....

    If the operand is not a reference, then the empty string will be returned. An empty string will only be returned in this situation. ref is often useful to just test whether a value is a reference, which can be done by comparing the result to the empty string. It is a common mistake to use the result of ref directly as a truth value: this goes wrong because 0 (which is false) can be returned for a reference.

    — which is why I said above that the code is not entirely well-written: instead of testing directly against the empty string, the code first changes an empty string into the string 'String', which is not only unnecessary, but in fact is the “common mistake” mentioned in the documentation.
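As a rough illustration of testing ref's result against the empty string, here is a small sketch. The helper name kind_of is hypothetical and not part of Translate.pm.

```perl
use strict;
use warnings;

# Label a value by comparing ref's result to the empty string,
# rather than using that result as a truth value.
sub kind_of {
    my ($value) = @_;
    my $type = ref $value;
    return $type eq '' ? 'plain scalar' : "$type reference";
}

print kind_of('en'),        "\n";    # plain scalar
print kind_of( {} ),        "\n";    # HASH reference
print kind_of( [ 1, 2 ] ),  "\n";    # ARRAY reference
print kind_of( sub { 1 } ), "\n";    # CODE reference
```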

    It would seem to check whether certain values are strings, but I don't understand the right hand side with
    ref something || something else

    No, it’s not checking whether values are strings, it’s checking whether values are references. (In this case, only the value associated with the headers key will actually be a reference.) The || (logical OR) operator checks its LHS first and returns it if it is true; otherwise, it returns the RHS. So in the lines:

    my $type          = ref $param_hr->{$property} || 'String';
    my $expected_type = ref $self{$property}       || 'String';

    the first line assigns 'String' to $type if, and only if, the expression ref $param_hr->{$property} evaluates to a value which Perl considers “false.” If $param_hr->{$property} is not a reference, ref returns the empty string (""), which Perl does consider false, so the || operator evaluates to 'String'.
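To see what the two lines evaluate to, the expression can be pulled into a helper and run against a default and a supplied value. This is only a sketch: type_name and the %defaults / %supplied hashes are invented here, with timeout and headers borrowed from the excerpt.

```perl
use strict;
use warnings;

# Same expression as in the excerpt. ref is a named unary operator and
# binds tighter than ||, so this reads as (ref $value) || 'String'.
sub type_name {
    my ($value) = @_;
    return ref $value || 'String';
}

my %defaults = ( timeout => 60, headers => {} );
my %supplied = ( timeout => 30, headers => 'X-Foo: bar' );

for my $property ( sort keys %defaults ) {
    my $type          = type_name( $supplied{$property} );
    my $expected_type = type_name( $defaults{$property} );
    my $verdict =
        $type eq $expected_type ? 'ok' : "should be a $expected_type";
    print "$property: $verdict\n";
}
# prints:
# headers: should be a HASH
# timeout: ok
```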

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      instead of testing directly against the empty string, the code first changes an empty string into the string 'String', which is not only unnecessary, but in fact is the “common mistake” mentioned in the documentation.

      Thanks for your generous comments, Athanasius. This is a good spot for me to repost. Even with edits, it goes on long enough for readmore tags. Output, source, and slightly different questions follow.

        #$self{$property} = delete $param_hr->{$property};
        
        The way I read the delete function in perldoc.perl.org listing for delete, the RHS populates the left.
        

        From delete:

        In list context, usually returns the value or values deleted, or the last such element in scalar context. The return list's length corresponds to that of the argument list: deleting non-existent elements returns the undefined value in their corresponding positions.
        

        Perhaps it's not a good idea to delete the keys you set in $self from your input parameters. A simple $self{$property} = $param_hr->{$property}; suffices. Be aware that by deleting, you modify a data structure created by the caller (the hash that $param_hr refers to).
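The difference is easy to demonstrate. In this sketch both helper names are hypothetical; one moves values out of the caller's hash with delete, the other only copies them.

```perl
use strict;
use warnings;

# Moves the named entries out of the caller's hash.
sub take_destructively {
    my ( $param_hr, @names ) = @_;
    my %taken;
    $taken{$_} = delete $param_hr->{$_} for @names;
    return \%taken;
}

# Copies the named entries and leaves the caller's hash alone.
sub take_by_copy {
    my ( $param_hr, @names ) = @_;
    my %taken;
    $taken{$_} = $param_hr->{$_} for @names;
    return \%taken;
}

my %params_a = ( key => 123, FROM => 'tja' );
my %params_b = ( key => 123, FROM => 'tja' );

take_destructively( \%params_a, 'FROM' );
take_by_copy( \%params_b, 'FROM' );

print "after delete: ", join( ',', sort keys %params_a ), "\n";    # key
print "after copy:   ", join( ',', sort keys %params_b ), "\n";    # FROM,key
```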

        The Use of uninitialized value $akey in concatenation (.) or string at ./6.MY.translate.pl line 62. can be avoided if you move say "akey is  $akey "; after you check it is defined
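A sketch of the accessor with the debug line moved inside the defined check. My::Sketch::Key is a hypothetical stand-in for My::Module, reduced to the key field.

```perl
use strict;
use warnings;
use feature 'say';

package My::Sketch::Key;    # hypothetical stand-in for My::Module

sub new { my ($class) = @_; return bless { key => 321 }, $class }

# get or set the key
sub key {
    my ( $self, $akey ) = @_;
    if ( defined $akey ) {
        say "akey is $akey";    # interpolated only once known to be defined
        $self->{key} = $akey;
    }
    return $self->{key};
}

package main;

my $obj = My::Sketch::Key->new;
$obj->key(123);                  # setter call
say "my key: ", $obj->key;       # getter call, no uninitialized warning
```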

        Validating your input is great. But personally I would not validate the data types of params passed to a module's constructor. When you have a script and users run it from the command line, it is wise to (edit: over-)validate, because a user may not have read the manual or may be confused about the input parameters. This is the normal scenario, judging from how I make such mistakes myself. However, I would put the programmer/user/caller of an API or module at a slightly higher level, and I would trust this user more. So I would still validate input params for correct values, but checking for type may cause me and the CPU too much extra work, and I just do not do it. Practical reason: I sometimes set default params in $self to undef, in which case no data type can be deduced. Saying in the pod that "this module does not validate input types, please observe the parameters' types stated" is enough for me, but it may not be for others.
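One possible way to handle the command-line side of that split is to build the parameter hash only from options that were actually given, so the constructor never sees FROM => undef or TO => undef. This is a sketch, not the thread's code: params_from_args is a made-up helper, and it uses GetOptionsFromArray from Getopt::Long so that it can be run without touching @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse an argument list and return a hashref holding only the
# options that were supplied. FROM and TO follow the original post.
sub params_from_args {
    my (@args) = @_;
    my ( $from, $to );
    GetOptionsFromArray( \@args, 'from=s' => \$from, 'to=s' => \$to )
        or die "error, commandline\n";

    my %params;
    $params{FROM} = $from if defined $from;
    $params{TO}   = $to   if defined $to;
    return \%params;
}

my $params = params_from_args( '--from', 'tja' );
print join( ', ', map { "$_ => $params->{$_}" } sort keys %$params ), "\n";
# prints: FROM => tja
```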

        Note that if ( exists $param_hr->{$property} ) { passes if the key exists, even when its value is undefined, e.g. for this input: $param_hr->{'key'} = undef; edit: clearly 'key' exists, so exists passes, but its value is undef, which perhaps(!) is not what the author intended.
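A short sketch of the distinction, using a hash shaped like the one in the example above:

```perl
use strict;
use warnings;

my $param_hr = { key => undef };

my $exists  = exists( $param_hr->{key} )  ? 'yes' : 'no';
my $defined = defined( $param_hr->{key} ) ? 'yes' : 'no';
my $missing = exists( $param_hr->{FROM} ) ? 'yes' : 'no';

print "key exists:   $exists\n";     # yes
print "key defined:  $defined\n";    # no
print "FROM exists:  $missing\n";    # no
```

Checking defined instead of exists treats "passed as undef" the same as "not passed at all", which may or may not be what a given constructor wants.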

Re: using Getopt::Long to modify the default values in new
by Anonymous Monk on Jul 13, 2019 at 01:50 UTC

    I've been looking at the rest of the code for new here. What does this do?

    Find out? Yeah :) Basic debugging checklist use Data::Dump...

Node Type: perlquestion [id://11102616]
Front-paged by stevieb