Well, tilly, thanx a lot for the thorough critique. And also for the last remark, I needed that ;-). I have taken your code into the script, but merged it with some of my own code where appropriate.

Of course, I have some counter-remarks, here we go.

1. Why reverse the order of the arguments? join and split both start with the separator, and then the input. So I changed the order back to strings, input.

2. I don't like to pre-compile the regexes, because then the split couldn't cope with changing delimiters, as in a text file (see the SYNOPSIS), or with sprintf'ed data. So I changed that back. Furthermore, I couldn't find any reference to qr// in the manpages. Could you please explain?

3. On tie's comments: multi-dimensional arrays are a perl 5 feature, so I should check for that anyway. Out of time now, so that's for the next version of supersplit.

4. I really like the recursive approach.

5. I don't see the need for a separate IO version, so I changed that back, too. I just try to treat the string as a filehandle, or try to open it as a file (new feature). I didn't succeed in getting supersplit( INPUT ), with INPUT as a filehandle, to work. That's peculiar, because the manpage tells me that <$fh>, with $fh='INPUT', should work.

6. You are totally right on the matter of the inner/outer naming convention.

7. And ++ for the join( $_, @_) stuff. I never would have dared to use it. But of course $_ and @_ have different namespaces...

8. I removed the BEGIN blocks. Is this something for the manpages (perldoc perlmod)?
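
On point 2, for anyone else hunting for it: as far as I know, qr// is described in perlop under the regexp quote-like operators, and it arrived with perl 5.005. Below is a minimal sketch with made-up sample data (none of the variable names come from the module). It suggests that split takes a pattern string or a qr// object equally well, so precompiling need not rule out delimiters that change at run time.

```perl
use strict;
use warnings;

# Sample data, invented for this sketch.
my $line = "a:b:c";

my @from_string = split ':', $line;    # pattern given as a plain string
my $re          = qr/:/;               # compiled once, reusable
my @from_qr     = split $re, $line;    # same fields as above

# A delimiter built at run time (for instance with sprintf) can be
# used as a string, or compiled on the spot with qr//.
my $delim   = sprintf '%s', '\s*,\s*';
my @fields  = split $delim, "x , y,z";
my @fields2 = split qr/$delim/, "x , y,z";

print "@from_string | @from_qr | @fields | @fields2\n";
```

So a module could accept both forms from the caller and leave the choice to them.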

Finally, I tested the code with 2D-arrays. It works. I'm leaving home for the remainder of this year, so we'll continue next year.
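
A 2D test along those lines can be sketched as a round trip: split, join again, compare with the input. To keep it runnable on its own, the sketch inlines two stand-ins for the recursive helpers; the names nsplit and njoin and the sample text are made up here and are not part of the module.

```perl
use strict;
use warnings;

# Stand-in for the recursive split: the last separator splits the
# outermost dimension, the remaining ones split the pieces.
sub nsplit {
    my ( $text, @seps ) = @_;
    my $re    = pop @seps;
    my @parts = split $re, $text;
    return [ @seps ? ( map { nsplit( $_, @seps ) } @parts ) : @parts ];
}

# Stand-in for the recursive join, mirroring nsplit.
sub njoin {
    my ( $ref, @seps ) = @_;
    my $sep = pop @seps;
    return join $sep, @seps ? ( map { njoin( $_, @seps ) } @$ref ) : @$ref;
}

# Rows of different length, to check that ragged tables survive.
my $text  = "a b c\nd e\nf";
my $table = nsplit( $text, '\s+', '\n' );    # columns first, rows last
my $back  = njoin( $table, ' ', "\n" );

print $back eq $text ? "round trip ok\n" : "round trip FAILED\n";
```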

Happy new year everyone, best wishes, and thanx for the comments!

Jeroen

The new code, with POD, is here:

package SuperSplit;
use strict;

=head1 NAME

SuperSplit - Provides methods to split/join in two dimensions

=head1 SYNOPSIS

 use SuperSplit;
 
 #first example: split on newlines and whitespace and print
 #the same data joined on tabs and newlines. The split works on STDIN
 #
 print superjoin( supersplit() );
 
 #second: split a table in a text file, and join it to HTML
 #
 my $array2D   = supersplit( \*INPUT );  #filehandle must be open
 my $htmltable = superjoin( '</TD><TD>', "</TD></TR>\n  <TR><TD>", 
                  $array2D );
 $htmltable    = "<TABLE>\n  <TR><TD>" . $htmltable . "</TD></TR>\n</TABLE>";
 print $htmltable;
 
 #third: perl allows you to have varying number of columns in a row,
 # so don't stop with simple tables. To split a piece of text into 
 # paragraphs, then words, try this:
 #
 undef $/;
 $_ = <>;
 tr/.!();:?/ /; #remove punctuation
 my $array = supersplit( '\s+', '\n\s*\n', $_ );
 # now you can do something nifty as counting the number of words in
 # each paragraph
 my @numwords = (); my $i=0;
 for my $rowref (@$array) {
    push( @numwords, scalar(@$rowref) );  #2D-array: array of refs!
    print "Found $numwords[$i] \twords in paragraph \t$i\n";
    $i++;
 }

=head1 DESCRIPTION

Supersplit is just a consequence of the possibility to use
multi-dimensional arrays in perl. Because that is possible, one also
wants a way to conveniently split data into an nD-array (at least I
want to).  And vice versa, of course.  Supersplit/join do just that.

Because I intend to use these methods in numerous one-liners and in my
collection of handy filters, an object interface is more often than not
cumbersome.  So, this module exports two methods, but that's also all it
has.  If you think modules shouldn't do that, period, use the object
interface, SuperSplit::Obj. TIMTOWTDI

=over 4

=item supersplit($colseparator,$rowseparator, (...,) $filehandleref || $string);

The first method, supersplit, returns an nD-array.  To do that, it needs
data and the strings to split with.  Data may be provided as a reference
to a filehandle, or as a string.  If you want to use a string for the
data, you MUST provide the strings to split with (3 argument mode).  If
you don't provide data, supersplit works on STDIN. If you provide a
filehandle (like \*INPUT) or filename, supersplit doesn't need the
splitting strings, and assumes columns are separated by whitespace, and
rows are separated by newlines.  Strings are passed directly to split.
If you provide more strings, they will split the higher dimensions.

Supersplit returns a multi-dimensional array or undef if an error
occurred.

=item superjoin( $colseparator, $rowseparator, $array2D );

The second and last method, superjoin, takes an nD-array and returns it
as a string.  The default behavior assumes a 2D-array.  In the string,
columns (adjacent cells) are separated by the first argument provided.
Rows (normally lines) are separated by the second argument.
Alternatively, you may give the 2D-array as the only argument.  In that
case, superjoin joins columns with a tab ("\t"), and rows with a newline
("\n").  If you have more dimensions in your array, all separators for
all dimensions should be provided.

Superjoin returns undef if an error occurred, for example if you give a
ref to a hash. If your first dimension points to hashes or strings,
superjoin will return undef. Mixed arrays will break the code.

=back

=head1 AUTHOR

Jeroen Elassaiss-Schaap, with great help from tilly, who rewrote most of
the code for version 0.03.

=head1 LICENSE

Perl / Artistic License

=head1 STATUS

Alpha

=cut

use Exporter;
use vars qw( @EXPORT @ISA $VERSION );
$VERSION = 0.03;    # the module version is a scalar, not an array
@ISA = qw( Exporter );
@EXPORT = qw( &supersplit &superjoin );

sub supersplit{
    my $text = _text( pop );
    $_[0] || ( $_[0] = '\s+' );
    $_[1] || ( $_[1] = '\n'  );
    _split($text, @_);
}

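sub _text{
    my $fh = shift;
    $fh = \*STDIN unless defined $fh;    # no data given: read STDIN
    # A string naming a readable file: slurp the file, return its text
    if ( !ref($fh) and open INPUT, "<$fh" ) {
        my $text = join '', <INPUT>;
        close INPUT;
        return $text;
    }
    # Otherwise try it as a filehandle; if nothing can be read,
    # treat the argument itself as the text
    no strict 'refs';
    (join '', <$fh>) || $fh;
}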

sub _split {
    my $text = shift;
    my $re = pop;
    my @res = split($re, $text); # Consider the third arg?
    if (@_) {
        @res = map { _split( $_, @_) } @res;
    }
    \@res;
}

sub superjoin{
    my $array_ref = pop;
    push ( @_, "\t") if @_ < 1;  
    push ( @_, "\n") if @_ < 2;  
    return undef unless( ref( $array_ref ) eq 'ARRAY' );
    return undef unless( ref( $array_ref->[0] ) =~ /ARRAY/ );
    _join( @_, $array_ref);
}

sub _join{
    my $array_ref = pop;
    my $str = pop;
    # Join the inner dimensions into a copy, so that the caller's
    # array is left untouched
    my @items = @_ ? ( map { _join( @_, $_ ) } @$array_ref ) : @$array_ref;
    join $str, @items;
}

1;
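
About supersplit( INPUT ) from point 5: a likely explanation, as far as I understand symbolic references, is that an unqualified handle name is looked up in the package where the read happens (here that would be SuperSplit), not in the caller's package where INPUT was opened. Below is a minimal sketch of one way around that. The package Demo and the helper slurp_handle are hypothetical stand-ins, not part of the module; qualify_to_ref comes from the core Symbol module, and the in-memory file assumes perl 5.8 or later.

```perl
use strict;
use warnings;

package Demo;    # hypothetical stand-in for a module like SuperSplit
use Symbol qw( qualify_to_ref );

# Accept a glob reference (\*INPUT) or a bare handle name ('INPUT').
sub slurp_handle {
    my ($fh) = @_;
    # Resolve a plain name in the caller's package, not in Demo.
    $fh = qualify_to_ref( $fh, scalar caller ) unless ref $fh;
    local $/;    # slurp mode: read everything in one go
    my $text = <$fh>;
    return $text;
}

package main;

# In-memory file, so the sketch needs no file on disk.
my $data = "1 2\n3 4\n";

open INPUT, '<', \$data or die "cannot open: $!";
my $by_ref = Demo::slurp_handle( \*INPUT );
close INPUT;

open INPUT, '<', \$data or die "cannot open: $!";
my $by_name = Demo::slurp_handle( 'INPUT' );
close INPUT;

print $by_ref eq $by_name ? "both forms agree\n" : "forms differ\n";
```

Passing \*INPUT sidesteps the question altogether, which is why the SYNOPSIS uses that form.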

I was dreaming of guitarnotes that would irritate an executive kind of guy (FZ)

In reply to Re: Re (tilly) 1: Supersplit by jeroenes
in thread Supersplit by jeroenes
