parsing a space-separated filename in a line with fields separated by spaces

princepawn has asked for the wisdom of the Perl Monks concerning the following question:

Hello parsing fans, let's start with some sample data:

Mon Oct  1 17:09:23 2001 0 127.0.0.1 2611 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 file with spaces in it.zip a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:31 2001 0 127.0.0.1 7276 p1774034_11i_zhs.zip a _ o r tmbranno ftp 0 * c

Now, if it were not for the 3rd line, I could simply split on whitespace to get each field:

    our @field = qw(day_name month day current_time year transfer_time
       remote_host file_size filename transfer_type special_action_flag
       direction access_mode username service_name authentication_method
       authenticated_user_id completion_status);
    my %field;
    @field{@field} = split /\s+/, $line;
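To make the failure concrete, here is a minimal sketch of what the plain split does to a line whose filename contains spaces. The helper name `naive_parse` is hypothetical (it is not part of Net::FTPServer::XferLog); the field list and sample lines are the ones from this post.

```perl
use strict;
use warnings;

# Field names as listed in the question.
my @names = qw(day_name month day current_time year transfer_time
               remote_host file_size filename transfer_type
               special_action_flag direction access_mode username
               service_name authentication_method
               authenticated_user_id completion_status);

# Naive approach: split on whitespace and assign by position.
# (naive_parse is a hypothetical name used only for this illustration.)
sub naive_parse {
    my ($line) = @_;
    my %field;
    @field{@names} = split /\s+/, $line;   # surplus tokens are silently dropped
    return \%field;
}

my $good = naive_parse(
    'Mon Oct  1 17:09:31 2001 0 127.0.0.1 7276 p1774034_11i_zhs.zip a _ o r tmbranno ftp 0 * c');
my $bad = naive_parse(
    'Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 file with spaces in it.zip a _ o r tmbranno ftp 0 * c');

print "good: filename=$good->{filename} transfer_type=$good->{transfer_type}\n";
print "bad:  filename=$bad->{filename} transfer_type=$bad->{transfer_type}\n";
```

For the second line the filename comes back as `file`, and every later field is shifted: `transfer_type` becomes `with` and `completion_status` becomes `tmbranno`.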

Then we have our data in a hash and can access fields by name instead of position. This is how my module Net::FTPServer::XferLog has worked fine for years, but I just learned of a poor guy getting filenames with spaces in them. So my approach to this problem is to split as normal, but carefully shift and pop data off either side of the filename field, and then join whatever is left with a single space to make the filename field:

sub parse_line {
    my $self = shift;
    my $line = shift or die "must supply xferlog line";

    my @field = qw(day_name month day current_time year transfer_time
                   remote_host file_size filename transfer_type
                   special_action_flag direction access_mode username
                   service_name authentication_method authenticated_user_id
                   completion_status);

    my %field;

    my @tmp = split /\s+/, $line;
    if (scalar @tmp == scalar @field) {
        # No spaces in the filename: assign fields by position.
        @field{@field} = @tmp;
    } else {
        # Take the fields to the left of the filename from the front...
        for (@field) {
            last if $_ eq 'filename';
            $field{$_} = shift @tmp;
        }

        # ...then the fields to the right of it from the back.
        @field = reverse @field;
        @tmp   = reverse @tmp;

        for (@field) {
            last if $_ eq 'filename';
            $field{$_} = shift @tmp;
        }

        # Whatever remains is the filename, rejoined with single spaces.
        @tmp = reverse @tmp;
        $field{filename} = "@tmp";
    }

#    map { print "$_ => $field{$_} \n" } @field;
#    print "-------------------";
    \%field;
}

But that is not very 'phisticated and I just KNOW some 1337 h4x0R out there is dying to flex his text parsing skIllZ and make the crowd go ooh and ahhh, so show me whatcha got!

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality


Replies are listed 'Best First'.
Re: parsing a space-separated filename in a line with fields separated by spaces by BrowserUk (Patriarch) on Aug 15, 2007 at 22:07 UTC
7HiS sEeem5 2 dO 743 tRicK.

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

while( <DATA> ) {
    my %fields;
    my @bits = m[
        ^
        (\S+)\s+ (\S+)\s+ (\S+)\s+ (\S+)\s+
        (\S+)\s+ (\S+)\s+ (\S+)\s+ (\S+)\s+
        ( .+ ) \s+
        (\S+)\s+ (\S+)\s+ (\S+)\s+ (\S+)\s+ (\S+)\s+
        (\S+)\s+ (\S+)\s+ (\S+)\s+ (\S+)
        $
    ]x;
    @fields{ qw(
        day_name month day current_time year transfer_time remote_host
        file_size filename transfer_type special_action_flag direction
        access_mode username service_name authentication_method
        authenticated_user_id completion_status
    ) } = @bits;
    print pp \%fields;
}
__DATA__
Mon Oct  1 17:09:23 2001 0 127.0.0.1 2611 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 file with spaces in it.zip a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:31 2001 0 127.0.0.1 7276 p1774034_11i_zhs.zip a _ o r tmbranno ftp 0 * c

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re: parsing a space-separated filename in a line with fields separated by spaces by johngg (Canon) on Aug 15, 2007 at 22:32 UTC
You can work your way in from both ends using the three-argument form of split, like this. I have just populated an AoA here, but once you have done that you can do what you like with it.

use strict;
use warnings;

use Data::Dumper;

my @linesData;

while ( <DATA> ) {
    chomp;
    my @flds = split m{\s+}, $_, 9;
    my $rest = pop @flds;
    push @flds,
        reverse
        map { $_ = reverse }
        split m{\s+}, reverse($rest), 10;
    push @linesData, \@flds;
}

print Data::Dumper->Dumpxs([\@linesData], [qw{linesData}]);

__END__
Mon Oct  1 17:09:23 2001 0 127.0.0.1 2611 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 file with spaces in it.zip a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:31 2001 0 127.0.0.1 7276 p1774034_11i_zhs.zip a _ o r tmbranno ftp 0 * c

This produces one array reference per log line, with the filename (spaces included) as the ninth element.

I hope this is of use.

Cheers,
JohnGG

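The reversal works because split's LIMIT argument only ever keeps the remainder in the last field, so to keep a remainder on the left you reverse the string first. A minimal sketch of that one step in isolation follows; `split_from_right` is a hypothetical helper name, not something from the reply above.

```perl
use strict;
use warnings;

# Peel $n_right whitespace-separated fields off the right-hand end of $str.
# In list context, returns (remainder, field1, ..., fieldN), left to right.
# (split_from_right is a hypothetical name used only for this illustration.)
sub split_from_right {
    my ($str, $n_right) = @_;

    # Reverse the string so the rightmost fields come first; the LIMIT
    # then leaves everything else as one final, unsplit chunk.
    my @parts = split /\s+/, scalar(reverse $str), $n_right + 1;

    # Each chunk is still backwards, and so is the list order: undo both.
    my @fixed = map { scalar reverse($_) } @parts;
    return reverse @fixed;
}

my @tail = split_from_right(
    'file with spaces in it.zip a _ o r tmbranno ftp 0 * c', 9);
print join('|', @tail), "\n";
```

This prints `file with spaces in it.zip|a|_|o|r|tmbranno|ftp|0|*|c`. Because the filename is never split, any runs of several spaces inside it are preserved.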
Re: parsing a space-separated filename in a line with fields separated by spaces by FunkyMonk (Chancellor) on Aug 15, 2007 at 22:11 UTC
My take...

while ( <DATA> ) {
    chomp;
    my @fields1 = split ' ', $_, 9;
    my @fields2 = split / /, pop @fields1;
    if ( @fields2 > 10 ) {
        my @filename = splice @fields2, 0, @fields2 - 9;
        unshift @fields2, join ' ', @filename;
    }
    push @fields1, @fields2;
    printf "%3s %3s %2d %8s %4s %s %-14s %4d %-26s %s %s %s %s %-10s %3s %s %s %s\n", @fields1;
}
__DATA__
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 file with spaces in it.zip a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:23 2001 0 127.0.0.1 2611 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 1774034 a _ o r tmbranno ftp 0 * c
Mon Oct  1 17:09:31 2001 0 127.0.0.1 7276 p1774034_11i_zhs.zip a _ o r tmbranno ftp 0 * c

Output:

Mon Oct  1 17:09:27 2001 0 127.0.0.1        22 file with spaces in it.zip a _ o r tmbranno   ftp 0 * c
Mon Oct  1 17:09:23 2001 0 127.0.0.1      2611 1774034                    a _ o r tmbranno   ftp 0 * c
Mon Oct  1 17:09:27 2001 0 127.0.0.1        22 1774034                    a _ o r tmbranno   ftp 0 * c
Mon Oct  1 17:09:31 2001 0 127.0.0.1      7276 p1774034_11i_zhs.zip       a _ o r tmbranno   ftp 0 * c

Re: parsing a space-separated filename in a line with fields separated by spaces by mamawe (Sexton) on Aug 15, 2007 at 22:28 UTC
Would you mind using a regex?

while (<>) {
    if (/^(\w{3} \w{3} [ :\d]{16}) (\d+) ([.\d]+) (\d+) (.+) ([a]) ([_]) ([o]) ([r]) (\w+) (\w+) (\d) (\S) ([c])$/) {
        print "$1 $2 $3 $4 '$5' $6 $7 $8 $9 $10 $11 $12 $13 $14\n";
    }
}

This puts nice single quotes around the name, and you can access every field as well. You might even assign the captures to a list of variables.

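As a rough sketch of that last suggestion: a match in list context returns its captures, so they can be assigned directly to named variables. The variable names below are illustrative, and the tail of the pattern is deliberately loosened to "any nine space-separated tokens" rather than the literal `a _ o r ... c` values, which is an assumption about what other log lines may contain.

```perl
use strict;
use warnings;

my $line = 'Mon Oct  1 17:09:27 2001 0 127.0.0.1 22 file with spaces in it.zip a _ o r tmbranno ftp 0 * c';

# List-context match: the five captures land in five named variables.
# The nine trailing fields are matched by (?: \S+){9} but not captured.
my ($stamp, $xfer_time, $host, $size, $filename) =
    $line =~ /^(\w{3} \w{3} [ :\d]{16}) (\d+) ([.\d]+) (\d+) (.+)(?: \S+){9}$/;

# On a failed match the list is empty and every variable is undef.
print "'$filename' ($size bytes) from $host\n" if defined $filename;
```

This prints `'file with spaces in it.zip' (22 bytes) from 127.0.0.1`. Like the regex above, it relies on the fixed-width date (`[ :\d]{16}`) and on single spaces between fields.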
Re: parsing a space-separated filename in a line with fields separated by spaces by jwkrahn (Abbot) on Aug 15, 2007 at 22:30 UTC
Assuming that file names can also have leading and/or trailing spaces in them, for example ' file name ', then you may want something like this:

while ( <$in> ) {
    chomp;
    my %field;

    # remove and capture leading fields
    s/^ *(\S+) (\S+) +(\d+) ([\d:]+) (\d+) (\d+) ([\d.]+) (\d+) //
        and @field{ qw/ day_name month day current_time year transfer_time remote_host file_size / }
            = ( $1, $2, $3, $4, $5, $6, $7, $8 );

    # remove and capture trailing fields
    s/ (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+)$//
        and @field{ qw/ transfer_type special_action_flag direction access_mode username service_name authentication_method authenticated_user_id completion_status / }
            = ( $1, $2, $3, $4, $5, $6, $7, $8, $9 );

    # only thing left is file name
    $field{ filename } = $_;

    print "$_ = '$field{$_}'\n" for keys %field;
    print "\n";
}

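To check the claim about leading and trailing spaces, here is a condensed sketch of the same strip-both-ends idea that keeps only the filename. The helper name `filename_of` and the sample line (a filename of ` file name ` with one space on each side) are made up for illustration.

```perl
use strict;
use warnings;

# Strip the eight leading and nine trailing fields with anchored
# substitutions; whatever survives is the filename, spaces and all.
# (filename_of is a hypothetical name used only for this illustration.)
sub filename_of {
    my ($line) = @_;

    # day_name, month, then six more fields, each followed by ONE space,
    # so extra spaces after file_size stay with the filename.
    $line =~ s/^ *\S+ \S+ +(?:\S+ ){6}// or return;

    # Nine trailing fields, each preceded by exactly one space.
    $line =~ s/(?: \S+){9}$// or return;

    return $line;
}

# Two spaces after "22" and two before "a": one of each pair is a field
# separator, the other belongs to the filename.
my $name = filename_of(
    'Mon Oct  1 17:09:27 2001 0 127.0.0.1 22  file name  a _ o r tmbranno ftp 0 * c');
print "'$name'\n";
```

This prints `' file name '`, which a whitespace split followed by a join cannot reproduce.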
