PerlMonks  

parse a line with varied number of fields

by raggmopp (Novice)
on Jul 03, 2012 at 03:27 UTC ( #979570=perlquestion )
raggmopp has asked for the wisdom of the Perl Monks concerning the following question:

Hi all:

I am working with Perl 5.8.8 on RHEL 5.x, on an FTP server using vsftp. I want to get the file name from each line, but the file name could include spaces, so the number of fields per line can vary.

The file name, with its path, begins at the 9th field of a line. The last 9 fields are always present, as they pertain to the FTP process. So I am thinking of dropping the first 8 fields and the last 9 fields.

If there are no spaces in the file name, there will be 18 fields on a line. If there is a space, or spaces, in a file name then there will be 19, or more, fields on a line.

But the file name will always start at the 9th field and the last 9 fields can be dropped. What is in the middle, 1 field, 2 fields, 3 fields, ... will be the file name.

I can identify the lines I need, but when I see a space in a file name, or spaces, it kinda halts my thought.

Many thanks

The code I've put together. I am testing for incoming and a pattern match, 1 of 4 possibilities.

    my $vsftplog = "vsftplog";
    open(VLOG, $vsftplog) || die "Cannot open log file: $!\n";
    while (<VLOG>) {
        my @fields = split(/ /);
        if (( "$fields[-7]" eq "i" ) &&
            ( $fields[9] =~ m/(\/fileA-*?.?)|(\/fileB-*?.?)|(\fileC-*?.?)|(\/fileD-*?.?)/ )) {
            print "@fields";
        }
    }

From the lines that have been identified, cut the first 8 fields ($fields[0] .. $fields[7]) and cut the last 9 fields ($fields[-9] .. $fields[-1]). What is left is the file name with its path, including any spaces that exist in the file name.

Re: parse a line with varied number of fields
by davido (Archbishop) on Jul 03, 2012 at 03:45 UTC

    What does your current attempt look like, and how is it failing to meet your specification?


    Dave

      I have updated the question with some code. I can identify the lines, but the number of fields per line can vary.

      Cut fields 0-7 and cut the last 9 fields. What remains is the file name with spaces. The number of fields left to print could vary: 1 field, 2 fields, ...

      Thanks

Re: parse a line with varied number of fields
by Marshall (Prior) on Jul 03, 2012 at 05:34 UTC
    So you are trying to parse some output from an application?
    Right?

    Show some examples of what you are trying to parse and the expected result.
    Make this as simple as possible.
    If for example, the first 7 fields don't matter, don't show them.
    If you have some Perl code, show it.

      I have updated the question with some code. I can identify the lines, but the number of fields per line can vary.

      Cut fields 0-7 and cut the last 9 fields. What remains is the file name with spaces. The number of fields left to print could vary: 1 field, 2 fields, ...

      Thanks

        Ok, you have some code now which is a start. However, in general showing some code that "doesn't work" is not that useful unless you also include the actual input, actual output and DESIRED output of that code.

        The best is if I have a runnable code example to see exactly what it does given the input. Then some explanation of what you want it to do.

        Showing the DATA is important.
        Show some examples of these various types of lines - It is possible that I will code it very differently than your formulation.

        If there are X fields at the beginning of some huge line, maybe you don't have to show all of them. Maybe just one of the first 8-9 fields would be sufficient? Show one example of the "whole thing", then the simplified examples where the variations start.

Re: parse a line with varied number of fields
by AnomalousMonk (Monsignor) on Jul 03, 2012 at 06:31 UTC

    As davido and Marshall have pointed out, it would be nice to see some code, even a very bare beginning.

    However, a few hints and suggestions:

    • One might be tempted to extract the filename with a regex, but the use of split might be more direct if less elegant. One problem with split is that the fields matching the splitting regex are typically lost, so if you split on multiple spaces and the file name has multiple spaces between file name sub-fields such as
          my $string = 'aa bbb a  file   name cc ddd';
      this information is lost. This can be avoided by enclosing the split regex in capturing parentheses, which preserves fields matching the splitting regex, so
          my @fields = split /(\s+)/, $string;
      and everything can be joined back together again later.
    • Take a look at splice. This built-in will chop n elements off the end of an array such as might be produced by split (and return them, but you don't have to use them) if given a negative offset -n, and delete and return all elements from offset n to the end of an array if given a positive offset. You seem to know the constant field offsets from the start and end of the original string.
    • Last but not least, join, your constant companion.
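The three hints above can be combined into a minimal sketch. It assumes the layout described in the question (8 leading fields, 9 trailing fields) and a line with no leading or trailing whitespace other than the newline. The helper name extract_filename and the sample line are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the file-name portion of a line that has
# 8 leading fields and 9 trailing fields (the layout from the question).
sub extract_filename {
    my ($line) = @_;
    chomp $line;

    # Capturing parentheses make split return the separators as well,
    # so the list alternates: field, separator, field, separator, ...
    my @parts = split /(\s+)/, $line;

    # 18 fields is the minimum (8 + 1 + 9), with 17 separators between them.
    return if @parts < 35;

    # Drop the first 8 fields and the 8 separators that follow them.
    splice @parts, 0, 16;

    # Drop the last 9 fields and the 9 separators that precede them.
    splice @parts, -18;

    # Only the file-name fields and their original separators remain.
    return join '', @parts;
}

# Made-up sample line with irregular spacing inside the file name.
my $sample = 'a1 a2 a3 a4 a5 a6 a7 a8 /pub/my  file   name z1 z2 z3 z4 z5 z6 z7 z8 z9';
print '>', extract_filename($sample), "<\n";    # prints >/pub/my  file   name<
```

Because the separators are kept as list elements, each splice count is doubled relative to the number of fields being dropped.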

      Thanks. I will look into splice.

Re: parse a line with varied number of fields
by bitingduck (Friar) on Jul 03, 2012 at 06:48 UTC

    Can you use something like Net::SFTP instead? It can give you the filenames as a list, so you don't have to go parsing the lines yourself. It looks like it's been stable since before 5.8.8.

      Thanks. I will look into Net::SFTP. It may not work in this specific application, but I do have an SFTP site that will need parsing as well.

Re: parse a line with varied number of fields
by johngg (Abbot) on Jul 03, 2012 at 09:35 UTC

    As long as you have a constant number of fields and the only field that can contain spaces is your filename one, the problem can be solved using the third argument to split combined with reverse.

    knoppix@Microknoppix:~$ perl -Mstrict -Mwarnings -E '
    > my @lines = (
    >     q{1 2 3 4 5 6 7 8 filename 10 11 12 13 14 15 16 17 18},
    >     q{1 2 3 4 5 6 7 8 file name 10 11 12 13 14 15 16 17 18},
    >     q{1 2 3 4 5 6 7 8 new file name 10 11 12 13 14 15 16 17 18},
    > );
    >
    > foreach my $line ( @lines )
    > {
    >     my @flds = split m{\s+}, $line, 9;
    >     @flds = split m{\s+}, reverse( $flds[ -1 ] ), 10;
    >     my $filename = reverse $flds[ -1 ];
    >     say qq{>$filename<};
    > }'
    >filename<
    >file name<
    >new file name<
    knoppix@Microknoppix:~$

    I hope this is helpful.

    Cheers,

    JohnGG

      Thanks John. I have been thinking about reverse. Hoping to see if there is something better.

      At first glance this seemed unnecessarily complicated but then I thought, "No, it couldn't be doing that!" So I tried...
      #!/usr/bin/perl
      use Modern::Perl;

      my @lines = (
          q{1 2 3 4 5 6 7 8 filename 10 11 12 13 14 15 16 17 18},
          q{1 2 3 4 5 6 7 8 file name 10 11 12 13 14 15 16 17 18},
          q{1 2 3 4 5 6 7 8 new file name 10 11 12 13 14 15 16 17 18},
      );

      foreach my $line ( @lines )
      {
          my @flds = split m{\s+}, $line, 9;
          @flds = split m{\s+}, reverse( $flds[ -1 ] ), 10;
          my $filename = reverse $flds[ -1 ];
          say qq{>$filename<};
      }

      foreach my $line ( @lines ) {
          my @flds = split(' ', $line);
          my $filename = join(' ', @flds[8 .. $#flds - 9]);
          say qq{<$filename>};
      }
      Yes, yours preserves multiple spaces in the file name. Thanks for the enlightenment JohnGG!
        preserves multiple spaces in the file name

        In case others are puzzled let's share the enlightenment and explain what is going on. The reason spaces are preserved is that the file name field is never actually split. First, the line is broken into nine fields.

        1 2 3 4 5 6 7 8 file name 10 11 12 13 14 15 16 17 18
        ^ ^ ^ ^ ^ ^ ^ ^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        Then the last field from the first split operation is reversed and broken into 10 fields.

        81 71 61 51 41 31 21 11 01 eman elif
        ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^^^^^^^^

        Finally the last field from the second split operation is reversed to obtain the file name.

        file name

        Note that spaces in the file name will be preserved but if the file name has leading and/or trailing spaces the method fails as they will be lost.
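The behaviour described above, including the caveat about leading and trailing spaces, can be checked with a small sketch. The function name filename_by_reverse and both sample lines are hypothetical. The logic is the double split and reverse from the one-liner, with scalar added so that reverse performs string reversal regardless of calling context.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split/reverse technique shown above.
sub filename_by_reverse {
    my ($line) = @_;

    # First split: 8 leading fields, then the rest of the line unsplit.
    my @head = split /\s+/, $line, 9;

    # Second split, on the reversed remainder: the 9 trailing fields
    # (now at the front), then the reversed file name, unsplit.
    my @tail = split /\s+/, scalar reverse( $head[-1] ), 10;

    # Reverse the last piece again to restore the file name.
    return scalar reverse $tail[-1];
}

# Made-up lines: one with a double space inside the name, one with
# extra spaces on either side of the name.
my $inner = '1 2 3 4 5 6 7 8 new  file 10 11 12 13 14 15 16 17 18';
my $edges = '1 2 3 4 5 6 7 8   file   10 11 12 13 14 15 16 17 18';

print '>', filename_by_reverse($inner), "<\n";    # prints >new  file<
print '>', filename_by_reverse($edges), "<\n";    # prints >file<
```

In the second line the extra spaces are swallowed by the \s+ separator on each side, which is why they do not survive.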

        Cheers,

        JohnGG

Re: parse a line with varied number of fields
by BillKSmith (Chaplain) on Jul 03, 2012 at 14:22 UTC

    I feel that a REGEX makes the intent clear. Efficiency should not be an issue.

    use strict;
    use warnings;
    my $string
        = 'field1 ' . 'field2 ' . 'field3 ' . 'field4 ' . 'field5 '
        . 'field6 ' . 'field7 ' . 'field8 ' . 'field9 '
        . 'file name here '
        . 'field11 ' . 'field12 ' . 'field13 ' . 'field14 ' . 'field15 '
        . 'field16 ' . 'field17 ' . 'field18 ' . 'field19' . "\n";
    my $skip_field = qr /[^\s]+\s+/;   # Include any non-whitespace character
    my $file_field = qr /[\w\s]+/;     # Include word characters and spaces
    my ($file_name)
        = $string =~ m/\A$skip_field{9}($file_field)\s$skip_field{9}\z/;
    print $file_name, "\n";
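As a variation on the regex idea, here is a sketch adapted to the layout stated in the question: 8 leading fields rather than 9, and a file name that may contain path characters such as / and ., which [\w\s] would not match. The function name filename_by_regex and the sample line are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: 8 leading fields, the file name, 9 trailing fields.
sub filename_by_regex {
    my ($line) = @_;
    my ($name) = $line =~ m{
        \A
        (?: \S+ \s+ ){8}    # skip the 8 leading fields
        ( .+? )             # file name: shortest run that lets the rest match
        (?: \s+ \S+ ){9}    # exactly 9 trailing fields
        \s* \z              # optional trailing whitespace or newline
    }x;
    return $name;           # undef when the line does not fit the layout
}

# Made-up sample line with a path and a double space in the file name.
my $sample = "a1 a2 a3 a4 a5 a6 a7 a8 /pub/my  file.txt z1 z2 z3 z4 z5 z6 z7 z8 z9\n";
print '>', filename_by_regex($sample), "<\n";    # prints >/pub/my  file.txt<
```

The non-greedy capture can only stop at a field boundary with exactly 9 fields remaining, so spacing inside the name is kept while whitespace around it is left to the separators.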
