http://www.perlmonks.org?node_id=1013867

brad_nov has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality.

Re: Split file using perl and regexp
by davido (Cardinal) on Jan 17, 2013 at 19:45 UTC

    Could you explain how this question substantially or conceptually differs from "Split a file based on column", which you posted (and followed up on) yesterday? I thought we already dealt with this.

    What did you mean, in that thread, by "Thanks, got it working"?


    Dave

      I mean I was able to split the file based on the solution given by Kenosis. It's an extension to yesterday's problem. Sorry if I was not clear. Thanks.

        If it's an extension of yesterday's problem, post the code that you're currently using so that we can help in extending it.

        Otherwise it just looks like you're making zero progress on your own, and hoping someone will do free work for you.


        Dave

Re: Split file using perl and regexp
by ww (Archbishop) on Jan 17, 2013 at 19:43 UTC
    ... or into hiring a programmer?

    You've outlined, by example, a longish spec ...but haven't shown any hint of an attempt to solve your problem, even though you've asked similar questions at least 3 times in the past 60 days.

    Where's your code? Precisely, what's wrong with it?

Re: Split file using perl and regexp
by keszler (Priest) on Jan 17, 2013 at 19:36 UTC
    Have you looked into using any of the various CSV modules?

Re: Split file using perl and regexp
by AnomalousMonk (Archbishop) on Jan 17, 2013 at 22:37 UTC

    Here's an approach based on the observation of certain similarities (common prefix characters) in the data fields of interest in the three different types of data files. No discrimination between the three data file types is needed in the code.

    Some notes of caution:

    • The code shown assumes the data being fed to it is valid. It is intended only as an example of a regex-based approach.
    • The code is critically dependent on the definition of the $rx_oct regex. (I gave it this name because it superficially suggests an IP octet.) The OP shows only limited examples of this sub-field in the range (10 .. 13). You (brad_nov) will have to change this regex to reflect the real data – or else maybe reveal an actual spec!
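    As an illustration of the second caution, here is one way the sub-field regex might be tightened. This is only a sketch and not part of the original reply: the 0 .. 255 range and the names $rx_oct_strict and find_dotted are assumptions made up for the example, since the OP never gave a spec.

```perl
use strict;
use warnings;

# ASSUMPTION: each sub-field is a decimal number in 0 .. 255 (true IP-octet
# style). The OP's data only shows values 10 .. 13, so adjust as needed.
my $rx_oct_strict = qr{ (?: 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d ) }xms;

# Five dot-separated sub-fields, not embedded in a longer run of digits.
my $rx_quint  = qr{ $rx_oct_strict (?: \. $rx_oct_strict){4} }xms;
my $rx_dotted = qr{ (?<! \d) $rx_quint (?! \d) }xms;

# Hypothetical helper: return the first dotted quintuple in $text, or undef.
sub find_dotted {
    my ($text) = @_;
    my ($dotted) = $text =~ m{ ($rx_dotted) }xms;
    return $dotted;
}

for my $text ('~10.10.10.10.12~', '~999.10.10.10.10~') {
    my $found = find_dotted($text) // '(no match)';
    print "$text => $found\n";
}
```

    With the looser \d{1,3} definition the second string would be accepted as a quintuple; under the assumed 0 .. 255 range it is not.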

    >perl -wMstrict -le
    "my @records = (
       '1|1212|34353|56575|||||4|~some~~pi=[10.10.10.10.10],uid=[11]}~',
       '1|1212|34353|56575|||||4|~som~~390=10.10.10.10.11,391=222,394~',
       '1|1212|34353|56575|||||4|~somedata~10.10.10.10.12~3333~~a~~~~',
       );
     ;;
     my $rx_oct   = qr{ \d{1,3} }xms;
     my $rx_quint = qr{ $rx_oct (?: \. $rx_oct){4} }xms;
     ;;
     my $rx_dotted = qr{ (?<! \d) $rx_quint (?! \d) }xms;
     my $rx_int    = qr{ \d+ }xms;
     ;;
     for my $record (@records) {
       print qq{'$record'};
       my ($const, $var) = $record =~ m{ ( \A .+) \| ( .* \z) }xms;
       my (undef, $dotted, $int) =
           $var =~ m{ (\D) ($rx_dotted) .*? \1 ($rx_int) }xms;
       my $new_record = join '|', $const, $dotted, $int;
       print qq{'$new_record' \n};
       }
    "
    '1|1212|34353|56575|||||4|~some~~pi=[10.10.10.10.10],uid=[11]}~'
    '1|1212|34353|56575|||||4|10.10.10.10.10|11'

    '1|1212|34353|56575|||||4|~som~~390=10.10.10.10.11,391=222,394~'
    '1|1212|34353|56575|||||4|10.10.10.10.11|222'

    '1|1212|34353|56575|||||4|~somedata~10.10.10.10.12~3333~~a~~~~'
    '1|1212|34353|56575|||||4|10.10.10.10.12|3333'

    Update: After playing around with this a bit and doing a little, um, testing, I think I would change the definition of $rx_dotted as follows (the change is to the final look-ahead):
        my $rx_dotted = qr{ (?<! \d) $rx_quint (?! [.\d]) }xms;
    This change does not affect behavior for valid records.
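    To see what the changed look-ahead does on records that are not valid, the two definitions can be compared side by side. This is a rough sketch, not part of the original reply: the names $rx_dotted_old, $rx_dotted_new and first_match, and the malformed test strings, are invented for illustration.

```perl
use strict;
use warnings;

my $rx_oct   = qr{ \d{1,3} }xms;
my $rx_quint = qr{ $rx_oct (?: \. $rx_oct){4} }xms;

# Original definition and the updated one (only the look-ahead differs).
my $rx_dotted_old = qr{ (?<! \d) $rx_quint (?! \d)    }xms;
my $rx_dotted_new = qr{ (?<! \d) $rx_quint (?! [.\d]) }xms;

# Hypothetical helper: first match of $rx in $text, or undef if none.
sub first_match {
    my ($rx, $text) = @_;
    my ($hit) = $text =~ m{ ($rx) }xms;
    return $hit;
}

# One valid field, one with a trailing dot, one with six sub-fields.
for my $text ('~10.10.10.10.12~', '~10.10.10.10.12.~', '~1.2.3.4.5.6~') {
    my $old = first_match($rx_dotted_old, $text) // '(no match)';
    my $new = first_match($rx_dotted_new, $text) // '(no match)';
    print "$text  old: $old  new: $new\n";
}
```

    Both versions agree on the valid field. The trailing-dot string is matched by the original definition but not by the updated one. The six-sub-field string is still matched by the updated definition, which picks up the last five sub-fields instead of the first five, so the "valid data" caution above continues to apply.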

Re: Split file using perl and regexp
by Anonymous Monk on Jan 18, 2013 at 01:00 UTC