PerlMonks  

Break a file into separate files based on string match

by hvirani (Initiate)
on Dec 29, 2017 at 18:23 UTC (#1206430)
hvirani has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a txt file containing
test/foo/bar : tag1
test/abc/xyz : tag1
test/def/abc : tag2
test/bar/foo : tag2
test/dummy/foo : tag1
I want to split the content into 2 files, tag1.txt and tag2.txt, each containing the test/... strings associated with that tag. E.g. tag1.txt will contain
test/foo/bar
test/abc/xyz
test/dummy/foo
and similarly for tag2.txt. How can I do this?

Replies are listed 'Best First'.
Re: Break a file into separate files based on string match
by shmem (Chancellor) on Dec 30, 2017 at 00:12 UTC
    How can I do this?

    What Anonymous Monk says. But that's probably too dense at first glance, so let's go step by step.
    You have a well-formed input file containing two fields separated by ' : '.
    The first field is data; the second field names the destination file.
    To write the data portion to its destination, you have to open (or create) the respective file for writing. See open.
    While processing each line, you want to separate the data from the destination name. See split.
    Now, how do you get from the destination name to the filehandle to print to? You store the filehandle in a hash. See perldata.
    A hash can store filehandles as values, retrieved via a string key. Since you look up the destination for each line of input anyway, you need not open the files beforehand: you can combine the lookup with a logical or, so that if no filehandle exists yet, you open the file and store its filehandle in the hash under the respective destination key.

    How could you put that into code? Well, first let's set up a filehandle hash containing the tokens and filehandles for the destination. See my.

    my %fh;

    Reading a file given as an argument at the command line is easy:

    while (<>) {    # the line read is stored into $_
        chomp;      # remove line ending character from $_
        ...
    }

    See readline for the diamond operator <>. See chomp.
    Split the line based on your separator (see perlre, perlop and perlfunc):

    my( $data, $fh_token ) = split /\s*:\s*/;

    Now get the filehandle for $fh_token; if it doesn't exist yet, create one and store it in %fh:

    if ( !$fh{$fh_token} ) {
        open my $fh, '>', "$fh_token.txt"
            or die "Can't write to '$fh_token.txt': $!\n";
        $fh{$fh_token} = $fh;    # store filehandle
    }
    my $fh = $fh{$fh_token};     # this is the filehandle to print to

    Then print the data stuff to the determined filehandle:

    print $fh $data, "\n";    # append a line break (on Windows, text mode turns "\n" into "\r\n" for you)

    So, we have:

    my %fh;
    while (<>) {    # the line read is stored into $_
        chomp;      # remove line ending character from $_
        my( $data, $fh_token ) = split /\s*:\s*/;
        if ( !$fh{$fh_token} ) {
            open my $fh, '>', "$fh_token.txt"
                or die "Can't write to '$fh_token.txt': $!\n";
            $fh{$fh_token} = $fh;    # store filehandle
        }
        my $fh = $fh{$fh_token};
        print $fh $data, "\n";    # append a line break (on Windows, text mode turns "\n" into "\r\n" for you)
    }

    The one-liner by Anonymous Monk makes use of features built into perl. See perlrun and IO::File. Condensing this code to resemble that succinct variation is left as an exercise to the reader. TIMTOWTDI (there's more than one way to do it).
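
    One possible intermediate step between the long form and the one-liner, offered as a sketch rather than the definitive answer: the "look up the filehandle, or open it" logic can be folded into a single defined-or assignment (//=, perl 5.10 or later). The sub name split_by_tag and its $dir parameter are hypothetical, introduced only so the loop can be called on any input handle; the split limit of 2 and the skipping of malformed lines are additions not present in the code above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread): read "data : tag" lines
    # from the filehandle $in and write each data field to "$dir/<tag>.txt".
    # Returns the sorted list of tags seen.
    sub split_by_tag {
        my ($in, $dir) = @_;
        my %fh;    # tag => output filehandle, opened on first use

        while (my $line = <$in>) {
            chomp $line;
            my ($data, $tag) = split /\s*:\s*/, $line, 2;
            next unless defined $tag && length $tag;    # skip malformed lines

            # Reuse the stored handle; open the file only if none exists yet.
            my $out = $fh{$tag} //= do {
                open my $h, '>', "$dir/$tag.txt"
                    or die "Can't write to '$dir/$tag.txt': $!\n";
                $h;
            };
            print {$out} $data, "\n";
        }

        # Close explicitly so that write errors are not silently lost.
        for my $tag (keys %fh) {
            close $fh{$tag} or die "Can't close '$dir/$tag.txt': $!\n";
        }
        return sort keys %fh;
    }

    # Usage (assuming the input lives in data.txt):
    #   open my $in, '<', 'data.txt' or die "Can't read 'data.txt': $!\n";
    #   split_by_tag($in, '.');
    ```

    Keeping the open inside a do block means the error check stays attached to the open itself, which the bare one-liner leaves out.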

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Break a file into separate files based on string match
by 1nickt (Monsignor) on Dec 29, 2017 at 18:35 UTC

    Hi, what have you tried, and how did it not work for you?

    Update: ahh, what's the use? Don't copy this:

    use strict; use warnings; use feature 'say';
    use Path::Tiny;

    my %file_map = ( tag1 => Path::Tiny->tempfile, tag2 => Path::Tiny->tempfile );

    chomp( my @lines = <DATA> );

    for ( @lines ) {
        my ( $txt, $tag ) = split / : /;
        path( $file_map{ $tag } )->append("$txt\n");
    }

    for ( keys %file_map ) {
        say $file_map{ $_ };
        say $file_map{ $_ }->slurp;
    }

    __END__
    test/foo/bar : tag1
    test/abc/xyz : tag1
    test/def/abc : tag2
    test/bar/foo : tag2
    test/dummy/foo : tag1

    Output:
    perl 1206430.pl
    /tmp/k6iEPAjgjR
    test/def/abc
    test/bar/foo
    /tmp/aFliecLlJN
    test/foo/bar
    test/abc/xyz
    test/dummy/foo


    The way forward always starts with a minimal test.
Re: Break a file into separate files based on string match
by tybalt89 (Priest) on Dec 29, 2017 at 19:35 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1206430

    use strict;
    use warnings;
    use Path::Tiny;

    my %parts;

    while(<DATA>)
      {
      /^(.*) : (\S+)$/ and push @{ $parts{"$2.txt"} }, "$1\n";
      }

    for my $filename (sort keys %parts)
      {
      path($filename)->spew( $parts{$filename} );
      }

    __DATA__
    test/foo/bar : tag1
    test/abc/xyz : tag1
    test/def/abc : tag2
    test/bar/foo : tag2
    test/dummy/foo : tag1
Re: Break a file into separate files based on string match
by karlgoethebier (Monsignor) on Dec 30, 2017 at 13:43 UTC

    A solution using Path::Tiny, map and grep:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use Path::Tiny;
    use Data::Dump;
    use feature qw(say);

    my @data = path("data.txt")->lines( { chomp => 1 } );

    dd \@data;

    my @tag1 = map { "$_\n" } grep { defined } map { /(.+) : tag1/; $1 } @data;
    my @tag2 = map { "$_\n" } grep { defined } map { /(.+) : tag2/; $1 } @data;

    dd \@tag1;
    dd \@tag2;

    path("tag1.txt")->spew(@tag1);
    path("tag2.txt")->spew(@tag2);

    say path("tag1.txt")->slurp;
    say path("tag2.txt")->slurp;

    __END__

    karls-mac-mini:hvirani karl$ ./hvirani.pl
    [
      "test/foo/bar : tag1",
      "test/abc/xyz : tag1",
      "test/def/abc : tag2",
      "test/bar/foo : tag2",
      "test/dummy/foo : tag1",
    ]
    ["test/foo/bar\n", "test/abc/xyz\n", "test/dummy/foo\n"]
    ["test/def/abc\n", "test/bar/foo\n"]
    test/foo/bar
    test/abc/xyz
    test/dummy/foo
    test/def/abc
    test/bar/foo

    Best regards, Karl

    Minor update: Added missing square bracket in output.

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Break a file into separate files based on string match
by Anonymous Monk on Dec 29, 2017 at 22:44 UTC

    $ perl -lanF'\s:\s' -e'print {$F{$F[1]} //= IO::File->new("$F[1].txt", "w")} $F[0]'
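
    For readers new to the switches, here is a rough long-form reading of that one-liner, as a sketch and not the poster's own code: -n wraps the code in a while (<>) loop, -l chomps each line and makes print append a newline, and -a with -F'\s:\s' splits each line into @F on that pattern. The wrapper sub split_like_oneliner is hypothetical, added only so the loop can be called with a list of input files. The field-count guard, the die on a failed open, and the explicit "use IO::File" are additions too.

    ```perl
    use strict;
    use warnings;
    use IO::File;

    # Hypothetical wrapper (not from the thread): runs the one-liner's loop
    # over the given input files, writing "<tag>.txt" in the current directory.
    sub split_like_oneliner {
        local @ARGV = @_;            # files that <> will read
        local $\ = "\n";             # -l: print appends a newline
        local $_;                    # keep the caller's $_ intact
        my %F;                       # tag => IO::File handle (the one-liner's %F)

        while (<>) {                 # -n: loop over every input line
            chomp;                   # -l: strip the line ending
            my @F = split /\s:\s/;   # -a with -F'\s:\s': fields go into @F
            next unless @F == 2;     # guard added here; not in the one-liner

            # Reuse the stored handle, or open "<tag>.txt" on first sight.
            print { ($F{$F[1]} //= IO::File->new("$F[1].txt", "w"))
                        or die "Can't write to '$F[1].txt': $!\n" } $F[0];
        }
        return;                      # %F goes out of scope; handles are closed
    }

    # Usage (assuming the input lives in data.txt):
    #   split_like_oneliner('data.txt');
    ```

    Note that %F (the hash of handles) and @F (the split fields) are separate variables that merely share a name, which is part of what makes the original so compact.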
