PerlMonks  

Re: filter the files in a folder on the basis of some variable present in them

by Discipulus (Canon)
on Jun 10, 2015 at 08:34 UTC


in reply to filter the files in a folder on the basis of some variable present in them

Mmh, I'm not sure I have understood your whole question. Anyway, I have some hints for you.

First, your method of getting the file list ($i = 6000000) is, mmh, how to say, bizarre? Perl has a glob function; use it.
Second, never use bareword filehandles (FH); use the lexical form.

Third, if you really have that many files, copy a small bunch of them (with some positive cases included) into a development directory and write your script against them, so you'll get fast feedback.

A basic approach could be similar to this pseudo-code:

#pseudo-code
$|++; #flush output to stdout as soon as possible
my @files = glob '*.txt';
foreach my $file (@files){
    my ($var_one, $var_two ...); #your var names to be checked
    open my $fh, '<', $file or die "...";
    while (<$fh>) {
        #use regex to put something inside $var_one, $var_two
    }
    #
    if (all vars needed are defined and pass your check){
        print "$file IS VALID\n";
        system "mv $file /new/path"   # double quotes, so that $file is interpolated
    }
}
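The last step of the pseudo-code shells out to mv, which may not exist on every system (the original question appears to involve Windows paths). A minimal sketch of the same step using the core File::Copy module instead; this swaps in a different technique from the one in the post, and the helper name move_to_dir is hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: move $file into $target_dir, keeping its base name.
# Returns the new path on success; dies with the OS error otherwise.
sub move_to_dir {
    my ($file, $target_dir) = @_;
    my $dest = File::Spec->catfile($target_dir, basename($file));
    move($file, $dest) or die "cannot move '$file' to '$dest': $!";
    return $dest;
}
```

Inside the loop, a call such as move_to_dir($file, '/new/path') would then take the place of the system line.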


HtH
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Replies are listed 'Best First'.
Re^2: filter the files in a folder on the basis of some variable present in them
by reciter (Novice) on Jun 10, 2015 at 09:54 UTC

    Hi Discipulus, I am a novice at Perl, which is why I make so many blunders. I tried another way to filter the files: instead of using x2, z2 and the rest as variables, I tried to search for them as patterns. But this also has a problem. Can you please check it and tell me what I can do to make it work? I have updated it once.

    #usr/bin/perl
    use strict;
    use warnings;
    $|++; #flush output to stdout as soon as possible
    my @files = glob '*.txt';
    my $pat='/^x2=0.[6-9][0-9][0-9][0-9][0-9]\n';
    my $pat1='/^z2=0.[6-9][0-9][0-9][0-9][0-9]\n';
    my $pat3='/^some_t2=0.[6-9][0-9][0-9][0-9][0-9]\n';
    foreach my $file (@files) {
        foreach my $file1(@files) {
            foreach my $files2(@files) {
                if (($file=~/^$pat/) && ($file1=~/^$pat1/) && ($$file2=~/^$pat3/)) {
                    print "$file IS VALID\n";
                    system 'mv $file E:\test\some'
                }
            }
            else print "not relevant\n"
    }

      "Instead of using x2, z2 and the rest as variables, I tried to search for them as patterns"??
      Also, the triple foreach makes no sense to me. First, a regex can be precompiled using the qr operator. Second, you are applying the regex to the filename, not to the content! You need a lot of practice with Perl basics: opening files, regexes, loops...
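To illustrate the difference between matching a file's name and matching its content, here is a minimal sketch. The helper name first_match_in_file is hypothetical, and the pattern escapes the dot (0\.), which is slightly stricter than the patterns posted in this thread.

```perl
use strict;
use warnings;

# Precompiled once with qr//, then reused for every line of every file.
my $pat = qr/^x2=(0\.[6-9])$/;

# Hypothetical helper: return the captured value from the first line of
# $file that matches $re, or undef when no line matches.
sub first_match_in_file {
    my ($file, $re) = @_;
    open my $fh, '<', $file or die "cannot open '$file': $!";
    while (my $line = <$fh>) {
        chomp $line;
        return $1 if $line =~ $re;
    }
    return;
}
```

A test such as 'valid.txt' =~ $pat only ever looks at the name and cannot succeed, whereas first_match_in_file('valid.txt', $pat) opens the file and checks each line.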

      Anyway, following the basic structure mentioned by me above, and given the following folder content:
      ls -l
      -rw-rw-rw- 1 user group   26 Jun 10 13:27 invalid.txt
      -rw-rw-rw- 1 user group 1049 Jun 10 13:44 reciter.pl
      -rw-rw-rw- 1 user group   36 Jun 10 13:28 valid.txt

      cat invalid.txt
      dfd
      wdfq
      qwef
      z2=0.7

      cat valid.txt
      adf
      df
      x2=0.7
      z2=0.7
      some_t2=0.7
      you must have something like (tested working code):
      #!/usr/bin/perl
      use strict;
      use warnings;
      $|++; #flush output to stdout as soon as possible
      my @files = glob '*.txt';
      my $pat = qr/^x2=(0.[6-9])$/;
      my $pat1= qr/^z2=(0.[6-9])$/;
      my $pat3= qr/^some_t2=(0.[6-9])$/;
      foreach my $file (@files){
          my ($var_one, $var_two, $var_three); #your var names to be checked
          print "checking '$file'\n";
          open my $fh, '<', $file or die "...";
          while (<$fh>) {
              #use regex to put something inside $var_one, $var_two..
              chomp $_;
              if ($_ =~ $pat) {$var_one = $1; print "\tfound:'$_'\n"}
              if ($_ =~ $pat1) {$var_two = $1; print "\tfound:'$_'\n"}
              if ($_ =~ $pat3) {$var_three = $1; print "\tfound:'$_'\n"}
          }
          #
          #if (all vars needed are defined and pass your check){ print "$file IS VALID\n"; system 'mv $file /new/path' }
          if (defined $var_one && defined $var_two && defined $var_three ) {
              print "FILE $file has a valid content ( x2=$var_one, z2=$var_two, some_t2=$var_three)\n";
              # system "mv $file x:/valid_files"
          }
      }
      and the output will be:
      perl reciter.pl
      checking 'invalid.txt'
          found:'z2=0.7'
      checking 'valid.txt'
          found:'x2=0.7'
          found:'z2=0.7'
          found:'some_t2=0.7'
      FILE valid.txt has a valid content ( x2=0.7, z2=0.7, some_t2=0.7)


      HtH
      L*
        Hey Discipulus, sorry to bother you once again, but when I run the code you wrote, I get this output:
        perl discipulus.pl
        checking 'invalid.txt'
            found:'z2=0.7'
        checking 'valid.txt'
            found:'some_t2=0.7'
        Can you please look at it one more time? Yes, I know I need to practice Perl a lot, and I am going to do it (help me with this).
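The thread does not say what went wrong here. One possible cause, offered purely as an assumption, is Windows-style CRLF line endings in the data files: with the default $/, chomp removes only "\n", so a trailing "\r" would remain and the $ anchor would fail on every line except a final line that has no line ending at all. That would be consistent with only the last line of each file being reported. A minimal sketch of the effect, with a hypothetical helper strip_eol:

```perl
use strict;
use warnings;

my $pat = qr/^x2=(0.[6-9])$/;    # same pattern as in the posted script

# Hypothetical helper: strip any trailing CR/LF characters, not just "\n".
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

my $raw = "x2=0.7\r\n";          # a line as it might be read from a CRLF file

my $chomped = $raw;
chomp $chomped;                  # leaves "x2=0.7\r"

print "chomp only: ", ($chomped        =~ $pat ? "match" : "no match"), "\n";
print "strip_eol:  ", (strip_eol($raw) =~ $pat ? "match" : "no match"), "\n";
```

If this is the cause, replacing chomp $_ with something like s/[\r\n]+\z// inside the while loop should make all three lines match.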

      A few problems with your code:

      • the regular expressions (regexes or patterns) are inadequate
      • you're not searching each file for the patterns (you're searching the file names)

      I also recommend precompiling the regexes before use via the qr// operator. Try to use those clues (and the help from Discipulus) to modify your code.
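As a hedged illustration of the first point: in the novice script the patterns are single-quoted strings that begin with a literal /, so that slash becomes a character to be matched rather than a delimiter. A minimal sketch of building the patterns with qr// instead; the helper name value_pattern is hypothetical, and accepting any number of further digits after the first decimal is an assumption about what was intended.

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored, precompiled pattern for
# "name=value" lines whose value is 0.6 to 0.9 plus optional further digits.
sub value_pattern {
    my ($name) = @_;
    return qr/^\Q$name\E=(0\.[6-9][0-9]*)$/;
}

# One precompiled pattern per variable name used in this thread.
my %pattern = map { $_ => value_pattern($_) } qw(x2 z2 some_t2);
```

Each line read from a file can then be tested with $line =~ $pattern{x2} and so on, and $1 holds the captured value after a successful match.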
