
Re: text processing

by Limbic~Region (Chancellor)
on Apr 22, 2014 at 17:23 UTC


in reply to text processing

DAVERN,
There are many ways to do this; one of Perl's mottos is "There's more than one way to do it."

The first method is probably the most common and the easiest to think of. At the top of the loop, skip the lines that you don't want:

    while (<DATA>) {
        next if ! /^DATA/;
        # ...
    }

Another common first step might be to throw away the first N lines:

    <DATA> for 1 .. 3;
    while (<DATA>) {
        # ...
    }
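
The two ideas above can be combined. What follows is only a minimal sketch: the sample text and the helper name `wanted_lines` are made up for illustration, and an in-memory filehandle stands in for the real log file.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real log layout may differ.
my $text = <<'END_TEXT';
TABLE NAME
HEAD0 HEAD1 HEAD2
DATA00 DATA10 DATA20

DATA01 DATA11 DATA21
END
END_TEXT

# Return the lines that start with DATA, after discarding $skip header lines.
sub wanted_lines {
    my ($fh, $skip) = @_;
    scalar <$fh> for 1 .. $skip;     # throw away the first $skip lines
    my @keep;
    while (my $line = <$fh>) {
        next if $line !~ /^DATA/;    # skip anything that is not a data line
        chomp $line;
        push @keep, $line;
    }
    return @keep;
}

open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @lines = wanted_lines($fh, 2);
close $fh;

print "$_\n" for @lines;
```

With the sample above, only the two DATA lines are printed. The explicit `scalar` just makes it obvious that each discarded read consumes one line.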

Sometimes it gets more complicated and you need to check a state variable against multiple lines. I won't give you an example of that abstract case but I will show you what some people do:

    my $found_start_of_data;
    while (<DATA>) {
        # ... some complex code that sets the flag
        last if $found_start_of_data;
    }
    while (<DATA>) {
        # ...
    }

The above is a micro-optimization that you should avoid unless you need it. It avoids paying for the "are we in the good data yet?" check on every line by starting a second loop that contains only the processing we care about. Note that the `last` comes after the code that sets the flag; if it sat at the top of the loop, the first loop would read and discard one more line before exiting.
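
As a concrete, if contrived, illustration of the two-loop pattern, here is a sketch in which the data is assumed to begin after a HEAD line followed by a dashed rule. That layout, the sample text, and the variable `$previous` are assumptions made for the example, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical input: data begins after a HEAD line followed by a dashed rule.
my $text = <<'END_TEXT';
some preamble
HEAD0 HEAD1
-----------
DATA00 DATA10
DATA01 DATA11
END_TEXT

open my $fh, '<', \$text or die "Can't open in-memory file: $!";

# Phase 1: scan until the start of the data has been seen.
my $found_start_of_data;
my $previous = '';
while (my $line = <$fh>) {
    # The flag depends on two consecutive lines, hence the state variable.
    $found_start_of_data = 1 if $previous =~ /^HEAD/ && $line =~ /^-+$/;
    $previous = $line;
    last if $found_start_of_data;    # leave before reading another line
}

# Phase 2: every remaining line is data, so no flag check is needed.
my @rows;
while (my $line = <$fh>) {
    chomp $line;
    push @rows, [ split ' ', $line ];
}
close $fh;

print join(',', @$_), "\n" for @rows;
```

Because both loops read from the same filehandle, the second loop simply continues where the first one stopped.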

The final method I will share is where you extract or eliminate what you don't want.

    # Extract
    my ($want) = $data =~ m{(some_regular_expression)};

    # Eliminate
    $data =~ s{some_regular_expression}{};
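
To make the extract/eliminate idea concrete, here is a small sketch that assumes the whole input is already in one string. The sample string and both regular expressions are illustrative guesses, not patterns taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample: the whole input held in one string.
my $data = "TABLE NAME\nHEAD0 HEAD1\nDATA00 DATA10\nDATA01 DATA11\nEND\n";

# Extract: capture only the block of consecutive DATA lines.
my ($want) = $data =~ m{((?:^DATA[^\n]*\n)+)}m;

# Eliminate: work on a copy and delete every line that does not start with DATA.
(my $rest = $data) =~ s{^(?!DATA)[^\n]*\n}{}mg;

print $want;
```

On this sample both approaches leave the same two DATA lines; which one reads better usually depends on whether the wanted or the unwanted part is easier to describe.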
As you can see, there are many ways to do what you are looking to accomplish. If you don't understand something, please ask.

Cheers - L~R

Re^2: text processing
by DAVERN (Initiate) on Apr 22, 2014 at 17:49 UTC

    Hi Limbic~Region, I did it in two separate programs. The first one deletes the lines I do not use from the file and writes a new file; the second one processes the rest of the text. I want to join them into one program but cannot find a way.

    use File::Slurp;    # read_file is assumed to come from File::Slurp

    my $output = 'output.txt';
    open my $outfile, '>', $output or die "Can't write to $output: $!";

    my @array = read_file('file1.log');
    for (@array) {
        next if /^TABLE NAME|HEAD0|END|^\s+$/;
        print $outfile $_;
    }

    Second program:

    open my $IN, '<', 'output.txt' or die $!;
    my @lines = <$IN>;
    close $IN;

    open my $OUT, '>', 'file2.txt' or die $!;
    for my $line (@lines) {
        chomp $line;
        my @data = split /\s+/, $line;
        print {$OUT} "xxxxx", $data[0], "yyy", $data[2], ";", "\n";
    }
    close $OUT;

    I have no idea how to do it all in a single program.

    BR

      Your focus appears to be all wrong. If you are looking for something specific in a file, why not just select that thing?

      my @output = ();
      while (<DATA>) {
          next unless (m/DATA/);
          my $line = $_;
          while ($line =~ m/(DATA\d+)/g) {
              push @output, $1;
          }
      }
      print join qq|,|, map {qq~xxx=$_~} @output;
      print qq|;\n|;
      1;
      __END__
      TABLE NAME
      HEAD0 HEAD1 HEAD2
      DATA00 DATA10 DATA20
      DATA01 DATA11 DATA21
      END
      Produces...
      xxx=DATA00,xxx=DATA10,xxx=DATA20,xxx=DATA01,xxx=DATA11,xxx=DATA21;
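
      If the output format of the two-program version is what is wanted, the filter and the formatting can probably be merged into a single pass. The following is only a sketch under assumptions: the sub name `convert` is made up, in-memory handles stand in for file1.log and file2.txt, and the `$data2` in the posted code is read here as `$data[2]`, which is a guess.

      ```perl
      use strict;
      use warnings;

      # Filter and reformat in one pass: skip unwanted lines, then print the
      # selected columns. The column choice (0 and 2) is an assumption.
      sub convert {
          my ($in, $out) = @_;
          while (my $line = <$in>) {
              # Same filter as the first program.
              next if $line =~ /^TABLE NAME|HEAD0|END|^\s+$/;
              chomp $line;
              my @data = split /\s+/, $line;
              print {$out} "xxxxx", $data[0], "yyy", $data[2], ";\n";
          }
          return;
      }

      # In-memory handles stand in for file1.log and file2.txt.
      my $input  = "TABLE NAME\nHEAD0 HEAD1 HEAD2\nDATA00 DATA10 DATA20\nDATA01 DATA11 DATA21\nEND\n";
      my $result = '';
      open my $in,  '<', \$input  or die "Can't read input: $!";
      open my $out, '>', \$result or die "Can't write output: $!";
      convert($in, $out);
      close $in;
      close $out;

      print $result;
      ```

      For real files, the two `open` calls would take 'file1.log' and 'file2.txt' instead of the scalar references, and the intermediate output.txt is no longer needed.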

      Celebrate Intellectual Diversity
