PerlMonks  

Line by line parsing from one file, comparing line by line to another file

by web_developer (Initiate)
on Jul 05, 2009 at 03:20 UTC (#777297, perlquestion)
web_developer has asked for the wisdom of the Perl Monks concerning the following question:

OK, I am new to Perl scripting.
I need help with a script that is supposed to do the following:
file1 = a list of unique ID numbers
file2 = a list of unique HTML snippets, one per line (about 16 KB each, separated by carriage returns), each containing one of the unique ID numbers.

For file1.txt
123432
432342
456645
and so on, for example.
file2.txt
<html>...... UNIQUE ID..</html>
<html>...... UNIQUE ID..</html>
and so on.
Both are standard text files.
===========UPDATE============
July 4, 2009 12:09:22 AM EST
For example, if line 1 of file1 has 12345 and
line 1 of file2 has <html>...... 12345..</html>,
I need line 1 of file2 to
be written to a new file called 12345.html
==========UPDATE===========
July 4, 2009 12:14:56 AM EST
In the script, file2 is read into the data_file array and
file1 into the names_file array; the result should be
several unique (names).html
files.
===========
#!/usr/bin/perl
#read each line in test1.txt into data_file array
$data_file="test1.txt";
open(DATA, $data_file) || die("Could not open file!");
@data_file=<DATA>;

#read each line in code.txt into a names_file array
$names_file="code.txt"
open(NAMES, $names_file) || die("Could not open file!");
@names_data=<NAMES>;

#create loop that reads each ID in code.txt (NAMES array), searches for
#each in array elements for test1.txt (DATA array), redirects a new
#(NAMES).html for each element
foreach ( $NAMES )
{
chomp($NAMES);
($NAMES=$DATA<0> > +("$NAMES<0>.html"));
}

close NAMES;
close DATA;

I am new to Perl, and although I wrote this following examples of similar scripts, it is absolutely riddled with errors.

Re: Line by line parsing from one file, comparing line by line to another file
by ww (Bishop) on Jul 05, 2009 at 04:01 UTC

    Is this what you need? Is the content of the two source files something akin to what you're working with?

    #!/usr/bin/perl
    use strict;
    use warnings;

    # 777297

    my $data_file = "test1.txt";
    open(DATA, $data_file) || die("Could not open file!");
    my @data_file = <DATA>;

    my $names_file = "code.txt";

    open(NAMES, $names_file || die "Could not open file!");
    my @names_data = <NAMES>;

    # create loop that reads each ID in code.txt (NAMES array), searches for each in array elements for #test1.txt
    # (DATA array), redirects a new (NAMES).html for each element

    my ($names, $data);
    for $names(@names_data) {
        chomp($names);
        for $data(@data_file) {
            chomp ($data);
            if ($data =~ /$names/) {
                print "found \$data ( $data ) in \$names \n";
            }
        }
    }

    close NAMES;
    close DATA;

    test1.txt:

    12345
    34567
    89246
    54321
    98765

    code.txt:

    <html> <head> <title "777297 code"</title> </head> <body> <p><span class="b">12345 </span> foobar</p> <p><span class="b">54321 </span></p> <ul><li>89246</li></ul> <div id="second">34567 <br>78912 but this one ain't there</div> </body> </html>
    If so, you weren't too far off.

    But please, when you post code, use <code>...</code> (or <c>...</c> tags) around code and data.

    Update, in light of OP's update:
    The above produces this output:

    found $data ( 12345 ) in $names
    found $data ( 34567 ) in $names
    found $data ( 89246 ) in $names
    found $data ( 54321 ) in $names
    found $data ( 98765 ) in $names

    So now you need to add the appropriate code to create a new file for each printed line of output and print the content of names therein.
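
A minimal sketch of that last step, offered as one possible approach rather than the code anyone in this thread actually wrote. It assumes the IDs sit one per line in one file and the HTML lines in another. The sub name split_by_id, its arguments, and the choice to take the first matching line are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: for each ID listed in $ids_file,
# copy the first line of $html_file that contains the ID to "$out_dir/ID.html".
# Returns the IDs for which a file was written.
sub split_by_id {
    my ($ids_file, $html_file, $out_dir) = @_;

    open(my $ids_fh, '<', $ids_file)
        or die "Could not open '$ids_file': $!\n";
    my @ids = <$ids_fh>;
    close $ids_fh;
    s/\s+\z// for @ids;    # strip newlines and stray carriage returns

    open(my $html_fh, '<', $html_file)
        or die "Could not open '$html_file': $!\n";
    my @html = <$html_fh>;
    close $html_fh;

    my @written;
    for my $id (grep { length } @ids) {    # skip blank lines
        # \Q...\E makes the ID match literally instead of as a regex
        my ($line) = grep { /\Q$id\E/ } @html;
        next unless defined $line;         # ID not found in any line

        my $out_file = "$out_dir/$id.html";
        open(my $out_fh, '>', $out_file)
            or die "Could not create '$out_file': $!\n";
        print {$out_fh} $line;
        close $out_fh
            or die "Could not close '$out_file': $!\n";
        push @written, $id;
    }
    return @written;
}
```

A call might look like split_by_id('ids.txt', 'pages.txt', '.'), with both file names being placeholders. Note that this is a plain substring match, so an ID such as 123 would also match a line containing 12345; if that matters, the pattern would need word boundaries or some other anchoring.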

    And two BTW's:

    • Your update time is off by a day. (which happens at this time of night)
      and
    • Welcome to the Monastery.
      YES!

      my $names_file = "code.txt";
      open(NAMES, $names_file || die "Could not open file!");

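      Because || binds more tightly than the comma separating open's arguments, this is parsed as open(NAMES, ($names_file || die ...)). $names_file is always true here, so die will never be called, even if open fails.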

      You need to either use parentheses in the proper place:

      open(NAMES, $names_file) || die "Could not open file!";

      Or use the lower precedence  or operator:

      open NAMES, $names_file or die "Could not open file!";
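
The difference can be checked with a small sketch. The two sub names are hypothetical, and the three-argument open with a lexical filehandle is a substitution of mine, not what the posts above use; eval is there only to report whether die fired.

```perl
use strict;
use warnings;

# Hypothetical demo sub: || applies to $path alone, so for any true $path
# the die is unreachable and a failed open goes unnoticed.
sub buggy_open {
    my ($path) = @_;
    my $ok = eval {
        open(my $fh, '<', $path || die "never reached for a true path\n");
        1;
    };
    return $ok ? "no die" : "died";
}

# Hypothetical demo sub: "or" has lower precedence than the whole open(...)
# call, so die runs whenever open returns false.
sub fixed_open {
    my ($path) = @_;
    my $ok = eval {
        open(my $fh, '<', $path) or die "Could not open '$path': $!\n";
        1;
    };
    return $ok ? "no die" : "died";
}
```

Called with the name of a file that does not exist, buggy_open returns "no die" while fixed_open returns "died".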
      I attempted a modified version of your script and got this:
      $ perl.exe perl2.pl
      Global symbol "@data" requires explicit package name at perl2.pl line 27.
      Can't find string terminator "|" anywhere before EOF at perl2.pl line 27.

      #!/usr/bin/perl
      use strict;
      use warnings;

      # 777297

      my $data_file = "test1.txt";
      open(DATA, $data_file) || die("Could not open file!");
      my @data_file = <DATA>;

      my $names_file = "code.txt";

      open(NAMES, $names_file || die "Could not open file!");
      my @names_data = <NAMES>;

      # create loop that reads each ID in code.txt (NAMES array), searches f +
      #+or each in array elements for #test1.txt
      # (DATA array), redirects a new (NAMES).html for each element

      my ($names, $data);
      for $names(@names_data) {
          chomp($names);
          for $data(@data_file) {
              chomp ($data);
              if ($data =~ /$names/) {
                  print "found \$names ( $names )  in \$data \n";
      push @data, qq | > $names.html "\n";
       }
          }
      }

      close NAMES;
      close DATA;

        The string-terminator message you cite concerns push @data, qq | > $names.html "\n"; and is about the missing closing delimiter ( "|" ) for qq (your lack of code tags forced me to count. Meh!). But that's not the whole problem by a long shot.

        If you really want to learn Perl, "cargo-culting" (which is what you appear to be doing) is a poor course of instruction. Instead, I suggest you start working your way through Learning Perl, the Tutorials here, the material at http://www.perltraining.com.au, or any one of several very good introductory courses available online (your pick, but those from colleges and universities, .edu, are sometimes better than the [very low] average web site).
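
For the qq delimiter problem mentioned above, a short sketch of how the operator pairs its delimiters may help. The value 12345 is just a placeholder ID.

```perl
use strict;
use warnings;

my $names = "12345";    # placeholder ID, chosen for illustration

# qq behaves like a double-quoted string with a delimiter of your choice:
# the string starts after the first | and ends at the next |.
my $with_pipes  = qq|> $names.html\n|;
my $with_braces = qq{> $names.html\n};    # bracketing delimiters pair up
my $plain       = "> $names.html\n";      # same string with ordinary quotes

print $with_pipes;    # prints "> 12345.html" followed by a newline
```

All three variables hold the same string. Leaving out the second | gives the "Can't find string terminator" error quoted earlier, because Perl keeps reading to the end of the file looking for it.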

Re: Line by line parsing from one file, comparing line by line to another file
by Polyglot (Monk) on Jul 05, 2009 at 04:15 UTC
    #!/usr/bin/perl
    #read each line in test1.txt into data_file array

    use strict;
    use warnings;

    my $data_file = "test1.txt";
    my $names_file = "code.txt";
    my $target_file = "target.txt";
    my @data = (); # Cannot be the same name as above
    my @names_data = ();
    my @target = ();
    my $line = '';

    open(DATA, $data_file)
          || die ("Could not open file! $!\n");
    @data=<DATA>;
    close DATA;

    open(NAMES, $names_file)
          || die ("Could not open file! $!\n");
    @names_data=<NAMES>;
    close NAMES;

    #read each line in code.txt into a names_file array
    foreach $line (@data) {

        #create loop that reads each ID in
        #code.txt (NAMES array), searches for
        #each in array elements for test1.txt
        #(DATA array), redirects a new
        #(NAMES).html for each element
        foreach ( @names_data )
        {
            chomp;
            push @target, qq|$line<0> > +("$_<0>.html")\n|;
            #I do not understand what you are trying to do above...
            #Just modify the part between the pipes |...\n| to include
            #whatever variables you are wanting.
        }
    }
    open TARGET ">$target_file"
         || die "Cannot open target file. $!\n";
    print TARGET @target;
    close TARGET;

    Blessings,

    ~Polyglot~

      OK, now I understand!
      Thank you. However, the variable is just the
      element itself (i.e. the unique name, giving (names).html). I added the <0>
      because my ksh script, which is terribly slow,
      required a counter (cnt(cnt+1)) with cnt(0).html as the
      output to redirect to.
      I then had to rename 1.html, 2.html and so forth
      to 12345.html, 34343.html, for example.
      I am going to test this to see what happens.
      Upon my modifications, and executing the script, I get the following errors.
      $ perl.exe perl1.pl
      Variable "$NAMES" is not imported at perl1.pl line 36.
      Global symbol "$NAMES" requires explicit package name at perl1.pl line 36.
      Missing comma after first argument to open function at perl1.pl line 43, near ""
      Cannot open target file. $!\n";"
      Execution of perl1.pl aborted due to compilation errors.
      UPDATE - Modifications:
      #!/usr/bin/perl
      #read each line in test1.txt into data_file array

      use strict;
      use warnings;

      my $data_file = "test1.txt";
      my $names_file = "code.txt";
      my $target_file = "target.txt";
      my @data = (); # Cannot be the same name as above
      my @names_data = ();
      my @target = ();
      my $line = '';

      open(DATA, $data_file)
            || die ("Could not open file! $!\n");
      @data=<DATA>;
      close DATA;

      open(NAMES, $names_file)
            || die ("Could not open file! $!\n");
      @names_data=<NAMES>;
      close NAMES;

      #read each line in code.txt into a names_file array
      foreach $line (@data) {

      #create loop that reads each ID in
      #code.txt (NAMES array), searches for
      #each in array elements for test1.txt
      #(DATA array), redirects a new
      #(NAMES).html for each element
      foreach ( @names_data )
      {
      chomp;
      push @target, qq|$line<0> > +("$NAMES.html")\n|;
      #I do not understand what you are trying to do above...
      #Just modify the part between the pipes |...\n| to include
      #whatever variables you are wanting.
      }
      }
      open TARGET ">$target_file"
           || die "Cannot open target file. $!\n";
      print TARGET @target;
      close TARGET;
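
Both compile errors can be traced in the listing above. NAMES is a filehandle, so under strict the scalar $NAMES was never declared; inside foreach ( @names_data ) the current element is $_ (or a named loop variable). And open needs a comma between the filehandle and the file name. A hedged sketch of those last two steps follows; the sub name write_targets and the "line > name.html" output format are assumptions, since the thread never settles what the target file should contain.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the script's last two steps. Takes the target
# file name plus references to the data and names arrays, and returns
# the number of lines written.
sub write_targets {
    my ($target_file, $data, $names) = @_;

    my @target;
    for my $line (@$data) {
        for my $name (@$names) {    # named loop variable instead of $NAMES
            push @target, qq|$line > $name.html\n|;
        }
    }

    # comma after the handle, three-argument open, low-precedence "or"
    open(my $target_fh, '>', $target_file)
        or die "Cannot open target file '$target_file': $!\n";
    print {$target_fh} @target;
    close $target_fh
        or die "Cannot close target file '$target_file': $!\n";

    return scalar @target;
}
```

It expects elements that have already been chomped, for example write_targets('target.txt', \@data, \@names_data). Like the original, it only records every line/name pair in one file; it does not yet create the per-ID .html files.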
