
Line by line parsing from one file, comparing line by line to another file

by web_developer (Initiate)
on Jul 05, 2009 at 03:20 UTC (#777297, perlquestion)
web_developer has asked for the wisdom of the Perl Monks concerning the following question:

OK, I am new to Perl scripting.
I need help with a script that is supposed to do the following:
file1 = a list of unique ID numbers, one per line
file2 = unique HTML code, one document per line (about 16 KB each, separated by carriage returns), with one of the unique ID numbers somewhere within each line.

For file1.txt
123432
432342
456645
and so on.
For file2.txt
<html>...... UNIQUE ID..</html>
<html>...... UNIQUE ID..</html>
and so on.
Both are standard text files.
===========UPDATE============
July 4, 2009 12:09:22 AM EST
For example, if line 1 of file1 has 12345 and
line 1 of file2 has <html>...... 12345..</html>,
I need line 1 of file2 to be written to
a new file called 12345.html
==========UPDATE===========
July 4, 2009 12:14:56 AM EST
In the script, file2 is read into the data_file array and
file1 into the names_file array; the result should be
several unique (names).html files.
===========
#!/usr/bin/perl
#read each line in test1.txt into data_file array
$data_file="test1.txt";
open(DATA, $data_file) || die("Could not open file!");
@data_file=<DATA>;

#read each line in code.txt into a names_file array
$names_file="code.txt"
open(NAMES, $names_file) || die("Could not open file!");
@names_data=<NAMES>;

#create loop that reads each ID in code.txt (NAMES array), searches for
#each in array elements for test1.txt (DATA array), redirects a new
#(NAMES).html for each element
foreach ( $NAMES )
{
chomp($NAMES);
($NAMES=$DATA<0> > +("$NAMES<0>.html"));
}

close NAMES;
close DATA;

I am new to Perl, and although I wrote this by following examples of similar scripts, it is absolutely riddled with errors.

Re: Line by line parsing from one file, comparing line by line to another file
by ww (Bishop) on Jul 05, 2009 at 04:01 UTC

    Is this what you need? Is the content of the two source files something akin to what you're working with?

    #!/usr/bin/perl
    use strict;
    use warnings;

    # 777297

    my $data_file = "test1.txt";
    open(DATA, $data_file) || die("Could not open file!");
    my @data_file = <DATA>;

    my $names_file = "code.txt";

    open(NAMES, $names_file || die "Could not open file!");
    my @names_data = <NAMES>;

    # create loop that reads each ID in code.txt (NAMES array), searches for each in array elements for #test1.txt
    # (DATA array), redirects a new (NAMES).html for each element

    my ($names, $data);
    for $names(@names_data) {
        chomp($names);
        for $data(@data_file) {
            chomp ($data);
            if ($data =~ /$names/) {
                print "found \$data ( $data ) in \$names \n";
            }
        }
    }

    close NAMES;
    close DATA;

    test1.txt:

    12345
    34567
    89246
    54321
    98765

    code.txt:

    <html>
    <head>
    <title "777297 code"</title>
    </head>
    <body>
    <p><span class="b">12345 </span> foobar</p>
    <p><span class="b">54321 </span></p>
    <ul><li>89246</li></ul>
    <div id="second">34567 <br>78912 but this one ain't there</div>
    </body>
    </html>

    If so, you weren't too far off.

    But please, when you post code, use <code>...</code> (or <c>...</c> tags) around code and data.

    Update, in light of OP's update:
    The above produces this output:

    found $data ( 12345 ) in $names
    found $data ( 34567 ) in $names
    found $data ( 89246 ) in $names
    found $data ( 54321 ) in $names
    found $data ( 98765 ) in $names

    So now you need to add the appropriate code to create a new file for each printed line of output and print the content of names therein.
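
A minimal sketch of that last step, under some assumptions that are not from the thread: the helper name split_by_id and its three arguments are hypothetical, the ID file holds one ID per line, and a line of the HTML file belongs to an ID when it contains that ID as a whole number. Unlike the loop above, it uses the ID as the pattern and the HTML line as the string being searched, and it uses lexical filehandles with three-argument open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for every ID in $ids_file,
# write each line of $html_file that contains the ID to "<ID>.html"
# inside $out_dir. Returns the list of files written.
sub split_by_id {
    my ($ids_file, $html_file, $out_dir) = @_;

    open(my $ids_fh, '<', $ids_file) or die "Could not open $ids_file: $!\n";
    my @ids = <$ids_fh>;
    close $ids_fh;
    s/^\s+|\s+$//g for @ids;               # drop newlines, CRs, stray spaces

    open(my $html_fh, '<', $html_file) or die "Could not open $html_file: $!\n";
    chomp(my @html = <$html_fh>);
    close $html_fh;

    my @written;
    for my $id (grep { length } @ids) {    # skip blank lines
        # \Q..\E treats the ID as literal text; \b avoids partial-number hits
        my @matches = grep { /\b\Q$id\E\b/ } @html;
        next unless @matches;

        my $out = "$out_dir/$id.html";
        open(my $out_fh, '>', $out) or die "Could not write $out: $!\n";
        print {$out_fh} "$_\n" for @matches;
        close $out_fh;
        push @written, $out;
    }
    return @written;
}
```

With the sample files shown above, a call such as split_by_id("test1.txt", "code.txt", ".") would be one way to use it; IDs with no matching line simply produce no file.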

    And two BTW's:

    • Your update time is off by a day. (which happens at this time of night)
      and
    • Welcome to the Monastery.
      YES!
      my $names_file = "code.txt";
      open(NAMES, $names_file || die "Could not open file!");

      Because || binds more tightly than the comma in the argument list, Perl parses this as open(NAMES, ($names_file || die ...)). $names_file is always true, so die will never be called, even when open fails.

      You need to either use parentheses in the proper place:

      open(NAMES, $names_file) || die "Could not open file!";

      Or use the lower precedence  or operator:

      open NAMES, $names_file or die "Could not open file!";
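
A small sketch that shows the difference in practice. The file name is made up for illustration and is assumed not to exist; the sketch also uses lexical filehandles and three-argument open, which is a choice made here rather than something taken from the scripts in this thread.

```perl
use strict;
use warnings;

my $missing = "no_such_file_777297.txt";   # hypothetical name, assumed absent

# 1. || inside the argument list: die is tied to the file name, not to open.
#    The file name is a true value, so die is never evaluated.
my $opened = open(my $fh1, '<', $missing || die "never reached\n");
print "open failed silently\n" unless $opened;

# 2. || applied to the result of open(...): die runs when open fails.
my $died_paren = !eval {
    open(my $fh2, '<', $missing) || die "Could not open $missing: $!\n";
    1;
};

# 3. Low-precedence 'or': it also applies to the result of open.
my $died_or = !eval {
    open my $fh3, '<', $missing or die "Could not open $missing: $!\n";
    1;
};

print "parenthesized form died\n" if $died_paren;
print "'or' form died\n"          if $died_or;
```

The eval blocks are only there so that the sketch can report each die instead of stopping at the first one.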
      I attempted a modified version of your script and got this:
      $ perl.exe perl2.pl
      Global symbol "@data" requires explicit package name at perl2.pl line 27.
      Can't find string terminator "|" anywhere before EOF at perl2.pl line 27.

      #!/usr/bin/perl
      use strict;
      use warnings;

      # 777297

      my $data_file = "test1.txt";
      open(DATA, $data_file) || die("Could not open file!");
      my @data_file = <DATA>;

      my $names_file = "code.txt";

      open(NAMES, $names_file || die "Could not open file!");
      my @names_data = <NAMES>;

      # create loop that reads each ID in code.txt (NAMES array), searches f +
      #+or each in array elements for #test1.txt
      # (DATA array), redirects a new (NAMES).html for each element

      my ($names, $data);
      for $names(@names_data) {
          chomp($names);
          for $data(@data_file) {
              chomp ($data);
              if ($data =~ /$names/) {
                  print "found \$names ( $names )  in \$data \n";
      push @data, qq | > $names.html "\n";
       }
          }
      }

      close NAMES;
      close DATA;

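        The second message you cite is about push @data, qq | > $names.html "\n"; which is missing the closing delimiter ( "|" ) for qq (your lack of code tags forced me to count. Meh!). The first is there because @data is never declared; the arrays in the script are @data_file and @names_data. But that's not the whole problem by a long shot.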
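
For illustration only, a sketch of that one statement with the array declared and the qq delimiters balanced. The sample ID and the text being pushed are assumptions; the sketch does not try to guess what the line was ultimately meant to do.

```perl
use strict;
use warnings;

my $names = "12345";   # sample ID, for illustration only
my @data;              # under 'use strict' the array must be declared

# qq|...| needs both pipes; everything between them is the string.
push @data, qq|> $names.html\n|;

# Bracket pairs work as delimiters too and are easier to spot when one is missing.
push @data, qq{> $names.html\n};

print @data;
```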

        If you really want to learn Perl, "cargo-culting" (which is what you appear to be doing) is a poor course of instruction. Instead, I suggest you start working your way through Learning Perl, the Tutorials here, the material from http://www.perltraining.com.au, or any one of several very good introductory courses available online (your pick, but those from colleges and universities, .edu, are sometimes better than the [very low] average web site).

Re: Line by line parsing from one file, comparing line by line to another file
by Polyglot (Monk) on Jul 05, 2009 at 04:15 UTC
    #!/usr/bin/perl
    #read each line in test1.txt into data_file array

    use strict;
    use warnings;

    my $data_file = "test1.txt";
    my $names_file = "code.txt";
    my $target_file = "target.txt";
    my @data = (); # Cannot be the same name as above
    my @names_data = ();
    my @target = ();
    my $line = '';

    open(DATA, $data_file)
          || die ("Could not open file! $!\n");
    @data=<DATA>;
    close DATA;

    open(NAMES, $names_file)
          || die ("Could not open file! $!\n");
    @names_data=<NAMES>;
    close NAMES;

    #read each line in code.txt into a names_file array
    foreach $line (@data) {

    #create loop that reads each ID in
    #code.txt (NAMES array), searches for
    #each in array elements for test1.txt
    #(DATA array), redirects a new
    #(NAMES).html for each element
    foreach ( @names_data )
    {
    chomp;
    push @target, qq|$line<0> > +("$_<0>.html")\n|;
    #I do not understand what you are trying to do above...
    #Just modify the part between the pipes |...\n| to include
    #whatever variables you are wanting.
    }
    }
    open TARGET ">$target_file"
         || die "Cannot open target file. $!\n";
    print TARGET @target;
    close TARGET;

    Blessings,

    ~Polyglot~

      OK, now I understand!
      Thank you. However, the variable is just the
      element (i.e. the unique name, as in (names).html). I added the <0>
      because my ksh script, which is terribly slow,
      used a counter (cnt, then cnt+1), starting with cnt(0).html as the
      file to redirect the output to.
      I then had to rename 1.html, 2.html and so forth
      to 12345.html, 34343.html, for example.
      I am going to test this to see what happens.
      After making my modifications and running the script, I get the following errors.
      $ perl.exe perl1.pl
      Variable "$NAMES" is not imported at perl1.pl line 36.
      Global symbol "$NAMES" requires explicit package name at perl1.pl line 36.
      Missing comma after first argument to open function at perl1.pl line 43, near ""
      Cannot open target file. $!\n";"
      Execution of perl1.pl aborted due to compilation errors.
      UPDATE- Modifications.
      #!/usr/bin/perl
      #read each line in test1.txt into data_file array

      use strict;
      use warnings;

      my $data_file = "test1.txt";
      my $names_file = "code.txt";
      my $target_file = "target.txt";
      my @data = (); # Cannot be the same name as above
      my @names_data = ();
      my @target = ();
      my $line = '';

      open(DATA, $data_file)
            || die ("Could not open file! $!\n");
      @data=<DATA>;
      close DATA;

      open(NAMES, $names_file)
            || die ("Could not open file! $!\n");
      @names_data=<NAMES>;
      close NAMES;

      #read each line in code.txt into a names_file array
      foreach $line (@data) {

      #create loop that reads each ID in
      #code.txt (NAMES array), searches for
      #each in array elements for test1.txt
      #(DATA array), redirects a new
      #(NAMES).html for each element
      foreach ( @names_data )
      {
      chomp;
      push @target, qq|$line<0> > +("$NAMES.html")\n|;
      #I do not understand what you are trying to do above...
      #Just modify the part between the pipes |...\n| to include
      #whatever variables you are wanting.
      }
      }
      open TARGET ">$target_file"
           || die "Cannot open target file. $!\n";
      print TARGET @target;
      close TARGET;
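
The two compile errors above point at two spots: $NAMES is not a declared scalar (NAMES is only a filehandle, and the inner loop's current element is in $_), and open needs a comma after the filehandle. The following is a sketch of one possible repair, not a tested fix for perl1.pl. The helper name write_target, the use of array references, and the text that gets pushed are assumptions made for illustration, and the input lines are assumed to be chomped already.

```perl
use strict;
use warnings;

# Hypothetical helper: pair every data line with every ID and save the
# resulting text lines to $target_file. Returns the number of lines written.
sub write_target {
    my ($target_file, $data, $names) = @_;   # $data and $names are array refs
    my @target;

    foreach my $line (@$data) {
        foreach my $name (@$names) {
            # $name is the declared loop variable; there is no scalar $NAMES.
            push @target, qq|$name.html: $line\n|;
        }
    }

    # Comma after the filehandle, and 'or' so that die applies to open's result.
    open(my $out, '>', $target_file) or die "Cannot open target file. $!\n";
    print {$out} @target;
    close $out;

    return scalar @target;
}
```

For example, write_target("target.txt", \@data, \@names_data) would write one line for every combination of data line and ID.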
