PerlMonks

Optimizing File handling operations

by paragkalra (Acolyte)

on Nov 06, 2009 at 06:39 UTC (#805411=perlquestion)
paragkalra has asked for the wisdom of the Perl Monks concerning the following question:

Hey Folks,

I frequently use Perl to process two files line by line.

Most of the time I compare the two files line by line and check whether a line is the same as the corresponding line of the other file, whether one line is a substring of the other, and many other operations.

The technique I generally use is to first read the lines of the two files into two separate arrays.

Then I run a for loop that compares the elements at the same index in the two arrays.

I suspect this is not an optimal solution: if both files are huge, building such large arrays may consume a lot of memory.

So I wanted to know whether there is a better approach to processing two files line by line.
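For reference, a minimal sketch of the array-based approach described above, assuming the "same line" test is `eq` and the substring test is Perl's built-in `index`. The sub name `compare_arrays` and its status labels are hypothetical. Note that both files are read completely into memory, which is exactly the cost the question worries about.

```perl
use strict;
use warnings;

# Array-based approach: both files are read completely into memory and then
# compared index by index. compare_arrays() is a hypothetical name.
sub compare_arrays {
    my ($path_first, $path_second) = @_;

    open my $fh_first,  '<', $path_first  or die "Cannot open $path_first: $!";
    open my $fh_second, '<', $path_second or die "Cannot open $path_second: $!";
    chomp(my @first  = <$fh_first>);     # every line of the first file
    chomp(my @second = <$fh_second>);    # every line of the second file
    close $fh_first;
    close $fh_second;

    # Compare only the indices that exist in both arrays.
    my $last = $#first < $#second ? $#first : $#second;

    my @status;
    for my $i (0 .. $last) {
        my ($x, $y) = ($first[$i], $second[$i]);
        if ($x eq $y) {
            push @status, 'same';
        }
        elsif (index($x, $y) >= 0 || index($y, $x) >= 0) {
            # One line occurs somewhere inside the other.
            push @status, 'substring';
        }
        else {
            push @status, 'different';
        }
    }
    return @status;
}
```

Called as `compare_arrays('/my/first/file', '/my/second/file')`, it returns one status per compared line pair.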

Cheers,

Parag

Re: Optimizing File handling operations
by CountZero (Canon) on Nov 06, 2009 at 07:03 UTC
    if both files are huge creating large sized arrays may consume lot of memory.
    That is indeed true, but memory is cheap and as long as you are not running out of memory there is nothing wrong with your approach.

    However, if you are running short of memory, just open the two files at the same time and read them in step, line by line:

        use strict;
        use warnings;

        open my $first_file,  '<', '/my/first/file';
        open my $second_file, '<', '/my/second/file';

        while (my $line_first = <$first_file>) {
            my $line_second = <$second_file>;
            # do something with $line_first and $line_second
            # ...
        }
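    This will work if both files have the same number of lines, or if /my/first/file has more lines than /my/second/file (once the second file is exhausted, $line_second is simply undef). If not, the extra lines of the second file are never read, so you will have to add some extra tests.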

    And of course, you will want to check if the open succeeded!
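A minimal sketch combining both points (checked `open` calls and the extra tests for files of unequal length), assuming it is acceptable to `die` when a file cannot be opened. `for_each_line_pair` is a hypothetical name: it hands each pair of lines to a callback, passing `undef` for whichever file has run out, and holds only one line from each file in memory at a time.

```perl
use strict;
use warnings;

# Streams two files in step so that only one line of each is held in memory.
# for_each_line_pair() is a hypothetical name; the callback receives the
# line number and the two lines, with undef for a file that has run out.
sub for_each_line_pair {
    my ($path_first, $path_second, $callback) = @_;

    open my $first_file,  '<', $path_first  or die "Cannot open $path_first: $!";
    open my $second_file, '<', $path_second or die "Cannot open $path_second: $!";

    my $line_no = 0;
    while (1) {
        my $line_first  = <$first_file>;
        my $line_second = <$second_file>;

        # The extra test: keep going until BOTH files are exhausted.
        last if !defined $line_first && !defined $line_second;
        $line_no++;

        chomp $line_first  if defined $line_first;
        chomp $line_second if defined $line_second;
        $callback->($line_no, $line_first, $line_second);
    }

    close $first_file;
    close $second_file;
    return $line_no;    # number of line pairs processed
}
```

A caller passes the two paths and a code reference, e.g. `for_each_line_pair('/my/first/file', '/my/second/file', sub { my ($line_no, $first, $second) = @_; ... })`, and puts the per-line comparison inside that sub. Using a callback rather than collecting results keeps memory use constant regardless of file size.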

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Optimizing File handling operations
by Anonymous Monk on Nov 06, 2009 at 20:32 UTC
    If you're on a *nix system, you could diff the files and just process the lines that are different.
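A minimal sketch of that idea, assuming a `diff` binary on the PATH that produces the traditional "normal" output format (`<` marks a line from the first file, `>` a line from the second). `changed_lines` is a hypothetical helper. One difference from the index-by-index loop: diff aligns the files by content, so a single inserted line does not make every following line look different.

```perl
use strict;
use warnings;

# Runs the external diff(1) utility (assumed to be on PATH) and returns only
# the changed lines. changed_lines() is a hypothetical helper name.
sub changed_lines {
    my ($path_old, $path_new) = @_;

    # List-form pipe open avoids the shell, so paths need no quoting.
    open my $diff, '-|', 'diff', $path_old, $path_new
        or die "Cannot run diff: $!";

    my (@removed, @added);
    while (my $line = <$diff>) {
        chomp $line;
        # In diff's default ("normal") output, '<' prefixes a line from the
        # first file and '>' prefixes a line from the second file.
        if    ($line =~ /^< ?(.*)/) { push @removed, $1 }
        elsif ($line =~ /^> ?(.*)/) { push @added,   $1 }
        # Hunk headers such as "2c2" and the "---" separator are skipped.
    }
    close $diff;

    # diff exits with 0 (no differences), 1 (differences found), >1 (trouble).
    my $status = $? >> 8;
    die "diff failed with status $status" if $status > 1;

    return (\@removed, \@added);
}
```

Only the differing lines reach the Perl side, so memory use depends on how much the files differ rather than on their total size.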

    -Greg


Node Status
Node Type: perlquestion [id://805411]
Approved by CountZero
Front-paged by Arunbear