PerlMonks  

Fastest way of changing the beginning of a txt file..

by pedrete (Sexton)
on Feb 18, 2019 at 09:08 UTC

pedrete has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

I want to read a huge text file line by line. Once I have found the line I am looking for, I want to "cut" the file so that it starts at that very line (discarding the previous ones).

There are many ways to do this, but looking for the fastest way, I was wondering...

Is there any way to tell the OS something like "set the FILEHANDLE's current position as the file-start position", so that once I close the FILEHANDLE the file contains only the valid lines?

Thanks...

Pedreter.


Replies are listed 'Best First'.
Re: Fastest way of changing the beginning of a txt file..
by dave_the_m (Monsignor) on Feb 18, 2019 at 11:22 UTC
    In general, OSes don't allow you to truncate the start (as opposed to the end) of a file. So you have to create a new file and copy lines N onwards to it from the old file.

    Dave.

Re: Fastest way of changing the beginning of a txt file.. -- oneliner
by Discipulus (Canon) on Feb 18, 2019 at 09:55 UTC
    Hello pedrete

    A oneliner?

    # windows double quotes
    perl -lne "$mark++ if /MyWantedTextMark/; print if $mark" a.txt > b.txt

    # Linux quotes
    perl -lne '$mark++ if /MyWantedTextMark/; print if $mark' a.txt > b.txt

    See perlrun, where the perl switches are described.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Thanks for your reply...

      Your idea would work, but it is not exactly what I was trying to do... I just want to move the filesystem pointer from the current "beginning of file" to a "new beginning of file"...

        I just want to move the filesystem pointer from the current "beginning of file" to a "new beginning of file"...

        No filesystem (AFAIK) provides that facility.

        In theory, it could be done (after a fashion) using low-level IO primitives, but only to the nearest block boundary (a multiple of 512, 4096 or some other power of 2, depending upon the device the file is located on), which isn't very useful.

        It is possible to truncate the end of a file on most file systems; and again in theory, it might be somewhat quicker to copy the rest of the file over the removed portion in situ, then truncate the end. But it is easy to get this wrong and there is no guarantee it will be quicker on any given device or day.
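
The in-situ approach described above could look roughly like the following. This is only a minimal sketch: the helper name `chop_front_in_place` and the 1 MiB default buffer are assumptions for illustration, and `$offset` is assumed to be the byte position of the wanted line (for example, as reported by `tell`). If the process is interrupted part-way through, the file is left corrupted, which is one of the ways this can go wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first $offset bytes of $path in place by
# sliding the remaining bytes to the front, then cutting off the stale tail.
sub chop_front_in_place {
    my ($path, $offset, $chunk) = @_;
    $chunk ||= 1 << 20;    # assumed default: 1 MiB copy buffer

    open my $fh, '+<', $path or die "Cannot open $path: $!";
    binmode $fh;

    my ($read_pos, $write_pos) = ($offset, 0);
    my $buf;
    while (1) {
        # Read the next chunk from the part of the file we want to keep.
        seek $fh, $read_pos, 0 or die "seek failed: $!";
        my $got = read $fh, $buf, $chunk;
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of file

        # Write it back nearer the front. The write position is never
        # ahead of the read position, so unread data is never clobbered.
        seek $fh, $write_pos, 0 or die "seek failed: $!";
        print {$fh} $buf or die "write failed: $!";

        $read_pos  += $got;
        $write_pos += $got;
    }

    # Everything past $write_pos is leftover from the old layout.
    truncate $fh, $write_pos or die "truncate failed: $!";
    close $fh or die "close failed: $!";
    return $write_pos;    # new file size in bytes
}
```

Whether this beats copying to a new file depends on the device and the file system, so it is worth timing both before committing to either.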


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
        Hello,

        you can use seek, but it seeks in bytes, and you still have to scan the file in advance to find the position of the wanted first line, so I see no point in this (unless you already have it recorded somehow).

        L*

        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Fastest way of changing the beginning of a txt file..
by 1nickt (Canon) on Feb 18, 2019 at 11:16 UTC

    See seek.


    The way forward always starts with a minimal test.
Re: Fastest way of changing the beginning of a txt file..
by BillKSmith (Monsignor) on Feb 18, 2019 at 16:44 UTC
    I doubt that there is any satisfactory solution to the problem you have posted. The restriction has nothing to do with perl, but rather, your operating system. Perhaps, if you would tell us about the problem that appears to need this operation, we could provide a working solution to that problem.
    Bill
Re: Fastest way of changing the beginning of a txt file..
by hippo (Bishop) on Feb 18, 2019 at 11:36 UTC
    a huge text file

    How huge? More or less than available RAM?

Re: Fastest way of changing the beginning of a txt file..
by Marshall (Canon) on Feb 20, 2019 at 01:36 UTC
    The simple answer is NO!

    Think of a disk file like an old fashioned cassette tape. You start at the beginning of the tape and write stuff on the tape. You cannot go into the middle of the tape to delete stuff or add extra stuff - the tape recording has to be contiguous - a disk file is the same way. You can't replace a song in the middle of the tape with another song.

    If you want to truncate a disk file (so the stuff at the beginning is not there anymore), the only way to do that is to "make a new cassette tape". Basically you read the file and process the data you want, then copy the rest of the original data to a brand new file. Then you rename that brand new file to the original file name. Voilà, you have processed the data and truncated the original file.

    In the cassette analog world, you would play cassette 1 until you skipped over the stuff you didn't want to save, and then start recording the output from cassette 1 onto a new cassette 2 from that point. The disk file system essentially works the same way.
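
As a rough illustration of the "new cassette" approach, here is a minimal sketch. The helper name `drop_lines_before` and the `.new` scratch suffix are made up for this example. It scans line by line only until the wanted line is found, then copies the remainder in large blocks (no line splitting is needed from there on), and finally renames the new file over the old one.

```perl
use strict;
use warnings;

# Hypothetical helper: keep everything from the first line matching $pattern
# onward by writing it to a new file and renaming that over the original.
# Returns 1 if the file was rewritten, 0 if no line matched.
sub drop_lines_before {
    my ($path, $pattern) = @_;
    my $tmp = "$path.new";    # assumed scratch name, same directory

    open my $in, '<', $path or die "Cannot read $path: $!";
    binmode $in;

    # Scan line by line only until the wanted line turns up.
    my $first;
    while (my $line = <$in>) {
        if ($line =~ $pattern) { $first = $line; last }
    }
    if (!defined $first) {
        close $in;
        return 0;
    }

    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    binmode $out;
    print {$out} $first or die "Write to $tmp failed: $!";

    # Copy the rest of the file in 1 MiB blocks.
    my $buf;
    while (read $in, $buf, 1 << 20) {
        print {$out} $buf or die "Write to $tmp failed: $!";
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";

    # Replace the original only after the copy has completed.
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return 1;
}
```

A call such as `drop_lines_before('a.txt', qr/MyWantedTextMark/)` would leave `a.txt` starting at the first matching line. Because the original is replaced only at the end, an interrupted run leaves it untouched.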

    THERE MAY BE ANOTHER WAY FOR YOU:

    Unlike a cassette tape, a disk file can be "fast forwarded" very, very quickly.
    As another analogy, think about a bookmark. When you read a long novel, you put a bookmark in so that you can restart reading again where you left off. You can do the same thing with a disk file. There is a function, tell(), that tells you exactly how many bytes from the beginning of the file you currently are. There is a function, seek(), that takes you back to the exact byte position where you left off before.

    So instead of actually truncating the file (which requires copying all the data that you haven't processed to another file), you can keep track of where you are in the input file (the exact byte offset). Going back to that exact byte can happen quickly, much faster than re-reading the first part of the novel. Of course this means that the actual storage space of the file on the disk doesn't get smaller.

    You are asking about something that is possible, but unusual. Can you tell us more about what your application is?
    Of course to seek() to where you left off, that info would have to be stored somewhere - maybe in another file or DB?
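
The bookmark idea might be sketched as follows, assuming the offset is kept in a small side file. The helper names (`find_offset`, `save_bookmark`, `open_at_bookmark`) and the side-file layout are assumptions for illustration only. The original file is never modified; later readers just skip the unwanted front part.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte offset at which the first line
# matching $pattern starts, or undef if no line matches.
sub find_offset {
    my ($path, $pattern) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $line_start = tell $in;    # offset of the line about to be read
    while (my $line = <$in>) {
        if ($line =~ $pattern) {
            close $in;
            return $line_start;
        }
        $line_start = tell $in;
    }
    close $in;
    return undef;
}

# Hypothetical helper: store the offset in a side file ($mark_path).
sub save_bookmark {
    my ($mark_path, $offset) = @_;
    open my $fh, '>', $mark_path or die "Cannot write $mark_path: $!";
    print {$fh} "$offset\n";
    close $fh or die "Cannot close $mark_path: $!";
}

# Hypothetical helper: open $path positioned at the stored offset, or at
# the start of the file if there is no usable bookmark yet.
sub open_at_bookmark {
    my ($path, $mark_path) = @_;
    my $offset = 0;
    if (open my $mark, '<', $mark_path) {
        my $stored = <$mark>;
        close $mark;
        $offset = $1 if defined $stored && $stored =~ /^(\d+)/;
    }
    open my $in, '<', $path or die "Cannot read $path: $!";
    seek $in, $offset, 0 or die "seek failed: $!";
    return $in;
}
```

The scan in `find_offset` still has to read up to the wanted line once; after that, every `open_at_bookmark` call jumps straight to it.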

Re: Fastest way of changing the beginning of a txt file..
by pwagyi (Monk) on Feb 19, 2019 at 03:35 UTC

    >> is there any way to tell the OS something like: "Set actual FILEHANDLE's position as the file-start position"??? in a way that if i close the FILEHANDLE, then the file will only have valid lines...

    Your best bet is to copy only the part of the file you want to a new file, then rename that file over the original. You cannot shrink a file from the front or insert into the middle in place; you can only append or overwrite at the end.

Re: Fastest way of changing the beginning of a txt file..
by pablopelos (Sexton) on Feb 22, 2019 at 20:35 UTC
    Changing the front of a file is hard on most file systems; they just aren't built for this. Most DVRs use special file systems that allow for this, kind of like a doubly linked list. This way they can trim off the front of a file and alter the end as well, or keep a rolling 1 hr video file. There really is no quick way to do it safely other than choosing a different file system.
Re: Fastest way of changing the beginning of a txt file..
by pwagyi (Monk) on Feb 19, 2019 at 03:21 UTC

    You can try Tie::File (https://perldoc.perl.org/Tie/File.html)

    The file is not loaded into memory, so this will work even for gigantic files. Changes to the array are reflected in the file immediately.
      You can try Tie::File

      Unfortunately, Tie::File adds significant overhead, so it's pretty safe to assume that it would slow things down and burn more memory.
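
For comparison, here is a minimal sketch of what the Tie::File route might look like. The helper name `drop_front_with_tie` is invented for this example. Tie::File presents the file as an array of lines (with line endings stripped by default), so removing the front becomes a `splice`. As noted above, this is convenient rather than fast, since the module still has to rewrite the file underneath.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: delete every line before the first one matching
# $pattern, treating the file as an array through Tie::File.
# Returns 1 if a matching line was found, 0 otherwise.
sub drop_front_with_tie {
    my ($path, $pattern) = @_;
    tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

    # Find the index of the first wanted line.
    my $first;
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ $pattern) { $first = $i; last }
    }

    # Remove the leading lines; nothing to do if the match is line 0
    # or if there was no match at all.
    splice @lines, 0, $first if $first;

    untie @lines;
    return defined $first ? 1 : 0;
}
```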
