Re: Removing Duplicates from a multiline entry

by 7stud (Deacon)
on Feb 27, 2013 at 18:55 UTC (#1020947)


in reply to Removing Duplicates from a multiline entry

I would suggest setting the input record separator ($/) to paragraph mode (the empty string) and getting the product id from the beginning of every paragraph.
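As a rough sketch of that suggestion: the input format below is invented, since the original data is not shown here. It assumes records are separated by blank lines and that the product id is the first whitespace-delimited token of each record. The name unique_records is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical input: blank-line-separated records whose first
# whitespace-delimited token is the product id (an assumption).
my $input = <<'END';
1001 Widget
colour: red

1002 Gadget
colour: blue

1001 Widget
colour: red
END

# Return the records whose leading id has not been seen before.
sub unique_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '';    # paragraph mode, restored when the sub returns
    my (%seen, @keep);
    while (my $record = <$fh>) {
        next unless $record =~ /^(\S+)/;    # skip records with no id
        my $id = $1;
        push @keep, $record unless $seen{$id}++;
    }
    close $fh;
    return @keep;
}

print for unique_records($input);
```

Only the first record for each id is kept, so the second "1001" paragraph is dropped. If a later duplicate should win instead, storing records in a hash keyed by id would be one way to do it.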

Some explanation (if needed). A text file is really just one long string of characters, e.g.:

line 1\nline 2\nline 3\n

By default, perl reads a file line by line, where a line is all the characters up to and including a newline (\n). However, a paragraph is denoted by two newlines (\n\n):

line1\nline2\n\nline1\nline2\n

The double newline is what creates the blank line. Try it: type some text and at the end of the line hit RETURN, then hit RETURN again--you'll get a paragraph. Each time you hit RETURN when you are typing some text, a newline is entered in your text.

Conveniently, perl allows you to change the definition of what a line is. You can tell perl that you want a line to consist of all the characters up to and including two or more consecutive newlines. That is known as paragraph mode, and you set paragraph mode by setting $/ to an empty string. (Setting $/ to "\n\n" also works, but it is not quite the same: the empty string treats a run of two or more newlines as a single terminator, while "\n\n" matches exactly two.)
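To see how the empty string and "\n\n" behave when paragraphs are separated by more than one blank line, here is a small sketch. The helper name read_records and the sample text are made up for the illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Split a string into records using the given value of $/.
sub read_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # only changed inside this sub
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Two paragraphs separated by FOUR newlines (three blank lines).
my $text = "line1\nline2\n\n\n\nline3\nline4\n";

my @para    = read_records($text, '');        # paragraph mode
my @literal = read_records($text, "\n\n");    # exactly two newlines

printf "paragraph mode: %d records\n", scalar @para;
printf "literal \\n\\n:   %d records\n", scalar @literal;
```

Paragraph mode should report two records here, because the extra newlines are swallowed. The literal "\n\n" separator should report three, the middle one consisting of nothing but two newlines.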

The neat thing about being able to set the definition of a line is that you can also read chunks of files that look like this:

aaaaa
bbbb
ccccc
..
ddddd
eeeee
fffffff
ggggg
..

For instance:

use strict;
use warnings;
use 5.012;

# Treat "..\n" as the end-of-record marker instead of "\n".
$/ = "..\n";

while (my $line = <DATA>) {
    say '-' x 20;
    print $line;    # $line still ends with the "..\n" separator
    say '=' x 20;
}

__DATA__
aaaaa
bbbb
ccccc
..
ddddd
eeeee
fffffff
ggggg
..

--output:--
--------------------
aaaaa
bbbb
ccccc
..
====================
--------------------
ddddd
eeeee
fffffff
ggggg
..
====================

The other common mode besides paragraph mode is slurp mode. If you set $/ to undef, then perl will read the whole file into a single string.
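A minimal sketch of slurp mode, again using an in-memory filehandle in place of a real file. The sub name slurp is just a label chosen here. Using local keeps the change to $/ from leaking into the rest of the program.

```perl
use strict;
use warnings;

# Read everything from a filehandle as one string.
sub slurp {
    my ($fh) = @_;
    local $/;    # undef inside this sub only: slurp mode
    return scalar <$fh>;
}

my $text = "line 1\nline 2\nline 3\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $contents = slurp($fh);
close $fh;

print length($contents), " characters read in one go\n";
```

The whole text, newlines included, ends up in $contents, so any splitting into lines or paragraphs has to be done afterwards, e.g. with split.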

