PerlMonks  

Re: Removing Duplicates from a multiline entry

by 7stud (Deacon)
on Feb 27, 2013 at 18:55 UTC


in reply to Removing Duplicates from a multiline entry

I would suggest setting the input record separator ($/) to paragraph mode (the empty string) and getting the product id from the beginning of each paragraph.
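Here is a minimal sketch of that suggestion. The original question's data isn't shown here, so it assumes (hypothetically) that each record is a paragraph separated by blank lines and that the product id is the first whitespace-delimited token of the paragraph. `dedupe_paragraphs` and the sample input are made up for illustration. The sketch keeps the first paragraph seen for each id, using a `%seen` hash.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs of $text, keeping only the
# first paragraph for each product id.
# ASSUMPTION: the product id is the first whitespace-delimited token
# of each paragraph.
sub dedupe_paragraphs {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = '';                        # paragraph mode
    my (%seen, @kept);
    while (my $para = <$fh>) {
        next unless $para =~ /^\s*(\S+)/; # grab the product id
        my $id = $1;
        push @kept, $para unless $seen{$id}++;   # first occurrence wins
    }
    close $fh;
    return @kept;
}

# Made-up sample input: product A100 appears twice.
my $input  = "A100 widget\nred\n\nB200 gadget\nblue\n\nA100 widget\nred\n";
my @unique = dedupe_paragraphs($input);
print for @unique;
```

To process a real file, open it normally and keep the `local $/ = '';` line; `local` restores the previous separator when the sub returns, so the rest of the program still reads line by line.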

Some explanation, if needed. A text file is really just one long string of characters, e.g.:

line 1\nline 2\nline 3\n

By default, perl reads a file line by line, where a line is all the characters up to and including a newline (\n). A paragraph, however, is delimited by two newlines (\n\n):

line1\nline2\n\nline1\nline2\n

The double newline is what creates the blank line. Each time you hit RETURN while typing some text, a newline is entered in your text. Try it: type some text, hit RETURN at the end of the line, then hit RETURN again, and you'll get a blank line that ends the paragraph.
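To see the default behaviour on the two-paragraph string above, you can read it through an in-memory filehandle (`open` on a reference to a scalar). `read_records` is just a hypothetical helper name. With the default `$/` of "\n" you get five records, and the blank line comes back as a record containing nothing but "\n".

```perl
use strict;
use warnings;

# The two-paragraph string from above.
my $text = "line1\nline2\n\nline1\nline2\n";

# Hypothetical helper: read $str with the default $/ ("\n").
sub read_records {
    my ($str) = @_;
    open(my $fh, '<', \$str) or die "Cannot open in-memory file: $!";
    my @records = <$fh>;    # list context reads every record
    close $fh;
    return @records;
}

my @lines = read_records($text);
print scalar(@lines), " records\n";
for my $rec (@lines) {
    (my $shown = $rec) =~ s/\n/\\n/g;   # make newlines visible
    print "[$shown]\n";
}
```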

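Conveniently, perl allows you to change the definition of what a line is. You can tell perl that you want a line to consist of all the characters up to and including two consecutive newlines. That is known as paragraph mode, and you turn it on by setting $/ to the empty string (yeah, it would seem to make more sense to set it to "\n\n", but that's perl). The two aren't quite the same, though: with "", two or more consecutive blank lines count as a single separator, whereas "\n\n" splits blindly, so extra blank lines show up as leading newlines (or even empty records) in what follows.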
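A small sketch comparing the two settings on text with two blank lines between the paragraphs (`records_with` and `show` are hypothetical helpers). Paragraph mode swallows the extra blank line, while "\n\n" leaves it behind as a leading newline on the second record. The `local` keeps the change to `$/` confined to the helper.

```perl
use strict;
use warnings;

# Two blank lines between the paragraphs this time.
my $text = "line1\nline2\n\n\nline3\nline4\n";

# Hypothetical helper: split $str into records using separator $sep.
sub records_with {
    my ($sep, $str) = @_;
    open(my $fh, '<', \$str) or die "Cannot open in-memory file: $!";
    local $/ = $sep;        # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Print a record with its newlines made visible.
sub show {
    my ($label, $rec) = @_;
    (my $shown = $rec) =~ s/\n/\\n/g;
    print "$label [$shown]\n";
}

my @para   = records_with('',     $text);   # paragraph mode
my @double = records_with("\n\n", $text);   # literal two-newline separator

show('paragraph:', $_) for @para;
show('"\n\n":    ', $_) for @double;
```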

The neat thing about being able to change the definition of a line is that you can also read files in chunks delimited by something other than a newline, e.g. files that look like this:

aaaaa
bbbb
ccccc
..
ddddd
eeeee
fffffff
ggggg
..

For instance:

use strict;
use warnings;
use 5.012;        # also enables say()

$/ = "..\n";      # records now end with "..\n"

while (my $line = <DATA>) {
    say '-' x 20;
    print $line;
    say '=' x 20;
}

__DATA__
aaaaa
bbbb
ccccc
..
ddddd
eeeee
fffffff
ggggg
..

--output:--
--------------------
aaaaa
bbbb
ccccc
..
====================
--------------------
ddddd
eeeee
fffffff
ggggg
..
====================
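One more detail: `chomp` removes whatever the current `$/` is, not just "\n", so with `$/ = "..\n"` it strips the `..` terminator from each chunk. Below is a sketch that reads the same data from an in-memory string (`read_chunks` is a hypothetical name). Note that `$/` is a plain string, not a regex, so a line that merely ends in `..` (e.g. `foo..`) would end a chunk too.

```perl
use strict;
use warnings;

my $text = "aaaaa\nbbbb\nccccc\n..\nddddd\neeeee\nfffffff\nggggg\n..\n";

# Hypothetical helper: read "..\n"-terminated chunks and strip the terminator.
sub read_chunks {
    my ($str) = @_;
    open(my $fh, '<', \$str) or die "Cannot open in-memory file: $!";
    local $/ = "..\n";      # record separator
    my @chunks;
    while (my $chunk = <$fh>) {
        chomp $chunk;       # removes the current $/, i.e. "..\n"
        push @chunks, [ split /\n/, $chunk ];   # the lines of this chunk
    }
    close $fh;
    return @chunks;
}

my @chunks = read_chunks($text);
print scalar(@chunks), " chunks\n";
print join(',', @$_), "\n" for @chunks;
```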

The other common mode besides paragraph mode is slurp mode. If you set $/ to undef, then perl will read the whole file into a single string.
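A minimal slurp-mode sketch; `slurp_fh` is a hypothetical helper. `local $/;` sets `$/` to undef only for the duration of the sub, so the rest of the program still reads line by line (unlike the global `$/ = ...` assignment in the example above, which stays in effect).

```perl
use strict;
use warnings;

# Hypothetical helper: slurp an entire filehandle into one string.
sub slurp_fh {
    my ($fh) = @_;
    local $/;               # undef: slurp mode, restored on return
    return scalar <$fh>;
}

my $text = "line 1\nline 2\nline 3\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $all = slurp_fh($fh);
close $fh;

print "read ", length($all), " characters in one record\n";
```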

