PerlMonks  

Shorten a list...

by la (Novice)
on Oct 16, 2011 at 00:03 UTC ( [id://931713]=perlquestion )

la has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a list of data that are in the format:

Group 1

Group 2

Group 3
Group 3

Group 4

Group 5
Group 5
Group 5

I need to write a Perl script that returns only the groups with more than one entry (e.g., I would want to shorten this list to include only the entries for Group 3 and Group 5).

Could anyone give me some suggestions on how to do this? It is the "group has more than one line" part below that I am having trouble writing. E.g.:

if (/^Group\s+(\d+)/) { if (group has more than one line) { print group lines } }

Thanks!

Replies are listed 'Best First'.
Re: Shorten a list...
by toolic (Bishop) on Oct 16, 2011 at 00:10 UTC
    One way is to keep track of the group count in a hash:
    use warnings;
    use strict;

    my %groups;
    while (<DATA>) {
        next unless /\S/;
        chomp;
        $groups{$_}++;
    }
    for (sort keys %groups) {
        print "$_\n" if $groups{$_} > 1;
    }
    __DATA__
    Group 1

    Group 2

    Group 3
    Group 3

    Group 4

    Group 5
    Group 5
    Group 5
    Prints...
    Group 3
    Group 5

      Thanks for the quick reply. I want the resultant list to read:

      Group 3
      Group 3
      Group 5
      Group 5
      Group 5

      Is this possible with the code you have given?

        You're welcome.

        The code I provided can be modified to produce that output. Give it a try.
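One possible modification, sketched here rather than taken from the thread: keep the same hash of counts, but print each qualifying line as many times as it was seen. The helper name `repeated_entries` is hypothetical, and the sample input is the list from the question. Note that `sort` compares strings, so "Group 10" would sort before "Group 2" on larger inputs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given the input lines, return
# every entry of each group that occurs more than once, sorted by group name.
sub repeated_entries {
    my @lines = @_;
    my %groups;
    for my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank separator lines
        chomp $line;
        $groups{$line}++;             # count occurrences of each line
    }
    my @out;
    for my $group (sort keys %groups) {
        next unless $groups{$group} > 1;
        push @out, ($group) x $groups{$group};   # repeat the line once per occurrence
    }
    return @out;
}

# Sample input from the question, split into lines (newlines retained).
my @input = split /^/, <<'END';
Group 1

Group 2

Group 3
Group 3

Group 4

Group 5
Group 5
Group 5
END

print "$_\n" for repeated_entries(@input);
```

With the sample input this prints the two "Group 3" lines followed by the three "Group 5" lines.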

Re: Shorten a list...
by eyepopslikeamosquito (Archbishop) on Oct 16, 2011 at 10:35 UTC

    Here's a version that preserves the line order within the file while allowing out of order lines (for example, "Group 1" appears on the first and last line below) and making empty lines separating groups optional (for example, there are blank lines within the "Group 5" lines below):

    use strict;
    use warnings;

    my %seen;
    print map  { $seen{$_} > 1 ? $_ x $seen{$_} . "\n" : () }
          grep { not $seen{$_}++ }
          grep { !/^\s*$/ }
          <DATA>;
    __DATA__
    Group 1
    Group 2
    Group 3
    Group 3
    Group 3
    Group 4
    Group 5

    Group 5

    Group 5
    Group 5
    Group 1
    Running the above program produces:
    Group 1
    Group 1

    Group 3
    Group 3
    Group 3

    Group 5
    Group 5
    Group 5
    Group 5

      Thanks so much for all of the feedback. As I have been working on this, the format has been changed so that the original input list now looks like:

      Group 1
      name
      Group 2
      name
      Group 3
      name
      name
      Group 4
      name
      Group 5
      name
      name
      name

      Your suggestions so far have been really helpful. Can anyone help me now with trying to only print the groups with multiple entries (Group 3 and Group 5) in this format:

      Group 3
      name
      name

      Group 5
      name
      name
      name

      Again, your help is greatly appreciated

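        There may be other ways than this. If the file is large (MB or GB), you might not want this method: it collects every qualifying group in an array, @data, before printing anything.
        #!/usr/bin/perl
        use strict;
        use warnings;

        my (@buffer, @data);
        while (<DATA>) {
            if (/^Group/) {
                # A new header closes the previous group; keep it only if it
                # has the header plus at least two entries.
                push @data, [@buffer] if @buffer > 2;
                @buffer = $_;
            }
            else {
                push @buffer, $_;
            }
        }
        push @data, [@buffer] if @buffer > 2;    # the last group in the file

        {
            local $" = '';    # lines already end in a newline
            print join("\n", map "@$_", @data);
        }
        # sample input in the format posted above
        __DATA__
        Group 1
        name
        Group 2
        name
        Group 3
        name
        name
        Group 4
        name
        Group 5
        name
        name
        name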
        Chris
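For large files, a streaming variant is one option. The sketch below is not from the thread: it buffers only the current group and prints it as soon as the next "Group" header (or end of input) shows that it has two or more entries. The helper name `print_multi_entry_groups` is hypothetical, and the demo reads from an in-memory filehandle so it stays self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read "Group N" headers followed by entry lines from
# $in and write only the groups with two or more entries to $out.
sub print_multi_entry_groups {
    my ($in, $out) = @_;
    my @buffer;          # header line plus the entries of the current group
    my $printed = 0;     # number of groups written so far

    my $flush = sub {
        return unless @buffer > 2;            # header + at least two entries
        print {$out} "\n" if $printed++;      # blank line between groups
        print {$out} @buffer;
    };

    while (my $line = <$in>) {
        next unless $line =~ /\S/;            # ignore blank lines
        if ($line =~ /^Group/) {
            $flush->();                       # previous group is complete
            @buffer = ($line);
        }
        else {
            push @buffer, $line;
        }
    }
    $flush->();                               # last group in the input
    return;
}

# Demo with the sample input from the post above, via an in-memory filehandle.
my $sample = <<'END';
Group 1
name
Group 2
name
Group 3
name
name
Group 4
name
Group 5
name
name
name
END

open my $fh, '<', \$sample or die "Cannot open in-memory input: $!";
print_multi_entry_groups($fh, \*STDOUT);
close $fh;
```

Memory use is bounded by the largest single group rather than by the number of qualifying groups; to process a real file, open it in place of the in-memory string.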
Re: Shorten a list...
by Cristoforo (Curate) on Oct 16, 2011 at 01:31 UTC
    My program reads the input in paragraph mode and then chomps each record. After the chomp, only groups with more than one line still contain a newline.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.014;

    {
        local $/ = "";    # paragraph mode: records are separated by blank lines
        while (<DATA>) {
            chomp;
            print "$_\n" if tr/\n//; # if 1 or more newlines
        }
    }
    __DATA__
    Group 1

    Group 2

    Group 3
    Group 3

    Group 4

    Group 5
    Group 5
    Group 5
    prints:
    Group 3
    Group 3
    Group 5
    Group 5
    Group 5
Re: Shorten a list...
by chromatic (Archbishop) on Oct 16, 2011 at 00:10 UTC
Re: Shorten a list...
by ambrus (Abbot) on Oct 16, 2011 at 09:28 UTC
Re: Shorten a list...
by sundialsvc4 (Abbot) on Oct 16, 2011 at 13:43 UTC

    A list which appears to the user as, say:

    (1,2), (1,4), (1,7), (2,1), (3,5), (4,161), (4,1991)

    could be physically represented in the actual application like this:   (“Holy LISP, Batman!!”)

    (1, (2,4,7)), (2,(1)), (3, (5)), (4, (161, 1991))

    An abstract data type could be constructed which knew about this internal efficiency without exposing it to its clients, providing them a “list of 2-tuples” interface while, unbeknownst to them, actually storing it and/or indexing it in a more efficient way.
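As a rough illustration of that idea, here is a minimal sketch; the class name `PairList` and its methods are hypothetical, not from the post. Clients add and read 2-tuples, while the pairs are stored grouped by their first element in a hash of arrays.

```perl
use strict;
use warnings;

# Hypothetical class: presents a "list of 2-tuples" interface while storing
# the pairs grouped by their first element.
package PairList;

sub new {
    my ($class) = @_;
    return bless { order => [], by_key => {} }, $class;
}

sub add {
    my ($self, $key, $value) = @_;
    # Remember the order in which first elements were first seen.
    push @{ $self->{order} }, $key unless exists $self->{by_key}{$key};
    push @{ $self->{by_key}{$key} }, $value;
    return $self;
}

# The 2-tuple view: one [key, value] array ref per stored pair.
sub pairs {
    my ($self) = @_;
    return map {
        my $key = $_;
        map { [ $key, $_ ] } @{ $self->{by_key}{$key} };
    } @{ $self->{order} };
}

# The grouped view: all second elements stored under one first element.
sub values_for {
    my ($self, $key) = @_;
    return @{ $self->{by_key}{$key} || [] };
}

package main;

my $list = PairList->new;
$list->add(@$_) for [1, 2], [1, 4], [1, 7], [2, 1], [3, 5], [4, 161], [4, 1991];

my @shown;
push @shown, "($_->[0],$_->[1])" for $list->pairs;
print join(', ', @shown), "\n";
# prints: (1,2), (1,4), (1,7), (2,1), (3,5), (4,161), (4,1991)
```

Applied to the original question, `values_for` would return the entries of one group, and a group with more than one entry is simply a key whose stored list has more than one element.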

Node Type: perlquestion [id://931713]
Approved by toolic