PerlMonks  

Re: How to split into paragraphs?

by ikegami (Pope)
on Nov 16, 2006 at 05:39 UTC (#584376)


in reply to How to split into paragraphs?

If you're reading from a file, $/ = '' sets paragraph mode.

local $/ = ''; print OUT ("<$_>") while <IN>;
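A minimal, self-contained sketch of paragraph mode, assuming an in-memory filehandle in place of the IN handle above (the sample text and the names $text and @paras are illustrative only):

```perl
use strict;
use warnings;

# Sample input; stands in for the contents of a real file.
my $text = "first line\nsecond line\n\n\nthird line\n";

# Open a read handle on the string (an in-memory file).
open(my $in, '<', \$text) or die "Can't open in-memory file: $!";

my @paras;
{
    # Paragraph mode: each read returns one paragraph, and a run of
    # blank lines between paragraphs counts as a single separator.
    local $/ = '';
    @paras = <$in>;
}
close($in);

print "<$_>" for @paras;
```

Localizing $/ inside a bare block keeps the change from leaking into other reads in the program.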

Alternatively, here's a solution that works for strings:

$out = join '', map { "<$_>" } map { /\G((?:(?!\n\n).)*\n+|.+\z)/sg } $in;
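To see what the inner map produces, the same regex can be run directly against a small string. A sketch, with sample text assumed for illustration:

```perl
use strict;
use warnings;

# Sample input: three paragraphs, the last without a trailing newline.
my $in = "a\nb\n\nc\nd\n\n\ne";

# In list context, m//g returns the captured text of every match,
# so @paras holds one element per paragraph.
my @paras = $in =~ /\G((?:(?!\n\n).)*\n+|.+\z)/sg;

my $out = join '', map { "<$_>" } @paras;
print $out, "\n";
```

Each paragraph keeps its trailing newlines here, so joining @paras unchanged gives back the original string.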


Re^2: How to split into paragraphs?
by jrw (Scribe) on Nov 16, 2006 at 12:25 UTC
    Ikegami, see clarification above. I am partitioning based on being able to detect the start of each substring, not based on a separator between substrings.
Re^2: How to split into paragraphs?
by jrw (Scribe) on Nov 16, 2006 at 12:36 UTC
    ikegami: I hadn't thought of using map to generate a list when given only a single input -- that's an interesting idea.

    One thing I'm trying to do is avoid repeating the pattern used for detecting the start of each substring. When I try to capture using m//, I end up having to repeat the pattern to stop each match:

    /(START_PATTERN.*?)(?!START_PATTERN)/g

    split seems to say what I want: "here is the thing that separates the paragraphs from each other". But then I have to piece the parts back together again (see my original post's code) and I'm trying to avoid that.

      Ah, I see. Well, I've already provided the building blocks, but they are well hidden. Let me expose them.

      You need something along the lines of /[^$chars]*/, but instead of negatively matching chars, you want to negatively match a regexp.

      The direct equivalent of
      /[^$chars]*/
      for regexps is
      /(?:(?!$re).)*/

      In context,

      # Input the string.
      my $in = do { local $/; <DATA> };

      # Must move "pos" on a match.
      # Zero-width match won't work.
      my $start_pat = qr/^\S+/m;

      # Break the input into paragraphs.
      my @paras = $in =~ /
         \G
         (
            $start_pat
            (?: (?!$start_pat). )*
         )
      /xgs;

      # Manipulate the paragraphs.
      @paras = map { "<$_>" } @paras;

      # Recombine the paragraphs.
      my $out = join '', @paras;

      # Output the string.
      print($out);

      __DATA__
      abc: asdf1
           asdf2
      def: asdf3
      ghi: asdf4
           asdf5
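      The same idea can be wrapped in a helper so the start pattern is written only once. This is only a sketch: the sub name split_at_starts is made up for illustration, and it assumes the text begins with a match of the start pattern (otherwise \G fails at position 0 and nothing is returned).

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into chunks, each beginning where
# $start_pat matches. $start_pat must consume at least one character.
sub split_at_starts {
    my ($text, $start_pat) = @_;
    return $text =~ /
        \G
        (
            $start_pat
            (?: (?!$start_pat) . )*
        )
    /xgs;
}

# Sample input: unindented lines start a paragraph, indented lines continue it.
my $text = "abc: asdf1\n     asdf2\ndef: asdf3\nghi: asdf4\n     asdf5\n";
my @paras = split_at_starts($text, qr/^\S+/m);

print "<$_>" for @paras;
```

      Because the pattern is compiled with qr//, its own /m flag stays in effect when it is interpolated, while the /s on the outer match lets "." cross newlines.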
