PerlMonks  

matching first paragraph satisfying condition

by mnshptl32 (Initiate)
on Aug 05, 2019 at 16:14 UTC (#11103963)

mnshptl32 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings! I just registered here and hope this is an appropriate venue for my question.

I'm new to perl and am trying to write a perl one-liner that returns everything from the first non-indented line of a file up until the end of that paragraph, terminated by a blank line. I can do this using awk with the command:

awk -v RS='' -v ORS='\n\n' '/^[^ ].*$/' file.txt | awk -v RS='' 'NR==1 {print $0}'
My problem translating this to perl is that the first non-indented line of the file may or may not be the first line of the file. In the former case, this works:
perl -0pe 's/.*?\n*?([^ \n].*?)\n\n.*/$1/gs' file.txt
as does this:
perl -0pe 's/([^ \n].*?)\n\n.*/$1/gs' file.txt
In the latter case, this works:
perl -0pe 's/.*?\n([^ \n].*?)\n\n.*/$1/gs' file.txt
But is there a simple perl one-liner that works in both cases? I've tried writing a semicolon-separated perl command intended to prepend a blank line to the file before the search in the event that the first line is not indented, using something like
if ( $. == 1 and /^[^ ].*$/ ) {...}
but I can't get the syntax right. Obviously I could string together a sequence of commands like
echo '' > tempfile.txt ; cat file.txt >> tempfile.txt ; perl ...
or use some bash conditional like
if [[ $(egrep -n -m1 -e '^[^ ]' file.txt | sed 's/^\([0-9]\+\):.*/\1/g') -eq 1 ]] ; then perl ... ; else perl ... ; fi
however, I'd like to know if there's some more elegant "pure perl" solution I'm overlooking.

Best regards,

Maneesh

Replies are listed 'Best First'.
Re: matching first paragraph satisfying condition
by 1nickt (Abbot) on Aug 05, 2019 at 16:52 UTC

    Hi, welcome to Perl, the One True Religion.

    See the "flip-flop" operator in https://perldoc.perl.org/perlop.html#Range-Operators. (And $/, the input record separator, in perlvar.)

    $ cat foo.txt
     indented

    not indented
    bla bla bla

    not indented
    yak
    $ perl -Mstrict -wE '$/="";while (<>) { chomp; if (/^\w/ .. /^$/) {print; exit} }' < foo.txt
    not indented
    bla bla bla

    Update: fix re

    Hope this helps!


    The way forward always starts with a minimal test.
      $ echo " indented

      not indented
      bla bla bla

      not indented
      yak
      " | perl -00ne'/^\S/&&print&&exit'
      not indented
      bla bla bla

           :)

        Thank you ++ was not familiar with -00


Re: matching first paragraph satisfying condition
by mnshptl32 (Initiate) on Aug 05, 2019 at 19:55 UTC

    Thank you, 1nickt and jwkrahn, for your lovely solutions to the problem! I also appreciate that your one-liners interpret "indented" as any whitespace at the start of the line, not just spaces. Incidentally, I didn't mention some of my other failed attempts, such as perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

    Best regards,

    Maneesh

      perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

      The | alternation operator has pretty low precedence, so it kind of depends on what your expectations are :-) A common trap is to write something like /^foo|bar$/ and expect that to match only "foo" or "bar", when in fact it is matching ^foo or bar$ - the correct way to express that would have been /^(foo|bar)$/ or /^(?:foo|bar)$/.

      Based on your $1 in the replacement, I suspect you were doing something like s/f(o)o|b(a)r/$1/g and expecting the string "bar" to be turned into "a"? In that case, you need the "branch reset" pattern (?|...) (perlre): s/(?|f(o)o|b(a)r)/$1/g will replace "foo" with "o" and "bar" with "a".

        Thank you, haukex, for the information about the correct use of the branch reset, and sorry for making you guess my incorrect code. It was (and is) a bit of a moot point, since I've been given multiple perl solutions to my problem; however, for the record, my aforementioned failed attempt was to combine the ideas I had for what in my original post I described as the former and latter cases:

        perl -0pe 's/(.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

        This simply returned the whole file, and I thought it had something to do with the nested parentheses altering what $1 refers to. When I rewrite it correctly, as per haukex's helpful advice:

        perl -0pe 's/(?|.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

        it does return something, but that something is the desired output only in the case where the first line is non-indented. The reason, obviously, is that I was making a stupid mistake: the first alternative returns a match whether or not the first line of file.txt is indented, so that code is never going to return the output of:

        perl -0pe 's/.*?\n([^ \n].*?)\n\n.*/$1/gs' file.txt

        As I said, it's a moot point, but I thought this follow-up was worth mentioning in case someone who lands on this page learns from my stupid mistake.

        Regards,

        Maneesh

      s/(?:foo|bar)/baz/;
      (non-capturing group)



        s/(?:foo|bar)/baz/ is pretty much identical to s/foo|bar/baz/, so I don't think that's the issue.
