Adding back missing newlines between records

puterboy has asked for the wisdom of the Perl Monks concerning the following question:

I have a set of records in plaintext where each record begins with "# file", and that line doesn't appear anywhere but the start of a record. Records should be separated by a blank line (\n\n), but some records are missing it.

I thought the following perl one-liner would work but it doesn't. What am I doing wrong???

cat <record file> | perl -p -e "s/(?=\w)\n# file/\n\n# file/sg"

My problem is that somehow I can't seem to search across the newline, which is what I thought the /s was supposed to help with.

Re: Adding back missing newlines between records by ikegami (Patriarch) on Nov 06, 2009 at 06:30 UTC
"My problem is that somehow I can't seem to search across the newline which is what I thought the /s was supposed to help with"

The `s` modifier makes `.` match every character, including the newline, which it doesn't match by default. It's useless here since you don't use `.`.

The `-p` causes the expression to be applied to each line of input. You're trying to match something you haven't read yet! One way of fixing this is to change the definition of a line so that the whole file is read at once (`-0777`).

Then there's the issue that `(?=\w)\n` will never match. How can the next character be both a word character and a newline?

`perl -0777pe's/(?<!\n)\n# file/\n\n# file/g' record_file`
Re^2: Adding back missing newlines between records by 7stud (Deacon) on Nov 06, 2009 at 11:08 UTC
`perl -pe 's/(?<!\n)\n# file/\n\n# file/g' record_file`

I don't see how that is supposed to work. The -p flag creates a while(<>) loop around the code specified for the -e flag (with print; as the last line in the while loop). The s/// operator in your code is going to operate on the $_ variable, and the diamond operator (<>) will assign each line in the file to $_ one line at a time.

As far as I can tell, at some point $_ will be equal to the string "# file\n", and the previous string will have been "hello world\n" (i.e. not "\n" as desired). Your regex is looking for "\n# file" not preceded by a "\n". First, because it seems to me that the diamond operator will produce the line "# file\n", your regex won't match because there is no "\n# file" in that line. Second, it looks to me like you are doing a negative lookbehind beyond the start of the string. How is that supposed to work?
Re^3: Adding back missing newlines between records by johngg (Canon) on Nov 06, 2009 at 11:29 UTC
"I don't see how that is supposed to work."

It's supposed to work because, as ikegami pointed out, you use the `-0777` switch to make the interpreter slurp the whole file in one go, the equivalent of undefining `$/` in a script. Thus, the global replace operates on a single string which is the whole file, and the `while` implied by `-p` only iterates once.

I hope this is helpful.

Cheers,
JohnGG
Re^3: Adding back missing newlines between records by Anonymous Monk on Nov 06, 2009 at 11:27 UTC
It's the magic `-0777` option, which sets the input record separator. No byte has the value `oct(0777)` (511), so instead of reading lines, Perl reads the whole file at once. Any value of 0400 or above has this effect; 0777 is simply the conventional choice.
Re^4: Adding back missing newlines between records by 7stud (Deacon) on Nov 06, 2009 at 12:38 UTC
Re^5: Adding back missing newlines between records by Anonymous Monk on Nov 06, 2009 at 12:42 UTC
Re^4: Adding back missing newlines between records by 7stud (Deacon) on Nov 06, 2009 at 12:20 UTC
Re^3: Adding back missing newlines between records by 7stud (Deacon) on Nov 06, 2009 at 12:06 UTC
"The -p flag creates a while(<>) loop around the code specified for the -e flag (with print; as the last line in the while loop)"

Actually, that's not quite accurate. According to what I read, the while loop looks like this:

    LINE:
    while (<>) {
        # your code goes here
    } continue {
        print or die "-p destination: $!\n";
    }

A continue block gets executed the instant before the loop condition is evaluated. So 'redo' does not cause the continue block to execute, but 'next' does, and a normal iteration of the loop causes the continue block to execute as well.

This works for me:

`perl -pe 'if($_ eq "\n"){$n=1;next;} if($n){$n=0;next;}else{s/# file/\n# file/;}' data1.txt`
Re^2: Adding back missing newlines between records by puterboy (Scribe) on Nov 10, 2009 at 07:21 UTC
Thanks for the code and the helpful explanation. I have read 'man perlre' many times but, as you pointed out, I missed several points there. Thanks for the clarification.
