PerlMonks  

Re^3: Joining multiple lines together while parsing

by haukex (Archbishop)
on Mar 24, 2017 at 10:31 UTC (id://1185754)


in reply to Re^2: Joining multiple lines together while parsing
in thread Joining multiple lines together while parsing

In that case, how are continuation lines identified? In other words, how should the program act in the case of the following input?

    Detail: Some really nice infos
    these are
    Warning bad news here
    Detail: Some really nice infos
    these are
    Info: This is a problem
    but there is a solution
    Warning bad news here
    is this a continuation or not?

Replies are listed 'Best First'.
Re^4: Joining multiple lines together while parsing
by Arengin (Novice) on Mar 24, 2017 at 10:42 UTC
    The input is always like:
        Dumpdata example
        -----------------
        Warning
        Detail:
        Info:
        Spec:


        Dumpdata example
        -----------------
        Warning
        Detail:
        Info:


        Dumpdata example
        -----------------
        Warning
        Detail:
        Spec:


        Dumpdata example
        -----------------
        Warning
        Detail:
        Info:
        Spec:

    The "Dumpdata example" always starts a new section even if not all elements were present.

      Here's one possible regex approach using named captures (perlre) and %+. However, it's starting to get a little complex, so the line-based state machine type approach from Corion's post is starting to look a little better in this case (I gave some more examples of the state machine approach in this thread).

      use warnings;
      use strict;
      use Data::Dump;

      local $/ = "\n\n\n";
      while (<DATA>) {
          # clobber the header
          s/ ^ \s* Dumpdata\s+example \s* \n \s* -+ \s* \n //msx or next;
          my %row;
          $row{$+{key}} = $+{val} while m{
              (?: # the very first key doesn't need colon
                  \A \s* (?<key> \w+ )
                  # but the other keys need colons
                  | ^ \s* (?<key> \w+ : )
              )
              \s+
              # values should end at the next key
              (?<val> (?: (?!^\s*\w+:) . )+ )
          }xmsg;
          s/\s+/ /g for values %row;
          dd \%row;
      }
      __DATA__
      Dumpdata example
      -----------------
      Warning bad news here
      Detail: Some really nice infos
      these are
      Info: This is a problem
      but there is a solution


      Dumpdata example
      -----------------
      Warning test
      Detail: foo
      bar
      Info: quz
      baz
      Spec: blah

      Output:

      {
        "Detail:" => "Some really nice infos these are ",
        "Info:"   => "This is a problem but there is a solution ",
        "Warning" => "bad news here ",
      }
      {
        "Detail:" => "foo bar ",
        "Info:"   => "quz baz ",
        "Spec:"   => "blah ",
        "Warning" => "test ",
      }

      Update: Using the "branch reset" pattern (?|...) allows for a little bit of simplification:

      my %row = m{
          (?| \A \s* ( \w+ ) | ^ \s* ( \w+ : ) )
          \s+
          ( (?: (?!^\s*\w+:) . )+ )
      }xmsg;
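
For comparison, the line-based state machine alternative mentioned in this reply could look roughly like the sketch below. This is not Corion's code; `parse_sections` is a hypothetical helper name, and the sketch assumes the input format described in the thread: a "Dumpdata example" header, a dashed underline, a first key without a colon, later keys with colons, and anything else treated as a continuation of the previous value. Unlike the regex version, it strips the colon from the keys and trims trailing whitespace from the values.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of lines into a list of records (hashrefs).
sub parse_sections {
    my @lines = @_;
    my (@records, $rec, $key);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\s*Dumpdata\s+example\s*$/) {
            # Header line: finish the previous section and start a new one.
            push @records, $rec if $rec && %$rec;
            $rec = {};
            $key = undef;
        }
        elsif ($line =~ /^\s*-+\s*$/ || $line !~ /\S/) {
            next;    # dashed underline or blank line
        }
        elsif (!defined $rec) {
            next;    # ignore anything before the first header
        }
        elsif ($line =~ /^\s*(\w+):\s*(.*?)\s*$/) {
            # "Key: value" starts a new field.
            $key = $1;
            $rec->{$key} = $2;
        }
        elsif (!defined $key && $line =~ /^\s*(\w+)\s*(.*?)\s*$/) {
            # First line of a section: key without a colon, e.g. "Warning".
            $key = $1;
            $rec->{$key} = $2;
        }
        else {
            # Continuation line: append to the current field.
            next unless defined $key;
            (my $text = $line) =~ s/^\s+|\s+$//g;
            $rec->{$key} .= (length $rec->{$key} ? ' ' : '') . $text;
        }
    }
    push @records, $rec if $rec && %$rec;
    return @records;
}

my $sample = <<'END';
Dumpdata example
-----------------
Warning bad news here
Detail: Some really nice infos
these are
Info: This is a problem
but there is a solution

Dumpdata example
-----------------
Warning test
Spec: blah
END

my @records = parse_sections(split /\n/, $sample);
for my $r (@records) {
    print join(', ', map { "$_=$r->{$_}" } sort keys %$r), "\n";
}
```

As with the regex, a continuation line that happens to begin with a word followed by a colon would be taken as a new key, so whether this rule is safe depends on the real data.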
Re^4: Joining multiple lines together while parsing
by Arengin (Novice) on Mar 24, 2017 at 11:28 UTC
    Hi,
    That last one works great, but is there a way to add " to the start and end of every field, so that the values look like "value" and I can use them for CSV?
      so I can use it for csv?

      Use Text::CSV:

      use Text::CSV;
      my $csv = Text::CSV->new({binary=>1, always_quote=>1, blank_is_undef=>1, eol=>$/, auto_diag=>2});
      $csv->print(select, ['foo', 'bar']);
      __END__
      "foo","bar"

      Replace the call to select with a $filehandle (open) if you're writing to a file.
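
A minimal sketch of the file-writing variant, assuming the parsed records are hashrefs. The helper name `write_csv`, the column list, and the output filename `dumpdata.csv` are made up for illustration and are not part of the thread; missing fields are written as empty quoted strings.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: write a list of hashrefs to $path as fully quoted CSV.
sub write_csv {
    my ($path, $columns, $records) = @_;
    my $csv = Text::CSV->new({
        binary       => 1,
        always_quote => 1,       # wrap every field in double quotes
        eol          => "\n",
        auto_diag    => 2,       # die on CSV errors
    });
    open my $fh, '>:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    $csv->print($fh, $columns);  # header row
    for my $rec (@$records) {
        # Fields missing from a section become empty strings.
        $csv->print($fh, [ map { $rec->{$_} // '' } @$columns ]);
    }
    close $fh or die "Cannot close $path: $!";
}

write_csv(
    'dumpdata.csv',
    [qw(Warning Detail Info Spec)],
    [
        { Warning => 'bad news here', Detail => 'Some really nice infos these are' },
        { Warning => 'test', Detail => 'foo bar', Info => 'quz baz', Spec => 'blah' },
    ],
);
```

Fixing the column order up front keeps the CSV rectangular even when a section lacks Info or Spec.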

      Just use a proper CSV module, which will do this automatically for you, e.g. Text::CSV.

Re^4: Joining multiple lines together while parsing
by Arengin (Novice) on Mar 24, 2017 at 11:40 UTC
    Works great. Thank you very much.
