PerlMonks  

Regex for non-patterned input

by sidsinha (Acolyte)
on Aug 15, 2013 at 10:21 UTC ( [id://1049545]=perlquestion )

sidsinha has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I am trying to format the content of an array into a table. The input contains rows of data, with the fields in each row separated by a space or a tab (as below). However, I want the table to have 4 columns, and the input has one element that contains spaces but needs to be treated as a single value.

For example, the table below:

'cOne cTwo cThree 13 sec cFour
cOne cTwo cThree 11 sec cFour
cOne cTwo cThree 1 min 2 sec cFour
cOne cTwo cThree 13 sec cFour';

should be printed as:

ColumnA  ColumnB  ColumnC      ColumnD
cOne     cTwo     13 sec       cFour
cOne     cTwo     11 sec       cFour
cOne     cTwo     1 min 2 sec  cFour

Entries such as "13 sec" or "1 min 13 sec" should be in one column. Here's the code I tried, but it's terribly naive. Could someone help me? Thanks.
use strict;
use warnings;
use HTML::Table;

my $table = new HTML::Table(-border=>0.2, -bgcolor=>'#F4F5F7', -head=> ['ColumnA','ColumnB','ColumnC', 'ColumnD']);

my @wtodays= 'cOne cTwo cThree 13 sec cFour
cOne cTwo cThree 11 sec cFour
cOne cTwo cThree 1 min 2 sec cFour
cOne cTwo cThree 13 sec cFour';

for ( @wtodays ) {
    $table->addRow(split(/\s+/, "$_\n"));
}
print $table;

Replies are listed 'Best First'.
Re: Regex for non-patterned input
by choroba (Cardinal) on Aug 15, 2013 at 11:57 UTC
    If you know that only the third column might contain whitespace, you can split each line, and then join the cells from the third to the last but one.
    for (@lines) {
        my @cells = split;
        my $time = join ' ', @cells[2 .. $#cells - 1];
        $table->addRow(@cells[0, 1], $time, $cells[-1]);
    }
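    A self-contained sketch of the same split-and-rejoin idea, without HTML::Table so it can be run directly. The helper name `row_to_cells` and the sample `@lines` are illustrative assumptions, not part of the reply. It keeps the slice from the reply (index 2 onwards), so cThree ends up inside the joined cell; starting the slice at index 3 instead would drop cThree, which seems to be what the expected output in the question shows.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): split a row on whitespace,
# then rejoin everything from the third field up to the last-but-one field.
sub row_to_cells {
    my ($line) = @_;
    my @cells = split ' ', $line;
    return unless @cells >= 4;    # too few fields to form four columns
    my $time = join ' ', @cells[2 .. $#cells - 1];
    return (@cells[0, 1], $time, $cells[-1]);
}

# Sample rows taken from the question, one row per array element.
my @lines = (
    'cOne cTwo cThree 13 sec cFour',
    'cOne cTwo cThree 1 min 2 sec cFour',
);

for my $line (@lines) {
    my @row = row_to_cells($line);
    print join('|', @row), "\n" if @row;
}
```

    Because only the first two and the last field are taken by position, the middle cell may contain any number of whitespace-separated words.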
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Regex for non-patterned input
by boftx (Deacon) on Aug 15, 2013 at 12:46 UTC

    I humbly submit that the "array" definition and subsequent code will never do what is desired:

    my @wtodays= 'cOne cTwo cThree 13 sec cFour
    cOne cTwo cThree 11 sec cFour
    cOne cTwo cThree 1 min 2 sec cFour
    cOne cTwo cThree 13 sec cFour';

    for ( @wtodays ) {
        $table->addRow(split(/\s+/, "$_\n"));
    }

    The array is being assigned a single string constant as if it was a scalar. The data in the array must first be organized properly before any meaningful manipulations can be done with it. For example:

    my @wtodays = (
        'cOne cTwo cThree 13 sec cFour',
        'cOne cTwo cThree 11 sec cFour',
        'cOne cTwo cThree 1 min 2 sec cFour',
        'cOne cTwo cThree 13 sec cFour',
    );

    for ( @wtodays ) {
        # do whatever text processing you need on a single row
    }

    And yes, if there are tabs between the column data, and spaces only occur in the column values, then the split becomes trivial once the array is properly defined.
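    To illustrate the tab-separated case, here is a minimal sketch. It assumes the raw input is one string with newline-separated rows and tab-separated columns; the helper name `parse_rows` and the sample `$raw` are made up for the example.

```perl
use strict;
use warnings;

# Assumed input shape: rows separated by newlines, columns by tabs,
# spaces only inside a column value.
my $raw = join "\n",
    "cOne\tcTwo\tcThree\t13 sec\tcFour",
    "cOne\tcTwo\tcThree\t1 min 2 sec\tcFour";

# Hypothetical helper: return one array reference per non-blank row.
sub parse_rows {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;          # skip blank lines
        push @rows, [ split /\t/, $line ];  # spaces stay inside a cell
    }
    return @rows;
}

for my $row (parse_rows($raw)) {
    print join('|', @$row), "\n";
}
```

    Each array reference can then be passed on as a row, for example with `$table->addRow(@$row)`.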

    Update: grammar correction in previous paragraph.

Re: Regex for non-patterned input
by Lawliet (Curate) on Aug 15, 2013 at 11:13 UTC

    Ah, so you cannot simply split on whitespace, because one of your columns has whitespace in it. Luckily, the data looks simple enough that we can get around that. For example, try the following (untested) regex:

    for ( @wtodays ) {
        if (/^(\w+)\s+(\w+)\s+(\w+)\s+([\w\s]+)\s+(\w+)$/) {
            $table->addRow($1, $2, $3, $4, $5);
        }
    }

    We individually capture each column. You can see that the regex for capturing the fourth column looks different than the others because of the whitespace it will contain. Specifically, instead of grabbing all the word-like characters, we grab all word-like and space-like characters, and then continue on our merry way to capturing the fifth column.
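    A small runnable sketch of that regex applied to one row at a time. The helper name `match_row` is an assumption, and returning only four of the five captures (skipping cThree) is one reading of the expected output in the question, not something the reply above does.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the five-capture regex to a single row and
# return four columns, leaving out the third capture (cThree).
sub match_row {
    my ($line) = @_;
    return unless $line =~ /^(\w+)\s+(\w+)\s+(\w+)\s+([\w\s]+)\s+(\w+)$/;
    return ($1, $2, $4, $5);
}

for my $line ('cOne cTwo cThree 13 sec cFour',
              'cOne cTwo cThree 1 min 2 sec cFour') {
    my @row = match_row($line);
    print join('|', @row), "\n" if @row;
}
```

    The greedy `[\w\s]+` first swallows the rest of the line, then backtracks just far enough for the final `\s+(\w+)$` to match, so the last word always lands in the final capture.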

    I hope this helps, and I hope you understand the logic behind it.

Re: Regex for non-patterned input
by mtmcc (Hermit) on Aug 15, 2013 at 11:39 UTC
    Does your data have to be all in a single array?

    If your data has tabs/spaces/newlines where they should be, you can just split on the tab, and not just on any space, something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Table;

    my $table = new HTML::Table(-border=>0.2, -bgcolor=>'#F4F5F7', -head=> ['ColumnA','ColumnB','ColumnC', 'ColumnD', 'ColumnE']);

    # The fields in the DATA section below must be separated by tabs.
    while ( <DATA> ) {
        my @row = split (/\t/, $_);
        $table->addRow(@row);
    }
    print $table;

    __DATA__
    cOne	cTwo	cThree	13 sec	cFour
    cOne	cTwo	cThree	11 sec	cFour
    cOne	cTwo	cThree	1 min 2 sec	cFour
    cOne	cTwo	cThree	13 sec	cFour

Re: Regex for non-patterned input
by kcott (Archbishop) on Aug 16, 2013 at 08:56 UTC

    G'day sidsinha,

    The data you present is at odds with what you've described. The following works on the data you've shown:

    $ perl -Mstrict -Mwarnings -le '
        my @wtodays= q{cOne cTwo cThree 13 sec cFour
    cOne cTwo cThree 11 sec cFour
    cOne cTwo cThree 1 min 2 sec cFour
    cOne cTwo cThree 13 sec cFour};

        my $re = qr{
            ^               # anchor: start of line
            \s*             # discard: possible whitespace
            (\w+)           # capture: cOne
            \s+             # discard: whitespace
            (\w+)           # capture: cTwo
            \s+             # discard: whitespace
            \w+             # discard: cThree
            \s+             # discard: whitespace
            ((?:\d+ \s+ min \s+)? \d+ \s+ sec)  # capture: possible min and sec
            \s+             # discard: whitespace
            (\w+)           # capture: cFour
            \s*             # discard: possible whitespace
            $               # anchor: end of line
        }mx;

        my @table_data;

        for (@wtodays) {
            while (/$re/g) {
                push @table_data => [$1, $2, $3, $4];
            }
        }

        {
            local $" = "|";
            for (@table_data) {
                print "@$_";
            }
        }
    '
    cOne|cTwo|13 sec|cFour
    cOne|cTwo|11 sec|cFour
    cOne|cTwo|1 min 2 sec|cFour
    cOne|cTwo|13 sec|cFour

    -- Ken

Re: Regex for non-patterned input
by Anonymous Monk on Aug 15, 2013 at 11:23 UTC

Node Type: perlquestion [id://1049545]
Approved by Lawliet