Re: Re: Bottom-Up Data Mining with Perl


by jjohhn (Scribe)

on Mar 10, 2003 at 01:58 UTC


in reply to Re: Bottom-Up Data Mining with Perl
in thread Bottom-Up Data Mining with Perl

Could you expand on how split, pack, unpack and regexes are related? I feel there's something to what you say, but I can't at all pin it down.


Re3: Bottom-Up Data Mining with Perl
by dragonchild (Archbishop) on Mar 10, 2003 at 15:32 UTC

split works best on delimited lines, such as tab-delimited or comma-delimited text. (For real-world delimited text, though, a module like Text::CSV is the better choice, because of lines like "abcd,'Smith, John', blah": the comma inside the quotes is part of the item, not a delimiter.) One could use a regex here instead, but the regex is harder to understand and even harder to get right.
```
    # $delim is one literal character; \Q...\E keeps e.g. '|' from acting as a metacharacter
    my @items = split /\Q$delim\E/, $line;
#### vs. (and I know this will make mistakes)
    my @items = $line =~ /(?:^|\Q$delim\E)([^\Q$delim\E]*)/g;
```
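To make the quoted-comma problem concrete, here is a minimal sketch. It uses the core module Text::ParseWords as a stand-in for Text::CSV (my substitution, chosen only so the example runs without installing anything from CPAN); the sample line is the one quoted above, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = "abcd,'Smith, John', blah";

# Naive split: every comma is a delimiter, so the quoted name is cut in two.
my @naive = split /,/, $line;

# parse_line(delimiter, keep_quotes, line): 0 means strip the quotes.
# It treats the quoted section as a single field.
my @fields = parse_line(',', 0, $line);

print scalar(@naive),  " fields from split\n";
print scalar(@fields), " fields from parse_line\n";
print "second field: $fields[1]\n";
```

Text::CSV offers far more control (quote character, escaping, embedded newlines), so treat this only as an illustration of why plain split is not enough for quoted data.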
unpack (if you understand how to use it!) is really good with fixed-width data, where the first field is so many columns wide, the second so many, and so on. This is often data from a mainframe.
Again, you can use a regex here, but you have to build it from a list of column widths for it to be maintainable. (I'd put an unpack example here if I were comfortable with how it works.)
```
my @columns = ( 20, 10, 25, 5, 2, 2, 20);
# join the pieces; map on its own in scalar context would only return the count
my $regex = join '', map { "(.{$_})" } @columns;
$regex = qr/^${regex}$/;

my @items = $line =~ /$regex/;
```
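For comparison, an unpack version of the same fixed-width split might look like the sketch below. It reuses the column widths from the regex example; the sample record is made up purely for illustration, and the choice of the `A` format (which strips trailing spaces) rather than `a` (which keeps them) is an assumption about what the data needs.

```perl
use strict;
use warnings;

# Column widths, same as in the regex example.
my @columns = ( 20, 10, 25, 5, 2, 2, 20 );

# Builds "A20 A10 A25 A5 A2 A2 A20": 'A' reads N bytes and strips trailing spaces.
my $template = join ' ', map { "A$_" } @columns;

# Hypothetical sample record; pack pads each field with spaces to its width.
my @sample = ( 'Smith, John', '555-0100', '12 Example Street',
               '04101', 'ME', 'US', 'no notes' );
my $line = pack $template, @sample;

# The actual extraction is a single call.
my @items = unpack $template, $line;

print join( '|', @items ), "\n";
```

Generating the template from the width list keeps the layout in one place, just as the generated regex does, and unpack neither fails on short lines nor needs the fields to be free of newlines the way `.` does.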

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
