ellem has asked for the wisdom of the Perl Monks concerning the following question:

I don't think I really need a split lesson or code, but if I do, tell me...

I have a file that looks like:
GENERIC HEADER G08762100 20000310 G08762100 *279048437* 03099111fbdsx *BD* + 00000000*1206.75*000000*200.45*700
The idea is to extract the *highlighted* text. So should I count the positions? Or, in the case of the long string, work off of the decimal point (i.e. two digits to the right and up to seven to the left; there should never be more than seven)? The *s are only there to highlight what I need; they do not appear in the text. Sorry for not making that clear.
Ideas, better way?

--
ell em 52 @ g mail, com
There's more than one way to do it, but only some of them actually work.

Re: Smartest Way To Split
by Zaxo (Archbishop) on Mar 01, 2005 at 03:38 UTC

    The answer depends on how rigid and accurately followed the data format is. If those are fixed-width fields, unpack may be best. If it's always the same fields highlighted by stars, split will be fine. Here's a way to get just the highlighted ones, wherever they are:

    while (<$fh>) {
        my @star_fields = m/\*([^*]*)\*/g;
        # . . .
    }

    Note the escapes for literal '*'. I imagine those were what were giving you trouble. They will need to be escaped for a split pattern, too.

    After Compline,
    Zaxo
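
For the fixed-width case mentioned above, a minimal unpack sketch might look like the following. The column offsets are an assumption: they were counted from the single sample line in the question (with the highlighting asterisks removed), and the helper name parse_fixed is hypothetical. A real file would need the offsets checked against the format specification.

```perl
use strict;
use warnings;

# Sample record from the question, without the highlighting asterisks.
my $line = 'GENERIC HEADER G08762100 20000310 G08762100 279048437 '
         . '03099111fbdsx BD + 000000001206.75000000200.45700';

# Assumed layout (counted from the sample only):
#   x44 skips to the 9-character ID, x15 skips to the 2-character code,
#   x3 skips to two zero-padded amounts of 15 and 12 characters.
sub parse_fixed {
    my ($record) = @_;
    my @fields = unpack 'x44 A9 x15 A2 x3 A15 A12', $record;
    return @fields;
}

my ($id, $code, $amount1, $amount2) = parse_fixed($line);
print "$id $code $amount1 $amount2\n";
```

The two amounts come back still zero-padded; adding 0 to each (or formatting with sprintf) would strip the padding. unpack dies if a record is shorter than the template expects, so short lines would need to be filtered out first.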

      Note the escapes for literal '*'. I imagine those were what were giving you trouble

      I'd imagine that the asterisks are only there for our benefit. But I could be wrong. I think ellem will have to specify more clearly how the split is supposed to work.

Re: Smartest Way To Split
by trammell (Priest) on Mar 01, 2005 at 04:08 UTC
    Hard to say from the data you've posted, but if the data is delimited by whitespace, you can use split ' ', e.g.:
    ... my @fields = (split ' ', $line)[1,3,4,5,7]; ...
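
Applied to the sample line from the question, the whitespace split could be sketched as below. The indices are an assumption based on that one line (asterisks removed): field 5 is the first highlighted value, field 7 the second, and the last field is the long packed string holding the two amounts. The helper name pick_fields is hypothetical.

```perl
use strict;
use warnings;

# Sample record from the question, without the highlighting asterisks.
my $line = 'GENERIC HEADER G08762100 20000310 G08762100 279048437 '
         . '03099111fbdsx BD + 000000001206.75000000200.45700';

sub pick_fields {
    my ($record) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    # Index -1 selects the last field, whatever the field count is.
    my @picked = (split ' ', $record)[5, 7, -1];
    return @picked;
}

my ($id, $code, $packed) = pick_fields($line);
print "$id $code $packed\n";
```

This only works if no field can itself contain a space; if one can, the indices shift and a fixed-width approach is safer.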
Re: Smartest Way To Split
by jpeg (Chaplain) on Mar 01, 2005 at 04:16 UTC
    It looks like you're in the opposite situation from the one meisterperl was in in this node. Regexes could work well for you if the numbers in the fifth text element are always separated by one or more 0's.

    Splitting on whitespace will get the first two elements you need. For the fifth element, splitting on the decimal points, then again on runs of 0's, and then using abs and the . operator to put the pieces back together again might work.

    Of course, TMTOWTDI/YMMV.
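
As an alternative to splitting and reassembling, one regex can work off the decimal point directly, following the rule in the question (two digits to the right, at most seven to the left). This is a sketch of that different technique, not the split/abs approach described above. It assumes the amounts are padded with zeros only, and the helper name extract_amounts is hypothetical.

```perl
use strict;
use warnings;

sub extract_amounts {
    my ($packed) = @_;
    # Up to seven digits, a decimal point, then exactly two digits.
    my @raw = $packed =~ /(\d{1,7}\.\d\d)/g;
    # sprintf treats each match as a number, which drops the zero padding
    # while keeping two decimal places.
    my @amounts = map { sprintf '%.2f', $_ } @raw;
    return @amounts;
}

# The long packed field from the sample line, asterisks removed.
my @amounts = extract_amounts('000000001206.75000000200.45700');
print "@amounts\n";
```

The trailing 700 in the sample is ignored because it has no decimal point. If an amount could ever exceed seven digits before the point, the leading digits would be cut off silently, so that limit is worth validating separately.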