PerlMonks  

grabbing chunks of text

by spencerd (Novice)
on Mar 10, 2010 at 18:01 UTC (id://827852)

spencerd has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to grab all the blocks of the form :\w ... type = ... :\w with @blocks = /:\w.*? type = .*?:\w/s, where I have the text as a scalar. I want to ignore anything of the form :\w ... :\w that has no type = in between.
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 15:OBFCYCXYE12S
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 17: -98uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/1.08790902930332 < ipu < -38uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/0.872973017262164,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 18: 38uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/0.872973017262164 < ipd < 98uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/1.08790902930332,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 19: -10uA < iil < 10uA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 20: -10uA < iih < 10uA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 21: -0.5v < vi_max < (1.10 * Vdd),
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 22: -100mA < i_max < 100mA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 23: -1.10v < vdiode_vss < -0.20v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 24: 0.0v < vin < 3.63v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 25: (Vdd - 0.4) < voh < Vdd,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 26: 0.0v < vol < 0.4v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 27: iol = 12mA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 28: ioh = -12mA;
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 30:iohstl152dax
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 32: -10uA < iil < 10uA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 33: -10uA < iih < 10uA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 34: -0.5v < vi_max < (1.10 * Vdd),
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 35: -100mA < i_max < 100mA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 36: -1.10v < vdiode_vss < -0.20v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 31: type = digital_bidir,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 37: 0.20v < vdiode_vdd < 1.10v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 38: 0.2v < vd < 1.2v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 39: 0.0v < vcm < VCC,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 40: (Vdd - 0.4) < voh < Vdd,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 41: 0.0v < vol < 0.4v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 42: iol = 16mA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 43: ioh = -16mA;
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 45:IOFCXVCVCXVE12S
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 46: type = digital_bidir,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 47: 1.16v < vih < Vdd,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 48: 0.0v < vil < 0.49v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 49: -98uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/1.08790902930332 < ipu < -38uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/0.872973017262164,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 50: 38uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/0.872973017262164 < ipd < 98uA*((1.3465+(-0.0073193*(55+temperature)))+(0.00060726*((55+temperature)^1.3646)))/1.08790902930332,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 51: -10uA < iil < 10uA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 52: -10uA < iih < 10uA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 53: -0.5v < vi_max < (1.10 * Vdd),
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 54: -100mA < i_max < 100mA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 55: -1.10v < vdiode_vss < -0.20v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 56: 0.20v < vdiode_vdd < 1.10v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 57: (Vdd - 0.4) < voh < Vdd,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 58: 0.0v < vol < 0.4v,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 59: iol = 12mA,
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 60: ioh = -12mA;
amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 45:IOFCXVCVCXVE12

Re: grabbing chunks of text
by almut (Canon) on Mar 10, 2010 at 18:29 UTC
    @blocks = /:\w.*? type = .*?(?=:\w)/sg;

    You need zero-width lookahead for the final :\w substring. Otherwise it will already have been consumed and no longer available when the next match attempt starts.  Also, you need option /g for "global" matching.
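A minimal sketch of both points follows. It uses a made-up string in which :a, :b, :c and :d are hypothetical stand-ins for the real :\w markers. Without /g, a list-context match with no capture groups just returns (1). With /g but a consuming :\w at the end, the marker that should open the next block gets swallowed. With the lookahead, every block is found.

```perl
use strict;
use warnings;

# Hypothetical sample: four markers, three of which open a block containing "type ="
my $text = ":a x type = 1 :b y type = 2 :c z type = 3 :d";

# Original attempt: no /g and no capture groups, so list context yields just (1)
my @no_g = $text =~ /:\w.*? type = .*?:\w/s;

# With /g but a consuming :\w at the end: ":b" is eaten by the first match
my @consumed = $text =~ /:\w.*? type = .*?:\w/sg;

# With /g and a zero-width lookahead: the next marker is left for the next match
my @blocks = $text =~ /:\w.*? type = .*?(?=:\w)/sg;

print "no /g:     @no_g\n";
print "consuming: ", scalar(@consumed), " matches\n";
print "lookahead: ", scalar(@blocks), " matches\n";
print "  [$_]\n" for @blocks;
```

Here the consuming pattern finds 2 matches because the :b block is lost. The lookahead pattern finds all 3.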

      almut, that grabbed the chunks of text I was looking for, but it also grabbed the chunk from :OBFCYCXYE12S to :iohstl152dax, which I was trying to exclude because there is no "type =" in the middle of it.
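A reduced reproduction follows. It assumes a heavily shortened stand-in for the posted data, and the marker names and field values are made up. The non-greedy .*? only picks the shortest match from a given start point; it does not stop at the next :\w. A match can start at :OBF and still reach the first " type = ", which lies inside the :ioh block. The engine therefore takes that match, and the :OBF chunk ends up glued onto the front of the :ioh block.

```perl
use strict;
use warnings;

# Hypothetical, heavily shortened stand-in for the posted data:
# :OBF has no "type =", :ioh and :IOF do, :END only terminates the last block
my $text = ":OBF 17: a, 18: b, :ioh 32: c, 31: type = digital_bidir, 37: d, "
         . ":IOF 46: type = digital_bidir, 47: e, :END";

# almut's pattern: the lazy .*? still runs past :ioh, because the first
# " type = " reachable from :OBF sits inside the :ioh block
my @blocks = $text =~ /:\w.*? type = .*?(?=:\w)/sg;

print "[$_]\n" for @blocks;
```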

        .* is too permissive. You could use a complicated trick to fix this, but it's easiest to extract the blocks and then filter out the ones you don't want.

        @blocks = grep / type =/, /:\w.*?(?=:\w)/sg;
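Below is a self-contained sketch of the extract-then-filter idea on a hypothetical, shortened sample. For comparison, it also shows one possible single-regex alternative, a so-called tempered token, where every character consumed must not start a new :\w marker. This is only one guess at what the "complicated trick" could look like, not necessarily the one meant.

```perl
use strict;
use warnings;

# Hypothetical sample: only the :ioh and :IOF blocks contain "type ="
my $text = ":OBF 17: a, 18: b, :ioh 32: c, 31: type = digital_bidir, 37: d, "
         . ":IOF 46: type = digital_bidir, 47: e, :END";

# Extract every :\w ... chunk up to the next marker, then keep those with " type ="
my @filtered = grep / type =/, $text =~ /:\w.*?(?=:\w)/sg;

# Single-regex alternative (tempered token): each character consumed must not
# begin a new :\w marker, so a match can never run into the following block
my @tempered = $text =~ /:\w(?:(?!:\w).)*? type =(?:(?!:\w).)*/sg;

print "[$_]\n" for @filtered;
```

The two approaches differ in one respect. The lookahead in the split version needs a following :\w, so a final block at the very end of the text is dropped. The tempered version runs to the end of the string instead. Writing the lookahead as (?=:\w|\z) would keep that last block in the split version too.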
        If you want to break the blocks at the start of the line, the following will do that for you:
        @blocks = grep / type =/, /^[^:]*:\w.*\n(?:[^:]*: .*\n)*/mg;

        It also demonstrates how lookahead and the non-greedy modifier can be avoided here.
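Here is a sketch of the line-anchored version on a hypothetical sample. The long paths are shortened to x.scl and the formulas are simplified, but the record layout follows the posted data. Each block is a header line containing a colon followed directly by a word character, plus all following lines that contain ": ". The grep then keeps only the blocks containing " type =".

```perl
use strict;
use warnings;

# Hypothetical sample, one record per line (paths shortened, values simplified)
my $text = <<'END';
x.scl 15:OBFCYCXYE12S
x.scl 17: -98uA < ipu < -38uA,
x.scl 19: -10uA < iil < 10uA,
x.scl 30:iohstl152dax
x.scl 32: -10uA < iil < 10uA,
x.scl 31: type = digital_bidir,
x.scl 45:IOFCXVCVCXVE12S
x.scl 46: type = digital_bidir,
x.scl 47: 1.16v < vih < Vdd,
END

# Header line with ":\w", then any number of lines with ": " (colon + space);
# /m lets ^ match at the start of every line, not just the start of the string
my @blocks = grep / type =/, $text =~ /^[^:]*:\w.*\n(?:[^:]*: .*\n)*/mg;

print "--\n$_" for @blocks;
```

This assumes every record sits on its own line and the paths never contain a colon. Otherwise [^:]* would stop at the wrong place.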

Re: grabbing chunks of text
by BioLion (Curate) on Mar 10, 2010 at 18:11 UTC

    Assuming you can get the lines into an array, grep is your friend:

    use strict;
    use warnings;
    use Data::Dumper qw/Dumper/;

    my @lines = (<DATA>);
    @lines = grep {/type\s=/} @lines;
    print Dumper \@lines;

    __DATA__
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 35: -100mA < i_max < 100mA,
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 36: -1.10v < vdiode_vss < -0.20v,
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 31: type = digital_bidir,
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 38: 0.2v < vd < 1.2v,
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 42: iol = 16mA,
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 43: ioh = -16mA;
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 45:IOFCXVCVCXVE12S
    amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 46: type = digital_bidir,

    Gives:

    $VAR1 = [
              'amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 31: type = digital_bidir,
    ',
              'amis150hx/logic/scl/amis150hxapra/current/amis150hxapra_bc_1.65v_m40c.scl 46: type = digital_bidir,
    '
            ];
    Alternatively, you can process your file line by line, capturing only the lines that match the simple regex I used in the grep block. HTH.
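A minimal sketch of the line-by-line variant, assuming the records sit one per line. To keep it self-contained, an in-memory filehandle (open on a reference to a scalar) stands in for the real file. The sample lines are shortened versions of the __DATA__ section above.

```perl
use strict;
use warnings;

# Hypothetical input with shortened paths; in real use, open the actual file instead
my $text = <<'END';
x.scl 36: -1.10v < vdiode_vss < -0.20v,
x.scl 31: type = digital_bidir,
x.scl 38: 0.2v < vd < 1.2v,
x.scl 46: type = digital_bidir,
END

open my $fh, '<', \$text or die "Cannot open in-memory filehandle: $!";

# Read one line at a time and keep only the lines passing the same /type\s=/ test
my @matches;
while (my $line = <$fh>) {
    push @matches, $line if $line =~ /type\s=/;
}
close $fh;

print @matches;
```

Unlike my @lines = (<DATA>), this never holds the whole file in memory, which may matter for large files. Like the grep version, it returns the individual type = lines rather than whole blocks.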


Node Type: perlquestion [id://827852]
Approved by BioLion