PerlMonks  

Regex help!

by lpanokarren (Novice)
on Jun 07, 2013 at 21:26 UTC ( [id://1037760]=perlquestion )

lpanokarren has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am having trouble extracting text from unformatted descriptions. I need to extract the disk capacity from a data center description, where it appears as a numeric value immediately preceding a unit of measure such as GB.

E.g. x='Disk - HD 500GB 2.5 SATA (00X3Y)'


I wrote a sed script

echo $x | sed "s/^[^0-9]*\([0-9]*[.]*[0-9]*\)[^0-9A-Z]*GB.*/\1/g"


This works in this case. However, if I run into x='USE 28-000009 FLASH CARD, IODUO, 1280GB, MLC, FUSION IO, FS3-204-641-CS-0001'

echo $x | sed "s/^[^0-9]*\([0-9]*[.]*[0-9]*\)[^0-9A-Z]*GB.*/\1/g"


returns nothing

Any ideas?

Replies are listed 'Best First'.
Re: Regex help!
by LanX (Saint) on Jun 07, 2013 at 22:03 UTC
    this is Perl

    DB<110> @a= ('USE 28-000009 FLASH CARD, IODUO, 1280GB, MLC, FUSION IO, FS3-204-641-CS-0001' , 'Disk - HD 500GB 2.5 SATA (00X3Y)', 'Disk - HD 500.31415926GB 2.5 SATA (00X3Y)')
     => (
      "USE 28-000009 FLASH CARD, IODUO, 1280GB, MLC, FUSION IO, FS3-204-641-CS-0001",
      "Disk - HD 500GB 2.5 SATA (00X3Y)",
      "Disk - HD 500.31415926GB 2.5 SATA (00X3Y)",
    )

    DB<111> print / (\d+(?:\.\d+)?)GB\W/,"\n" for @a
    1280
    500
    500.31415926
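For anyone who wants to run this outside the debugger, here is a minimal sketch of the same regex as a standalone script. The sub name `capacity_gb` is made up for illustration; the pattern and the three sample strings are the ones from the session above.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): returns the number in
# front of "GB", or undef when the description has no such token.
sub capacity_gb {
    my ($desc) = @_;
    # A space, digits with an optional decimal part, then "GB" followed
    # by one non-word character. List context yields the capture.
    my ($size) = $desc =~ / (\d+(?:\.\d+)?)GB\W/;
    return $size;
}

my @samples = (
    'USE 28-000009 FLASH CARD, IODUO, 1280GB, MLC, FUSION IO, FS3-204-641-CS-0001',
    'Disk - HD 500GB 2.5 SATA (00X3Y)',
    'Disk - HD 500.31415926GB 2.5 SATA (00X3Y)',
);

for my $desc (@samples) {
    my $size = capacity_gb($desc);
    print defined $size ? $size : 'no match', "\n";
}
```

The match is done in list context (`my ($size) = ...`) so that the captured group, rather than a true/false result, ends up in the variable.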

    and sed ain't no PCRE!

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Thanks a ton, Rolf! Do you know whether your code works for this input, though?

      'Disk - HD 600GB 15K'

      I used the regex you provided, and it seems for this the output is all 0s - unless I am doing something wrong

        > unless I am doing something wrong

        you are doing something wrong!

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        > I used the regex you provided, and it seems for this the output is all 0s - unless I am doing something wrong

        But you don't show what you're doing
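For reference, a minimal sketch that applies the regex from the parent reply to the input in question. It does not try to guess where the 0s came from. The second half is an assumption of mine, not something reported in the thread: if "GB" is the very last thing in the string, the trailing `\W` has nothing to match, and `\b` is one possible substitute.

```perl
use strict;
use warnings;

my $input = 'Disk - HD 600GB 15K';

# List-context match: the captured group lands in $size.
my ($size) = $input =~ / (\d+(?:\.\d+)?)GB\W/;
print defined $size ? $size : 'no match', "\n";

# Assumed edge case (not from the thread): "GB" ends the string, so \W
# has no character left to consume and the original pattern fails.
my $at_end = 'Disk - HD 600GB';
my ($with_nonword)  = $at_end =~ / (\d+(?:\.\d+)?)GB\W/;
my ($with_boundary) = $at_end =~ / (\d+(?:\.\d+)?)GB\b/;
print defined $with_nonword  ? $with_nonword  : 'no match', "\n";
print defined $with_boundary ? $with_boundary : 'no match', "\n";
```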

Re: Regex help! (OT)
by LanX (Saint) on Jun 07, 2013 at 21:40 UTC
    > Any ideas?

    use <code> tags and ask sedmonks ? =)

    edit

    But as far as I can decipher what you posted, your regex only matches strings that contain no digits between the start of the string and the "size GB" part (because of the leading /^[^0-9]*.../).

    But the failing string has numbers in between.
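To make that concrete, here is a rough sketch with the sed pattern transcribed into Perl syntax and run against both strings from the question. The unanchored pattern and the helper name `first_capture` are my own illustration, not something posted in the thread.

```perl
use strict;
use warnings;

# The sed pattern from the question in Perl syntax: the capture group
# uses plain parentheses instead of \( \).
my $anchored = qr/^[^0-9]*([0-9]*[.]*[0-9]*)[^0-9A-Z]*GB.*/;

# Assumed alternative: no start anchor, so digits that appear earlier
# in the string no longer block the match.
my $unanchored = qr/(\d+(?:\.\d+)?)\s*GB/;

# Hypothetical helper: first capture group, or undef if no match.
sub first_capture {
    my ($re, $text) = @_;
    my ($capture) = $text =~ $re;
    return $capture;
}

my @samples = (
    'Disk - HD 500GB 2.5 SATA (00X3Y)',
    'USE 28-000009 FLASH CARD, IODUO, 1280GB, MLC, FUSION IO, FS3-204-641-CS-0001',
);

for my $desc (@samples) {
    my $old = first_capture($anchored,   $desc);
    my $new = first_capture($unanchored, $desc);
    printf "anchored: %-8s unanchored: %s\n",
        defined $old ? $old : 'no match',
        defined $new ? $new : 'no match';
}
```

In the second string the anchored pattern has to get from the start of the line to "GB" through at most one run of digits, and the "28-000009" part appears before "1280", so the match fails.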

    If you provide Perl code and sample data people here might help you. =)

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Regex help!
by shmem (Chancellor) on Jun 09, 2013 at 08:33 UTC
    Basically, I need to extract disk capacity from the data center description which basically has a numeric value preceding the unit of measure like GB
    ...
    echo $x | sed "s/^[^0-9]*\([0-9]*[.]*[0-9]*\)[^0-9A-Z]*GB.*/\1/g"

    Why so complicated? This should do:

    echo $x | sed "s/.\+ \([0-9]\+\)GB.*/\1/g"

    But as LanX says, sed(1) doesn't use PCRE. Read perlrun and perlre to understand the following:

    echo $x | perl -ple 's/.+?(\d+)\s?GB.*/$1/'

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
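One thing to watch with the `(\d+)` one-liner: on a value with a decimal point, such as the 500.31415926GB sample from the other reply, it keeps only the digits after the dot. Below is a sketch of a variant that keeps decimals and also accepts other units, since the question says "a unit of measure like GB". The unit list (MB, GB, TB) and the sub name `capacity` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (value, unit) for the first
# "<number><unit>" token, or an empty list when nothing matches.
sub capacity {
    my ($desc) = @_;
    # Optional decimal part, optional single space before the unit.
    if ($desc =~ /\b(\d+(?:\.\d+)?)\s?(MB|GB|TB)\b/) {
        return ($1, $2);
    }
    return;
}

my $decimal = 'Disk - HD 500.31415926GB 2.5 SATA (00X3Y)';

# The substitution from the one-liner above, applied to a copy.
(my $one_liner = $decimal) =~ s/.+?(\d+)\s?GB.*/$1/;
print "one-liner: $one_liner\n";

my ($value, $unit) = capacity($decimal);
print "capacity:  $value $unit\n";
```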
Re: Regex help!
by lpanokarren (Novice) on Jun 07, 2013 at 21:44 UTC

    Sorry for the poor formatting. I am actually trying to implement this in Informatica, which accepts Perl-compatible regular expressions (PCRE), so technically this is a problem I am not able to solve even with PCRE. I used sed only to illustrate the regex I was using; I am not good at one-liners.

Node Type: perlquestion [id://1037760]
Approved by Corion