PerlMonks  

While you're including sample input, you're not showing what the intended output is. So, I had to guess. Perhaps the following will work for you:
#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

while (<DATA>) {
    my %chunks = /(\S+)="([^"]+)"/g;
    my $header = delete $chunks{GI} || delete $chunks{protein_id} or next;
    print ">$header";
    print ' ', $_, '="', $chunks{$_}, '"' for keys %chunks;
    print "\n";
}

__DATA__
>1001585.MDS_0001 protein_id="YP_004377784.1" product="chromosomal replication initiation protein" GI="330500915" GeneID="10459818"
>1001585.MDS_0002 protein_id="YP_004377785.1" product="DNA polymerase III subunit beta" GI="330500916" GeneID="10454784"
>1001585.MDS_0003 protein_id="YP_004377786.1" product="recombination protein F" GI="330500917" GeneID="10454785"
>1001585.MDS_0004 protein_id="YP_004377787.1" product="DNA gyrase subunit B" GI="330500918" GeneID="10454786"
>1001585.MDS_0005 protein_id="YP_004377788.1" GI="330500919" GeneID="10454787"
>1001585.MDS_0006 protein_id="YP_004377789.1" GI="330500920" GeneID="10454788"
>1001585.MDS_0007 protein_id="YP_004377790.1" GI="330500921" GeneID="10454789"
>1001585.MDS_0008 protein_id="YP_004377791.1" GI="330500922" GeneID="10454790"
>1001585.MDS_0009 protein_id="YP_004377792.1" product="ABC transporter permease" GI="330500923" GeneID="10454791"
>1001585.MDS_0010 protein_id="YP_004377793.1" product="ABC transporter ATP-binding protein" GI="330500924" GeneID="10454792"
>245014.CK3_35030 protein_id="CBL42879.1" product="Predicted transcription factor, homolog of eukaryotic MBF1"
>245014.CK3_35040 protein_id="CBL42880.1" product="Bacterial protein of unknown function (DUF961)."
Which gives as output:
>330500915 protein_id="YP_004377784.1" product="chromosomal replication initiation protein" GeneID="10459818"
>330500916 protein_id="YP_004377785.1" product="DNA polymerase III subunit beta" GeneID="10454784"
>330500917 protein_id="YP_004377786.1" product="recombination protein F" GeneID="10454785"
>330500918 protein_id="YP_004377787.1" product="DNA gyrase subunit B" GeneID="10454786"
>330500919 protein_id="YP_004377788.1" GeneID="10454787"
>330500920 protein_id="YP_004377789.1" GeneID="10454788"
>330500921 protein_id="YP_004377790.1" GeneID="10454789"
>330500922 protein_id="YP_004377791.1" GeneID="10454790"
>330500923 protein_id="YP_004377792.1" product="ABC transporter permease" GeneID="10454791"
>330500924 protein_id="YP_004377793.1" product="ABC transporter ATP-binding protein" GeneID="10454792"
>CBL42879.1 product="Predicted transcription factor, homolog of eukaryotic MBF1"
>CBL42880.1 product="Bacterial protein of unknown function (DUF961)."
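It removes whichever of GI or protein_id it uses as the new header from the line, and it doesn't keep the order of the remaining fields, since hash keys come back in no particular order. It also assumes all values are enclosed in double quotes (they all are in the input), and that no GI or protein_id is 0, because 0 is false in Perl and would be treated as missing.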
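If the field order matters, or a GI of 0 could turn up, something along these lines might do instead. This is only a sketch under the same quoting assumption as above: reheader is a helper name made up for this example, and the two sample lines are taken from the input shown earlier. It keeps the key/value pairs in a flat list so the original order survives, and it tests with defined rather than truth so that "0" is accepted.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper (not part of the original script): rewrite one header
# line, keeping the remaining fields in their input order.
sub reheader {
    my ($line) = @_;

    # Flat list of key, value, key, value, ... in input order.
    my @pairs = $line =~ /(\S+)="([^"]+)"/g;
    my %value = @pairs;

    # Prefer GI, fall back to protein_id; 'defined' accepts a value of "0".
    my ($id_key) = grep { defined $value{$_} } qw(GI protein_id);
    return unless defined $id_key;

    my $out = ">$value{$id_key}";
    while (my ($key, $val) = splice @pairs, 0, 2) {
        next if $key eq $id_key;    # drop the field that became the header
        $out .= qq{ $key="$val"};
    }
    return $out;
}

my @sample = (
    '>1001585.MDS_0001 protein_id="YP_004377784.1" product="chromosomal replication initiation protein" GI="330500915" GeneID="10459818"',
    '>245014.CK3_35030 protein_id="CBL42879.1" product="Predicted transcription factor, homolog of eukaryotic MBF1"',
);

for my $line (@sample) {
    my $new = reheader($line) // next;    # skip lines with neither field
    say $new;
}
```

Lines that have neither GI nor protein_id are still skipped, as in the script above.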

In reply to Re: help with regex by JavaFan
in thread help with regex by AWallBuilder
