The lines of the first file are long, so they wrap across multiple lines when printed. Previously I gave only one line; this time I am including several lines as a sample of file 1. I hope the format of file 2 is already understandable.

Gm10 Glyma10g00200.1 CDS_11 8569 8705 - 2 2 6 + 10 14 GAAGAACCAAAATCGCTGATTTTGAAAAATAGGGGGACTGAATGTCTTGATTT +GGAAATAGGGGGACCAAAATTGCGGATTGAGCATTATAGGGGACCAATAGTGTATTTTAGCCTTTATTT +TATTTTGAATGTGTG Gm10 Glyma10g00200.1 CDS_10 8822 8822 - 0 0 0 + 0 0 G Gm10 Glyma10g00200.1 CDS_9 8885 9079 - 2 13 2 +3 38 50 GTATTTGTGTATTGCTCCTGGATCTAGACCTTCCCCTGTTGCTGCTGCTGCT +GCAAGTCACAAATTAATCACATGTATTTCATGCTTCGATGAGCGTTCACTTGCATATCATGCTGTTGGA +TATGGAAGAGGATCTCACATTCCAGCAGTAGCCATTACATCATCAGGCACTGCAGTTTCCAACCTTCTT +CCTGC Gm10 Glyma10g00200.1 CDS_8 9151 9255 - 2 2 11 + 15 19 GTTGGATCATGCCAATGAATTATCAAACTCCCTGAAAGAAAGTGCCAATATTA +ACACTGTTCGGGCATCGCTTATTGTTGAAGAATGCACAAGACTTGGTTTGAT Gm10 Glyma10g00200.1 CDS_7 11534 11698 - 1 7 +16 24 33 GGAATATACTGATCCTCACAAAGGTTCTCTTAACCTTGCTGTGAAGTTGCC +TAATGGTTCTCTAAAGCCAGACCTGGGGCCAAAAACATATATTGCTTATGGATTTCTTCAGGAGCTTGG +ACGTGGTGATTCAGTGACTAAGCTCCATTGTGATATGTCTGATGC Gm10 Glyma10g00200.1 CDS_6 12777 12823 - 0 0 +8 8 11 GCTTTTTACACAAATTTCATTCTTCACCTTCATTCTCATGAAGATTA Gm10 Glyma10g00200.1 CDS_5 12958 13022 - 1 4 +4 9 10 GCTGTAGATCTTCAGTACAAGTATTTAAGGCATTTTCAGTGGCATTGGGAAAA +GGGGGAACCGCT Gm10 Glyma10g00200.1 CDS_4 13348 13502 - 0 4 +20 24 33 GAGGATCCTTATCTTCTTCTTGGGTTCTGAAAAAATAAAAACATCAACTCC +CTCCTGTATTCATGCTTGGTGTTGGTTATGTGTCATGTCTGGTCTACTATTGACACATGTAAATCCAAA +TGCACTAGAGCTGCCCTCAAAAAACTCAATTTGGT Gm10 Glyma10g00200.1 CDS_3 13713 13829 - 0 2 +13 15 25 GATTGAGTTAGATGAGCTGGAAAGTGTTTCCATTCTGTCAATGACCCTTGC +ATGGGATGAATTTTCCTTCTCTACTTTTCAAGAAGCCCATTATTCACTTCAAGATTCTCTAGACCA Gm10 Glyma10g00200.1 CDS_2 13915 14209 - 8 11 + 40 59 88 GCTGCTCTTCTCCCTCAGCTGCTCTGTCTCCGGCGTTGACATTGGAGGAG +GGACTTGAGAAACTGAAGGAGGCTCTCCAAATCTTGAATTCTCCTTCCCCTTCTTCCCCTACTGGATTC +CTTAGGTTTCAGGTGGCGCTCCCTCCCAGTCCTAAGACCTTCACTTTGTTTTGCTCCCAACCCCACTCC +TCCTCGGTCTTTCCTCTCATTTATGTTTCCAAGAACGACGCCGACTCTAAATCACTCTATGTCAATTTG +GATGATAATGTGAGTCACCGGAAAGAAAGGTTCTTTTT Gm10 Glyma10g00200.1 CDS_1 14242 14286 - 2 1 +9 12 21 GATCTCGTTCCCCCTCACCACATAACACCCTGTCGTTCCCTTCAC Gm10 Glyma10g00210.1 CDS_7 15480 
15722 - 4 7 +26 37 47 GGTACGATCAAACACATTGGGAGCTGTTGGATTCCTTGGGGATAGCAGAAG +AATCAATGTTGCCATCACAAGAGCACGCAAACATTTAGCTTTGGTCTGTGACAGCTCGACTATATGCCA +CAATACCTTCTTAGCAAGGCTTCTGCGTCATATTAGACACTTTGGTAGGGTGAAGCATGCAGAACCAGG +TAGTTTTGGAGGATATGGACTTGGGATGAATCCAATATTACCTTCCATTAATTA Gm10 Glyma10g00210.1 CDS_6 16625 16782 - 1 9 +15 25 32 GGTGTTAGCCCAACAGCTATTGCAGTGCAATCCCCTTATGTTGCTCAAGTA +CAACTTTTGAGGGACAAGCTTGATGAATTTCCAGAAGCAGCAGGTACTGAGGTTGCAACCATTGACAGT +TTTCAAGGTCGGGAAGCTGATGCAGTAATTTTATCCAT Gm10 Glyma10g00210.1 CDS_5 17595 17763 - 2 9 +16 27 34 GCCTACTTGGATAACACAATGCCCGCTGCTATTGCTAGATACTAGAATGCC +ATATGGAAGTCTGTCAGTTGGTTGTGAAGAGCATCTAGACCCGGCTGGAACAGGCTCACTTTATAATGA +AGGAGAAGCTGAGATAGTTTTGCAGCATGTATTTTCCTTAATCTATGCC Gm10 Glyma10g00210.1 CDS_4 18046 19077 - 12 42 + 91 145 173 GCGCAACTGTGAAGCTTTAATGCTGCTTCAGAAGAATGGTTTACGAA +AGAAGAATCCTTCAATTTCTGTTGTTGCTACACTGTTTGGAGATGGGGAAGATGTTGCATGGCTTGAGA +AAAATCATTTGGCTGACTGGGCAGAAGAAAAATTGGATGGAAGATTAGGAAATGAAACCTTTGATGATT +CTCAGTGGAGAGCAATTGCAATGGGTTTGAATAAAAAGAGGCCTGTATTGGTTATCCAAGGCCCTCCTG +GTACAGGCAAGACTGGTTTGCTCAAGCAACTTATAGCATGTGCTGTTCAGCAAGGTGAAAGGGTTCTTG +TTACAGCACCTACTAATGCAGCTGTTGATAACATGGTAGAAAAGCTTTCAAATGTTGGATTAAATATAG +TGCGGGTTGGAAATCCAGCTCGTATATCAAAAACAGTGGGATCAAAGTCTTTGGAAGAAATTGTAAATG +CTAAGCTTGCAAGTTTTCGAGAAGAGTATGAGAGGAAGAAGTCAGATCTAAGAAAAGATCTAAGACATT +GTTTAAGGGATGATTCACTAGCTTCAGGCATACGCCAACTTCTGAAGCAACTGGGAAGGTCACTGAAGA +AAAAGGAAAAGCAGACCGTAATTGAAGTTCTGTCTAGTGCTCAAGTTGTGGTTGCCACTAATACTGGAG +CAGCTGACCCTTTGGTTCGAAGGCTAGATACCTTTGATTTGGTTGTCATAGATGAAGCGGGACAGGCAA +TTGAACCCTCTTGCTGGATTCCTATATTGCAGGGAAAGCGCTGCATTCTTGCTGGTGATCAATGCCAAC +TTGCTCCTGTCATATTATCTAGAAAGGCCTTAGAAGTTGGTCTAGGAATATCTCTACTGGAGAGAGCTG +CAACTTTGCATGAAGGGATTCTCACCACTAGGTTAACAACACAATACCGTATGAATGATGCAATTGCTA +GTTGGGCTTCAAAGGAGATGTACGGAGGATTATTGAAGTCCTCTGAGACTGTCTTTTCTCATCTTCTAG +TAGACTCTCCTTTTGTTAA Gm10 Glyma10g00210.1 CDS_3 21109 21392 - 4 8 +30 42 52 GGACTTGGGGGAATGCATTTGGTGTTATTCAAGGTTGAAGGGAACCACCGG 
+TTACCACCAACCACTCTTTCACCTGGAGACATGGTTTGTGTGAGAACATATGACAGCATGGGTGCAATT +ACAACTTCTTGCATACAAGGATTTGTGAACAGTTTTGGGGATGATGGCTATAGTATTACAGTTGCTTTA +GAGTCACGCCATGGTGATCCTACATTCTCTAAACTATTTGGGAAGAGTGTGCGCATTGACCGTATTCAA +GGACTGGCTGATACACTTACTTATGA Gm10 Glyma10g00210.1 CDS_2 24354 24662 - 9 13 + 32 54 66 GCTCAGCATAGGGCGGTTGTGAGAAAGATAACGCAACCCAAGAGTGTCCA +AGGCGTTTTAGGAATGGACTTCGAAAAGGTCAAAGCATTACAGCACAGGATTGACGAATTCACCACCCA +TATGTCAGAACTACTCCGTATTGAAAGGGATGCTGAGTTGGAGTTTACTCAGGAGGAATTGGATGCTGT +TCCTAAACCAGATGATACTTCTGATTCTTCAAAAACGATTGATTTCTTGGTTAGCCATAGCCAGCCTCA +ACAAGAACTCTGCGACACCATTTGTAATTTAAACGCTATCAGTACCTCTACA

Each line holds a total of 12 tab-separated values:
1.) Chromosome (in this case Gm10)
2.) Gene ID
3.) Feature, which can be of many types: CDS_\d, 5'UTR_\d, 3'UTR_\d, etc.
4.) Start position of the gene
5.) End position
6.) Strand, + or -
7.) to 11.) Five counts of different types
12.) The sequence
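To make the field layout concrete, here is a minimal Python sketch of how one such line could be split into named fields. It assumes the numbered layout above (six positional fields, five counts, then the sequence) and that the fields really are tab-separated in the file. The names `Feature` and `parse_feature_line` are made up for illustration and are not from graff's code.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """One record of file 1 (field names are illustrative assumptions)."""
    chrom: str
    gene_id: str
    feature: str
    start: int
    end: int
    strand: str
    counts: List[int]   # fields 7 to 11
    sequence: str       # field 12


def parse_feature_line(line: str) -> Feature:
    """Split one tab-separated line into a Feature; reject malformed lines."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated fields, got {len(fields)}")
    chrom, gene_id, feature, start, end, strand = fields[:6]
    counts = [int(x) for x in fields[6:11]]
    return Feature(chrom, gene_id, feature, int(start), int(end),
                   strand, counts, fields[11])


if __name__ == "__main__":
    # The CDS_10 record from the sample above, with the tabs written out.
    sample = "\t".join(["Gm10", "Glyma10g00200.1", "CDS_10", "8822", "8822",
                        "-", "0", "0", "0", "0", "0", "G"])
    print(parse_feature_line(sample))
```

Checking the field count up front is a deliberate choice: a line that was wrapped or truncated fails loudly instead of silently shifting columns.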

Thanks, graff, for doing all the work. I'll try to understand how your code works. My other issue is that I have 80 such files to compare against Methylation.gtf, so I hope it can be done more efficiently. Thanks again for all the help; I appreciate it.
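One common way to handle many files against a single reference is to read the reference once, keep it indexed in memory, and stream each of the 80 files past it. The Python sketch below illustrates that idea only. It assumes Methylation.gtf uses the standard nine-column GTF layout and that "compare" means finding GTF entries that overlap a feature on the same chromosome; the thread does not confirm either point. The function names, and the choice to append a hit count as an extra column, are hypothetical.

```python
import bisect
from collections import defaultdict


def load_gtf_index(gtf_path):
    """Read the GTF once; return {chrom: list of (start, end, attributes) sorted by start}."""
    index = defaultdict(list)
    with open(gtf_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            # Standard GTF: column 1 = seqname, 4 = start, 5 = end, 9 = attributes.
            index[cols[0]].append((int(cols[3]), int(cols[4]), cols[8]))
    for intervals in index.values():
        intervals.sort()
    return index


def overlapping(index, chrom, start, end):
    """Return the GTF entries on chrom that overlap the range start..end (inclusive)."""
    intervals = index.get(chrom, [])
    # Entries whose start lies beyond `end` cannot overlap, so cut them off
    # with a binary search, then filter the remainder by end position.
    hi = bisect.bisect_right(intervals, (end, float("inf"), ""))
    return [iv for iv in intervals[:hi] if iv[1] >= start]


def annotate_file(index, in_path, out_path):
    """Copy each 12-field line to out_path with the overlap count appended (assumed output)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 12:
                continue  # skip malformed or wrapped lines
            hits = overlapping(index, fields[0], int(fields[3]), int(fields[4]))
            dst.write("\t".join(fields + [str(len(hits))]) + "\n")


def annotate_all(gtf_path, in_paths):
    """Build the index a single time, then reuse it for every input file."""
    index = load_gtf_index(gtf_path)
    for path in in_paths:
        annotate_file(index, path, path + ".annotated")
```

With this layout, a call such as `annotate_all("Methylation.gtf", list_of_80_paths)` parses the GTF once rather than 80 times. The filter inside `overlapping` is still a linear scan over the entries that start before the feature ends, so an interval tree would be the next step if that turns out to be the bottleneck.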


In reply to Re^2: Comparing and getting information from two large files and appending it in a new file by perlkhan77
in thread Comparing and getting information from two large files and appending it in a new file by perlkhan77
