The lines of the first file are long, so they are wrapped across several lines below (continuation lines begin with +). Previously I gave only one line; this time I am including many lines in the sample for file 1. I hope the format of file 2 is understandable.
Gm10 Glyma10g00200.1 CDS_11 8569 8705 - 2 2 6
+ 10 14 GAAGAACCAAAATCGCTGATTTTGAAAAATAGGGGGACTGAATGTCTTGATTT
+GGAAATAGGGGGACCAAAATTGCGGATTGAGCATTATAGGGGACCAATAGTGTATTTTAGCCTTTATTT
+TATTTTGAATGTGTG
Gm10 Glyma10g00200.1 CDS_10 8822 8822 - 0 0 0
+ 0 0 G
Gm10 Glyma10g00200.1 CDS_9 8885 9079 - 2 13 2
+3 38 50 GTATTTGTGTATTGCTCCTGGATCTAGACCTTCCCCTGTTGCTGCTGCTGCT
+GCAAGTCACAAATTAATCACATGTATTTCATGCTTCGATGAGCGTTCACTTGCATATCATGCTGTTGGA
+TATGGAAGAGGATCTCACATTCCAGCAGTAGCCATTACATCATCAGGCACTGCAGTTTCCAACCTTCTT
+CCTGC
Gm10 Glyma10g00200.1 CDS_8 9151 9255 - 2 2 11
+ 15 19 GTTGGATCATGCCAATGAATTATCAAACTCCCTGAAAGAAAGTGCCAATATTA
+ACACTGTTCGGGCATCGCTTATTGTTGAAGAATGCACAAGACTTGGTTTGAT
Gm10 Glyma10g00200.1 CDS_7 11534 11698 - 1 7
+16 24 33 GGAATATACTGATCCTCACAAAGGTTCTCTTAACCTTGCTGTGAAGTTGCC
+TAATGGTTCTCTAAAGCCAGACCTGGGGCCAAAAACATATATTGCTTATGGATTTCTTCAGGAGCTTGG
+ACGTGGTGATTCAGTGACTAAGCTCCATTGTGATATGTCTGATGC
Gm10 Glyma10g00200.1 CDS_6 12777 12823 - 0 0
+8 8 11 GCTTTTTACACAAATTTCATTCTTCACCTTCATTCTCATGAAGATTA
Gm10 Glyma10g00200.1 CDS_5 12958 13022 - 1 4
+4 9 10 GCTGTAGATCTTCAGTACAAGTATTTAAGGCATTTTCAGTGGCATTGGGAAAA
+GGGGGAACCGCT
Gm10 Glyma10g00200.1 CDS_4 13348 13502 - 0 4
+20 24 33 GAGGATCCTTATCTTCTTCTTGGGTTCTGAAAAAATAAAAACATCAACTCC
+CTCCTGTATTCATGCTTGGTGTTGGTTATGTGTCATGTCTGGTCTACTATTGACACATGTAAATCCAAA
+TGCACTAGAGCTGCCCTCAAAAAACTCAATTTGGT
Gm10 Glyma10g00200.1 CDS_3 13713 13829 - 0 2
+13 15 25 GATTGAGTTAGATGAGCTGGAAAGTGTTTCCATTCTGTCAATGACCCTTGC
+ATGGGATGAATTTTCCTTCTCTACTTTTCAAGAAGCCCATTATTCACTTCAAGATTCTCTAGACCA
Gm10 Glyma10g00200.1 CDS_2 13915 14209 - 8 11
+ 40 59 88 GCTGCTCTTCTCCCTCAGCTGCTCTGTCTCCGGCGTTGACATTGGAGGAG
+GGACTTGAGAAACTGAAGGAGGCTCTCCAAATCTTGAATTCTCCTTCCCCTTCTTCCCCTACTGGATTC
+CTTAGGTTTCAGGTGGCGCTCCCTCCCAGTCCTAAGACCTTCACTTTGTTTTGCTCCCAACCCCACTCC
+TCCTCGGTCTTTCCTCTCATTTATGTTTCCAAGAACGACGCCGACTCTAAATCACTCTATGTCAATTTG
+GATGATAATGTGAGTCACCGGAAAGAAAGGTTCTTTTT
Gm10 Glyma10g00200.1 CDS_1 14242 14286 - 2 1
+9 12 21 GATCTCGTTCCCCCTCACCACATAACACCCTGTCGTTCCCTTCAC
Gm10 Glyma10g00210.1 CDS_7 15480 15722 - 4 7
+26 37 47 GGTACGATCAAACACATTGGGAGCTGTTGGATTCCTTGGGGATAGCAGAAG
+AATCAATGTTGCCATCACAAGAGCACGCAAACATTTAGCTTTGGTCTGTGACAGCTCGACTATATGCCA
+CAATACCTTCTTAGCAAGGCTTCTGCGTCATATTAGACACTTTGGTAGGGTGAAGCATGCAGAACCAGG
+TAGTTTTGGAGGATATGGACTTGGGATGAATCCAATATTACCTTCCATTAATTA
Gm10 Glyma10g00210.1 CDS_6 16625 16782 - 1 9
+15 25 32 GGTGTTAGCCCAACAGCTATTGCAGTGCAATCCCCTTATGTTGCTCAAGTA
+CAACTTTTGAGGGACAAGCTTGATGAATTTCCAGAAGCAGCAGGTACTGAGGTTGCAACCATTGACAGT
+TTTCAAGGTCGGGAAGCTGATGCAGTAATTTTATCCAT
Gm10 Glyma10g00210.1 CDS_5 17595 17763 - 2 9
+16 27 34 GCCTACTTGGATAACACAATGCCCGCTGCTATTGCTAGATACTAGAATGCC
+ATATGGAAGTCTGTCAGTTGGTTGTGAAGAGCATCTAGACCCGGCTGGAACAGGCTCACTTTATAATGA
+AGGAGAAGCTGAGATAGTTTTGCAGCATGTATTTTCCTTAATCTATGCC
Gm10 Glyma10g00210.1 CDS_4 18046 19077 - 12 42
+ 91 145 173 GCGCAACTGTGAAGCTTTAATGCTGCTTCAGAAGAATGGTTTACGAA
+AGAAGAATCCTTCAATTTCTGTTGTTGCTACACTGTTTGGAGATGGGGAAGATGTTGCATGGCTTGAGA
+AAAATCATTTGGCTGACTGGGCAGAAGAAAAATTGGATGGAAGATTAGGAAATGAAACCTTTGATGATT
+CTCAGTGGAGAGCAATTGCAATGGGTTTGAATAAAAAGAGGCCTGTATTGGTTATCCAAGGCCCTCCTG
+GTACAGGCAAGACTGGTTTGCTCAAGCAACTTATAGCATGTGCTGTTCAGCAAGGTGAAAGGGTTCTTG
+TTACAGCACCTACTAATGCAGCTGTTGATAACATGGTAGAAAAGCTTTCAAATGTTGGATTAAATATAG
+TGCGGGTTGGAAATCCAGCTCGTATATCAAAAACAGTGGGATCAAAGTCTTTGGAAGAAATTGTAAATG
+CTAAGCTTGCAAGTTTTCGAGAAGAGTATGAGAGGAAGAAGTCAGATCTAAGAAAAGATCTAAGACATT
+GTTTAAGGGATGATTCACTAGCTTCAGGCATACGCCAACTTCTGAAGCAACTGGGAAGGTCACTGAAGA
+AAAAGGAAAAGCAGACCGTAATTGAAGTTCTGTCTAGTGCTCAAGTTGTGGTTGCCACTAATACTGGAG
+CAGCTGACCCTTTGGTTCGAAGGCTAGATACCTTTGATTTGGTTGTCATAGATGAAGCGGGACAGGCAA
+TTGAACCCTCTTGCTGGATTCCTATATTGCAGGGAAAGCGCTGCATTCTTGCTGGTGATCAATGCCAAC
+TTGCTCCTGTCATATTATCTAGAAAGGCCTTAGAAGTTGGTCTAGGAATATCTCTACTGGAGAGAGCTG
+CAACTTTGCATGAAGGGATTCTCACCACTAGGTTAACAACACAATACCGTATGAATGATGCAATTGCTA
+GTTGGGCTTCAAAGGAGATGTACGGAGGATTATTGAAGTCCTCTGAGACTGTCTTTTCTCATCTTCTAG
+TAGACTCTCCTTTTGTTAA
Gm10 Glyma10g00210.1 CDS_3 21109 21392 - 4 8
+30 42 52 GGACTTGGGGGAATGCATTTGGTGTTATTCAAGGTTGAAGGGAACCACCGG
+TTACCACCAACCACTCTTTCACCTGGAGACATGGTTTGTGTGAGAACATATGACAGCATGGGTGCAATT
+ACAACTTCTTGCATACAAGGATTTGTGAACAGTTTTGGGGATGATGGCTATAGTATTACAGTTGCTTTA
+GAGTCACGCCATGGTGATCCTACATTCTCTAAACTATTTGGGAAGAGTGTGCGCATTGACCGTATTCAA
+GGACTGGCTGATACACTTACTTATGA
Gm10 Glyma10g00210.1 CDS_2 24354 24662 - 9 13
+ 32 54 66 GCTCAGCATAGGGCGGTTGTGAGAAAGATAACGCAACCCAAGAGTGTCCA
+AGGCGTTTTAGGAATGGACTTCGAAAAGGTCAAAGCATTACAGCACAGGATTGACGAATTCACCACCCA
+TATGTCAGAACTACTCCGTATTGAAAGGGATGCTGAGTTGGAGTTTACTCAGGAGGAATTGGATGCTGT
+TCCTAAACCAGATGATACTTCTGATTCTTCAAAAACGATTGATTTCTTGGTTAGCCATAGCCAGCCTCA
+ACAAGAACTCTGCGACACCATTTGTAATTTAAACGCTATCAGTACCTCTACA
There are a total of 12 tab-separated values in each line:
1.) Chromosome (in this case Gm10)
2.) Gene ID (or gene)
3.) Feature (which could be of many types: CDS_\d, 5'UTR_\d, 3'UTR_\d, etc.)
4.) Start position of the gene
5.) End position
6.) Strand type, + or -
7.) to 11.) Different counts of different types
12.) The sequence
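To make the field layout concrete, here is a minimal Python sketch of how one line could be split into named fields. It assumes twelve tab-separated fields as enumerated above (six descriptive fields, five counts, then the sequence) and that each record sits on a single line in the real file, with the wrapping in the sample being only a display effect. The names FeatureRecord and parse_line are made up for illustration.

```python
from typing import List, NamedTuple


class FeatureRecord(NamedTuple):
    """One record of file 1 (hypothetical name, for illustration only)."""
    chromosome: str
    gene_id: str
    feature: str
    start: int
    end: int
    strand: str
    counts: List[int]   # fields 7 to 11
    sequence: str       # field 12


def parse_line(line: str) -> FeatureRecord:
    """Split one tab-separated line into a FeatureRecord.

    Assumes exactly 12 fields; raises ValueError otherwise.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    return FeatureRecord(
        chromosome=fields[0],
        gene_id=fields[1],
        feature=fields[2],
        start=int(fields[3]),
        end=int(fields[4]),
        strand=fields[5],
        counts=[int(x) for x in fields[6:11]],
        sequence=fields[11],
    )


# Example using the CDS_10 record from the sample above.
sample = "Gm10\tGlyma10g00200.1\tCDS_10\t8822\t8822\t-\t0\t0\t0\t0\t0\tG"
record = parse_line(sample)
```

Keeping the five counts together in one list avoids guessing at what each count means, since the post does not say.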
Thanks, graff, for doing all the work. I'll try to see how your code works. My other issue is that I have 80 such files to compare with Methylation.gtf, so I hope it can be done more efficiently. Thanks for all the help, by the way; I appreciate it.
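One way the 80-file case might be kept efficient is to read Methylation.gtf only once, index it by chromosome, and then stream each of the 80 files against that index. The Python sketch below is not graff's code. It assumes the comparison means "find GTF entries whose coordinates overlap a feature", which the post does not actually state, and it relies on the standard GTF layout (chromosome in column 1, start and end in columns 4 and 5). All function names are hypothetical.

```python
import bisect
from collections import defaultdict


def load_gtf_index(gtf_lines):
    """Build {chromosome: (sorted_starts, sorted_intervals)} from GTF lines."""
    by_chrom = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        by_chrom[cols[0]].append((int(cols[3]), int(cols[4])))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], intervals)
    return index


def overlapping(index, chrom, start, end):
    """Return the indexed (start, end) intervals overlapping [start, end]."""
    if chrom not in index:
        return []
    starts, intervals = index[chrom]
    # Only intervals starting at or before `end` can overlap the query.
    hi = bisect.bisect_right(starts, end)
    return [iv for iv in intervals[:hi] if iv[1] >= start]


def count_overlaps(index, feature_lines):
    """Total overlaps for every record in one feature file (assumed metric)."""
    total = 0
    for line in feature_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 6:
            continue  # not a full record
        total += len(overlapping(index, f[0], int(f[3]), int(f[4])))
    return total


# Tiny in-memory demonstration with invented GTF coordinates.
gtf = [
    "Gm10\tsrc\tmeth\t8600\t8610\t.\t-\t.\tid a",
    "Gm10\tsrc\tmeth\t9000\t9100\t.\t-\t.\tid b",
    "Gm11\tsrc\tmeth\t100\t200\t.\t+\t.\tid c",
]
features = [
    "Gm10\tGlyma10g00200.1\tCDS_11\t8569\t8705\t-\t2\t2\t6\t10\t14\tGAAG",
    "Gm10\tGlyma10g00200.1\tCDS_10\t8822\t8822\t-\t0\t0\t0\t0\t0\tG",
    "Gm10\tGlyma10g00200.1\tCDS_9\t8885\t9079\t-\t2\t13\t23\t38\t50\tGTAT",
]
index = load_gtf_index(gtf)
total = count_overlaps(index, features)
```

With real data, the index would be built once from the open Methylation.gtf file handle and count_overlaps (or whatever comparison is actually needed) called in a loop over the 80 file names, so the GTF is never re-read.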