Dear All,
Thank you for all the suggestions. Here are the details of the problem I am trying to solve:
I have two data files:
Data file 1: a 4-column tab-delimited file; the second column is the fragment and the fourth column is its position.
Data file 2: a 6-column tab-delimited file; the second column is the base and the fifth column is the connection.
I am trying to find where each fragment (data file 1, column 2) matches the bases (data file 2, column 2), and to count the connections (data file 2, column 5) over the matched region. The counting criterion is that the number of zero connections in the match should be greater than 3 and less than or equal to 19.
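A minimal sketch of the counting step, assuming that a "zero match" is a base whose connection (column 5) is 0, and that the limits are "more than 3 and at most 19". The subroutine names are hypothetical. The demo values are the connections of bases 180 to 198 copied from the posted .ct file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the unpaired bases (connection == 0) in a window of the
# connection array. $start is a 0-based index into @$connects.
sub count_zero_connections {
    my ($connects, $start, $len) = @_;
    my @window = @{$connects}[$start .. $start + $len - 1];
    return scalar grep { $_ == 0 } @window;
}

# Assumed reading of the criterion: more than 3 and at most 19 zeros.
sub passes_criterion {
    my ($zeros) = @_;
    return $zeros > 3 && $zeros <= 19;
}

# Connections of bases 180..198 from the posted .ct file
my @connects = (230, 229, 228, 227, 226, 225, 0, 0, 0, 0, 0,
                224, 223, 222, 221, 220, 0, 218, 217);

my $zeros = count_zero_connections(\@connects, 0, scalar @connects);
printf "%d zero connections: %s\n", $zeros,
    passes_criterion($zeros) ? "keep" : "skip";
```

With these values the window holds six zeros (bases 186 to 190 and 196), so it passes the assumed test and the script prints `6 zero connections: keep`.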
I have written code to read in both files, but I am having trouble connecting the two and searching for the fragment string.
Below I am posting the code, data file 1 and data file 2.
The final outcome I am after: if a fragment from data file 1 matches in data file 2, the script should print that fragment and its position.
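One way to do the search is to join the bases into a single string and scan it with Perl's built-in index(). This sketch assumes the fragments in data file 1 are DNA (T) while the .ct file is RNA (U), so T is translated to U before searching; find_fragment is a hypothetical helper name. The demo string is bases 178 to 200 of the posted .ct file, so each hit is shifted back to the .ct numbering before printing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every 1-based start position of a DNA fragment inside an RNA
# string. T is translated to U first (assumption: file 1 is DNA, file 2 RNA).
sub find_fragment {
    my ($fragment, $rna) = @_;
    (my $query = uc $fragment) =~ tr/T/U/;
    my @hits;
    my $pos = index($rna, $query);
    while ($pos != -1) {
        push @hits, $pos + 1;                  # index() is 0-based
        $pos = index($rna, $query, $pos + 1);  # +1 keeps overlapping hits
    }
    return @hits;
}

# Bases 178..200 of the posted .ct file joined into one string
my $rna    = 'CAGUUCUUAUUUGCACCUACUUC';
my $offset = 178;                              # .ct number of the first base

for my $frag ('GTTCTTATTTGCACCTACT') {
    for my $hit (find_fragment($frag, $rna)) {
        # print fragment and position, as asked for in the final output
        print "$frag\t", $hit + $offset - 1, "\n";
    }
}
```

On this excerpt it prints the third fragment with position 180, the same position data file 1 lists for it. With the full .ct file, set the offset to the number of its first base (1 for the posted file).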
Any help is greatly appreciated.
Regards
#!/usr/bin/perl
use strict;
use warnings;

my $ctfile    = "M23263_rna_300-1.dat_1.ct";
my $matchfile = "match_super.dat";

# Data file 1: fragment (column 2) => position (column 4)
my %position_of;
open(my $match_fh, "<", $matchfile) or die "Cannot open $matchfile: $!\n";
while (my $line = <$match_fh>)
{
    chomp $line;
    # split ' ' accepts tabs or runs of spaces (the posted sample uses spaces)
    my (undef, $fragment, undef, $position) = split(' ', $line);
    next unless defined $position;        # skip blank or short lines
    $position_of{$fragment} = $position;
}
close($match_fh);

# Data file 2: keep base (column 2) and connection (column 5) for every row
my (@bases, @connects);
open(my $ct_fh, "<", $ctfile) or die "Cannot open $ctfile: $!\n";
while (my $ctline = <$ct_fh>)
{
    chomp $ctline;
    next if $ctline =~ /^\s*\d+\s+dG/;    # skip the "300 dG = ..." header
    my @fields = split(' ', $ctline);     # ' ' also ignores leading spaces
    next unless @fields >= 5;
    push @bases,    $fields[1];
    push @connects, $fields[4];
}
close($ct_fh);

# Debug output: each fragment with its position, then each base with its connection
print "$_\t$position_of{$_}\n" for sort keys %position_of;
print "$bases[$_]\t$connects[$_]\n" for 0 .. $#bases;
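Connecting the two files needs the data from file 2 kept in a searchable form. A sketch of one way to store it, assuming the .ct layout shown below (base number in column 1, base in column 2, connection in column 5): collect the bases into one string for searching, and keep the base numbers and connections in arrays with the same indexing, so a string offset returned by index() maps straight back to a row. read_ct is a hypothetical name, and the demo reads the first ten rows of the posted file from an in-memory string instead of the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read .ct rows into parallel arrays plus one joined sequence string.
# Element $i of @nums/@connects belongs to character $i of the string.
sub read_ct {
    my ($fh) = @_;
    my (@nums, @bases, @connects);
    while (my $line = <$fh>) {
        next if $line =~ /^\s*\d+\s+dG/;   # skip the energy header line
        my @f = split ' ', $line;          # whitespace-separated columns
        next unless @f >= 5;               # skip blank or short lines
        push @nums,     $f[0];
        push @bases,    $f[1];
        push @connects, $f[4];
    }
    return (\@nums, join('', @bases), \@connects);
}

# Demo data: first rows of the posted file. With the real file use
#   open my $fh, '<', $ctfile or die "Cannot open $ctfile: $!\n";
my $demo = <<'CT';
300 dG = -62.54 [initially -70.70] gi178893_M23263_rna_300-1
1 G 0 2 0 1
2 A 1 3 0 2
3 A 2 4 0 3
4 U 3 5 0 4
5 U 4 6 0 5
6 C 5 7 0 6
7 C 6 8 0 7
8 G 7 9 34 8
9 G 8 10 33 9
10 C 9 11 0 10
CT

open my $fh, '<', \$demo or die "Cannot open in-memory file: $!\n";
my ($nums, $seq, $connects) = read_ct($fh);
close $fh;

print "sequence: $seq\n";
# A hit at 0-based offset $i maps back to .ct base $nums->[$i]
my $i = index($seq, 'CCGG');
if ($i >= 0) {
    my $conn = join ' ', @{$connects}[$i .. $i + 3];
    print "CCGG starts at base $nums->[$i], connections $conn\n";
}
```

This prints `sequence: GAAUUCCGGC` and `CCGG starts at base 6, connections 0 0 34 33`. The connection slice for a hit is exactly what the zero-count criterion needs.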
Data File 1 (match_super.dat):
1-Match: GGTGTTGTATGCCTTTAAA 5 3545
2-Match: GGAAAGTCAAGCCCATCTA 9 3254
3-Match: GTTCTTATTTGCACCTACT 6 180
4-Match: GATGAGGAACAGCAACCTT 5 844
Data file 2 (M23263_rna_300-1.dat_1.ct):
300 dG = -62.54 [initially -70.70] gi178893_M23263_rna_300-1
1 G 0 2 0 1
2 A 1 3 0 2
3 A 2 4 0 3
4 U 3 5 0 4
5 U 4 6 0 5
6 C 5 7 0 6
7 C 6 8 0 7
8 G 7 9 34 8
9 G 8 10 33 9
10 C 9 11 0 10
11 G 10 12 32 11
12 G 11 13 31 12
13 A 12 14 0 13
14 G 13 15 30 14
15 A 14 16 29 15
16 G 15 17 28 16
17 A 16 18 27 17
18 A 17 19 26 18
19 C 18 20 25 19
20 C 19 21 0 20
21 C 20 22 0 21
22 U 21 23 0 22
23 C 22 24 0 23
24 U 23 25 0 24
25 G 24 26 19 25
26 U 25 27 18 26
27 U 26 28 17 27
28 U 27 29 16 28
29 U 28 30 15 29
30 C 29 31 14 30
31 C 30 32 12 31
32 C 31 33 11 32
33 C 32 34 9 33
34 C 33 35 8 34
35 A 34 36 0 35
36 C 35 37 0 36
37 U 36 38 294 37
38 C 37 39 293 38
39 U 38 40 292 39
40 C 39 41 0 40
41 U 40 42 290 41
42 C 41 43 289 42
43 U 42 44 288 43
44 C 43 45 287 44
45 C 44 46 0 45
46 A 45 47 0 46
47 C 46 48 0 47
48 C 47 49 0 48
49 U 48 50 0 49
50 C 49 51 0 50
51 C 50 52 84 51
52 U 51 53 83 52
53 C 52 54 82 53
54 C 53 55 0 54
55 U 54 56 81 55
56 G 55 57 80 56
57 C 56 58 79 57
58 C 57 59 0 58
59 U 58 60 0 59
60 U 59 61 78 60
61 C 60 62 77 61
62 C 61 63 76 62
63 C 62 64 0 63
64 C 63 65 74 64
65 A 64 66 73 65
66 C 65 67 72 66
67 C 66 68 0 67
68 C 67 69 0 68
69 C 68 70 0 69
70 G 69 71 0 70
71 A 70 72 0 71
72 G 71 73 66 72
73 U 72 74 65 73
74 G 73 75 64 74
75 C 74 76 0 75
76 G 75 77 62 76
77 G 76 78 61 77
78 A 77 79 60 78
79 G 78 80 57 79
80 C 79 81 56 80
81 A 80 82 55 81
82 G 81 83 53 82
83 A 82 84 52 83
84 G 83 85 51 84
85 A 84 86 0 85
86 U 85 87 266 86
87 C 86 88 265 87
88 A 87 89 264 88
89 A 88 90 263 89
90 A 89 91 262 90
91 A 90 92 261 91
92 G 91 93 260 92
93 A 92 94 259 93
94 U 93 95 258 94
95 G 94 96 257 95
96 A 95 97 0 96
97 A 96 98 0 97
98 A 97 99 0 98
99 A 98 100 0 99
100 G 99 101 121 100
101 G 100 102 120 101
102 C 101 103 119 102
103 A 102 104 0 103
104 G 103 105 0 104
105 U 104 106 0 105
106 C 105 107 0 106
107 A 106 108 0 107
108 G 107 109 0 108
109 G 108 110 0 109
110 U 109 111 0 110
111 C 110 112 0 111
112 U 111 113 0 112
113 U 112 114 0 113
114 C 113 115 0 114
115 A 114 116 0 115
116 G 115 117 0 116
117 U 116 118 0 117
118 A 117 119 0 118
119 G 118 120 102 119
120 C 119 121 101 120
121 C 120 122 100 121
122 A 121 123 0 122
123 A 122 124 0 123
124 A 123 125 0 124
125 A 124 126 0 125
126 A 125 127 0 126
127 A 126 128 0 127
128 C 127 129 0 128
129 A 128 130 0 129
130 A 129 131 0 130
131 A 130 132 0 131
132 A 131 133 0 132
133 C 132 134 0 133
134 A 133 135 0 134
135 A 134 136 0 135
136 A 135 137 0 136
137 C 136 138 0 137
138 A 137 139 0 138
139 A 138 140 0 139
140 A 139 141 0 140
141 A 140 142 0 141
142 A 141 143 0 142
143 C 142 144 0 143
144 A 143 145 0 144
145 A 144 146 0 145
146 A 145 147 0 146
147 A 146 148 0 147
148 A 147 149 0 148
149 A 148 150 0 149
150 G 149 151 0 150
151 C 150 152 255 151
152 C 151 153 254 152
153 G 152 154 252 153
154 A 153 155 251 154
155 A 154 156 0 155
156 A 155 157 0 156
157 U 156 158 247 157
158 A 157 159 246 158
159 A 158 160 245 159
160 A 159 161 244 160
161 A 160 162 243 161
162 G 161 163 242 162
163 A 162 164 241 163
164 A 163 165 240 164
165 A 164 166 239 165
166 A 165 167 238 166
167 A 166 168 237 167
168 G 167 169 236 168
169 A 168 170 235 169
170 U 169 171 234 170
171 A 170 172 233 171
172 A 171 173 232 172
173 U 172 174 0 173
174 A 173 175 0 174
175 A 174 176 0 175
176 C 175 177 0 176
177 U 176 178 0 177
178 C 177 179 0 178
179 A 178 180 231 179
180 G 179 181 230 180
181 U 180 182 229 181
182 U 181 183 228 182
183 C 182 184 227 183
184 U 183 185 226 184
185 U 184 186 225 185
186 A 185 187 0 186
187 U 186 188 0 187
188 U 187 189 0 188
189 U 188 190 0 189
190 G 189 191 0 190
191 C 190 192 224 191
192 A 191 193 223 192
193 C 192 194 222 193
194 C 193 195 221 194
195 U 194 196 220 195
196 A 195 197 0 196
197 C 196 198 218 197
198 U 197 199 217 198
199 U 198 200 212 199
200 C 199 201 211 200
201 A 200 202 210 201
202 G 201 203 209 202
203 U 202 204 208 203
204 G 203 205 0 204
205 G 204 206 0 205
206 A 205 207 0 206
207 C 206 208 0 207
208 A 207 209 203 208
209 C 208 210 202 209
210 U 209 211 201 210
211 G 210 212 200 211
212 A 211 213 199 212
213 A 212 214 0 213
214 U 213 215 0 214
215 U 214 216 0 215
216 U 215 217 0 216
217 G 216 218 198 217
218 G 217 219 197 218
219 A 218 220 0 219
220 A 219 221 195 220
221 G 220 222 194 221
222 G 221 223 193 222
223 U 222 224 192 223
224 G 223 225 191 224
225 G 224 226 185 225
226 A 225 227 184 226
227 G 226 228 183 227
228 G 227 229 182 228
229 A 228 230 181 229
230 U 229 231 180 230
231 U 230 232 179 231
232 U 231 233 172 232
233 U 232 234 171 233
234 G 233 235 170 234
235 U 234 236 169 235
236 U 235 237 168 236
237 U 236 238 167 237
238 U 237 239 166 238
239 U 238 240 165 239
240 U 239 241 164 240
241 U 240 242 163 241
242 C 241 243 162 242
243 U 242 244 161 243
244 U 243 245 160 244
245 U 244 246 159 245
246 U 245 247 158 246
247 A 246 248 157 247
248 A 247 249 0 248
249 G 248 250 0 249
250 A 249 251 0 250
251 U 250 252 154 251
252 C 251 253 153 252
253 U 252 254 0 253
254 G 253 255 152 254
255 G 254 256 151 255
256 G 255 257 0 256
257 C 256 258 95 257
258 A 257 259 94 258
259 U 258 260 93 259
260 C 259 261 92 260
261 U 260 262 91 261
262 U 261 263 90 262
263 U 262 264 89 263
264 U 263 265 88 264
265 G 264 266 87 265
266 A 265 267 86 266
267 A 266 268 0 267
268 U 267 269 0 268
269 C 268 270 0 269
270 U 269 271 0 270
271 A 270 272 0 271
272 C 271 273 0 272
273 C 272 274 0 273
274 C 273 275 0 274
275 U 274 276 0 275
276 U 275 277 0 276
277 C 276 278 0 277
278 A 277 279 0 278
279 A 278 280 0 279
280 G 279 281 0 280
281 U 280 282 0 281
282 A 281 283 0 282
283 U 282 284 0 283
284 U 283 285 0 284
285 A 284 286 0 285
286 A 285 287 0 286
287 G 286 288 44 287
288 A 287 289 43 288
289 G 288 290 42 289
290 A 289 291 41 290
291 C 290 292 0 291
292 A 291 293 39 292
293 G 292 294 38 293
294 A 293 295 37 294
295 C 294 296 0 295
296 U 295 297 0 296
297 G 296 298 0 297
298 U 297 299 0 298
299 G 298 300 0 299
300 A 299 0 0 300