PerlMonks  

How to extract the particular residues from PDB files

by Payal (Initiate)
on Apr 24, 2012 at 12:17 UTC (id://966813)

Payal has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Please tell me how to extract, from a PDB file, the residues that remain after ignoring the first 3 and last 3 residues. For example, my PDB file looks like the sample shown below.

Here the 6th column is the residue number, so I need to extract only residues 119-121.

This is just one example. I have multiple files like this, and the residue numbers vary from file to file. Could anyone suggest a program that extracts the residues from each PDB file while ignoring its first 3 and last 3 residues?

Please help!

ATOM 2279 P G 0 116 82.059 150.340 28.009 1.00 31.85 + P
ATOM 2280 OP1 G 0 116 81.476 151.290 27.052 1.00 34.42 + O
ATOM 2281 OP2 G 0 116 82.278 148.940 27.598 1.00 33.86 + O
ATOM 2282 O5' G 0 116 83.444 150.926 28.532 1.00 30.55 + O
ATOM 2283 C5' G 0 116 84.687 150.466 27.992 1.00 29.38 + C
ATOM 2284 C4' G 0 116 85.826 151.264 28.568 1.00 28.39 + C
ATOM 2285 O4' G 0 116 85.628 152.659 28.245 1.00 27.32 + O
ATOM 2286 C3' G 0 116 85.947 151.219 30.082 1.00 28.22 + C
ATOM 2287 O3' G 0 116 86.748 150.108 30.466 1.00 27.76 + O
ATOM 2288 C2' G 0 116 86.627 152.542 30.399 1.00 27.78 + C
ATOM 2289 O2' G 0 116 88.022 152.453 30.209 1.00 28.00 + O
ATOM 2290 C1' G 0 116 86.039 153.463 29.328 1.00 26.71 + C
ATOM 2291 N9 G 0 116 84.884 154.242 29.764 1.00 26.55 + N
ATOM 2292 C8 G 0 116 83.566 154.016 29.448 1.00 26.21 + C
ATOM 2293 N7 G 0 116 82.758 154.904 29.958 1.00 25.33 + N
ATOM 2294 C5 G 0 116 83.590 155.763 30.658 1.00 25.18 + C
ATOM 2295 C6 G 0 116 83.286 156.917 31.405 1.00 24.41 + C
ATOM 2296 O6 G 0 116 82.185 157.436 31.603 1.00 25.14 + O
ATOM 2297 N1 G 0 116 84.427 157.482 31.955 1.00 24.22 + N
ATOM 2298 C2 G 0 116 85.700 156.997 31.802 1.00 25.05 + C
ATOM 2299 N2 G 0 116 86.676 157.673 32.422 1.00 26.78 + N
ATOM 2300 N3 G 0 116 85.999 155.928 31.099 1.00 25.66 + N
ATOM 2301 C4 G 0 116 84.904 155.362 30.557 1.00 26.46 + C
ATOM 2302 P A 0 117 86.659 149.533 31.961 1.00 26.52 + P
ATOM 2303 OP1 A 0 117 87.699 148.487 32.051 1.00 26.42 + O
ATOM 2304 OP2 A 0 117 85.262 149.206 32.295 1.00 28.33 + O
ATOM 2305 O5' A 0 117 87.092 150.756 32.874 1.00 27.56 + O
ATOM 2306 C5' A 0 117 88.476 151.098 33.043 1.00 26.15 + C
ATOM 2307 C4' A 0 117 88.604 152.215 34.039 1.00 25.69 + C
ATOM 2308 O4' A 0 117 87.956 153.394 33.506 1.00 25.31 + O
ATOM 2309 C3' A 0 117 87.898 151.981 35.360 1.00 26.48 + C
ATOM 2310 O3' A 0 117 88.662 151.204 36.264 1.00 28.07 + O
ATOM 2311 C2' A 0 117 87.641 153.392 35.861 1.00 25.82 + C
ATOM 2312 O2' A 0 117 88.767 153.953 36.493 1.00 27.16 + O
ATOM 2313 C1' A 0 117 87.363 154.134 34.557 1.00 24.60 + C
ATOM 2314 N9 A 0 117 85.935 154.263 34.287 1.00 22.06 + N
ATOM 2315 C8 A 0 117 85.132 153.456 33.524 1.00 21.81 + C
ATOM 2316 N7 A 0 117 83.880 153.839 33.489 1.00 20.36 + N
ATOM 2317 C5 A 0 117 83.858 154.976 34.280 1.00 19.91 + C
ATOM 2318 C6 A 0 117 82.829 155.852 34.643 1.00 18.69 + C
ATOM 2319 N6 A 0 117 81.570 155.729 34.230 1.00 19.04 + N
ATOM 2320 N1 A 0 117 83.141 156.881 35.452 1.00 19.35 + N
ATOM 2321 C2 A 0 117 84.403 157.016 35.861 1.00 20.19 + C
ATOM 2322 N3 A 0 117 85.460 156.265 35.584 1.00 22.83 + N
ATOM 2323 C4 A 0 117 85.116 155.247 34.778 1.00 22.17 + C
ATOM 2324 P G 0 118 87.902 150.250 37.307 1.00 28.29 + P
ATOM 2325 OP1 G 0 118 88.940 149.427 37.950 1.00 29.52 + O
ATOM 2326 OP2 G 0 118 86.775 149.594 36.622 1.00 28.46 + O
ATOM 2327 O5' G 0 118 87.302 151.265 38.380 1.00 27.94 + O
ATOM 2328 C5' G 0 118 88.169 151.888 39.333 1.00 27.42 + C
ATOM 2329 C4' G 0 118 87.490 153.048 40.024 1.00 25.87 + C
ATOM 2330 O4' G 0 118 87.004 154.008 39.052 1.00 26.86 + O
ATOM 2331 C3' G 0 118 86.265 152.778 40.870 1.00 25.76 + C
ATOM 2332 O3' G 0 118 86.556 152.253 42.141 1.00 25.76 + O
ATOM 2333 C2' G 0 118 85.660 154.166 40.996 1.00 26.49 + C
ATOM 2334 O2' G 0 118 86.309 154.965 41.971 1.00 26.41 + O
ATOM 2335 C1' G 0 118 85.913 154.731 39.604 1.00 25.05 + C
ATOM 2336 N9 G 0 118 84.720 154.532 38.784 1.00 24.09 + N
ATOM 2337 C8 G 0 118 84.515 153.623 37.775 1.00 23.45 + C
ATOM 2338 N7 G 0 118 83.308 153.676 37.279 1.00 23.59 + N
ATOM 2339 C5 G 0 118 82.688 154.689 37.997 1.00 22.64 + C
ATOM 2340 C6 G 0 118 81.368 155.203 37.920 1.00 21.82 + C
ATOM 2341 O6 G 0 118 80.448 154.858 37.183 1.00 22.56 + O
ATOM 2342 N1 G 0 118 81.165 156.224 38.837 1.00 24.13 + N
ATOM 2343 C2 G 0 118 82.106 156.691 39.719 1.00 24.15 + C
ATOM 2344 N2 G 0 118 81.728 157.689 40.530 1.00 22.10 + N
ATOM 2345 N3 G 0 118 83.329 156.218 39.802 1.00 23.59 + N
ATOM 2346 C4 G 0 118 83.550 155.229 38.919 1.00 22.57 + C
ATOM 2347 P A 0 119 85.451 151.374 42.883 1.00 24.74 + P
ATOM 2348 OP1 A 0 119 86.044 150.927 44.159 1.00 26.05 + O
ATOM 2349 OP2 A 0 119 84.948 150.367 41.917 1.00 25.66 + O
ATOM 2350 O5' A 0 119 84.274 152.399 43.184 1.00 23.08 + O
ATOM 2351 C5' A 0 119 84.425 153.413 44.181 1.00 21.80 + C
ATOM 2352 C4' A 0 119 83.165 154.231 44.297 1.00 21.62 + C
ATOM 2353 O4' A 0 119 82.848 154.800 43.000 1.00 21.28 + O
ATOM 2354 C3' A 0 119 81.893 153.504 44.712 1.00 21.73 + C
ATOM 2355 O3' A 0 119 81.789 153.355 46.128 1.00 22.94 + O
ATOM 2356 C2' A 0 119 80.808 154.419 44.158 1.00 21.68 + C
ATOM 2357 O2' A 0 119 80.555 155.534 44.998 1.00 21.44 + O
ATOM 2358 C1' A 0 119 81.443 154.902 42.853 1.00 20.53 + C
ATOM 2359 N9 A 0 119 81.038 154.093 41.701 1.00 18.91 + N
ATOM 2360 C8 A 0 119 81.773 153.156 41.019 1.00 17.60 + C
ATOM 2361 N7 A 0 119 81.122 152.587 40.040 1.00 15.41 + N
ATOM 2362 C5 A 0 119 79.876 153.191 40.075 1.00 15.59 + C
ATOM 2363 C6 A 0 119 78.726 153.022 39.296 1.00 15.99 + C
ATOM 2364 N6 A 0 119 78.644 152.170 38.276 1.00 17.57 + N
ATOM 2365 N1 A 0 119 77.648 153.769 39.600 1.00 14.95 + N
ATOM 2366 C2 A 0 119 77.735 154.630 40.610 1.00 13.99 + C
ATOM 2367 N3 A 0 119 78.759 154.887 41.408 1.00 16.31 + N
ATOM 2368 C4 A 0 119 79.812 154.122 41.088 1.00 15.76 + C
ATOM 2369 P A 0 120 80.796 152.244 46.748 1.00 21.32 + P
ATOM 2370 OP1 A 0 120 81.143 152.046 48.169 1.00 23.79 + O
ATOM 2371 OP2 A 0 120 80.804 151.091 45.844 1.00 20.80 + O
ATOM 2372 O5' A 0 120 79.374 152.967 46.734 1.00 22.75 + O
ATOM 2373 C5' A 0 120 78.358 152.592 45.803 1.00 22.09 + C
ATOM 2374 C4' A 0 120 76.997 152.623 46.462 1.00 22.55 + C
ATOM 2375 O4' A 0 120 77.077 151.946 47.744 1.00 22.12 + O
ATOM 2376 C3' A 0 120 76.371 153.971 46.789 1.00 21.85 + C
ATOM 2377 O3' A 0 120 75.672 154.526 45.670 1.00 23.27 + O
ATOM 2378 C2' A 0 120 75.383 153.593 47.883 1.00 21.38 + C
ATOM 2379 O2' A 0 120 74.231 152.983 47.340 1.00 21.97 + O
ATOM 2380 C1' A 0 120 76.158 152.526 48.649 1.00 21.08 + C
ATOM 2381 N9 A 0 120 76.904 153.069 49.790 1.00 19.68 + N
ATOM 2382 C8 A 0 120 78.199 152.806 50.146 1.00 19.52 + C
ATOM 2383 N7 A 0 120 78.590 153.443 51.216 1.00 20.27 + N
ATOM 2384 C5 A 0 120 77.479 154.180 51.593 1.00 19.90 + C
ATOM 2385 C6 A 0 120 77.245 155.068 52.658 1.00 20.06 + C
ATOM 2386 N6 A 0 120 78.159 155.385 53.573 1.00 20.97 + N
ATOM 2387 N1 A 0 120 76.024 155.626 52.753 1.00 19.71 + N
ATOM 2388 C2 A 0 120 75.108 155.310 51.840 1.00 20.35 + C
ATOM 2389 N3 A 0 120 75.205 154.497 50.796 1.00 21.62 + N
ATOM 2390 C4 A 0 120 76.434 153.958 50.726 1.00 19.27 + C
ATOM 2391 P U 0 121 75.228 156.066 45.692 1.00 18.62 + P
ATOM 2392 OP1 U 0 121 76.432 156.854 45.413 1.00 24.06 + O
ATOM 2393 OP2 U 0 121 74.464 156.320 46.914 1.00 22.04 + O
ATOM 2394 O5' U 0 121 74.239 156.216 44.458 1.00 23.38 + O
ATOM 2395 C5' U 0 121 74.735 156.392 43.120 1.00 23.14 + C
ATOM 2396 C4' U 0 121 73.669 156.018 42.123 1.00 23.88 + C
ATOM 2397 O4' U 0 121 73.283 154.645 42.328 1.00 24.01 + O
ATOM 2398 C3' U 0 121 72.376 156.805 42.242 1.00 25.26 + C
ATOM 2399 O3' U 0 121 72.475 157.976 41.445 1.00 26.37 + O
ATOM 2400 C2' U 0 121 71.326 155.856 41.676 1.00 25.26 + C
ATOM 2401 O2' U 0 121 71.155 155.990 40.284 1.00 26.72 + O
ATOM 2402 C1' U 0 121 71.924 154.482 41.987 1.00 25.28 + C
ATOM 2403 N1 U 0 121 71.243 153.724 43.047 1.00 26.48 + N
ATOM 2404 C2 U 0 121 70.174 152.931 42.675 1.00 26.92 + C
ATOM 2405 O2 U 0 121 69.789 152.838 41.528 1.00 27.89 + O
ATOM 2406 N3 U 0 121 69.569 152.251 43.700 1.00 27.64 + N
ATOM 2407 C4 U 0 121 69.916 152.278 45.030 1.00 26.45 + C
ATOM 2408 O4 U 0 121 69.276 151.602 45.833 1.00 28.11 + O
ATOM 2409 C5 U 0 121 71.029 153.118 45.339 1.00 27.38 + C
ATOM 2410 C6 U 0 121 71.642 153.794 44.363 1.00 26.84 + C
ATOM 2411 P C 0 122 71.647 159.283 41.839 1.00 27.15 + P
ATOM 2412 OP1 C 0 122 70.813 158.950 43.014 1.00 29.65 + O
ATOM 2413 OP2 C 0 122 71.003 159.827 40.616 1.00 27.43 + O
ATOM 2414 O5' C 0 122 72.774 160.295 42.310 1.00 27.51 + O
ATOM 2415 C5' C 0 122 73.693 160.863 41.373 1.00 27.29 + C
ATOM 2416 C4' C 0 122 75.078 160.844 41.952 1.00 27.42 + C
ATOM 2417 O4' C 0 122 75.511 159.466 42.081 1.00 28.02 + O
ATOM 2418 C3' C 0 122 76.149 161.492 41.103 1.00 27.54 + C
ATOM 2419 O3' C 0 122 76.203 162.895 41.296 1.00 29.00 + O
ATOM 2420 C2' C 0 122 77.406 160.790 41.583 1.00 27.18 + C
ATOM 2421 O2' C 0 122 77.850 161.309 42.821 1.00 27.23 + O
ATOM 2422 C1' C 0 122 76.895 159.367 41.795 1.00 26.41 + C
ATOM 2423 N1 C 0 122 77.077 158.516 40.596 1.00 25.03 + N
ATOM 2424 C2 C 0 122 78.371 158.091 40.254 1.00 24.32 + C
ATOM 2425 O2 C 0 122 79.322 158.435 40.962 1.00 25.65 + O
ATOM 2426 N3 C 0 122 78.549 157.322 39.156 1.00 21.66 + N
ATOM 2427 C4 C 0 122 77.504 156.979 38.407 1.00 21.19 + C
ATOM 2428 N4 C 0 122 77.729 156.225 37.333 1.00 19.39 + N
ATOM 2429 C5 C 0 122 76.180 157.394 38.725 1.00 23.28 + C
ATOM 2430 C6 C 0 122 76.011 158.151 39.820 1.00 25.06 + C
ATOM 2431 P U 0 123 76.635 163.839 40.073 1.00 29.47 + P
ATOM 2432 OP1 U 0 123 76.680 165.225 40.595 1.00 29.58 + O
ATOM 2433 OP2 U 0 123 75.749 163.514 38.922 1.00 30.88 + O
ATOM 2434 O5' U 0 123 78.119 163.364 39.727 1.00 29.53 + O
ATOM 2435 C5' U 0 123 79.159 163.520 40.696 1.00 31.30 + C
ATOM 2436 C4' U 0 123 80.450 162.859 40.250 1.00 31.85 + C
ATOM 2437 O4' U 0 123 80.252 161.446 40.010 1.00 30.68 + O
ATOM 2438 C3' U 0 123 81.180 163.328 38.998 1.00 32.03 + C
ATOM 2439 O3' U 0 123 81.930 164.516 39.238 1.00 34.85 + O
ATOM 2440 C2' U 0 123 82.143 162.168 38.759 1.00 31.57 + C
ATOM 2441 O2' U 0 123 83.282 162.261 39.597 1.00 31.84 + O
ATOM 2442 C1' U 0 123 81.321 160.963 39.216 1.00 29.73 + C
ATOM 2443 N1 U 0 123 80.795 160.154 38.105 1.00 27.66 + N
ATOM 2444 C2 U 0 123 81.680 159.301 37.468 1.00 26.05 + C
ATOM 2445 O2 U 0 123 82.850 159.207 37.784 1.00 26.21 + O
ATOM 2446 N3 U 0 123 81.144 158.566 36.445 1.00 24.39 + N
ATOM 2447 C4 U 0 123 79.845 158.593 35.999 1.00 26.20 + C
ATOM 2448 O4 U 0 123 79.501 157.828 35.094 1.00 28.05 + O
ATOM 2449 C5 U 0 123 78.991 159.504 36.698 1.00 24.43 + C
ATOM 2450 C6 U 0 123 79.481 160.233 37.702 1.00 26.73 + C
ATOM 2451 P C 0 124 82.536 165.364 37.998 1.00 36.32 + P
ATOM 2452 OP1 C 0 124 83.145 166.573 38.636 1.00 36.04 + O
ATOM 2453 OP2 C 0 124 81.510 165.527 36.930 1.00 32.60 + O
ATOM 2454 O5' C 0 124 83.732 164.471 37.438 1.00 36.15 + O
ATOM 2455 C5' C 0 124 84.949 164.323 38.191 1.00 37.98 + C
ATOM 2456 C4' C 0 124 85.914 163.412 37.468 1.00 40.39 + C
ATOM 2457 O4' C 0 124 85.345 162.079 37.356 1.00 39.92 + O
ATOM 2458 C3' C 0 124 86.268 163.782 36.035 1.00 42.18 + C
ATOM 2459 O3' C 0 124 87.302 164.754 35.958 1.00 45.00 + O
ATOM 2460 C2' C 0 124 86.726 162.449 35.462 1.00 41.48 + C
ATOM 2461 O2' C 0 124 88.045 162.135 35.873 1.00 42.55 + O
ATOM 2462 C1' C 0 124 85.766 161.479 36.146 1.00 39.16 + C
ATOM 2463 N1 C 0 124 84.577 161.186 35.326 1.00 36.21 + N
ATOM 2464 C2 C 0 124 84.640 160.142 34.394 1.00 34.43 + C
ATOM 2465 O2 C 0 124 85.715 159.534 34.240 1.00 31.04 + O
ATOM 2466 N3 C 0 124 83.533 159.828 33.680 1.00 33.71 + N
ATOM 2467 C4 C 0 124 82.403 160.520 33.862 1.00 33.51 + C
ATOM 2468 N4 C 0 124 81.322 160.154 33.166 1.00 31.93 + N
ATOM 2469 C5 C 0 124 82.326 161.611 34.772 1.00 33.67 + C
ATOM 2470 C6 C 0 124 83.424 161.906 35.476 1.00 35.09 + C

I tried writing the program below. Could anyone please tell me what the bug is?

    open(FH,"C:\Users\Payal\Desktop\Project bogola\loop.pdb");
    print"Enter the first residue\n";
    $b=<>;
    print"Enter the second residue\n";
    $c=<>;
    $d =chomp($b);
    $e =chomp($c);
    while($a=<FH>)
    {
        if($a=~/^ATOM/)
        {
            $residue=substr($d..$e,23,1..3);
            if($residue >=$d and $residue <=$e)
            {
                print FH1 $a;
                open(FH1,">>ichain.pdb")
            }
        }
    }
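For comparison, here is a hedged sketch of how the program above might look with its bugs fixed. The main problems: in the double-quoted path "C:\Users\..." Perl treats `\U`, `\P`, `\D` and `\l` as escapes, so the backslashes vanish and `\U` upper-cases the rest of the name (forward slashes or single quotes avoid this), and the `open` is never checked; `$d = chomp($b)` stores chomp's return value (the number of characters removed), not the residue number; `substr($d..$e, 23, 1..3)` never looks at the line in `$a`; `print FH1` runs before `FH1` is opened, and the file is reopened on every match; and `$a`/`$b` are `sort`'s special variables. The sketch assumes whitespace-separated columns as in the sample, with the residue number in the 6th field; `copy_residue_range` and `extract.pl` are hypothetical names. For files that keep the standard fixed-column PDB layout, the residue sequence number sits in columns 23-26, i.e. `substr($line, 22, 4)`.

```perl
use strict;
use warnings;

# Copy ATOM records whose residue number (the 6th whitespace-separated
# field, as in the sample) lies between $first and $last, inclusive.
sub copy_residue_range {
    my ($in, $out, $first, $last) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        next unless $line =~ /^ATOM/;
        my @fields = split ' ', $line;
        next unless @fields > 5;                 # skip malformed records
        my $residue = $fields[5];
        if ($residue >= $first && $residue <= $last) {
            print {$out} $line;
            $count++;
        }
    }
    return $count;
}

# Hypothetical usage: perl extract.pl "C:/Users/Payal/Desktop/Project bogola/loop.pdb"
# (forward slashes avoid the \U and \l escapes of a double-quoted Windows path)
if (@ARGV) {
    my $pdb = shift @ARGV;
    open my $in, '<', $pdb or die "Cannot open $pdb: $!";
    print "Enter the first residue\n";
    chomp(my $first = <STDIN>);
    print "Enter the second residue\n";
    chomp(my $last = <STDIN>);
    # Open the output file once, before any record is written to it
    open my $out, '>>', 'ichain.pdb' or die "Cannot open ichain.pdb: $!";
    copy_residue_range($in, $out, $first, $last);
    close $out;
    close $in;
}
```

With no file argument the script only defines the sub, which makes it easy to reuse on in-memory data.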

Re: How to extract the particular residues from PDB files
by MidLifeXis (Monsignor) on Apr 24, 2012 at 12:55 UTC

    It sounds to me like you want a program that does the following:

    • Read each line of your data file: see perlintro (see "Files and I/O" and "Conditional and looping constructs")
    • Split your line into tokens: see split, perlre
    • Compare the sixth element to see if it is what you want to use: see perlop
    • Emit the output you want using the data you are interested in: see print
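    The four steps above could be put together roughly like this. It is only a sketch, assuming (as in the sample) whitespace-separated ATOM lines with the residue number in the 6th field, and residue numbers that are not reused within one file (e.g. a single chain). Each file is read fully into memory, and "first 3" and "last 3" are counted over distinct residues, not lines; `middle_residue_lines` is a made-up name.

```perl
use strict;
use warnings;

# Return the ATOM lines whose residue is not among the first $skip or
# the last $skip distinct residues (residue number = 6th field).
sub middle_residue_lines {
    my ($lines, $skip) = @_;
    my (@order, %seen);
    for my $line (@$lines) {
        my @f = split ' ', $line;
        push @order, $f[5] unless $seen{ $f[5] }++;   # residues in file order
    }
    return () if @order <= 2 * $skip;                 # nothing left in the middle
    my %keep = map { $_ => 1 } @order[$skip .. $#order - $skip];
    return grep { my @f = split ' ', $_; $keep{ $f[5] } } @$lines;
}

# Process each file named on the command line separately,
# since the residue numbers differ from file to file.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @atoms = grep { /^ATOM/ } <$fh>;
    close $fh;
    print middle_residue_lines(\@atoms, 3);
}
```

On the sample file this keeps residues 119, 120 and 121.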

    If you don't know what the last residue number is, you will also need to check that three more residues follow in your file before emitting the output. Update: A FIFO (push data onto the end of an array as you read, shift it off the front of the array when you print) would be one structure that could fit this function.
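    A sketch of the FIFO idea, assuming a new residue starts whenever the 6th field changes. Each new residue is pushed onto the end of `@queue`; the oldest is shifted off the front and printed (unless it is one of the first three) once enough later residues exist to prove it is not one of the last three. Only a handful of residues are held in memory at once. `trim_residues_fifo` is a hypothetical name.

```perl
use strict;
use warnings;

# Print ATOM lines from $in to $out, dropping the first $skip_head and
# the last $skip_tail residues, using a FIFO queue of residues.
sub trim_residues_fifo {
    my ($in, $out, $skip_head, $skip_tail) = @_;
    my @queue;        # entries: { num => ordinal, lines => [...] }
    my $ordinal = 0;  # how many distinct residues have started so far
    my $last_res;     # residue number of the previous ATOM line

    # Shift the oldest residue off the front; print it unless it is
    # one of the first $skip_head residues.
    my $release = sub {
        my $oldest = shift @queue;
        print {$out} @{ $oldest->{lines} } if $oldest->{num} > $skip_head;
    };

    while (my $line = <$in>) {
        next unless $line =~ /^ATOM/;
        my $res = (split ' ', $line)[5];          # 6th column = residue number
        if (!defined $last_res || $res ne $last_res) {
            push @queue, { num => ++$ordinal, lines => [] };   # new residue on the end
            $last_res = $res;
        }
        push @{ $queue[-1]{lines} }, $line;
        # The oldest residue is complete and has more than $skip_tail
        # residues after it, so it cannot be one of the last $skip_tail.
        $release->() while @queue > $skip_tail + 1;
    }
    # At end of input, release everything except the final $skip_tail residues.
    $release->() while @queue > $skip_tail;
    return;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    trim_residues_fifo($fh, \*STDOUT, 3, 3);
    close $fh;
}
```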

    A note on perlmonks: you will find that you will often get a better discussion on your problem if you present code that you have tried and are getting stuck on. What you have provided is a work request ("Write my code for me for free"). There are many monks that will contract for work, given agreeable terms, but also monks that respond poorly to this no-effort-shown type of request.

    --MidLifeXis

Re: How to extract the particular residues from PDB files
by Sinistral (Monsignor) on Apr 24, 2012 at 12:53 UTC

    It didn't help that you didn't define what a "PDB file" is. A little Googling led me to RCSB PDB Protein Data Bank. There are actually tools on that site, including a structure comparison tool. Perhaps there is information on this seemingly rich site to help? I also looked at the BioPerl site, but the only hit was a link to Wikipedia's reference for Protein Data Bank, which has a further link back to the RCSB site listing many tools to use.

    You'll find help here for your specific problem, but it's not clear what you're asking. Are you literally wanting the first three lines and the last three lines of the file removed, leaving a file that's 6 lines shorter? If so, you can use the head and tail utilities to remove the lines. I think you're actually asking for some processing of the data fields, though, and so I think the RCSB site would be the best place to look.
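    For the literal line-based reading, the head/tail trimming can also be done in Perl. This is a sketch with a made-up sub name; note that on the sample it removes three atom lines at each end (all from residues 116 and 124), not three residues.

```perl
use strict;
use warnings;

# Line-based trimming (what head/tail would do): drop the first $head
# and the last $tail lines, regardless of which residue they belong to.
sub drop_head_tail_lines {
    my ($lines, $head, $tail) = @_;
    return () if @$lines <= $head + $tail;    # nothing left in between
    return @{$lines}[$head .. $#{$lines} - $tail];
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print drop_head_tail_lines([<$fh>], 3, 3);
    close $fh;
}
```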

Re: How to extract the particular residues from PDB files
by Marshall (Canon) on Apr 24, 2012 at 12:46 UTC
    lines 119-121 are:
    ATOM 2397 O4' U 0 121 73.283 154.645 42.328 1.00 24.01 + O
    ATOM 2398 C3' U 0 121 72.376 156.805 42.242 1.00 25.26 + C
    ATOM 2399 O3' U 0 121 72.475 157.976 41.445 1.00 26.37 + O
    How are those lines different than the other lines?

      No difference, I just want to ignore the first and last three residues.

      LOL!

      Values of the 6th column in the range 119-121.

       perl -lane " if( $. > 3 ){ push @residues,$_ if $F[5] >= 119 and $F[5] <= 121; } END { pop @residues for 1..3; print for @residues } " file
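       In the one-liner above, `$. > 3` skips the first three lines rather than residues, `pop @residues for 1..3` removes the last three atom lines (part of residue 121) rather than three residues, and the 119-121 range is hard-coded. A sketch that derives the range from each file instead, assuming residues are numbered consecutively with no gaps (as in the sample); `residue_window` is a hypothetical name.

```perl
use strict;
use warnings;

# Keep ATOM lines whose residue number lies between
# (first residue + $skip) and (last residue - $skip).
# Assumes consecutive residue numbering with no gaps.
sub residue_window {
    my ($lines, $skip) = @_;
    my @atoms = grep { /^ATOM/ } @$lines;
    return () unless @atoms;
    my $lo = (split ' ', $atoms[0])[5] + $skip;
    my $hi = (split ' ', $atoms[-1])[5] - $skip;
    return grep { my $n = (split ' ', $_)[5]; $n >= $lo && $n <= $hi } @atoms;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print residue_window([<$fh>], 3);
    close $fh;
}
```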

Node Type: perlquestion [id://966813]
Approved by Corion