PerlMonks  

Data extraction with specific keywords

by neeraj_kr (Initiate)
on Jan 19, 2018 at 06:31 UTC

neeraj_kr has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write Perl code to extract specific keywords from a large data file. I have created an array in which those keywords are stored. I am trying to read that large file line by line, extract the matching keywords, and store them in a new file. I am pasting my code here because it is not working properly. Help will be appreciated. Thank you.

$a=`head -1 $ARGV[0]`;
@arr=split(/\t/,$a);
$col=$#arr;
$c=0;
@arr1 = ("91:", "86:", "184:", "430:", "391:", "254:", "121:", "192:", "404:", "12:",
         "87:", "638:", "417:", "129:", "549:", "548:", "122:", "443:", "378:", "365:",
         "665:", "148:", "185:", "88:", "629:", "637:", "149:", "625:", "635:", "627:",
         "650:", "468:", "92:", "618:", "212:", "85:", "628:", "171:", "649:", "15:",
         "61:", "169:", "104:", "202:", "523:", "60:", "672:", "291:", "658:", "59:",
         "547:", "491:", "234:", "411:", "620:", "581:", "414:", "14:", "412:", "416:",
         "345:", "626:", "457:", "72:", "384:", "371:", "9:", "580:", "436:", "356:",
         "385:", "58:", "669:", "388:", "386:", "390:", "636:", "619:", "16:", "413:",
         "17:", "524:", "579:", "624:", "90:", "471:", "410:", "551:", "289:", "387:",
         "531:", "64:", "166:", "211:", "467:", "415:", "232:", "550:", "362:", "375:",
         "401:", "359:", "372:", "398:", "360:", "364:", "399:", "403:", "373:", "377:",
         "18:", "118:", "585:", "427:", "424:", "586:", "469:", "425:", "429:", "13:",
         "423:", "500:", "62:", "109:", "19:", "539:", "499:", "532:", "400:", "63:",
         "361:", "374:", "73:", "449:", "175:", "426:", "89:", "507:", "397:", "389:",
         "582:", "475:", "20:", "22:", "541:", "492:", "503:", "555:", "595:", "596:",
         "450:", "23:", "611:", "509:", "3:", "485:", "24:", "438:", "442:", "440:",
         "484:", "117:", "32:", "437:", "31:", "663:", "339:", "535:", "21:", "470:",
         "439:", "525:", "172:", "40:", "65:", "487:", "50:", "517:", "597:", "545:",
         "516:", "402:", "347:", "614:", "540:", "613:", "346:", "67:", "363:", "583:",
         "376:", "428:", "71:", "615:", "332:", "271:", "5:", "508:", "74:");
#print "$m\n";
print $arr[2],"\n";
for($i=1;$i<=$#arr+1;$i++)
{
    foreach $ar(@arr1)
    {
        if ($arr[$i] == $ar)
        {
            $c[$j]=$i;
            $j++;
            # print $j;
        }
        #print $arr[1],"\n";
    }
}
open(fh,"$ARGV[0]");
while(<fh>)
{
    chomp $_;
    @arr2=split(/\,/,$_);
    foreach $ar(@c)
    {
        # print $arr2[$ar],"\t";
    }
    #print "\n";
}
close(fh);

Re: Data extraction with specific keywords
by kcott (Archbishop) on Jan 19, 2018 at 07:59 UTC

    G'day neeraj_kr,

    Welcome to the Monastery.

    Your posted question is lacking important information. You do not show: sample input; actual output; expected output; or, warning or error messages. Saying "it's not working properly" is a pretty much useless error report: would you take a car to a mechanic, provide just that information, then walk away? Please read "How do I post a question effectively?".

    Your code, in general, leaves much to be desired. I'd recommend, in the first instance, that you read "perlintro -- Perl introduction for beginners". When you've done that, apply what you've learned and rewrite your code. Some areas that stand out as needing attention, include:

    • use of dynamic (global) package variables instead of lexically scoped variables;
    • meaningful variable names;
    • checking I/O operations actually worked (you might find the autodie pragma useful for this);
    • avoiding use of special variables as normal variables (e.g. $a on the first line);
    • not including platform-specific code that creates a new process when platform-agnostic Perl code (that doesn't require a new process) is possible;
    • avoiding off-by-one errors, which I suspect you have in "for($i=1;$i<=$#arr+1;$i++)", by writing that more succinctly (and using a modern Perl idiom) like "for my $i (0 .. $#arr)".
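    A minimal sketch pulling several of these points together: strict and warnings, lexical variables and filehandles, autodie, reading the first line in Perl instead of shelling out to `head -1`, and a 0-based `for my $i (0 .. $#fields)` loop. The sub name `first_line_fields` is made up for illustration, and the sketch assumes whitespace-separated fields, as in the sample data posted later in this thread; adjust the split if the real file uses tabs or commas.

```perl
use strict;
use warnings;
use autodie;    # open/close now die with a useful message on failure

# Hypothetical helper: return the whitespace-separated fields of a
# file's first line, read in Perl rather than via `head -1`.
sub first_line_fields {
    my ($path) = @_;
    open my $in_fh, '<', $path;    # lexical filehandle, checked by autodie
    my $first_line = <$in_fh>;
    close $in_fh;
    return () unless defined $first_line;    # empty file
    chomp $first_line;
    return split ' ', $first_line;    # ' ' splits on any run of whitespace
}

if (@ARGV) {
    my @fields = first_line_fields($ARGV[0]);
    # Zero-based loop over every index: no off-by-one
    for my $i (0 .. $#fields) {
        print "$i\t$fields[$i]\n";
    }
}
```

    Run with a file name as the first argument, it prints each index and field of that file's first line.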

    Your @arr1 array, with what looks like a couple of hundred elements, is fairly pointless. What is this supposed to tell us? Is there any sort of order or pattern to this data? Did you expect us to analyse this for you (without the benefit of your domain-specific knowledge)? Pick a discrete sample of representative data for your SSCCE.

    There are two general approaches to what you want to do. Without any knowledge of your input data, I'm not in a position to advise which might be best, or how you would implement it. One method involves storing your keywords in a hash then using exists; the other uses regular expression alternations (see perlre; although, you might want to look at perlretut first).
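    Both approaches can be sketched on a handful of fields copied from the data posted later in this thread. This assumes each field looks like `key:value` and that the keywords keep their trailing colon, as in @arr1; the short keyword list below is only a sample, not the full array.

```perl
use strict;
use warnings;

# A few keywords from @arr1 (a sample, not the full list)
my @keywords = ('91:', '86:', '184:', '3:');

# Some key:value fields copied from the posted data
my @fields = ('2:-0.5795', '3:0.33582025', '91:7.3609459641', '184:0.0076899324');

# Approach 1: a hash of keywords, checked with exists on each field's key part
my %wanted  = map { $_ => 1 } @keywords;
my @by_hash = grep { /\A(\d+:)/ && exists $wanted{$1} } @fields;

# Approach 2: a single regex alternation. \A stops "3:" matching inside
# "13:...", and the trailing colon stops "3:" matching "33:..."
my $alternation = join '|', map { quotemeta($_) } @keywords;
my $wanted_re   = qr/\A(?:$alternation)/;
my @by_regex    = grep { /$wanted_re/ } @fields;

print "@by_hash\n";    # 3:0.33582025 91:7.3609459641 184:0.0076899324
print "@by_regex\n";   # 3:0.33582025 91:7.3609459641 184:0.0076899324
```

    The hash lookup is a constant-time check per field regardless of how many keywords there are; the alternation keeps everything in one pattern, which can be convenient if you later need more elaborate matching.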

    Your "@arr2=split(/\,/,$_);" line of code suggests you're reading CSV data. Use Text::CSV for this; if you also have Text::CSV_XS installed, it will run faster.

    That's a lot of reading, things to fix, and additional information to provide. Once completed, you may have additional questions: please post these with this thread (not as new, separate questions).

    — Ken

      I am pasting a few lines of my sample input file.

      0 2:-0.5795 3:0.33582025 4:55.8255 5:65.316997 6:15 7:16 8:57 9:28 10:29 11:23 12:5 13:4328.520884 14:5279.218852 15:7434.371708 16:7829.126536 17:7560.24877 18:7380.518025 19:7094.262906 20:6916.621367 21:6198.40255 22:11858.88819484 23:15547.317962699 24:23174.9885578928 25:26259.9933684153 26:26163.2825969745 27:26115.0415561043 28:25798.4258540249 29:24623.253542266 30:23630.8474599248 31:419.275504 32:463.700544 33:841.938352 34:1080.191664 35:1246.68676 36:1161.604228 37:1229.277976 38:1188.42918 39:1084.302664 40:83.0884742406 41:109.16288499 42:167.636308343 43:199.4597516818 44:203.7524145431 45:201.2240880658 46:204.6352052546 47:194.1195252421 48:187.4305290355 49:9335.0824128232 50:8977.2048903741 51:17564.3373325462 52:22978.817901802 53:27754.6236137749 54:25282.4660739928 55:27042.2374736936 56:26691.0761515138 57:23720.386147332 58:128.1944444444 59:155.2777777778 60:253.1111111111 61:311 62:335.75 63:328.4722222222 64:338.4166666667 65:318.0277777778 66:301.6388888889 67:75.9389628772 68:86.5445713443 69:66.9763216937 70:53.9939761103 71:44.7351998225 72:47.3110129808 73:42.4806162036 74:42.4332599202 75:41.8810983108 76:208.0506700849 77:254.8740649623 78:208.7836807017 79:181.1034025408 80:154.812323059 81:167.4041125391 82:154.4815919403 83:151.0629051673 84:159.6678882427 85:7.3557105965 86:7.6016482623 87:7.5850301982 88:7.4495976828 89:7.3768447337 90:7.4461809487 91:7.3609459641 92:7.2909765644 93:7.3263693514 94:1.4576925305 95:1.7895554916 96:1.5102370121 97:1.3755844944 98:1.205635589 99:1.2898980004 100:1.2253605105 101:1.1909173328 102:1.2664224935 103:163.7733756636 104:147.1672932848 105:158.2372732662 106:158.4746062193 107:164.2285420933 108:162.0670902179 109:161.9295657107 110:163.7489334449 111:160.2728793739 112:2.2490253411 113:2.5455373406 114:2.2802802803 115:2.1448275862 116:1.9866863905 117:2.1055911681 118:2.0264471058 119:1.9510906612 120:2.0381006006 121:0.438326144 122:-0.1349328964 123:-0.2131228532 
124:0.2109309678 125:0.0364198771 126:-0.2562298834 127:0.1322421945 128:0.0578296255 129:-0.0582029638 130:1854.5687717193 131:204.2260149671 132:63.6505848975 133:-389.0286871456 134:203.5672867621 135:-670.8419399862 136:-546.8660197741 137:598.0390915922 138:-677.5242209203 139:2940.0118958545 140:-61.0569165052 141:403.3629827841 142:-876.0885934905 143:371.8760389321 144:-585.6821768037 145:-1089.2389258814 146:922.1265795456 147:-887.43245912 148:1.6043528421 149:0.1460650637 150:-0.346842903 151:0.3020267701 152:-0.1545685319 153:-0.9490281994 154:0.2432321994 155:0.1995307147 156:-0.5438212742 157:12.9098316175 158:-1.2992316335 159:2.2196293644 160:-3.9817177075 161:1.480317246 162:-1.6371646637 163:-4.7547855107 164:3.7112800785 165:-3.3733655727 166:87.8230655603 167:-29.9536110802 168:15.3945287813 169:-16.0913978857 170:-0.0363171598 171:-0.4845321047 172:-16.4539972044 173:11.7400794867 174:-11.6570804117 175:14.5058479532 176:2.020544783 177:1.3906586642 178:0.4592951677 179:-0.8311788243 180:-9.5584025854 181:-1.6828254848 182:4.5510156971 183:-4.5658664204 184:0.0076899324 185:-0.0022120147 186:-0.0019200257 187:0.0014546963 188:0.0002155022 189:-0.0016424993 190:0.0007918694 191:0.000354783 192:-0.0003932633 193:32.5362942407 194:3.3479674585 195:0.5734286928 196:-2.6829564631 197:1.2045401584 198:-4.3002688461 199:-3.274646825 200:3.6689514822 201:-4.5778663576 202:51.5791560676 203:-1.0009330575 204:3.6339007458 205:-6.0419902999 206:2.2004499345 207:-3.7543729282 208:-6.5223887777 209:5.6572182794 210:-5.9961652643 211:0.0281465411 212:0.0023945092 213:-0.0031247108 214:0.0020829432 215:-0.0009146067 216:-0.0060835141 217:0.0014564802 218:0.0012241148 219:-0.0036744681 220:0.226488274 221:-0.0212988792 222:0.0199966609 223:-0.0274601221 224:0.0087592736 225:-0.0104946453 226:-0.0284717695 227:0.0227685894 228:-0.0227930106 229:1.5407555361 230:-0.4910428046 231:0.1386894485 232:-0.1109751578 233:-0.0002148944 234:-0.003105975 235:-0.0985269294 
236:0.0720250275 237:-0.0787640568 238:0.2544885606 239:0.033123685 240:0.0125284564 241:0.0031675529 242:-0.0049182179 243:-0.0612718114 244:-0.0100767993 245:0.0279203417 246:-0.0308504488 247:-0.2876507352 248:-0.2496804416 249:0.1891689371 250:0.0280239437 251:-0.2135908585 252:0.1029748237 253:0.0461360338 254:-0.0511400167 255:0.1028994708 256:0.0176242779 257:-0.0824604192 258:0.0370214306 259:-0.132168366 260:-0.1006459679 261:0.1127648851 262:-0.1407002999 263:-0.0194057665 264:0.0704528927 265:-0.1171401543 266:0.0426616118 267:-0.0727885684 268:-0.1264539646 269:0.1096803188 270:-0.1162517133 271:0.0850729485 272:-0.1110158022 273:0.0740035245 274:-0.0324944616 275:-0.2161371829 276:0.0517463312 277:0.0434907725 278:-0.1305477663 279:-0.0940396554 280:0.0882900496 281:-0.1212430191 282:0.038674292 283:-0.0463363736 284:-0.1257096848 285:0.1005287781 286:-0.1006366035 287:-0.318702606 288:0.0900139219 289:-0.0720264541 290:-0.0001394734 291:-0.0020158779 292:-0.0639471526 293:0.0467465642 294:-0.0511204114 295:0.1301578542 296:0.0492299395 297:0.0124467397 298:-0.0193258899 299:-0.2407645016 300:-0.0395962761 301:0.1097115785 302:-0.121225287 303:1.4855322143 304:1.8648954684 305:0.9411501302 306:1.1721286102 307:1.4660332124 308:0.9507235805 309:0.9439255533 310:0.8779664661 311:0.8817902572 312:1.0010647512 313:1.0737141541 314:0.9630768785 315:1.1360464666 316:1.0868266345 317:0.8695031838 318:1.1143866248 319:1.0696645329 320:0.9017239511 321:1.1058598928 322:0.9165198433 323:1.0342781749 324:1.1029231225 325:0.8697769965 326:1.1109393826 327:0.9432703655 328:1.4951822273 329:1.0315869199 330:1.1945508439 331:1.4545623761 332:0.9921446169 333:0.912129081 334:1.0470096907 335:1.1879116719 336:0.8948216441 337:1.1205317167 338:0.9132745523 339:1.0073848542 340:1.1024974968 341:0.8744420228 342:1.1017724911 343:1.5568560424 344:1.0679091304 345:1.1468530074 346:0.9943244717 347:1.0534951824 348:1.0642314937 349:0.9047174347 350:1.048687376 
351:0.8016363702 352:0.909094541 353:0.9902730957 354:1.0710953723 355:1.4071346244 356:1.0344785132 357:0.8279574546 358:1.1806877428 359:242.8075356833 360:121.6142333227 361:178.0580590073 362:243.3687601731 363:8.6917414348 364:121.6142333227 365:0.7142857143 366:0.0017262356 367:6.1651270994679E-05 368:-17.8130741659 369:262.6819305563 370:9.3814975199 371:15.5986429501 372:242.856167877 373:121.6380855773 374:178.0951236847 375:243.4159900068 376:8.6934282145 377:121.6380855773 378:0.7125008924 379:0.0017217572 380:6.14913290515805E-05 381:-17.8203476471 382:262.7353331274 383:9.3834047545 384:15.5992121252 385:307.3759058156 386:152.6164428284 387:226.2604316647 388:306.3495052248 389:10.941053758 390:152.6164428284 391:-1.5966231412 392:0.0021934089 393:0.000078336 394:-17.1424354733 395:299.7838194194 396:10.7065649793 397:15.9685725163 398:243.1609809571 399:121.7875776897 400:178.3274247002 401:243.7120141444 402:8.7040005052 403:121.7875776897 404:0.7013149656 405:0.0016937626 406:6.04915219373642E-05 407:-17.8662479123 408:263.0723015109 409:9.3954393397 410:15.6028009342 411:335.6228629665 412:166.0095973367 413:247.0935998516 414:333.9572785509 415:11.9270456625 416:166.0095973367 417:-2.5909090909 418:0.0031668647 419:0.0001131023 420:-16.1140370937 421:266.6487130283 422:9.5231683224 423:15.6406099097 424:231.6103290283 425:116.1137708548 426:169.5121348246 427:232.4952381141 428:8.3034013612 429:116.1137708548 430:1.1262479273 431:0.0028520421 432:0.0001018586 433:-16.4072160876 434:252.5378230418 435:9.0192079658 436:15.4883708888 437:281.707822041 438:140.4007178393 439:207.6497191708 440:282.1695197635 441:10.0774828487 442:140.4007178393 443:0.8079710145 444:0.077238656 445:0.0027585234 446:-7.1703946216 447:274.8568568626 448:9.8163163165 449:15.7255012366 450:2 451:11.9 452:14.0071321949 453:-0.3053551863 454:0.1369864377 455:5.5469232566 456:11.0801767373 457:32 458:61 459:61 460:45 461:16 462:16 463:40.283003 464:4.064427114 
465:3.8721338627 466:3.6638718365 467:3.4271346032 468:3.2644227649 469:3.0864222079 470:2.8395424467 471:2.699703016 472:2.0212232614 473:1.9923679159 474:1.906999427 475:1.6982759201 476:1.6064260527 477:1.4614826238 478:1.3242195483 479:1.2221962031 480:3.9837303167 481:3.839522292 482:3.6180383497 483:3.3760340255 484:3.244426536 485:3.0322825745 486:2.7934329558 487:2.6794126112 488:2.0306051298 489:2.0067079104 490:1.9081792943 491:1.7368870199 492:1.585440537 493:1.4588269255 494:1.2609123295 495:1.1452966733 496:4.0913115254 497:3.9245546332 498:3.7636656925 499:3.5203002753 500:3.3772223461 501:3.2283173076 502:2.9857050594 503:2.862477594 504:1.9092311184 505:1.8917977913 506:1.7517830736 507:1.535951876 508:1.3965759499 509:1.1922514071 510:0.9901267386 511:0.8695590108 512:3.9707793081 513:3.8389225481 514:3.6197079287 515:3.3757832072 516:3.2512006583 517:3.0337561403 518:2.8012310654 519:2.6946335916 520:2.0272325637 521:2.0018470261 522:1.898625878 523:1.7396700523 524:1.5647216909 525:1.436784937 526:1.2166724714 527:1.0949055969 528:4.1334702793 529:3.9619297397 530:3.8255439018 531:3.5814792068 532:3.4372290617 533:3.3102142841 534:3.0750271131 535:2.9501803389 536:1.8755908406 537:1.8625312941 538:1.6984369754 539:1.4750017307 540:1.3387447107 541:1.1091661276 542:0.9156927375 543:0.7920993079 544:4.1586909199 545:3.9848251073 546:3.6514598136 547:3.5196549485 548:3.3657757429 549:3.1766971846 550:3.0068166027 551:2.8941585802 552:1.9714581425 553:1.94869267 554:1.7917466101 555:1.5751232679 556:1.4426546807 557:1.2817422781 558:1.0826208769 559:0.9944976349 560:2 561:9 562:1 563:4 564:4 565:0.6428227826 566:0.8746134423 567:0.2966601627 568:0.3056750327 569:1.7728325942 570:0.4114603012 571:1.1470710813 572:0.1758028907 573:4.2862464602 574:7.2000561491 575:10.0648363697 576:2.2723479296 577:3.2005484087 578:3.8178243012 579:19.0872111909 580:13.6866731858 581:12.5624674743 582:10.8413389733 583:9.4709506592 584:7.5586250765 585:5.2807641052 
586:4.2113668367 587:0.681686114 588:0.4277085371 589:0.273097119 590:0.1693959215 591:0.1101273332 592:0.0706413559 593:0.0447522382 594:0.029245603 595:16.8819678685 596:10.4552371714 597:8.5854942001 598:6.4024154302 599:4.869705044 600:3.3026473986 601:2.0114320855 602:1.4130382575 603:0.6029274239 604:0.3267261616 605:0.1866411783 606:0.1000377411 607:0.0566244773 608:0.0308658635 609:0.0170460346 610:0.0098127657 611:34.6463209851 612:56.1893663511 613:53.9803921569 614:54.66 615:37.8724532934 616:434.7378724551 617:0.6078301927 618:0.985778357 619:0.9470244238 620:0.9589473684 621:0.6644290051 622:7.6269802185 623:4.09198 624:113.958 625:343.125346777 626:496.2039394297 627:686.2506935541 628:24.5089533412 629:343.125346777 630:0.1303209581 631:0.0046543199 632:-5.7057138949 633:303.6021600513 634:10.8429342875 635:16.0040108488 636:639 638:3 639:12 640:7 641:8 642:7 643:7 644:7 645:2 646:3 647:2 648:2 650:7.3070209844 651:21.0498145682 652:3.2416014801 653:3.8603310737 654:3.2416014801 655:7.8499739002 656:14.8129494253 657:1.108133147 658:3.8587450245 659:9.7545170873 660:4.8430728079 662:2.3890998298 663:1.1118031148 664:0.4105902533 665:0.3927944417 666:0.4105902533 667:0.8590478553 668:1.9378269576 669:0.3824521815 670:1.1817663454 671:4.7333035417 672:2.3890998298 674:2.4639481765 1 2:-1.4034 3:1.96953156 4:53.4424 5:65.316997 6:15 7:16 8:57 9:28 10:29 11:23 12:5 13:4328.520884 14:5257.256864 15:7630.863017 16:7853.386072 17:7593.281622 18:7485.67811 19:7430.95408 20:6887.356942 21:6135.144818 22:11858.88819484 23:15622.0540418464 24:23861.3526236842 25:26567.8590572934 26:26082.261459435 27:26421.531654613 28:26783.9718438419 29:25149.0548634416 30:23554.1612149383 31:419.275504 32:463.631552 33:855.141396 34:1143.37944 35:1234.879208 36:1177.318312 37:1223.14768 38:1248.431 39:1115.068104 40:83.0884742406 41:109.73471298 42:173.0649314933 43:203.3700552106 44:202.5802201385 45:202.9314221303 46:211.0742767855 47:200.2123080021 48:187.9297679391 
49:9335.0824128232 50:8984.8595029275 51:17683.9722497862 52:24700.3188310957 53:27389.857175555 54:25754.46006867 55:26548.9961389266 56:28075.9404729201 57:24735.2908545586 58:128.6944444444 59:154.5277777778 60:258.7777777778 61:320.8333333333 62:336.25 63:331.6388888889 64:346.0833333333 65:334.1944444444 66:300.8055555556 67:75.9389628772 68:86.1845387541 69:67.5297612124 70:50.9960134545 71:45.4687522275 72:47.377709557 73:44.7647836145 74:40.2769411813 75:40.0989857386 76:208.0506700849 77:256.0992465876 78:211.162412599 79:172.5185653071 80:156.1812063439 81:167.22488389 82:161.349227975 83:147.0704962774 84:153.9487661107 85:7.3557105965 86:7.6005172459 87:7.5676229735 88:7.4245418182 89:7.3944862754 90:7.4513817215 91:7.3683595181 92:7.3007660819 93:7.2880268235 94:1.4576925305 95:1.798929721 96:1.5315480663 97:1.3205847741 98:1.2130552104 99:1.2843760894 100:1.2715317879 101:1.1708322105 102:1.2282991369 103:163.7733756636 104:147.2927787365 105:156.4953296441 106:160.3916807214 107:164.0111208117 108:163.002911827 109:159.9337116803 110:164.1867863913 111:161.6685676769 112:2.257797271 113:2.5332422587 114:2.2900688299 115:2.0833333333 116:2.0134730539 117:2.0989803094 118:2.0848393574 119:1.9543534763 120:1.9660493827 121:0.4580105372 122:-0.1749414469 123:-0.1724621365 124:0.213107589 125:0.0370895601 126:-0.3041785483 127:0.2118434665 128:0.0173007991 129:-0.1040509949 130:1854.5687717193 131:182.2640269671 132:-15.7589723383 133:-192.4671909609 134:90.7195762259 135:-531.7391928009 136:-721.416141779 137:262.8745027018 138:-287.0724244746 139:2940.0118958545 140:13.6791626421 141:60.6655398306 142:-478.3789759964 143:131.4939697325 144:-120.5974524593 145:-1559.0986697577 146:456.9717505604 147:-190.7275671274 148:1.6043528421 149:0.0770730637 150:-0.2132257396 151:0.3002481108 152:-0.1812006427 153:-1.0708712465 154:0.3255228809 155:0.0031153684 156:-0.4244447867 157:12.9098316175 158:-0.7274036435 159:0.3204619636 160:-2.1564713296 
161:0.3245139364 162:0.8380333016 163:-7.1242003008 164:1.9210841909 165:-0.2966670797 166:87.8230655603 167:-22.2989985268 168:-1.0194291253 169:-5.7875572976 170:-8.3689396426 171:19.7609761434 172:-33.6550875833 173:6.5707647171 174:4.9436309602 175:15.0058479532 176:1.9766851339 177:0.6896737458 178:1.8808864266 179:-1.4956909818 180:-9.2286857495 181:-1.4244382887 182:0.8961988304 183:-2.2525392428 184:0.0080352726 185:-0.0028678926 186:-0.0015262136 187:0.0013838155 188:0.0002220932 189:-0.0019251807 190:0.0012761655 191:0.0001011743 192:-0.0006800719 193:32.5362942407 194:2.9879348683 195:-0.1394599322 196:-1.2497869543 197:0.5432309954 198:-3.3654379291 199:-4.3458803722 200:1.5372777936 201:-1.876290356 202:51.5791560676 203:0.2242485679 204:0.5368631843 205:-3.106356987 206:0.7873890403 207:-0.7632750156 208:-9.3921606612 209:2.6723494185 210:-1.2465854061 211:0.0281465411 212:0.0012634928 213:-0.0018869534 214:0.0019496631 215:-0.0010850338 216:-0.0067776661 217:0.0019609812 218:1.8218528778104E-05 219:-0.0027741489 220:0.226488274 221:-0.0119246499 222:0.0028359466 223:-0.0140030606 224:0.0019431972 225:0.0053040082 226:-0.0429168693 227:0.0112344105 228:-0.0019390005 229:1.5407555361 230:-0.3655573529 231:-0.0090214967 232:-0.0375815409 233:-0.050113411 234:0.1250694693 235:-0.2027414915 236:0.0384255247 237:0.0323113135 238:0.2632604904 239:0.0324046743 240:0.0061033075 241:0.0122135482 242:-0.0089562334 243:-0.0584094035 244:-0.0085809535 245:0.0052409288 246:-0.0147224787 247:-0.3569129164 248:-0.1899392437 249:0.1722176191 250:0.027639781 251:-0.2395912106 252:0.1588204318 253:0.0125912672 254:-0.0846358173 255:0.0918339024 256:-0.0042862881 257:-0.0384120867 258:0.0166961545 259:-0.1034364241 260:-0.1335702321 261:0.0472480911 262:-0.0576676109 263:0.0043476587 264:0.0104085298 265:-0.0602250448 266:0.0152656441 267:-0.0147981292 268:-0.182092174 269:0.0518106464 270:-0.0241683948 271:0.0448898088 272:-0.0670403316 273:0.0692683002 
274:-0.0385494539 275:-0.2407992547 276:0.0696704154 277:0.0006472742 278:-0.0985609182 279:-0.0526501866 280:0.0125213837 281:-0.0618268678 282:0.0085796814 283:0.0234184673 284:-0.1894882615 285:0.0496026142 286:-0.0085611519 287:-0.2372585036 288:-0.0058552421 289:-0.0243916313 290:-0.0325252189 291:0.0811741164 292:-0.1315857621 293:0.0249394039 294:0.0209710838 295:0.1230897742 296:0.02318353 297:0.0463933962 298:-0.0340204237 299:-0.2218692345 300:-0.0325949159 301:0.0199077683 302:-0.0559236165 303:1.539018349 304:1.646117391 305:0.9724803693 306:1.1817356553 307:1.5128979522 308:0.8258355152 309:0.9698105163 310:0.9166086051 311:0.8926616929 312:1.013202864 313:1.0322272988 314:0.9859160414 315:1.1118819552 316:1.1136974952 317:0.9377161944 318:1.0297890518 319:1.0463278345 320:0.9762972097 321:1.0397353549 322:0.9417497197 323:0.9700077916 324:1.1711951059 325:0.9211020771 326:1.0170359031 327:0.9827485379 328:1.3537774553 329:1.0507527631 330:1.2323101027 331:1.5209747431 332:0.9198781701 333:0.9949418311 334:0.9790786131 335:1.1472483342 336:0.9903160923 337:1.0461580481 338:0.9415023639 339:0.9288249654 340:1.1849252885 341:0.9172636517 342:1.0040938799 343:1.4768407839 344:1.1643849395 345:1.0721359668 346:1.0371310253 347:0.9686749396 348:1.1491821935 349:0.9284452993 350:0.9465147457 351:0.8131620306 352:0.9448954676 353:0.9302770495 354:1.0776016167 355:1.3604166461 356:1.0375696081 357:0.9708582316 358:1.0291799754 359:239.4183777781 360:120.0402124396 361:176.1142310615 362:240.030622676 363:8.5725222384 364:120.0402124396 365:0.7142857143 366:0.0339866772 367:0.0012138099 368:-9.4690027015 369:281.3912972165 370:10.0496891863 371:15.7912894089 372:239.4643128411 373:120.0623808528 374:176.149388637 375:240.0750278918 376:8.574108139 377:120.0623808528 378:0.7125008924 379:0.0340053294 380:0.0012144761 381:-9.4674664544 382:281.5242397577 383:10.0544371342 384:15.7926119488 385:299.8655427086 386:148.8612972248 387:221.8674878467 
388:298.8391421178 389:10.6728265042 390:148.8612972248 391:-1.5966231412 392:0.0519511355 393:0.0018553977 394:-8.2808647733 395:255.4331499986 396:9.1226125 397:15.5202900484 398:239.7522142926 399:120.2013198154 400:176.3697380523 401:240.353341406 402:8.5840479074 403:120.2013198154 404:0.7013149656 405:0.0341219998 406:0.0012186429 407:-9.4578762501 408:282.3692177256 409:10.0846149188 410:15.8010033924 411:326.2367977105 412:161.3166747956 413:241.6560824547 414:324.5712132949 415:11.5918290462 416:161.3166747956 417:-2.5909090909 418:0.0571523367 419:0.0020411549 420:-8.0136980082 421:242.0766391293 422:8.6455942546 423:15.3699122266 424:228.834217074 425:114.928271451 426:168.0093410164 427:229.7995724402 428:8.2071275872 429:114.928271451 430:1.1262479273 431:0.0293949405 432:0.0010498193 433:-9.8754115896 434:259.9737451984 435:9.2847766142 436:15.5696258086 437:271.4687940977 438:135.591287003 439:200.7346844411 440:272.1368065201 441:9.7191716614 442:135.591287003 443:1.0391304348 444:0.049909195 445:0.0017824713 446:-8.3931400673 447:287.3186174862 448:10.2613791959 449:15.8496569432 450:2 451:11.9 452:14.0071321949 453:-0.3161784321 454:0.1369823372 455:6.6663395274 456:11.1829530606 457:32 458:61 459:61 460:45 461:16 462:16 463:38.963003 464:4.0630843512 465:3.8720867597 466:3.6126594475 467:3.4259593331 468:3.263430811 469:3.0864313168 470:2.8361819678 471:2.6996169332 472:2.0203806496 473:1.9700492704 474:1.8042893743 475:1.6796545005 476:1.602396554 477:1.573253796 478:1.3920839303 479:1.2223999566 480:3.9827191626 481:3.8394992666 482:3.5911819507 483:3.3757042465 484:3.2447501324 485:3.0336039919 486:2.7883496661 487:2.6787231647 488:2.0263679027 489:1.9809098078 490:1.8081592832 491:1.6703504734 492:1.5928224287 493:1.5604623444 494:1.400545551 495:1.1454022696 496:4.0896414198 497:3.9244682069 498:3.7277086131 499:3.5199840668 500:3.3771903845 501:3.2285571145 502:2.9608562455 503:2.8615284312 504:1.9082991291 505:1.8844193331 506:1.6385334195 
507:1.4820801014 508:1.3734478944 509:1.3068817222 510:1.1473144643 511:0.9197289513 512:3.9698486538 513:3.8389023769 514:3.5990095488 515:3.3757542364 516:3.2519530344 517:3.0355924032 518:2.7939858424 519:2.6939172053 520:2.0214237275 521:1.9783542329 522:1.8048993329 523:1.6483266429 524:1.5742110177 525:1.5329258279 526:1.3843751241 527:1.1037976295 528:4.1315008454 529:3.9617846839 530:3.7891015852 531:3.5812861262 532:3.4372161121 533:3.3101100921 534:3.040109686 535:2.9486754189 536:1.8748429585 537:1.8576095546 538:1.5854855638 539:1.4251129833 540:1.3137655529 541:1.2188323147 542:1.0687269431 543:0.8649035001 544:4.1727945718 545:3.9994274503 546:3.6402318944 547:3.5313517033 548:3.3787881445 549:3.1891303714 550:3.0124679809 551:2.908723269 552:1.9547944909 553:1.8699137136 554:1.7148298978 555:1.5447999071 556:1.4341299769 557:1.4082957966 558:1.2157480846 559:1.0078259273 560:2 561:9 562:1 563:4 564:5 565:0.5589923432 566:0.7869302055 567:0.2367170244 568:0.2563002358 569:1.4841574596 570:0.4114603012 571:0.9234642835 572:0.1758028907 573:4.0058860011 574:7.0654217493 575:9.9602546126 576:2.077974494 577:3.0804499988 578:3.7577100087 579:18.9240744841 580:13.7928263357 581:12.2703482883 582:10.8446793239 583:9.6221968739 584:7.6508263399 585:5.2790985271 586:4.1485137866 587:0.675859803 588:0.431025823 589:0.2726744064 590:0.172137767 591:0.1105999641 592:0.0715030499 593:0.0455094701 594:0.0296322413 595:16.6418610542 596:10.5826748251 597:8.3570827785 598:6.4250456443 599:4.9255343485 600:3.3815191084 601:2.0151918473 602:1.3874592667 603:0.5943521805 604:0.3307085883 605:0.1857129506 606:0.1019848515 607:0.0566153373 608:0.0316029823 609:0.0173723435 610:0.0099104233 611:34.6463209851 612:56.1893663511 613:53.9803921569 614:54.66 615:37.8724532934 616:434.7378724551 617:0.6078301927 618:0.985778357 619:0.9470244238 620:0.9589473684 621:0.6644290051 622:7.6269802185 623:4.13988 624:113.9477 625:349.2041627494 626:503.1990120696 627:698.4083254989 
628:24.9431544821 629:349.2041627494 630:0.0860136202 631:0.003071915 632:-6.8690989398 633:282.3412443603 634:10.08361587 635:15.8007259921 636:609 637:1 638:3 639:12 640:7 641:9 642:7 643:8 644:7 645:2 646:3 647:2 648:1 649:0.235143313 650:8.4212597146 651:21.0422885856 652:3.2374621869 653:4.209713117 654:3.2374621869 655:8.9702631091 656:14.8129494253 657:1.0741619536 658:3.8552699678 659:9.7606202689 660:2.4539729781 661:0.235143313 662:2.4539729781 663:1.110208727 664:0.4103334322 665:0.3847969533 666:0.4103334322 667:0.8566562736 668:1.9378269576 669:0.3808883955 670:1.1793098073 671:4.7346235601 672:2.4539729781 673:0.235143313 674:3.5033385601
        "...few lines..."

        He, he! s/few/many/. See also Does Humor Belong in Programming?. <= 10 lines or so should be enough. And don't forget to put your data in code tags.

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

        perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

        Hello neeraj_kr,

        Welcome to the Monastery. I am a bit confused here: from your code sample and description, I am not sure what you are trying to do.

        Could you copy a few lines of input as a sample, and create a hard-coded sample of the desired output? For example, if you have this input:

        0 2:-0.5795 3:0.33582025 4:55.8255 5:65.316997 6:15 7:16 8:57 9:28 10:29 11:23 12:5 13:4328.520884 14:5279.218852 15:7434.371708 16:7829.126536 17:7560.24877 18:7380.518025 19:7094.262906 20:6916.621367 21:6198.40255 22:11858.88819484 23:15547.317962699 24:23174.9885578928 25:26259.9933684153 26:26163.2825969745 27:26115.0415561043 28:25798.4258540249 29:24623.253542266 30:23630.8474599248

        Desired output would be:

        2:-0.5795 3:0.33582025 21:6198.40255 23:15547.317962699

        Something like that. Of course, the sample above is just an example, but it would at least give us a direction for what you are trying to do.

        Update: Based on your sample data above, your data is space-separated, but in your code you split the data on \t. Which one is correct?
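        For the illustrative desired output above, one possible sketch is to split each record on whitespace and keep only the fields whose key is wanted. The record below is a shortened copy of the `0` line, and the wanted keys (2:, 3:, 21:, 23:) come from the example output, which is itself only an illustration. `split ' '` treats any run of spaces or tabs as a separator, so this step works whichever of the two the real file uses.

```perl
use strict;
use warnings;

# Shortened copy of the "0" record (several fields removed for brevity)
my $record = '0 2:-0.5795 3:0.33582025 4:55.8255 21:6198.40255 22:11858.88819484 23:15547.317962699';

# Keys taken from the example desired output
my %wanted = map { $_ => 1 } ('2:', '3:', '21:', '23:');

# split ' ' splits on any run of whitespace, spaces or tabs alike;
# the leading "0" is the record label
my ($label, @pairs) = split ' ', $record;

my @selected = grep { /\A(\d+:)/ && $wanted{$1} } @pairs;
print join(' ', @selected), "\n";   # 2:-0.5795 3:0.33582025 21:6198.40255 23:15547.317962699
```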

        Looking forward to your update, BR.

        Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Data extraction with specific keywords
by johngg (Canon) on Jan 19, 2018 at 11:58 UTC

    When initialising your @arr1 array you could save yourself a heap of fiddly and error-prone typing by using a map with the qw{ ... } quote words operator. E.g.

    johngg@shiraz ~ $ perl -Mstrict -Mwarnings -MData::Dumper -E '
    my @arr1 = map qq{$_:}, qw{ 91 86 184 430 };
    print Data::Dumper->Dumpxs( [ \ @arr1 ], [ qw{ *arr1 } ] );'
    @arr1 = ( '91:', '86:', '184:', '430:' );

    You might also make things easier for yourself (and any others maintaining your code) by using meaningful variable names rather than $c, $i or $j.

    It is not clear from your description and your code exactly what you are trying to achieve. The "few" lines of data you posted seem to be in two sections headed "0" and "1" each having many lines of data in the form:-

    0 2:-0.5795 3:0.33582025 4:55.8255 5:65.316997 ...

    Please correct me if that is wrong. Your @arr1 appears to be a selection of the initial parts of the data lines in an apparently random order. Is this order to be preserved in your output file and do you want a separate output file for each section? Depending on the answers to these questions you might be better off using hashes rather than arrays.
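    A minimal sketch of the hash idea, assuming whitespace-separated `key:value` fields, output in the order of the keyword list, and one tab-separated output row per record. The keywords and records are shortened samples from the thread; writing a separate file per section is left out, and turning a missing key into an empty string is an arbitrary choice.

```perl
use strict;
use warnings;

# Keywords in the order wanted in the output (a few from @arr1)
my @keywords = ('91:', '86:', '3:');

# Shortened versions of the "0" and "1" records from the thread
my @records = (
    '0 3:0.33582025 86:7.6016482623 91:7.3609459641',
    '1 3:1.96953156 86:7.6005172459 91:7.3683595181',
);

my @lines;
for my $record (@records) {
    my ($label, @pairs) = split ' ', $record;

    # key => value for this record, so lookups do not depend on field order
    my %value_for;
    for my $pair (@pairs) {
        my ($key, $value) = split /:/, $pair, 2;
        $value_for{"$key:"} = $value;
    }

    # Emit values in @keywords order; a missing key becomes an empty string
    push @lines, join "\t", $label, map { $value_for{$_} // '' } @keywords;
}
print "$_\n" for @lines;
```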

    Cheers,

    JohnGG

Re: Data extraction with specific keywords
by jahero (Pilgrim) on Jan 19, 2018 at 09:58 UTC

    What looks suspicious at first glance is the following code: if ($arr[$i] == $ar)

    I believe that the array in question stores strings, not numbers. Did you mean this? if ($arr[$i] eq $ar)
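    A small illustration of why `eq` on the whole field is still not quite enough with this data. Each field carries its value after the colon (e.g. `91:7.3609459641`), so it never equals the bare keyword `91:`. With `==`, both strings are converted to their leading number (91) and compare equal, but under `use warnings` Perl complains that they aren't numeric. Comparing just the key part with `eq` avoids both problems; the regex assumes keys are plain integers followed by a colon, as in the sample.

```perl
use strict;
use warnings;

# One field from the data and one keyword from @arr1 (both from the thread)
my $field   = '91:7.3609459641';
my $keyword = '91:';

# Comparing the whole field as a string fails: the value follows the colon
my $whole_match = $field eq $keyword ? 1 : 0;

# Compare only the key part (digits plus the colon) as a string
my ($key) = $field =~ /\A(\d+:)/;
my $key_match = (defined $key && $key eq $keyword) ? 1 : 0;

print "whole field eq: $whole_match\n";   # whole field eq: 0
print "key part eq:    $key_match\n";     # key part eq:    1
```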

    Also, $c is a scalar (line 4), but later on you use it as an array ($c[$j]=$i;). What is up with that?
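    To illustrate: in Perl, `$c` and `@c` are two separate variables that merely share a name, so `$c=0` does nothing to the array. A sketch of the same column-collecting loop with a descriptive array name and `push` instead of a manual `$j` counter; the header fields and wanted keys are a small subset taken from the thread, and `@matching_columns` is just an illustrative name.

```perl
use strict;
use warnings;

# $c and @c are two unrelated variables that merely share a name
my $c = 0;
my @c;
$c[0] = 5;                        # sets an element of @c; $c is still 0
print "scalar: $c, array: @c\n";  # scalar: 0, array: 5

# Clearer: a descriptive array name and push instead of a manual $j counter
my @header = ('0', '2:-0.5795', '3:0.33582025', '91:7.3609459641');
my %wanted = map { $_ => 1 } ('3:', '91:');
my @matching_columns;
for my $i (0 .. $#header) {
    my ($key) = $header[$i] =~ /\A(\d+:)/;
    push @matching_columns, $i if defined $key && $wanted{$key};
}
print "@matching_columns\n";      # 2 3
```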

    I would strongly suggest adding this at the top, then working from there, fixing the things the interpreter complains about one by one.

    use strict;
    use warnings;

    Good luck, and have patience!

    Contrary to what the other monks suggested, I would advise you to start slowly from scratch, as the code is too "strange" for me to work out what you actually need it to do. Debugging the program in smaller chunks would probably help. Perhaps this might also help:

Re: Data extraction with specific keywords
by poj (Abbot) on Jan 19, 2018 at 08:30 UTC

    Without an example of your input file, this is my best guess.

    updated with sample data from Re^2: Data extraction with specific keywords
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @colname = ("2:", "6:", "4:", "3:");
    my $datafile = 'datafile2.txt';#$ARGV[0];
    open my $fh,'<',$datafile or die "Could not open $datafile : $!";

    # column header line
    my @header = split /\s+/,<$fh>;
    chomp(@header);
    #print Dumper \@header;

    # convert name to colno
    my %colno = ();
    for my $i (0..$#header){
      my ($name,$val) = split ':',$header[$i];
      $colno{ $name.':' } = $i;
    }
    #print Dumper \%colno;

    # array of required columns
    my @colno = map { $colno{$_} } @colname;
    #print Dumper \@colno;

    # process datalines
    seek $fh,0,0;
    while (<$fh>){
      chomp;
      my @f = split /\s+/,$_;
      print join "\t",@f[@colno];
      print "\n";
    }
    close $fh;

    __DATA__
    0 2:-0.5795 3:0.33582025 4:55.8255 5:65.316997 6:15 7:16 8:57 9:28 10:29 11:23 12:5
    1 2:-1.4034 3:1.96953156 4:53.4424 5:65.316997 6:15 7:16 8:57 9:28 10:29 11:23 12:5
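    One caveat with looking columns up by position: the two records in the posted sample do not contain the same keys (the `0` record has no `637:` or `649:` entry, for example, while the `1` record has both), so a position taken from the first line can point at a different key in a later line. A sketch of a key-based alternative, assuming whitespace-separated `key:value` fields; the records and keys are shortened from the thread, and `NA` as the placeholder for a missing key is an arbitrary choice.

```perl
use strict;
use warnings;

# Keys to extract, in output order (a few taken from @arr1)
my @wanted = qw(636: 637: 638: 649: 650:);

# Shortened sample records: "0" lacks 637: and 649:, "1" has them
my @records = (
    '0 636:639 638:3 650:7.3070209844',
    '1 636:609 637:1 638:3 649:0.235143313 650:8.4212597146',
);

my @rows;
for my $record (@records) {
    my ($label, @pairs) = split ' ', $record;
    # Look values up by key, so a missing key cannot shift later columns
    my %value_for = map { $_ =~ /\A(\d+:)(.*)\z/ ? ($1 => $2) : () } @pairs;
    push @rows, join "\t", $label, map { $value_for{$_} // 'NA' } @wanted;
}
print "$_\n" for @rows;
```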
    poj
Re: Data extraction with specific keywords
by thanos1983 (Parson) on Jan 19, 2018 at 12:53 UTC

    Hello again neeraj_kr,

    Does something like this work for you?

    Input data that I used, based on what you provided us:

    Even if the sample code I provided works for you, I do not think it is the most efficient solution, but it is a good starting point (I think).

    Update 2: I was testing a minor modification that might make the script a bit more efficient. Admittedly, 4% is not a big difference for a few lines of input, but if your files are big, any small saving could help.

    Hope this helps, BR.


Node Type: perlquestion [id://1207502]
Approved by kcott
Front-paged by Corion