PerlMonks  

Extract Multiple Tags

by stallion (Acolyte)
on Dec 21, 2011 at 14:36 UTC

stallion has asked for the wisdom of the Perl Monks concerning the following question:

Scenario:

The following text is in a .doc file:

    File: check1.asm
    Function: Monks
    Tag: No
    Tag: 001
    Tag: Yes
    Tag: 002
    File: check2.asm
    Function: Perl Monks
    Tag: Yes
    Tag: 003
    Tag: No
    Tag: 004
    File: check3.asm
    Function: Experts
    Tag: No
    Tag: 005
    Tag: No
    Tag: 006
    Function: Perl Experts
    Tag: No
    Tag: 007
    Tag: Yes
    Tag: 008

I have to extract the tags that are marked Yes, together with the corresponding function and file name, into an Excel sheet.

The output has to look like this:

    Tags Function       File
    002  Monks          check1.asm
    003  Perl Monks     check2.asm
    008  Perl Experts   check3.asm

I have written the following snippet to extract the tags that are marked Yes:

    use strict;
    use warnings;
    use Win32::OLE;
    use Win32::OLE qw(in with);
    use Win32::OLE::Variant;
    use Win32::OLE::Const 'Microsoft Excel';
    use Win32::OLE::Const 'Microsoft Word';
    use Cwd;
    use File::Find;
    use Win32::OLE;
    use Win32::OLE::Enum;

    $Win32::OLE::Warn = 3;    # die on errors...

    my $out_file = 'check.xls';
    open my $out_fh, '>', $out_file or die "Could not open file $out_file: $!";
    my $print_next = 0;

    #Globals
    our $Word;
    our $reviewchklists;
    my @scriptfiles;
    @scriptfiles=glob('*.doc');
    foreach my $file (@scriptfiles)
    {
        my $var;
        my $filename = "D\:\\";
        $var = $filename."$file";
        print $var ;
        my $document = Win32::OLE -> GetObject("$var");
        print "Extracting Text ...\n";
        my @array;
        my $paragraphs = $document->Paragraphs();
        my $enumerate = new Win32::OLE::Enum($paragraphs);
        while(my $paragraph = $enumerate->Next())
        {
            my $text = $paragraph->{Range}->{Text};
            $text =~ s/[\n\r\t]//g;
            $text =~ s/\x0B/\n/g;
            $text =~ s/\x07//g;
            chomp $text;
            my $Data .= $text;
            @array=split(/\.$/,$Data);
            foreach my $line( @array)
            {
                if ($print_next)
                {
                    print $out_fh $line."\n" ; # we add a "\n" ; #No need to chomp - we print the "\n"
                    local $\ = "<br>\n";
                    local $/="\n\n";
                }
                $print_next = ($line =~ /^Tag\sYes/);
            }
        }
    }
    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The above snippet is printing the output as follows:

    ID : 002
    ID : 003
    ID : 008

I don't want the "ID :" label to be printed. Also, how do I extract the corresponding function and file name?

Help out monks!!!

Replies are listed 'Best First'.
Re: Extract Multiple Tags
by Marshall (Canon) on Dec 22, 2011 at 01:49 UTC
    When I saw this post, I thought: oh gosh this looks familiar. Anon Monk has given us a link to another post that looks virtually identical.

    If you are working on homework, a better strategy would be to "come clean" about it (admit it), and then ask a general question about the technique that you are attempting rather than asking "what's wrong with my homework code?"

    In this case, you are wondering about how to parse the input and extract some info. But that isn't the way that the question is framed.

    I guess if you are trying to "cheat", you at least need to become smarter about how you do it!

    Asking a general question about a specific technique is not "cheating" and will often generate a lot of responses especially if you have some working code albeit awkward. Ask a "how do I approach problem X?" question rather than a "please fix my homework" question.

Re: Extract Multiple Tags
by Anonymous Monk on Dec 21, 2011 at 14:42 UTC
Re: Extract Multiple Tags
by Cristoforo (Curate) on Dec 22, 2011 at 01:30 UTC
    Here is an approach that worked on the data you provided. I think it would replace your foreach loop.
    Although, I'm not sure how you want your data printed with a '<br>' at the end of each line.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.014;

    my ($file, $function, $tag);
    printf "%-5s%-15s%s\n", qw/ Tags Function File /;

    my @line = <DATA>;
    for my $i (0 .. $#line) {
        if ($line[$i] =~ /^File: (.+)$/) {
            $file = $1;
        }
        elsif ($line[$i] =~ /^Function: (.+)$/) {
            $function = $1;
        }
        elsif ($line[$i] =~ /^Tag: Yes$/) {
            ($tag) = $line[$i+1] =~ /(\d+)$/;
            printf "%-5s%-15s%s\n", $tag, $function, $file;
        }
    }

    __DATA__
    File: check1.asm
    Function: Monks
    Tag: No
    Tag: 001
    Tag: Yes
    Tag: 002
    File: check2.asm
    Function: Perl Monks
    Tag: Yes
    Tag: 003
    Tag: No
    Tag: 004
    File: check3.asm
    Function: Experts
    Tag: No
    Tag: 005
    Tag: No
    Tag: 006
    Function: Perl Experts
    Tag: No
    Tag: 007
    Tag: Yes
    Tag: 008

    Outputs:

    Tags Function       File
    002  Monks          check1.asm
    003  Perl Monks     check2.asm
    008  Perl Experts   check3.asm
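
The same idea can also be written as a single pass with a flag instead of an index lookup. What follows is only a minimal sketch under some assumptions: the paragraphs have already been read into a list of lines, the helper name extract_yes_tags is made up for illustration, and tab-separated text is used for the output only because Excel can open it (it is not a real .xls file). The tag number is captured with (\d+), so a label in front of it such as "Tag:" or "ID :" is never printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): walk the lines once, remember
# the most recent File and Function, and collect one row per tag marked Yes.
sub extract_yes_tags {
    my @lines = @_;
    my ($file, $function, $pending_yes) = ('', '', 0);
    my @rows;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim stray whitespace such as \r
        if ($line =~ /^File:\s*(.+)$/) {
            $file        = $1;
            $pending_yes = 0;
        }
        elsif ($line =~ /^Function:\s*(.+)$/) {
            $function    = $1;
            $pending_yes = 0;
        }
        elsif ($line =~ /^Tag:\s*Yes$/) {
            $pending_yes = 1;       # the next line should carry the number
        }
        elsif ($pending_yes && $line =~ /(\d+)$/) {
            push @rows, [ $1, $function, $file ];    # digits only, no label
            $pending_yes = 0;
        }
        else {
            $pending_yes = 0;
        }
    }
    return @rows;
}

# Sample data copied from the question.
my @input = split /\n/, <<'END';
File: check1.asm
Function: Monks
Tag: No
Tag: 001
Tag: Yes
Tag: 002
File: check2.asm
Function: Perl Monks
Tag: Yes
Tag: 003
Tag: No
Tag: 004
File: check3.asm
Function: Experts
Tag: No
Tag: 005
Tag: No
Tag: 006
Function: Perl Experts
Tag: No
Tag: 007
Tag: Yes
Tag: 008
END

# One header row, then one tab-separated row per Yes tag.
print join("\t", qw(Tags Function File)), "\n";
print join("\t", @$_), "\n" for extract_yes_tags(@input);
```

On the sample data this prints a header row followed by the rows for tags 002, 003 and 008. Redirecting the output to a file with a .tsv or .txt extension and opening that in Excel is one way to get the sheet; writing a native workbook would need a separate module or the Excel OLE interface.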
