
How to Clean up the Junks!!!

by Anonymous Monk
on Jun 29, 2013 at 03:39 UTC (#1041390=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a question about cleaning up some paragraphs. Is there a universal way to remove this kind of junk, so that I can apply one method in every case? I have many paragraphs like these. They are not all the same, but they all contain junk of this kind. Two examples follow.

$data = '%3CP style=%22MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px%22%3EAbbVie (NYSE:ABBV) is a global, research-based biopharmaceutical company formed in 2013 following separation from Abbott. AbbVie combines the focus and passion of a leading-edge biotech with the expertise and capabilities of a long-established pharmaceutical leader to develop and market advanced therapies that address some of the world’s most complex and serious diseases. In 2013, AbbVie will employ approximately 21,000 people worldwide and markets medicines in more than 170 countries.%3C/P%3E%0A%3CP style=%22MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px%22%3E%26nbsp;%3C/P%3E%0A%3CDIV%3EPRIMARY FUNCTION / PRIMARY GOALS / OBJECTIVES:%3C/DIV%3E%0A%3CUL%3E%0A%3CLI%3ESupervises the determination of a client area application systems requirements for new or modified application programs, the preparation of systems specifications and the development, testing and implementation of efficient, cost effective applications solutions.%26nbsp; %0A%3CLI%3EActs as expert technical resource to development staff.%26nbsp; %0A%3CLI%3ESupervise, manage and prioritize as needed the weekly and monthly data warehousing operations, managing a vendor managed services team that performs the operational tasks.%26nbsp; %0A%3CLI%3EPartner with primary business clients to prioritize the operations schedule when normal operations and schedules are impacted.%3C/LI%3E%3C/UL%3E%0A%3CDIV%3E%3CBR%3ECORE JOB RESPONSIBILITIES:%3C/DIV%3E%0A%3CUL%3E%0A%3CLI%3EResponsible for compliance with applicable Corporate and Divisional Policies and procedures. %0A%3CLI%3EUnderstands organization%27s vision, goals and strategies. Aligns teams priorities appropriately and determines teams critical success factors.%26nbsp; Identify connections between business processes and/or technical processes and ensure integration and touchpoints align appropriately to fulfill the project goals and the overall strategies of AbbVie. 
%0A%3CLI%3EDefine and report on key operational performance metrics. %0A%3CLI%3EEngage with multiple business and technical teams to maintain compliance to AbbVie Best Specialty processes and controls.%26nbsp; %0A%3CLI%3EManages change and encourages innovation.%26nbsp; Open to new ideas. %0A%3CLI%3EUnderstands clients business needs and requirements. Resolves issues in an appropriate and timely manner. %0A%3CLI%3EMay negotiate, secures, oversee %26amp; ensure that resources are available to meet the daily operational demands of the area. %0A%3CLI%3EResponsible for all aspects of people leadership; setting expectations, coaching, counseling, developing, evaluating, feedback, hiring, discipline and separations.%26nbsp; Determine necessary mix of skill set. %0A%3CLI%3EConfronts and deals with employees issues in a constructive and timely manner. %0A%3CLI%3EEstablish and maintain high-quality relationships with all levels across the company and with external partners.%3C/LI%3E%3C/UL%3E';

That is the end of the first example.

The next one looks like this:

$data = '%3CP style=%22MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px%22%3E%26nbsp;%3C/P%3E%0A%3CUL%3E%0A%3CLI%3EBachelor%27s degree or equivalent experience, preferably in software engineering, business, information systems, or a discipline closely related to the client area served. %0A%3CLI%3E7-10 years overall experience %0A%3CLI%3EITIL or related disciplines preferred. %0A%3CLI%3EExperience in a commercial IT or commercial Sales Operations organization preferred.%3C/LI%3E%3C/UL%3E%0A%3CP style=%22MARGIN-TOP: 0px; MARGIN-BOTTOM: 0px%22%3E%26nbsp;%3C/P%3E';

These are just two examples; there are many more. Could any Monk kindly suggest a universal solution for this?

Thanks in advance.

Replies are listed 'Best First'.
Re: How to Clean up the Junks!!!
by thomas895 (Chaplain) on Jun 29, 2013 at 04:05 UTC

    What do you mean by "junks"? You will need to be more specific, because unfortunately we are not all-knowing.
    Don't forget we are not a code-writing service. We will help you if you at least try to solve your problem. If you need code written for you, find and hire/contract a Perl programmer -- perhaps a freelancer of sorts.

    If you would like to be helped here, you may want to read the very informative nodes linked below every text field, especially How do I compose an effective node title? and How do I post a question effectively?.

    "Excuse me for butting in, but I'm interrupt-driven..."
Re: How to Clean up the Junks!!!
by roboticus (Chancellor) on Jun 29, 2013 at 12:02 UTC

    The only universal way of removing all the junk would be something like:

    $txt =~ s/.//g;

    It does, however, have the unfortunate side effect of removing other data as well.

    On a more serious note: What's junk for you might be important data for someone else. So it's impossible to have a universal way of removing junk unless there's a universal consensus on what constitutes junk. If you could specify what bits you consider junk, and which bits you consider useful, it would be a start.


    When your only tool is a hammer, all problems look like your thumb.

Re: How to Clean up the Junks!!!
by LanX (Bishop) on Jun 29, 2013 at 04:00 UTC
    Best ways to Clean up the Junks??? Water? If not, call the Chinese embassy and ask for advice!!!

    Your data, OTOH, looks like URL-encoded HTML. Maybe running the corresponding decoding routine from (IIRC) helps?
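
A minimal sketch of what such a decoding step could look like, using only core Perl. The helper name `percent_decode` is hypothetical, and the sample string is a shortened excerpt of the posted data; this is one way to undo %XX escapes, not necessarily the routine the reply had in mind.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every %XX escape back into the character
# whose code is the two hex digits XX.
sub percent_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Shortened excerpt of the data from the question.
my $data    = '%3CDIV%3EPRIMARY FUNCTION%3C/DIV%3E%0A%3CLI%3EActs as expert%26nbsp;';
my $decoded = percent_decode($data);

# $decoded now holds plain HTML: <DIV>...</DIV>, a newline, <LI>..., &nbsp;
print "$decoded\n";
```

After this step the text is ordinary HTML, so the remaining tags and entities such as `&nbsp;` still need their own handling. If installing a CPAN module is an option, `uri_unescape` from URI::Escape does the same decoding.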

    If you wanna get rid of the HTML tags, try a regex to delete everything between %3C and %3E (representing '<' and '>').
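
The regex idea above can be sketched roughly as follows. The helper name `strip_encoded_tags` is hypothetical and the sample is a shortened excerpt of the posted data. The non-greedy `.*?` stops at the nearest `%3E`, so each encoded tag is removed on its own rather than everything from the first `%3C` to the last `%3E`.

```perl
use strict;
use warnings;

# Hypothetical helper: delete each span from %3C ('<') up to and
# including the nearest %3E ('>'), i.e. every still-encoded HTML tag.
sub strip_encoded_tags {
    my ($text) = @_;
    # /g: all tags, /i: also match %3c and %3e, /s: let . cross newlines
    $text =~ s/%3C.*?%3E//gis;
    return $text;
}

# Shortened excerpt of the data from the question.
my $data = '%3CP style=%22MARGIN-TOP: 0px%22%3EAbbVie is a company.%3C/P%3E%0A'
         . '%3CUL%3E%0A%3CLI%3E7-10 years overall experience';

print strip_encoded_tags($data), "\n";
```

This only removes the tags. Other escapes such as `%0A` and `%26nbsp;` are left in place and still need decoding, and a regex like this is a rough tool that can misfire on unusual markup.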

    Good luck showing efforts and enjoy sailing ...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: How to Clean up the Junks!!!
by ww (Archbishop) on Jun 29, 2013 at 15:48 UTC

    Perfunctory sampling reveals that your junk (which looks like markup) may be the data that consistently (so far as I've checked) begins and ends with the sequence:

    percent sign, digit, uppercase ALPHA

    That might be the basis for using a regular expression to delete the markup.
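
Before deleting anything, it may help to take stock of which such sequences actually occur. The following is a small sketch under that assumption; the helper name `count_escapes` is hypothetical and the sample is a shortened excerpt of the second posted example.

```perl
use strict;
use warnings;

# Hypothetical helper: count every "percent sign, digit, uppercase
# letter" token in the text and return the tallies as a hash reference.
sub count_escapes {
    my ($text) = @_;
    my %count;
    $count{$1}++ while $text =~ /(%[0-9][A-Z])/g;
    return \%count;
}

# Shortened excerpt of the data from the question.
my $data  = '%3CUL%3E%0A%3CLI%3E7-10 years overall experience '
          . '%0A%3CLI%3EITIL preferred.%3C/LI%3E%3C/UL%3E';
my $count = count_escapes($data);

# Print one line per distinct token, e.g. "%3C => 5".
printf "%s => %d\n", $_, $count->{$_} for sort keys %$count;
```

Note that the posted samples also contain `%22`, `%26` and `%27`, which have two digits and so are not matched by this pattern; a pattern such as `%[0-9A-F]{2}` would cover those as well.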

    Update: I forgot to mention that this is not "code-a-matic"; your request that the Monks "provide me a universal solution for this" ignores the purpose of the Monastery, which is to help you learn, NOT to do your $work/homework/coding for you.

    If you didn't program your executable by toggling in binary, it wasn't really programming!
