Automatic CODE-tag creation (Prototype)

by Corion (Patriarch)
on Jun 21, 2000 at 20:28 UTC ( #19270=sourcecode )
Category: HTML Utility; Text processing
Author/Contact Info: Max Maischein aka Corion
Description: Out of a discussion about how we can prevent newbies from posting unreadable rubbish, here is a program that tries to apply some heuristics to make posts more readable. This version isn't the most elegant, so it's called a prototype.
#!/usr/bin/perl -w
# Really crude "post beautifier"
# The rationale behind it is to split an unformatted post up into
# text and code, but leaning more to code than to text, as it is
# easier to read monospaced text than to read paragraph-formatted
# code.

# * Can't properly handle "print <<EOF"-style code ...

use strict; # this is becoming a habit ...
#use Safe;   # kudos to swiftone for the idea and nuance for the Safe 
use Text::Wrap;

my $filename = $ARGV[0] || $0;

my $Content = &readfile( $filename );
my %StartTag = (
  "Code" => '<PRE>',
  "Text" => '<P>',
);
my %EndTag = (
  "Code" => '</PRE>',
  "Text" => '</P>',
);

# A RE that constitutes what a variable name might be ...
# (\w+ also accepts one-letter names such as $i)
my $varRE = q'[\\$\\@\\%\\*](\w+|[\\$/\|\\@_])';

my $CodeLineBreak = "<FONT color=\"red\">+<BR>+ </FONT>";
#print $Text::Wrap::columns;
$Text::Wrap::columns = 50;

# Fix up the newlines
# (CRLF and bare CR both become a single LF)
$Content =~ s/\r\n?/\n/g;

# Bail out if the user knows about CODE tags
if ($Content =~ /<CODE>/i) {
  print $Content;
  exit;
}

my (@Lines) = split /\n/, $Content;

my $LastLineType = "Text";
my $LineType = "Text";

#my ($Sandbox) = new Safe 'Sandbox';

# Hey, I'm trying to stay on the safe side. If something
# compiles fine but doesn't run, I'm all OK with that !

print $StartTag{$LineType};
foreach (@Lines) {
  # For a start, assume that the current line has the same style
  # as the previous line
  $LineType = $LastLineType;

  # blank lines remain blank lines
  if (/^\s*$/) {
    # keep old state
  }
  # some really safe bets about code lines
  elsif (    /^\s*#/           # A comment line
          || /^\s*["'].*?["']\s*=>/ # A hash entry
          || /^\s*(?:[\{\(]|[\}\)];?)\s*$/ # a line containing (only) an opening or closing bracket
          || /^\s*if\s*\(/     # an if statement
          || /elsif/           # elsif statements are a dead giveaway
          || /^\s*else\s*\{?\s*/ # a single else statement
          || /\s*[\$\@\%]\w+ [=!~+\-\*]?=/ # assignments
          || /^\s*use\s+\w+(::\w+)*/ # use clauses
          || /^\s*require\s+\w+(::\w+)*/ # require clauses
          || /^\s*sub\s+\w+/
          || /^\s*my\s+\(?\s*$varRE/o
          || /^\s*our\s+\(?\s*$varRE/o
          || /^\s*local\s+$varRE/o
          || /^\s*return\s+\(?\s*$varRE/o
          || /^\s*close\s+[A-Z]+/o
          || /^\s*print\W/
        ) {
    $LineType = "Code";

    # Wrap long code lines and mark every inserted break
    $_ = wrap("","",$_);
    $_ =~ s/\n/$CodeLineBreak/go;
  }
  else {
    # Everything that hasn't been weeded out by now must be normal text ...
    $LineType = "Text";
  }

  print( $EndTag{$LastLineType},$StartTag{$LineType} ) if ($LastLineType ne $LineType);
  print $_, "\n";

  $LastLineType = $LineType;
}
print( $EndTag{$LastLineType} );

sub readfile {
  # Don't put this into production code !
  # It prints user-supplied stuff like the filename ...
  my( $filename ) = @_;
  local $/;   # slurp mode: read the whole file in one go
  open( my $fh, '<', $filename ) or die "Can't read from $filename : $!\n";
  my $Result = <$fh>;
  close $fh;

  return $Result;
}
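The header comment of the script lists `print <<EOF`-style code as something it can't handle properly. One possible approach, sketched here and not part of the original script, is to remember the here-document terminator and treat every line up to and including it as code. The function name `mark_heredoc_lines` and its regex are assumptions made for illustration; only the simple `<<EOF`, `<<"EOF"` and `<<'EOF'` forms are recognised.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list parallel to the input lines,
# 1 where the line belongs to a here-document (the opening statement,
# the body, or the terminator line) and 0 otherwise.
sub mark_heredoc_lines {
    my @lines = @_;
    my @in_heredoc;
    my $terminator;    # defined only while inside a here-document body

    for my $line (@lines) {
        if ( defined $terminator ) {
            push @in_heredoc, 1;
            # The terminator has to stand alone on its line
            undef $terminator if $line eq $terminator;
        }
        elsif ( $line =~ /<<(?:\s*"(\w+)"|\s*'(\w+)'|([A-Za-z_]\w*))/ ) {
            # Quoted or bare terminator; a bare one must start with a
            # letter or underscore so that "1 << 3" is not mistaken for it
            $terminator = defined $1 ? $1 : defined $2 ? $2 : $3;
            push @in_heredoc, 1;
        }
        else {
            push @in_heredoc, 0;
        }
    }
    return @in_heredoc;
}

my @flags = mark_heredoc_lines(
    'Here is my problem:',
    'print <<EOF;',
    'Hello, world',
    'EOF',
    'Why does it not work?',
);
print join( ",", @flags ), "\n";    # prints 0,1,1,1,0
```

In the main loop, a line flagged this way could be given the "Code" type before any of the other heuristics are consulted, which fits the script's stated preference for leaning towards code.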
Replies are listed 'Best First'.
RE: Automatic CODE-tag creation (Prototype)
by lhoward (Vicar) on Jun 21, 2000 at 21:08 UTC
    Maybe a good check to add would be:
    if ( $reputation < 50 || $isAnonymousMonk ) {
        # check to apply auto CODE tag formatting
    }
    Monks with more than 50 reputation points should know how to use code tags and deserve to be flamed if they don't. This check would keep those of us who know better from having posts auto-code-tagged by mistake when we don't want them tagged.

    The "auto-code-tag" process should also alert the user that their post was auto-code-tagged and advise them how to use code tags in the future.
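A minimal sketch of how the two suggestions above might look in Perl. The names `should_auto_tag` and `add_notice`, the `$MIN_REPUTATION` constant and the wording of the notice are all assumptions made for illustration; nothing here reflects how the site itself stores reputation or renders posts.

```perl
use strict;
use warnings;

# Threshold taken from the suggestion above; the name is hypothetical
my $MIN_REPUTATION = 50;

# Decide whether a post should go through the automatic CODE-tagger:
# only anonymous posters and monks below the reputation threshold.
sub should_auto_tag {
    my ( $reputation, $is_anonymous ) = @_;
    $reputation = 0 unless defined $reputation;    # e.g. no account at all
    return ( $is_anonymous || $reputation < $MIN_REPUTATION ) ? 1 : 0;
}

# Append a short notice telling the poster what happened and how to
# format code by hand next time. The wording is only an example.
sub add_notice {
    my ($html) = @_;
    return $html
      . "<P><I>This post was formatted automatically. "
      . "Wrap code in &lt;CODE&gt; ... &lt;/CODE&gt; tags "
      . "to control the formatting yourself.</I></P>\n";
}

print should_auto_tag( 10,  0 ), "\n";    # low reputation: 1
print should_auto_tag( 120, 0 ), "\n";    # experienced monk: 0
print should_auto_tag( 120, 1 ), "\n";    # anonymous: 1
```

The notice is appended only to posts that were actually auto-tagged, so experienced monks never see it.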
