Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Parsing Chess Algebra Notation

by cyocum (Curate)
on May 04, 2004 at 13:33 UTC ( [id://350332]=perlquestion: print w/replies, xml ) Need Help??

cyocum has asked for the wisdom of the Perl Monks concerning the following question:

I would like to thank you all in advance for helping me with this.

I am writing a Chess Engine in Perl. I know there are several already on CPAN but they seem to have been abandoned and I was hoping to keep my skills up while I am in grad school.

My question is how do I go about parsing Chess algebra notation? I will give you a few examples. For instance, e4 means "move pawn on e2 to e4." Well this is easy enough but it gets more complicated. Nf3 means "move Knight on g1 to f3". As you can see the piece letter, in this case N for Knight, is not always required. Also, sometimes you get e4xd5 which means "pawn on e4 takes piece on d5". In the case of Rooks, you can get: Rad4, which means "move the Rook which is on column a and move it to d4" (this is in case both rooks can move to d4 legally).

Currently I have this:

my $piece; my $col; my $row; my $capture; #parse Nf4 where N is not always present my @parsed = split //, $move; if(scalar @parsed == 2) { $col = pop @parsed; $row = pop @parsed; } elsif (scalar @parsed == 3) { $piece = pop @parsed; $col = pop @parsed; $row = pop @parsed; } elsif(scalar @parsed == 4) { $piece = pop @parsed; $capture = pop @parsed; #this is the capture symbol $col = pop @parsed; $row = pop @parsed; } else { die "not known algebra notation\n"; }

It is horrible and not complete and I know it. I was thinking about using split on this but because the first letter is not always the piece letter, it gets confusing.

Again, thanks and any help would be greatly appreciated!

Replies are listed 'Best First'.
Re: Parsing Chess Algebra Notation
by gmax (Abbot) on May 04, 2004 at 13:52 UTC

      I am currently using Chess::PGN::Parse and have had no issues with it at all. I definately suggest it over attempting to write your own.

      If you are just writing your own version for fun, practice, or to learn something new, I'd say, "Great! Go for it!" Just don't release it to the CPAN unless you are adding a lot of new functionality or there is a speed improvement. If you have questions on releasing it or not, the authors of the other chess modules are here, and I'm sure they'd be glad chat about chess and Perl.

      Bingo bingo bingo, thank you very much this is EXACTLY what I have spent the last few hours trying very hard NOT to start writing!
Re: Parsing Chess Algebra Notation
by EdwardG (Vicar) on May 04, 2004 at 13:51 UTC

    This is a nice problem, one which can give you many hours of enjoyment, depending of course on how you define enjoyment.

    I've been throught this exercise a number of time in different languages, so let me give you my thoughts on where the complexity lies.

    First, as you say, it is not enough to simply derive the algebraic form of the from and to squares. If it was then something as simple as this would suffice for each1

    sub sqToAlg { my $sq = shift; # Assumes squares are numbered 0 to 63, # from right to left, bottom to top substr('abcdefgh',7 - $sq % 8,1) . ($sq / 8); }

    But that is far from enough. You need to know the following as well:

    • Was the toSq a capture?
    • Did the move result in simple check
    • Did the move result in simple stalemate
    • ...or stalemate by the 50-move rule
    • ...or stalemate by repetition (must be claimed)
    • If the piece moved was a pawn, did it promote to something?
    • Of course, checking legality can be assumed, right?
    • Has the flag fallen (if the game is timed)
    • Was this an en passant capture eg e2xf3ep.

    Well, you get the idea. And some of these can combine to make wonderful moves like e7xf8=Q++

    Best of luck with it, and I for one would very much like to see your code when you finish!

    Update: I recommend that to start with you deal with the simple form a1a2 and worry about the more complex notation later. There is so much more to consider (much more interesting too) that this should be passed over quickly. For instance, think about move generation first, perhaps some bit twiddling for efficiency, or bitmaps to help you with simultaneous generation of (in particular) pawn moves. Minimax, or more specifically AlphaBeta pruning thereof still keeps me amused for hours whenever I revisit. Ahh...and I must finish the version in C#, one day, perhaps when the Delphi version rises above 1500...


    1 converted from C# without testing


      I think your starting simple and moving up to complex is a better way to do it (in fact, I should have remembered it as a general rule).

Re: Parsing Chess Algebra Notation
by dragonchild (Archbishop) on May 04, 2004 at 14:13 UTC
    Algebraic notation is relative to the current board position. So, you have to parse it backwards.

    The last character may or may not be a '+', indicating check. (I'm going to assume that there is no analysis on the game, so I will ignore !, !!, ?, ??, ?!, and !?.) So, we have to see if that exists. If it does, we may or may not want to verify that, after the move, a state of check exists.

    After we look for the '+', the last two characters are guaranteed to be the destination square in the form /[a-h][1-8]/.

    Any remaining characters are the specification for the piece and/or starting-square coordinates. (Remember, we're still parsing the string in reverse.)

    • If there is nothing there, look for a pawn that could make it to this square.
    • If there is an 'x', that means that there is a capture. (This notation, while legal, isn't common.)
    • If there is a K, Q, R, N, or B, look for the appropriate piece that could make it to this square. There should be only one of that piece type that can legally make this move.
    • If there is [a-h1-8]+, that means that there is either more than one officer of that piece type that can make this move or it is a pawn capture (ed5 is a legal notation). Remember, Qa2b3 means that there are three queens, potentially at a2, a4, and b4. All of them can move to b3.

    The key is parsing it backwards.

    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      Remember, Qa2b3 means that there are three queens, potentially at a2, a4, and b4. All of them can move to b3.


      I would interpret Qa2b3 to only mean that a Queen at a2 is moving to b3 - I don't know how you could derive the existence and position of two other Queens from this.

      And from my playing experience I had thought that when two or more of the same piece could move to a given square, you added the minimal extra information to disambiguate, either the rank number or file letter. For example in the following position there are two rooks that can move Ra4, so to disambiguate I add the rank, thus R8a4.

      +---+---+---+---+---+---+---+---+ 8 | R | | | | | | | | +---+---+---+---+---+---+---+---+ 7 | | | | | | | | | +---+---+---+---+---+---+---+---+ 6 | | | | | | | | | +---+---+---+---+---+---+---+---+ 5 | | | | | | | | | +---+---+---+---+---+---+---+---+ 4 | | | | | | | | | +---+---+---+---+---+---+---+---+ 3 | | | | | | | | | +---+---+---+---+---+---+---+---+ 2 | | | | | | | | | +---+---+---+---+---+---+---+---+ 1 | R | | | | | | | | +---+---+---+---+---+---+---+---+ a b c d e f g h

      There is an excellent description of this notation at wikipedia


        I think he meant that with Qa2b3 that there could potentially be up to three other queens, precisely because you should only provide the minimal disambiguation.
        • If Qb3 is insufficient,
          there must be another queen that can move on b3.
        • If Qab3 is insufficient,
          there must be another queen in column a that can move on b3.
        • If Q2b3 is insufficient,
          there must be another queen in rank 2 that can move on b3.
        • Since both Qab3 and Q2b3 are insufficient, the board must be crawling with queens poised to attack b3.

        [ e d @ h a l l e y . c c ]

        This discussion seems to assume English Short Algebraic Notation. For a parser this seems unwise.

        People use and generate chess notation and they will enjoy being able to follow their usual habits, especially if they are the sort to fall into english descriptive or long algebraic under stress.

        Any good chess notation parser will need to maintain or reference a board image.

      I took your advice and tried it backwards. As you will notice, I did this in a step by step fashion. I am sure that there are much more effient ways to do this. Also, it does not yet support the edge cases that you are talking about (that includes promotion). I just wanted to layout a bit more code for general commments and improvments. Also, thanks for all the help. I really appreciate it!

      use strict; use warnings; my $move = "Qe4++"; #reverse as per dragonchild's suggestion my @parsed = reverse split //, $move; print join('', @parsed) . "\n"; my $check; my $row; my $col; my $piece; #convert the letters into appropriate numbers #there is probably a faster way to to this my %col_convert = (a => 0, b => 1, c => 2, d => 3, e => 4, f => 5, g => 6, h => 7); foreach my $elem (@parsed) { #check for, well, check if($elem eq '+') { #start the process to see if this is a checkmate my $nextElem = shift @parsed; print "nextElem $nextElem" . "\n"; #if it is then print if($nextElem eq '+') { print "Checkmate\n"; next; } else { #otherwise put it back on the stack for the next #iteration unshift @parsed, $nextElem; } #then print check in this instance print "Check\n"; }elsif($elem =~ m/[1-8]/) { $row = $elem; print "Row: $row\n"; }elsif($elem =~ /[a-h]/) { #convert to number $col = $col_convert{$elem}; print "Col: $col\n"; }elsif($elem =~ m/x/) { #this is a capture to that square print "Capture on $col $row\n"; }elsif($elem =~ /[KQRNB]/) { $piece = $elem; print "Piece to move is $piece\n"; } else { die "not a valid move\n"; } }

        Why are you making it so complicated for yourself? All you want to do is build a string that corresponds to the move. This means that you're attempting to build a state representation by parsing another state representation, one character at a time. The following may or may not be complete, but it's a good start. (I didn't do castling, leaving that as an exercise for the reader. I'd probably special-case it.)

        And the test code ...

Re: Parsing Chess Algebra Notation
by hsmyers (Canon) on May 04, 2004 at 15:35 UTC

    As someone who has 'been there, done that' here is a random collection of tips and advice:

    • Don't worry about re-inventing wheels---there are never enough wheels and not all of them are round!
    • Restrict your input language. In this case I'd suggest PGN (Portable Game Notation) developed by Steven J. Edwards. It is a standard in the chess world and will make you life a lot more fun.
    • Resist the notion that you don't need to 'play' the game you are parsing.
    • Do the easy stuff first.
    • Find someone to ask stupid questions of. PM is good for overall stuff. Idiots like me for chess parsing. Experts like Gmax for the rest.
    • Regress. Then do it again! I know of no better task than parsing to implement the pragmatic approach to testing. In a as yet unpublished English to algebraic translator, the testbase includes 3000 games with known results to test against. Every time I think I'm safe---things go south and I'm back to testing. You are never safe---test always!
    • Remember, Free Advice---and to paraphrase Curtis Mathis, darn well worth it!


    "Never try to teach a pig to wastes your time and it annoys the pig."
Re: Parsing Chess Algebra Notation
by flyingmoose (Priest) on May 04, 2004 at 22:05 UTC
    I'm a lazy guy, but I'd just implement algebraic notation and worry about the classical notation parsing later. The parsing isn't the hard part, it's the shorthand. Rf3 implies questions like "which rook?", and "QPxR" is sometimes shortenable to "PxR", but not always.

    I've started a chess engine a few times (never really in Perl), and I always stopped earlier due to the thought that, really, I'd rather play humans :) Anyhow, it's a fun effort, full of lots of interesting questions like how to write efficient move validators, alpha/beta pruning, etc, etc, etc. I'm almost more than tempted to pick this up again, perhaps writing something a little less overplayed than chess, such as perhaps Abalone -- yeah, one ships with GNOME and/or KDE, but it could use a higher quality replacement that allows sidereal moves.

    As a final thought, you may be able to extract some algorithms for PGN parsing from XBoard.

    Good show for keeping up the motivation! I need more of that myself.

Re: Parsing Chess Algebra Notation
by podian (Scribe) on May 04, 2004 at 19:56 UTC
    actully e4 or f4 always means Pe4 (Pawn to e4) or Pf4.

    so one idea is to fully expand it first and then parse it.

    Of course, you need to handle things like

    e2e4 and Rad4 and so on.

    I am also certain that if the first letter is not upper case then it is a pawn move(i.e. all the other moves must give the name of the piece that is moving)

    so you could change e2e4 to Pe2e4 or Pe2Pe4

Re: Parsing Chess Algebra Notation
by davidj (Priest) on May 14, 2004 at 17:26 UTC
    If all you want to do is parse the individual moves (without validating, etc), the following regex will do the trick:

    ($p, $f, $r, $a, $d, $c) = $move =~ /([BKNPQR]?)([a-h]?)([0-9]?)([x=]?)([BKNPQR]|[a-h][1-8])([+#]?)/;

    where the variables are as follows:
    $move = the move being parsed
    $p = the piece being moved
    $f = the file being moved from
    $r = the rank being moved from
    $a = action (x = capture, '=' = promote)
    $d = destination square (or promotion piece if $a = '=')
    $c = check or mate
    As an example, the move you gave, Nf4, will parse as follows:
    $p = N, $f = "", $r = "", $a = "", $d = b4, $c = ""

    hope this helps,
Re: Parsing Chess Algebra Notation
by oha (Friar) on May 06, 2004 at 22:43 UTC
    I wrote a Wiki using Parse::RecDescent and i think it would be nice to write something like for chess

    the logic is that you can prepare each possible statement, and for each u have to check if it is a valid move _and_ only one piece move there

    the possible grammar become to P::RD, than the move check an unicity is obtained from your logic. it would be easier to add a grammar "variant" this way


Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://350332]
Approved by Roger
Front-paged by cchampion
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2024-05-29 15:59 GMT
Find Nodes?
    Voting Booth?

    No recent polls found