Converting Lucas Chess personal opening guides to PGN

Most chess software tools generally allow for importing/exporting between any of their own formats for chess games or positions or whatever and PGN. Lucas Chess seems to be, for the most part, no exception; however, while it allows for importing from PGN to its Personal Opening Guide structure (.pgo files), it oddly doesn’t have an option to export in the other direction. I’m not a Lucas Chess user myself, but I stumbled upon that fact in a forum post from someone wanting to export their .pgo file to PGN for use in some other chess software, and asking if someone could program a converter for them.

That piqued my curiosity, so here I’ll lay out a quick way to get from a .pgo file to a standard .pgn file that can be imported to other programs, using nothing but an R script and David Barnes’ free pgn-extract utility.

It turns out that each .pgo file is just a SQLite database containing a single table called GUIDE. Some of the columns of the GUIDE table look like this:

XPV                 PV      POS
---                 --      ---
DT                  c2c4    1
DTn^                e7e5    2
DTn^;L              b1c3    3
DTn^;Lxg            g8f6    4
DTn^;Lxg@O          g1f3    5
DTn^;Lxg@Osd        b8c6    6
DTn^;Lxg@OsdFN      e2e3    7
DTn^;Lxg@OsdFNwS    f8b4    8

As can be seen, the XPV column contains strings that encode sequences of moves (in some encoding scheme that we don’t need to worry about), and the corresponding PV entry gives the final move of the sequence in long algebraic notation. The XPV for the eighth entry above encodes the sequence (in standard algebraic notation) 1.c4 e5 2.Nc3 Nf6 3.Nf3 Nc6 4.e3 Bb4. Ultimately, we need to get all of the move sequences contained in the GUIDE table into this standard algebraic notation that .pgn files use. We do this in two steps: first an R script that constructs the sequences in long algebraic notation, and then using pgn-extract to convert that to a proper .pgn file that can be imported and used freely.

The following R script converts a file called lucas_guide.pgo into a file called intermediate_output.txt that consists of all move sequences from the GUIDE table, presented in long algebraic notation.

## Path to your Lucas Chess .pgo file
yourPGOfile <- "lucas_guide.pgo"
## Path to the output file for this script, which will then feed to
## pgn-extract
outputFilePath <- "intermediate_output.txt"


## Use the RSQLite package to read the .pgo chess content into a data
## frame
con <- dbConnect(drv=RSQLite::SQLite(), dbname=yourPGOfile)
openingGuide <- dbGetQuery(conn=con,
                           statement="SELECT * FROM 'GUIDE'")

## We're only going to work with a couple of columns
openingGuide <- openingGuide[,c("XPV","PV")]

## Determine which moves in the guide are terminal (end of a line)
allXPV <- openingGuide$XPV

isEndOfLine <- function(x) {
               FUN=function(y){grepl(x,y,fixed=TRUE) & !(x==y)}))

openingGuide$isEndOfLine <- sapply(openingGuide$XPV,FUN=isEndOfLine)
endsOfLines <- openingGuide[openingGuide$isEndOfLine>0,]$XPV

## For each terminal move, reconstruct the line that leads to it, in
## long algebraic notation, and add a "*" character at the end to
## separate the lines as "games" in a file
allLines <- character()

for (j in 1:length(endsOfLines)) {
  aux <- openingGuide
  aux$includeInLine <- sapply(openingGuide$XPV, FUN=function(x) {
  aux <- aux[aux$includeInLine,]
  aux <- aux[with(aux,order(XPV)),]
  allLines[[j]] <- paste(paste(aux$PV,collapse=' '),'*',sep=' ')

## Write the lines/games to an output file, to feed into pgn-extract
writeLines(allLines, fileConn)

Barnes’ pgn-extract utility can then convert intermediate_output.txt into a standard .pgn file final_output.pgn as follows:

pgn-extract --output final_output.pgn intermediate_output.txt

That’s all there is to it, as long as you are after just the moves, and not, say, textual comments or NAGs that were included in the opening guide. Those could be had too, by enhancing the R script above to use appropriately the NAG and COMMENT columns that are part of the GUIDE table, but I leave that to any interested party.

Category-theoretic view of SQL

David Spivak has given a nice, category-theoretic way of looking at certain elements of data analysis, and here I’m just going to spell out a few of the basic ideas for myself, comparing a handful of concrete SQL examples to the abstract machinery Spivak lays out. I might delve into more bits and pieces at a later date.

As I was going through his paper, I found myself thinking that these formulations of schemas and tables as category-theoretic structures could actually be practically useful, say in making more robust and manageable/trackable entity relationship models than what might exist out there right now. As it turns out, Spivak (along with others like Ryan Wisnesky) does indeed appear to be working on turning these ideas into usable tools, as evidenced by their Algebraic Query Language and a related commercial endeavor. It really seems like an interesting theoretical direction math-wise, and a promising possibility for use in actual data-analytic work.

The schema for a table is a specification of its column names, along with data type assignments for each column:

-- schema definition for a table 'my_table'
create table my_table (
   col1  integer
  ,col2  char(1)

Describing the schema mathematically, then, it consists of an ordered set C := \langle col1, col2\rangle of column names, along with a function \sigma : C \rightarrow \mathbf{DT} that assigns each column in the list to a SQL data type. (So we’re using \mathbf{DT} to denote the set of all data types available in SQL, like integer, char(1), varchar(27), etc.) To fix ideas, for the rest of this post I’ll assume that \mathbf{DT} actually consists of just the two types from our schema above: that we have only an integer type, and a single-character string type.

Now let’s start to get a little more abstract with our schema. We’ll let U be the infinite set of all objects that are of either type in \mathbf{DT}; that is, U is the union of the set \mathbb{Z} of integers, and the set of all single-character strings from our alphabet. Basically, U is the universe of all values that a cell of a data table in our world (with its limited \mathbf{DT} set) might be populated by.

Let’s let \pi : U \rightarrow \mathbf{DT} be the function that maps each object to its type, i.e. that maps each integer to the integer type and each single-character string to the char(1) type. Between our schema function \sigma and this type assignment \pi, we have the following picture:

\begin{CD} @. U \\ @. @V{\pi}VV \\ C @>{\sigma}>> \mathbf{DT} \end{CD}

To complete this picture a little bit, we want to add another idealization; we have the universe U of all values that could possibly go into a cell in some table, and we’d now like to craft the universe U_\sigma of all cell value assignments that could appear in a table that has schema \sigma. The full Cartesian product C\times U is overly inclusive, as we would have elements like \langle col2, 35\rangle, where an integer value has been erroneously assigned to col2. The right specification of the \sigma universe is

U_\sigma := C \times_\mathbf{DT} U = \{\langle c,u\rangle \in C\times U : \sigma(c)=\pi(u)\}

That is, it’s the most inclusive subset of C\times U that still respects the type assignments of the schema in questions, i.e. such that the following diagram commutes (where the unlabeled maps are the respective projection maps from U_\sigma to each of C and U):

\begin{CD} U_\sigma @>>> U \\ @VVV @V{\pi}VV \\ C @>{\sigma}>> \mathbf{DT} \end{CD}

Category-theoretically speaking, this U_\sigma is the pullback of \pi along \sigma.

Fixing the universal type assignment \pi, Spivak notes that with an appropriate notion of morphism between schemas \sigma : C \rightarrow \mathbf{DT} and \sigma' : C' \rightarrow \mathbf{DT}, we have a category of schemas of type \pi. Next up will be to build on this to form a category of data tables …