R-based game “2048” with a simple API for ML benchmarking

For a long time I have been on the lookout for a ready-to-use R-based API for testing ML algorithms. My ideal “tool” would be an R-based API for the game “go” (“weiqi” in Chinese and “baduk” in Korean), but I have found none so far.

Since writing a rules engine is quite time-consuming, I did not venture to develop one until recently, when I stumbled upon the game “2048.” Many people use this simple game to benchmark their machine learning algorithms, and YouTube has so many machine learning demos using “2048” that I won’t even list them here. To my surprise, all the videos I watched seem to use a JavaScript-based front end for testing. A JavaScript connection seemed like an overcomplication to me, and an additional bottleneck.

So after spending an hour building my own R-based version of “2048,” I decided to see what other options were available; and I found this nice implementation in C: https://github.com/mevdschee/2048.c by Maurits van der Schee. Porting from C saved me a lot of time. Many thanks to the original author.

I added a simple API for attaching the game code to an ML algorithm. I will just note that each game may be run in its own R environment, which allows for easy setup of parallel computing using the standard “foreach” package. Other than that, the whole code is a little over 400 lines, so everything about its usage should be self-explanatory.

The program can be run in an interactive text mode as well (using `main_interactive()`). However, I found no way to port text coloring to the RStudio console. Still, as this code is meant to be played by machines rather than human users, I suppose I achieved my goal.

You can download the R version of the code from my GitHub repository: github.com/cloudcell/2048_4ML. Comments and suggestions are always welcome.


Usage

# interactive mode
p.env <- new.env()
main_interactive(p.env)

# using for benchmarking
p.env <- new.env()
main_ML_init(p.env)
main_ML_run(p.env, m = "L")
# use p.env$board to retrieve board state
p.env$board
main_ML_run(p.env, m = "D", show_board = FALSE)
p.env$board
# …
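
The point about one environment per game can be illustrated with a small self-contained sketch. The functions `toy_init()` and `toy_run()` below are hypothetical stand-ins for `main_ML_init()` and `main_ML_run()`, written only so that the example runs without the game code; the move codes "R" and "U" are likewise assumptions (only "L" and "D" appear above).

```r
# Hypothetical stand-ins for the game API, so the sketch runs on its own.
toy_init <- function(env) {
    env$board <- matrix(0L, nrow = 4, ncol = 4)
    env$moves <- 0L
    invisible(env)
}
toy_run <- function(env, m) {
    # A real engine would slide and merge tiles here; we only count moves.
    env$moves <- env$moves + 1L
    env$last_move <- m
    invisible(env)
}

# One environment per game keeps each game's state isolated from the others.
play_one <- function(seed, n_moves = 10L) {
    set.seed(seed)
    env <- new.env()
    toy_init(env)
    for (m in sample(c("L", "R", "U", "D"), n_moves, replace = TRUE)) {
        toy_run(env, m = m)
    }
    env$moves
}

# Three independent games; the loop could be handed to a parallel backend.
results <- vapply(1:3, play_one, integer(1))
```

Because `play_one()` creates and discards its own environment, nothing is shared between games, which is what makes the loop a natural candidate for “foreach” or any other parallel apply.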

 

Introducing Package ‘fuzztest’

This is a tool for code fault analysis. I built it to automate the most boring part of my debugging process.

The package automates test setup and logging, and visualizes function exit states in a way that simplifies identifying the root causes of software defects. Fuzzing is implemented by random generation of input parameters, as shown in the demo below. Finally, even though the goal was not to create yet another unit testing package, one can use it as such, since testing every combination of input values can potentially reach 100% code coverage with minimal effort.

The package tests all possible combinations of input parameters and produces statistics and visuals. If you have some specific requirements, you can simply build a wrapper function that catches output you are interested in and generates an error if a required condition is not met. Then submit your wrapper function for testing. You can even compare current output values against values recorded in a log during a ‘reference’ test run, effectively making a comprehensive unit test.
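
The wrapper idea can be sketched in plain R as follows. The function under test (`safe_ratio`) and the requirement (the result must be finite) are made up for illustration; nothing here is part of fuzztest itself.

```r
# Hypothetical function under test: it never stops by itself,
# but it can silently return Inf or NaN.
safe_ratio <- function(x, y) x / y

# Wrapper that turns an unmet requirement into an error, so a harness
# that records errors as 'FAIL' will flag the offending inputs.
safe_ratio_checked <- function(x, y) {
    out <- safe_ratio(x, y)
    if (!is.finite(out)) stop("requirement not met: result is not finite")
    out
}

ok  <- safe_ratio_checked(1, 2)
bad <- tryCatch(safe_ratio_checked(1, 0),
                error = function(e) conditionMessage(e))
```

The wrapper's name, rather than the original function's, would then presumably be the one handed to the test run, in the same way the demo passes `FUN="fuzzdemofunc"`.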

The package can be installed from here: https://github.com/cloudcell/fuzztest/.

Below is a presentation of a built-in demo. You can run it using `demo(fuzzdemo)`.

I will be grateful for your comments and suggestions.

 


 

> demo(fuzzdemo)

    r <- list()
    r$x <- c(0)
    r$y <- c(0)
    r$option <- c("a", "b", "c")
    r$suboption <- c("a", "b", "c","d")
    
    generate.argset(arg_register = r, display_progress=TRUE)
    apply.argset(FUN="fuzzdemofunc")
    test_summary()
    plot_tests()
    plot_tests(fail = F)    
    plot_tests(pass = F)
===LOG OMITTED===

Fuzztest: Argument-Option Combination Results
===================================================
   ARG~OPT     Arg Name     PASS    FAIL    FAIL%
---------------------------------------------------
   1 ~    1    x               5       7     58.3
   2 ~    1    y               5       7     58.3
   3 ~    1    option          4       0      0.0
   3 ~    2    option          1       3     75.0
   3 ~    3    option          0       4    100.0
   4 ~    1    suboption       2       1     33.3
   4 ~    2    suboption       1       2     66.7
   4 ~    3    suboption       1       2     66.7
   4 ~    4    suboption       1       2     66.7
===================================================

Fuzztest: Summary
========================================================================
  Arg Name            Failure Rate Contribution, % (Max - Min)          
------------------------------------------------------------------------
          x    0.0  '                                                  '
          y    0.0  '                                                  '
     option  100.0  '**************************************************'
  suboption   33.3  '*****************                                 '
========================================================================

The summary shows that the argument 'option' explains the most
variability in the outcome. So let's concentrate on the arg. 'option.'
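
The 'Failure Rate Contribution' column can be reproduced from the combination table by hand. The sketch below is a reading of the '(Max - Min)' column header using base R only; it is not taken from the package source.

```r
# PASS/FAIL counts copied from the combination table above.
res <- data.frame(
    arg  = c("x", "y", rep("option", 3), rep("suboption", 4)),
    pass = c(5, 5, 4, 1, 0, 2, 1, 1, 1),
    fail = c(7, 7, 0, 3, 4, 1, 2, 2, 2)
)
res$fail_pct <- 100 * res$fail / (res$pass + res$fail)

# Contribution = spread of failure rates across the options of one argument.
contribution <- tapply(res$fail_pct, res$arg, function(p) max(p) - min(p))
round(contribution, 1)
```

An argument with a single tested value (here 'x' and 'y') always gets 0, because its maximum and minimum failure rates coincide.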

The detailed statistics table shows that most failures occur when  
value #3 is selected within the argument 'option'. At the same time, 
a test log (omitted here) shows that the types of errors are mixed. 
For now, however, let's assume that fixing the bugs related to  
control flow is more important. 

The following three graphs will demonstrate how the data above
can be represented visually. Notice that some lines are grouped
when they intersect vertical axes. The groups correspond to specific
options and are ordered from the bottom of the chart to the top: 
i.e. the first grouping of lines at axis 'suboption' (at the bottom)
corresponds to value 'a', the next one up is suboption 'b', and so on.
In case an argument has only one value in the test, the whole
group of lines will be evenly spread from the bottom to the top of 
the chart, as is the case for arguments 'x' and 'y'.

* All test cases:

[Figure: plot_tests() output, all test cases]

One can also selectively display only passing or failing tests
as will be shown next.

* Only 'passing' test cases:
[Figure: plot_tests(fail = F) output, passing test cases only]

* Only 'failing' test cases:
[Figure: plot_tests(pass = F) output, failing test cases only]

Let's assume all the control flow related bugs discussed above 
are fixed now. To make this assumption "work" during testing
we will simply choose a combination of options that will not 
cause the demo function to produce 'fail' states shown above. 
Such a combination could be {x=0, y=0, option='a', suboption='a'}.

Now we will concentrate on the numeric part of the test.
There are two main testing approaches:
 1. Create an evenly spaced sequence of values for each parameter 
    (x and y) from lowest to highest and let the argument set generator
    combine and test these values. This approach has an advantage 
    for more intuitive visualization as sequences of values
    for testing will be aligned with the vertical axis. For example, 
    if we create a test sequence [-10;+10] for argument 'x', 
    visualized test results will list those from 'Min' to 'Max'. 
    So finding simple linear dependencies that cause errors will 
    be easier than with a random set of values (the second
    approach below).
 2. Generate random parameters for selected arguments and let the test
    framework test all possible parameter combinations.
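
How many test cases each approach produces can be checked with base R. In the sketch below `expand.grid()` is only a stand-in for the argument set generator (an assumption about its effect, not its implementation); the value lists mirror the ones used in the two tests that follow.

```r
# Approach 1: evenly spaced values for x and y, one value per option.
ordered <- list(
    x = seq(from = -5, to = 5, length.out = 11),
    y = seq(from = -5, to = 5, length.out = 11),
    option = "a",
    suboption = "a"
)

# Approach 2: random values for x and y, all suboptions of option "b".
set.seed(0)
random <- list(
    x = runif(15, min = -10, max = 10),
    y = runif(15, min = -10, max = 10),
    option = "b",
    suboption = c("a", "b", "c", "d")
)

# One row per combination of values, i.e. one row per test case.
n_ordered <- nrow(expand.grid(ordered, stringsAsFactors = FALSE))  # 11 * 11
n_random  <- nrow(expand.grid(random,  stringsAsFactors = FALSE))  # 15 * 15 * 4
```

The counts grow multiplicatively with every added value, which is worth keeping in mind when choosing how many random values to generate per argument.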
  
  
The First Approach: Ordered Test Sequences  
    r <- list()
    r$x <- c(seq(from=-5, to=5, length.out = 11))
    r$y <- c(seq(from=-5, to=5, length.out = 11))
    r$option <- c("a")
    r$suboption <- c("a")
    
    generate.argset(arg_register = r, display_progress=TRUE)
    apply.argset(FUN="fuzzdemofunc")
    test_summary()
    plot_tests()
===LOG OMITTED===

Fuzztest: Argument-Option Combination Results
===================================================
   ARG~OPT     Arg Name     PASS    FAIL    FAIL%
---------------------------------------------------
   1 ~    1    x               9       2     18.2
   1 ~    2    x              10       1     9.09
   1 ~    3    x               9       2     18.2
   1 ~    4    x              10       1     9.09
   1 ~    5    x               9       2     18.2
   1 ~    6    x              10       1     9.09
   1 ~    7    x               9       2     18.2
   1 ~    8    x              10       1     9.09
   1 ~    9    x              10       1     9.09
   1 ~   10    x              10       1     9.09
   1 ~   11    x              10       1     9.09
   2 ~    1    y              11       0      0.0
   2 ~    2    y              10       1     9.09
   2 ~    3    y              10       1     9.09
   2 ~    4    y              10       1     9.09
   2 ~    5    y              10       1     9.09
   2 ~    6    y               9       2     18.2
   2 ~    7    y               9       2     18.2
   2 ~    8    y               9       2     18.2
   2 ~    9    y               9       2     18.2
   2 ~   10    y              10       1     9.09
   2 ~   11    y               9       2     18.2
   3 ~    1    option        106      15     12.4
   4 ~    1    suboption     106      15     12.4
===================================================

Fuzztest: Summary
========================================================================
  Arg Name            Failure Rate Contribution, % (Max - Min)          
------------------------------------------------------------------------
          x   9.09  '*****                                             '
          y   18.2  '*********                                         '
     option    0.0  '                                                  '
  suboption    0.0  '                                                  '
========================================================================

The test summary shows that argument 'y' contributes to 
failure the most.

What about the chart?

[Figure: plot_tests() output for the ordered test sequences]

Now one can clearly see two linear relationships between
'x' and 'y'. These correspond to 'numeric bugs' #1NC and #4NC
(please see details in the file 'include_fuzzdemofunc.R').
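
The two linear relationships can be recovered directly from the conditions in the demo function (listed at the end of this post). The sketch below re-evaluates only those two conditions on the same 11 x 11 grid, for option 'a' and suboption 'a'; it does not call the package.

```r
# The same grid of values as in the test: x and y from -5 to 5.
grid <- expand.grid(x = -5:5, y = -5:5)

# Conditions copied from fuzzdemofunc.
bug_1nc <- abs(grid$x - grid$y + 1) < 0.01      # line y = x + 1
bug_4nc <- abs(grid$x - 2 * grid$y + 5) < 0.01  # line y = (x + 5) / 2

n_1nc   <- sum(bug_1nc)            # 10 grid points on the first line
n_4nc   <- sum(bug_4nc)            # 6 grid points on the second line
n_total <- sum(bug_1nc | bug_4nc)  # 15, since the lines cross at (3, 4)
```

The total of 15 agrees with the FAIL count reported for 'option' and 'suboption' in the table above.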

Let's assume the previously discovered bugs have been fixed.
So we will again choose a different combination of input parameters
for arguments 'option' and 'suboption' for the next test.
  
  
The Second Approach: Random Test Sequences  
  
It makes no sense to test options randomly, as all those 
combinations of values will be tested anyway. So the test 
will be conducted for numeric arguments only.

This test has 900 cases and might take a couple of minutes,
so you have time to pour yourself a cup of coffee: (_)]...

    set.seed(0)
    r <- list()
    r$x <- runif(15, min=-10, max=10)
    r$y <- runif(15, min=-10, max=10)
    r$option <- c("b")
    r$suboption <- c("a","b","c","d")
    
    generate.argset(arg_register = r, display_progress=TRUE)
    apply.argset(FUN="fuzzdemofunc")
    test_summary()
    plot_tests()
===LOG OMITTED===

Fuzztest: Argument-Option Combination Results
===================================================
   ARG~OPT     Arg Name     PASS    FAIL    FAIL%
---------------------------------------------------
   1 ~    1    x              35      25     41.7
   1 ~    2    x              35      25     41.7
   1 ~    3    x              35      25     41.7
   1 ~    4    x              35      25     41.7
   1 ~    5    x              35      25     41.7
   1 ~    6    x              35      25     41.7
   1 ~    7    x              35      25     41.7
   1 ~    8    x              35      25     41.7
   1 ~    9    x              35      25     41.7
   1 ~   10    x              35      25     41.7
   1 ~   11    x              35      25     41.7
   1 ~   12    x              35      25     41.7
   1 ~   13    x              35      25     41.7
   1 ~   14    x              35      25     41.7
   1 ~   15    x              35      25     41.7
   2 ~    1    y              45      15     25.0
   2 ~    2    y              30      30     50.0
   2 ~    3    y              30      30     50.0
   2 ~    4    y              45      15     25.0
   2 ~    5    y              30      30     50.0
   2 ~    6    y              45      15     25.0
   2 ~    7    y              45      15     25.0
   2 ~    8    y              30      30     50.0
   2 ~    9    y              30      30     50.0
   2 ~   10    y              30      30     50.0
   2 ~   11    y              30      30     50.0
   2 ~   12    y              30      30     50.0
   2 ~   13    y              30      30     50.0
   2 ~   14    y              30      30     50.0
   2 ~   15    y              45      15     25.0
   3 ~    1    option        525     375     41.7
   4 ~    1    suboption     225       0      0.0
   4 ~    2    suboption     195      30     13.3
   4 ~    3    suboption       0     225    100.0
   4 ~    4    suboption     105     120     53.3
===================================================

Fuzztest: Summary
========================================================================
  Arg Name            Failure Rate Contribution, % (Max - Min)          
------------------------------------------------------------------------
          x    0.0  '                                                  '
          y   25.0  '************                                      '
     option    0.0  '                                                  '
  suboption  100.0  '**************************************************'
========================================================================

The test table shows that suboption #3 ('c') is always failing.
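
Why suboption 'c' fails for every input can be seen by tracing the demo function (listed at the end of this post) for option 'b'. The helper `trace_b_c()` below is a hypothetical condensation of that branch, written for illustration only.

```r
# Condensed trace of fuzzdemofunc for option = "b", suboption = "c".
trace_b_c <- function(x, y) {
    x <- 1                  # option "b" overwrites x
    y <- 1                  # suboption "c" overwrites y
    abs(x %% 5 - y) < 0.01  # condition that raises demo bug #2NC
}

# Whatever values are passed in, the condition holds.
set.seed(0)
always_fails <- all(mapply(trace_b_c,
                           runif(15, min = -10, max = 10),
                           runif(15, min = -10, max = 10)))
```

Both inputs are overwritten before the numeric check, so the failure does not depend on 'x' or 'y' at all.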

Let's see if the visual approach provides a better perspective.

[Figure: plot_tests() output for the random test sequences]

This graph has a confusing order of axes at this point.
An axis that has only one option should either be hidden or placed
at an edge of the chart so relations with other parameters could
be visible. To reorder axes, for simplicity, we will quickly create 
a smaller test with a different sequence of arguments, which will 
change the sequence of axes.

    set.seed(0)
    r <- list()
    r$x <- runif(5, min=-10, max=10)
    r$y <- runif(5, min=-10, max=10)
    r$suboption <- c("a","b","c","d")
    r$option <- c("b")
    
    generate.argset(arg_register = r, display_progress=TRUE)
    apply.argset(FUN="fuzzdemofunc")
    test_summary()
    plot_tests()
===LOG OMITTED===

Fuzztest: Argument-Option Combination Results
===================================================
   ARG~OPT     Arg Name     PASS    FAIL    FAIL%
---------------------------------------------------
   1 ~    1    x              12       8     40.0
   1 ~    2    x              12       8     40.0
   1 ~    3    x              12       8     40.0
   1 ~    4    x              12       8     40.0
   1 ~    5    x              12       8     40.0
   2 ~    1    y              10      10     50.0
   2 ~    2    y              15       5     25.0
   2 ~    3    y              15       5     25.0
   2 ~    4    y              10      10     50.0
   2 ~    5    y              10      10     50.0
   3 ~    1    suboption      25       0      0.0
   3 ~    2    suboption      15      10     40.0
   3 ~    3    suboption       0      25    100.0
   3 ~    4    suboption      20       5     20.0
   4 ~    1    option         60      40     40.0
===================================================

Fuzztest: Summary
========================================================================
  Arg Name            Failure Rate Contribution, % (Max - Min)          
------------------------------------------------------------------------
          x    0.0  '                                                  '
          y   25.0  '************                                      '
  suboption  100.0  '**************************************************'
     option    0.0  '                                                  '
========================================================================

[Figure: plot_tests() output for the reduced test with reordered axes]

The textual test summary shows the same pattern as in the previous 
test. Also, a reduced set of test cases produced a more transparent
representation of test results without losing important details.

There are many ways to proceed from here:
* if some 'error' states are valid, one can exclude them from tests 
  using the 'subset' argument of apply.argset().
* if bugs are trivial, one can eliminate them one by one.
* if faults are intractable, one can start with narrowing down the
  range of input parameters and further analyze function behavior.

-------------
 End of Demo 
-------------


The following is the test function used in the demo

#' Generates errors for several combinations of input parameters to test the
#' existing and emerging functionality of the package
#'
#' Whenever options lead the control flow within a function to a 'demo bug', 
#' the function stops and the test framework records a 'FAIL' result.
#' Upon successful completion, the function returns a numeric value
#' to the caller.
#'
#' @param x any numeric scalar value (non-vector)
#' @param y any numeric scalar value (non-vector)
#' @param option any character value from "a", "b", "c"
#' @param suboption any character value from "a", "b", "c", "d"
#' 
#' @author cloudcell
#' 
#' @export
fuzzdemofunc <- function(x, y, option, suboption)
{
    tmp1 <- 0
    switch(option,
           "a"={
               switch(suboption,
                      "a"={                                    },
                      "b"={                                    },
                      "c"={ if(x + y <0) stop("demo bug #1CF (control flow)") },
                      "d"={                                    },
                      { stop("Wrong suboption (valid 'FAIL')") }
               )
               if(abs(x-y+1)<0.01) stop("demo bug #1NC (numeric calc.)")
           },
           "b"={
               x <- 1
               switch(suboption,
                      "a"={ x <- x*1.5                         },
                      "b"={ x <- y                             },
                      "c"={ y <- 1                             },
                      "d"={ if(x>y) stop("demo bug #2CF (control flow)") },
                      { stop("Wrong suboption (valid 'FAIL')") }
               )
               if(abs(x %% 5 - y)<0.01) stop("demo bug #2NC (numeric calc.)")
           },
           "c"={
               switch(suboption,
                      "a"={ stop("demo bug #3CF (control flow)") },
                      "b"={                                    },
                      "c"={                                    },
                      "d"={  rm(tmp1)                          },
                      { stop("Wrong suboption (valid 'FAIL')") }
               )
               if(abs(x %/% 5 - y)<0.01) stop("demo bug #3NC  (numeric calc.)")
           },
           { stop("Wrong option (valid 'FAIL')") }
    )
    
    if(!exists("tmp1")) stop("demo bug #4CF (control flow)")
    
    result <- x - y*2 + 5
    
    if(abs(result)<0.01) stop("demo bug #4NC  (numeric calc.)")
    
    result
}

			

Testing R Code

Among various ways to test R code on GitHub / Travis / Codecov, there exist four main approaches:

  1. use RUnit package
  2. use testthat package
  3. use one’s own custom function (what I’ve been doing so far)
  4. save test output as reference & compare modified code output against it

After reading this post [http://yihui.name/en/2013/09/testing-r-packages/], I realized that saving reference output and comparing the output of modified code against it does not allow TDD (test-driven development), so the tests will always “drag behind” the development process.

I am currently using option #3. However, this approach has obvious shortcomings in large projects. Since I use R as well as other languages, my choice naturally falls on RUnit (an xUnit framework): multiple languages use this format, which will make life easier in the long run.
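
For reference, option #4 boils down to something like the following base-R sketch. The function `normalize()` and the file name are invented for the example.

```r
# Hypothetical function whose behaviour we want to pin down.
normalize <- function(v) (v - min(v)) / (max(v) - min(v))

ref_file <- file.path(tempdir(), "normalize_reference.rds")
input <- c(2, 4, 6, 10)

# Reference run: record the output of the current (trusted) code.
saveRDS(normalize(input), ref_file)

# Later run: compare the output of the modified code against the record.
matches_reference <- isTRUE(all.equal(normalize(input), readRDS(ref_file)))
```

The reference file can only be produced once working code exists, which is exactly why this style of test cannot be written before the code.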

Key points about the testing workflow:

  1. install the package
  2. test the package
  3. testing in development mode is a separate matter and won’t be my primary concern

File locations (using Dirk Eddelbuettel’s github repo as an example):

The “package_root/tests” folder contains only the file “doRUnit.R”, which launches the tests {launcher example}.

The “package_root/inst/unitTests” folder contains a file with the primary test suite builder code {suite_builder example} and the test code files {test files examples}. The “unitTests” folder is moved into the package root folder on installation and becomes accessible to the ‘launcher’ R code sitting in the “package_root/tests” folder.
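
A condensed sketch of how the suite builder and launcher fit together is given below. It assumes the RUnit package is installed (the code is skipped otherwise); the directory, file, and suite names are made up, and a temporary folder stands in for the installed “unitTests” folder.

```r
# Sketch only: runs when RUnit is available; all names are illustrative.
if (requireNamespace("RUnit", quietly = TRUE)) {
    library(RUnit)

    # Stand-in for the installed "unitTests" folder.
    test_dir <- file.path(tempdir(), "unitTests")
    dir.create(test_dir, showWarnings = FALSE)

    # A test file; functions named test.* are collected by the suite.
    writeLines(c(
        "test.addition <- function() {",
        "    checkEquals(2 + 2, 4)",
        "}"
    ), file.path(test_dir, "runit.basic.R"))

    # Suite builder and launcher, condensed into a few calls.
    suite <- defineTestSuite(
        name = "demo suite",
        dirs = test_dir,
        testFileRegexp = "^runit.+\\.[rR]$",
        testFuncRegexp = "^test.+"
    )
    result <- runTestSuite(suite)
    printTextProtocol(result)
    errs <- getErrors(result)
}
```

In an installed package the launcher would typically locate the folder with `system.file("unitTests", package = "yourpkg")`, where "yourpkg" is a placeholder for the actual package name.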

References for RUnit:

PS: A good review of testing packages (pros, cons, usage):