Random number generation is an important tool in computer science algorithms, statistics, simulation, numerical methods and more. It is therefore important to understand the different approaches to random number generation and their limitations. In this article we will look at basic pseudo random number generators, cryptographically secure pseudo random number generators, quasi random number generators and true random number generators. The related topic of parallel/distributed random number generation will also be addressed.
Pseudo random number generators (PRNGs) are completely deterministic. They are very fast, have very long periods and pass most statistical tests for randomness. They are suitable for most applications, with the exception of cryptographic use cases. Once the seed is set, the same sequence can be regenerated, which is useful for reproducible research. R's built-in random number generation uses the Mersenne Twister PRNG by default.
RNGkind()
## [1] "Mersenne-Twister" "Inversion"
set.seed(123)
sample(10)
## [1] 3 8 4 7 6 1 10 9 2 5
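As a minimal check of reproducibility, re-setting the same seed regenerates exactly the same sequence:
set.seed(123)
a <- sample(10)
set.seed(123)
b <- sample(10)
identical(a, b)  # TRUE: the same seed yields the same sequence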
Cryptographically secure pseudo random number generators (CSPRNGs) are used in cryptographic applications where unpredictability is essential: it should be infeasible to observe the generated numbers and deduce the internal state (and hence the next random number). In R, packages like randaes and openssl support generation of cryptographically secure random numbers.
First, using the randaes package:
RNGkind("user")
RNGkind()
## [1] "user-supplied" "Inversion"
set.seed(42)
runif(10)
## [1] 0.6568692 0.3456236 0.2441115 0.2374188 0.6248742 0.7974705 0.3529777
## [8] 0.3157614 0.4291010 0.5479084
Next, using the openssl package:
library(openssl)
library(magrittr)  # for the pipe operator
# produce integers between 0 and 255
rand_bytes(10) %>% as.numeric()
## [1] 250 123 1 205 247 67 106 92 51 240
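The raw bytes can also be combined into uniform draws on [0, 1); the scaling below is only an illustrative sketch built on top of rand_bytes(), not a function provided by openssl:
# combine 4 random bytes per draw into a 32-bit integer, then scale to [0, 1)
bytes <- rand_bytes(4 * 10)
ints <- colSums(matrix(as.numeric(bytes), nrow = 4) * 256^(0:3))
ints / 2^32  # 10 uniform draws in [0, 1)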
Low discrepancy sequences, or quasi random numbers, have the low discrepancy property, which means they fill Euclidean space uniformly (equidistribution). However, they are not random in nature. Applying suitable scrambling or shuffling methods yields sequences that are both equidistributed and randomized, which in Monte Carlo applications effectively results in variance reduction.
First, using the randtoolbox package:
library(randtoolbox)
sobol(10, dim = 1, init = TRUE)
## [1] 0.5000 0.7500 0.2500 0.3750 0.8750 0.6250 0.1250 0.1875 0.6875 0.9375
Next, using the qrng package:
library(qrng)  # qrng's sobol() masks randtoolbox's version
sobol(10, d = 1, randomize = TRUE)
## [1] 0.129276011 0.629276011 0.879276011 0.379276011 0.254276011
## [6] 0.754276011 0.504276011 0.004276011 0.066776011 0.566776011
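To see the variance reduction in practice, one can compare a plain Monte Carlo estimate with a quasi Monte Carlo estimate of a simple integral. A minimal sketch using a hypothetical example, the integral of x^2 over [0, 1] (true value 1/3), with randtoolbox's Sobol sequence:
n <- 1000
mean(runif(n)^2)                        # plain Monte Carlo estimate
mean(randtoolbox::sobol(n, dim = 1)^2)  # quasi Monte Carlo estimate, typically closer to 1/3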
True random number generators (TRNGs) produce true random numbers. They are usually based on external entropy or noise sources. The website random.org uses atmospheric noise to generate random numbers. Another source is the rdrand CPU instruction in Intel microprocessors, which uses an on-chip hardware random number generator. First, using the random.org website (via the random package):
library(random)
randomNumbers(3, min=1, max=5, col=1) %>% as.vector()
## [1] 1 4 2
Next, using the rdrand instruction via the Rrdrand package:
library(Rrdrand) # loading the library masks the default RNG
hasRDRAND()
## [1] TRUE
RNGkind()
## [1] "user-supplied" "Inversion"
runif(10)
## [1] 0.81441656 0.57147493 0.25714639 0.18172554 0.31314914 0.02383602
## [7] 0.15334052 0.63977092 0.57000914 0.62771103
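Since loading Rrdrand replaces the active generator, the default can be restored explicitly once hardware random numbers are no longer needed:
RNGkind("default")  # restore R's default Mersenne-Twister generator
RNGkind()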
For massive simulations and similar applications, distributed PRNGs are required. The challenge is to minimize inter-stream and intra-stream correlation between the different PRNGs running in parallel, which needs special care in the choice of PRNG and seeding. In R, packages like rstream, doRNG, sitmo and harvestr can be used. External tools like JAGS can also be used, though it is not intended for this purpose. A few examples follow.
First, using the doRNG package:
library(foreach)
library(doRNG)
set.seed(123)
foreach(i=1:5) %dorng% {runif(3)} %>% unlist %>% matrix(nrow=5, byrow=TRUE)
## [,1] [,2] [,3]
## [1,] 0.173157076 0.3413007 0.1556812
## [2,] 0.075833824 0.3474639 0.2967710
## [3,] 0.545505104 0.7832514 0.2321153
## [4,] 0.008997922 0.4626108 0.1891243
## [5,] 0.604493934 0.3919720 0.4686865
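Without a registered parallel backend, foreach simply falls back to sequential execution. A minimal sketch of registering a backend with doParallel (the choice of two workers below is illustrative):
library(doParallel)
cl <- makeCluster(2)                       # two worker processes
registerDoParallel(cl)
set.seed(123)
res <- foreach(i=1:5) %dorng% {runif(3)}   # reproducible streams across workers
stopCluster(cl)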
Next, using the harvestr package:
library(harvestr)
seeds <- gather(2, seed=123)
farm(seeds, {runif(7)}) %>% unlist %>% matrix(nrow=7, byrow=TRUE)
## [,1] [,2]
## [1,] 0.3411064 0.9712727
## [2,] 0.8135272 0.1827527
## [3,] 0.1934177 0.6406949
## [4,] 0.9284401 0.3123993
## [5,] 0.9870978 0.6675080
## [6,] 0.7083334 0.4872546
## [7,] 0.2387469 0.5583171
Finally, using JAGS and its multicore capabilities (set n.chains to the number of cores):
m = "model {
x ~ dunif(0,1)
}"
jgs <- jags.model(file = textConnection(m), n.adapt = 1000, n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 0
## Unobserved stochastic nodes: 1
## Total graph size: 3
##
## Initializing model
update(jgs)
out <- jags.samples(jgs, c('x'), 6)
out$x %>% matrix(nrow=6, byrow=TRUE)
## [,1] [,2] [,3] [,4]
## [1,] 0.07955224 0.5099825 0.2331965 0.32239254
## [2,] 0.98784894 0.7938703 0.2324782 0.09904735
## [3,] 0.71834714 0.2938549 0.8848708 0.45076995
## [4,] 0.89387818 0.9556509 0.8283376 0.66922644
## [5,] 0.05149423 0.1390441 0.9891570 0.35324055
## [6,] 0.04159588 0.2327182 0.6425525 0.66206344
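JAGS also allows each chain to be given its own RNG and seed through the inits argument, which makes multi-chain runs reproducible. A minimal sketch, assuming the base JAGS RNGs are available (the names and seeds below are illustrative):
inits <- list(
  list(.RNG.name = "base::Wichmann-Hill",        .RNG.seed = 1),
  list(.RNG.name = "base::Marsaglia-Multicarry", .RNG.seed = 2),
  list(.RNG.name = "base::Super-Duper",          .RNG.seed = 3),
  list(.RNG.name = "base::Mersenne-Twister",     .RNG.seed = 4)
)
jgs2 <- jags.model(file = textConnection(m), inits = inits, n.adapt = 1000, n.chains = 4)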