Random number generation is an important tool in computer science algorithms, statistics, simulation, numerical methods and more. It is therefore important to understand the different approaches to random number generation and their limitations. In this article we will look at basic pseudo random number generators, cryptographically secure pseudo random number generators, quasi random number generators and true random number generators. The related topic of parallel/distributed random number generation will also be addressed.
Pseudo random number generators (PRNGs) are completely deterministic. They are very fast, have very long periods and pass most statistical tests for randomness. They are suitable for most applications, with the exception of cryptographic use cases. Once the seed is set, the same sequence can be regenerated, which is useful for reproducible research. R's built-in random number generation uses the Mersenne Twister PRNG by default.
RNGkind()
## [1] "Mersenne-Twister" "Inversion"
set.seed(123)
sample(10)
## [1] 3 8 4 7 6 1 10 9 2 5
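As a minimal check of reproducibility, re-setting the same seed regenerates exactly the same sequence:
set.seed(123)
a <- sample(10)
set.seed(123)
b <- sample(10)
identical(a, b)  # TRUE: the same seed yields the same sequence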
Cryptographically secure pseudo random number generators (CSPRNGs) are used in cryptographic applications where unpredictability is essential: it should be infeasible to observe the generated numbers and deduce the internal state (and hence the next random number). In R, packages like randaes and openssl support generation of cryptographically secure random numbers.
First, using the randaes package:
RNGkind("user")
RNGkind()
## [1] "user-supplied" "Inversion"
set.seed(42)
runif(10)
## [1] 0.6568692 0.3456236 0.2441115 0.2374188 0.6248742 0.7974705 0.3529777
## [8] 0.3157614 0.4291010 0.5479084
Next, using the openssl package:
library(openssl)
library(magrittr)  # for the pipe operator
# produce integers between 0 and 255
rand_bytes(10) %>% as.numeric()
## [1] 250 123 1 205 247 67 106 92 51 240
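The raw bytes can also be combined into uniform draws on [0, 1); the scaling below is only an illustrative sketch built on top of rand_bytes(), not a function provided by openssl:
# combine 4 random bytes per draw into a 32-bit integer, then scale to [0, 1)
bytes <- rand_bytes(4 * 10)
ints <- colSums(matrix(as.numeric(bytes), nrow = 4) * 256^(0:3))
ints / 2^32  # 10 uniform draws in [0, 1)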
Low discrepancy sequences, or quasi random numbers, have the low discrepancy property, which means they fill Euclidean space uniformly (equidistribution). However, they are not random in nature. Applying suitable scrambling or shuffling methods yields sequences that are both equidistributed and randomized, which in Monte Carlo applications effectively results in variance reduction.
First, using the randtoolbox package:
library(randtoolbox)
sobol(10, dim = 1, init = TRUE)
## [1] 0.5000 0.7500 0.2500 0.3750 0.8750 0.6250 0.1250 0.1875 0.6875 0.9375
Next, using the qrng package:
library(qrng)  # qrng's sobol() masks randtoolbox's version
sobol(10, d = 1, randomize = TRUE)
## [1] 0.129276011 0.629276011 0.879276011 0.379276011 0.254276011
## [6] 0.754276011 0.504276011 0.004276011 0.066776011 0.566776011
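To see the variance reduction in practice, one can compare a plain Monte Carlo estimate with a quasi Monte Carlo estimate of a simple integral. A minimal sketch using a hypothetical example, the integral of x^2 over [0, 1] (true value 1/3), with randtoolbox's Sobol sequence:
n <- 1000
mean(runif(n)^2)                        # plain Monte Carlo estimate
mean(randtoolbox::sobol(n, dim = 1)^2)  # quasi Monte Carlo estimate, typically closer to 1/3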
True random number generators (TRNGs) produce true random numbers. They are usually based on external entropy or noise sources. The website random.org uses atmospheric noise to generate random numbers. Another source is the rdrand CPU instruction in Intel microprocessors, which uses an on-chip hardware random number generator. First, using the random.org website (via the random package):
library(random)
randomNumbers(3, min=1, max=5, col=1) %>% as.vector()
## [1] 1 4 2
Next, using the rdrand instruction via the Rrdrand package:
library(Rrdrand) # loading the library masks the default RNG
hasRDRAND()
## [1] TRUE
RNGkind()
## [1] "user-supplied" "Inversion"
runif(10)
## [1] 0.81441656 0.57147493 0.25714639 0.18172554 0.31314914 0.02383602
## [7] 0.15334052 0.63977092 0.57000914 0.62771103
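Since loading Rrdrand replaces the active generator, the default can be restored explicitly once hardware random numbers are no longer needed:
RNGkind("default")  # restore R's default Mersenne-Twister generator
RNGkind()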
For massive simulations and similar applications, distributed PRNGs are required. The challenge is to minimize inter-stream and intra-stream correlation between the different PRNGs running in parallel, which needs special care in the choice of PRNG and seeding. In R, packages like rstream, doRNG, sitmo and harvestr can be used. External tools like JAGS can also be used, though it is not intended for this purpose. A few examples follow.
First, using the doRNG package:
library(foreach)
library(doRNG)
set.seed(123)
foreach(i=1:5) %dorng% {runif(3)} %>% unlist %>% matrix(nrow=5, byrow=TRUE)
## [,1] [,2] [,3]
## [1,] 0.173157076 0.3413007 0.1556812
## [2,] 0.075833824 0.3474639 0.2967710
## [3,] 0.545505104 0.7832514 0.2321153
## [4,] 0.008997922 0.4626108 0.1891243
## [5,] 0.604493934 0.3919720 0.4686865
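Without a registered parallel backend, foreach simply falls back to sequential execution. A minimal sketch of registering a backend with doParallel (the choice of two workers below is illustrative):
library(doParallel)
cl <- makeCluster(2)                       # two worker processes
registerDoParallel(cl)
set.seed(123)
res <- foreach(i=1:5) %dorng% {runif(3)}   # reproducible streams across workers
stopCluster(cl)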
Next, using the harvestr package:
library(harvestr)
seeds <- gather(2, seed=123)
farm(seeds, {runif(7)}) %>% unlist %>% matrix(nrow=7, byrow=TRUE)
## [,1] [,2]
## [1,] 0.3411064 0.9712727
## [2,] 0.8135272 0.1827527
## [3,] 0.1934177 0.6406949
## [4,] 0.9284401 0.3123993
## [5,] 0.9870978 0.6675080
## [6,] 0.7083334 0.4872546
## [7,] 0.2387469 0.5583171
Finally, using JAGS and its multicore capabilities (set n.chains to the number of cores):
m = "model {
x ~ dunif(0,1)
}"
jgs <- jags.model(file = textConnection(m), n.adapt = 1000, n.chains = 4)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 0
## Unobserved stochastic nodes: 1
## Total graph size: 3
##
## Initializing model
update(jgs)
out <- jags.samples(jgs, c('x'), 6)
out$x %>% matrix(nrow=6, byrow=TRUE)
## [,1] [,2] [,3] [,4]
## [1,] 0.07955224 0.5099825 0.2331965 0.32239254
## [2,] 0.98784894 0.7938703 0.2324782 0.09904735
## [3,] 0.71834714 0.2938549 0.8848708 0.45076995
## [4,] 0.89387818 0.9556509 0.8283376 0.66922644
## [5,] 0.05149423 0.1390441 0.9891570 0.35324055
## [6,] 0.04159588 0.2327182 0.6425525 0.66206344
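JAGS also allows each chain to be given its own RNG and seed through the inits argument, which makes multi-chain runs reproducible. A minimal sketch, assuming the base JAGS RNGs are available (the names and seeds below are illustrative):
inits <- list(
  list(.RNG.name = "base::Wichmann-Hill",        .RNG.seed = 1),
  list(.RNG.name = "base::Marsaglia-Multicarry", .RNG.seed = 2),
  list(.RNG.name = "base::Super-Duper",          .RNG.seed = 3),
  list(.RNG.name = "base::Mersenne-Twister",     .RNG.seed = 4)
)
jgs2 <- jags.model(file = textConnection(m), inits = inits, n.adapt = 1000, n.chains = 4)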