[PDF] Simulation



Previous PDF Next PDF







From Algorithms to Z-Scores: Probabilistic and Statistical

was formerly a professor of statistics at that university He is a former database software developer in Silicon Valley, and has been a statistical consultant for rms such as the Kaiser Permanente Health Plan Dr Matlo was born in Los Angeles, and grew up in East Los Angeles and the San Gabriel Valley



Simulation

Computer Intensive Statistics STAT:7400, Spring 2019 Tierney Uniform Random Numbers The most basic distribution is the uniform distribution on [0;1] Ideally we would like to be able to obtain a sequence of independent draws from the uniform distribution on [0;1] Since we can only use finitely many digits, we can also work with



MODELING AND SIMULATIONS IN STATISTICS EDUCATION

prepared size N simulations obtained with computers As the teaching of statistical tools was reinforced and simulations were introduced, the Inter-IREM commission felt the need to clarify the status of probability as a part of teaching statistics and the role of probability in learning modeling, where probability is considered as a



What is Probability and Statistics and Why Should You Care?

Applications of Probability and Statistics Computer Science: I Machine Learning I Data Mining I Artificial Intelligence I Simulation I Image Processing I Computer Graphics I Visualization I Software Testing I Algorithms Electrical Engineering: I Signal Processing I Telecommunications I Information Theory I Control Theory I Instrumentation



Statistical foundations of machine learning

Probability is the language of stochastic modeling and statistical machine learning However, a variety of philosophical interpretations of the probability concept can exist Frequentist: statistical analysis must be based on the use of sample data evaluated through a frequency concept of probability Information comes



Multivariate quantile mapping bias correction: an N

either perfect prognosis or model output statistics (MOS) Abstract Most bias correction algorithms used in clima-tology, for example quantile mapping, are applied to uni-variate time series They neglect the dependence between different variables Those that are multivariate often correct only limited measures of joint dependence, such as Pearson



PETER W GLYNN - Stanford University

• Co-organizer of 2005 INFORMS Applied Probability Conference, Ottawa, Canada • Chair, INFORMS Applied Probability Society, 2004-2006 • Council Member, INFORMS Applied Probability Society, 2002-2010 • Member, International Scientific Advisory Board, Mathematics of Information Technology and Complex Systems, 2006-2012



Statistics - STAT

STAT 3010 STATISTICS FOR ENGINEERS AND SCIENTISTS (3) LEC 3 Pr MATH 1610 or MATH 1613 or MATH 1617 or MATH 1710 Introduction to statistical methods and analysis used in engineering and science STAT 3600/3603 PROBABILITY AND STATISTICS I (3) LEC 3 Pr MATH 1620 or MATH 1623 or MATH 1627 or MATH 1720

[PDF] Probabilités, simulation et algorithmique (pour TI)

[PDF] Algorithmes et programmation en Pascal TD corrigés - Limuniv-mrsfr

[PDF] Notes de cours / Algo et Python

[PDF] Algorithmique et Programmation Projet : algorithme de - DI ENS

[PDF] Score ASIA

[PDF] Un algorithme de simulation pour résoudre un problème de probabilité

[PDF] Algorithmique en classe de première avec AlgoBox - Xm1 Math

[PDF] Algorithme U prend la valeur [expression de la suite - Maths en ligne

[PDF] Rappels sur les suites - Algorithme - Lycée d 'Adultes

[PDF] Algorithme U prend la valeur [expression de la suite - Maths en ligne

[PDF] Algorithmique et Suites numériques Utiliser un algorithme avec les

[PDF] Les tableaux - Luc Brun

[PDF] Les tableaux 1 Exercice 1 - Lipn

[PDF] Les tableaux 1 Exercice 1 - Lipn

[PDF] Terminale S Exercices sur les suites Exercice 1 On consid`ere la

Simulation

Computer Simulation

Computer simulations are experiments performed on the computer using computer-generated random numbers.

Simulation is used to

-study the behavior of complex systems such as biological systems ecosystems engineering systems computer networks -compute values of otherwise intractable quantities such as integrals -maximize or minimize the value of a complicated function -study the behavior of statistical procedures -implement novel methods of statistical inference

Simulations need

-uniform random numbers -non-uniform random numbers -random vectors, stochastic processes, etc. -techniques to design good simulations -methods to analyze simulation results 1 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Uniform Random Numbers

The most basic distribution is the uniform distribution on[0;1] Ideally we would like to be able to obtain a sequence of independent draws from the uniform distribution on[0;1]. Since we can only use finitely many digits, we can also work with -A sequence of independent discrete uniform random numbers on f0;1;:::;M1gorf1;2;:::;Mgfor some largeM. -A sequence of independent random bits with equal probability for 0 and 1. Some methods are based on physical processes such as -nuclear decay http://www.fourmilab.ch/hotbits/ -atmospheric noise http://www.random.org/

The R package random provides an interface.

-air turbulence over disk drives or thermal noise in a semiconductor (Toshiba Random Master PCI device) -event timings in a computer (Linux/dev/random) 2 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Using/dev/randomfrom R

devRand <- file("/dev/random", open="rb")

U <- function()

(as.double(readBin(devRand, "integer"))+2ˆ31) / 2ˆ32 x <-numeric(1000) for (i in seq(along=x)) x[i] <- U() hist(x) y <- numeric(1000) for (i in seq(along=x)) y[i] <- U() plot(x,y) close(devRand)Histogram of x x

Frequency

0.00.20.40.60.81.0

020406080100120

0.00.20.40.60.81.0

0.00.20.40.60.81.0

x yIssues with Physical Generators can be very slow not reproducible except by storing all values distributionisusuallynotexactlyuniform; canbeoffbyenoughtomatter departures from independence may be large enough to matter mechanisms, defects, are hard to study can be improved by combining with other methods 3 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Pseudo-Random Numbers

Pseudo-random number generators produce a sequence of numbers that is not random easily reproducible "unpredictable;" "looks random" behaves in many respects like a sequence of independent draws from a (discretized) uniform[0;1]distribution fast to produce Pseudo-random generators come in various qualities

Simple generators

-easy to implement -run very fast -easy to study theoretically -usually have known, well understood flaws

More complex

-often based on combining simpler ones -somewhat slower but still very fast -sometimes possible to study theoretically, often not -guaranteed to have flaws; flaws may not be well understood (yet)

Cryptographic strength

https://www.schneier.com/fortuna.html -often much slower, more complex -thought to be of higher quality -may have legal complications -weak generators can enable exploits, a recent issue in iOS 7 We use mostly generators in the first two categories. 4 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

General Properties

Most pseudo-random number generators produce a sequence of integers x

1;x2;:::in the rangef0;1;:::;M1gfor someMusing a recursion of

the form x n=f(xn1;xn2;:::;xnk)

Valuesu1;u2;:::are then produced by

u i=g(xdi;xdi1;:::;xdid+1)

Common choices ofMare

-M=231orM=232 -M=2311, aMersenne prime -M=2 for bit generators

The valuekis theorderof the generator

The set of the most recentkvalues is thestateof the generator.

The initial statex1;:::;xkis called theseed.

Since there are only finitely many possible states, eventually these gen- erators will repeat. The length of a cycle is called theperiodof a generator.

The maximal possible period is on the order ofMk

Needs change:

-As computers get faster, larger, more complex simulations are run. -A generator with period 232used to be good enough. -A current computer can run through 232pseudo-random numbers in under one minute. -Most generators in current use have periods 264or more. -Parallel computation also raises new issues. 5 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Linear Congruential Generators

A linear congruential generator is of the form

x i= (axi1+c)modM with 0xiA multiplicative generator is of the form x i=axi1modM with 0Examples Lewis, Goodman, and Miller ("minimal standard" of Park and Miller): x i=16807xi1mod(2311) =75xi1mod(2311)

Reasonable properties, period 2

3122:15109is very short for mod-

ern computers.

RANDU:

x i=65538xi1mod 231

Period is only 2

29but that is the least of its problems:

u i+26ui+1+9ui=an integer so(ui;ui+1;ui+2)fall on 15 parallel planes. Using therandudata set and therglpackage: library(rgl) points3d(randu) par3d(FOV=1) ## removes perspective distortion

With a larger number of points:

seed <- as.double(1)

RANDU <- function() {

seed <<- ((2ˆ16 + 3) *seed) %% (2ˆ31) seed/(2ˆ31)

U <- matrix(replicate(10000

*3, RANDU()), ncol = 3, byrow = TRUE) clear3d() points3d(U) par3d(FOV=1)

PDP11 machines.

Some examples are available in

http://www.stat.uiowa.edu/

˜luke/classes/STAT7400/

examples/sim.Rmd 7 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Lattice Structure

All linear congruential sequences have alattice structure Methods are available for computing characteristics, such as maximal distance between adjacent parallel planes Values ofMandacan be chosen to achieve good lattice structure for c=0 orc=1; other values ofcare not particularly useful. 8 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Shift-Register Generators

Shift-register generators take the form

x i=a1xi1+a2xi2++apxipmod 2 for binary constantsa1;:::;ap. values in[0;1]are often constructed as u i=Lå s=12sxti+s=0:xit+1xit+2:::xit+L for sometandLt.tis thedecimation. The maximal possible period is 2p1 since all zeros must be excluded. The maximal period is achieved if and only if the polynomial z p+a1zp1++ap1z+ap is irreducible over the finite field of size 2. Theoretical analysis is based onk-distribution: A sequence ofMbit in- tegers with period 2 p1 isk-distributedif everyk-tuple of integers ap- pears 2 pkMtimes, except for the zero tuple, which appears one time fewer. Generators are available that have high periods and goodk-distribution properties. 9 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Lagged Fibonacci Generators

Lagged Fibonacci generators are of the form

x i= (xikxij)modM for some binary operator.

Knuth recommends

x i= (xi100xi37)mod 230 -There are some regularities if the full sequence is used; one recom- mendation is to generate in batches of 1009 and use only the first

100 in each batch.

-Initialization requires some care.

Combined Generators

Combining several generators may produce a new generator with better properties.

Combining generators can also fail miserably.

Theoretical properties are often hard to develop.

Wichmann-Hill generator:

x i=171xi1mod 30269 y i=172yi1mod 30307 z i=170zi1mod 30323 and u i=xi30269 +yi30307 +zi30232 mod 1

The period is around 10

12. This turns out to be equivalent to a multiplicative generator with modulus

M=27817185604309

10 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney Marsaglia"s Super-Duper used in S-PLUS and others combines a linear congruential and a feedback-shift generator.

Other Generators

Mersenne twister

Marsaglia multicarry

Parallel generators

-SPRNGhttp://sprng.cs.fsu.edu. -L"Ecuyer, Simard, Chen, and Kelton http: //www.iro.umontreal.ca/

˜lecuyer/myftp/streams00/

11 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Pseudo-Random Number Generators in R

R provides a number of different basic generators:

Wichmann-Hill:Period around 1012

Marsaglia-Multicarry:Period at least 1018

Super-Duper:Period around 1018for most seeds; similar to S-PLUS Mersenne-Twister:Period2199371106000; equidistributedin623dimen- sions; current default in R. Knuth-TAOCP:Version from second edition ofThe Art of Computer Pro- gramming, Vol. 2; period around 1038. Knuth-TAOCP-2002:From third edition; differs in initialization. L"Ecuyer-CMRG:A combined multiple-recursive generator from L"Ecuyer (1999). The period is around 2

191. This provides the basis for the multi-

ple streams used in packageparallel. user-supplied:Provides a mechanism for installing your own generator; used for parallel generation by rsprngpackage interface to SPRNG rlecuyerpackage interface to L"Ecuyer, Simard, Chen, and Kel- ton system rstreamspackage, another interface to L"Ecuyer et al. 12 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Testing Generators

All generators have flaws; some are known, some are not (yet). Tests need to look for flaws that are likely to be important in realistic statistical applications.

Theoretical tests look for

-bad lattice structure -lack ofk-distribution -other tractable properties Statistical tests look for simple simulations where pseudo-random num- ber streams produce results unreasonably far from known answers.

Some batteries of tests:

-DIEHARDhttp://stat.fsu.edu/pub/diehard/ dieharder.php rng/ 13 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Issues and Recommendations

Good choices of generators change with time and technology. -Faster computers need longer periods. -Parallel computers need different methods.

All generators are flawed

-Bad simulation results due to poor random number generators are very rare; coding errors in simulations are not. -Testing a generator on a "similar" problem with known answers is a good idea (and may be useful to make results more accurate). -Using multiple generators is a good idea; R makes this easy to do. -Be aware that some generators can produce uniforms equal to 0 or 1 (I believe R"s will not). -Avoid methods that are sensitive to low order bits 14 Computer Intensive Statistics STAT:7400, Spring 2019 Tierney0.00.20.40.6

0.00.20.40.6

frac(2^30 * x) frac(2^30 * y)

Mersenne-Twister ´230

0.00.20.40.60.81.0

0.00.20.40.60.81.0

frac(2^50 * x) frac(2^50 * y)

Wichmann-Hill U ´250mod 1

0.00.20.40.60.8

0.00.20.40.60.8

frac(2^50 * (1 - x)) frac(2^50 * (1 - y))

Wichmann-Hill (1 - U) ´250mod 115

Computer Intensive Statistics STAT:7400, Spring 2019 Tierney

Non-Uniform Random Variate Generation

Starting point: Assume we can generate a sequence of independent uni- form[0;1]random variables. Develop methods that generate general random variables from uniform ones.

Considerations:

-Simplicity, correctness -Accuracy, numerical issues -Speed Setup

Generation

General approaches:

-Univariate transformations -Multivariate transformations -Mixtures -Accept/Reject methodsquotesdbs_dbs5.pdfusesText_9