Using Kullback-Leibler divergence to calculate a measure of the difference between Poisson(n) samples and the limiting distribution P∞

MXB261 – Modelling and Simulation Science

Assignment 1 – Problem Solving Task – 15% of final grade

Due: 23:59 on Monday of Week 6 (26 August 2019)

Part 1 – A Biased Random Walk – 8 marks

Suppose we have a 2D simulation domain, 99 × 99 units. One at a time, particles will start at a nominated position along the top row of the domain, and will follow a biased walk “under gravity” towards the bottom row. If the particle is to move West or East but the location is occupied, generate a new random sample until the particle can move. If the particle is to move South but the location is occupied (or the particle has reached the bottom row in the domain), the particle stops – this is the end of the particle’s walk. Then a new particle is initiated. You should assume the side boundaries are reflecting boundaries, so if a particle hits the side boundary it moves back in the direction it came from. The biased walk will allow a choice of direction at each step of either South, West or East, with probabilities s, w, e, respectively, that are described by four different cases:

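(i) $s = w = e = \frac{1}{3}$

(ii) $s = \frac{1}{2}$, $w = \frac{1}{6}$, $e = \frac{1}{3}$

(iii) $s = \frac{2}{3}$, $w = \frac{1}{6}$, $e = \frac{1}{6}$

(iv) $s = \frac{3}{5}$, $w = \frac{3}{10}$, $e = \frac{1}{10}$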

You will simulate the biased walks of N = 120 particles. When all particles have completed their walks, calculate the height of all the resulting growths building up, in each column, from the bottom row of the domain, and produce a histogram of the distributions of these heights. For each probability ‘case’ (i) – (iv) above, you are to compare the results for different numbers of start positions (P = 1, 2, 3, and 4); for 1 start position, all the particles will start in column 50 in the top row; for 2 start positions, half the particles will start at column 25 and the others at column 75; for 3 start positions, one third the particles will start at column 25, one third at column 50, and the remainder at column 75; for 4 start positions, one quarter of the particles will start at each of columns 20, 40, 60 and 80.
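The walk rules above can be sketched in code. The following is a minimal illustration in Python (the submission itself must be in MATLAB), and it rests on several assumptions that the task leaves open: the function name `simulate_walks` and the `START_COLUMNS` table are hypothetical; particles are split into P consecutive, equal-sized batches; a move that would leave the domain is reflected into a step in the opposite direction; and a particle whose start cell is already occupied is skipped.

```python
import random

# Start columns (1-based, as in the task) for each number of start positions P.
START_COLUMNS = {1: [50], 2: [25, 75], 3: [25, 50, 75], 4: [20, 40, 60, 80]}


def simulate_walks(N, P, s, w, e, size=99, seed=None):
    """Simulate N biased walks and return the height of the growth in each column."""
    if abs(s + w + e - 1.0) > 1e-9 or s <= 0:
        raise ValueError("need s + w + e = 1 and s > 0")
    rng = random.Random(seed)
    starts = START_COLUMNS[P]
    # occupied[row][col]; row 0 is the top row, row size-1 is the bottom row.
    occupied = [[False] * size for _ in range(size)]
    for k in range(N):
        # Assumption: particles are split into P consecutive, equal-sized batches.
        col = starts[k * P // N] - 1
        row = 0
        if occupied[row][col]:
            continue  # Assumption: skip a particle whose start cell is already full.
        while True:
            u = rng.random()
            if u < s:  # South
                if row == size - 1 or occupied[row + 1][col]:
                    break  # end of this particle's walk
                row += 1
            else:
                step = -1 if u < s + w else 1  # West or East
                target = col + step
                if target < 0 or target >= size:
                    target = col - step  # Assumption: reflect off the side boundary.
                if not occupied[row][target]:
                    col = target
                # Otherwise the move is blocked, so the loop draws a new sample.
        occupied[row][col] = True
    # Every settled particle rests on the floor or on another particle, so the
    # height of a column equals the number of occupied cells in it.
    return [sum(occupied[r][c] for r in range(size)) for c in range(size)]


heights = simulate_walks(N=120, P=1, s=1/3, w=1/3, e=1/3, seed=1)
print("maximum height:", max(heights))
```

Fixing `seed` makes a run reproducible, which helps when comparing the four probability cases for a given P. Other readings of the reflecting boundary (for example, the particle staying in place) are equally defensible and should be stated in the report.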

• Your code should be in MATLAB, and should accept input parameters N (the number of particles), P (the number of start positions), and s, w, e (the probabilities).

• Plot figures in MATLAB showing the distribution of heights for 1, 2, 3, and 4 start positions, for each of the probability cases (i) – (iv). There will be 4 figures in total, with the cases (i) – (iv) being displayed in a 2 × 2 grid of subplots within the figure, for each P.


• Complete the table showing the maximum height you get in each category:

• Write a paragraph to discuss your answers with reference to the number of start positions and the probabilities that were used.

Part 2 – Sampling from Experimental Data – 7 marks

• Read the file sample2019.txt into an array Data0; the file contains 500 samples of data from some experiments. You can use the MATLAB function importdata.

– Plot (on the left hand side of a 1 × 2 subplot) the probability distribution of Data0, using histogram in MATLAB.

– In MATLAB, construct the cumulative distribution function F(x) for Data0 and generate a new set DataNew of 500 random variables from this distribution.

– Using histogram in MATLAB, plot the probability distribution of DataNew alongside the histogram of Data0.

– Use the Kullback-Leibler measure to compare the distributions of Data0 and DataNew.

– Discuss your results in a concise paragraph.
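The CDF construction, resampling and Kullback-Leibler comparison in the steps above can be sketched as follows. This is an illustration in Python rather than the required MATLAB, under stated assumptions: all function names are hypothetical; `data0` is synthetic stand-in data because sample2019.txt is not reproduced here; the CDF is the empirical step function and is inverted directly (so DataNew reuses values from Data0); 14 shared bins are an arbitrary choice; and the divergence is taken as KL(DataNew ‖ Data0), although the task does not fix the direction.

```python
import bisect
import math
import random


def empirical_cdf(data):
    """Return the sorted sample xs and F with F[k] = (k + 1) / n at xs[k]."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(k + 1) / n for k in range(n)]


def sample_from_cdf(xs, F, m, rng):
    """Inverse-transform sampling: return the smallest xs[k] with F[k] >= u."""
    out = []
    for _ in range(m):
        u = rng.random()
        k = bisect.bisect_left(F, u)
        out.append(xs[min(k, len(xs) - 1)])
    return out


def histogram_probs(data, lo, hi, bins):
    """Relative frequencies of data over `bins` equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return [c / len(data) for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; infinite if q = 0 where p > 0."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk > 0:
            if qk == 0:
                return math.inf
            total += pk * math.log(pk / qk)
    return total


rng = random.Random(0)
data0 = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stand-in for sample2019.txt
xs, F = empirical_cdf(data0)
data_new = sample_from_cdf(xs, F, 500, rng)

lo, hi = min(data0), max(data0)
p0 = histogram_probs(data0, lo, hi, 14)
p_new = histogram_probs(data_new, lo, hi, 14)
print("KL(DataNew || Data0) =", kl_divergence(p_new, p0))
```

Both histograms must share the same bin edges for the comparison to be meaningful. With this step-function inverse, every bin that contains a DataNew value also contains a Data0 value, so KL(DataNew ‖ Data0) is finite; the reverse direction can be infinite if some bin receives no resampled values.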

• Sample 5000 random numbers $P_{i,n}$, $i = 1, \dots, 5000$, from the Poisson distribution Poisson(n), for each of n = 10, 20, 50, 100, 200, 400.

– Normalise your $P_{i,n}$ samples according to

$$Z_{i,n} = \frac{P_{i,n} - n}{\sqrt{n}}.$$

– For each n, plot a histogram of $Z_{i,n}$ using 14 bins.

– Why use 14 bins? Try some other numbers of bins for the histograms, and discuss your results, using figures to substantiate your comments.

– What is the limiting distribution P∞ of the sequences of Poisson samples, as n increases?


Number of start positions | Max Height, case (i) | Max Height, case (ii) | Max Height, case (iii) | Max Height, case (iv)
1                         |                      |                       |                        |
2                         |                      |                       |                        |
3                         |                      |                       |                        |
4                         |                      |                       |                        |

– Use Kullback-Leibler divergence to calculate a measure for the difference between the Poisson(n) samples and the limiting distribution P∞. Complete the table, to show how the KL measure varies according to the parameter n. Discuss your results in a concise paragraph.
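The Poisson steps above (sampling, normalisation, and the KL measure against the limit) can be sketched as follows. This is again an illustration in Python rather than MATLAB, under stated assumptions: the function names are hypothetical; Poisson samples are drawn with Knuth's multiplication method because the Python standard library has no Poisson generator; the limiting distribution is taken to be the standard normal N(0, 1), which is what the central limit theorem gives for this normalisation; and the two outermost bins are extended to ±∞ so that the normal bin probabilities sum to one.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the moderate lam used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def normalised_poisson(n, m, rng):
    """Draw m samples P ~ Poisson(n) and return Z = (P - n) / sqrt(n)."""
    return [(poisson_sample(n, rng) - n) / math.sqrt(n) for _ in range(m)]


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kl_to_standard_normal(z, bins=14):
    """KL(histogram of z || N(0, 1)) over `bins` equal-width bins."""
    lo, hi = min(z), max(z)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in z:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    # Assumption: the outermost bins extend to -inf and +inf.
    interior_edges = [lo + k * width for k in range(1, bins)]
    cdf = [0.0] + [normal_cdf(edge) for edge in interior_edges] + [1.0]
    kl = 0.0
    for k in range(bins):
        p = counts[k] / len(z)
        q = cdf[k + 1] - cdf[k]
        if p > 0:
            if q <= 0:
                return math.inf
            kl += p * math.log(p / q)
    return kl


rng = random.Random(2019)
for n in (10, 20, 50, 100, 200, 400):
    z = normalised_poisson(n, 5000, rng)
    print(n, kl_to_standard_normal(z))
```

The value obtained depends on the number of bins as well as on n, because $Z_{i,n}$ only takes values on a lattice with spacing $1/\sqrt{n}$; this is worth keeping in mind when discussing both the bin-count question and the KL table.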

Submission: This assignment is to be submitted via the Blackboard site.

The submitted zip archive (YOURNAME assign1.zip) should contain

• The MATLAB source codes for implementing this assignment.

• Your report (in pdf format) containing the Figures, the Tables, and the Discussions.

Guide to the Marking Schedule

• Part 1: (total: 8 marks)

– 1 mark: Code well-structured and well-documented.

– 3 marks: Code functionality

∗ Input parameters are specified.

∗ Number of start positions P is coded correctly.

∗ Probabilities s, e, w (for 4 different cases) are coded correctly.

∗ Reflecting boundary is coded correctly.

– 4 marks: The paragraph discussion shows insight, accurate results, and correct interpretation of the results, with reference to the figures (that should include title, axes labels, and a legend) and table.

• Part 2: (total: 7 marks)

– 1 mark: Code well-structured and well-documented.

– 1 mark: Algorithm for constructing the CDF F(x) from the input data sample2019.txt is coded correctly; the histograms for both the initial data and the 500 newly-generated samples are well-presented.

– 2 marks: The Poisson samples are correctly generated and normalised. The limiting distribution P∞ is correctly identified.

– 2 marks: The Kullback-Leibler calculations are correct, and the discussions comparing distributions show insight.

– 1 mark: The discussion of the number of histogram bins shows insight and is confirmed by the figures.


n   | KL measure from P∞
10  |
20  |
50  |
100 |
200 |
400 |