We will focus on a case in the Markov chain Monte Carlo (MCMC) domain. MCMC contains algorithms for sampling from a probability distribution. The sampling happens by a Markov chain, i.e. it is a random process that changes from one state to another. In this article, we will use the Metropolis-Hastings algorithm to generate a random walk with a proposal density. We will then either accept or reject the proposed moves according to a certain condition.

We will perform some experiments to study the behavior of the Metropolis-Hastings on a test suite of target distributions. The test suite consists in this case of a Gaussian and a uniform distribution in one dimension. The distributions are characterized by ther mean and variance and the target distribution is the mixture

The weights and measure the relative contribution of both Gaussian distributions. Without loss of generality, we can always assume that the first mean

As for a target distribution, we will us

For additional notes, I refer to the paper *An Introduction to MCMC for Machine Learnin*g by C. Andrieu, N. de Freitas, A. Doucet and M. Jordan.

We will use symmetric proposal distributions in the Metropolis-Hastings algorithm. In that case the acceptance probability reduces to

where and are respectively the current state and the proposed new state accepted with probability

The proposal distributions are the Gaussian

and the uniform distribution over the interval

Now, we can begin with the experiments. We will collect 5,000 samples after the chain has converged to the invariant distribution, i.e. after the burn-in period. We will then construct a histogram based on the collected samples and compared with the target distribution. For these experiments, we will use Java.

In order to be able to write code for sampling and constructing a histogram, we will use two Java libraries: The Commons Math library and the JFreeChart library.

First, we will create two arrays of size 5000. One array is for the generated numbers according to the Gaussian distribution and the other array will hold the generated numbers according to the uniform distribution.

double[] arrayG; double[] arrayU; arrayG = new double[5000]; arrayU = new double[5000];

After creating these arrays for the numbers of both distributions, we will now use the distribution classes in order to be able to do the sampling. In this example, the sigma value for the normal distribution will be 0.5, whereas the mean is 0.2. In the case of the uniform distribution, the sigma value is 0.0 and the mean is 100.

NormalDistribution nor = new NormalDistribution(0.5, 0.2); UniformRealDistribution uni = new UniformRealDistribution(0.0, 100);

For the sampling, we will use a for-lus for each distribution. We then create an array for each probability.

for(int i = 0; i < 5000; i++) { arrayG[i] = nor.sample(); } for(int i = 0; i < 5000; i++) { arrayU[i] = uni.sample(); } double[] probG; prob = new double[5000]; double[] probU; prob2 = new double[5000];

After the arrays for the probabilities are initialized, we will now calculate the probabilities by using the density function. For the accepted values using the Metropolis-Hastings algorithm, we will use two arraylists. This means that the accepted values will be added to these lists according to their distributions.

for(int j = 0; j < 5000; j++) { probG[j] = nor.density(arrayG[j]); probU[j] = uni.density(arrayU[j]); } List arraylistG = new ArrayList(); List arraylistU = new ArrayList(); arraylistG.add(arrayG[0]); for(int i = 0; i < 4999; i++) { double check = probG[i+1]/prob[i]; if(check >= 1) { arraylistG.add(arrayG[i+1]); } } for(int i = 0; i < 4999 ; i++) { double check = probU[i+1]/prob2[i]; if(check >= 1) { arraylistU.add(arrayU[i+1]); } }

So far, we sampled 5000 times from two distributions and added the accepted values from the Metropolis-Hastings algorithm to an arraylist. The only thing we need to do now is plotting our experiments in order to have a visual understanding of the behaviour of the samples within a certain range. As mentioned earlier, we used the JFreeChart to construct a histogram with our numbers. Since we worked with two distributions, we will therefore create two histograms. The code of these histograms is almost similar.

I wrote also some code to print the result in the console. We will use this code to construct our histograms, i.e. the `dataset`

class and its function `addSeries`

will use the print code to draw the results.

double[] printG; printG = new double[arraylistG.size()]; for(int i = 0; i < arraylistG.size(); i++) { printG[i] = (double)arraylistG.get(i); } double[] printU; printU = new double[arraylistU.size()]; for(int i = 0; i < arraylistU.size(); i++) { printU[i] = (double)arraylistU.get(i); }

Now, we are close to the end. We will use the code listed above to construct our histograms, both for the Gaussian distribution and the uniform distribution.

int number = 50; HistogramDataset dataset = new HistogramDataset(); dataset.setType(HistogramType.RELATIVE_FREQUENCY); dataset.addSeries("Histogram",printG,number); String plotTitle = "M-H algorithm for Gaussian distribution"; String xaxis = "number"; String yaxis = "value"; PlotOrientation orientation = PlotOrientation.VERTICAL; boolean show = false; boolean toolTips = false; boolean urls = false; JFreeChart chart = ChartFactory.createHistogram( plotTitle, xaxis, yaxis, dataset, orientation, show, toolTips, urls); int width = 500; int height = 300; try { ChartUtilities.saveChartAsPNG(new File("histogram.PNG"), chart, width, height); } catch (IOException e) {} HistogramDataset dataset2 = new HistogramDataset(); dataset2.setType(HistogramType.RELATIVE_FREQUENCY); dataset2.addSeries("Histogram",printU,number); String plotTitle2 = "M-H algorithm for uniform distribution"; String xaxis2 = "number"; String yaxis2 = "value"; PlotOrientation orientation2 = PlotOrientation.VERTICAL; boolean show2 = false; boolean toolTips2 = false; boolean urls2 = false; JFreeChart chart2 = ChartFactory.createHistogram( plotTitle2, xaxis2, yaxis2, dataset2, orientation2, show2, toolTips2, urls2); int width2 = 500; int height2 = 500; try { ChartUtilities.saveChartAsPNG(new File("histogram2.PNG"), chart2, width2, height2); } catch (IOException e) {}

Running the code will result in the creation of two histograms, one for each distribution we used in our experiments.