Large Deviations: An Introduction

This is a talk on Large Deviations I gave as part of the PSS at the University of Bath.

Ladies and gentlemen… The whole purpose of these slides is to let you know what Large Deviations are. So, without further ado…

Go to the PSS talk, they said… It’s going to be fun, they said… It’s not as tough as it seems…

Let’s first do an example to set the mood… And the simplest examples always have to do with tossing coins! The fancy way of saying ‘tossing a coin’ used by probabilists is ‘let X be a Bernoulli random variable’. The probability mass function of a Bernoulli random variable with parameter p is given by P(X = 1) = p and P(X = 0) = 1 − p. Or alternatively: P(X = x) = p^x (1 − p)^(1−x) for x ∈ {0, 1}.

One coin? Seriously, one coin?... That’s right, you guessed correctly... Probabilists (like other common human beings) prefer loads of coins rather than just one… So that we don’t get lost, we assign numbers to our coins from 1 to n and call the result of the k-th coin X_k; mathematically, X_1, X_2, …, X_n. And we assume the coins are independent of each other.

We are interested in studying the average “random behaviour” (which probabilists call the distribution) of throwing all the coins. If we throw 3 coins, we can get one of 8 possible results, which we encode as strings of 0s and 1s, for example 001 or 110. This simple strategy has saved probabilists from using actual coins for centuries! But we want to look at the average (which we call the mean): the outcome 001 has average 1/3 (one head out of three coins), while 110 has average 2/3. Mathematically, the average of n tosses is the sum of the tosses divided by the number of coins: S_n/n = (1/n) Σ_{k=1}^n X_k.
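The coin-tossing setup above is easy to simulate. A minimal Python sketch (function and variable names are my own):

```python
import random

def toss_coins(n, p=0.5, seed=0):
    """Toss n independent Bernoulli(p) coins and return the average S_n / n."""
    rng = random.Random(seed)
    tosses = [1 if rng.random() < p else 0 for _ in range(n)]  # X_1, ..., X_n
    return sum(tosses) / n  # number of heads divided by number of coins

# The average concentrates around p as n grows (law of large numbers):
print(toss_coins(10))
print(toss_coins(100_000))
```

With many coins the average lands very close to p; large deviation theory asks exactly how unlikely the averages far from p are.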

It so happens that we can find an explicit formula for the probability of throwing k heads in n tosses (this is called a Binomial distribution with parameters n and p): P(S_n = k) = (n choose k) p^k (1 − p)^(n−k). Applying Stirling’s formula, n! ≈ √(2πn) (n/e)^n, to the binomial coefficient gives, for k = xn, P(S_n/n ≈ x) ≈ e^(−n I(x)), where I(x) = x log(x/p) + (1 − x) log((1 − x)/(1 − p)). And this is precisely the main idea: the exponent I is the mythical rate function.
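We can check this numerically: −(1/n) log P(S_n/n = x) should approach I(x) as n grows. A small sketch (helper names are mine; `lgamma` gives log-factorials so nothing overflows):

```python
import math

def binom_log_pmf(n, k, p):
    """log P(S_n = k) for S_n ~ Binomial(n, p), via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def rate(x, p):
    """The Bernoulli rate function I(x) = x log(x/p) + (1-x) log((1-x)/(1-p))."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

# -(1/n) log P(S_n/n = x) approaches I(x) as n grows:
p, x = 0.5, 0.8
for n in (10, 100, 10_000):
    print(n, -binom_log_pmf(n, int(x * n), p) / n, rate(x, p))
```

The polynomial factors from Stirling’s formula contribute only a (log n)/n correction, which is why only the linear-in-n exponent survives in the limit.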

Let’s revisit the definition of a large deviation principle. We say the sequence (S_n/n) satisfies an LDP with rate function I if, for every closed set C and every open set O,
limsup_{n→∞} (1/n) log P(S_n/n ∈ C) ≤ −inf_{x ∈ C} I(x)   (the upper bound),
liminf_{n→∞} (1/n) log P(S_n/n ∈ O) ≥ −inf_{x ∈ O} I(x)   (the lower bound).
There are 3 things to consider: the rate function, the upper bound and the lower bound.

Basically we need to know what a rate function is: a function I : X → [0, ∞] whose level sets {x : I(x) ≤ c} are closed (that is, I is lower semicontinuous). We say it is a good rate function if these sets are not only closed, but also compact. [Plot: the Bernoulli rate function I(x) for p = 0.5, p = 0.25 and p = 0.75.]

Now see the upper and lower bounds in action: imagine that B is a set that is both open and closed. Then the inequalities combine to give
−inf_{x ∈ B} I(x) ≤ liminf (1/n) log P(S_n/n ∈ B) ≤ limsup (1/n) log P(S_n/n ∈ B) ≤ −inf_{x ∈ B} I(x).
But the liminf is always less than or equal to the limsup, therefore the limit exists and
lim_{n→∞} (1/n) log P(S_n/n ∈ B) = −inf_{x ∈ B} I(x).

Comparing with what we got from the Bernoulli example, P(S_n/n ≈ x) ≈ e^(−n I(x)): that is basically the idea. The definition formally talks about the probability of a set rather than a single element, and then takes the infimum of the rate function over that set. As Frank den Hollander puts it: any large deviation is done in the least unlikely of all the unlikely ways!

Fortunately, we don’t need to extract the linear-in-n component of the exponential form of the distribution by hand every time. The theory has developed other techniques.

We have changed a manipulation of the actual distribution of the random variable, which depends on the whole sequence, into an optimisation problem for a function that depends only on the distribution of a single random variable. This is Cramér’s theorem. For our example: 1. Calculate the moment generating function M(λ) = E[e^(λX)] (a.k.a. the Laplace transform of the distribution); for a Bernoulli(p) variable, M(λ) = 1 − p + p e^λ. 2. Find the Legendre transform of its logarithm (a.k.a. find the rate function): I(x) = sup_λ [λx − log M(λ)].
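The two-step recipe can be sketched numerically: compute log M(t) for the Bernoulli case and approximate the Legendre transform by maximising over a grid of t values (a rough sketch; the names and the grid are my own choices):

```python
import math

def log_mgf(t, p):
    """log M(t) for a Bernoulli(p) variable, M(t) = 1 - p + p * e^t."""
    return math.log(1 - p + p * math.exp(t))

def rate_from_mgf(x, p):
    """Legendre transform I(x) = sup_t [t*x - log M(t)], maximised over a grid of t."""
    grid = [i / 100 for i in range(-1000, 1001)]  # t in [-10, 10]
    return max(t * x - log_mgf(t, p) for t in grid)

def rate(x, p):
    """Closed form obtained earlier via Stirling's formula."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

print(rate_from_mgf(0.8, 0.5), rate(0.8, 0.5))  # the two should agree closely
```

Note that only the distribution of a single coin enters the computation, never the joint distribution of all n tosses.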

And there’s more!

In our example, we found an LDP for the unbiased estimator of the mean. But what if we want the variance estimator? Since each X_k takes values in {0, 1}, we have X_k² = X_k, so after some algebraic manipulation the variance estimator can be written as a continuous function f of the mean estimator S_n/n. The contraction principle then tells us that f(S_n/n) satisfies an LDP with rate function J(y) = inf{I(x) : f(x) = y}. When the function is invertible, the inverse is a function, so we need no infimum, and we find the rate function immediately: J(y) = I(f^(−1)(y)).

And we can plot both rate functions… have you noticed the zeros? [Plot: the rate functions for the mean estimator and for the unbiased variance estimator, each vanishing at a single point.] If the good rate function has a unique minimiser, then the sequence converges in probability to that minimiser.

Ok, cool… but…

It so happens that there actually exist problems that require knowing the probability of rare events. Here are just a few… Risk management deals with the identification, assessment and prioritisation of risks, followed by an application of resources to control the impact of unfortunate events. Consider N obligors to which a portfolio is exposed. Let Y_k be the default indicator of the k-th obligor and c_k the loss resulting from the default of the k-th obligor. Then the total loss from defaults is L_N = Σ_{k=1}^N c_k Y_k, and risk managers are interested in finding the tail probabilities, that is, P(L_N > x) for large x. We can see how this sort of problem arises in insurance and finance.
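A plain Monte Carlo sketch of such a tail probability (the exposures and default probabilities below are hypothetical; for genuinely rare events plain Monte Carlo is hopeless, and importance sampling guided by large deviations is used instead):

```python
import random

def loss_tail_mc(c, q, x, n_sims=100_000, seed=1):
    """Plain Monte Carlo estimate of P(L_N > x), where L_N = sum_k c_k * Y_k
    and the Y_k are independent Bernoulli(q_k) default indicators."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        loss = sum(ck for ck, qk in zip(c, q) if rng.random() < qk)
        if loss > x:
            hits += 1
    return hits / n_sims

# Hypothetical portfolio: 20 obligors, unit exposure, 10% default probability each.
c, q = [1.0] * 20, [0.1] * 20
print(loss_tail_mc(c, q, x=5))  # probability of more than 5 defaults
```

If P(L_N > x) decays like e^(−N I), a plain estimator needs exponentially many simulations to see a single hit, which is exactly why the large deviation exponent matters here.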

Information theory is a branch of applied mathematics, electrical engineering and computer science involving the quantification of information. The setting is as follows: consider a finite alphabet {x_1, …, x_n} and let X = (X_1, …, X_N) be a word of N of those characters, each chosen independently and identically distributed from a probability measure ρ. In the word X, define the empirical frequency of the character x as L_N(x) = (1/N) #{k : X_k = x}. The vector of all these n frequencies is called the empirical measure, because it can be identified with a measure on the alphabet.

One of the key concepts in Information Theory is the relative entropy between two probability measures ν and ρ: H(ν | ρ) = Σ_x ν(x) log(ν(x)/ρ(x)). The magic comes here. Sanov’s theorem says that P(L_N ≈ ν) ≈ e^(−N H(ν | ρ)). So this relative entropy, which quantifies the amount of information needed to describe ν given ρ, is the rate function of an LDP for the empirical measure! Moreover, from Sanov’s theorem we can prove Cramér’s theorem (but that I leave as an exercise for the reader…)
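Both the empirical measure and the relative entropy are easy to compute. A small sketch (names mine; the measures are given as dictionaries mapping characters to probabilities):

```python
import math
from collections import Counter

def empirical_measure(word):
    """Empirical frequencies L_N(x) of each character x in the word."""
    counts = Counter(word)
    return {x: counts[x] / len(word) for x in counts}

def relative_entropy(nu, rho):
    """H(nu | rho) = sum_x nu(x) log(nu(x) / rho(x)), with the convention
    0 log 0 = 0 (assumes the support of nu is contained in that of rho)."""
    return sum(nu[x] * math.log(nu[x] / rho[x]) for x in nu if nu[x] > 0)

rho = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical alphabet measure
nu = empirical_measure("aababcacaa")   # empirical measure of a 10-letter word
print(relative_entropy(nu, rho))       # >= 0, and = 0 iff nu == rho
```

H(ν | ρ) vanishes exactly when ν = ρ, matching the Sanov picture: words whose empirical measure looks like ρ are typical, and every other empirical measure is exponentially unlikely.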

Consider a collection of n particles that interact with one another through some forces, and assume that this particle system can be in any of a finite set of configurations. As we increase the number of particles n and rescale appropriately, we can recover the macroscopic behaviour from the microscopic behaviour via the thermodynamic limit; this is the hydrodynamic limit. At this point a large deviation question should feel almost natural to ask… It so happens that my research is precisely on large deviations of hydrodynamic limits of a particular interacting particle system… But that is another story and shall be told another time.

Large Deviations
“Finally! I’ve been doing some serious probability and I finally grasp the concept of an LDP! Turns out one just has to find the exponential decay of the probabilities of unlikely events!”
“Sounds tough, though. What if I have an LDP for a sequence but I need a transformation?”
“Well, there are other techniques, like Cramér’s Theorem or the Contraction Principle.”
“Interesting! How do you do that? Fascinating! So it not only works for coin tossing, right?”
“Right! There are other applications too! Important questions arise from risk management and information theory.”
“And why do you need to know this stuff?”
“Me? Well… Actually I just thought it would be awesome to be the cool slide of a lame probability talk. Nothing fancy, really.”

Go use these awesome fonts at www.cooltext.com. Read more Dinosaur Comics at http://www.qwantz.com. And, of course, go read some more on this fascinating topic:

S. R. S. Varadhan, introductory lecture notes on the theory of large deviations.

John T. Lewis and Raymond Russell, “An Introduction to Large Deviations for Teletraffic Engineers”, 1997.

Peter Mörters, “Introduction to large deviations”, 2010.

“Random Walks, Large Deviations, and Martingales” (a textbook chapter on IID random walks and the LDP).

“The Theory of Large Deviations and Applications to Statistical Mechanics” (lecture notes).

“Introduction to large deviations principles” (lecture notes).

“Large Deviations: An Introduction” (notes covering, among other things, the contraction principle).

The Wikipedia article on large deviations theory.