
ABSLec5


Published on September 27, 2007

Author: Columbia

Source: authorstream.com


Bayesian analysis of a one-parameter model

I. The binomial distribution with a uniform prior
   - Integration tricks
II. Posterior interpretation
III. The binomial distribution with a beta prior
   - Conjugate priors and sufficient statistics

Review of the Bayesian Setup

From the Bayesian perspective, there are known and unknown quantities.
- The known quantity is the data, denoted D.
- The unknown quantities are the parameters (e.g. mean, variance, missing data), denoted θ.

To make inferences about the unknown quantities, we stipulate a joint probability function that describes how we believe these quantities behave in conjunction, p(θ, D). Using Bayes' Rule, this joint probability function can be rearranged to make inference about θ:

    p(θ | D) = p(θ) p(D | θ) / p(D)

Review of the Bayesian Setup, cont.

    p(θ | D) = p(θ) L(θ | D) / ∫ p(θ) p(D | θ) dθ

- L(θ | D) is the likelihood function for θ.
- ∫ p(θ) p(D | θ) dθ is the normalizing constant, also called the prior predictive distribution. It is the normalizing constant because it ensures that the posterior distribution of θ integrates to one. It is the prior predictive distribution because it is not conditional on a previous observation of the data-generating process (prior) and because it is the distribution of an observable quantity (predictive).

Review of the Bayesian Setup, cont.

[Equation not preserved in the transcript.]

Why are we allowed to do this? Why might it not be as useful?

Example: The Binomial Distribution

Suppose X1, X2, …, Xn are independent random draws from the same Bernoulli distribution with parameter θ. Thus, Xi ~ Bernoulli(θ) for i ∈ {1, ..., n}, or equivalently, Y = Σ Xi ~ Binomial(n, θ).

The joint distribution of Y and θ is the product of the conditional distribution of Y given θ and the prior distribution of θ:

    p(Y, θ) = p(Y | θ) p(θ)

What distribution might be a reasonable choice for the prior distribution of θ? Why?
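The role of the normalizing constant in Bayes' Rule can be checked numerically. The following is a minimal R sketch, not part of the original slides. The data (7 successes in 10 trials) and the Beta(2, 2) prior are assumptions chosen only for illustration, and the names prior, likelihood, p_D and posterior are hypothetical.

```r
# Hypothetical data and prior, chosen only for illustration.
n <- 10   # number of Bernoulli trials (assumed)
y <- 7    # number of successes (assumed)

# Assumed prior p(theta) and binomial likelihood p(D | theta)
prior      <- function(theta) dbeta(theta, 2, 2)
likelihood <- function(theta) dbinom(y, size = n, prob = theta)

# Normalizing constant p(D): integral of p(theta) * p(D | theta) over (0, 1)
p_D <- integrate(function(theta) prior(theta) * likelihood(theta),
                 lower = 0, upper = 1)$value

# Bayes' Rule: p(theta | D) = p(theta) p(D | theta) / p(D)
posterior <- function(theta) prior(theta) * likelihood(theta) / p_D

# Dividing by p(D) should make the posterior integrate to one
# (up to numerical error)
total <- integrate(posterior, lower = 0, upper = 1)$value
print(c(p_D = p_D, total = total))
```

Without the division by p_D the product prior × likelihood still has the right shape in θ, but its integral is p_D rather than one.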
Binomial Distribution, cont.

If Y ~ Bin(n, θ), a reasonable prior distribution for θ must be bounded between zero and one. One option is the uniform distribution, θ ~ Unif(0, 1). Because p(θ) = 1 on that interval, the posterior is proportional to the likelihood:

    p(θ | Y) ∝ θ^y (1 − θ)^(n−y)

As it happens, this is a proper posterior density function. How can you tell?

Binomial Distribution, cont.

Let Y ~ Bin(n, θ) and θ ~ Unif(0, 1). Then

    p(θ | Y) = [Γ(n + 2) / (Γ(y + 1) Γ(n − y + 1))] θ^y (1 − θ)^(n−y)

The ratio of gamma functions is the normalization constant that transforms θ^y (1 − θ)^(n−y) into a beta density, so θ | Y ~ Beta(y + 1, n − y + 1). You cannot simply call the posterior a binomial distribution: the binomial is a distribution for Y given θ, whereas the posterior conditions on Y and treats θ as the random variable.

Application: The Cultural Consensus Model

A researcher examined the level of consensus, denoted θ, among n = 24 Guatemalan women about whether or not polio (as well as other diseases) was thought to be contagious. In this case, 17 women said polio was contagious.

Let Xi = 1 if respondent i thought polio was contagious and Xi = 0 otherwise. Let Y = Σ Xi ~ Bin(24, θ) and let θ ~ Unif(0, 1).

From the uniform-prior result above, θ | Y, n ~ Beta(Y + 1, n − Y + 1). Substituting n = 24 and Y = 17 gives θ | Y, n ~ Beta(18, 8).

The Posterior Distribution

The posterior distribution summarizes all that we know after analyzing the data. How do we interpret the posterior distribution θ | Y, n ~ Beta(18, 8)? One option is graphically.

[Figure: density of the Beta(18, 8) posterior]

Posterior Summaries

The full posterior contains too much information, especially in multi-parameter models. So, we use summary statistics (e.g. mean, variance, highest density region).

2 methods for generating summary statistics:
1) Analytical solutions: use the well-known analytic solutions for the mean, variance, etc. of the various posterior distributions.
2) Numerical solutions: use a random number generator to draw a large number of values from the posterior distribution, then compute summary statistics from those random draws.
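The two routes listed above can be compared side by side for the Beta(18, 8) posterior. This is an illustrative R sketch rather than the lecture's own code. The seed, the number of draws, and the names analytic and simulated are arbitrary assumptions.

```r
a <- 18; b <- 8   # Beta(18, 8) posterior from the polio example

# Route 1: closed-form results for a Beta(a, b) distribution
analytic <- c(mean   = a / (a + b),
              var    = a * b / ((a + b)^2 * (a + b + 1)),
              median = qbeta(0.5, a, b))

# Route 2: Monte Carlo draws from the posterior
# (seed and number of draws are arbitrary choices)
set.seed(1)
draws <- rbeta(100000, a, b)
simulated <- c(mean   = mean(draws),
               var    = var(draws),
               median = median(draws))

# The two rows should agree up to simulation error
print(round(rbind(analytic, simulated), 4))
```

The numerical route needs no formulas at all, which is why it carries over to posteriors that have no convenient closed form.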
Analytic Summaries of the Posterior

Analytic summaries are based on standard results from probability theory (see the handout from Gill's text). Continuing our example, θ | Y, n ~ Beta(18, 8). For a Beta(α, β) distribution the mean is α / (α + β) and the variance is αβ / [(α + β)^2 (α + β + 1)], which here gives a mean of 18/26 ≈ .69 and a variance of about .008.

Numerical Summaries of the Posterior

To create numerical summaries from the posterior, you need a random number generator. To summarize θ | Y, n ~ Beta(18, 8):
1) Draw a large number of random samples from a Beta(18, 8) distribution.
2) Calculate the sample statistics from that set of random samples.

Numerical Summaries of the Posterior, cont.

S-Plus code (should work in R) for the Beta(18, 8) summary:

    # "true" posterior plot (see before)
    x <- 0:1000/1000
    post <- dbeta(x, 18, 8)
    plot(x, post)
    # take 1000 draws from the posterior
    rands <- rbeta(1000, 18, 8)
    # create summaries of those draws
    mean(rands)
    median(rands)
    var(rands)
    hist(rands, 20)

Results: Mean(θ) = .70, Median(θ) = .70, Var(θ) = .01

Highest [Posterior] Density Regions (also known as Bayesian confidence or credible intervals)

Highest Density Regions (HDRs) are intervals containing a specified posterior probability.

[Figure: Beta(18, 8) density with the 95% HDR, [.51, .84]]

Identification of the HDR

It is easiest to find the Highest Density Region numerically. In S-Plus, to find the 95% HDR:

    # take 1000 draws from the posterior
    rands <- rbeta(1000, 18, 8)
    # quantile() orders the draws internally; the 2.5% and 97.5%
    # quantiles are the thresholds for the 95% credible interval
    # (equal mass in each tail)
    quantile(rands, c(.025, .975))

An alternative HDR

With asymmetric posterior distributions, it makes more sense to identify regions of equal heights, rather than of equal mass (I haven't figured out a cute way to do this numerically).

[Figure: Beta(18, 8) density with an equal-height HDR]
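One possible way to find an equal-height region numerically, offered here as a sketch and not as the lecture's method, is to search for the shortest interval that contains 95% of the posterior mass. This assumes a unimodal posterior, for which the shortest such interval has equal density at both endpoints. The names width, p_low, hdr and eq are hypothetical.

```r
a <- 18; b <- 8   # Beta(18, 8) posterior from the polio example
mass <- 0.95      # posterior probability the region should contain

# Width of the interval that leaves probability p in the lower tail;
# pmin() guards against rounding just above 1
width <- function(p) qbeta(pmin(p + mass, 1), a, b) - qbeta(p, a, b)

# Search over lower-tail probabilities for the shortest interval
p_low <- optimize(width, interval = c(0, 1 - mass), tol = 1e-10)$minimum
hdr   <- qbeta(c(p_low, p_low + mass), a, b)

# Equal-tailed interval for comparison
eq <- qbeta(c(0.025, 0.975), a, b)

print(round(rbind(hdr = hdr, equal_tailed = eq), 3))
# Density heights at the two HDR endpoints (should be nearly equal)
print(dbeta(hdr, a, b))
```

This version uses the exact quantile function qbeta rather than random draws, so it applies only when the posterior quantiles are available in closed form or through a library routine.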
Confidence Intervals vs. Bayesian Credible Intervals

Differing interpretations…

A Bayesian credible interval states the probability, given the data, that the true value of θ lies in the interval. Technically,

    P(θ ∈ Interval | X) = ∫_Interval p(θ | X) dθ

The frequentist 100(1 − α) percent confidence interval is the region of the sampling distribution for the estimate of θ such that, given the observed data, one would expect 100α percent of future estimates of θ to fall outside that interval. Technically,

    α = 1 − ∫_a^b g(u | θ) du

- u is a dummy variable of integration for the estimated value of θ.
- The limits a and b are functions of the data.

Confidence Intervals vs. Bayesian Credible Intervals, cont.

But often the results appear similar…

If Bayesians use "non-informative priors" and there is a large number of observations (often several dozen will do), HDRs and frequentist confidence intervals will coincide numerically. We will talk more about this when we cover the great p-value debate, but this is only a coincidence. The interpretation of the two quantities is entirely different.

Returning to the Binomial Distribution

If Y ~ Bin(n, θ), the uniform prior is just one of an infinite number of possible prior distributions. What other distributions could we use?

A reasonable alternative to the Unif(0, 1) distribution is the beta distribution. Can you show that Beta(1, 1) is a Unif(0, 1) distribution?

Prior Consequences: Plots of 4 Different Beta Distributions

[Figure panels: Beta(5,5), Beta(3,10), Beta(10,3), Beta(100,30)]

The Binomial Distribution with Beta Prior

If Y ~ Bin(n, θ) and θ ~ Beta(α, β), then:

    p(θ | y) = C(n, y) θ^y (1 − θ)^(n−y) [Γ(α + β) / (Γ(α) Γ(β))] θ^(α−1) (1 − θ)^(β−1) / f(y)

    f(y) = ∫_0^1 C(n, y) θ^y (1 − θ)^(n−y) [Γ(α + β) / (Γ(α) Γ(β))] θ^(α−1) (1 − θ)^(β−1) dθ

where C(n, y) is the binomial coefficient. This is a very nasty looking integral. Rather than computing it directly, we shall use a standard trick in the Bayesian toolbox.

1) Find some multiplicative constant c such that f(y)*c = 1, i.e. try to transform the integrand of f(y) into a well-known pdf, which integrates to one.
2) Multiply by c and c^(-1).
3) Since c*f(y) = 1, we have f(y) = c^(-1), so the posterior distribution is the original numerator divided by c^(-1) (equivalently, multiplied by c).

The posterior predictive distribution

Collecting the terms in θ inside the integral gives θ^(y+α−1) (1 − θ)^(n−y+β−1). This is the kernel of the beta distribution, here Beta(y + α, n − y + β), so the integral equals Γ(y + α) Γ(n − y + β) / Γ(n + α + β). The resulting distribution for y is called a beta-binomial distribution.

The posterior of the binomial model with beta priors

    p(θ | y) = [Γ(n + α + β) / (Γ(y + α) Γ(n − y + β))] θ^(y+α−1) (1 − θ)^(n−y+β−1)

This is a Beta(y + α, n − y + β) distribution.

Beautifully, it worked out that the posterior distribution is a form of the prior distribution updated by the new data. In general, when this occurs we say the prior is conjugate.

Continuing the earlier example, if 17 of 24 women say polio is contagious (so Y = 17 and n = 24, where Y is binomial) and you use a Beta(5, 5) prior, the posterior distribution is Beta(17 + 5, 24 − 17 + 5) = Beta(22, 12).

[Figure: prior and posterior densities]

Posterior Mean = .65
Posterior Variance = .01

What is the MLE for this likelihood?

Have the students derive the maximum likelihood estimate to serve as a basis of comparison.

Comparison of four different posterior distributions (in red) for the four different priors (black)

- Prior: Beta(5,5); Posterior: Beta(22,12)
- Prior: Beta(10,3); Posterior: Beta(27,10)
- Prior: Beta(3,10); Posterior: Beta(20,17)
- Prior: Beta(100,30); Posterior: Beta(117,37)

Summary Statistics of the Findings for Different Priors
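Summary statistics of this kind can be computed directly from the conjugate update. The sketch below is illustrative only and does not reproduce the original slide's table. It assumes the polio data (Y = 17, n = 24) and the four beta priors named above, uses equal-tailed 95% intervals as an arbitrary choice, and the names priors, summary_table and mle are hypothetical.

```r
y <- 17; n <- 24   # polio example: 17 of 24 respondents

# The four Beta(alpha, beta) priors compared in the slides
priors <- data.frame(alpha = c(5, 10, 3, 100),
                     beta  = c(5, 3, 10, 30))

# Conjugate update: Beta(alpha, beta) -> Beta(y + alpha, n - y + beta)
post_a <- priors$alpha + y
post_b <- priors$beta + (n - y)

summary_table <- data.frame(
  prior     = sprintf("Beta(%g,%g)", priors$alpha, priors$beta),
  posterior = sprintf("Beta(%g,%g)", post_a, post_b),
  mean      = post_a / (post_a + post_b),
  var       = post_a * post_b /
                ((post_a + post_b)^2 * (post_a + post_b + 1)),
  lower95   = qbeta(0.025, post_a, post_b),   # equal-tailed interval
  upper95   = qbeta(0.975, post_a, post_b)
)
print(summary_table, digits = 3)

# Maximum likelihood estimate of theta, for comparison with the
# posterior means
mle <- y / n
print(mle)
```

Comparing the mean column with mle shows how strongly each prior pulls the estimate away from the sample proportion.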
