exp meth2


Published on November 5, 2007

Author: Julie

Source: authorstream.com

Experimental Methodology*
COSC 4550/5550, Prof. D. Spears
* Part of this lecture is from Modern Elementary Statistics by John Freund. The lecture omits many of the mathematical derivations; students who are interested in seeing the motivating proofs should consult Mathematical Statistics by John Freund.

By now we've learned about a lot of agents…: behavior-based, physics-based, RL agents, planning agents, game players, neural networks, Bayesian agents. What is a proper way to compare their performance?

Comparing Agents/Algorithms:
Dr. Spears (in a Halloween costume): My agent is better than your agent!
Student: I don't believe it. Your agent is stochastic and we both only ran for one trial!
Spears: OK. Let's run them for 5 trials and compare.
Student: Done. When we take the two averages, your agent wins again. But I'm still not convinced, because 5 trials is so few that it could be a fluke of the random numbers that were generated.
Spears: Fine. Let's run them for 20 trials and compare averages.
Student: Done. When we take the two averages, your agent wins again. But I am still not convinced. Even though your average is better, on some trials my agent is better, so it could still be a statistical fluke that your agent's average is better. I propose that we call in an impartial judge to settle the dispute.

The Umpire: Dr. Spears, your student has the right idea, but even he is not strict enough. To prove that your agent is better, you not only need to take an average over many trials, but you also need to run a statistical significance test to show that the difference between the averages of the two agents' performances is meaningful. Statistical significance is greater if the averages differ more and the variances are lower. Normally, a statistical significance test tells you that, with a certain confidence level (e.g., 95% confidence), one agent/algorithm is "better" than another.

Experimental methodology and statistics: To understand experimental methodology, one needs to understand some basic statistics. The foundation for statistics is probability, so you're already well prepared by now!

Reminder: Requirements for the Term Project: Run your agent in the environment for 10 or more trials if there is any aspect of the agent or environment that is non-deterministic. Do likewise for a baseline agent. Report the following: the mean (average) performance over the trials, the number of trials over which this has been averaged, and the standard deviation or variance. If you are doing learning, then you can perform the comparison(s) during learning or at the end of learning. (A sketch of this bookkeeping appears below.)

Statistics: What is statistics? Unlike probability, which is a purely mathematical technique, statistics has more of an empirical nature. It deals with populations and samples from those populations. Descriptive statistics originated with government censuses; it involves summarizing, describing, and presenting data in the form of tables and charts. Statistical inference is more popular these days, and it is what's used in experimental methodology. Statistical inference is a method for drawing generalizations based on samples. These generalizations go beyond the data: they are a form of conjecture for predicting trends, and they can be used to test hypotheses about the data.
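The following is a minimal Python sketch of the term-project bookkeeping described above, not the course's required code. The functions run_my_agent and run_baseline_agent are hypothetical stand-ins for whatever produces a single trial's performance score, and the Gaussian scores are made up for illustration.

```python
import random
import statistics

def run_my_agent(seed):
    # Hypothetical stand-in: returns one trial's performance score.
    random.seed(seed)
    return 60 + random.gauss(0, 2.5)

def run_baseline_agent(seed):
    # Hypothetical stand-in for the baseline agent.
    random.seed(seed + 10_000)
    return 58 + random.gauss(0, 2.6)

def summarize(scores, label):
    # Report exactly what the project asks for: mean, number of trials, std deviation.
    print(f"{label}: n = {len(scores)}, "
          f"mean = {statistics.mean(scores):.2f}, "
          f"std dev = {statistics.stdev(scores):.2f}")

n_trials = 30   # at least 10 required; 30 or more counts as a "large sample"
summarize([run_my_agent(s) for s in range(n_trials)], "My agent")
summarize([run_baseline_agent(s) for s in range(n_trials)], "Baseline agent")
```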
Statistical inference is also at the heart of much modern machine learning, because much of machine learning is inferring general conclusions from specific data. These conclusions are predictive.

Review: Performance measure of an agent: A performance measure is an external measure of how well the agent has done, or is doing, at the task.

Performance Comparisons: How do we compare the performance of two or more agents? For deterministic agents in a deterministic environment: execute the agents in the environment once and compare their performance. For agents and/or environments with non-determinism: execute the agents in the environment multiple times and compare their performance. How? With statistical inference about the means of the performance.

Expected Value (Review): Let X be a discrete-valued random variable, for which each x in the range of X has a value v(x). If x is a number, then v(x) is typically just x. Then the expected value of X, i.e., E[X], is defined by E[X] = Σx v(x) P(X = x). If X is numerically valued, then E[X] is also called the mean, and it is typically denoted by μ. When X has a small, finite set of numeric values, we can calculate the mean by simply taking the average of all the numbers. Question: Why is an average equivalent to an expected value (i.e., with probabilities) in this case?

Estimation of the mean: For statistical inferences about means, we first estimate the true mean of the population from sample data. The average of the sample is the estimate of the population mean, and this average is called the sample mean x̄. (Figure: a small sample drawn from the entire population.)

Estimating mean performance of agents: For agents, a sample is obtained over time, by executing multiple trials/episodes. To get a reasonable sample, we execute the agent in the environment for at least 10 trials; fewer than 30 trials is considered a "small sample." We then calculate the average performance over all trials. Recall that the Law of Large Numbers states that with many, many trials the sample average will approach the population mean.

The Basics of Sampling and Sampling Errors:

Errors From Using Samples as Estimates: With probability, we have exact mathematical formulas to work with. With statistics (which is the application of probability in many real-world situations), we make inferences and draw conclusions using samples only. How can we determine the probability of (alternatively, the "confidence in") our sampling error, i.e., the error that we get when we use a sample instead of the full population?

Sampling Distribution: A sampling distribution is a distribution consisting of the probabilities of the means/averages of all samples of a given size. Example: a finite population consisting of the numbers 3, 5, 7, 9, and 11. The population mean is μ = (3 + 5 + 7 + 9 + 11) / 5 = 7, and the population variance is σ² = ((3 − 7)² + (5 − 7)² + (7 − 7)² + (9 − 7)² + (11 − 7)²) / 5 = 8, so σ = √8 ≈ 2.83. If we take random samples of size n = 2 from this population, then there are C(5, 2) = 10 distinct samples: {3, 5}, {3, 7}, {3, 9}, {3, 11}, {5, 7}, {5, 9}, {5, 11}, {7, 9}, {7, 11}, and {9, 11}. The corresponding sample means are x̄1 = 4, x̄2 = 5, x̄3 = 6, x̄4 = 7, x̄5 = 6, x̄6 = 7, x̄7 = 8, x̄8 = 8, x̄9 = 9, x̄10 = 10. (The code sketch below enumerates these.)
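As a check on the example above, this minimal sketch (plain Python, nothing course-specific) enumerates all C(5, 2) = 10 samples of the population {3, 5, 7, 9, 11} and tabulates the distribution of their means.

```python
from itertools import combinations
from collections import Counter
from statistics import mean, pstdev

population = [3, 5, 7, 9, 11]
print("population mean  =", mean(population))              # 7
print("population sigma =", round(pstdev(population), 3))  # sqrt(8) ~ 2.828

samples = list(combinations(population, 2))   # C(5, 2) = 10 distinct samples
sample_means = [mean(s) for s in samples]
counts = Counter(sample_means)
n = len(samples)
for xbar in sorted(counts):
    print(f"sample mean {xbar}: probability {counts[xbar]}/{n}")
```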
Sampling Distribution: Histogram:
sample mean x̄:  4     5     6     7     8     9     10
probability:    1/10  1/10  2/10  2/10  2/10  1/10  1/10
The sampling distribution is P = <0.1, 0.1, 0.2, 0.2, 0.2, 0.1, 0.1>, and it is centered on the population mean μ = 7. Question: What kind of distribution is this?

Using the Sampling Distribution to Calculate the Sampling Error: Suppose you don't know the mean of the population, and you want to estimate it with the mean of one random sample of size 2. If I know this histogram and the population mean, then I can determine the probability of your sampling error. The probability is 0.2 + 0.2 + 0.2 = 0.6 that your sample mean will not differ from the population mean μ = 7 by more than 1, i.e., a sampling error of 1. The probability is 0.8 that your sample mean will not differ from the population mean μ = 7 by more than 2, i.e., a sampling error of 2.

Sampling Distributions: A problem with using sampling distributions to estimate the probability of error size is that these sampling distributions are often difficult or impossible to obtain. Instead, let's assume we know the population mean and standard deviation, and we want to use them to estimate the probability of error size. This can actually be done with a simple table lookup! To do this, we need to revisit the Central Limit Theorem…

(Review slide: statement of the Central Limit Theorem.)

Central Limit Theorem Restated: For large, independently taken samples (size 30 or greater), the sampling distribution of the mean can be approximated closely with a normal distribution. The Central Limit Theorem justifies the use of normal-curve methods for a wide range of problems when we have a large sample size. It also implies that for large samples we can use a normal distribution to approximate the true sampling distribution. If the population actually does have a normal distribution, then the sampling distribution is a normal distribution. We also want to standardize our distribution, so that we can assume that all samples are taken from the same sampling distribution. Therefore, we use a standard normal distribution to approximate the probability of sampling error. Statistics books have tables for the standard normal distribution.

The Random Variable Z: The standard normal distribution is a normal distribution with mean μ = 0 and standard deviation σ = 1. Z is a random variable having values approximating the standard normal distribution. We convert the statistics for a sample to the standard normal distribution using z = (x̄ − μ) / (σ / √n), where n is the sample size, x̄ is the mean of a random sample of size n from an infinite population with mean μ and standard deviation σ, z is a value of Z, and n is assumed to be large. Now, using Z, we can estimate our sampling errors.

Assumption Behind Using a Normal Distribution for Calculating Sampling Errors: Note that by using a normal (Gaussian) curve, we assume a higher probability of getting smaller errors and a lower probability of getting larger errors.
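The Central Limit Theorem can also be seen empirically. The sketch below is illustrative only: the exponential population and the sample size n = 36 are arbitrary choices. It draws many samples from a non-normal population and checks that the sample means are centered near μ with spread close to σ/√n.

```python
import random
import statistics

random.seed(0)
mu, sigma = 1.0, 1.0          # an exponential with rate 1 has mean 1 and std deviation 1
n = 36                        # sample size >= 30, so the CLT approximation applies
num_samples = 10_000

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print("mean of sample means:   ", round(statistics.mean(sample_means), 3))   # ~ mu = 1.0
print("std dev of sample means:", round(statistics.stdev(sample_means), 3))  # ~ sigma / sqrt(n)
print("sigma / sqrt(n):        ", round(sigma / n ** 0.5, 3))                # ~ 0.167
```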
How To Use the Standard Normal Distribution: Often we are interested in the area under a standard normal curve from 0 to some value z of Z, or in the area under the curve for values greater than z. The former corresponds to the probability of the random variable Z having a value between 0 and z, and the latter corresponds to the probability of Z having a value greater than z. The total area under the curve is the sum of the probabilities of all possible outcomes of Z, which is 1. (Figure: standard normal curve with the area between 0 and z shaded.)

Using Z to Calculate the Sampling Error: Example: Based on the Central Limit Theorem, what is the probability that your sampling error (i.e., the difference between your sample mean and the population mean) will be less than 5, when you use the mean of a random sample of size n = 64 (a large sample) to estimate the mean of an infinite population with μ = 0 and σ = 20? Solution: Using z = (x̄ − μ) / (σ / √n), the bounds μ − 5 and μ + 5 correspond to z = −5 / (20 / √64) = −2 and z = +5 / (20 / √64) = +2. So we want the area under the standard normal curve between z = −2 and z = +2, which we get from a standard Z table. Note: previously we used the sampling distribution and the population mean to estimate the sampling error; now we use the sample size together with the population mean and standard deviation.

A Portion of a Z Table: The table entry corresponding to z = 2.00 is 0.4772, which is the area under the curve between z = 0 and z = 2.0. By the symmetry of the Gaussian curve, the probability asked for is 0.4772 + 0.4772 = 0.9544. Therefore, we can state with probability 0.9544 that the mean of a random sample of size n > 30 from a given population will differ from the true mean of the population by less than two standard errors (here, by less than 5).

More on Z: Let zα denote the value of Z for which the standard normal-curve area to its right is equal to α. The area α is equal to the probability that the value of Z is greater than or equal to zα.

Example of Finding zα: Let's find z0.05. Since the table entry represents the area between 0 and z0.05, and the entire right half of the curve has area 0.5, z0.05 corresponds to a table area of 0.5000 − 0.0500 = 0.4500.

A Portion of a Z Table: The table entry corresponding to z = 1.64 is 0.4495 (the area between z = 0 and z = 1.64), and the entry corresponding to z = 1.65 is 0.4505 (the area between z = 0 and z = 1.65). Since the area we are interested in is 0.4500, we interpolate and estimate that z0.05 = 1.645. We conclude that the probability is 0.05 that the value z of Z is greater than or equal to 1.645.

More Practice Calculating Area Under the Curve: Let zα/2 denote the value of Z for which the standard normal-curve area to its right is equal to α/2. The area α/2 is equal to the probability that Z is greater than or equal to zα/2. By symmetry, the area to the left of −zα/2 is also α/2, and the area between −zα/2 and zα/2 is 1 − α. We can conclude that with probability (1 − α) the value of the random variable Z will lie between −zα/2 and zα/2. If we want to find zα/2, then what will we look up in the table?
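Instead of a printed Z table, Python's statistics.NormalDist can supply the same numbers. This minimal sketch reproduces the 0.9544 probability and the zα lookups from the slides above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Sampling-error example: n = 64, sigma = 20, error bound 5  ->  z = 5 / (20 / 8) = 2
z = 5 / (20 / 64 ** 0.5)
print(Z.cdf(z) - Z.cdf(-z))        # ~0.9545; the rounded table value is 0.9544

# z_alpha values via the inverse CDF (equivalent to the table lookup plus interpolation)
print(Z.inv_cdf(1 - 0.05))         # z_0.05  ~ 1.645
print(Z.inv_cdf(1 - 0.025))        # z_0.025 ~ 1.960, the z_{alpha/2} for alpha = 0.05
```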
Statistical Significance Tests:

The Contest: American Idol:
Umpire: Dr. Spears, your student wants to prove that Britney Spears is a better singer than you are. To prove this, he plans to design a suite of 500 songs that you will both have to sing in front of an audience. The audience will score your singing ability and Britney's on each song. The performance of both singers will be averaged over all audience members and all 500 songs. I told him that if he wants to publish the results of this competition in Time Magazine, then in order to preserve his credibility he had better run a statistical significance test, to show that Britney's win is indeed not due to random chance. (Dr. Spears, still wearing a Halloween costume, and the student look on.)

Testing Hypotheses: Statistical Significance: Suppose you want to test the hypothesis HA that the mean performance of agent A is better than that of agent B. You can compare their means (averages). But if the mean of A's performance is better than the mean of B's performance, how do we know this isn't just due to random chance? Use a statistical significance test. (Note that "statistically significant" is not the same as "significant.") Here we consider the large-sample Z-test and the small-sample t-test.

A Fuzzy Intuition Behind Statistical Significance Tests: If we assume a large sample, then the Central Limit Theorem says we can use a normal distribution to estimate our sampling errors. With a normal distribution (and the standard random variable Z), large errors have a low probability and small errors have a high probability. If we hypothesize that agent A is performing better than agent B, we can test this hypothesis by estimating the mean performance of A and B using a sample of n trials for each. We may conclude that A is indeed performing better than B if its sample mean is sufficiently greater than the sample mean of B's performance. "Sufficiently greater" is measured against a predetermined standard that accounts for the probability of sampling errors using a normal distribution; the values of n and the variance are also taken into account. If the performance of A is not sufficiently greater than that of B, based on the samples, then we conclude that A outperforming B on these samples is likely due to random chance. Otherwise, we conclude that the comparison is statistically significant. Note that the larger the number n of samples, and the lower the variance, the greater our confidence that the difference is not due to chance.

Large-Sample Z-Test:
1. Form the null hypothesis H0, which is the opposite of the hypothesis you want to test (it's easier to test the null hypothesis instead). If HA is A > B, then H0 is A <= B.
2. Choose α based on the confidence with which you want to make conclusions. If you want to be 95% confident of your conclusion (if you reject H0 and accept HA), then set α = 1 − 0.95 = 0.05.
3. Find zα by table lookup in any statistics book.
4. Calculate z = (x̄A − x̄B) / √(σA²/nA + σB²/nB), where x̄A and σA are the mean and standard deviation of sample A (likewise for B), and nA and nB are the sizes of the samples (i.e., the number of trials over which you tested agent A and agent B, respectively). Both should be at least 30 to make this a large-sample test. This converts the difference to the standard normal distribution.
5. Reject H0 and accept HA if z > zα; else accept the null hypothesis or reserve judgment. Rejecting the null hypothesis means that the difference is statistically significant with a confidence of (1 − α).
(A code sketch of this procedure follows.)
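A minimal sketch of the large-sample Z-test procedure above, assuming the per-agent summary statistics (means, standard deviations, trial counts) have already been computed; it is not the course's required code.

```python
from statistics import NormalDist

def large_sample_z_test(mean_a, std_a, n_a, mean_b, std_b, n_b, alpha=0.05):
    """One-sided test of HA: mean(A) > mean(B) against H0: mean(A) <= mean(B)."""
    z = (mean_a - mean_b) / (std_a**2 / n_a + std_b**2 / n_b) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # equivalent to the Z-table lookup
    return z, z_alpha, z > z_alpha              # reject H0 if z > z_alpha

# Numbers from the worked example that appears below:
# nA = 120, xA = 62.7, sA = 2.50; nB = 150, xB = 61.8, sB = 2.62
z, z_alpha, reject = large_sample_z_test(62.7, 2.50, 120, 61.8, 2.62, 150, alpha=0.05)
print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")   # z ~ 2.88, reject
```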
Rejecting the Null Hypothesis: Recall that zα denotes the value of Z for which the standard normal-curve area to its right is equal to α. That area is only α, so if z falls to the right of zα, a difference this large is unlikely to have arisen by chance under H0. Therefore, you accept HA if z > zα (i.e., the difference between the means is sufficiently large and the variances are low enough), because the observed value does not lie in the "chance" region. Note that the greater the confidence we want in our conclusion (i.e., the smaller α is), the stricter this test becomes. For example, if α is 0.05, then we can conclude HA with (1 − α) = 95% confidence (which is the same as 0.95 probability).

Example of Applying the Large-Sample Z-Test: Problem: The mean performance of agent A is higher than that of agent B at time t = 3. At t = 3, nA = 120, x̄A = 62.7, σA = 2.50, nB = 150, x̄B = 61.8, σB = 2.62. Is the claim that A is better than B at time 3 statistically significant? Solution: H0 is A <= B and HA is A > B. Choose α = 0.05. Find z0.05 = 1.645, based on our earlier table calculation. Calculate z = (62.7 − 61.8) / √((2.50)²/120 + (2.62)²/150) ≈ 2.88. Reject H0 and accept HA because 2.88 > 1.645. We conclude that the fact that agent A performs better than agent B at time 3 is statistically significant at the 95% confidence level.

Small-Sample T-Test: For small samples (fewer than 30 trials), we need to use the t-test. This uses Student's t-distribution (first published by a statistician under the pen name "Student"). This distribution is similar to the normal distribution. It uses a parameter called the number of degrees of freedom (d.o.f.), which grows with the sample size. The more degrees of freedom, the more the t-distribution becomes like the normal distribution, and the more the small-sample test becomes like the large-sample test.

Small-Sample T-Test (procedure):
1. Form the null hypothesis H0, which is the opposite of the hypothesis you want to test (it's easier to test the null hypothesis instead). If HA is A > B, then H0 is A <= B.
2. Choose α based on the confidence with which you want to make conclusions. If you want to be 95% confident of your conclusion (if you reject the null hypothesis), then set α = 0.05.
3. Set d.o.f. = nA + nB − 2. The pooled variance is σ² = ((nA − 1)σA² + (nB − 1)σB²) / d.o.f.
4. Find tα by table lookup of α and the d.o.f. in any statistics book.
5. Calculate t = (x̄A − x̄B) / √(σ²(1/nA + 1/nB)), where nA and nB are the sizes of the samples (i.e., the number of trials over which you tested agent A and agent B, respectively).
6. Reject the null hypothesis and accept HA if t > tα; else accept the null hypothesis or reserve judgment.
(A code sketch of this procedure follows.)
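A minimal sketch of the small-sample t-test procedure above. The standard library has no Student's t inverse CDF, so tα is still supplied from a table; the trial counts and the t0.05 value in the example are hypothetical.

```python
def small_sample_t_test(mean_a, std_a, n_a, mean_b, std_b, n_b, t_alpha):
    """One-sided test of HA: mean(A) > mean(B). Assumes equal variances and i.i.d. trials."""
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * std_a**2 + (n_b - 1) * std_b**2) / dof
    t = (mean_a - mean_b) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return t, dof, t > t_alpha          # reject H0 if t > t_alpha

# Hypothetical small-sample version of the example: 12 trials each,
# and t_0.05 with 22 d.o.f. is about 1.717 (from a t table).
t, dof, reject = small_sample_t_test(62.7, 2.50, 12, 61.8, 2.62, 12, t_alpha=1.717)
print(f"t = {t:.2f} with {dof} d.o.f., reject H0: {reject}")
# With only 12 trials this same difference is not significant (t ~ 0.86 < 1.717).
```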
Qualifications!!! The Z-test and t-test only work well if the population distributions are normal or approximately normal, unless the sample size is very large; you should really use a nonparametric test if this doesn't hold. Both tests assume equal variances for the performance of A and B; if the variances are unequal, you need to adjust the d.o.f. for the t-test based on the F-statistic. Both tests assume i.i.d. samples, e.g., each trial is independent, and the development of agent B was independent of the development of agent A. If repeated comparisons are being done on the same problem, some adjustments need to be made. And so on. Always check that you have the correct test for your assumptions! For an excellent guide to experimental evaluation methodology, see http://eksl-www.cs.umass.edu/eis or Empirical Methods for Artificial Intelligence by Paul Cohen, 1995 (MIT Press).

The Contest: American Idol (revisited): Recall the proposed contest: a suite of 500 songs, each sung by both singers and scored by every audience member, with performance averaged over all audience members and all 500 songs. Question: Is it reasonable to run a t-test? A Z-test?

Nonparametric Tests: T-tests assume independent trials drawn from a normal (or nearly normal) distribution. What if these assumptions don't hold? Nonparametric tests can be used when the distribution is unknown. Example: the Wilcoxon rank-sum test.

Wilcoxon Rank Sum Test: Combine the two data sets (samples) into one sorted set, but keep a record (label) of which data set each item originally came from, e.g., values 0.16, 0.24, 0.30, 0.92 with source labels 1, 2, 2, 2. Give each number a rank based on its order. (Ties from different samples are assigned the mean of the ranks that they jointly occupy; ranking tied numbers from the same sample is arbitrary.) Let HA be μA ≠ μB and H0 be μA = μB, where μA and μB are the means of the two samples. (We could alternatively pick HA and H0 to be like those in our Z-test or t-test, but this method is simpler.) Motivation for ranking: if there is a significant difference between the sample means, most of the lower ranks will belong to one of the two samples.

Wilcoxon Rank Sum Test (Cont'd): Let W1 be the sum of the ranks of the first sample and W2 be the sum of the ranks of the second sample, and suppose the sample sizes are n1 and n2. We could base our test on W1 and W2, but we instead base it on U (which is why this is sometimes called the U test), because U is easier to use for table construction: U1 = W1 − n1(n1 + 1)/2, U2 = W2 − n2(n2 + 1)/2, and U = min(U1, U2). The distributions for U1 and U2 look Gaussian; the distribution for U looks like the left half of a Gaussian.

Wilcoxon Rank Sum Test (Cont'd): Choose α based on the confidence with which you want to make conclusions. If you want to be 95% confident of your conclusion (if you reject the null hypothesis), then set α = 0.05. Find Uα by table lookup in a statistics book. Reject the null hypothesis and accept HA if U <= Uα; else accept the null hypothesis or reserve judgment. Motivation: if there is a significant difference between the sample means, then most of the lower ranks will belong to one of the two samples, and that rank sum (and hence U) will be small. (A code sketch of the ranking and U computation follows.)
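A minimal sketch of the rank-sum bookkeeping described above: pool the two samples, assign ranks (averaging ranks across ties), sum the ranks per sample, and form U. The two small samples at the bottom are made up, and Uα would still come from a table for the chosen α and sample sizes.

```python
def rank_sum_u(sample_1, sample_2):
    # Pool the values, remembering which sample each one came from.
    pooled = [(x, 1) for x in sample_1] + [(x, 2) for x in sample_2]
    pooled.sort(key=lambda pair: pair[0])

    # Assign ranks 1..N, giving tied values the mean of the ranks they jointly occupy.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[k] = mean_rank
        i = j + 1

    w1 = sum(r for r, (_, label) in zip(ranks, pooled) if label == 1)
    w2 = sum(r for r, (_, label) in zip(ranks, pooled) if label == 2)
    n1, n2 = len(sample_1), len(sample_2)
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return min(u1, u2)                  # reject H0 if this is <= U_alpha from the table

print(rank_sum_u([0.16, 0.31, 0.42], [0.24, 0.30, 0.92, 1.10]))   # U = 4.0 here
```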
Practical Application of Statistical Significance Tests: There is a lot of software available, such as S-Plus, that will automatically run statistical significance tests. The challenge is choosing the right test, with the right parameters, for your situation!
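As one concrete example of such software (SciPy here merely stands in for the S-Plus-style packages mentioned above, and the agent scores are hypothetical), the same comparisons take only a couple of lines:

```python
from scipy import stats

agent_a = [62.1, 63.4, 61.9, 64.0, 62.8, 63.1, 62.5, 63.7, 62.2, 63.0]
agent_b = [61.5, 62.0, 60.9, 62.4, 61.1, 61.8, 60.7, 62.1, 61.3, 61.6]

# Two-sample t-test assuming equal variances, and the Wilcoxon rank-sum / Mann-Whitney U test.
t_stat, p_t = stats.ttest_ind(agent_a, agent_b, equal_var=True)
u_stat, p_u = stats.mannwhitneyu(agent_a, agent_b, alternative='two-sided')
print(f"t = {t_stat:.2f} (p = {p_t:.4f}), U = {u_stat} (p = {p_u:.4f})")
```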
