Fundamentals of Data Analysis Lecture 5 Testing of statistical hypotheses pt.2.


1 Fundamentals of Data Analysis Lecture 5 Testing of statistical hypotheses pt.2

2 Chi-square test Test of the variance of the general population. To perform the test one should: 1. Calculate the mean value and the standard deviation according to the equations: \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}. 2. Calculate the value of the chi-square statistic: \chi^2 = \frac{n s^2}{\sigma_0^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma_0^2}, where \sigma_0^2 is the hypothetical value of the variance.

3 Chi-square test 3. For the confidence level \alpha and k = n - 1 degrees of freedom we find in the tables of the chi-square distribution the critical value \chi^2_{cr} that satisfies the equation P(\chi^2 \ge \chi^2_{cr}) = 1 - \alpha. The inequality \chi^2 \ge \chi^2_{cr} defines the right-hand critical region. When the comparison of the calculated value with the critical value shows that \chi^2 \ge \chi^2_{cr}, the null hypothesis should be rejected. Otherwise, there is no reason to reject the null hypothesis.
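The steps above can be sketched in Python, for instance as follows (a minimal sketch assuming NumPy and SciPy; the helper name chi2_variance_test is illustrative only, not part of the lecture):

```python
import numpy as np
from scipy.stats import chi2

def chi2_variance_test(x, sigma0_sq, alpha=0.95):
    """Right-hand chi-square test of the variance of a normal population.

    x         -- sample values
    sigma0_sq -- hypothetical variance sigma_0^2 from the null hypothesis
    alpha     -- confidence level, used as in the lecture (e.g. 0.95)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # chi-square statistic: sum of squared deviations over the hypothetical variance
    chi2_stat = np.sum((x - x.mean()) ** 2) / sigma0_sq
    k = n - 1                              # degrees of freedom
    chi2_crit = chi2.ppf(alpha, df=k)      # critical value: P(chi2 >= crit) = 1 - alpha
    reject = chi2_stat >= chi2_crit        # right-hand critical region
    return chi2_stat, chi2_crit, reject
```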

4 Chi-square test Example. 11 independent measurements of a cast pipe diameter were made and the following results were obtained: 50.2, 50.4, 50.6, 50.5, 49.9, 50.0, 50.3, 50.1, 50.0, 49.6, 50.6 mm. At the confidence level \alpha = 95% we should test the hypothesis that the variance of the obtained pipe diameters is equal to 0.04 mm^2.

5 Chi-square test Example. Mean value: \bar{x} = \frac{1}{11}\sum x_i = 50.2 mm. Standard deviation: s = \sqrt{\frac{1}{11}\sum (x_i - \bar{x})^2} = \sqrt{1.00/11} \approx 0.30 mm. The hypothetical value of the variance: \sigma_0^2 = 0.04 mm^2.

6 Chi-square test Example. The chi-square statistic is equal to: \chi^2 = \frac{\sum (x_i - \bar{x})^2}{\sigma_0^2} = \frac{1.00}{0.04} = 25. The critical value for k = n - 1 = 11 - 1 = 10 degrees of freedom is \chi^2_{cr} = 18.307, so the critical value read from the tables is less than the calculated one; therefore the null hypothesis should be rejected.
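The numbers in this example can be reproduced with a short Python sketch (again assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import chi2

diameters = np.array([50.2, 50.4, 50.6, 50.5, 49.9, 50.0, 50.3, 50.1, 50.0, 49.6, 50.6])

# sum of squared deviations = 1.00 mm^2, hypothetical variance sigma_0^2 = 0.04 mm^2
chi2_stat = np.sum((diameters - diameters.mean()) ** 2) / 0.04
chi2_crit = chi2.ppf(0.95, df=len(diameters) - 1)      # about 18.307 for k = 10
print(chi2_stat, chi2_crit, chi2_stat >= chi2_crit)    # ~25.0, ~18.31, True -> reject H0
```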

7 Chi-square test Exercise. In order to verify the accuracy of a measuring instrument, 6 measurements of the same quantity were made and the following values were obtained: 1.017, 1.021, 1.015, 1.019, 1.022, 1.019. The measurements are normally distributed. At the confidence level \alpha = 99% verify the hypothesis that the variance of the measurements is equal to 0.001.

8 Nonparametric tests Nonparametric tests do not depend on the distribution of the tested attribute, so they may also be used in the case of an arbitrary distribution, not necessarily close to normal. Nonparametric tests can be divided into two groups: tests of goodness of fit, allowing us to test the hypothesis that the population has a certain type of distribution, and tests of the hypothesis that two samples come from one population (i.e. that the two populations have the same distribution).

9 Chi-square test of goodness of fit This is one of the oldest statistical tests for verifying the hypothesis that the population has a certain type of distribution (described in the form of a cumulative distribution function), which may be either continuous or discrete. The only limitation is that the sample must be large, containing at least several tens of results, because we have to group the results into classes of values. These classes should not be too small: at least 8 results should fall into each of them.

10 Chi-square test of goodness of fit The algorithm is as follows: 1. The results are divided into r disjoint classes of sizes n_i; the size of the sample is equal to n = \sum_{i=1}^{r} n_i. Thus we have the empirical distribution. 2. We formulate the null hypothesis that the tested population has a distribution with a distribution function belonging to some set of distributions with a specific type of distribution function;

11 Chi-square test of goodness of fit 3. From the hypothetical distribution we calculate, for each of the r classes of the investigated quantity, the probability p_i that the random variable will take a value belonging to class number i (i = 1, 2, ..., r); 4. We calculate the theoretical sizes np_i for class i, i.e. the sizes expected if the population had the assumed distribution;

12 Chi-square test of goodness of fit 5. From all the empirical sizes n_i and theoretical sizes np_i we determine the value of the chi-square statistic: \chi^2 = \sum_{i=1}^{r} \frac{(n_i - np_i)^2}{np_i}, which, assuming that the null hypothesis is true, has the chi-square distribution with r - 1 degrees of freedom, or r - k - 1 degrees of freedom, where k is the number of parameters estimated from the sample.

13 Chi-square test of goodness of fit 6. From the tables of the chi-square distribution, for the selected confidence level \alpha, we read the critical value \chi^2_{cr} satisfying the relation P(\chi^2 \ge \chi^2_{cr}) = 1 - \alpha. 7. We compare the two values, and if the inequality \chi^2 \ge \chi^2_{cr} holds, the null hypothesis should be rejected. In the opposite case there is no reason to reject the null hypothesis, which, however, does not mean that we can accept it.
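A minimal Python sketch of steps 1-7 for grouped data (assuming NumPy and SciPy; the helper name chi2_goodness_of_fit and its argument names are illustrative only):

```python
import numpy as np
from scipy.stats import chi2

def chi2_goodness_of_fit(n_i, p_i, k_estimated=0, alpha=0.95):
    """Chi-square goodness-of-fit test for data grouped into r classes.

    n_i         -- empirical class sizes
    p_i         -- hypothetical class probabilities (should sum to 1)
    k_estimated -- number k of distribution parameters estimated from the sample
    alpha       -- confidence level, used as in the lecture
    """
    n_i = np.asarray(n_i, dtype=float)
    p_i = np.asarray(p_i, dtype=float)
    np_i = n_i.sum() * p_i                            # theoretical class sizes n*p_i
    chi2_stat = np.sum((n_i - np_i) ** 2 / np_i)      # step 5
    dof = len(n_i) - k_estimated - 1                  # r - k - 1 degrees of freedom
    chi2_crit = chi2.ppf(alpha, df=dof)               # step 6: P(chi2 >= crit) = 1 - alpha
    reject = chi2_stat >= chi2_crit                   # step 7
    return chi2_stat, chi2_crit, reject
```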

14 Chi-square test of goodness of fit Example. In a physical experiment the time of scintillation is measured. The number of measurements is n = 1000 and the grouped set of results is given in the Table. At the 99% confidence level we test the hypothesis that the time of occurrence of the light effect observed in these experiments is normally distributed. The hypothetical distribution parameters do not follow from the content of the task, so our null hypothesis is H_0: F(x) belongs to the class of all normal distribution functions. The two parameters of the distribution, the mean value m and the standard deviation, are estimated from the sample using the estimators m = 0.67 and s = 0.30. Further results are given in the Table, where F(u_i) is the value of the normal distribution function N(0, 1) at the point u_i = (x_i - m)/s, the standardized value of the right end of class i. The number of degrees of freedom is k = 7 - 2 - 1 = 4, since two parameters, the mean and the standard deviation, were calculated from the random sample. From the tables of the chi-square distribution, at the significance level 0.01, we find the critical value \chi^2_{cr} = 13.277. The critical value is less than the calculated statistic, equal to 73.52, thus the hypothesis of normality should be rejected.

15 Chi-square test of goodness of fit Example. During an experiment n = 200 measurements were made and the grouped set of results is given in the Table. At the 95% confidence level test the hypothesis that the results of the measurements follow a uniform distribution.

Mean of the class   n_i
45.25               23
45.75               19
46.25               25
46.75               18
47.25               17
47.75               24
48.25               16
48.75               22
49.25               20
49.75               16
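Under the uniform hypothesis each of the 10 equal classes has probability 1/10, so the theoretical sizes are np_i = 200/10 = 20. A short Python sketch of the calculation (assuming NumPy and SciPy) could look like this:

```python
import numpy as np
from scipy.stats import chi2

n_i = np.array([23, 19, 25, 18, 17, 24, 16, 22, 20, 16], dtype=float)  # class sizes from the Table
np_i = np.full(10, n_i.sum() / 10)               # uniform hypothesis: np_i = 20 per class

chi2_stat = np.sum((n_i - np_i) ** 2 / np_i)     # = 5.0
chi2_crit = chi2.ppf(0.95, df=len(n_i) - 1)      # about 16.92 for r - 1 = 9 degrees of freedom
print(chi2_stat, chi2_crit, chi2_stat >= chi2_crit)  # 5.0, ~16.92, False -> no reason to reject
```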

16 Kolmogorov test of goodness of fit The Kolmogorov \lambda test of goodness of fit serves to verify the hypothesis that the population has a specified distribution. It does not, as the chi-square test does, compare the empirical class sizes with the hypothetical ones; instead, the empirical distribution function is compared with the hypothetical one. Indeed, when the population distribution is consistent with the hypothesis, the values of the empirical and hypothetical distribution functions should be similar at all examined points. The test starts with the analysis of the differences between the two distribution functions; the largest of them is then used to construct the \lambda statistic, whose distribution does not depend on the form of the hypothetical distribution. This distribution determines the critical value for the test. If the maximum difference at some point of the range of variability of the characteristic is too large, the hypothesis that the distribution of the population has the suspected cumulative distribution function should be rejected. The use of this test is limited, however, because the hypothetical distribution function must be continuous; in principle we should also know the parameters of this distribution, but in the case of large samples they can be estimated from the sample.

17 Kolmogorov test of goodness of fit The procedure in the Kolmogorov test is as follows:
1. we order the results in increasing order or group them into relatively narrow intervals with right ends x_i and corresponding sizes n_i;
2. for each x_i we determine the value of the empirical distribution function F_n(x) using the formula F_n(x) = \frac{1}{n}\sum_{x_j \le x} n_j;
3. from the hypothetical distribution we determine for each x_i the value of the theoretical distribution function F(x);
4. for each x_i we calculate the absolute value of the difference F_n(x) - F(x);
5. we calculate the value of the statistic D = \sup_x |F_n(x) - F(x)| and the value of the statistic \lambda = \sqrt{n}\, D, which, if the null hypothesis is true, should have the Kolmogorov distribution;
6. for a fixed confidence level \alpha we read from the limiting Kolmogorov distribution the critical value \lambda_{cr} satisfying the condition P\{\lambda \ge \lambda_{cr}\} = 1 - \alpha. When \lambda \ge \lambda_{cr} the null hypothesis should be rejected; otherwise there is no reason to reject the null hypothesis.
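A minimal Python sketch of these steps for grouped data (assuming NumPy and SciPy, whose kstwobign is the limiting Kolmogorov distribution; the helper name kolmogorov_test is illustrative only):

```python
import numpy as np
from scipy.stats import kstwobign

def kolmogorov_test(x_i, n_i, hyp_cdf, alpha=0.95):
    """Kolmogorov lambda test for data grouped into narrow intervals.

    x_i     -- right ends of the intervals, in increasing order
    n_i     -- corresponding class sizes
    hyp_cdf -- hypothetical (continuous) cumulative distribution function F(x)
    alpha   -- confidence level, used as in the lecture
    """
    x_i = np.asarray(x_i, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    n = n_i.sum()
    F_emp = np.cumsum(n_i) / n             # step 2: empirical distribution function at x_i
    F_hyp = hyp_cdf(x_i)                   # step 3: hypothetical distribution function at x_i
    D = np.max(np.abs(F_emp - F_hyp))      # steps 4-5: D = sup |F_n(x) - F(x)|
    lam = np.sqrt(n) * D                   # lambda statistic
    lam_crit = kstwobign.ppf(alpha)        # step 6: critical value of the limiting distribution
    reject = lam >= lam_crit
    return lam, lam_crit, reject
```

For the example on the next slide the hypothetical distribution function would be scipy.stats.norm.cdf with loc = 65 and scale = 1; note that kstwobign.ppf(0.95) returns about 1.358, close to the tabulated value 1.354 used there.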

18 Kolmogorov test of goodness of fit Example. A sample of size n = 1000 was examined and the results, grouped into 10 narrow classes, are given in the Table. Our task is to put forward a sensible null hypothesis concerning the distribution and verify it at the 95% confidence level. The distribution of class sizes is close to symmetric, with its maximum in one of the middle classes, which suggests the hypothesis that the distribution of the examined characteristic is a normal distribution N(m, \sigma). If we assumed m = 65 in this hypothesis, then the interval of length 4 would contain 1000 - (25 + 19) = 956 results, i.e. 95.6%. From the properties of the normal distribution we know that the probability of taking a value from the interval with ends m - 1.96\sigma and m + 1.96\sigma is 95%, so for a sample of size 1000 this interval should contain 950 results, only slightly fewer than 956. The length of that interval is 3.92\sigma, which corresponds in the task to the value 4, so \sigma = 1 seems to be a sensible assumption, and our null hypothesis is H_0: N(65, 1). The third column contains the values of the empirical distribution function calculated according to the formula above. In the fourth column we place the standardized right ends of the classes (x - m)/\sigma, in the fifth column the values of the distribution function F(x_i) of the N(0, 1) distribution read from the tables, and in the last column the absolute values of the differences between the distribution functions, the largest of which is d_4 = 0.0280. Next we calculate \sqrt{n}\, d_n = \sqrt{1000} \cdot 0.0280 = 0.886. For the confidence level 0.95 we read from the tables of the Kolmogorov distribution the critical value \lambda_{cr} = 1.354. It is greater than the calculated value, so the sample results do not contradict the null hypothesis that the distribution of the general population is the normal distribution N(65, 1).

19 To be continued … !

