- Excess kurtosis (fourth moment; heavy tails caused by extreme values far from the mean)
- Excess skewness (third moment; asymmetric, lopsided distribution)
- Others: lognormal (a RV whose logarithm is normally distributed), uniform, Weibull, exponential…
- Histogram (largely depends on the bin size)
- Stem and leaf plots
- Box plot: for a symmetric distribution, Q1 and Q3 are the same distance from the median and the “whiskers” are the same length
- PP plot (compares the cumulative probabilities of the empirical data to those of the “test” distribution): a straight line indicates a good fit
- QQ plot (compares the quantiles of the empirical data to the theoretical quantiles): a straight line indicates a good fit
- Hypothesis testing: Shapiro–Wilk test, Anderson–Darling test
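The checks above can be sketched in a few lines with scipy. This is a minimal illustration on simulated data (the samples and seed are made up): Shapiro–Wilk for a formal test, and `probplot` for the QQ-plot points, whose correlation with the fitted line measures how straight the plot is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
heavy_tailed = rng.standard_t(df=2, size=200)  # excess kurtosis (heavy tails)

# Shapiro-Wilk: H0 = the data come from a normal distribution
w_norm, p_norm = stats.shapiro(normal_sample)
w_t, p_t = stats.shapiro(heavy_tailed)

# QQ-plot points: (osm, osr) are theoretical vs. ordered sample quantiles;
# r close to 1 means the points lie near a straight line (looks normal)
(osm, osr), (slope, intercept, r) = stats.probplot(normal_sample, dist="norm")
```

For the heavy-tailed sample, `p_t` is tiny and the test rejects normality; for the truly normal sample the QQ correlation `r` is near 1.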
- How to deal with multivariate data?
- What is the distribution of the data?
- How to normalize the data: square root, quarter root, log
- Discrete Distributions
- Continuous Distributions
- Bivariate Distributions
- Distributions of Functions of Random Variables
- F distribution
Correlation, Partial correlation
- Samples and population
- Central limit theorem: with a sample size of at least 30, the distribution of sample means is approximately normal regardless of the shape of the population, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. In practice we often do not know $\mu$ or $\sigma$; in those situations we estimate them with the sample mean $\bar{x}$ and the sample standard deviation $s$, respectively.
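A small simulation (illustrative numbers: an exponential population with mean 1, samples of size n = 30) shows the CLT at work: the sample means cluster around μ = 1 with spread close to σ/√n, even though the population is skewed.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 10000

# Skewed population: exponential with mean 1 and standard deviation 1
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# CLT prediction: mean of sample means ~ mu = 1, sd ~ sigma / sqrt(n)
mean_of_means = sample_means.mean()
sd_of_means = sample_means.std()
expected_sd = 1.0 / np.sqrt(n)
```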
- Sampling distributions
- Law of Large Numbers
weak law of large numbers: the sample mean converges in probability to the population mean
strong law of large numbers: the sample mean converges almost surely to the population mean
ESTIMATION/INFERENCE (what is the value of the parameter):
- Point Estimation (MLE and Method of Moments)
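The two point-estimation approaches can be compared on simulated normal data (illustrative parameters μ = 5, σ = 2): method of moments matches sample moments, while MLE is sketched here by numerically minimizing the negative log-likelihood. For the normal distribution the two coincide.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(3)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# Method of moments: match the first two sample moments
mom_mu, mom_sigma = data.mean(), data.std()

# MLE: minimize the negative log-likelihood numerically
def neg_log_lik(params):
    mu, sigma = params
    return -stats.norm.logpdf(data, loc=mu, scale=sigma).sum()

res = optimize.minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
mle_mu, mle_sigma = res.x
```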
- Interval Estimation (how good the point estimation is)
“We can be 95% confident that the population mean falls between L and U.”
- Confidence interval for one mean: 1.1 (known σ) z-interval
Probability statements are about random variables. The population mean μ is a constant, not a random variable. It makes no sense to make a probability statement about a constant that doesn’t change.
The length of a confidence interval
As the confidence level decreases, the length of the interval decreases. So, for this factor, we have a bit of a tradeoff! We want a high confidence level, but not so high as to produce such a wide interval as to be useless. That’s why 95% is the most common confidence level used.
1.2 Confidence interval for one mean (unknown σ) t-interval
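The t-interval can be computed by hand and checked against scipy (the measurements below are made-up illustrative numbers): $\bar{x} \pm t_{0.025,\,n-1}\, s/\sqrt{n}$.

```python
import numpy as np
from scipy import stats

# Illustrative (made-up) measurements
data = np.array([5.2, 4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.4])
n = len(data)
xbar, s = data.mean(), data.std(ddof=1)

# 95% t-interval: xbar +/- t_{0.025, n-1} * s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
lo = xbar - t_crit * s / np.sqrt(n)
hi = xbar + t_crit * s / np.sqrt(n)

# scipy computes the same interval directly
lo2, hi2 = stats.t.interval(0.95, n - 1, loc=xbar, scale=s / np.sqrt(n))
```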
2. Confidence interval for two means
2.1 Two sample pooled t-interval (the two populations have the same variance σ²)
2.2 Welch’s t-interval when variances are not equal
2.3 Paired t-interval
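A sketch of 2.1 and 2.2 side by side, on made-up samples: the pooled interval uses $S_p^2$ with $n+m-2$ degrees of freedom, while Welch's interval uses separate variances and the Satterthwaite approximation for the degrees of freedom $r$.

```python
import numpy as np
from scipy import stats

# Made-up illustrative samples
x = np.array([5.1, 5.3, 4.9, 5.2, 5.0, 5.4])
y = np.array([4.6, 4.9, 4.7, 5.0, 4.8])
nx, ny = len(x), len(y)
vx, vy = x.var(ddof=1), y.var(ddof=1)
diff = x.mean() - y.mean()

# 2.1 Pooled t-interval (assumes equal variances), df = nx + ny - 2
sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
se_pooled = np.sqrt(sp2 * (1 / nx + 1 / ny))
t_p = stats.t.ppf(0.975, df=nx + ny - 2)
pooled_ci = (diff - t_p * se_pooled, diff + t_p * se_pooled)

# 2.2 Welch t-interval (unequal variances), Satterthwaite df r
se_w = np.sqrt(vx / nx + vy / ny)
r = se_w**4 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
t_w = stats.t.ppf(0.975, df=r)
welch_ci = (diff - t_w * se_w, diff + t_w * se_w)
```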
Differencing the paired measurements removes the shared (e.g. twin) effect, leaving independent differences.
4. Confidence interval for variances
4.1 One variance
4.2 Two variances
5. Confidence interval for proportions
5.1 One proportion
5.2 Two proportions
6. Sample sizes
6.1 Estimating a mean
6.2 Estimating a proportion for a large population
6.3 Estimating a proportion for a small, finite population
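The three sample-size formulas, sketched with illustrative planning values (σ, margins of error E, and population size N are all assumed numbers): $n = (z\sigma/E)^2$ for a mean, $n = z^2 p(1-p)/E^2$ with worst-case $p = 0.5$ for a proportion, and a finite population correction for a small population.

```python
import numpy as np
from scipy import stats

z = stats.norm.ppf(0.975)  # ~1.96 for 95% confidence

# 6.1 Estimating a mean to within margin E, given a planning value for sigma
sigma, E = 15.0, 3.0
n_mean = int(np.ceil((z * sigma / E) ** 2))

# 6.2 Estimating a proportion (large population), worst case p = 0.5
E_p = 0.03
n_prop = int(np.ceil(z**2 * 0.25 / E_p**2))

# 6.3 Finite population correction for a small population of size N
N = 2000
n_fpc = int(np.ceil(n_prop / (1 + (n_prop - 1) / N)))
```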
- Confidence interval relation with two-tailed proportion test
- Distribution-Free CIs for percentiles
- A confidence band
HYPOTHESIS TESTING: is the value of the parameter θ such and such?
(1) We’ll make an initial assumption about the population parameter (null hypothesis).
(2) We’ll collect evidence or else use somebody else’s evidence (in either case, our evidence will come in the form of data).
(3) Based on the available evidence (data), we’ll decide whether to “reject” or “not reject” our initial assumption.
- Tests about proportions (z-statistic), one-tailed test
- state H0 and H1
- calculate the test statistic, which under H0 is approximately distributed as a standard normal
- if using critical value method: determine the critical/rejection region ( “size” of the critical region is 0.05 if the significance level of the test is 0.05)
- make a decision: if the test statistic lies in the rejection region, we reject null hypothesis, because under the null hypothesis, our observation/sample is too extreme to be observed
- if using the p-value approach: the p-value is the smallest α-level that would lead to rejection. It is the probability, under H0, of the observed (or a more extreme) result.
- make a decision: if p-value < α, reject H0. The smaller the p-value, the stronger the evidence against H0, because it tells the investigator that the hypothesis under consideration may not adequately explain the observation.
- errors: Type I error rate (false positive) = α = significance level of the test
- two-tailed test: if using the p-value approach, multiply the one-tailed p-value by 2; if using the rejection-region approach, split α evenly so each tail gets α/2.
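The steps above can be sketched for a one-proportion z-test (the counts are made-up illustrative numbers): 60 successes in 100 trials against H0: p = 0.5, with H1: p > 0.5.

```python
import numpy as np
from scipy import stats

# H0: p = 0.5 vs H1: p > 0.5; observed 60 successes in 100 trials (made up)
n, successes, p0 = 100, 60, 0.5
phat = successes / n

# Test statistic: approximately standard normal under H0
z = (phat - p0) / np.sqrt(p0 * (1 - p0) / n)

p_one_tailed = 1 - stats.norm.cdf(z)  # p-value approach (one-tailed)
p_two_tailed = 2 * p_one_tailed       # two-tailed: double the one-tailed p

z_crit = stats.norm.ppf(0.95)         # critical value at alpha = 0.05 (one-tailed)
reject = z > z_crit                   # decision: is z in the rejection region?
```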
- Comparing two proportions
- Tests about one mean
- Z-test: when the population variance is known and the population mean is unknown (unrealistic in practice)
- one-sample t-test: when both the population variance and the population mean are unknown; degrees of freedom: n−1; a 95% confidence interval for the mean μ is $\bar{x} \pm t_{0.025,\,n-1}\, s/\sqrt{n}$; we can be 95% confident that the mean lies in this interval.
two sample t-test (independent)
we can “remove” the dependence between X and Y by subtracting the two measurements Xi and Yi for each pair of twins i, that is, by considering the independent differences di = Xi − Yi.
- Test of the equality of two means (independent, unpaired)
- When population variances are equal, pooled two-sample t-test
The test statistic follows a t distribution with n+m−2 degrees of freedom.
Sp2, the pooled sample variance, is an unbiased estimator of the common variance σ2.
2. When population variances are not equal, Welch’s t-test
The test statistic follows a t distribution with r degrees of freedom. If r is not an integer, as is usually the case, we round down and use ⌊r⌋.
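Both two-sample tests are available in scipy via the `equal_var` flag of `ttest_ind`; a minimal sketch on simulated samples with deliberately unequal variances (all numbers assumed for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(10.0, 2.0, size=40)
y = rng.normal(11.0, 5.0, size=35)  # different variance on purpose

t_pooled, p_pooled = stats.ttest_ind(x, y, equal_var=True)   # pooled t-test
t_welch, p_welch = stats.ttest_ind(x, y, equal_var=False)    # Welch's t-test
```

When the sample variances differ, the two tests give different p-values; Welch's version is the safer default.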
- Tests for variances
- One variance
The test statistic follows a chi-square distribution with n−1 degrees of freedom.
- Two variances: the test statistic (the ratio of the two sample variances) follows an F distribution with n−1 numerator degrees of freedom and m−1 denominator degrees of freedom.
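A sketch of the two-variance F test on simulated data (the samples are made up; here H0 of equal variances is actually true): form $F = s_x^2 / s_y^2$ and compute a two-sided p-value from the F(n−1, m−1) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(0.0, 2.0, size=25)
y = rng.normal(0.0, 2.0, size=30)  # same true variance: H0 holds

# F = s_x^2 / s_y^2 ~ F(n-1, m-1) under H0: sigma_x^2 = sigma_y^2
F = x.var(ddof=1) / y.var(ddof=1)
df1, df2 = len(x) - 1, len(y) - 1

# Two-sided p-value: double the smaller tail probability
p_two_sided = 2 * min(stats.f.cdf(F, df1, df2), stats.f.sf(F, df1, df2))
```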
Tests Concerning Regression and Correlation
- Test for slope
- Tests for correlation
If (Xi, Yi) follows a bivariate normal distribution, then testing for the independence of X and Y is equivalent to testing whether the correlation coefficient ρ equals 0.
A Type I error occurs if we reject the null hypothesis H0 (in favor of the alternative hypothesis HA) when the null hypothesis H0 is true. We denote α = P(Type I Error).
A Type II error occurs if we fail to reject the null hypothesis H0 when the alternative hypothesis HA is true. We denote β= P(Type II Error).
- Power = 1 − β, where β = P(Type II error)
- Calculate sample size
- Likelihood ratio test (null and alternative hypotheses are composite)
every hypothesis test that we derived in the hypothesis testing section is a likelihood ratio test
- Best Critical Regions
- A/B test
- AIC/BIC score
ANOVA (Analysis of Variance)
group/treatment/factor: the feature under study; different groups correspond to different values of that feature
- One factor ANOVA: to use the analysis of variance method to compare the equality of the (unknown) means μ1, μ2, …, μm of m normal distributions with an unknown but common variance σ^2. What if the assumptions are violated?
- Normality. (1) transform your data using various algorithms so that the shape of your distributions become normally distributed or (2) choose the nonparametric Kruskal-Wallis H Test which does not require the assumption of normality.
- homogeneity of variances. (1) Welch or (2) Brown and Forsythe test.
- the test compares the variance between groups with the variance within groups: if they are close, we do not reject H0 and conclude the means are the same; if the between-group variance is much larger than the within-group variance, the means are not all the same. There is only one factor/treatment affecting the data.
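A sketch of one-factor ANOVA and its nonparametric fallback on simulated groups (group means and sizes are assumed for illustration; one group mean is deliberately shifted so H0 should be rejected):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
g1 = rng.normal(5.0, 1.0, size=20)
g2 = rng.normal(5.5, 1.0, size=20)
g3 = rng.normal(7.0, 1.0, size=20)  # clearly shifted mean

# One-factor ANOVA: H0 says all group means are equal
F, p_anova = stats.f_oneway(g1, g2, g3)

# Kruskal-Wallis H test: the nonparametric alternative when normality fails
H, p_kw = stats.kruskal(g1, g2, g3)
```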
Post hoc tests (a posteriori tests): if we want to know which two means are different.
Why not use multiple t-tests: comparing m groups pairwise requires many t-tests instead of one ANOVA, and each additional test inflates the overall (familywise) Type I error rate.
When testing these hypotheses, the important thing to remember is to evaluate the significance of the interaction as the first step in looking at the output. If the interaction is significant, the main effects cannot be interpreted straightforwardly.
- MANOVA (Multivariate Analysis of Variance)
data on p variables
- Chi-square test (how “good” do the data “fit” the probability model; is the sample representative of the population?): Q1 is distributed as chi-square with one degree of freedom; the expected number of successes must be at least 5 (that is, np1 ≥ 5) and the expected number of failures must be at least 5 (that is, n(1−p1) ≥ 5), because we rely on the central limit theorem. Extension to k categories:
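A k-category goodness-of-fit sketch (the die-roll counts below are made up): testing whether a die is fair, with all expected counts well above 5.

```python
import numpy as np
from scipy import stats

# Illustrative observed counts over 120 rolls of a die; fair die expects 20 each
observed = np.array([18, 22, 19, 25, 16, 20])
expected = np.full(6, observed.sum() / 6)

# Chi-square goodness-of-fit test with k - 1 = 5 degrees of freedom
chi2, p = stats.chisquare(observed, f_exp=expected)
```

Here the deviations are small, so the test does not reject the fair-die model.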
- Contingency table
- Homogeneity (whether two or more multinomial distributions are equal)
If there are more than two samples, that is, if h > 2, the chi-square statistic follows an approximate chi-square distribution with h(k−1) − (k−1) = (h−1)(k−1) degrees of freedom: h(k−1) free cell counts, minus the k−1 cell probabilities estimated from the data.
2. Independence: testing the independence of two categorical variables
The sampling schemes differ: for homogeneity the row totals are fixed in advance; for independence only the overall total is fixed.
(kh−1) − (h+k−2) = (h−1)(k−1) degrees of freedom: kh−1 free cell probabilities, minus the h+k−2 marginal probabilities estimated from the data.
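A sketch of the independence test on a small h × k contingency table (the counts are made up): scipy computes the statistic, the p-value, the degrees of freedom (h−1)(k−1), and the expected counts under independence.

```python
import numpy as np
from scipy import stats

# Made-up h x k table: rows = groups (h = 2), columns = categories (k = 3)
table = np.array([[30, 20, 10],
                  [20, 25, 15]])

# H0: the row and column variables are independent
chi2, p, dof, expected = stats.chi2_contingency(table)
# dof = (h - 1)(k - 1) = (2 - 1)(3 - 1) = 2
```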
The Wilcoxon Tests for a Median (no distribution assumption)
- Run test and test for randomness: tests whether the distribution functions F(x) and G(y) of two continuous random variables X and Y are equal.
Kolmogorov-Smirnov Goodness-of-Fit Test: how well a hypothesized distribution function F(x) fits an empirical distribution function Fn(x).
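A minimal Kolmogorov-Smirnov sketch on simulated data (sample and hypothesized distributions are assumed for illustration): the statistic D is the largest gap between the empirical CDF Fn(x) and the hypothesized F(x).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(0.0, 1.0, size=300)

# H0: the sample is drawn from the hypothesized standard normal F(x)
D, p_norm = stats.kstest(sample, "norm")

# Against a badly wrong hypothesis (uniform on [0, 1]) the test rejects hard
D_bad, p_bad = stats.kstest(sample, "uniform")
```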
How to impose sparsity on a model:
- L1 penalty
- L2 penalty
- Laplace prior
- Factorized Laplace
- Cauchy prior
- Student-t prior
- Spike and slab prior
Conjugate Prior: what is conjugate prior, conjugate prior table
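A conjugate prior is a prior that, combined with the likelihood, yields a posterior in the same family, so updating is just arithmetic on the parameters. The classic table entry is Beta prior + binomial likelihood → Beta posterior; a sketch with assumed illustrative numbers:

```python
# Conjugate update: prior Beta(a, b), likelihood Binomial(n, p).
# After s successes in n trials, the posterior is Beta(a + s, b + n - s).
a, b = 2.0, 2.0   # prior pseudo-counts (assumed values)
n, s = 10, 7      # observed data (assumed values)

a_post, b_post = a + s, b + (n - s)
posterior_mean = a_post / (a_post + b_post)  # = (a + s) / (a + b + n)
```

The prior acts like a + b pseudo-observations, which is why the posterior mean sits between the prior mean and the sample proportion s/n.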