[Math Review] Statistics Basic: Estimation

 

Two Types of Estimation

One of the major applications of statistics is estimating population parameters from sample statistics. There are two types of estimation:
  • Point Estimate: the value of a sample statistic, used as a single best guess of the population parameter


Point estimates of average height with multiple samples (Source: Zhihu)

  • Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.


95% confidence interval of average height with multiple samples (Source: Zhihu)

 

Confidence Interval for the Mean

Population Variance is Known

Suppose that M is the mean of N samples X1, X2, ..., XN, i.e.

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$

According to the Central Limit Theorem, the sampling distribution of the mean M is approximately

$$M \sim N\left(\mu, \frac{\sigma^2}{N}\right)$$
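The statement that the sample mean has mean μ and variance σ²/N can be checked with a short simulation. The sketch below is only an illustration: the population (heights with mean 170 and standard deviation 10), the sample size, and the helper name `simulate_sample_means` are all assumptions, not values from the article. It uses a normal population for simplicity, so the sampling distribution is exactly normal here rather than only approximately.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size `n` from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]


# Hypothetical population: mean 170, standard deviation 10 (assumed values).
mu, sigma, n = 170.0, 10.0, 25
means = simulate_sample_means(mu, sigma, n, trials=20000)

# The average of the sample means should be close to mu,
# and their variance should be close to sigma**2 / n.
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)
print("mean of sample means:    ", mean_of_means)
print("variance of sample means:", var_of_means)
print("sigma**2 / n:            ", sigma**2 / n)
```

Increasing `n` shrinks the spread of the sample means, which is why larger samples give narrower confidence intervals.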

where μ and σ² are the mean and variance of the population, respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. To construct the interval, start from the range that is symmetric about the population mean μ and covers an area of 0.95 under the normal sampling distribution of M.


That is,

$$P\left(\mu - 1.96\,\frac{\sigma}{\sqrt{N}} \le M \le \mu + 1.96\,\frac{\sigma}{\sqrt{N}}\right) = 0.95$$

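Since we don't know the mean of the population, we center the interval on the sample mean M instead. M lies within 1.96σ/√N of μ exactly when μ lies within 1.96σ/√N of M, so the 95% confidence interval for the population mean is

$$M - 1.96\,\frac{\sigma}{\sqrt{N}} \le \mu \le M + 1.96\,\frac{\sigma}{\sqrt{N}}$$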
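As a rough illustration of the known-variance case, the sketch below computes the interval M ± z·σ/√N and then estimates how often it contains the true mean over many repeated samples. The function name `mean_confidence_interval` and the population values are assumptions made for this example only.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    n = len(sample)
    m = statistics.fmean(sample)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width


# Hypothetical population: mean 170, standard deviation 10 (assumed values).
rng = random.Random(42)
mu, sigma, n = 170.0, 10.0, 25

trials = 10000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = mean_confidence_interval(sample, sigma)
    if low <= mu <= high:
        hits += 1

coverage = hits / trials
print(f"fraction of intervals containing mu: {coverage:.3f}")  # expected near 0.95
```

Each individual interval either contains μ or it does not; the 95% figure describes the long-run behaviour of the method, which is what the loop estimates.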

Population Variance is Unknown

Degrees of Freedom

The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. 

If the variance in a sample is used to estimate the variance in a population, we should not calculate the sample variance as

$$s^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - M)^2$$

That's because the sample mean M has to be estimated from the same data en route to the variance, which uses up one degree of freedom. The degrees of freedom are therefore N-1, and dividing by N underestimates the population variance. Instead, we should use the following formula

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - M)^2$$

where s² is the estimate of the variance and M is the sample mean. The denominator of this formula is the degrees of freedom.
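The underestimate from dividing by N can be seen numerically. The following sketch averages both estimators over many small samples; the true variance of 4 and the sample size of 5 are assumed values chosen to make the bias visible. In Python's standard library, `statistics.pvariance` divides by N and `statistics.variance` divides by N-1.

```python
import random
import statistics

rng = random.Random(1)
sigma2 = 4.0           # true population variance (assumed for illustration)
n, trials = 5, 20000   # small samples make the bias easy to see

biased, unbiased = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    biased.append(statistics.pvariance(sample))   # divides by N
    unbiased.append(statistics.variance(sample))  # divides by N - 1

avg_biased = statistics.fmean(biased)
avg_unbiased = statistics.fmean(unbiased)

# Dividing by N gives an average near (N - 1) / N * sigma2,
# while dividing by N - 1 gives an average near sigma2.
print("average with denominator N:    ", avg_biased)
print("average with denominator N - 1:", avg_unbiased)
print("true variance:                 ", sigma2)
```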

Student's t-Distribution 

Suppose that X is a normally distributed random variable, i.e., X ~ N(μ, σ²), and that X1, X2, ..., XN are N independent observations of X. Then

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$

is the sample mean and

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - M)^2}$$

is the sample standard deviation.

$$Z = \frac{M - \mu}{\sigma/\sqrt{N}}$$

is a random variable with a standard normal distribution, whereas

$$T = \frac{M - \mu}{s/\sqrt{N}}$$

is a random variable with Student's t distribution with N-1 degrees of freedom.

The probability density function of T is

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where ν = N-1 is the degrees of freedom and Γ is the gamma function.
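The density of Student's t distribution can be evaluated directly with the standard library. This is a minimal sketch, not a replacement for a statistics package: the names `t_pdf` and `normal_pdf` are made up for the example, and the gamma ratio is computed through `math.lgamma` (an implementation choice, to avoid overflow for large degrees of freedom).

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    # Gamma((df + 1) / 2) / Gamma(df / 2), computed on the log scale.
    log_ratio = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_ratio) / math.sqrt(df * math.pi)
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Compare the t density with the normal density at the centre and in the tail.
for df in (1, 5, 30):
    print(f"df={df:>2}: f(0)={t_pdf(0.0, df):.4f}  f(3)={t_pdf(3.0, df):.4f}")
print(f"normal: f(0)={normal_pdf(0.0):.4f}  f(3)={normal_pdf(3.0):.4f}")
```

With few degrees of freedom the t density is lower at the centre and heavier in the tails than the normal density, and it approaches the normal density as the degrees of freedom grow.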

 
