Tutorial on Introduction to biostatistics

Inferential data analysis

As the researcher draws scientific conclusions from a study using only a sample rather than the whole population, the conclusions can be justified with the help of statistical inference tools. The principal concepts involved in statistical inference are the theory of estimation and hypothesis testing.

Theory of estimation

Point Estimation:

A single value is used to provide the best estimate of the parameter of interest.

Interval Estimation:

An interval estimate shows the estimate of the parameter and also gives an idea of the confidence the researcher has in that estimate. This leads us to the consideration of confidence intervals.

Confidence interval (CI)

A confidence interval estimate of a parameter consists of an interval, along with a probability that the interval contains the unknown parameter. The level of confidence is a probability that represents the percentage of intervals that would contain the parameter if a large number of repeated samples were obtained. The level of confidence is denoted (1 − α) × 100%.

The narrower the confidence interval, the lower the error of the point estimate it contains. The sample size, sample variance, and level of confidence all affect the width of the confidence interval.

Confidence intervals can be computed for estimating a single mean or proportion, and also for comparing the difference between two means or proportions. Confidence intervals are widely used to report the main clinical outcomes instead of p-values, as they have many advantages (such as giving information about effect size, variability, and the possible range of values). The most commonly used confidence interval is the 95% CI. Increasingly, medical journals and publications require authors to calculate and report the 95% CI wherever appropriate, since it gives a measure of the range of possible effect sizes – information that is of great relevance to clinicians. The term 95% CI means that it is the interval within which we can be 95% sure the true population value lies. Note that the remaining 5% of the time, the value may fall outside this interval. The estimate, which is the effect size observed in the particular study, is the point at which the true value is most likely to fall, though it can theoretically occur at any point within the confidence interval (or even outside it, as just alluded to).

Example: 

A study is conducted to estimate the average glucose level in patients admitted with diabetic ketoacidosis. A sample of 100 patients was selected, and the mean was found to be 500 mg/dL with a 95% confidence interval of 320-780 mg/dL. This means that we can be 95% confident that the true mean glucose level of all such patients lies between 320 and 780 mg/dL.
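To make this concrete, here is a minimal Python sketch of computing a 95% confidence interval for a mean from summary statistics; the standard deviation used below is an assumed value for illustration, not a figure from the study.

```python
# A minimal sketch of a 95% CI for a mean from summary statistics;
# the sample SD below is an assumed value, not from the study.
import math
from scipy import stats

n = 100          # sample size
mean = 500.0     # sample mean glucose (mg/dL)
sd = 120.0       # hypothetical sample standard deviation

se = sd / math.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% critical value
ci = (mean - t_crit * se, mean + t_crit * se)
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f} mg/dL")
```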

Hypothesis testing vs. Estimation

Similarity: Both use sample data to infer something about a population.

Difference: Designed to answer different questions.

Does a new drug lower cholesterol levels?

Measure the cholesterol of 25 patients before and after taking the drug; the change in cholesterol is 15 mg/dL (225 mg/dL before; 210 mg/dL after).

Hypothesis test: Did the drug alter cholesterol levels?

Yes/no decision. Reject or fail to reject H0

Estimation: By how much did the drug alter cholesterol levels?

Hypothesis testing

Setting up the Hypotheses:

The basic concept used in hypothesis testing is that it is far easier to show that something is false than to prove that it is true.

a) Two mutually exclusive & competing hypotheses:

Let us consider a situation where we want to test whether a new drug has superior efficacy to one of the standard drugs on the market for the treatment of tuberculosis. We will have to construct a null hypothesis and an alternative hypothesis for this experiment, as below:

1.  The “null” hypothesis (H0)

The null hypothesis indicates a neutral position (or the status quo in an interventional trial) in the given study or experiment. Typically the investigator hopes to prove this hypothesis wrong so that the alternative hypothesis (which encompasses the concept of interest to the investigator) can be accepted.

Example:

In the situation given above, though we actually want to prove the new drug to be effective, we should proceed with a neutral attitude while doing the experiment, so our null hypothesis will be stated as follows:

H0: There is no difference between the effects of the new drug and the standard drug in treating tuberculosis.

2.  The “alternative” hypothesis (H1)

This is the hypothesis we believe or hope is true. 

Example: In the above situation, if we want to prove the new drug is superior, then our alternative hypothesis will be:

H1:  New drug’s effect is superior to that of the standard drug.

Based on the alternative hypothesis, the test will be either a one-tailed or a two-tailed test. A two-tailed test is used when the researcher wants to test for a departure from the population parameter specified in the null hypothesis in either direction (i.e. greater or lesser). If the parameter is to be tested in only one direction (greater or lesser), it becomes a one-tailed test.

In the above example the researcher framed the alternative hypothesis in only one direction (the new drug is superior to the standard drug), so the test becomes a one-tailed test.

b) Selecting a “significance level”: α

The significance level is the probability of rejecting the null hypothesis when it is actually true (Type I error). It is usually set at 5%, i.e. α = 0.05 (5%).

c) Calculate the test statistic and p-value

Test statistic

The choice of test statistic will depend on our null hypothesis. It may be a test of a single mean or proportion, or a comparison of two means or proportions.

p-value

A p-value gives the likelihood of observing an effect at least as large as the study effect, given that the null hypothesis is true. For example, a p-value of 0.03 means that, assuming the treatment has no effect, and given the sample size, an effect as large as the observed effect would be seen in only 3% of studies.

In other words, it gives the chance of observing the difference (effect) seen in the sample when the null hypothesis is true. For example, a p-value of 0.02 means there is only a 2% chance of observing such a difference (effect) in the sample if the null hypothesis is true.

The p-value obtained in the study is evaluated against the significance level α. If α is set at 0.05, then a p-value of 0.05 or less is required to reject the null hypothesis and establish statistical significance.

d) Decision rule:        

We can reject H0 if the p-value < α.

Most statistical packages calculate the p-value for a 2-tailed test. If we are conducting a 1-tailed test, and the observed effect is in the hypothesized direction, we must divide the p-value by 2 before comparing it with α. (In SPSS output, the p-value is labeled “Sig (2-tailed).”)
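For instance, here is a minimal sketch of applying this decision rule to a hypothetical two-tailed p-value reported by a package:

```python
# A minimal sketch (hypothetical values) of converting the two-tailed
# p-value reported by a package into a one-tailed p-value.
p_two_tailed = 0.04                       # e.g. SPSS "Sig (2-tailed)" output
effect_in_hypothesized_direction = True   # e.g. observed difference > 0

if effect_in_hypothesized_direction:
    p_one_tailed = p_two_tailed / 2
else:
    p_one_tailed = 1 - p_two_tailed / 2

alpha = 0.05
print(f"one-tailed p = {p_one_tailed:.3f}; reject H0: {p_one_tailed < alpha}")
```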

Table 1: Step-by-step guide to applying hypothesis testing in research

  1.  Formulate a research question

  2.  Formulate a research/alternative hypothesis

  3.  Formulate the null hypothesis

  4.  Collect data

  5.  Reference a sampling distribution of the particular statistic assuming that H0 is true (in the cases so far, a sampling distribution of the mean)

  6.  Decide on a significance level (α), typically 0.05

  7.  Compute the appropriate test statistic

  8.  Calculate the p-value

  9.  Reject H0 if the p-value is less than the set level of significance; otherwise fail to reject H0

Hypothesis Testing for different Situations

Testing for Single mean – Large Samples: Z-test

The Z-test for a single mean is useful when we want to test a sample mean against the population mean when the sample size is large (i.e. more than 30).

Example:

A researcher wants to test the statement that the mean level of dopamine is greater than 36 in individuals with schizophrenia. He collects a sample of 54 patients with schizophrenia.

The researcher can test the hypothesis using Z-test for testing single mean.
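A minimal Python sketch of this one-sample Z-test follows; the sample mean and the population standard deviation below are assumed values for illustration, while the null value of 36 and the sample size of 54 come from the example.

```python
# A minimal sketch of a one-sample Z-test; x_bar and sigma are
# hypothetical values, mu_0 = 36 and n = 54 are from the example.
import math
from scipy import stats

n = 54           # sample size
x_bar = 38.2     # hypothetical sample mean dopamine level
mu_0 = 36        # value under the null hypothesis
sigma = 6.0      # assumed (known) population standard deviation

z = (x_bar - mu_0) / (sigma / math.sqrt(n))
p_one_tailed = stats.norm.sf(z)   # upper-tail test: H1 is "mean > 36"
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```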

Testing for Two means – Large Samples: Z-test for comparing two means.

The Z-test for comparing two means is useful when we want to compare two sample means when the sample size is large (i.e. more than 30).

Example:

Past studies show that Indian men have higher cholesterol levels than Indian women. A sample of 100 males and females was taken and their cholesterol levels measured; males were found to have a mean cholesterol level of 188 mg/dL and females a mean level of 164 mg/dL. Is there sufficient evidence to conclude that males indeed have higher cholesterol levels?

Here we can test the hypothesis using Z-test for comparing two sample means.
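A minimal sketch of this two-sample Z-test in Python; the two means come from the example, while the group sizes and standard deviations are assumed values.

```python
# A minimal sketch of a two-sample Z-test for means; the means are from
# the example above, the SDs and group sizes are assumed values.
import math
from scipy import stats

n1, n2 = 100, 100        # assumed group sizes
x1, x2 = 188.0, 164.0    # sample means (mg/dL) from the example
s1, s2 = 40.0, 38.0      # hypothetical sample standard deviations

z = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
p_one_tailed = stats.norm.sf(z)   # H1: male mean > female mean
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```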

Testing for Single mean – t-test.

The t-test for a single mean is useful when we want to test a sample mean against the population mean when the sample size is small (i.e. less than 30).

Example:

A researcher wants to test the statement that the mean age of diabetic patients in his district is greater than 60 years. He draws a sample of 25 persons.

Here we can test the hypothesis using t-test for single mean.
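A minimal sketch using scipy's one-sample t-test on simulated ages; the data are hypothetical, and only the null value of 60 years and the sample size of 25 come from the example.

```python
# A minimal sketch of a one-sample t-test on simulated, hypothetical
# ages; popmean = 60 and n = 25 are from the example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ages = rng.normal(loc=63, scale=8, size=25)   # hypothetical sample of 25 ages

t_stat, p_val = stats.ttest_1samp(ages, popmean=60, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_val:.4f}")
```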

Independent Sample t-test for two means.

The t-test for comparing two means is appropriate when we want to compare two independent sample means when the sample size is small (i.e. less than 30).

Example:

A study was conducted to compare males and females in terms of average years of education, with a sample of 9 females and 13 males. It was found that males had an average of 17 years of formal education while females had 14. Can it be concluded that males have more years of education than females within this population?

Here we can test the hypothesis using t-test for comparing two sample means.
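A minimal sketch using scipy's independent-samples t-test; the individual values are simulated around the group means from the example, so the data themselves are hypothetical.

```python
# A minimal sketch of an independent-samples t-test; group sizes and
# means follow the example, the individual values are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
males = rng.normal(loc=17, scale=3, size=13)    # 13 males, ~17 years
females = rng.normal(loc=14, scale=3, size=9)   # 9 females, ~14 years

t_stat, p_val = stats.ttest_ind(males, females, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_val:.4f}")
```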

Paired t-test for two means.

The paired t-test is useful when we want to compare two sample means where the two measurements are taken from the same subjects in the study, such as pre- and post-treatment measurements.

Example:

A study was conducted to assess the effect of a drug in treating hypertension by administering it to 20 patients. BP was recorded immediately before and one hour after the drug was given. The question of interest: is the drug effective in reducing blood pressure?

A paired t-test can be used for hypothesis testing and comparing two paired sample means.
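A minimal sketch of the paired t-test in Python with simulated before/after readings for the 20 patients; all blood pressure values are hypothetical.

```python
# A minimal sketch of a paired t-test with simulated, hypothetical
# before/after BP readings for the 20 patients described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
before = rng.normal(loc=150, scale=10, size=20)        # systolic BP before
after = before - rng.normal(loc=8, scale=5, size=20)   # BP one hour after

t_stat, p_val = stats.ttest_rel(before, after, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_val:.4f}")
```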

Testing for Single proportion: Binomial test for proportion

If we want to test a sample proportion against the population proportion, we can use the binomial test for a single proportion.

Example:

A random sample of patients is recruited for a clinical study.  The researcher wants to establish that the proportion of female patients is not equal to 0.5.

The binomial test for proportion is the appropriate statistical method here.
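A minimal sketch of the exact binomial test in Python; the counts below (34 females out of 80 recruited patients) are assumed for illustration, while the null proportion of 0.5 comes from the example.

```python
# A minimal sketch of an exact binomial test; the counts are assumed
# (34 females out of 80 patients), H0: proportion = 0.5.
from scipy import stats

result = stats.binomtest(k=34, n=80, p=0.5, alternative="two-sided")
print(f"p = {result.pvalue:.4f}")
```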

Testing for Two proportions: Z-test for two proportions

If we want to compare two sample proportions, we can use the Z-test for two proportions when the sample size is large (i.e. more than 30).

Example:

Two types of hypodermic needles, the old type and a new type, are used for giving injections. It is hoped that the new design will lead to less painful injections. The patients are allocated at random to two groups, one to receive the injections using a needle of the old type, the other to receive injections with needles of the new type.

Does the information support the belief that the proportion of patients having severe pain with injections using needles of the old type is greater than the proportion of patients with severe pain in the group getting injections using the new type?

Here we can test the hypothesis using Z-test for comparing two sample proportions.
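A minimal sketch of the two-proportion Z-test using a pooled proportion estimate; all counts below are hypothetical.

```python
# A minimal sketch of a two-proportion Z-test with a pooled estimate;
# all counts are hypothetical (severe pain: 18/60 old, 9/60 new needle).
import math
from scipy import stats

x1, n1 = 18, 60   # severe pain with the old needle type (assumed)
x2, n2 = 9, 60    # severe pain with the new needle type (assumed)

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_one_tailed = stats.norm.sf(z)   # H1: old-needle proportion is greater
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.4f}")
```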

Chi-square test (χ2)

The chi-square test is a statistical procedure used to analyze categorical data.

We will explore two different types of χ2 tests:

            1.  One categorical variable: Goodness-of-fit test

            2.  Two categorical variables: Contingency table analysis

One categorical variable: Goodness-of-fit test

A test for comparing observed frequencies with theoretically predicted frequencies.

Two categorical variables: Contingency table analysis

Defined: a statistical procedure to determine if the distribution of one categorical variable is contingent on a second categorical variable

Note:

If the expected frequencies in the cells are “too small,” the χ2 test may not be valid.

A conservative rule is that you should have expected frequencies of at least 5 in all cells.

Example:

We want to test the association between cancer and smoking habits in 250 patients. The chi-square test would be appropriate here.
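A minimal sketch of the contingency-table chi-square test in Python; the 2x2 counts below are hypothetical, chosen only so that they sum to the 250 patients of the example.

```python
# A minimal sketch of a contingency-table chi-square test; the 2x2
# counts (smokers/non-smokers vs. cancer/no cancer) are hypothetical.
import numpy as np
from scipy import stats

table = np.array([[60, 40],    # smokers: cancer / no cancer (assumed)
                  [45, 105]])  # non-smokers: cancer / no cancer (assumed)

chi2, p_val, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_val:.4f}")
```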

Analysis of Variance (ANOVA)

When we want to compare more than two means we will have to use an analysis of variance test.                      

Example:

A researcher has assembled three groups of psychology students. He teaches the same topic to each group using three different educational methodologies. The researcher wishes to determine whether the three methodologies give equivalent results. He tests all the students and records the marks obtained.

An ANOVA analysis can be used to test the hypothesis.
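A minimal sketch of a one-way ANOVA in Python with simulated marks for the three groups; all scores and group sizes are hypothetical.

```python
# A minimal sketch of a one-way ANOVA comparing marks for three
# teaching methods; all scores are simulated, hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
method_a = rng.normal(loc=70, scale=8, size=30)
method_b = rng.normal(loc=74, scale=8, size=30)
method_c = rng.normal(loc=69, scale=8, size=30)

f_stat, p_val = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_val:.4f}")
```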

Repeated Measures ANOVA

Repeated measures ANOVA is useful when we want to compare more than two sample means when the sample measurements are taken from the same subjects enrolled in the study.

Example:

A trial was conducted to assess the effect of a drug in treating hypertension by administering it to 20 patients. BP was recorded immediately before and one, two, and four hours after the drug was administered.

Is the drug effective in reducing blood pressure?

Repeated measures ANOVA would be the right way to get an answer.
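A minimal sketch using statsmodels' AnovaRM on simulated blood pressure readings at the four time points; all values are hypothetical.

```python
# A minimal sketch of a repeated-measures ANOVA using statsmodels'
# AnovaRM; the BP readings at the four time points are simulated.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(4)
records = []
for subject in range(20):                    # the 20 patients
    base = rng.normal(150, 10)               # hypothetical baseline BP
    for time, drop in zip(["pre", "1h", "2h", "4h"], [0, 8, 10, 6]):
        records.append({"subject": subject, "time": time,
                        "bp": base - drop + rng.normal(0, 4)})

df = pd.DataFrame(records)
print(AnovaRM(df, depvar="bp", subject="subject", within=["time"]).fit())
```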

Parametric Tests

Statistical hypothesis tests such as the Z-test, t-test, and ANOVA assume that the distributions of the variables being assessed come from parametrized probability distributions. The parameters usually used are the mean and standard deviation. For example, the t-test assumes the variable comes from a normal population, and analysis of variance assumes that the underlying distributions are normally distributed and that the variances are similar.

Parametric techniques are more powerful at detecting differences or similarities than nonparametric tests.

Nonparametric/Distribution-free tests

Nonparametric tests are statistical tests that do not involve population parameters and do not make assumptions about the shape of the population(s) from which the sample(s) originate.

They are used in the following circumstances:

1.  Useful when statistical assumptions have been violated

2.  Ideal for nominal (categorical) and ordinal (ranked) data

3.  Useful when sample sizes are small (as this is often when assumptions are violated)

What are the disadvantages of Nonparametric/Distribution-free tests?

1.  Tend to be less powerful than their parametric counterparts

2.  H0 & H1 not as precisely defined

There is a nonparametric/distribution-free counterpart to many parametric tests. 

· The Mann-Whitney U Test: The nonparametric counterpart of the independent samples t-test

· The Wilcoxon Signed Rank Test: The nonparametric counterpart of the related samples t-test

· The Kruskal-Wallis Test: The nonparametric counterpart of one-way ANOVA

· The Kolmogorov-Smirnov Test: A nonparametric test used to assess whether the distributions of two data sets are the same

· The Runs Test: A run is a series of similar values followed by a different value. The runs test is used to assess whether the runs in a data set occurred randomly
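A minimal sketch of three of these tests using scipy; the two small samples below are hypothetical ordinal-style scores.

```python
# A minimal sketch of common nonparametric tests from scipy;
# all three small samples are hypothetical scores.
import numpy as np
from scipy import stats

group1 = np.array([3, 5, 4, 6, 2, 5, 7, 4])
group2 = np.array([6, 7, 5, 8, 7, 6, 9, 8])
group3 = np.array([4, 6, 5, 7, 3, 6, 8, 5])

u_stat, p_u = stats.mannwhitneyu(group1, group2)    # independent samples
w_stat, p_w = stats.wilcoxon(group1, group2)        # paired samples
h_stat, p_h = stats.kruskal(group1, group2, group3) # three or more groups

print(f"Mann-Whitney p = {p_u:.4f}, Wilcoxon p = {p_w:.4f}, "
      f"Kruskal-Wallis p = {p_h:.4f}")
```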


Table 2: Statistical tests at a glance

Type of variable    | Parameter tested    | No. of variables | Sample size                   | Test
Ratio               | Mean                | One              | >30                           | Z-test
Ratio               | Mean                | Two              | >30                           | Z-test
Ratio               | Mean                | One              | <30                           | t-test
Ratio               | Mean                | Two              | <30                           | Independent sample t-test
Ratio               | Mean (same subject) | Two              | <30                           | Paired sample t-test
Ratio               | Proportion          | One              | -                             | Binomial test
Ratio               | Proportion          | Two              | >30                           | Z-test
Ratio               | Mean                | More than two    | >30                           | ANOVA
Ratio               | Mean (same subject) | More than two    | >30                           | Repeated measures ANOVA
Nominal/categorical | Association         | Two or more      | -                             | Chi-square test
Ratio               | Mean                | Two              | Normality assumption violated | Mann-Whitney test
Ratio               | Mean (same subject) | Two              | Normality assumption violated | Wilcoxon signed rank test
Ratio               | Mean                | More than two    | Normality assumption violated | Kruskal-Wallis test

