vinaitheerthan


Tutorial on Introduction to biostatistics

Table of contents

Statistical Notes

 

  1. Handling missing values
    1. Group cases into missing and non-missing, then compare the dependent variable between the two groups to see whether they differ significantly
    2. Mean substitution - missing values are replaced with the mean of the variable
    3. Regression imputation - missing values are estimated from a regression model instead of being replaced with the mean
    4. Hot deck imputation - missing values are replaced with values taken from similar cases
    5. Full information maximum likelihood - model parameters are estimated from all available data, without filling in the missing values
    6. Multiple imputation
    7. Listwise deletion - cases with missing values can be deleted if the sample is large enough
    8. Variable deletion - a variable with many missing values can be deleted
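
As a minimal sketch of two of the simpler options above (mean substitution and listwise deletion), assuming missing values are coded as None. The variable names and the numbers are hypothetical and only serve as illustration.

```python
from statistics import mean

# Hypothetical measurements; None marks a missing value (an assumption of this sketch).
systolic_bp = [120, None, 135, 128, None, 142]


def mean_substitute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def listwise_delete(rows):
    """Keep only the cases (rows) that have no missing value in any variable."""
    return [row for row in rows if all(v is not None for v in row)]


imputed = mean_substitute(systolic_bp)

# Hypothetical cases with two variables each.
cases = [(120, 5.1), (None, 4.8), (135, None), (128, 5.6)]
complete_cases = listwise_delete(cases)

print(imputed)
print(complete_cases)
```

Mean substitution keeps every case but shrinks the variance of the variable, while listwise deletion keeps the variance but loses cases, which is why it is suggested only for large samples.
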
  2. Checking for normality
    1. Diagrams
      i. Box plot
      ii. Histogram
      iii. Q-Q plot
    2. Tests
      i. Kolmogorov-Smirnov test
      ii. Shapiro-Wilk test
      iii. Anderson-Darling test
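
To illustrate the idea behind the Q-Q plot, the sketch below pairs the sorted sample values with the corresponding quantiles of a standard normal distribution and summarises how straight the resulting line is with a correlation coefficient. This is not one of the formal tests listed above. The plotting positions (i - 0.5) / n and the simulated samples are assumptions made for the example.

```python
import random
from statistics import NormalDist, mean


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed


def pearson_r(xs, ys):
    """Pearson correlation; close to 1 means the Q-Q points lie near a straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
normal_sample = [rng.gauss(50, 10) for _ in range(500)]        # simulated, roughly normal
skewed_sample = [rng.expovariate(1 / 10) for _ in range(500)]  # simulated, right skewed

r_normal = pearson_r(*qq_points(normal_sample))
r_skewed = pearson_r(*qq_points(skewed_sample))
print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

Plotting the two lists returned by qq_points against each other gives the Q-Q plot itself; a sample from a normal distribution should fall close to a straight line.
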

  3. Transformations to overcome non-normality
    1. Right skewed (longer tail on the right)
      i. Log
      ii. Square root
      iii. Cube root
      iv. Inverse
    2. Left skewed (longer tail on the left)
      i. Squaring the variable
      ii. Cubing the variable

Apart from the above, removing outliers can also help bring the data closer to normality.
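
The effect of the right-skew transformations can be checked by comparing the skewness before and after each one. The sketch below does this on a simulated right-skewed sample; the lognormal data, the seed and the moment-based skewness formula are assumptions chosen for illustration.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness: positive for right skew, negative for left skew."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
# Simulated, strictly positive, right-skewed data (hypothetical).
right_skewed = [rng.lognormvariate(0, 1) for _ in range(2000)]

transforms = {
    "raw": lambda v: v,
    "log": math.log,
    "square root": math.sqrt,
    "cube root": lambda v: v ** (1 / 3),
    "inverse": lambda v: 1 / v,
}

skew_by_transform = {
    name: skewness([f(v) for v in right_skewed]) for name, f in transforms.items()
}
for name, s in skew_by_transform.items():
    print(f"{name:12s} skewness = {s:.2f}")
```

The transformation whose skewness is closest to zero is the natural candidate. Log, square root and inverse need positive values (inverse also fails at zero), so a constant may have to be added to the variable first.
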
 

  4. Structural equation modeling
    1. Structural equation models contain unobserved variables, called latent variables or factors, and observed variables, called indicators
    2. Latent variables influence observed variables
    3. The structural model represents the theoretical relationships among the set of latent variables
    4. The measurement model relates each latent variable to its observed indicators through linear equations
    5. The covariances among the observed variables need to be examined to arrive at a smaller number of latent variables
    6. Multivariate normality is assumed, so the skewness of the data should be checked
  5. Principal component analysis
    1. Retain components with eigenvalues greater than 1
    2. Check the scree plot, which shows the eigenvalues in decreasing order, and retain the components before the point where the curve levels off (the elbow)
    3. Check the loading of each variable on each component and assign variables with high loadings to that component
    4. The KMO measure checks sampling adequacy; Bartlett's test of sphericity checks whether the correlation matrix is an identity matrix, i.e. all correlations between variables are 0
    5. Check the determinant of the correlation matrix; if it is very low (e.g. < 0.0001, or 0), multicollinearity is an issue
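
The eigenvalue and determinant checks above can be sketched as follows, assuming NumPy is available (the notes themselves do not name any software). The data are simulated from two hypothetical underlying factors, so two components are expected to have eigenvalues above 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two hypothetical underlying factors, each measured by two noisy variables.
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack([
    f1 + 0.5 * rng.normal(size=n),
    f1 + 0.5 * rng.normal(size=n),
    f2 + 0.5 * rng.normal(size=n),
    f2 + 0.5 * rng.normal(size=n),
])

# Correlation matrix of the observed variables (columns of X).
R = np.corrcoef(X, rowvar=False)

# Eigenvalues of the symmetric matrix R, sorted from largest to smallest.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Eigenvalue-greater-than-1 rule: count the components to retain.
n_keep = int(np.sum(eigenvalues > 1.0))

# A determinant near 0 would point to multicollinearity.
det_R = float(np.linalg.det(R))

print("eigenvalues:", np.round(eigenvalues, 2))
print("components with eigenvalue > 1:", n_keep)
print("determinant of correlation matrix:", round(det_R, 4))
```

Plotting eigenvalues against the component number gives the scree plot. The eigenvalues of a correlation matrix always sum to the number of variables, which is why 1 serves as the cut-off for an average component.
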

 
