Generalized Linear Models (GLMs) and Categorical Data Analysis (CDA)

Course Topics

Generally speaking, statistical analysis deals with two types of outcomes (i.e., responses): continuous and categorical. Linear Models (LMs) are among the most commonly used methods for analyzing continuous outcomes. However, many studies in engineering, medicine, education, and other fields involve categorical outcomes. In these cases, Generalized Linear Models (GLMs) are a more appropriate choice for analysis.

This short course introduces the concepts, theory, and application of GLMs. We will also discuss techniques commonly used in categorical data analysis, such as contingency table analysis, measures of association, tests of independence, and tests of symmetry. Class demonstrations will use the three real-world data sets listed below. All analyses will be carried out in R (free statistical software) via the RStudio interface (products/rstudio/download).

Example 1:
Researcher A is interested in how variables such as GRE score, GPA, and the prestige of the undergraduate institution affect admission into graduate school. (Binary response)
Data set link: 
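
A binary response like the one in Example 1 is typically modeled with logistic regression. The sketch below is only an illustration: the data are simulated, and the column names (admit, gre, gpa, rank) and the coefficients used to generate them are assumptions, not the course data set.

```r
# Synthetic stand-in data: every value here is simulated, NOT the course data set.
set.seed(1)
n <- 200
admissions <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 100)),
  gpa  = round(runif(n, min = 2.5, max = 4.0), 2),
  rank = factor(sample(1:4, n, replace = TRUE))  # hypothetical prestige rank, 1 = highest
)

# Generate admission status from an arbitrary logistic model (made-up coefficients).
eta <- -6 + 0.004 * admissions$gre + 1.0 * admissions$gpa -
  0.4 * as.integer(admissions$rank)
admissions$admit <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression: binomial family with its default logit link.
fit_logit <- glm(admit ~ gre + gpa + rank, data = admissions, family = binomial)
summary(fit_logit)

# Exponentiated coefficients can be read as odds ratios.
exp(coef(fit_logit))
```

With the real data, only the data frame would change; the glm() call keeps the same form.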

Example 2:
Researcher B wants to predict the number of awards that a newly admitted student will earn by looking at the type of program in which the student was enrolled (vocational, general or academic) and the score of their final math exam. (Count response)
Data set link: 
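
A count response like the one in Example 2 is commonly modeled with Poisson regression. The following is a minimal sketch under stated assumptions: the data are simulated, and the column names (num_awards, prog, math) and generating coefficients are hypothetical placeholders for the course data set.

```r
# Synthetic stand-in data: every value here is simulated, NOT the course data set.
set.seed(2)
n <- 200
awards <- data.frame(
  prog = factor(sample(c("general", "academic", "vocational"), n, replace = TRUE),
                levels = c("general", "academic", "vocational")),
  math = round(rnorm(n, mean = 52, sd = 9))  # hypothetical final math exam score
)

# Generate counts from an arbitrary log-linear model (made-up coefficients).
mu <- exp(-5 + 0.07 * awards$math + 0.8 * (awards$prog == "academic"))
awards$num_awards <- rpois(n, lambda = mu)

# Poisson regression: poisson family with its default log link.
fit_pois <- glm(num_awards ~ prog + math, data = awards, family = poisson)
summary(fit_pois)

# Exponentiated coefficients can be read as rate ratios.
exp(coef(fit_pois))
```

If the real counts turn out to be more variable than a Poisson model allows, a quasi-Poisson or negative binomial model would be a natural alternative to consider.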

Example 3:
A Physicians’ Health Study Research Group at Harvard Medical School wants to study the relationship between aspirin use (Placebo/Aspirin) and heart attacks (Fatal Attack/Nonfatal Attack/No Attack).

Data are summarized in the table below:

                    Myocardial Infarction
Treatment   Fatal Attack   Nonfatal Attack   No Attack
Placebo               18               171      10,845
Aspirin                5                99      10,933
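
As one possible starting point for Example 3, the counts in the table can be entered directly in R and checked with a Pearson chi-squared test of independence. This is a sketch rather than the course's own analysis; the object names (mi, test) are arbitrary.

```r
# Counts from the table above: rows = treatment, columns = myocardial infarction outcome.
mi <- matrix(c(18, 171, 10845,
                5,  99, 10933),
             nrow = 2, byrow = TRUE,
             dimnames = list(Treatment = c("Placebo", "Aspirin"),
                             Outcome   = c("Fatal Attack", "Nonfatal Attack", "No Attack")))

# Row proportions: share of each outcome within each treatment group.
prop.table(mi, margin = 1)

# Pearson chi-squared test of independence between treatment and outcome.
test <- chisq.test(mi)
test

# Expected counts under independence, useful for checking the test's assumptions.
test$expected
```

For a 2 x 3 table the test has (2 - 1)(3 - 1) = 2 degrees of freedom.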

 

[video:https://vimeo.com/133680176]
