Homework # 4 – due on Thursday October 23 by 5pm
Problems students are to do:
- students registered for the UG version (336 level) – do problems 1, 3, and 4(a)
- students registered for the G version (436 level) – do problems 1, 2, 3, 4(b), and 5
As always, list necessary assumptions, and include respective p-values in parentheses next to your conclusion – for example, one might conclude, "the data suggested a difference in the treatments (p = 0.0013)."
1. The EAGLE dataset from problem 7.3 on pp. 236-7 of the Venables & Ripley 1999 text (originally from Knight and Skagen 1988) is entered and analyzed with this SAS program/output. The data relate to the foraging behavior of wintering bald eagles in
(a) Identify the model being fit here, clearly defining all terms (e.g., what is a "success" here?) and parameters.
(b) Using the "Proc Logistic" output, report on the factors that explain the success of the pirating attempt, and give the prediction formula for the probability of success.
(c) Again using the "Proc Logistic" output, interpret the respective odds ratios. If you were a feeding Bald Eagle in Washington State, use these results to comment on the conditions that would favor your keeping your chum salmon from a pirating Bald Eagle.
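As a reminder of the general form that parts (a) and (b) ask about, here is a sketch of a binary logistic model and its prediction formula. The symbols are generic assumptions: $\pi_i$ is the probability of a "success" for observation $i$, and $x_{i1}, \dots, x_{ik}$ stand for whatever explanatory variables appear in the SAS program. Which response level SAS treats as the event must be read from the output itself.

```latex
\[
\operatorname{logit}(\pi_i) = \log\frac{\pi_i}{1-\pi_i}
  = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik},
\qquad
\hat{\pi}_i = \frac{\exp\bigl(\hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_{ij}\bigr)}
                   {1 + \exp\bigl(\hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_{ij}\bigr)}
\]
```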
2. An experiment (originally from Svenson 1981, and reported in Venables and Ripley 1999, p. 239, ex. 7.7) was performed in
3. Samuels & Witmer (Statistics for the Life Sciences, 1999:429) present an example in which 50 patients were randomized to receive either pain medication A or B (in a balanced manner, meaning 25 patients in each group), and the response variable measured was Y = the response to the pain medication. The researchers recorded the level of Y as 1 (for "None"), 2 (for "Some"), 3 (for "Substantial"), or 4 (for "Complete"). The counts are in the table below. The data are analyzed using both Proc Freq and Proc Logistic in this SAS program/output. Our goal here is to decide whether the drugs do indeed differ in terms of pain relief and, if so, how.
PAIN RELIEF
       | None (1) | Some (2) | Substantial (3) | Complete (4) |
Drug A | 3        | 7        | 10              | 5            |
Drug B | 7        | 11       | 5               | 2            |
(a) Examine the results from the "Freq" procedure and discuss your findings, bearing in mind all necessary assumptions. For example, the MH test shows significance (p = 0.0282), but the Chi-square and FET tests show no significance; comment on the relevance of these tests and results here.
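To see why the tests in part (a) can disagree, the two chi-square statistics can be recomputed by hand from the table. The sketch below is in Python rather than SAS and is not the course's program. It assumes integer column scores 1 to 4 and takes the MH statistic to be the correlation form (N - 1)r², which is an assumption about what the SAS output reports. Fisher's exact test is omitted.

```python
import math

# Counts from the pain-relief table; columns are None (1), Some (2),
# Substantial (3), Complete (4).
table = {"A": [3, 7, 10, 5], "B": [7, 11, 5, 2]}
scores = [1, 2, 3, 4]  # assumed integer column scores

rows = list(table.values())
n = sum(sum(r) for r in rows)
col_totals = [sum(col) for col in zip(*rows)]

# Pearson chi-square: treats the four columns as unordered categories.
pearson = 0.0
for r in rows:
    row_total = sum(r)
    for obs, col_total in zip(r, col_totals):
        expected = row_total * col_total / n
        pearson += (obs - expected) ** 2 / expected

# Mantel-Haenszel statistic (N - 1) * r^2, where r is the correlation
# between the drug indicator (1 = Drug A, 0 = Drug B) and the column score.
x_sum = sum(rows[0])
y_sum = sum(s * c for s, c in zip(scores, col_totals))
xy_sum = sum(s * c for s, c in zip(scores, rows[0]))
yy_sum = sum(s * s * c for s, c in zip(scores, col_totals))
sxy = xy_sum - x_sum * y_sum / n
sxx = x_sum - x_sum ** 2 / n
syy = yy_sum - y_sum ** 2 / n
mh = (n - 1) * sxy ** 2 / (sxx * syy)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_pearson = chi2_sf_df3(pearson)  # (2 - 1) * (4 - 1) = 3 df
p_mh = chi2_sf_df1(mh)            # 1 df

print(f"Pearson chi-square = {pearson:.4f}, p = {p_pearson:.4f}")
print(f"MH chi-square      = {mh:.4f}, p = {p_mh:.4f}")
```

The MH p-value from this sketch can be checked against the p = 0.0282 quoted in part (a). The MH test spends its single degree of freedom on a shift in mean score, whereas the Pearson test spreads three degrees of freedom over any kind of departure from independence.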
(b) Examine the results from the "Logistic" procedure and discuss your findings. Based on this output, do you feel the drugs differ in terms of pain relief? Do all necessary assumptions seem to be met here including the proportional odds assumption? Write down the predictive formulas. Clearly interpret the odds ratios for this Logistic fit.
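For reference, a proportional odds (cumulative logit) model for the four-level response has the general form sketched below. The notation is an assumption: $x$ is a hypothetical 0/1 indicator for the drug, and the direction of the cumulative probabilities (here $Y \le j$) should be confirmed against the response profile in the SAS output.

```latex
\[
\operatorname{logit}\bigl[P(Y \le j \mid x)\bigr] = \alpha_j + \beta x,
\qquad j = 1, 2, 3,
\]
\[
\frac{P(Y \le j \mid x = 1)\,/\,P(Y > j \mid x = 1)}
     {P(Y \le j \mid x = 0)\,/\,P(Y > j \mid x = 0)} = e^{\beta}
\quad \text{for every } j .
\]
```

The proportional odds assumption is exactly the statement that a single $\beta$, and hence a single odds ratio, applies at all three cutpoints.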
(Extra Credit: use the predictive formulas to give the predicted
values in each one of the cells and compare the
4. On p. 113, Collett (Modelling Binary Data, Chapman & Hall, 2nd ed, 2003) describes an insecticide toxicity study in which flour beetles, Tribolium castaneum, were sprayed with one of three different insecticides in solution in Shell oil P31. The three insecticides used were dichloro-diphenyltrichloroethane (DDT) at 2.0% w/v, γ-benzene hexachloride (γ-BHC) used at 1.5% w/v, and a mixture of the two. In the experiment, batches of about fifty insects were exposed to varying deposits of spray, measured in units of mg/10 cm². The resulting data on the proportion of insects killed after a period of six days are given in the Table below. In modelling these data, the (natural) logarithm of the amount of deposit of insecticide is used as the explanatory variable in a linear logistic model, and the deposit levels were 2.00, 2.64, 3.48, 4.59, 6.06, and 8.00 (mg/10 cm²).
Insecticide | 2.00  | 2.64  | 3.48  | 4.59  | 6.06  | 8.00  |
DDT         | 3/50  | 5/49  | 19/47 | 19/38 | 24/49 | 35/50 |
γ-BHC       | 2/50  | 14/49 | 20/50 | 27/50 | 41/50 | 40/50 |
DDT + γ-BHC | 28/50 | 37/50 | 46/50 | 48/50 | 48/50 | 50/50 |
(a) Our goal here is to examine and compare only the "DDT" and the "DDT + γ-BHC" treatments. Using this SAS program/output, write down the predictive (logistic) formula for the DDT group and the formula for the "DDT + γ-BHC" group, and give your point estimates of the respective LD50s (on the original scale!). Based on the subsequent analysis in the SAS program, test whether the two treatments share a common "slope" (but with different "intercepts"). Do not use the "Wald Chi-square" test statistic for this test. Next, assuming common slopes, test whether the intercepts differ. Finally, check the residuals and comment on your findings.
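One common alternative to the Wald statistic is a likelihood ratio (deviance difference) test between nested models. The sketch below states it in generic notation; $L_0$ and $L_1$ denote the maximized likelihoods of the reduced and full models, and $D_0$, $D_1$ their deviances (these symbols are assumptions, not names from the SAS output).

```latex
\[
G^2 = -2\bigl[\log L_0 - \log L_1\bigr] = D_0 - D_1
\;\overset{\text{approx.}}{\sim}\; \chi^2_{\,p_1 - p_0},
\]
```

Here $p_1$ and $p_0$ are the numbers of parameters in the full and reduced models. With two groups, for instance, separate lines use 4 parameters and the common-slope model uses 3, so the comparison has 1 degree of freedom.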
(b) Our goal here is to examine and compare all three treatments. Using this SAS program/output, write down the predictive (logistic) formula for the DDT group, the formula for the γ-BHC group, and the formula for the "DDT + γ-BHC" group, and give your point estimates of the respective LD50s (on the original scale!). Based on the subsequent analysis in the SAS program, test whether the three treatments share a common "slope" (but with different "intercepts"). Next, test whether (in addition to the common slopes) the intercepts are the same. Finally, using the model with common slopes but different intercepts, check the residuals and comment on your findings.
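Because the model uses the natural log of the deposit, the LD50 on the original scale is exp(-intercept/slope). The sketch below illustrates this in Python for the DDT row only, fitting a separate two-parameter logistic line by Newton-Raphson. It is not the course's SAS program, the function name `fit_logistic` is hypothetical, and its estimates need not match a SAS model that pools groups or shares a slope.

```python
import math

# DDT row of the table: deposit (mg/10 cm^2), insects killed, insects exposed.
doses = [2.00, 2.64, 3.48, 4.59, 6.06, 8.00]
killed = [3, 5, 19, 19, 24, 35]
exposed = [50, 49, 47, 38, 49, 50]

log_doses = [math.log(d) for d in doses]  # natural log, as in the problem


def fit_logistic(x, y, n, max_iter=50, tol=1e-10):
    """Fit logit(p) = a + b*x to grouped binomial data by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = ni * p * (1.0 - p)   # binomial variance weight
            r = yi - ni * p          # observed minus fitted count
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


a, b = fit_logistic(log_doses, killed, exposed)

# logit(p) = 0 when a + b*log(dose) = 0, so log(LD50) = -a/b.
ld50 = math.exp(-a / b)

print(f"intercept = {a:.4f}, slope = {b:.4f}")
print(f"LD50 = {ld50:.3f} mg/10 cm^2")
```

The same back-transformation applies to any group's intercept and slope taken from the SAS output; forgetting the exponential would leave the LD50 on the log scale.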
5. Stokes, Davis & Koch (Categorical Data Analysis using the SAS System, 2nd Ed.) discuss a study on coronary artery disease, in which the response is "CA" = 1 if CA disease is present and = 0 otherwise, and the possible explanatory variables are "AGE" (treated as a continuous variable), "SEX" (which takes the value 0 for females and 1 for males), and "ECG" (an ordinal variable with values 0, 1 and 2, where ECG = 0 is scored if the corresponding ST segment depression is less than 0.1, ECG = 1 if it lies between 0.1 and 0.2, and ECG = 2 if it is greater than 0.2). A logistic model was fit to these data (predicting whether the disease is present) including all interaction terms, and since all interaction terms were non-significant, they were dropped. The main-effects logistic regression is fit in this SAS program/output. Use the results to comment on the fit (e.g., lack of fit and residuals), and discuss all findings. Point out which effects are significant and interpret the parameter estimates and the odds ratios. E.g., are males or females more likely to develop CAD? Why? Is the difference significant? Interpret the 3.882 OR for the gender factor. Answer similar questions for the other factors. Finally, relate the table (from Proc Freq) to the Logistic output and discuss its relevance.
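As a check on how an odds ratio relates to a parameter estimate in a main-effects logistic model, the relation can be sketched as follows. The subscript notation is an assumption, and the coefficient is simply back-calculated from the 3.882 quoted above rather than taken from the SAS output.

```latex
\[
\mathrm{OR}_{\text{SEX}} = e^{\hat\beta_{\text{SEX}}}
\quad\Longrightarrow\quad
\hat\beta_{\text{SEX}} = \ln(3.882) \approx 1.356 ,
\]
\[
\mathrm{OR}_{\text{SEX}} =
\frac{\text{odds}(\mathrm{CA} = 1 \mid \mathrm{SEX} = 1,\ \mathrm{AGE},\ \mathrm{ECG})}
     {\text{odds}(\mathrm{CA} = 1 \mid \mathrm{SEX} = 0,\ \mathrm{AGE},\ \mathrm{ECG})} .
\]
```

The ratio compares odds, not probabilities, with AGE and ECG held at the same values in the numerator and denominator.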