Homework #4 – due on Thursday, October 23, by 5pm

 

 Problems to complete:
    - students registered for the UG version (336 level) – do problems 1, 3, and 4(a)
    - students registered for the G version (436 level) – do problems 1, 2, 3, 4(b), and 5

 

As always, list necessary assumptions, and include respective p-values in parentheses next to your conclusion – for example, one might conclude, "the data suggested a difference in the treatments (p = 0.0013)."

 

1.      The EAGLE dataset from problem 7.3 on pp. 236-7 of the Venables & Ripley 1999 text (originally from Knight and Skagen 1988) is entered and analyzed with this SAS program/output.  The data relate to the foraging behavior of wintering bald eagles in Washington State, and concern 160 attempts by one (pirating) Bald Eagle to steal a chum salmon from another (feeding) Bald Eagle.  As the program indicates, the variable "pirsize" represents the size of the pirating eagle, the variable "pirage" represents the age of the pirating eagle, and the variable "fesize" represents the size of the feeding eagle.

(a)    Identify the model being fit here, clearly defining all terms (e.g., what is a "success" here?) and parameters.

(b)   Using the "Proc Logistic" output, report on the factors that explain the success of the pirating attempt, and give the prediction formula for the probability of success.

(c)    Again using the "Proc Logistic" output, interpret the respective odds ratios.  If you were a feeding Bald Eagle in Washington State, use these results to comment on the conditions that would favor your keeping your chum salmon from a pirating Bald Eagle.
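As a reminder of how a fitted binary logistic model turns into a prediction formula and odds ratios, here is a minimal Python sketch. The coefficient values and the 0/1 coding of the predictors are hypothetical placeholders, not the values in the SAS output; substitute the estimates and the coding that Proc Logistic actually reports.

```python
import math

# Hypothetical estimates, for illustration only; replace with the
# "Analysis of Maximum Likelihood Estimates" values from Proc Logistic.
coef = {"intercept": 0.5, "pirsize": 1.0, "pirage": -0.8, "fesize": -1.2}

def predicted_probability(coef, x):
    """P(success) = 1 / (1 + exp(-(b0 + sum of b_j * x_j)))."""
    eta = coef["intercept"] + sum(coef[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-eta))

def odds_ratios(coef):
    """Odds ratio for a one-unit increase in each predictor: exp(b_j)."""
    return {name: math.exp(b) for name, b in coef.items() if name != "intercept"}

# Assumed 0/1 indicator coding for one attempt (an assumption; check the
# coding used in the SAS program before interpreting signs).
attempt = {"pirsize": 1, "pirage": 0, "fesize": 1}
p_hat = predicted_probability(coef, attempt)
ors = odds_ratios(coef)
print(round(p_hat, 4), {k: round(v, 3) for k, v in ors.items()})
```

An odds ratio above 1 means the odds of a successful pirating attempt rise when that predictor moves from its reference level to the other level; an odds ratio below 1 means they fall.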

 

2.      An experiment (originally from Svenson 1981, and reported in Venables and Ripley 1999, p. 239, ex. 7.7) was performed in Sweden in 1961-2 to assess the effect of speed limits on the motorway accident rate. The experiment was conducted on 92 days in each year, matched so that day "j" in 1962 was comparable to day "j" in 1961.  On some days the speed limit was in effect and enforced, whereas on other days there was no speed limit and cars tended to be driven faster.  The speed limit days tended to be in contiguous blocks.  The data are given and analyzed in this SAS program, with factors "year", "day" and "limit"; the response variable is the daily traffic accident count, "y". Fit Poisson log-linear models and summarize what you discover.  You may assume "day" occurs as a main effect only (i.e., no interactions with day), but assess whether an interaction between "limit" and "year" is needed.  Summarize your findings using the given output, and be thorough in your summary. For example, don't just state that the "limit" effect is significant; go one step further and discuss for which level of "limit" ("no" or "yes") there were more traffic accidents, and what kind of policy impact this may entail.
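Two calculations recur when summarizing a Poisson log-linear fit: turning a coefficient into a rate ratio, and comparing nested models through the drop in deviance. The sketch below illustrates both in Python. The coefficient and the two deviances are hypothetical numbers chosen for illustration, not values from the SAS output, and the 1-df test assumes the interaction of two two-level factors ("limit" and "year").

```python
import math

def rate_ratio(beta):
    """In a Poisson log-linear model, exp(beta) is the multiplicative
    change in the expected count for the level coded 1 versus the reference."""
    return math.exp(beta)

def lr_test_1df(deviance_reduced, deviance_full):
    """Likelihood-ratio test for one extra parameter: the drop in deviance
    between nested models is compared with a chi-square on 1 df.
    For 1 df, P(X > g) = erfc(sqrt(g / 2))."""
    g = deviance_reduced - deviance_full
    return g, math.erfc(math.sqrt(g / 2.0))

# Hypothetical numbers for illustration only (not from the SAS output).
beta_limit = -0.25                  # assumed coefficient for limit = "yes"
dev_main, dev_inter = 110.0, 108.5  # assumed deviances without/with limit*year

rr = rate_ratio(beta_limit)
g, p_value = lr_test_1df(dev_main, dev_inter)
print(f"rate ratio = {rr:.3f}, G = {g:.2f}, p = {p_value:.4f}")
```

A rate ratio below 1 for limit = "yes" would indicate fewer expected accidents on speed-limit days than on comparable days without a limit; a large p-value for the deviance drop would suggest the interaction is not needed.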

 

3.      Samuels & Witmer (Statistics for the Life Sciences, 1999: 429) present an example in which 50 patients were randomized to receive either pain medication A or B (in a balanced manner, meaning 25 patients in each group), and the response variable measured was Y = the response to the pain medication. The researchers recorded the levels of Y simply as 1 (for "None"), 2 (for "Some"), 3 (for "Substantial"), or 4 (for "Complete").  The counts are in the table below. The data are analyzed using both Proc Freq and Proc Logistic in this SAS program/output. Our goal here is to decide whether the drugs do indeed differ in terms of pain relief, and, if so, how.

 

 

                          PAIN RELIEF
            None (1)   Some (2)   Substantial (3)   Complete (4)
Drug A         3          7            10                5
Drug B         7         11             5                2

 

(a)   Examine the results from the "Freq" procedure and discuss your findings, bearing in mind all necessary assumptions.  For example, the MH test shows significance (p = 0.0282), but the Chi-square and FET tests show no significance; comment on the relevance of these tests and their results here.

(b)   Examine the results from the "Logistic" procedure and discuss your findings.  Based on this output, do you feel the drugs differ in terms of pain relief?  Do all necessary assumptions seem to be met here including the proportional odds assumption? Write down the predictive formulas. Clearly interpret the odds ratios for this Logistic fit.


(Extra Credit: use the predictive formulas to give the predicted values in each one of the cells and compare the PO predicted values with the actual values and the expected values using the Freq chi-square method.)
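For the chi-square part of the comparison, the expected counts under independence are (row total × column total) / n. The Python sketch below computes them from the table of observed counts; the variable names are illustrative, and the proportional-odds predicted counts would still have to come from the Proc Logistic estimates.

```python
# Observed counts from the table: columns are None, Some, Substantial, Complete.
observed = {"Drug A": [3, 7, 10, 5], "Drug B": [7, 11, 5, 2]}

row_totals = {drug: sum(counts) for drug, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Expected count under independence: (row total * column total) / n.
expected = {
    drug: [row_totals[drug] * c / n for c in col_totals]
    for drug in observed
}

# Pearson chi-square statistic, with df = (rows - 1) * (columns - 1).
chi_square = sum(
    (o - e) ** 2 / e
    for drug in observed
    for o, e in zip(observed[drug], expected[drug])
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Cells with expected count below 5 bear on the large-sample assumption.
small_cells = sum(e < 5 for row in expected.values() for e in row)
print(expected, round(chi_square, 4), df, small_cells)
```

Because this statistic ignores the ordering of the response categories, it can fail to detect a shift that an ordinal method picks up.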

 

4.      On p. 113, Collett (Modelling Binary Data, Chapman & Hall, 2nd ed., 2003) describes an insecticide toxicity study in which flour beetles, Tribolium castaneum, were sprayed with one of three different insecticides in solution in Shell oil P31. The three insecticides used were dichlorodiphenyltrichloroethane (DDT) at 2.0% w/v, γ-benzene hexachloride (g-BHC) at 1.5% w/v, and a mixture of the two. In the experiment, batches of about fifty insects were exposed to varying deposits of spray, measured in units of mg/10 cm².  The resulting data on the proportion of insects killed after a period of six days are given in the table below.  In modelling these data, the (natural) logarithm of the amount of deposit of insecticide is used as the explanatory variable in a linear logistic model; the deposit levels were 2.00, 2.64, 3.48, 4.59, 6.06, and 8.00 (mg/10 cm²).

 

                      Deposit (mg/10 cm²)
Insecticide      2.00    2.64    3.48    4.59    6.06    8.00
DDT              3/50    5/49   19/47   19/38   24/49   35/50
g-BHC            2/50   14/49   20/50   27/50   41/50   40/50
DDT + g-BHC     28/50   37/50   46/50   48/50   48/50   50/50

 

 

(a)    Our goal here is to examine and compare only the "DDT" and the "DDT + g-BHC" treatments. Using this SAS program/output, write down the predictive (logistic) formula for the DDT group and the formula for the "DDT + g-BHC" group, and give your point estimates of the respective LD50s (on the original scale!).  Based on the subsequent analysis in the SAS program, test whether the two treatments share a common "slope" (but with different "intercepts"). Do not use the "Wald Chi-square" test statistic for this test.  Next, assuming common slopes, test whether the intercepts differ.  Finally, check the residuals and comment on your findings.

(b)   Our goal here is to examine and compare all three treatments. Using this SAS program/output, write down the predictive (logistic) formula for the DDT group, the formula for the g-BHC group, and the formula for the "DDT + g-BHC" group, and give your point estimates of the respective LD50s (on the original scale!).  Based on the subsequent analysis in the SAS program, test whether the three treatments share a common "slope" (but with different "intercepts").  Next, test whether (in addition to the common slopes) the intercepts are the same.  Finally, using the model with common slopes but different intercepts, check the residuals and comment on your findings.
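Because the explanatory variable is the natural log of the deposit, the LD50 has to be back-transformed: setting the fitted probability to 0.5 gives ln(LD50) = -intercept/slope, so LD50 = exp(-intercept/slope). The Python sketch below shows that calculation; the intercept and slope are hypothetical values for illustration, not estimates from the SAS output.

```python
import math

def ld50_original_scale(intercept, slope):
    """For logit(p) = intercept + slope * ln(deposit), p = 0.5 when
    ln(deposit) = -intercept / slope, so LD50 = exp(-intercept / slope)."""
    return math.exp(-intercept / slope)

def kill_probability(intercept, slope, deposit):
    """Predicted proportion killed at a deposit given in mg/10 cm^2."""
    eta = intercept + slope * math.log(deposit)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical estimates for illustration only (not from the SAS output).
a, b = -4.0, 2.5
ld50 = ld50_original_scale(a, b)
print(f"LD50 = {ld50:.3f} mg/10 cm^2")
print(f"P(kill) at LD50 = {kill_probability(a, b, ld50):.3f}")
```

Under a common-slope model, each treatment keeps its own intercept, so the same function applies with the shared slope and the treatment-specific intercept.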

 

 

5.      Stokes, Davis & Koch (Categorical Data Analysis using the SAS System, 2nd Ed.) discuss a study on coronary artery disease, in which the response is "CA" = 1 if CA disease is present and 0 otherwise, and the possible explanatory variables are "AGE" (treated as a continuous variable), "SEX" (which takes the value 0 for females and 1 for males), and "ECG" (an ordinal variable with values 0, 1 and 2, where ECG = 0 if the corresponding ST segment depression is less than 0.1, ECG = 1 if it lies between 0.1 and 0.2, and ECG = 2 if it is greater than 0.2).  A logistic model including all interaction terms was fit to these data (predicting whether the disease is present); since all interaction terms were non-significant, they were dropped.  The main-effects logistic regression is fit in this SAS program/output. Use the results to comment on the fit (e.g., lack of fit and residuals), and discuss all findings. Point out which effects are significant and interpret the parameter estimates and the odds ratios.  For example, are males or females more likely to develop CAD?  Why?  Is the difference significant? Interpret the 3.882 OR for the gender factor.  Answer similar questions for the other factors.  Finally, relate the table (from Proc Freq) to the Logistic output and discuss its relevance.
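When interpreting an odds ratio, keep in mind that it multiplies odds, not probabilities. The Python sketch below uses the 3.882 odds ratio quoted above for SEX; the 20% baseline probability and the per-unit ECG odds ratio of 1.5 are hypothetical values for illustration, and the ECG calculation assumes ECG is entered in the model as a numeric score.

```python
import math

# Odds ratio for SEX quoted in the problem statement (males vs. females).
or_sex = 3.882
beta_sex = math.log(or_sex)   # corresponding log-odds coefficient

def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)

# Hypothetical baseline: a female with a 20% chance of disease
# (an assumed value for illustration, not taken from the data).
p_female = 0.20
odds_female = p_female / (1.0 - p_female)
odds_male = odds_female * or_sex          # same AGE and ECG, SEX = 1
p_male = odds_to_probability(odds_male)

# For a predictor entered as a score (such as ECG = 0, 1, 2), a k-unit
# increase multiplies the odds by OR ** k; this OR is hypothetical.
or_ecg_per_unit = 1.5
or_ecg_0_to_2 = or_ecg_per_unit ** 2
print(round(beta_sex, 4), round(p_male, 4), or_ecg_0_to_2)
```

In this illustration the male probability is well below 3.882 times the female probability, even though the odds are exactly 3.882 times larger.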