hwk04

Homework #4 - due Wednesday, 22nd March 2006, at the start of class

Problems students are to do -
- students registered for the UG version (300 level), please do problems 1, 3, and 4(a) below
- students registered for the G version (400 level), please do problems 1, 2, 3, 4(b), and 5 below

As always, list necessary assumptions, and include respective p-values in parentheses next to
your conclusion - e.g., one might conclude, "the data suggested a difference in the treatments
(p = 0.0013)."

1. The EAGLE dataset from problem 7.3 on pp. 236-7 of the Venables & Ripley 1999 text (originally
from Knight and Skagen 1988) is entered and analysed with this SAS program/output. The data relate
to the foraging behavior of wintering bald eagles in Washington State, and concern 160 attempts by one
(pirating) Bald Eagle to steal a chum salmon from another (feeding) Bald Eagle. As the program
indicates, the variable "pirsize" represents the size of the pirating eagle, the variable "pirage" represents
the age of the pirating eagle, and the variable "fesize" represents the size of the feeding eagle.
    (a) Identify the model being fit here, clearly defining all terms (e.g., what is a "success" here?)
          and parameters.
    (b) Using the "Proc Logistic" output, report on the factors that explain the success of the pirating
          attempt, and give the prediction formula for the probability of success.
    (c) Again using the "Proc Logistic" output, interpret the respective Odds Ratios. If you were a feeding
         Bald Eagle in Washington State, using these results, comment on the conditions which would be
         favorable to your keeping your chum salmon from a pirating Bald Eagle.
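
As a hedged illustration for parts (b) and (c), the sketch below shows how a prediction formula of the form logit(p) = b0 + b1*pirsize + b2*pirage + b3*fesize turns into a probability, and how each odds ratio is exp(coefficient). The coefficient values and the 0/1 coding of the three variables are hypothetical placeholders, not values from the SAS output; substitute the estimates and the coding reported there.

```python
import math

def success_probability(intercept, coefs, x):
    """Return P(success) from a logistic linear predictor."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

def odds_ratios(coefs):
    """Each odds ratio is exp(coefficient)."""
    return [math.exp(b) for b in coefs]

# HYPOTHETICAL coefficients and 0/1 coding, for illustration only.
b0 = -1.0
slopes = [2.0, 1.0, -1.5]  # order: pirsize, pirage, fesize

# Probability for one hypothetical combination of the three indicators.
p_example = success_probability(b0, slopes, [1, 1, 0])
print(round(p_example, 4))
print([round(r, 3) for r in odds_ratios(slopes)])
```

An odds ratio above 1 means the odds of a successful pirating attempt are higher at the coded level than at the reference level; which level is the reference has to be read off the SAS output.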

2. An experiment (originally from Svenson 1981, and reported in Venables and Ripley 1999 p.239
ex 7.7) was performed in Sweden in 1961-2 to assess the effect of speed limits on the motorway
accident rate. The experiment was conducted on 92 days in each year, matched so that day "j" in
1962 was comparable to day "j" in 1961. On some days the speed limit was in effect and enforced,
whereas on other days there was no speed limit and cars tended to be driven faster. The speed limit
days tended to be in contiguous blocks. The data are given and analyzed in this SAS program, with
factors "year", "day" and "limit"; the response variable is the daily traffic accident count, "y".
Fit Poisson log-linear models and summarize what you discover. You might assume "day" occurs
as a main effect only (i.e., no interactions with day), but assess if an interaction between "limit" and
"year" is needed. Summarize your findings using the given output, and be thorough in your summary.
For example, don't just state that the "limit" effect is significant; rather, go one step further and
discuss for which level of "limit" ("no" or "yes") there were more traffic accidents, and what kind of
policy impact this may entail.
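
To go beyond "significant" in a Poisson log-linear model, it helps to put a coefficient on the rate scale: exp(estimate) is the multiplicative change in the expected daily accident count. The sketch below does this with an approximate 95% Wald interval. The estimate and standard error are hypothetical placeholders, not values from the SAS output, and the direction of the effect depends on which level of "limit" SAS treats as the reference.

```python
import math

def rate_ratio_with_ci(estimate, std_error, z=1.96):
    """Rate ratio exp(estimate) with an approximate 95% Wald interval."""
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return math.exp(estimate), lower, upper

# HYPOTHETICAL estimate and standard error for the "limit" effect.
limit_estimate = -0.3
limit_se = 0.1

rr, rr_lower, rr_upper = rate_ratio_with_ci(limit_estimate, limit_se)
print(round(rr, 3), round(rr_lower, 3), round(rr_upper, 3))
```

A rate ratio below 1 for the coded level would mean fewer expected accidents per day at that level than at the reference level.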

3. Samuels & Witmer (Statistics for the Life Sciences, 1999:429) present an example in which
50 patients were randomised to receive either pain medication A or B (in a balanced manner,
meaning 25 patients in each group), and the response variable that was measured was Y = the
response to the pain medication. The researchers simply recorded the levels of Y as
1 (for "None"), 2 (for "Some"), 3 (for "Substantial"), or 4 (for "Complete"). The counts are in
the table below.

Pain Relief	Drug A	Drug B
None (1)	3	7
Some (2)	7	11
Substantial (3)	10	5
Complete (4)	5	2

The data are analysed using both Proc Freq and Proc Logistic in this SAS program/output. Our
goal here is to decide if the drugs do indeed differ in terms of pain relief, and, if so, how.

(a) Examine the results from the "Freq" procedure and discuss your findings, bearing in mind all
necessary assumptions. For example, the MH test shows significance (p = 0.0282), but the
Chi-square and FET tests show no significance - comment on the relevance of these tests
and results here.

(b) Examine the results from the "Logistic" procedure and discuss your findings. Based on this
output, do you feel the drugs differ in terms of pain relief? Do all necessary assumptions seem to
be met here including the proportional odds assumption? Write down the predictive formulas.
Clearly interpret the odds ratios for this Logistic fit.
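
For the predictive formulas in (b), a minimal sketch of how a proportional odds fit yields cell probabilities is given here. It assumes the cumulative form logit P(Y <= j) = a_j + b*x with x an indicator for one of the drugs; whether the SAS output uses this direction and this coding is an assumption to check against the output. The cutpoints and slope are hypothetical placeholders.

```python
import math

def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def category_probabilities(cutpoints, beta, x):
    """Cell probabilities under logit P(Y <= j) = cutpoints[j] + beta * x."""
    cumulative = [expit(a + beta * x) for a in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        # Each cell is the difference of adjacent cumulative probabilities.
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# HYPOTHETICAL cutpoints (must be increasing) and common slope.
cutpoints = [-1.5, 0.0, 1.5]
beta = -1.0

probs_x1 = category_probabilities(cutpoints, beta, 1)  # coded drug
probs_x0 = category_probabilities(cutpoints, beta, 0)  # reference drug
predicted_counts_x1 = [25 * p for p in probs_x1]       # 25 patients per group
print([round(c, 2) for c in predicted_counts_x1])
```

The single slope shared by all three cutpoints is what the proportional odds assumption asserts.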

(Extra Credit: use the predictive formulas to give the predicted values in each one of the cells and
compare the PO predicted values with the actual values and the expected values using the
Freq chi-square method.)
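
The expected values under the chi-square (independence) method can be reproduced directly from the counts in the table above, which is a convenient check against the Proc Freq output. The sketch below uses only those twelve... rather, eight observed counts; the variable names are illustrative.

```python
# Observed counts from the pain relief table: (Drug A, Drug B).
counts = {
    "None": (3, 7),
    "Some": (7, 11),
    "Substantial": (10, 5),
    "Complete": (5, 2),
}

col_totals = [sum(row[k] for row in counts.values()) for k in range(2)]
n = sum(col_totals)

expected = {}
chi_square = 0.0
for level, row in counts.items():
    row_total = sum(row)
    # Expected count = row total * column total / grand total.
    expected[level] = tuple(row_total * c / n for c in col_totals)
    chi_square += sum((o - e) ** 2 / e for o, e in zip(row, expected[level]))

df = (len(counts) - 1) * (len(col_totals) - 1)
print(expected)
print(round(chi_square, 4), df)
```

Looking at the size of the expected counts is one way to judge whether the large-sample chi-square approximation is reasonable here.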

4. On p. 113, Collett (Modelling Binary Data, Chapman & Hall, 2nd ed, 2003) describes an
insecticide toxicity study in which flour beetles, Tribolium castaneum, were sprayed with one of
three different insecticides in solution in Shell oil P31. The three insecticides used were dichloro-
diphenyltrichloroethane (DDT) at 2.0% w/v, g-benzene hexachloride (g-BHC) used at 1.5% w/v,
and a mixture of the two. In the experiment, batches of about fifty insects were exposed to varying
deposits of spray, measured in units of mg/10 cm². The resulting data on the proportion of insects
killed after a period of six days are given in the Table below. In modelling these data, the (natural)
logarithm of the amount of deposit of insecticide is used as the explanatory variable in a linear
logistic model, and the deposit levels were 2.00, 2.64, 3.48, 4.59, 6.06, and 8.00 (mg/10 cm²).

(a) Our goal here is to examine and compare only the "DDT" and the "DDT + g-BHC"
treatments. Using this SAS program/output, write down the predictive (logistic) formula for the
DDT group and the formula for the "DDT + g-BHC" group, and give your point estimate of the
respective LD₅₀'s (on the original scale!). Based on the subsequent analysis in the SAS program,
test whether the two treatments share a common "slope" (but with different "intercepts"). Do not
use the "Wald Chi-square" test statistic for this test. Next, assuming common slopes, test whether
the intercepts differ. Finally, check the residuals and comment on your findings.
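
Because the explanatory variable is the natural log of the deposit, the fitted model for one group has the form logit(p) = a + b*ln(deposit). Setting p = 0.5 gives ln(LD₅₀) = -a/b, so the estimate on the original scale is exp(-a/b). The sketch below illustrates this back-transformation; the intercept and slope are hypothetical placeholders, not estimates from the SAS output.

```python
import math

def ld50(intercept, slope):
    """Deposit (original scale) at which the fitted kill probability is 0.5."""
    return math.exp(-intercept / slope)

def kill_probability(intercept, slope, deposit):
    """Fitted probability from logit(p) = intercept + slope * ln(deposit)."""
    eta = intercept + slope * math.log(deposit)
    return 1.0 / (1.0 + math.exp(-eta))

# HYPOTHETICAL intercept and slope for one treatment group.
a, b = -4.0, 2.5

ld50_example = ld50(a, b)
print(round(ld50_example, 3))                             # in mg/10 cm²
print(round(kill_probability(a, b, ld50_example), 3))     # should be 0.5
```

Under a common-slope model, each group's LD₅₀ uses its own intercept with the shared slope.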

(b) Our goal here is to examine and compare all three treatments. Using this SAS
program/output, write down the predictive (logistic) formula for the DDT group, the formula for
the g-BHC group, and the formula for the "DDT + g-BHC" group, and give your point estimate
of the respective LD₅₀'s (on the original scale!). Based on the subsequent analysis in the SAS
program, test whether the three treatments share a common "slope" (but with different "intercepts").
Next, test whether (in addition to the common slopes) the intercepts are the same. Finally,
using the model with common slopes but different intercepts, check the residuals and
comment on your findings.
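
One common alternative to the Wald statistic for comparing nested logistic models is the likelihood ratio (deviance difference) test; that this is the intended method here is an assumption. The statistic is the deviance of the reduced model minus that of the fuller model, referred to a chi-square distribution with degrees of freedom equal to the number of parameters dropped (1 when comparing two treatments' slopes, 2 for three treatments). The deviances below are hypothetical placeholders, not values from the SAS output.

```python
import math

def chi_square_upper_tail(statistic, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")

def deviance_difference_test(dev_reduced, dev_full, df):
    """Likelihood ratio statistic and p-value for two nested models."""
    statistic = dev_reduced - dev_full
    return statistic, chi_square_upper_tail(statistic, df)

# HYPOTHETICAL deviances: common-slope model vs separate-slopes model,
# three treatments, so two slope parameters are dropped (df = 2).
lr_stat, lr_p = deviance_difference_test(dev_reduced=25.0, dev_full=21.0, df=2)
print(lr_stat, round(lr_p, 4))
```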

Insecticide	2.00	2.64	3.48	4.59	6.06	8.00
DDT	3/50	5/49	19/47	19/38	24/49	35/50
g-BHC	2/50	14/49	20/50	27/50	41/50	40/50
DDT + g-BHC	28/50	37/50	46/50	48/50	48/50	50/50
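
Before reading the fitted output, it can help to look at the observed proportions and empirical logits against ln(deposit), since roughly parallel, roughly linear patterns are what a common-slope logistic model implies. The sketch below uses the counts from the table above. The 0.5 adjustment in the empirical logit is a standard device added here (it is not part of the handout) so that the 50/50 cell stays finite.

```python
import math

deposits = [2.00, 2.64, 3.48, 4.59, 6.06, 8.00]  # mg/10 cm²

# (killed, exposed) per deposit level, copied from the table.
killed_over_total = {
    "DDT": [(3, 50), (5, 49), (19, 47), (19, 38), (24, 49), (35, 50)],
    "g-BHC": [(2, 50), (14, 49), (20, 50), (27, 50), (41, 50), (40, 50)],
    "DDT + g-BHC": [(28, 50), (37, 50), (46, 50), (48, 50), (48, 50), (50, 50)],
}

def empirical_logit(killed, total):
    """Log odds with a 0.5 adjustment so 0% and 100% cells stay finite."""
    return math.log((killed + 0.5) / (total - killed + 0.5))

summary = {}
for name, batches in killed_over_total.items():
    # Each entry: (ln deposit, observed proportion, empirical logit).
    summary[name] = [
        (math.log(d), y / n, empirical_logit(y, n))
        for d, (y, n) in zip(deposits, batches)
    ]

for name, rows in summary.items():
    print(name, [(round(x, 2), round(p, 2), round(l, 2)) for x, p, l in rows])
```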

5. Stokes, Davis & Koch (Categorical Data Analysis using the SAS System, 2nd Ed.) discuss
a study on coronary artery disease, in which the response is "CA" = 1 if CA disease is present
and = 0 otherwise, and possible explanatory variables are "AGE" (treated as a continuous
variable), SEX (which takes the value of 0 for females and 1 for males), and "ECG" (an ordinal
variable, with values of 0, 1 and 2, where ECG = 0 is scored if the corresponding ST segment
depression is less than 0.1, ECG = 1 is scored if the corresponding ST segment depression lies
between 0.1 and 0.2, and ECG = 2 is scored if the corresponding ST segment depression is
greater than 0.2). A logistic model was fit to these data (predicting whether the disease is present)
including all interaction terms, and since all interaction terms were non-significant, they were
dropped. The main-effects logistic regression is fit in this SAS program / output. Use the results
to comment on the fit (e.g., lack of fit and residuals), and discuss all findings. Point out which
effects are significant and interpret the parameter estimates and the odds ratios. E.g., are males
or females more likely to develop CAD? Why? Is the difference significant? Interpret
the 3.882 OR for the gender factor. Answer similar questions for the other factors. Finally,
relate the table (from Proc Freq) to the Logistic output and discuss its relevance.
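
As a small aid to interpreting the 3.882 odds ratio: since an odds ratio is exp(coefficient), the corresponding SEX coefficient is ln(3.882), and with SEX coded 0 for females and 1 for males the ratio compares the odds of CA disease for males to those for females at the same AGE and ECG. The sketch below also converts the ratio into a probability for one baseline; the 0.20 baseline probability is a hypothetical value chosen only for illustration.

```python
import math

sex_odds_ratio = 3.882                      # from the problem statement
sex_coefficient = math.log(sex_odds_ratio)  # the matching logit-scale estimate

def male_probability(female_probability, odds_ratio=sex_odds_ratio):
    """Male probability implied by a female probability and the odds ratio."""
    female_odds = female_probability / (1.0 - female_probability)
    male_odds = odds_ratio * female_odds
    return male_odds / (1.0 + male_odds)

# HYPOTHETICAL baseline: a female with a 0.20 probability of CA disease.
example_male_probability = male_probability(0.20)
print(round(sex_coefficient, 4))
print(round(example_male_probability, 4))
```

The odds ratio multiplies odds, not probabilities, so the implied change in probability depends on the baseline.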