Homework # 4
- due
Problems students are to do -
- students registered for the UG version (300 level), please
do problems 1, 3, and 4(a) below
- students registered for the G version (400 level), please
do problems 1, 2, 3, 4(b), and 5 below
As always, list necessary assumptions, and include respective p-values in
parentheses next to
your conclusion - e.g., one might conclude, "the data suggested a
difference in the treatments
(p = 0.0013)."
1. The EAGLE dataset from problem 7.3 on pp. 236-7 of the Venables
& Ripley 1999 text (originally
from Knight and Skagen 1988) is entered and analysed with this
SAS program/output. The data relate
to the foraging behavior of wintering Bald Eagles in Washington State,
and concern attempts by one
(pirating) Bald Eagle to steal a chum salmon from another (feeding) Bald
Eagle. As the program
indicates, the variable "pirsize" represents the size of the pirating
eagle, the variable "pirage" represents
the age of the pirating eagle, and the variable "fesize" represents
the size of the feeding eagle.
(a) Identify the model being fit here, clearly defining all
terms (e.g., what is a "success" here?)
and parameters.
(b) Using the "Proc Logistic" output, report on the factors that explain
the success of the pirating attempt, and
give the prediction formula for the probability of success.
(c) Again using the "Proc Logistic" output, interpret the respective
odds ratios. If you were a feeding
Bald Eagle in Washington
State, using these results, comment on the conditions which would be
favorable to your keeping
your chum salmon from a pirating Bald Eagle.
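The "prediction formula" asked for in part (b) is the inverse logit of the fitted linear predictor. A minimal sketch follows; the coefficient values and the 0/1 coding of the three variables are hypothetical placeholders, not values from the SAS output, so substitute the estimates and coding shown there.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Inverse logit of the linear predictor: intercept + sum(coef * value)."""
    eta = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))

def odds_ratio(coef):
    """Odds ratio for a one-unit increase in a predictor, others held fixed."""
    return math.exp(coef)

# HYPOTHETICAL estimates for illustration only (not from the SAS output).
intercept = -1.0
coefs = {"pirsize": 1.5, "pirage": 1.0, "fesize": -2.0}

# ASSUMED 0/1 indicator coding; check the SAS program for the real coding
# and for which level is the reference.
x = {"pirsize": 1, "pirage": 1, "fesize": 0}

p_success = predicted_probability(intercept, coefs, x)
print(f"predicted P(success) = {p_success:.4f}")
for name, b in coefs.items():
    print(f"odds ratio for {name}: {odds_ratio(b):.3f}")
```

An odds ratio above 1 means the odds of a successful pirating attempt rise with that variable; below 1 means they fall.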
2. An experiment (originally from Svensson 1981,
and reported in Venables and Ripley 1999, p. 239,
ex. 7.7) was performed in 1961 and 1962 to assess the effect of a speed
limit on the traffic
accident rate. The experiment was conducted on 92
days in each year, matched so that day "j" in
1962 was comparable to day "j" in
1961. On some days the speed limit was in effect and enforced,
whereas on other days there was no speed limit and
cars tended to be driven faster. The speed limit
days tended to be in contiguous blocks. The
data are given and analyzed in this SAS program, with
factors "year", "day" and
"limit"; the response variable is the daily traffic accident
count, "y".
Fit Poisson log-linear models and summarize what you
discover. You might assume "day" occurs
as a main effect only (i.e., no interactions with
day), but assess if an interaction between "limit" and
"year" is needed. Summarize
your findings using the given output, and be thorough in your summary.
For example, don't just state that the "limit" effect is
significant; go one step further and
discuss for which level of "limit" ("no" or
"yes") there were more traffic accidents, and what kind of
policy impact this may entail.
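One way to write the Poisson log-linear model with the "limit" by "year" interaction under consideration is sketched below. The symbols are assumptions introduced here for illustration (they are not taken from the SAS output): i indexes year, j indexes day, and x_ij = 1 when the speed limit was in force on day j of year i.

```latex
% Notation assumed for illustration; match it to the SAS parameterisation.
y_{ij} \sim \text{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \alpha + \beta_i^{\text{year}} + \gamma_j^{\text{day}}
              + \delta\, x_{ij} + \theta_i\, x_{ij}

% Dropping the interaction terms \theta_i gives the main-effects model, in which
% \exp(\delta) is the ratio of expected daily accident counts,
% speed limit in force versus no speed limit.
```

Comparing the deviances of the models with and without the interaction terms gives the test of whether the interaction is needed.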
3. Samuels & Witmer (Statistics for the Life Sciences, 1999: 429)
present an example in which
50 patients were randomised to receive either pain medication A or B (in a
balanced manner,
meaning 25 pts. in each group), and the response variable that was measured was
Y = the
response to the pain medication. The researchers simply recorded the levels of Y
as
1 (for "None"), 2 (for "Some"), 3 (for
"Substantial"), or 4 (for "Complete"). The counts are in
the table below.
Pain Relief     | Drug A | Drug B
None (1)        |   3    |   7
Some (2)        |   7    |  11
Substantial (3) |  10    |   5
Complete (4)    |   5    |   2
The data are analysed using both Proc Freq and Proc Logistic in this SAS program/output. Our
goal here is to decide if the drugs do indeed differ in terms of pain relief,
and, if so, how.
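As a cross-check on the Proc Freq output, the Pearson chi-square and the Mantel-Haenszel (linear trend) chi-square can be recomputed from the counts in the table. The sketch below is not the SAS implementation; it assumes integer scores 1 to 4 for the relief levels and 0/1 for the drugs, and uses the formula MH = (n - 1) r^2, where r is the correlation between row and column scores.

```python
import math

# Counts from the table: relief score -> (Drug A, Drug B)
counts = {1: (3, 7), 2: (7, 11), 3: (10, 5), 4: (5, 2)}

def pearson_chi_square(table):
    """Pearson chi-square for an r x 2 table; returns (statistic, df)."""
    rows = list(table.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    n = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            stat += (r[j] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (2 - 1)

def mantel_haenszel_chi_square(table):
    """(n - 1) * r^2 with row scores = keys and column scores 0, 1 (assumed)."""
    n = sx = sy = sxx = syy = sxy = 0
    for y, row in table.items():
        for x, c in enumerate(row):
            n += c
            sx += c * x
            sy += c * y
            sxx += c * x * x
            syy += c * y * y
            sxy += c * x * y
    cov = sxy - sx * sy / n
    r2 = cov ** 2 / ((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return (n - 1) * r2

def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1 or 3 handled in this sketch")

stat, df = pearson_chi_square(counts)
p_pearson = chi2_sf(stat, df)
mh = mantel_haenszel_chi_square(counts)
p_mh = chi2_sf(mh, 1)
print(f"Pearson chi-square = {stat:.4f} on {df} df, p = {p_pearson:.4f}")
print(f"Mantel-Haenszel chi-square = {mh:.4f} on 1 df, p = {p_mh:.4f}")
```

The Mantel-Haenszel p-value computed this way agrees with the p = 0.0282 quoted in part (a); the Pearson test ignores the ordering of the relief levels and spends 3 degrees of freedom, which is one reason the two can disagree.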
(a) Examine the results from the "Freq" procedure and discuss
your findings, bearing in mind all
necessary assumptions. For example, the
Mantel-Haenszel (MH) test shows significance (p = 0.0282), but the
chi-square and Fisher's exact (FET) tests show no significance;
comment on the relevance of these tests
and results here.
(b) Examine the results from the "Logistic" procedure and
discuss your findings. Based on this
output, do you feel the drugs differ in terms of pain relief? Do all
necessary assumptions seem to
be met here including the proportional odds assumption? Write down the
predictive formulas.
Clearly interpret the odds ratios for this Logistic fit.
(Extra Credit: use the predictive formulas to give the predicted
values in each one of the cells and
compare them with those from the
Proc Freq chi-square method.)
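For the extra-credit step, predicted cell counts follow from differencing the fitted cumulative probabilities. A minimal sketch, assuming the cumulative-logit form logit P(Y <= j) = alpha_j + beta * x with x = 1 for Drug A and 0 for Drug B; the cutpoints, slope and coding below are hypothetical, so check the Response Profile and parameter estimates in the SAS output before substituting real values.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def cell_probabilities(cutpoints, beta, x):
    """P(Y = j) for j = 1..J from a proportional odds fit.

    Assumes logit P(Y <= j) = cutpoints[j-1] + beta * x for j = 1..J-1.
    """
    cumulative = [inv_logit(a + beta * x) for a in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# HYPOTHETICAL estimates for illustration only (not from the SAS output).
cutpoints = [-1.0, 0.5, 2.0]   # alpha_1 < alpha_2 < alpha_3
beta = -1.0                    # effect of Drug A (x = 1) versus Drug B (x = 0)
group_size = 25                # 25 patients per drug, as stated in the problem

for label, x in (("Drug A", 1), ("Drug B", 0)):
    probs = cell_probabilities(cutpoints, beta, x)
    expected = [group_size * p for p in probs]
    print(label, [round(e, 2) for e in expected])
```

Under proportional odds the same beta shifts every cumulative logit, so exp(beta) is a single odds ratio that applies at each cutpoint.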
4. On p. 113, Collett (Modelling Binary Data, Chapman & Hall, 2nd
ed., 2003) describes an
insecticide toxicity study in which flour beetles, Tribolium castaneum,
were sprayed with one of
three different insecticides in solution in Shell oil P31. The three
insecticides used were
dichlorodiphenyltrichloroethane (DDT) at 2.0% w/v, γ-benzene
hexachloride (γ-BHC, written "g-BHC" below) used at 1.5% w/v,
and a mixture of the two. In the experiment, batches of about fifty insects
were exposed to varying
deposits of spray, measured in units of mg/10 cm². The
resulting data on the proportion of insects
killed after a period of six days are given in the Table below. In
modelling these data, the (natural)
logarithm of the amount of deposit of insecticide is used as the explanatory
variable in a linear
logistic model, and the deposit levels were 2.00, 2.64, 3.48, 4.59, 6.06, and
8.00 (mg/10 cm²).
(a) Our goal here is to examine and compare
only the "DDT" and the "DDT + g-BHC"
treatments. Using this
SAS program/output, write down the predictive (logistic) formula for the
DDT group and the formula for the "DDT + g-BHC"
group, and give your point estimates of the
respective LD50s (on the original scale!). Based on the
subsequent analysis in the SAS program,
test whether the two treatments share a common "slope" (but with
different "intercepts"). Do not
use the "Wald Chi-square" test statistic for this test. Next,
assuming common slopes, test whether
the intercepts differ. Finally, check the residuals and comment on your
findings.
(b) Our
goal here is to examine and compare all three treatments. Using this SAS
program/output, write down the predictive (logistic) formula for the DDT group,
the formula for
the g-BHC group, and the formula for
the "DDT + g-BHC" group,
and give your point estimates
of the respective LD50s (on the original scale!). Based on
the subsequent analysis in the SAS
program, test whether the three treatments share a common "slope" (but with different "intercepts").
Next, test whether (in addition to the common slopes) the intercepts are the same. Finally,
using the model with common slopes but different intercepts, check the residuals and
comment on your findings.
Insecticide | 2.00  | 2.64  | 3.48  | 4.59  | 6.06  | 8.00
DDT         | 3/50  | 5/49  | 19/47 | 19/38 | 24/49 | 35/50
g-BHC       | 2/50  | 14/49 | 20/50 | 27/50 | 41/50 | 40/50
DDT + g-BHC | 28/50 | 37/50 | 46/50 | 48/50 | 48/50 | 50/50
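Two calculations recur in problem 4: converting a fitted line on the log-deposit scale into an LD50 on the original scale, and comparing nested models by the change in deviance rather than a Wald statistic. A minimal sketch of both follows; all numeric inputs are hypothetical placeholders, not values from the SAS output.

```python
import math

def ld50(intercept, slope):
    """Deposit giving a predicted kill probability of 0.5.

    Assumes logit(p) = intercept + slope * ln(deposit); setting the logit to
    zero gives ln(LD50) = -intercept / slope, then exponentiate.
    """
    return math.exp(-intercept / slope)

def kill_probability(intercept, slope, deposit):
    eta = intercept + slope * math.log(deposit)
    return 1.0 / (1.0 + math.exp(-eta))

def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 handled in this sketch")

def deviance_test(deviance_reduced, deviance_full, df):
    """Likelihood-ratio (deviance difference) test of reduced versus full model."""
    stat = deviance_reduced - deviance_full
    return stat, chi2_sf(stat, df)

# HYPOTHETICAL fitted values for illustration only.
b0, b1 = -3.0, 2.0
dose = ld50(b0, b1)
print(f"LD50 = {dose:.3f} mg/10 cm2")
print(f"check: P(kill at LD50) = {kill_probability(b0, b1, dose):.3f}")

# HYPOTHETICAL deviances: common-slope model versus separate-slopes model.
lr_stat, lr_p = deviance_test(25.0, 21.0, df=1)
print(f"LR statistic = {lr_stat:.2f}, p = {lr_p:.4f}")
```

The degrees of freedom equal the number of parameters dropped: 1 when two treatments share a slope, 2 when three treatments do.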
5. Stokes, Davis & Koch (Categorical Data Analysis using the SAS
System, 2nd Ed.) discuss
a study on coronary artery disease, in which the response is "CA" = 1
if CA disease is present
and = 0 otherwise, and possible explanatory variables are "AGE"
(treated as a continuous
variable), "SEX" (which takes the value of 0 for females and 1 for males), and
"ECG" (an ordinal
variable, with values of 0, 1 and 2, where ECG = 0 is scored if the
corresponding ST segment
depression is less than 0.1, ECG = 1 is scored if the corresponding ST segment
depression lies
between 0.1 and 0.2, and ECG = 2 is scored if the corresponding ST segment
depression is
greater than 0.2). A logistic model was fit to these data (predicting
whether the disease is present)
including all interaction terms, and since all interaction terms were
non-significant, they were
dropped. The main-effects logistic regression is fit in this SAS program / output. Use the results
to comment on the fit (e.g., lack of fit and residuals), and discuss all
findings. Point out which
effects are significant and interpret the parameter estimates and the odds
ratios. E.g., are males
or females more likely to develop CAD?
Why? Is the difference significant? Interpret
the odds ratio (OR) of 3.882 for the gender factor.
Answer similar questions for the other factors. Finally,
relate the table (from Proc Freq) to the Logistic output and discuss its
relevance.
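To see what an odds ratio of 3.882 does on the probability scale, it can be applied to a baseline probability. A minimal sketch, assuming the OR compares SEX = 1 with SEX = 0 at fixed AGE and ECG; the baseline probability below is hypothetical and chosen only for illustration.

```python
import math

OR_SEX = 3.882                 # odds ratio quoted in the problem
beta_sex = math.log(OR_SEX)    # the corresponding logistic coefficient

def apply_odds_ratio(p_reference, odds_ratio):
    """Probability after multiplying the reference group's odds by odds_ratio."""
    odds = p_reference / (1.0 - p_reference) * odds_ratio
    return odds / (1.0 + odds)

# HYPOTHETICAL baseline probability for the reference group (SEX = 0).
p_reference = 0.30
p_comparison = apply_odds_ratio(p_reference, OR_SEX)

print(f"log odds ratio = {beta_sex:.4f}")
print(f"P(CA) moves from {p_reference:.2f} to {p_comparison:.4f}")
```

The odds ratio multiplies odds, not probabilities, so the change in probability depends on the baseline used.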