Homework # 7 - due
Problems students are to do
- UG students: please do problems A and B below
- G students: please do problems A - D below
As always, list necessary assumptions, and include respective p-values in parentheses
next to your conclusion; e.g., one might conclude, "the data suggested a difference in
the treatments (p = 0.0013)." Again, please note that some of the following SAS program
outputs are *very* lengthy, so print them out at your own risk; rather, just copy down
what you need and print out only the essential sections! Do not include the computer
output in your homework submissions except to indicate relevant test statistics,
p-values, outliers, etc.
A. Diggle, Liang & Zeger (1994) give an example of repeated measurements on the "size"
(by convention this is log height plus twice log diameter) of 79 Sitka spruce trees,
54 of which were grown in ozone-enriched chambers and 25 of which were controls. The
trees were measured on eight days in 1989 (exercise 6.13 on p. 209 of Venables and
Ripley, Modern Applied Statistics with S-Plus, 1999). The data are analyzed in SAS here.
This analysis uses the classical linear mixed models analysis (cf. exercise C below).
Use the output to thoroughly analyze these repeated-measures data, including
(1) for each part (MANOVA, SP, each of the Mixed's), list the necessary assumptions,
(2) in the MANOVA approach, comment on the significance of the fact that the "time_6"
contrast for "treat" is significant (p = 0.0031),
(3) comment on the usefulness of the results of the Split Plot (SP) approach,
(4) identify (and justify!!) which of the three covariance structures used in Proc Mixed
is most appropriate for these data. "Justify" here means report the relevant test
statistic and results. How many variance components need to be estimated for each of
the three runs of Proc Mixed?
(5) using the appropriate analysis, comment on whether you feel the profiles for the two
treatment curves are the same, reporting the relevant test statistic and p-value.
Give your final conclusion.
B. On page 18 of Davidian & Giltinan, Nonlinear Models for Repeated Measurement Data,
the authors present the data of Kwan et al. (1976), and fit a mixed nonlinear model of
the form
E(Y) = b1*exp(-b2*X) + b3*exp(-b4*X),
in which Y = plasma concentration (of a drug called cefamandole) and X = time (in minutes
post dose). For this study, a dose of 15 mg/kg body weight of the drug was administered
by ten-minute intravenous infusion to six healthy volunteers. The data are input and
analyzed in SAS here. The program first graphs the data, then runs four Proc NLMixed's
in SAS.
(1) Using the values of b1 = 2.7733, b2 = 2.8139, b3 = 0.7870, b4 = 0.4195, graph the
above curve, and comment on the role of each of the four parameters (this part has been
done for students and discussed in class, so there is no need to do it!).
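For reference, the curve in part (1) can be evaluated numerically before graphing it. The following is only a minimal sketch: the function name and the grid of X values are hypothetical choices made for illustration, and X is assumed to be on whatever time scale the parameter estimates were fit on.

```python
import math

# Parameter values quoted in part (1).
B1, B2, B3, B4 = 2.7733, 2.8139, 0.7870, 0.4195


def expected_concentration(x, b1=B1, b2=B2, b3=B3, b4=B4):
    """Biexponential mean function E(Y) = b1*exp(-b2*x) + b3*exp(-b4*x)."""
    return b1 * math.exp(-b2 * x) + b3 * math.exp(-b4 * x)


# Hypothetical grid of X values, chosen only to show the shape of the curve.
for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"X = {x:4.1f}   E(Y) = {expected_concentration(x):.4f}")
```

At X = 0 the curve equals b1 + b3. Because b2 is larger than b4, the first exponential term dies out more quickly than the second.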
(2) Listing all necessary assumptions, explain what is being done in each of the NLMixed
runs, commenting on which model is being fit and identifying the underlying assumptions.
Be specific. Also, identify which models are special cases of others (i.e., which are
nested, and which models they are nested in).
(3) Choose the NLMixed analysis and model which best describes these data, and give
specific reasons for why you chose the model you did, and why you rejected the others.
Give test statistics, degrees of freedom and p-values to justify your claims.
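One common way to compare two nested models fit by maximum likelihood is a likelihood-ratio test on the difference in their -2 log-likelihood values. The sketch below is an illustration only: the function names are made up, and the -2LL values and degrees of freedom are hypothetical placeholders, not numbers from the SAS output. The chi-square reference distribution is itself an approximation, and it may be conservative when the null hypothesis places a variance component on the boundary of its parameter space.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from a closed form (df = 2 or df = 1), then use the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(neg2ll_reduced, neg2ll_full, df):
    """Return (test statistic, approximate p-value) for nested ML fits."""
    stat = neg2ll_reduced - neg2ll_full
    return stat, chi2_sf(stat, df)


# Hypothetical -2LL values; df = number of extra parameters in the full model.
stat, p = likelihood_ratio_test(neg2ll_reduced=120.4, neg2ll_full=112.1, df=2)
print(f"LR statistic = {stat:.2f}, df = 2, p = {p:.4f}")
```

The degrees of freedom are the number of parameters by which the two models differ, so the test only applies when one model is a special case of the other.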
(4) The parameters b2 and b4 are important since they address the rate of decrease of
the expected concentration function. Contrasting the estimated standard errors of these
two parameters for the first NLMixed with those for the second through fourth
NLMixed's, why are these SEs lower for the latter three NLMixed's than for the first
one?
C. Reanalyze the Sitka89 data from exercise A using the SAS
program/output here. This
program runs one NLIN and three NLMixeds. The NLIN and the first NLMixed
are run
"by treatment," so each produces two outputs. The respective
outputs are identified by a
corresponding title.
(1) Identify the models which are fit in each of these runs
and the underlying assumptions.
(2) Focusing on the NLIN in output # 1, make a 2x4 table of
parameter estimates and
comment on which parameters
seem close (guesses can be made using approx. CIs)
for the two treatments.
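A rough way to form the approximate CIs mentioned in part (2) is the Wald interval, estimate plus or minus a normal quantile times the standard error. The sketch below assumes that approach; the function names and the estimates and standard errors in the example are hypothetical, not values from the NLIN output.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Approximate Wald interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (estimate - z * se, estimate + z * se)


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical estimates and standard errors for one parameter in each group.
ci_control = wald_ci(estimate=6.50, se=0.20)
ci_ozone = wald_ci(estimate=6.10, se=0.15)
print(ci_control, ci_ozone, intervals_overlap(ci_control, ci_ozone))
```

Overlapping intervals are only an informal screen for "close" parameters, not a formal test of equality.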
(3) It turns out that outputs # 2 and # 3 are virtually identical, noting that the
-2LL's for output # 2 sum to (-206.1 + -492.8 = -698.9), almost that of output # 3
(-697.2). In model # 3, what is the role of the "add" terms (th1add, th2add, etc.)?
(4) Of the models #1, #3, and # 4, which are special cases
of others (nested)?
(5) What hypothesis can be tested by comparing outputs # 3
and # 4? Clearly list the
hypotheses, test statistic, p-value
and your conclusion. Which parameters are random
in these models? Do these models
assume that these random parameters vary by
different or the same amounts?
(6) Using the model you feel best describes these data,
summarize the data with your model.
In comparing the NLIN output and the
last NLMIXED, comment on the differences in
the estimated variability associated
with the LD50 parameter for each of these models.
D. A patient swallows a tablet of Zantac, which enters the patient's gut, and begins
entering the patient's bloodstream at time t = 0. Blood samples are then taken every
half-hour until hour 16, and the concentrations of Zantac in the patient's serum are
recorded and analyzed in the SAS program and output here.
(1) Describe the model(s) that are being fit in the NLIN and the NLMixed, and the
implicit assumptions. Are they fitting the same model? What are the model parameters,
and what are the roles of these for each model?
(2) The residual plot after the NLIN highlights a problem
with one of the implicit assumptions.
Which one (assumption) is it, and
what is wrong? What are the usual ramifications of the
violation of this (or these)
assumption(s)?
(3) Explain what is being done in the IML procedure, focusing on which function is being
minimized in the "neg2lla" and the "neg2llb" functions. The latter function introduces
an additional parameter - which one is it and what is its role? Is it significant?
(Listing your null and alternative hypotheses, report the relevant test statistic,
distribution, degrees of freedom and p-value.) Give reliable 90%, 95% and 99% CIs for
this parameter.