Homework # 7     -     due Monday, 1st May 2006 at the start of class

Problems students are to do -
    - UG students:  please do problems A and B below
    - G students: please do problems A - D below

As always, list necessary assumptions, and include respective p-values in parentheses
next to your conclusion - e.g., one might conclude, "the data suggested a difference in
the treatments (p = 0.0013)."  Again, please note that some of the following SAS program
outputs are *very* lengthy, so you are warned to print them out at your own risk - rather,
just copy down what you need and print out the essential sections!  Do not include the
computer output (from the computer programs) in your homework submissions except to
indicate relevant test statistics, p-values, outliers, etc.

A.  Diggle, Liang & Zeger (1994) give an example of repeated measurements on the "size" (by
convention this is log height plus twice log diameter) of 79 Sitka spruce trees, 54 of which were
grown in ozone-enriched chambers and 25 of which were controls.  The trees were measured
on eight days in 1989. (exercise 6.13 on p. 209 of Venables and Ripley, Modern Applied
Statistics with S-Plus, 1999). The data are analyzed in SAS here. This analysis uses the
classical linear mixed models analysis (cf. exercise C below).  Use the output to thoroughly
analyze these repeated-measures data, including
    (1)  for each part (MANOVA, SP, each of the Mixed's), listing the necessary assumptions,
    (2)  in the MANOVA approach, comment on the significance of the fact that the "time_6"
          contrast for "treat" is significant (p = 0.0031),
    (3)  comment on the usefulness of the results of the Split Plot (SP) approach,
    (4)  identify (and justify!!) which of the three covariance structures used in Proc Mixed is
          most appropriate for these data.  "Justify" here means report the relevant test stat and
          results.  How many variance components need to be estimated for each of the three runs
          of Proc Mixed?
    (5)  using the appropriate analysis, comment on whether you feel the profiles for the two
          treatment curves are the same, reporting the relevant test statistic and p-value.
          Give your final conclusion.

B.  On page 18 of Davidian & Giltinan, Nonlinear Models for Repeated Measurement Data,
the authors present the data of Kwan et al (1976), and fit a mixed nonlinear model of the form

                    EY = b1*exp(- b2X) + b3*exp(- b4X),

in which Y = plasma concentration (of a drug called cefamandole) and X = time (in minutes
post dose). For this study, a dose of 15 mg/kg body weight of the drug was administered by
ten-minute intravenous infusion to six healthy volunteers. The data are input and analyzed in
SAS here.  The program first graphs the data then runs four Proc NLMixed's in SAS.
    (1) Using the values of  b1 = 2.7733, b2 = 2.8139, b3 = 0.7870, b4 = 0.4195, graph
         the above curve, and comment on the role of each of the four parameters (this part has
         been done for students and discussed in class - so no need to do it!).
    (2) Listing all necessary assumptions, explain what is being done in each of the NLMixed runs,
        commenting on which model is being fit and identifying the underlying assumptions.  Be
        specific. Also, identify which models are special cases of others (i.e., which are nested,
        and in which models they are nested).
    (3) Choose the NLMixed analysis and model which best describes these data, and give
        specific reasons for why you chose the model you did, and why you rejected the
        others.  Give test statistics, degrees of freedom and p-values to justify your claims.
    (4) The parameters b2 and b4 are important since they address the rate of decrease of the
        expected concentration function. Contrasting the estimated standard errors of these
        two parameters for the first NLMixed with those for the second through fourth
        NLMixed's, why are these SEs lower for the latter three NLMixed's than for the
        first one?
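
For anyone wishing to re-check the curve from part (1) outside of SAS, the following is a
minimal Python sketch (it is not part of the SAS programs, and the function names are
made up for illustration).  It takes the mean function exactly as written above, with the
four values quoted in part (1), and reports the value at X = 0 (which is b1 + b3) along
with the half-life log(2)/rate of each exponential term.  The grid of X values is arbitrary.

```python
import math

# Parameter values quoted in part (1) of problem B.
B1, B2, B3, B4 = 2.7733, 2.8139, 0.7870, 0.4195

def expected_conc(x, b1=B1, b2=B2, b3=B3, b4=B4):
    """Mean response EY = b1*exp(-b2*x) + b3*exp(-b4*x), as written in the handout."""
    return b1 * math.exp(-b2 * x) + b3 * math.exp(-b4 * x)

def half_life(rate):
    """Value of x at which a single term exp(-rate*x) has dropped to one half."""
    return math.log(2.0) / rate

if __name__ == "__main__":
    print("EY at X = 0 :", round(expected_conc(0.0), 4))   # equals b1 + b3
    print("half-life of the b2 term:", round(half_life(B2), 4))
    print("half-life of the b4 term:", round(half_life(B4), 4))
    # Arbitrary illustrative grid; replace with the observed sampling times.
    for x in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0):
        print(x, round(expected_conc(x), 4))
```

Since b2 is much larger than b4, the first term dies out quickly and the second term
governs the tail of the curve.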

C.  Reanalyze the Sitka89 data from exercise A using the SAS program/output here.  This
program runs one NLIN and three NLMixeds.  The NLIN and the first NLMixed are run
"by treatment," so each produces two outputs.  The respective outputs are identified by a
corresponding title.
    (1) Identify the models which are fit in each of these runs and the underlying assumptions.
    (2) Focusing on the NLIN in output # 1, make a 2x4 table of parameter estimates and
         comment on which parameters seem close (guesses can be made using approx. CIs)
         for the two treatments.
    (3) It turns out that outputs # 2 and # 3 are virtually identical, noting that the -2LL's for
        output # 2 sum to -698.9 (= -206.1 + -492.8), almost that of output # 3 (-697.2).  In
        model # 3, what is the role of the "add" terms (th1add, th2add, etc.)?
    (4) Of the models #1, #3, and # 4, which are special cases of others (nested)?
    (5) What hypothesis can be tested by comparing outputs # 3 and # 4?  Clearly list the
        hypotheses, test statistic, p-value and your conclusion.  Which parameters are random
        in these models? Do these models assume that these random parameters vary by
        different or the same amounts?
    (6) Using the model you feel best describes these data, summarize the data with your model.
        In comparing the NLIN output and the last NLMIXED, comment on the differences in
        the estimated variability associated with the LD50 parameter for each of these models.
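
Comparisons of nested models such as the one asked for in part (5) are usually made with
a likelihood ratio test: the difference in -2LL between the reduced and the full model is
referred to a chi-square distribution whose degrees of freedom equal the number of
parameters dropped.  The Python sketch below shows that arithmetic; the function names
are hypothetical, and the numbers in the usage lines are made up except for the two -2LL
values quoted in part (3).  Read the real -2LL values and degrees of freedom from your
SAS output.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 1 for odd df, df = 2 for even df.
    if df % 2:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    # Recurrence Q(x; k+2) = Q(x; k) + half**(k/2) * exp(-half) / Gamma(k/2 + 1).
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lrt(neg2ll_reduced, neg2ll_full, df):
    """Likelihood ratio statistic and p-value for a reduced model nested in a full one."""
    stat = neg2ll_reduced - neg2ll_full
    return stat, chi2_sf(stat, df)

# -2LL values quoted in part (3) for the two by-treatment fits of output # 2.
sum_neg2ll = -206.1 + -492.8      # about -698.9, versus -697.2 for output # 3

# Made-up example: reduced model -2LL = 110, full model -2LL = 100, 2 parameters dropped.
stat, p = lrt(110.0, 100.0, 2)    # stat = 10, p = exp(-5), about 0.0067
print(round(sum_neg2ll, 1), stat, round(p, 4))
```

Keep in mind that when the null hypothesis sets a variance component to zero, the
parameter lies on the boundary of its range and the usual chi-square p-value tends to be
conservative.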

D.  A patient swallows a tablet of Zantac, which enters the patient's gut, and begins entering
the patient's bloodstream at time t = 0.  Blood samples are then taken every half-hour until
hour 16, and the concentrations of Zantac in the patient's serum are recorded and analyzed in
the SAS program and output here.
    (1) Describe the model(s) that are being fit in the NLIN and the NLMixed, and the implicit
         assumptions. Are they fitting the same model?  What are the model parameters, and
         what are their roles in each model?
    (2) The residual plot after the NLIN highlights a problem with one of the implicit assumptions.
        Which one (assumption) is it, and what is wrong?  What are the usual ramifications of the
         violation of this (or these) assumption(s)?
    (3) Explain what is being done in the IML procedure, focusing on which function is being
        minimized in the "neg2lla" and the "neg2llb" functions.  The latter function introduces an
        additional parameter - which one is it and what is its role?  Is it significant?  (Listing your
        null and alternative hypotheses, report the relevant test statistic, distribution, degrees of
        freedom and p-value.)  Give reliable 90%, 95% and 99% CIs for this parameter.
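
If "reliable" in part (3) is read as likelihood-based rather than Wald intervals (this
reading is an assumption), then the interval at a given level consists of all values of the
parameter for which the profiled -2LL lies within a chi-square (1 df) cutoff of its
minimum.  The Python sketch below only computes those cutoffs for the three levels
requested; the function name is hypothetical, and the profiling itself still has to come
from the SAS output.

```python
from statistics import NormalDist

def lr_cutoff(level):
    """Chi-square (1 df) quantile for the given confidence level.

    A chi-square variable with 1 df is the square of a standard normal, so its
    quantile at `level` is the square of the two-sided normal critical value.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * z

if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        # Keep parameter values whose profiled -2LL is at most min(-2LL) + cutoff.
        print(level, round(lr_cutoff(level), 3))
```

The three cutoffs come out at roughly 2.706, 3.841 and 6.635.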