hwk07

Homework # 7 – due Thursday December 4 by 5pm

Problems students are to do:

· Undergraduate students: please do problems A and C below.

· G students: please do problems A–D below.

As always, list the necessary assumptions, and include the respective p-values in parentheses next to your conclusions; e.g., one might conclude, "the data suggested a difference in the treatments (p = 0.0013)." Again, note that some of the following SAS program outputs are *very* long, so print them at your own risk; better yet, copy down what you need and print only the essential sections! Do not include computer output in your homework submissions except to indicate relevant test statistics, p-values, outliers, etc.

A. Diggle, Liang & Zeger (1994) give an example of repeated measurements on the "size" (by convention, log height plus twice log diameter) of 79 Sitka spruce trees, 54 of which were grown in ozone-enriched chambers and 25 of which were controls. The trees were measured on eight days in 1989. (This is exercise 6.13 on p. 209 of Venables and Ripley, Modern Applied Statistics with S-Plus, 1999.) The data are analyzed in SAS here, using the classical linear mixed-models analysis (cf. exercise C below). Use the output to thoroughly analyze these repeated-measures data, as follows:

(1) For each approach (MANOVA, SP, and each Proc Mixed run), list the necessary assumptions.

(2) In the MANOVA approach, comment on what it means that the "time_6" contrast for "treat" is significant (p = 0.0031).

(3) Comment on the usefulness (or otherwise) of the results of the split-plot (SP) approach.

(4) Identify (and justify!!) which of the three covariance structures used in Proc Mixed is most appropriate for these data. "Justify" here means reporting the relevant test statistic(s) and results. How many variance components need to be estimated in each of the three Proc Mixed runs?

(5) Using the appropriate analysis, comment on whether you believe the mean profiles of the two treatments are the same, reporting the relevant test statistic and p-value. Give your final conclusion.
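
For item (4), and for the later parts that compare nested fits, the usual likelihood-ratio test takes the difference in -2 log likelihood (-2LL) between a reduced and a full model and refers it to a chi-square distribution whose degrees of freedom equal the difference in the number of parameters. The Python sketch below assumes you have copied the two -2LL values and the parameter-count difference from the output by hand; the function names are hypothetical, and the chi-square tail is computed with the standard-library `math` module rather than a statistics package. Caveats: covariance structures that are not nested cannot be compared this way (information criteria such as AIC or BIC are the usual alternative); with REML fits the two models must share the same fixed effects; and when the null hypothesis puts a variance on the boundary (a variance equal to zero), the nominal chi-square p-value tends to be conservative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 start the recurrence.
    if df % 2 == 1:
        k, tail = 1, math.erfc(math.sqrt(half))
    else:
        k, tail = 2, math.exp(-half)
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return tail


def lr_test(neg2ll_reduced: float, neg2ll_full: float, df_diff: int) -> tuple[float, float]:
    """Likelihood-ratio statistic and chi-square p-value for a reduced model nested in a full one."""
    stat = neg2ll_reduced - neg2ll_full
    return stat, chi2_sf(stat, df_diff)


if __name__ == "__main__":
    # Hypothetical -2LL values, NOT taken from the SAS output.
    stat, p = lr_test(110.0, 100.0, 2)
    print(f"LR statistic = {stat:.1f}, p = {p:.4f}")  # stat = 10.0, p = exp(-5), about 0.0067
```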

B. On page 18 of Davidian & Giltinan, Nonlinear Models for Repeated Measurement Data, the authors present the data of Kwan et al. (1976) and fit a nonlinear mixed-effects model of the form

E(Y) = b₁·exp(−b₂X) + b₃·exp(−b₄X),

in which Y = plasma concentration of the drug cefamandole and X = time (in minutes post dose). In this study, a dose of 15 mg/kg body weight of the drug was administered by ten-minute intravenous infusion to six healthy volunteers. The data are input and analyzed in SAS here; the program first graphs the data and then runs four Proc NLMixed fits.

(1) Using the values of b₁ = 2.7733, b₂ = 2.8139, b₃ = 0.7870, b₄ = 0.4195, graph the above curve, and comment on the role of each of the four parameters (this part has been done for students and discussed in class – so no need to do it!).

(2) Explain what is being done in each of the NLMixed runs: state which model is being fit and list all of the underlying assumptions. Be specific. Also, identify which models are special cases of others (i.e., which models are nested, and in which models they are nested).

(3) Choose the NLMixed analysis and model that best describe these data, and give specific reasons for your choice and for rejecting the others. Give test statistics, degrees of freedom, and p-values to justify your claims.

(4) The parameters b₂ and b₄ are important since they govern the rate of decrease of the expected concentration function. Comparing the estimated standard errors of these two parameters in the first NLMixed run with those in the second through fourth runs, explain why these SEs are lower in the latter three runs than in the first one.
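
The mean function in part B can be evaluated directly, which is a quick way to see the roles of the four parameters. The Python sketch below uses the parameter values quoted in part (1); the function names are illustrative, and time is in whatever units X is recorded in. At X = 0 the curve equals b₁ + b₃, and ln 2 / b₂ and ln 2 / b₄ are the half-lives of the fast and slow exponential terms, one way to read b₂ and b₄ as rates of decrease (part (4)).

```python
import math

# Parameter values quoted in part B(1).
B1, B2, B3, B4 = 2.7733, 2.8139, 0.7870, 0.4195


def mean_concentration(x: float, b1: float = B1, b2: float = B2,
                       b3: float = B3, b4: float = B4) -> float:
    """E(Y) = b1*exp(-b2*x) + b3*exp(-b4*x): a fast plus a slow exponential decay."""
    return b1 * math.exp(-b2 * x) + b3 * math.exp(-b4 * x)


def half_life(rate: float) -> float:
    """Time for a single term b*exp(-rate*x) to fall to half its starting value."""
    return math.log(2.0) / rate


if __name__ == "__main__":
    for x in [0.0, 0.5, 1.0, 2.0, 4.0, 6.0]:
        print(f"x = {x:4.1f}   E(Y) = {mean_concentration(x):.4f}")
    print("half-life of fast term (b2):", round(half_life(B2), 4))
    print("half-life of slow term (b4):", round(half_life(B4), 4))
```

Plotting these values over a fine grid of X reproduces the curve described in part (1).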

C. Reanalyze the Sitka89 data from exercise A using the SAS program/output here. This program runs Proc NLIN once and Proc NLMixed four times. The NLIN and the first NLMixed are run "by treatment," so each produces two sub-outputs. The outputs are numbered # 1 (the NLIN) through # 5, and each is identified by a corresponding title.

(1) Identify the models that are fit in each of these runs and the underlying assumptions.

(2) Focusing on the NLIN in output # 1, make a 2 × 4 table of parameter estimates and comment on which parameters appear similar for the two treatments (rough judgments can be made using approximate CIs).

(3) Outputs # 2 and # 3 turn out to be virtually identical: the two -2LL values in output # 2 sum to -206.1 + (-492.8) = -698.9, close to the -697.2 of output # 3. In model # 3, what is the role of the "add" terms (e.g., "th1add", "th2add", etc.)?

(4) Among models # 1, # 3, # 4, and # 5, which are special cases of (i.e., nested in) others?

(5) What hypothesis can be tested by comparing outputs # 3 and # 4? Clearly state the hypotheses, test statistic, p-value, and your conclusion. Which parameters are random in these models? Do these models assume that these random parameters vary by the same or by different amounts?

(6) Answer the questions in (5), but now comparing outputs # 4 and # 5.

(7) Using the model you feel best describes these data, summarize the data. Comparing the NLIN output with the last NLMixed, comment on the differences between these two models in the estimated variability of the LD50 parameter.
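
For item (2), a rough comparison can be made from the NLIN estimates and their asymptotic standard errors. The Python sketch below assumes Wald-type intervals (estimate ± z·SE) and, because the NLIN fits are run separately by treatment, treats the two groups' estimates as independent when forming a z statistic for their difference. The numbers in the usage block are placeholders, not values from the SAS output. Note that overlapping 95% intervals do not by themselves show that two estimates are not significantly different; the z statistic for the difference is the sharper check.

```python
import math
from statistics import NormalDist


def wald_ci(est: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Approximate (Wald) confidence interval: est +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return est - z * se, est + z * se


def compare_groups(est1: float, se1: float, est2: float, se2: float) -> tuple[float, float]:
    """z statistic and two-sided p-value for est1 - est2, assuming independent estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Placeholder (estimate, SE) pairs for one parameter; NOT from the SAS output.
    ozone = (5.0, 0.10)
    control = (5.2, 0.15)
    print("ozone 95% CI:  ", wald_ci(*ozone))
    print("control 95% CI:", wald_ci(*control))
    print("z, p for the difference:", compare_groups(*ozone, *control))
```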

D. A patient swallows a tablet of Zantac, which passes into the patient's gut; the drug begins entering the bloodstream at time t = 0. Blood samples are then taken every half-hour until hour 16, and the serum concentrations of Zantac are recorded and analyzed in the SAS program and output here.

(1) Describe the model(s) being fit in the NLIN and the NLMixed, and the implicit assumptions. Are they fitting the same model? What are the model parameters, and what role does each play in each model?

(2) The residual plot from the NLIN fit reveals a problem with one of the implicit assumptions. Which assumption is it, and what is wrong? What are the usual consequences of violating this assumption (or these assumptions)?

(3) Explain what is being done in the IML procedure, focusing on the function that is minimized in "neg2lla" and in "neg2llb". The latter introduces an additional parameter: which one is it, and what is its role? Is it significant? (State your null and alternative hypotheses, and report the relevant test statistic, its distribution, degrees of freedom, and p-value.) Give reliable 90%, 95%, and 99% CIs for this parameter.
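
For the "reliable" CIs in item (3), one standard option is a profile-likelihood interval: the set of parameter values whose profile -2LL lies within the chi-square(1) critical value of the minimum (about 2.706, 3.841, and 6.635 for 90%, 95%, and 99%). Unlike Wald intervals, these need not be symmetric about the estimate. The Python sketch below assumes you can evaluate a profile -2LL function (a hypothetical `profile_neg2ll`, obtained by re-minimizing over the other parameters at each fixed value of the parameter of interest) and locates each endpoint by bisection. The quadratic used in the usage block is a made-up stand-in, for which the interval reduces to the Wald interval.

```python
from typing import Callable

# Upper chi-square(1) quantiles for 90%, 95%, and 99% confidence.
CHI2_1_CRIT = {0.90: 2.705543454095404, 0.95: 3.841458820694124, 0.99: 6.634896601021213}


def _bisect(f: Callable[[float], float], a: float, b: float, tol: float = 1e-10) -> float:
    """Root of f on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(200):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if (fm > 0) == (fa > 0):
            a, fa = mid, fm  # root lies in [mid, b]
        else:
            b = mid  # root lies in [a, mid]
        if b - a < tol:
            break
    return 0.5 * (a + b)


def profile_ci(profile_neg2ll: Callable[[float], float], theta_hat: float,
               lower: float, upper: float, level: float = 0.95) -> tuple[float, float]:
    """Profile-likelihood CI: {theta : profile_neg2ll(theta) - minimum <= chi2_1 critical value}.

    lower and upper must lie outside the interval on either side of theta_hat.
    """
    target = profile_neg2ll(theta_hat) + CHI2_1_CRIT[level]

    def excess(t: float) -> float:
        return profile_neg2ll(t) - target

    return _bisect(excess, lower, theta_hat), _bisect(excess, theta_hat, upper)


if __name__ == "__main__":
    # Made-up quadratic profile -2LL: minimum 50 at theta = 2 with SE 0.5 (not from the SAS output).
    quad = lambda t: 50.0 + ((t - 2.0) / 0.5) ** 2
    for level in (0.90, 0.95, 0.99):
        print(level, profile_ci(quad, 2.0, -10.0, 10.0, level))
```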