New In
TVCs with interval -censored Cox model
Stata 17 introduced the stintcox command to fit genuine semiparametric Cox models to interval-censored event-time data. Stata 18 adds support for time-varying covariates (TVCs), commonly used in practice to record characteristics that change during follow-up. So, whether you have a biomarker variable or a policy variable that changes with time, you can now include them to model events of interest known only to lie within some time interval. For instance, the event may be a recurrence of cancer or a change
Highlights
-
New in putdocx
-
Include bookmarks in paragraphs and tables
-
Include alternative text to be read by voice software for images
-
Include Scalable Vector Graphics (.svg) images
-
-
New in putexcel
-
Freeze a worksheet at a specific row or column
-
Insert a page break at a specific row or column
-
Insert a header and footer into a worksheet
-
Include hyperlinks in cells
-
Create a named cell range
-
in employment status recorded during one of the follow-ups. Both are examples of interval-censored event-time data, in which the event time is not observed exactly.
Use stintcox to create TVCs automatically by interacting existing covariates with specified deterministic functions of time or specify your own TVCs in the new multiple-record-per-subject data format. Use TVCs to test the proportional-hazards assumption with the Wald test or likelihood-ratio test. And incorporate TVCs in your predictions and plots of survivor and other functions.
Let’s see it work
Testing the proportional-hazards assumption using TVCs
Here we demonstrate how to use TVCs to test the proportional-hazards assumption of a Cox model. Following Zeng et al. (2016), we use an interval-censored Cox regression to model time to HIV infection of injecting drug users. All 1,124 subjects initially tested negative for HIV. They were followed and assessed for HIV-1 seropositivity through blood tests approximately every four months. Because the subjects were tested periodically, the exact time of HIV infection was not observed, but the times were known to fall in intervals between blood tests. The corresponding lower and upper time endpoints are recorded in variables ltime and rtime.
The baseline factors we use to model the HIV-1 seropositivity are centered age at recruitment (age_mean), sex (male), history of needle sharing (needle), history of drug injection (inject), and whether a subject has been in jail at the time of recruitment (jail). Here is a subset of the dataset for subjects 271 through 274 with one record per subject:
We fit a Cox proportional hazards model in which time to HIV infection depends on the above baseline factors. In a single-record-per-subject format, we specify the lower and upper event-time intervals in stintcox‘s interval() option.
We find that age and being in jail at enrollment appear to be associated with, respectively, lower and higher risks of HIV infection.
One way of testing the proportional-hazards assumption for a covariate is to test whether the coefficient associated with that covariate is time invariant. This can be accomplished by including an interaction between this covariate and a function of time in the model and testing whether the corresponding coefficient equals zero.
To test the proportional-hazards assumption in our example, we include all covariates in option tvc() to additionally include their interactions with the analysis time _t, the default, in the model. In this analysis, we suppress the default reporting of hazard ratios with option nohr:
The first equation, main, reports the coefficients for the covariates that do not vary over time; the second equation, tvc, reports the results for covariates interacted with time. And the proprtional-hazards assumption appears to be violated for covariates male and jail according to their p-values reported in the tvc equation. The Wald test at the bottom of the table also indicates that the proportional-hazards assumption doesn’t hold globally.
Fitting an interval-censored Cox model using TVCs in multiple-record data
In the previous example, we used stintcox‘s tvc() option to create TVCs to test the proportional-hazards assumption. In some applications, TVCs already exist in the dataset. Continuing with the data on injecting drug users, being in jail is a TVC. TVCs can be recorded only in multiple-record-per-subject data format. In this format, each subject may contain multiple records with multiple examination times, the event status at each examination time, and potential time-varying covariates at each examination time.
Let’s use an extended version, idu2.dta, of the previous idu.dta dataset, which contains all the baseline covariates as well as a time-varying imprisonment indicator variable (jail_vary). jail_vary indicates whether the subject has been imprisioned since the last clinic visit. The dataset also records a subject identifier (id), the examination time of the blood test (time), and whether the blood test is positive at the examination time (is_seropos). Here is a subset of this dataset for subjects 271 through 274 with multiple records per subject:
We refit the previous Cox model, but we now use the time-varying variable jail_vary instead of the baseline variable jail. In a multiple-record-per-subject format, we specify the subject identifier in option id(), the examination time in option time(), and the event-indicator status in option status().
As we scroll down to the North Central region, we see an average of 47,436 marriages per state; Illinois had the most, 109,823.
We might also store summary statistics for each region in one worksheet and hyperlink to the table from another sheet with the new support for hyperlinks. We could customize our worksheet further by including a header, footer, or page break. See [RPT] putexcel for more information.
Compared with the previous example, after we account for time-varying imprisonment, the hazard ratio for inject increases from 1.25 to 1.37, but the effect of imprisonment decreases from 1.57 for baseline jail to 1.44 for time-varying jail_vary.
Graphing survivor functions
After fitting the model, we can use stcurve to plot the estimated survivor, failure, hazard, or cumulative hazard function. By default, stcurve evaluates the functions at the overall means of covariates.
. stcurve, survival note: function evaluated at overall means of covariates.
With multiple-record data with TVCs, we may want to incorporate the time-varying nature of those covariates when plotting the functions. In this case, we can use option attmeans to evaluate the function at time-specific means.
. stcurve, survival attmeans note: function evaluated at time-specific means of covariates.
We can also use option atframe() to specify our own TVC values to be used to evaluate the survivor function. Suppose we want to plot the survivor curve for an individual with the same covariate pattern as subject 1 in our dataset. We create a new frame called id1 and use frame put to copy the relevant information for subject 1 to this new frame. We list the data we just saved in frame id1.
Then we can graph the survivor curve for this particular profile by typing
. stcurve, survival atframe(id1) note: function evaluated at specified values of selected covariates and overall means of other covariates (if any). note: covariate values from frame id1 used to evaluate function.
Reference
Zeng, D., L. Mao, and D. Lin. 2016. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103: 253—271.