Chapter 4 Gamma Frailty with Applications

4.1 From survival to hazards

We have \[ \bar S(x) = {1 \over \left(1 + \sigma^2 H_0(x)\right)^{1/ \sigma^2}} \]

Let’s compute \(\bar\mu(x)\).

\[ \bar\mu(x) = {\mu_0 \over 1 + \sigma^2 H_0(x)} \]

4.1.1 What happens to frailty of survivors?

Recall that pop hazards = baseline \(\times \bar{z}(x)\).

So, \[ \bar\mu(x) = \mu_0(x) { 1 \over 1 + \sigma^2 H_0(x)} \] Sketch \(\bar{z}(x)\). Hint: what form does \(H_0(x)\) have? (see next Example)

4.1.1.1 Example: Gamma-Gompertz

  • If \(H_0(x)\) be Gompertz, we have closed-form expression. What is it?

  • Does \(\bar{z}\) have the form

    \[{1 \over 1 + v*e^{w x}}\]

  • This is a backwards S, going down.

sigma.sq = .2
x = 0:100
a = 5 * 10^-4
b = 1/8
H0.x = (a/b) * (exp(b*x) - 1)
bar.z = 1 / (1 + sigma.sq * H0.x)
plot(x, bar.z)
Gamma-Gompertz

Figure 4.1: Gamma-Gompertz

Look at the apparent exponential decline in tail.

Homework: what is proportional rate of change in \(\bar{z}\) as \(x\) gets big? Is it close to Gompertz \(b\)?

4.1.2 Average frailty in terms of survival

\[ \bar{z}(x) = [\bar{S}(x)]^{\sigma^2}\] Let’s derive?

In real life, we observe \(\bar{S}(x)\). So this allows us to say something about implied \(\bar{z}\) from hazards.

Reversing the logic: if we see a characteristic changing with age, then we can estimate “\({\sigma^2}\)” (I put in quotes because its the variance of the proportional effect of the observed characteristic.)

4.2 CenSoc: Selection and observed frailty

We have a large matched sample from the 1940 census to Social Security death data observed from 1975 to 2004. This means that we can compute the survival curves of extinct cohorts and see how mortality selection changes the composition of the cohort as it ages.

In this example, we use observed wage income in 1940 for the cohort born 1895 to 1900. We look at how wages of survivors increase with age as a result of selective mortality and we see if the gamma-frailty model can produce similar results.

4.2.1 Data

Read in the data and transform the variables to what we want. We produce a variable \(y\) (in this case a standardized version of log wage income) to be transformed into a frailty score.

## read in dat
library(data.table)
dt <- fread("/data/josh/CenSoc/archive/censoc_bfdw.csv")
## Clean wage data
dt[, incwage := INCWAGE]
dt[incwage == 999998, incwage := NA]
dt[incwage == 0, incwage := NA]
hist_incwage <- dt[, hist(incwage, main ="", xlab = "Income-Wage")]
Histogram of Income-Wage

Figure 4.2: Histogram of Income-Wage

hist_ln_incwage <- dt[, hist(log(incwage),main ="", xlab = "Log(Income-Wage)")]
Histogram of LOG(Income-Wage)

Figure 4.3: Histogram of LOG(Income-Wage)

## Do age at death for 1895-1900 cohorts
dt[, age.at.death := dyear + dmonth/12 - (byear + bmonth/12)]
my.dt <- dt[byear %in% 1895:1900 & dyear %in% 1975:2004]
## now limits to deaths younger than 105
my.dt[, max(age.at.death), by = byear]
##    byear       V1
## 1:  1900 104.9167
## 2:  1898 106.8333
## 3:  1897 107.0833
## 4:  1895 109.4167
## 5:  1899 105.6667
## 6:  1896 107.9167
nrow(my.dt[age.at.death >= 105])
## [1] 177
nrow(my.dt[floor(age.at.death) == 104])
## [1] 253
## now we have same age range for every cohort
my.dt <- my.dt[age.at.death < 105]
my.dt <- my.dt[!is.na(incwage)] ## keep only non-missing 

Log-wages look reasonable, unimodal, kind of symmetric. We now center our variable to 0 before estimating the effect on mortality. Our model exponentiates this 0 to become 1, which is where we want our frailty measure to be centered.

## standardized log income
## log_inc_stan = log(y_orig) - mean(log(y_orig))
## note: control for byear, since different ages in 1940
my.dt[, y_orig := incwage]
my.dt[, log_inc := log(incwage)]
my.dt[, log_inc_mean := mean(log_inc), by = byear]
my.dt[, y := log_inc - log_inc_mean]
hist_y <- my.dt[, hist(y)]
Histogram of standardized Log(Income-Wage)

Figure 4.4: Histogram of standardized Log(Income-Wage)

my.dt[, summary(y, main="", xlab = "Standardized Log(Income-Wage)")]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -7.0542 -0.4159  0.1661  0.0000  0.5660  4.2460

Now we’re centered at 0.

Show how our (life-long fixed) characteristic of interest changes by age because of mortality selection.

par(mfrow = c(1,2))
my.dt[, plot(x, y_orig.bar)]
## NULL
title("Wage income by surviving age", cex.main = .7)
my.dt[, plot(x, y.bar)]
## NULL
title("Standardized log of Wage income by surviving age", cex.main = .7)
Wage Income by age

Figure 4.5: Wage Income by age

So we see annual wage income in 1940 increases by about $100 or so, or about 5% from age 75 to age 95. And more after that.

  • Is this what we would expect from our Gamma frailty model?

4.2.2 Estimation

Estimate an observed frailty for each person, call this \(z_{obs}\) To do this we first use Cox regression to estimate the proportional effect of \(y\) on hazards. The Cox model has the form

\[ \mu_i(x) = \mu_0(x) e^{\beta y} \]

We can then transform \(y\) into a frailty score \(z_{obs}\), letting \[ z_{obs} = e^{\hat\beta y} \]

## now get z's
library(survival)
my.dt[, event := 1]
m <- coxph(Surv(age.at.death, event) ~ y, data = my.dt)
(summary(m))
## Call:
## coxph(formula = Surv(age.at.death, event) ~ y, data = my.dt)
## 
##   n= 402797, number of events= 402797 
## 
##       coef exp(coef) se(coef)      z Pr(>|z|)    
## y -0.03073   0.96974  0.00177 -17.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   exp(coef) exp(-coef) lower .95 upper .95
## y    0.9697      1.031    0.9664    0.9731
## 
## Concordance= 0.512  (se = 0.001 )
## Likelihood ratio test= 298  on 1 df,   p=<2e-16
## Wald test            = 301.5  on 1 df,   p=<2e-16
## Score (logrank) test = 301.5  on 1 df,   p=<2e-16
##                      coef exp(coef) se(coef)      z Pr(>|z|)
## y -0.03073   0.96974  0.00177 -17.36   <2e-16 ***
## so a 10% increase in income reduces mortality by .3% (quite a tiny effect!)
beta <- coef(m)
  • The effect is very small. Any ideas why?

Now estimate our z’s.

my.dt[, z := exp(beta * y)]
hist_z <- my.dt[, hist(z, xlim = c(0, 3))]
Histogram of Fraility

Figure 4.6: Histogram of Fraility

  • Does this look gamma-like?

Calculating the variance of \(z_{obs}\) to be used for estimating \(\bar{z}_{obs}\). (Also plotting the histogram to see if it looks gamma-like)

sigma.sq <- var(my.dt$z)
print(sigma.sq)
## [1] 0.000745355
print(sqrt(sigma.sq))
## [1] 0.02730119

Check SD against histogram. Does it look right?

Extinct cohort method to estimate survivorship \(\bar{S}(x)\)

Dx <- my.dt[, table(floor(age.at.death))]
par(mfrow = c(1,2))
plot(Dx)
lx <- rev(cumsum(rev(Dx)))
lxpn <- c(lx[-1],0)
Lx <- (lx + lxpn)/2
mx <- Dx/Lx
x <- as.numeric(names(Dx))
plot(x, log(mx), type = "p")
## gomp fit from ages 80 to 95
m <- lm (log(mx) ~ x, subset = x %in% 80:100)
lines(80:100, predict(m), lwd = 2)
axis(2)
Deaths and Mortality

Figure 4.7: Deaths and Mortality

## no slowdown in mortality!!
  • How does Dx look? Plausible?

  • How about hazards? They are Gompertzian for a while, but how do we explain tails?

plot(x, lx)
Survirship

Figure 4.8: Survirship

Estimation of \(\hat{\bar{z}}(x)\) using the gamma-frailty result:

\[ \bar{z}(x) = \bar{S}(x)^{\sigma^2} \]

sx.bar <- lx/lx[1]
z.bar.hat <- sx.bar^sigma.sq

Comparing this to our observed \(\bar{z}\)

x <- 74:104
z.bar <- NULL
for (i in 1:length(x))
{
    z.bar[i] <- my.dt[age.at.death > x[i], mean(z)]
}

Plotting comparison

plot(x, z.bar.hat, ylim = c(.995, 1.001), type = "l", lty = 2)
lines(x, z.bar)
Frailty comparison

Figure 4.9: Frailty comparison

  • How did we do?

Showing plots for observed mortality selection and the gamma-frailty based estimate of mortality selection. Do this for several measures including \(y\), and raw (unstandardized) income.

y.bar.hat <- log(z.bar.hat)/beta
plot(x, y.bar, ylab = "log(inc) - mean(log(inc))")
lines(x, y.bar.hat)
title("Mean standardized log-income")
legend("topleft", legend = c("observed", "fitted"),
       pch = c(1, -1), lty = c(-1, 1))
Frailty comparison

Figure 4.10: Frailty comparison

##
bar.log.y = mean(log(my.dt$y_orig))
y_orig.bar.hat.wrong <- exp(y.bar.hat) * exp( bar.log.y) ## this is geometric mean
y_orig.bar.hat.right <- exp(y.bar.hat)*y_orig.bar[1]
plot(x, y_orig.bar, ylab = "$ per year")
title("Mean income")
legend("topleft", legend = c("observed", "fitted"),
       pch = c(1, -1), lty = c(-1, 1))
lines(x, y_orig.bar.hat.wrong, lty = 2)
lines(x, y_orig.bar.hat.right, lty = 1)
Frailty comparison

Figure 4.11: Frailty comparison

4.2.3 Discuss our conclusions and possible future directions to follow.

  • How did we do? Does our gamma frailty model give basically the right prediction?

  • How come it appears that wage income matters so little?

  • How could we improve the measurement of wage income?

  • What other variables could we look at?

  • How would we expect the gamma model do with another variable, e.g. educational attainment?

  • What is the relationship between “observed” and “unobserved” frailty?

  • IMPORTANT: Is our work here a validation of the model’s applicability to real life? If so what are we validating? That our transformed covariate is roughly gamma distributed? Are we assuming multiplicative fixed frailty – or are we validating it’s applicability?



4.3 Questions

  1. Under gamma frailty, we obtained an explicit expression for average frailty by age for any baseline hazard schedule. \[ \bar{z}=\frac{1}{1+\sigma^2 H_0(x)}\] Assume baseline mortality is Gompertz (say with a = \(10^{-4}\) and b = 1/12). Try a couple of different values of \(\sigma^2\) (but make sure one of these values is 1/7 for comparability with the next problem). Describe what happens to average frailty at older ages. Does it decrease exponentially? If so, is there an age at which the rate of decrease equals (or at least comes very close to) the exponential rate of increase in baseline hazards b? Does this age depend on \(\sigma^2\)?
  2. Obtain from the Human Mortality Database a schedule of single-year-of-age, cohort mortality rates for females born in 1880 in Italy. Use the “inversion formula” for the gamma distribution to obtain the baseline hazards implied by \(\sigma^2= 1/7\). Plot the observed and implied baseline schedule. Plot the average frailty by age. Do your results resemble or differ from the Gompertz case above ?
  3. Derive V&M ’s result (5E):\[\overline{R}(x) \equiv \frac{\bar{\mu}_2(x)}{\bar{\mu}_1(x)} = \frac{R + R \sigma_1^2 H_1(x)}{1 + R \sigma_2^2 H_1(x)} \]
  4. Use mathematics to say what the determinants of the age of crossover are in terms of the respective frailty variances, R, and a baseline Gompertz schedule.
  5. Simulate this cross over with two proportional Gompertz schedules, with different frailty variances. Can you get a cross-over? If so, does it occur when cumulative hazard satisfy the condition (in small font) at the end of 5E?
  6. Use simulation to say what the determinants of the age of crossover are in terms of the respective frailty variances, R, and the baseline Gompertz schedule.
  7. Get two Italian cohorts 20 years apart and calculate the rate of mortality improvement by age \(\rho(x)\) that you observe and that which you would have observed had there been no frailty. For frailty, assume gamma-distributed with \(\sigma^2 = 1/5\).
  8. Extend the CenSoc demonstration of changing characteristics with age in at least one of the following ways
    1. Use years of education instead of wage income.
    2. Use both years of education and wage income.
    3. Analyze Blacks and Whites separately using wage income? Is the variance of “observed heterogeneity” (\(\hat{z}_{obs}\)) larger for one group. Discuss briefly.

4.4 Solutions

  1. Under gamma frailty, we obtained an explicit expression for average frailty by age for any baseline hazard schedule. \[ \bar{z}=\frac{1}{1+\sigma^2 H_0(x)}\] Assume baseline mortality is Gompertz (say with a = \(10^{-4}\) and b = 1/12). Try a couple of different values of \(\sigma^2\) (but make sure one of these values is 1/7 for comparability with the next problem). Describe what happens to average frailty at older ages. Does it decrease exponentially? If so, is there an age at which the rate of decrease equals (or at least comes very close to) the exponential rate of increase in baseline hazards \(b\)? Does this age depend on \(\sigma^2\)?
    Let \(H_0\) be a gompertz curve with parameters a = \(10^{-4}\) and b = 1/12. The average frailty over age depends on the level of \(\sigma^2\) as seen by the left handside graph. As \(\sigma^2\) increases, average fraily decreases at an exponential rate at earlier ages. That is, when \(\sigma^2\) is very large (ie, 50) the exponential decrease begins almost instantly. However, with a very small \(\sigma^2\) of 0.01 the average frailty is almost constant except at older ages. Therefore \(\sigma^2\) determines when average frailty starts to decrease.
    The graph on the right shows the derivative over ages of each of the average frailty curves as well as the \(b\) parameter of the baseline Gompertz mortality (in blue). Regardless of the the value of \(\sigma^2\), none of the derivatives are close enough to equal the \(b\) parameter.Analytically, the derivative of average frailty is always going to be negative and very small. \[\frac{d}{dx}\bar{z}= -\sigma^2ae^{bx}\bar{z}(x)^2\].

    sigma.sq1 <- c(0.01, 1/7, 0.5, 0.75,4,50)
    
    a <- 10^-4
    b <- 1/12
    x <- 0:100
    H0x <- (a/b)*(exp(b*x) -1)
    
    z.bar.fun <- function(variance) {
      z.bar <- 1 / (1 +  H0x*variance)
      return(z.bar)
    }
    
    #Average gamma frailty
    z1 <- z.bar.fun(0.01)
    z2 <- z.bar.fun(1/7)
    z3 <- z.bar.fun(0.5)
    z4 <- z.bar.fun(0.75)
    z5 <- z.bar.fun(4)
    z6 <- z.bar.fun(50)
    
    #Derivatives
    z1_d <- center.diff(z1)
    z2_d <- center.diff(z2)
    z3_d <- center.diff(z3)
    z4_d <- center.diff(z4)
    z5_d <- center.diff(z5)
    z6_d <- center.diff(z6)
    
    #Graphs
    graph_colors <- c("black","blue","darkred","darkgreen" ,"darkgoldenrod","gray","red")
    par(mfrow=c(1,2), oma=c(0.1,0.1,0.1,0.1), mar=c(3,4,3,1))
    plot(x, z1 ,type = "l", ylim =c(0,1.5), xlab="Age", ylab="Average frailty (z)")
    lines(x, z2, col=graph_colors[2])
    lines(x, z3, col=graph_colors[3])
    lines(x, z4, col=graph_colors[4])
    lines(x, z5, col=graph_colors[5])
    lines(x, z6, col=graph_colors[6])
    legend ("topright", legend = round(sigma.sq1, 3),col=graph_colors, lty=rep(1,6), ncol = 2)
    
    plot(x, z1_d ,type = "l", lty=1, ylim =c(-1/12, 2/12), xlab="Age", ylab="D(Average frailty (z))/dage")
    lines(x, z2_d, col=graph_colors[2])
    lines(x, z3_d, col=graph_colors[3])
    lines(x, z4_d, col=graph_colors[4])
    lines(x, z5_d, col=graph_colors[5])
    lines(x, z6_d, col=graph_colors[6])
    abline(h= b, col= graph_colors[7])
    legend ("topright", legend = c(round(sigma.sq1, 3),"b"),col=graph_colors, lty=rep(1,7) ,ncol = 2)
    Average frailty by age

    Figure 4.12: Average frailty by age

  2. Obtain from the Human Mortality Database a schedule of single-year-of-age, cohort mortality rates for females born in 1880 in Italy. Use the “inversion formula” for the gamma distribution to obtain the baseline hazards implied by \(\sigma^2= 1/7\). Plot the observed and implied baseline schedule. Plot the average frailty by age. Do your results resemble or differ from the Gompertz case above ? In order to get the baseline hazards implied by \(\sigma^2\) = 1/7, we can use the inversion formula \[\mu_0 (x) = \bar{\mu}(x)e^{\sigma^2\bar{H}(x)}\]
    Taking logs, this gives us \[log(\mu_0 (x)) = log(\bar{\mu}(x))+{\sigma^2log(\bar{H}(x))}\]
    \(H(x)\) is equal to the summation of \(\mu(x)\) in continuous time, so we can take the cumulative sum of these mortality rates to get the cumulative hazards. We can then use this to calculate the baseline hazards schedule.

    italy2 <- read_table2("data/italycMx_1x1.txt", skip=2) %>%
       filter(Year == 1880) %>%
       select(Year, Age, Female)
    
    italy2$Age[italy2$Age == "110+"] <- 110
    italy2$mux <- as.numeric(italy2$Female)
    sigma.sq.it <- 1/7
    
    italy2$H0 <- cumsum(italy2$mux)
    italy2$Hbar <- (1/sigma.sq.it)*log(1+sigma.sq.it*italy2$H0)
    italy2$mu0 <- log(italy2$mux)+sigma.sq.it*italy2$Hbar
    
    plot(italy2$Age, log(italy2$mux), type = "l", xlab = "Age", ylab = "Hazards")
    lines(italy2$Age, italy2$mu0, col = "red", lty = 2)
    legend("topleft",legend = c("Observed", "Implied"), lty = c(1,2))
    Observed and implied hazards

    Figure 4.13: Observed and implied hazards

    Now let’s plot average frailty by age. While the shape of the mean frailty graph is the same in both cases, average frailty seems to decline more rapidly here than in the Gompertz case (this is driven by early ages.)

italy2$zbar <- (1/(1+sigma.sq.it*italy2$H0))
plot(italy2$Age, italy2$zbar, type = "l", xlab = "Age", ylab = "Average frailty")
Observed and implied hazards

Figure 4.14: Observed and implied hazards

  1. Derive V&M ’s result (5E) Since \(\mu_2(x) = R\mu_1(x)\) and frailty is distributed gamma with variances \(\sigma_1^2\) and \(\sigma_2^2\), respectively, we can rewrite \[\bar{R}(x) = \frac{\bar{\mu_2}(x)}{\bar{\mu_1(x)}}\] as
    \[\begin{aligned} \bar{R}(x) & = {\mu_2(x) \over 1+\sigma^2_2H_2(x)} \times{1+H_1(x)\sigma^2_1 \over \mu_1(x)} \\ & = { \bar{\mu}_2(x) \over \bar{\mu}_1(x) }\times { 1+H_1(x)\sigma^2_1 \over 1+H_2(x)\sigma^2_2 } \end{aligned}\] Since \(H_2 = R*H_1\), \[\begin{aligned} & = {R} \times { 1+\sigma^2_1 H_1(x) \over 1+R\sigma^2_2 H_1(x) }\\ & = { R+R\sigma^2_1 H_1(x) \over 1+R\sigma^2_2 H_1(x) } \end{aligned}\]

  2. Use mathematics to say what the determinants of the age of crossover are in terms of the respective frailty variances, R, and a baseline Gompertz schedule.
    The age crossover occurs at \(\bar{u_1} = \bar{u_2}\), which occurs at \(\bar{R} = 1\). Rearranging 5E after equating it to 1 gives us \[1+R\sigma_2^2(H_1(x_c)) = R+R\sigma_1^2(H_1(x_c))\] \[H_1(x_c)(R\sigma_1^2 - R\sigma_2^2) = 1-R\] \[H_1(x_c) = { R -1 \over R(\sigma_2^2 - \sigma_1^2)}\] Assuming a baseline hazard schedule \(H_1(x)\) that is Gompertzian, we can solve to get the age of crossover \(x_c\). \[ \begin{aligned} {a \over b}(e^{bx_c} -1) & = { R -1 \over R(\sigma_2^2 - \sigma_1^2)} \\ x_c & = {1 \over b} \log \bigg({ {(b/a)(R-1)}\over R(\sigma^2_2-\sigma^2_1)} +1\bigg) \end{aligned}\]

  3. Simulate this cross over with two proportional Gompertz schedules, with different frailty variances. Can you get a cross-over? If so, does it occur when cumulative hazard satisfy the condition (in small font) at the end of 5E?
    We borrow the frailty simulation function from problem set 2 and use it to create two schedules with Gamma frailty distributions (with different variances) and where the scales of the gompertz curves are proportional.

    source("functions/gomp_funs.R")
    source("functions/frailty_sim_function.R")
    
    #Let's choose different variances for the two Gompertz schedules.
    sigmasq.1 <- 0.02
    sigmasq.2 <- 0.25
    N <- 1000000
    
    #Now let's generate the zs for this using the rgamma function.
    z1 <- rgamma(N, shape = 1/sigmasq.1, scale = sigmasq.1)
    z2 <- rgamma(N, shape = 1/sigmasq.2, scale = sigmasq.2)
    
    #Since these are proportional Gompertzian schedules, they will have the same b but different alphas, scaled by R
    beta1 <- 1/9
    alpha1 <- 10^-4
    R <- 1.5
    
    schedule1 <- frailty_sim(N, z1, base.a = alpha1, base.b =beta1)
    schedule2 <- frailty_sim(N, z2, base.a = R*alpha1, base.b =beta1)

    Now we can graph this to observe the crossover. In Problem 4, we calculate an age where this crossover would occur based on 5E, and here, graphing that line in grey, we see that the crossover occurs at exactly that point.

x.crossover <- (1/beta1)*log( (((beta1/alpha1)*(R-1)) / (R*(sigmasq.2-sigmasq.1))) - 1 )
  
plot(schedule1$frailty$x, log(schedule1$frailty$mx), main = "Crossover in Log Hazards", type = "l", lty = 1, lwd = 2, col = "black", xlab = "Age", ylab = "log hx")
lines(schedule2$frailty$x, log(schedule2$frailty$mx), type = "l", col = "red", lty = 1, lwd = 2)
#abline(v = x.crossover, col ="gray", lty = 2)
legend(x = 5, y = -1, title = "Variance", legend = c("0.25", "0.02"), col = c("black", "red"),
       lwd = 2,lty = 1)
Mortality crossover

Figure 4.15: Mortality crossover

  1. Use simulation to say what the determinants of the age of crossover are in terms of the respective frailty variances, R, and the baseline Gompertz schedule.
    If we alter any of the parameters here, it would change the age of crossover in accordance with that observed in Problem 4. We can simulate this by writing the previous code as a function and running it with different parameters.

    get.crossover.plot <- function( N, sigmasq.1.fun, sigmasq.2.fun, beta.fun, alpha.fun, R.fun) {
    
    #Now let's generate the zs for this using the rgamma function.
    z1.fun <- rgamma(N, shape = 1/sigmasq.1, scale = sigmasq.1)
    z2.fun <- rgamma(N, shape = 1/sigmasq.2, scale = sigmasq.2)
    
    #Since these are proportional Gompertzian schedules, they will have the same b but different alphas, scaled by R
    #We can use the frailty simulation function from now onwards
    
    schedule1 <- frailty_sim(N, z1.fun, base.a = alpha.fun, base.b =beta.fun)
    schedule2 <- frailty_sim(N, z2.fun, base.a = R*alpha.fun, base.b =beta.fun)
    
    #Crossover plots
    plot(schedule1$frailty$x, log(schedule1$frailty$mx), type = "l", lty = 1, lwd = 2, col = "black", xlab = "Age", ylab = "log hx")
    lines(schedule2$frailty$x, log(schedule2$frailty$mx), type = "l", col = "red", lty = 1, lwd = 2)
    legend("topleft", title = "Variance", legend = c(sigmasq.1.fun, sigmasq.2.fun ), col = c("black", "red"),
       lwd = 2,lty = 1)
    mtext(paste0("R= ", R.fun," Base a = ", alpha.fun, " Base b = ", round(beta.fun,2) ), side=3)
    }

    Now let’s run this for different values of alpha, beta, R, and the two variances. In the first set of graphs, changing the two variances to compare when they are very different and when they are very similar. Age of crossover does not seem to change very much.

    N <- 1000000
    par(mfrow = c(1,2))
    get.crossover.plot(N, sigmasq.1.fun = 0.001, sigmasq.2.fun = 0.7, alpha.fun = 10^-4, beta.fun = 1/9 , R.fun = 1.6 )
    get.crossover.plot(N, sigmasq.1.fun = 0.3, sigmasq.2.fun = 0.25, alpha.fun = 10^-4, beta.fun = 1/9 , R.fun = 1.6 )
    Crossover: changing variances

    Figure 4.16: Crossover: changing variances

    Then, when changing alpha so that we can compare a very small alpha with a large one, a crossover occurs earlier with a larger value.

    par(mfrow = c(1,2))
    get.crossover.plot(N, sigmasq.1.fun = 0.03, sigmasq.2.fun = 0.25, alpha.fun = 10^-6, beta.fun = 1/9 , R.fun = 1.6 )
    get.crossover.plot(N, sigmasq.1.fun = 0.03, sigmasq.2.fun = 0.25, alpha.fun = 10^-3, beta.fun = 1/9 , R.fun = 1.6 )
    Crossover: changing Gompertz $a$ parameter

    Figure 4.17: Crossover: changing Gompertz \(a\) parameter

    By changing beta to compare a very small beta and a large one, we get a crossover very early with a large beta.

    par(mfrow = c(1,2))
    get.crossover.plot(N, sigmasq.1.fun = 0.03, sigmasq.2.fun = 0.25, alpha.fun = 10^-4, beta.fun = 1/20 , R.fun = 1.6 )
    get.crossover.plot(N, sigmasq.1.fun = 0.03, sigmasq.2.fun = 0.25, alpha.fun = 10^-4, beta.fun = 1/4 , R.fun = 1.6 )
    Crossover: changing Gompertz $b$ parameter

    Figure 4.18: Crossover: changing Gompertz \(b\) parameter

    Finally, if we compare a large and small r, there does not seem to be a difference in the crossover ages.

    par(mfrow = c(1,2))
    get.crossover.plot(N, sigmasq.1.fun = 0.03, sigmasq.2.fun = 0.25, alpha.fun = 10^-4, beta.fun = 1/9 , R.fun = 0.7)
    get.crossover.plot(N, sigmasq.1.fun = 0.03, sigmasq.2.fun = 0.25, alpha.fun = 10^-4, beta.fun = 1/9 , R.fun = 2.0)
    Crossover: changing $R$

    Figure 4.19: Crossover: changing \(R\)

  2. Get two Italian cohorts 20 years apart and calculate the rate of mortality improvement by age \(\rho(x)\) that you observe and that which you would have observed had there been no frailty. For frailty, assume gamma-distributed with \(\sigma^2 = 1/5\).
    We obtain the Italian cohort female lifetable (1x1) from the Human Mortality Database (HMD).

    italy <- read_table2("data/italy_fltcoh_1x1.txt", skip=1) %>% #HMD Italy cohort data, female lifetable (1x1)
    filter(Year == 1880|Year == 1900)
    italy$Age[italy$Age == "110+"] <- 110
    italy$Age <- as.numeric(italy$Age)
    italy$mx <- as.numeric(italy$mx)
    italy$lx <- italy$lx/100000
    
    italy <- italy %>%
    select(Year, Age, mx, lx)
    
    italy1880 <- italy %>% filter(Year == 1880) #1880 cohort
    italy1900 <- italy %>% filter(Year == 1900) #1900 cohort

    The observed rate of mortality improvement can be calculated using \[ \bar{\rho}(x,t) = - {1 \over t} \log {m_{t2}(x) \over m_{t1}(x) }\] and the version with frailty can be calculated using: \[ \rho(x,t) = \bar{\rho}(x,t) + \sigma^2\ {d \over dt}\bar{S}_c (x,t) \] Now we can calculate the rates of improvement in mortality and compare them. When we assume frailty, we get a higher rate of improvement at the older ages than in the observed case.

    sigma.sq.ct <- 1/5
    ages <- 0:110
    rho_bar <- (-1/20)*log(italy1900$mx/italy1880$mx) #hazard component
    sc_bar <- (1/20)*log(italy1900$lx/italy1880$lx)  #survivorship component
    rho <- rho_bar+sigma.sq.ct*sc_bar
    
    plot(ages, rho_bar, type = "l", lty = 1, xlab = "Age", ylab = "Mortality Improvement Rate")
    lines(ages, rho, lty = 2, col = "red")
    legend("topright", legend = c("Observed", "With Frailty"), lty = c(1,2), col = c("black", "red"))
    title("Mortality Improvement for the Cohorts of 1880 and 1900")
    Mortality improvement

    Figure 4.20: Mortality improvement

  3. Extend the CenSoc demonstration of changing characteristics with age in at least one of the following ways

    1. Use years of education instead of wage income.
    2. Use both years of education and wage income.
    3. Analyze Blacks and Whites separately using wage income? Is the variance of “observed heterogeneity” (\(\hat{z}_{obs}\)) larger for one group. Discuss briefly.
    dt <- fread("/data/josh/CenSoc/working_files/censoc_dmf_demo.csv")
    # censoc <- fread("/data/josh/CenSoc/working_files/censoc_dmf_demo.csv")
    # write.csv(censoc, file="data/censoc_dmf_demo")
    
    #Death and age variables
    dt[, age.at.death := dyear + dmonth/12 - (byear + bmonth/12)]
    dt <- dt[byear %in% 1895:1900 & dyear %in% 1975:2004,]
    dt <- dt[age.at.death < 105]
    
    #Education variables
    dt <- dt[!is.na(educyrs)]
    dt[, y_orig_educ := educyrs]
    dt[, educyrs_mean := mean(educyrs), by = byear]
    dt[, y_educ := educyrs - educyrs_mean]
    
    x <- 74:104
    y.bar.educ <- NULL
    y_orig.bar.educ <- NULL
    for (i in 1:length(x)){
    y.bar.educ[i] <- dt[age.at.death > x[i], mean(y_educ)]
    y_orig.bar.educ[i] <- dt[age.at.death > x[i], mean(y_orig_educ)]
    }
    
    #Income variables
    dt[incwage == 999998, incwage := NA]
    dt[incwage == 0, incwage := NA]
    dt <- dt[!is.na(incwage)] 
    dt[, y_orig_inc := incwage]
    dt[, log_inc := log(incwage)]
    dt[, log_inc_mean := mean(log_inc), by = byear]
    dt[, y_inc := log_inc - log_inc_mean]
    1. Let’s see how this changes for education.
    2. The easiest way to compare both income and education is to compare income (on the y-axis) while letting education vary. For simplicity we took 4 education bins: 0-5 years, 5-10 years, 10-15 years and more than 15 years: as well as their standardized analoges. The intensity of colors increases with the number of years of education.
    x <- 74:104
    y_std.bar.both1 <- NULL
    y_std.bar.both2 <- NULL
    y_std.bar.both3 <- NULL
    y_std.bar.both4 <- NULL
    
    y_orig.bar.both1 <- NULL
    y_orig.bar.both2 <- NULL
    y_orig.bar.both3 <- NULL
    y_orig.bar.both4 <- NULL
    
    for (i in 1:length(x))
    {
    #Original
    y_orig.bar.both1[i] <- dt[age.at.death > x[i] & educyrs>=0 & educyrs <5 , mean(y_orig_inc)]
    y_orig.bar.both2[i] <- dt[age.at.death > x[i] & educyrs>=5 & educyrs <10, mean(y_orig_inc)]
    y_orig.bar.both3[i] <- dt[age.at.death > x[i] & educyrs>=10 & educyrs <15, mean(y_orig_inc)]
    y_orig.bar.both4[i] <- dt[age.at.death > x[i] & educyrs >=15, mean(y_orig_inc)]
    
    #Standardized
    y_std.bar.both1[i] <- dt[age.at.death > x[i] & y_educ >= -10 & y_educ < -5, mean(y_inc)]
    y_std.bar.both2[i] <- dt[age.at.death > x[i] & y_educ >= -5 & y_educ <0, mean(y_inc)]
    y_std.bar.both3[i] <- dt[age.at.death > x[i] & y_educ >= 0 & y_educ <5, mean(y_inc)]
    y_std.bar.both4[i] <- dt[age.at.death > x[i] & y_educ >= 5, mean(y_inc)]
    
    }
    
    par(mfrow = c(1,2))
    colors <- gray((4:0)/5)
    plot(x, y_orig.bar.both1, col=colors[1], ylim=c(min(y_orig.bar.both1), max(y_orig.bar.both4)),
     type="l", ylab = "log wage", xlab = "Age at death")
    lines(x, y_orig.bar.both2, col=colors[2])
    lines(x, y_orig.bar.both3, col=colors[3])
    lines(x, y_orig.bar.both4, col=colors[4])
    title("Education and wage by surviving age", cex.main = .7)
    
    plot(x, y_std.bar.both1, col=colors[1], ylim=c(-2,2), 
     type="l", ylab = "standardized log wage", xlab = "Age at death")
    lines(x, y_std.bar.both2, col=colors[2])
    lines(x, y_std.bar.both3, col=colors[3])
    lines(x, y_std.bar.both4, col=colors[4])
    title("Education and wage by surviving age", cex.main = .7)

    1. We’ll run the wage income comparison for Blacks and Whites separately.
dt.new <-dt 

x <- 74:104
y.bar.inc.white <- NULL
y_orig.bar.inc.white <- NULL
dt.white <-dt.new[race == "White"]
for (i in 1:length(x))
{
    y.bar.inc.white[i] <- dt.white[age.at.death > x[i], mean(y_inc)]
    y_orig.bar.inc.white[i] <- dt.white[age.at.death > x[i], mean(y_orig_inc)]
}

x <- 74:104
y.bar.inc.black <- NULL
y_orig.bar.inc.black <- NULL
dt.black <-dt.new[race== "Black/African American"]
for (i in 1:length(x))
{
    y.bar.inc.black[i] <- dt.black[age.at.death > x[i], mean(y_inc)]
    y_orig.bar.inc.black[i] <- dt.black[age.at.death > x[i], mean(y_orig_inc)]
}
Now let's graph these two. We can see a decline in log wages with age for Blacks that we do not observe for Whites, for whom this tends to increase by age. This suggests that we may observe greater variance in heterogeneity for blacks than for Whites.   
par(mfrow = c(1,2))
plot(x, y_orig.bar.inc.white,
     type="l", ylab = "log wage", xlab = "Age at death")
title("Wage income by surviving age: Whites", cex.main = .7)
plot(x, y.bar.inc.white, 
     type="l", ylab = "standardized log wage", xlab = "Age at death")
title("Standardized log wages by surviving age: Whites", cex.main = .7)

par(mfrow = c(1,2))
plot(x, y_orig.bar.inc.black,
     type="l", ylab = "log wage", xlab = "Age at death")
title("Wage income by surviving age: Blacks", cex.main = .7)
plot(x, y.bar.inc.black, 
     type="l", ylab = "standardized log wage", xlab = "Age at death")
title("Standardized log wages by surviving age: Blacks", cex.main = .7)