Using MortaltiyTables.jl with DataFrames

MortalityTables.jl standard representation

MortalityTables.jl stores the rates in a very efficient manner as a collection of vectors indexed by attained age.

First, we include the package, and then we'll pick a table, where all of the mort.soa.org tables are mirrored into your MortalityTables.jl installation.

begin
    using MortalityTables

    vbt = MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB") #or any other table
end

MortalityTable (Insured Lives Mortality):
   Name:
       2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB
   Fields: 
       (:select, :ultimate, :metadata)
   Provider:
       Society of Actuaries
   mort.SOA.org ID:
       1118
   mort.SOA.org link:
       https://mort.soa.org/ViewTable.aspx?&TableIdentity=1118
   Description:
       2001 Valuation Basic Table (VBT) Residual Standard Select and Ultimate Table -  Male Nonsmoker. Basis: Age Nearest Birthday. Minimum Select Age: 0. Maximum Select Age: 99. Minimum Ultimate Age: 25. Maximum Ultimate Age: 120

To see how the data is represented, we can look at the the select data for a 55 year old and see the attained age and mortality rates:

vbt.select[55]

66-element OffsetArray(::Vector{Float64}, 55:120) with eltype Float64 with indices 55:120:
 0.00139
 0.00218
 0.00288
 0.00344
 0.00403
 0.00485
 0.00599
 ⋮
 0.75603
 0.79988
 0.84627
 0.89536
 0.94729
 1.0

This is very efficient and convienent for modeling, but a lot of times you want the data matched up with policy data in a DataFrame.

Getting data into a dataframe

Generate sample data

using DataFrames

sample_size = 10_000

sample_data = let
    # generate fake data
    df = DataFrame(
        "sex" => rand(["Male","Female"],sample_size),
        "smoke" => rand(["Smoker","Nonsmoker"],sample_size),
        "issue_age" => rand(25:65,sample_size),
        )
    
    # a random offset of issue age is the current attained age
    df.attained_age = df.issue_age .+ rand(1:10,sample_size)
    df
end

	sex	smoke	issue_age	attained_age
1	"Female"	"Nonsmoker"	25	32
2	"Male"	"Smoker"	58	66
3	"Male"	"Nonsmoker"	59	61
4	"Female"	"Smoker"	64	65
5	"Male"	"Smoker"	49	54
6	"Female"	"Nonsmoker"	32	37
7	"Male"	"Smoker"	41	49
8	"Male"	"Smoker"	38	47
9	"Male"	"Smoker"	30	40
10	"Male"	"Nonsmoker"	45	49
...
10000	"Male"	"Smoker"	52	60

Define the table set you want to use

There are a lot of different possible combinations of parameters that you might want to use, such as rates that vary by sex, risk class, table set (VBT/CSO/etc), smoking status, relative risk, ALB/ANB, etc.

It's easy to define the parameters applicable to your assumption set. Here, we'll use a dictionary to define the relationship:

rate_map = Dict(
    "Male" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Male Nonsmoker, ANB"),
        ),
    
    "Female" => Dict(
        "Smoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Smoker, ANB"),
        "Nonsmoker" => MortalityTables.table("2001 VBT Residual Standard Select and Ultimate - Female Nonsmoker, ANB"),
        )
    );

and then we'll define a function to look up the relevant rate. Note how the function matches the levels we defined for the assumption set dictionary above.

function rate_lookup(assumption_map,sex,smoke,issue_age,attained_age)
    # pick the relevant table
    table = assumption_map[sex][smoke]
    
    # check if the select rate exists, otherwise look to the ulitmate table
    if issue_age in eachindex(table.select)
        table.select[issue_age][attained_age]
    else
        table.ultimate[attained_age]
    end
end

rate_lookup (generic function with 1 method)

Lining up with dataframe

By mapping each row's data to the lookup function, we get a vector of rates for our data:

rates = map(eachrow(sample_data)) do row
    rate_lookup(rate_map,row.sex,row.smoke,row.issue_age,row.attained_age)
end

10000-element Vector{Float64}:
 0.00053
 0.02309
 0.00384
 0.00805
 0.00626
 0.00053
 0.00518
 ⋮
 0.00108
 0.00227
 0.00385
 0.0011
 0.00814
 0.01273

And finally, we can just add this to the dataframe:

sample_data.expectation = rates;

sample_data

	sex	smoke	issue_age	attained_age	expectation
1	"Female"	"Nonsmoker"	25	32	0.00053
2	"Male"	"Smoker"	58	66	0.02309
3	"Male"	"Nonsmoker"	59	61	0.00384
4	"Female"	"Smoker"	64	65	0.00805
5	"Male"	"Smoker"	49	54	0.00626
6	"Female"	"Nonsmoker"	32	37	0.00053
7	"Male"	"Smoker"	41	49	0.00518
8	"Male"	"Smoker"	38	47	0.00446
9	"Male"	"Smoker"	30	40	0.00245
10	"Male"	"Nonsmoker"	45	49	0.00177
...
10000	"Male"	"Smoker"	52	60	0.01273

begin
    # add a table of contents to the page
    using PlutoUI
    TableOfContents()
end

Built with Julia 1.8.0 and

DataFrames 1.2.2
MortalityTables 2.1.4
PlutoUI 0.7.9

Run this Pluto Notebook

To run this page locally, download this file and open it with Pluto.jl.

The packages in JuliaActuary are open-source and liberally licensed (MIT License) to allow wide private and commercial usage of the packages, like the base Julia language and many other packages in the ecosystem.