Calibration of the Sr/Ca paleothermometer: a frequentist approach#
Preamble#
A common exercise in paleoclimatology is the calibration of the measured proxy (in this case, coral Sr/Ca) to the environmental parameter that controls it (sea surface temperature). This notebook walks through the calibration of Sr/Ca measurements made on various corals in Dry Tortugas compared to instrumental data, using a frequentist approach (Ordinary Least Squares). Some figures are reproduced from the original study by DeLong et al. 2011.
Let’s import the necessary packages. For the regression work, we will use the statsmodel package.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
#To make tables
from great_tables import GT, md, html
Loading the data#
For this calibration exercise, we will be using the Sr/Ca data from DeLong et al. (2011). The previous chapter was dedicated to data exploration, so we will directly delve into the calibration work.
Lets’s import it into our workspace using pandas.
url = 'https://www.ncei.noaa.gov/pub/data/paleo/coral/atlantic/tortugas2011.xls'
df = pd.read_excel(url, sheet_name=1,header=[2])
df.head()
Year | Mean Sr/Ca (mmol/mol) | Number of samples | St. error of mean | Mean Sr/Ca (mmol/mol).1 | Number of samples.1 | St. error of mean.1 | Mean Sr/Ca (mmol/mol).2 | Number of samples.2 | St. error of mean.2 | DRTO SST (ºC) | St. error of mean.3 | Sand Key SST (ºC) | St. error of mean.4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2008.625000 | 8.737095 | 1 | 0.012000 | NaN | NaN | NaN | 9.025385 | 3 | 0.003027 | 30.193011 | 0.010330 | NaN | NaN |
1 | 2008.541667 | 8.770223 | 2 | 0.003228 | NaN | NaN | NaN | 9.051444 | 3 | 0.010060 | 29.551197 | 0.010570 | NaN | NaN |
2 | 2008.458333 | 8.841024 | 2 | 0.003956 | NaN | NaN | NaN | 9.111843 | 3 | 0.000554 | 28.539499 | 0.022829 | NaN | NaN |
3 | 2008.375000 | 8.940933 | 2 | 0.001614 | 8.972382 | 1.0 | 0.01 | 9.190204 | 3 | 0.021459 | 26.642473 | 0.026493 | NaN | NaN |
4 | 2008.291667 | 9.022715 | 2 | 0.003125 | 9.062088 | 1.0 | 0.01 | 9.232626 | 3 | 0.010278 | 24.905000 | 0.019899 | NaN | NaN |
In the original study, Delong et al. chose a weighted linear regression over an ordinary least square regression to take into account the uncertainty in the data. Here, let’s try both and compare the results.
Ordinary least square regression#
Siderastrea sidera#
Let’s start with the calibration for Siderastrea sidera, using one core at a time as per the original study.
Core A1#
Let’s start with the calibration to the DRTO SST data. But first, let’s remove NaNs:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol)','DRTO SST (ºC)'])
X = cleaned_df.iloc[:,10]
Y = cleaned_df.iloc[:,1]
X = sm.add_constant(X)
ols = sm.OLS(Y,X)
results = ols.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.004
Slope: -0.040
Intercept Standard Error: 0.024
Slope Standard Error: 0.001
R_squared: 0.95
Let’s keep these results in a series of list that we will update as we go:
coral_id = ['A1 (S.Sidera)']
SST_p = ['DRTO']
m = [slope]
p = [intercept]
m_se = [slope_se]
p_se = [intercept_se]
r2 = [r_squared]
Let’s repeat the calculation with the Sand Key SST data:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol)','Sand Key SST (ºC)'])
X = cleaned_df.iloc[:,12]
Y = cleaned_df.iloc[:,1]
X = sm.add_constant(X)
ols = sm.OLS(Y,X)
results = ols.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.115
Slope: -0.043
Intercept Standard Error: 0.025
Slope Standard Error: 0.001
R_squared: 0.93
Let’s update our results list:
coral_id.append('A1 (S.Sidera)')
SST_p.append('SANF1')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Core F1#
Let’s proceed with the DRTO SST Data:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).1','DRTO SST (ºC)'])
X = cleaned_df.iloc[:,10]
Y = cleaned_df.iloc[:,4]
X = sm.add_constant(X)
ols = sm.OLS(Y,X)
results = ols.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.083
Slope: -0.042
Intercept Standard Error: 0.025
Slope Standard Error: 0.001
R_squared: 0.95
And let’s save our results:
coral_id.append('F1 (S.Sidera)')
SST_p.append('DRTO')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Let’s do the regression agaisnt Sand Key SST data:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).1','Sand Key SST (ºC)'])
X = cleaned_df.iloc[:,12]
Y = cleaned_df.iloc[:,4]
X = sm.add_constant(X)
ols = sm.OLS(Y,X)
results = ols.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.221
Slope: -0.046
Intercept Standard Error: 0.027
Slope Standard Error: 0.001
R_squared: 0.93
And let’s save the results:
coral_id.append('F1 (S.Sidera)')
SST_p.append('SANF1')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Montastraea faveolata#
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).2','DRTO SST (ºC)'])
X = cleaned_df.iloc[:,10]
Y = cleaned_df.iloc[:,7]
X = sm.add_constant(X)
ols = sm.OLS(Y,X)
results = ols.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 9.955
Slope: -0.029
Intercept Standard Error: 0.028
Slope Standard Error: 0.001
R_squared: 0.87
And let’s save the results:
coral_id.append('B3 (M. faveolata)')
SST_p.append('DRTO')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Let’s do the same with the Sand Key SST:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).2','Sand Key SST (ºC)'])
X = cleaned_df.iloc[:,12]
Y = cleaned_df.iloc[:,7]
X = sm.add_constant(X)
ols = sm.OLS(Y,X)
results = ols.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.031
Slope: -0.031
Intercept Standard Error: 0.027
Slope Standard Error: 0.001
R_squared: 0.85
And let’s save the results:
coral_id.append('B3 (M. faveolata)')
SST_p.append('SANF1')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Summary#
Let’s summarize our results into a table. Although Pandas DataFrames can represent data in a tabular format and are extremely useful for computation, they are not always easy to read. Here, we will be using Great Tables to present the results in a publication-ready table.
ols_df = pd.DataFrame({'Coral ID':coral_id,
'SST':SST_p,
'Slope':m,
'Slope SE$^a$':m_se,
'Intercept':p,
'Intercept SE$^a$':p_se,
'R$^2$':r2})
#from great_tables.data import islands
gt_tbl = (GT(ols_df)
.tab_header(title='Calibration of coral Sr/Ca to SST, Ordinary Least Square')
.fmt_number(['Slope','Slope SE$^a$','Intercept','Intercept SE$^a$'], decimals=3)
.fmt_number('R$^2$',decimals=2)
.tab_source_note(source_note = "$^a$ SE: standard error")
.tab_options(column_labels_background_color="darkgrey",source_notes_background_color="OldLace")
)
gt_tbl
Calibration of coral Sr/Ca to SST, Ordinary Least Square | ||||||
---|---|---|---|---|---|---|
Coral ID | SST | Slope | Slope SE$^a$ | Intercept | Intercept SE$^a$ | R$^2$ |
A1 (S.Sidera) | DRTO | −0.040 | 0.001 | 10.004 | 0.024 | 0.95 |
A1 (S.Sidera) | SANF1 | −0.043 | 0.001 | 10.115 | 0.025 | 0.93 |
F1 (S.Sidera) | DRTO | −0.042 | 0.001 | 10.083 | 0.025 | 0.95 |
F1 (S.Sidera) | SANF1 | −0.046 | 0.001 | 10.221 | 0.027 | 0.93 |
B3 (M. faveolata) | DRTO | −0.029 | 0.001 | 9.955 | 0.028 | 0.87 |
B3 (M. faveolata) | SANF1 | −0.031 | 0.001 | 10.031 | 0.027 | 0.85 |
$^a$ SE: standard error |
The calibration equations for S. sidera are vert similar between the two cores and SST products, which is not surprising given the good agreement among these datasets. The calibration slope of M.faveolata is lower, suggesting separate calibrations would be needed for these two species if used as paleothermometers.
Weighted least square regression#
Following the original approach in Delong et al., let’s use a weighted least square regression. The method of ordinary least squares, which we used previously, assumes that there is constant variance in the errors. The method of weighted least squares can be used when the ordinary least squares assumption of constant variance in the errors is violated, which is the case here. The weights applied to the regression are inversely proportional to the error variance and reflects the information we have about each observation. Here, we will take into consideration the error in both the Sr/Ca measurements (estimated as repeated measurements on the same sample) and SST:
where \(\sigma\) represents the standard error on the Sr/Ca measurements and SST
Siderastrea sidera#
Let’s start with the calibration for Siderastrea sidera, using one core at a time as per the original study.
Core A1#
Let’s start with the calibration to the DRTO SST data. But first, let’s remove NaNs:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol)','DRTO SST (ºC)'])
X = cleaned_df.iloc[:,10]
Y = cleaned_df.iloc[:,1]
X = sm.add_constant(X)
weights = 1/(cleaned_df.iloc[:,3]**2+cleaned_df.iloc[:,11]**2)
wls = sm.WLS(Y,X, weights=weights)
results = wls.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.043
Slope: -0.041
Intercept Standard Error: 0.023
Slope Standard Error: 0.001
R_squared: 0.95
Let’s save the results:
coral_id = ['A1 (S.Sidera)']
SST_p = ['DRTO']
m = [slope]
p = [intercept]
m_se = [slope_se]
p_se = [intercept_se]
r2 = [r_squared]
And let’s regress against Sand Key SST:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol)','Sand Key SST (ºC)'])
X = cleaned_df.iloc[:,12]
Y = cleaned_df.iloc[:,1]
X = sm.add_constant(X)
weights = 1/(cleaned_df.iloc[:,3]**2+cleaned_df.iloc[:,13]**2)
wls = sm.WLS(Y,X, weights=weights)
results = wls.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.124
Slope: -0.044
Intercept Standard Error: 0.028
Slope Standard Error: 0.001
R_squared: 0.92
And let’s save the results:
coral_id.append('A1 (S.Sidera)')
SST_p.append('SANF1')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Core F1#
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).1','DRTO SST (ºC)'])
X = cleaned_df.iloc[:,10]
Y = cleaned_df.iloc[:,4]
X = sm.add_constant(X)
weights = 1/(cleaned_df.iloc[:,6]**2+cleaned_df.iloc[:,11]**2)
wls = sm.WLS(Y,X,weights=weights)
results = wls.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.091
Slope: -0.042
Intercept Standard Error: 0.026
Slope Standard Error: 0.001
R_squared: 0.95
And let’s append our results!
coral_id.append('F1 (S.Sidera)')
SST_p.append('DRTO')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Let’s move on to Sand Key SST!
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).1','Sand Key SST (ºC)'])
X = cleaned_df.iloc[:,12]
Y = cleaned_df.iloc[:,4]
X = sm.add_constant(X)
weights = 1/(cleaned_df.iloc[:,6]**2+cleaned_df.iloc[:,13]**2)
wls = sm.WLS(Y,X, weights=weights)
results = wls.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.219
Slope: -0.046
Intercept Standard Error: 0.029
Slope Standard Error: 0.001
R_squared: 0.92
Let’s append to the results!
coral_id.append('F1 (S.Sidera)')
SST_p.append('SANF1')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Montastraea faveolata#
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).2','DRTO SST (ºC)'])
X = cleaned_df.iloc[:,10]
Y = cleaned_df.iloc[:,7]
X = sm.add_constant(X)
weights = 1/(cleaned_df.iloc[:,9]**2+cleaned_df.iloc[:,11]**2)
wls = sm.WLS(Y,X, weights=weights)
results = wls.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 9.956
Slope: -0.029
Intercept Standard Error: 0.030
Slope Standard Error: 0.001
R_squared: 0.86
Let’s append the results to the list:
coral_id.append('B3 (M. faveolata)')
SST_p.append('DRTO')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Finally, let’s do the regression against Sand Key SST data:
cleaned_df = df.dropna(subset=['Mean Sr/Ca (mmol/mol).2','Sand Key SST (ºC)'])
X = cleaned_df.iloc[:,12]
Y = cleaned_df.iloc[:,7]
X = sm.add_constant(X)
weights = 1/(cleaned_df.iloc[:,9]**2+cleaned_df.iloc[:,13]**2)
wls = sm.WLS(Y,X, weights = weights)
results = wls.fit()
#get the parameters
intercept, slope = results.params
#get the standard error
intercept_se, slope_se = results.bse
#R_squared
r_squared = results.rsquared
print("Intercept:", f"{intercept:.3f}")
print("Slope:", f"{slope:.3f}")
print("Intercept Standard Error:", f"{intercept_se:.3f}")
print("Slope Standard Error:", f"{slope_se:.3f}")
print("R_squared:", f"{r_squared :.2f}")
Intercept: 10.047
Slope: -0.032
Intercept Standard Error: 0.030
Slope Standard Error: 0.001
R_squared: 0.84
And append the results!
coral_id.append('B3 (M. faveolata)')
SST_p.append('SANF1')
m.append(slope)
p.append(intercept)
m_se.append(slope_se)
p_se.append(intercept_se)
r2.append(r_squared)
Summary#
Let’s summarize our results into a table as we have done for the ordinary least square:
wls_df = pd.DataFrame({'Coral ID':coral_id,
'SST':SST_p,
'Slope':m,
'Slope SE$^a$':m_se,
'Intercept':p,
'Intercept SE$^a$':p_se,
'R$^2$':r2})
gt_tbl_wls = (GT(wls_df)
.tab_header(title='Calibration of coral Sr/Ca to SST, Weighted Least Square')
.fmt_number(['Slope','Slope SE$^a$','Intercept','Intercept SE$^a$'], decimals=3)
.fmt_number('R$^2$',decimals=2)
.tab_source_note(source_note = "$^a$ SE: standard error")
.tab_options(column_labels_background_color="darkgrey",source_notes_background_color="OldLace")
)
gt_tbl_wls
Calibration of coral Sr/Ca to SST, Weighted Least Square | ||||||
---|---|---|---|---|---|---|
Coral ID | SST | Slope | Slope SE$^a$ | Intercept | Intercept SE$^a$ | R$^2$ |
A1 (S.Sidera) | DRTO | −0.041 | 0.001 | 10.043 | 0.023 | 0.95 |
A1 (S.Sidera) | SANF1 | −0.044 | 0.001 | 10.124 | 0.028 | 0.92 |
F1 (S.Sidera) | DRTO | −0.042 | 0.001 | 10.091 | 0.026 | 0.95 |
F1 (S.Sidera) | SANF1 | −0.046 | 0.001 | 10.219 | 0.029 | 0.92 |
B3 (M. faveolata) | DRTO | −0.029 | 0.001 | 9.956 | 0.030 | 0.86 |
B3 (M. faveolata) | SANF1 | −0.032 | 0.001 | 10.047 | 0.030 | 0.84 |
$^a$ SE: standard error |
For easy comparsion, let’s bring back the ordinary least square results:
gt_tbl
Calibration of coral Sr/Ca to SST, Ordinary Least Square | ||||||
---|---|---|---|---|---|---|
Coral ID | SST | Slope | Slope SE$^a$ | Intercept | Intercept SE$^a$ | R$^2$ |
A1 (S.Sidera) | DRTO | −0.040 | 0.001 | 10.004 | 0.024 | 0.95 |
A1 (S.Sidera) | SANF1 | −0.043 | 0.001 | 10.115 | 0.025 | 0.93 |
F1 (S.Sidera) | DRTO | −0.042 | 0.001 | 10.083 | 0.025 | 0.95 |
F1 (S.Sidera) | SANF1 | −0.046 | 0.001 | 10.221 | 0.027 | 0.93 |
B3 (M. faveolata) | DRTO | −0.029 | 0.001 | 9.955 | 0.028 | 0.87 |
B3 (M. faveolata) | SANF1 | −0.031 | 0.001 | 10.031 | 0.027 | 0.85 |
$^a$ SE: standard error |
The coefficients are very similar to each other, which is not surprising. WLS resulted in slightly lower standard error on the intercept.
References#
DeLong, K.L., J.A. Flannery, C.R. Maupin, R.Z. Poore, and T.M. Quinn. 2011. A coral Sr/Ca calibration and replication study of two massive corals from the Gulf of Mexico. Palaeogeography, Palaeoclimatology, Palaeoecology, 307(1-4), 117-128. doi: 10.1016/j.palaeo.2011.05.005