Reconstructing the Esper Reconstruction

As discussed in a previous article, Esper et al (2024) (link), the newest hockey stick diagram, asserted that 2023 was the “warmest summer” in millennia by an updated version of “Mike’s Nature trick” – by comparing 2023 instrumental temperature to purported confidence intervals of temperature estimates from “ancient tree rings” for the past two millennia. In today’s article, I will report on detective work on Esper’s calculations, showing that the article is not merely a trick, but a joke.

Background

Esper et al 2024 provided only a sketchy and incomplete description of methodology and negligible supporting data. Like Mann et al 1998.

Indeed, the only supporting data thus far released by Esper is a single table of his final reconstruction (Recon.), target instrumental temperature (Obs.) and the purported lower and upper confidence intervals (link).

Esper’s description of methodology was cursory to say the least, consisting of the following paragraph. Footnote 23 linked to Buentgen et al (Nature Communications 2021 link), a prior article by two of the Esper et al 2024 coauthors (Esper, Buentgen). This article, unlike Esper et al 2024, had an associated data archive (link), which, while far from complete, provided a foothold for analysing Esper’s 2024 calculations.

Buentgen et al (2021) reported on what they called a “double blind” experiment in which they sent out measurement data from 9 prominent tree ring sites to 15 different climate science groups, asking each of them to respond with a “reconstruction” of Northern Hemisphere (extratropic) temperature for the past 2000 years. (Many of the nine tree ring sites are familiar to Climate Audit readers between 2005 and 2012: they include both bristlecone and Briffa 2008 sites, as I’ll discuss later.) The 15 reconstructions varied dramatically (as will also be discussed below). Buentgen’s takeaway conclusion was that the ensemble “demonstrated the influence of subjectivity in the reconstruction process”:

Differing in their mean, variance, amplitude, sensitivity, and persistence, the ensemble members demonstrate the influence of subjectivity in the reconstruction process.

This was, to say the least, an understatement. What the experiment actually demonstrated was that different climate groups could get dramatically different reconstructions from identical data. Thus, over and above the many well known defects and problems in trying to use tree ring data to reconstruct past temperatures, there was yet one more source of uncertainty that had not been adequately canvassed: the inconsistency between climate groups presented with the same data.

The Buentgen (2021) Rmean reconstruction

Buentgen et al’s NOAA archive contained a sheet with all 15 reconstructions plus their mean (Rmean) and median (Rmedian). Comparison of the Buentgen Rmean reconstruction to the Esper et al 2024 reconstruction (“Recon.”) was an obvious first step. The Rmean reconstruction had an exact correlation (r=1) to the Esper reconstruction, but was both dilated (higher standard deviation) and displaced upwards, as shown in the diagram below. The Buentgen 2021 reconstructions used a 1961-1990 reference period (matching the reference period of common instrumental temperature datasets); the Esper 2024 reconstruction used a 1851-1900 reference period. But how (and why) was the standard deviation changed?

In further detail, here are the steps required to go from the Rmean reconstruction to the Esper version:

1. re-centering the Rmean reconstruction to a 1901-2010 reference period and re-scaling its standard deviation over 1901-2010 to match the corresponding 1901-2010 standard deviation of (i.e. “scaling against”) the Berkeley JJA 30-90N instrumental series;
2. re-centering the resulting reconstruction to a 1851-1900 reference period (i.e. subtracting the 1851-1900 mean of the step 1 reconstruction series to center it at zero over 1851-1900).
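The two steps above can be sketched in a few lines of Python. This is a minimal sketch of my reading of the procedure, not Esper's code (none has been released): the series are synthetic stand-ins rather than the archived data, and the names `window`, `scale_to_target` and `recenter` are made up for illustration.

```python
from statistics import mean, stdev

def window(series, start, end):
    """Values of a {year: value} series with start <= year <= end."""
    return [v for y, v in sorted(series.items()) if start <= y <= end]

def scale_to_target(recon, target, start, end):
    """Step 1: centre `recon` on [start, end] and match the target's SD there."""
    centre = mean(window(recon, start, end))
    ratio = stdev(window(target, start, end)) / stdev(window(recon, start, end))
    return {y: (v - centre) * ratio for y, v in recon.items()}

def recenter(series, start, end):
    """Step 2: subtract the series' own mean over [start, end]."""
    base = mean(window(series, start, end))
    return {y: v - base for y, v in series.items()}

# Synthetic stand-ins (NOT the archived data): a proxy series and a steeper target.
years = range(1851, 2011)
rmean = {y: 0.004 * (y - 1851) + 0.2 * ((y * 7) % 5 - 2) for y in years}
target = {y: 0.008 * (y - 1851) + 0.1 * ((y * 3) % 7 - 3) for y in years}

step1 = scale_to_target(rmean, target, 1901, 2010)  # SD now matches the target
step2 = recenter(step1, 1851, 1900)                 # zero mean over 1851-1900
```

Both steps are linear transformations with a positive scale factor, so the output keeps a correlation of exactly 1 with the input while its standard deviation and level change. That is the pattern observed between Rmean and “Recon.”.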

If the Esper et al (2024) target instrumental series is re-centered on a 1961-1990 reference period, it is an almost exact match to the Tmean instrumental series of Buentgen et al 2021 (link). I presume that the change to Berkeley JJA 30-90N is to extend the record to 2023. I don’t have any objection or issue with this, other than that I was unable to locate the Berkeley JJA 30-90N series in its native form. I re-centered the archived Esper et al 2024 “Obs.” series to a 1961-1990 reference period, yielding the reconciliation shown in the diagram below.

To get from the underlying Berkeley JJA 30-90N instrumental series to the version archived in Esper et al (2024), Esper et al did the following:

1. re-center the Berkeley JJA 30-90N series to 1901-2010 (as part of their re-scaling of the Buentgen Rmean reconstruction);
2. shift the step 1 instrumental series towards a 1851-1900 reference by subtracting the 1851-1900 mean of the step 1 reconstruction, rather than its own 1851-1900 mean. As a result, the instrumental series is NOT centered on 1851-1900.
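The difference between shifting the instrumental series by the reconstruction's 1851-1900 mean and by its own 1851-1900 mean can be illustrated as follows. This is a sketch on synthetic linear series, not the archived data; `window_mean` and `shift` are hypothetical helper names, and the trend values are arbitrary.

```python
from statistics import mean

def window_mean(series, start, end):
    """Mean of a {year: value} series over start <= year <= end."""
    return mean(v for y, v in series.items() if start <= y <= end)

def shift(series, offset):
    """Subtract a constant offset from every year."""
    return {y: v - offset for y, v in series.items()}

years = range(1851, 2011)
# Synthetic stand-ins: the instrumental series rises faster than the proxy series.
recon_raw = {y: 0.004 * (y - 1851) for y in years}
instr_raw = {y: 0.006 * (y - 1851) for y in years}

# Step 1: both series centred on 1901-2010.
recon = shift(recon_raw, window_mean(recon_raw, 1901, 2010))
instr = shift(instr_raw, window_mean(instr_raw, 1901, 2010))

# Step 2 as described: the instrumental series is shifted by the
# RECONSTRUCTION's 1851-1900 mean, not by its own.
recon_offset = window_mean(recon, 1851, 1900)
instr_as_archived = shift(instr, recon_offset)
instr_self_centred = shift(instr, window_mean(instr, 1851, 1900))

# Residual 1851-1900 mean of the instrumental series under each choice.
gap_archived = window_mean(instr_as_archived, 1851, 1900)
gap_self = window_mean(instr_self_centred, 1851, 1900)
```

In this toy case `gap_self` is zero while `gap_archived` is about -0.16: the instrumental series sits 0.16 lower in every year, including the recent period, than it would if centered on its own 1851-1900 mean.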

The effect of these manipulations can be seen by comparing the Buentgen et al 2021 Rmean and Tmean data (left; both referenced to 1961-1990) to Esper et al 2024 Extended Figure 3 (right). Using the original Buentgen version of the data, the instrumental data (red) increases almost twice as quickly as the proxy reconstruction, while, in the Esper version, the two series rise at similar rates. Had Esper re-centered the instrumental temperature to 1851-1900 (to correspond with the re-centering of the reconstruction), this would have reduced the visual coherence in the recent period of interest.

There is no statistical requirement for any of the above Esper et al 2024 re-scaling and re-centering operations. The only purpose appears to have been to force a reduction in the divergence between the Rmean reconstruction and instrumental temperature.

The “Confidence” Intervals

Esper et al 2024 stated that their confidence intervals were estimated by scaling the 15 Buentgen et al (2021) “ensemble members” against the instrumental temperature target in 1901-2010 period and the “variance among ensemble members was used to approximate 95% confidence intervals”:

The most obvious interpretation of this cryptic description is to calculate year-by-year variance and compare it to the reported confidence interval. However, the Esper upper confidence interval is highly correlated (0.96) to the maximum of the Buentgen ensemble and the Esper lower confidence interval is highly correlated (0.97) to the minimum of the Buentgen ensemble (in each case, the values closely match after deducting the offset to the 1851-1900 reference period). The emulation is shown below. It appears that there is some additional re-scaling that I haven’t figured out yet.
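The comparison described above can be sketched as follows: build two candidate bounds from an ensemble (a variance-based interval and the plain min/max envelope) and correlate each with a reported bound. The ensemble here is synthetic, the "reported" bound is a stand-in built from the envelope so that the example runs, and all function names are hypothetical. It is not a reproduction of the 0.96 and 0.97 figures.

```python
import random
from statistics import mean, stdev

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def variance_bounds(members, z=1.96):
    """Per-year mean +/- z * SD across members (one reading of the description)."""
    lower, upper = [], []
    for values in zip(*members):
        m, s = mean(values), stdev(values)
        lower.append(m - z * s)
        upper.append(m + z * s)
    return lower, upper

def envelope_bounds(members):
    """Per-year minimum and maximum across members."""
    lower = [min(values) for values in zip(*members)]
    upper = [max(values) for values in zip(*members)]
    return lower, upper

def compare(reported, candidates):
    """Correlation of a reported bound with each candidate bound."""
    return {name: pearson(reported, series) for name, series in candidates.items()}

# Synthetic 15-member ensemble sharing a common signal (NOT the Buentgen data).
rng = random.Random(0)
signal = [0.3 * rng.gauss(0, 1) for _ in range(200)]
members = [[s + rng.gauss(0, 0.3 + 0.02 * k) for s in signal] for k in range(15)]

var_lo, var_hi = variance_bounds(members)
env_lo, env_hi = envelope_bounds(members)

# Stand-in for an archived upper bound: the envelope minus a constant offset.
reported_hi = [v - 0.25 for v in env_hi]
scores = compare(reported_hi, {"mean + 1.96 SD": var_hi, "ensemble max": env_hi})
```

A constant offset (such as a change of reference period) leaves the correlation with the envelope at 1. Note also that the variance-based interval is symmetric about the ensemble mean by construction, whereas the min/max envelope need not be.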

Also note the asymmetry between the upper and lower “confidence intervals”.

I remind readers that these “confidence intervals” are nothing more than the range of answers obtained by 15 different climate groups from the same measurement datasets. There is no statistical basis for assuming that this range of inconsistent answers corresponds to an actual confidence interval.

The Buentgen “Ensemble”: Regression vs Averaging

In most walks of science, one expects that groups from one scientific institution will arrive at more or less the same results from the same data. But look at the enormous inconsistency among five Buentgen (2021) reconstructions in the period since 1980. Reconstructions R8 and R10 increase by 1.2 and 1.6 deg C respectively, while reconstructions R13, R12 and R2 are unchanged or decline. How is such inconsistency possible?

The next figure shows the R8 and R10 reconstructions against target instrumental temperature (reference 1961-1990). Both R8 and R10 show an astounding – almost perfect – reconstruction of the target instrumental temperature. The reconstructions are too perfect. Indeed, the astounding accuracy of these two reconstructions raises an obvious question: why didn’t Buentgen et al (2021) – and Esper et al (2024) – rely on these near-perfect reconstructions, rather than blending them into a mean with reconstructions (R2, R13, R14) that didn’t replicate modern instrumental temperature?

Additional details on the individual reconstructions are available in the Buentgen et al (2021) Supplementary Information (link). It turns out that R10 “include[d] instrumental temperature measurements in the reconstruction”. Esper et al conceded that “since R10 integrates instrumental temperature measurements during the calibration period, [R10] is not entirely independent of the target.” This seriously under-states the problem: R10 was so seriously dependent on the target as to disqualify its use in calculation of confidence intervals.

….

As soon as I became aware of the near-perfection of the R8 reconstruction, my initial surmise was that it involved some sort of inverse regression of temperature onto the nine tree ring chronologies. This was confirmed in the Supplementary Information. R8 carried out two inverse regressions: a “high-frequency” and a “low-frequency” regression, followed by combining the two. This is clearly a recipe for overfitting – the construction of a model that fits “too well” in the calibration period, but of negligible merit outside the calibration period.
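The overfitting concern can be illustrated with a toy example: regress a "temperature" series on nine predictors that are pure noise and compare the calibration fit with the fit outside the calibration period. This is a plain least-squares sketch on synthetic data, not the two-band (high/low frequency) procedure of R8, whose details I have not emulated. The 40-year periods and all function names are arbitrary assumptions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def r_squared(y, fitted):
    """1 - SS_res / SS_tot; this can be negative outside the calibration period."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Synthetic data: the nine "chronologies" are noise, unrelated to the target.
rng = random.Random(0)
n_cal, n_ver, n_proxies = 40, 40, 9
temp = [rng.gauss(0, 1) for _ in range(n_cal + n_ver)]
proxies = [[rng.gauss(0, 1) for _ in range(n_proxies)] for _ in range(n_cal + n_ver)]
X_cal, y_cal = proxies[:n_cal], temp[:n_cal]
X_ver, y_ver = proxies[n_cal:], temp[n_cal:]

beta_1 = fit_ols([r[:1] for r in X_cal], y_cal)  # one predictor
beta_9 = fit_ols(X_cal, y_cal)                   # all nine predictors

r2_cal_1 = r_squared(y_cal, predict(beta_1, [r[:1] for r in X_cal]))
r2_cal_9 = r_squared(y_cal, predict(beta_9, X_cal))
r2_ver_9 = r_squared(y_ver, predict(beta_9, X_ver))
```

The calibration fit can only improve as predictors are added, even when they carry no information (`r2_cal_9` is never below `r2_cal_1`). With noise predictors, `r2_ver_9` will typically be near zero or negative, which is the signature of a model with negligible merit outside the calibration period.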

In contrast to the inverse regression of R8, R13 stated that it used a sort of average of the available tree ring chronologies:

Similarly, the R12 reconstruction was based on averaging chronologies, rather than inverse regression or splicing.

Conclusion

At first reading, Esper et al (2024) carried out multiple re-scaling and re-centering operations on Buentgen et al (2021) series that were already reconstructions centered on reference period 1961-1990. The only purpose for these operations appears to have been to “improve” the coherence of the Buentgen Rmean reconstruction with temperature. A sort of air brushing of their hockey stick diagram.

And, at the end of the day, Esper et al (2024) is best described as climate pornography. In the premier modern journal for climate pornography: Nature. And while climate partisans (and scientists) pretend to read the articles and the fine print, in reality, they, like Penthouse readers in the 1980s, are only interested in the centerfold. In the present case, an air brushed hockey stick diagram. A diagram that raises the same question that Penthouse readers asked back in the day: real or fake?

Appendix

Some notes and some figures not used in this note.

Buentgen et al (2021), the reference in Esper (2024) footnote 23, has an associated data archive at NOAA (link) as follows:

the results of the 15 reconstructions (link) plus the overall mean (Rmean) and overall median. The reconstructions were “anomalized”, but the reference period is not stated in the archive and does not appear to be consistent across the reconstructions. Five of the reconstructions can be determined to be centered on 1961-1990; I haven’t figured out the reference period for the others.
an archive (link) for target instrumental data: year, 15 columns for target instrumental data for each group, overall mean (Tmean) and overall median. These anomalies are all centered on a 1961-1990 reference period.
an archive of measurement data for eight of the nine tree ring measurement data sets – inexplicably leaving out one data set (Yamal). Most of the datasets are familiar to Climate Audit readers from 2005-2012: two are from Graybill bristlecone sites (inclusive of updates by Salzer et al); three are based on (or identical to) measurement data from Briffa et al 2008 (which was under discussion when the Climategate emails were released). I presume that they used the Yamal dataset from Briffa (2013) and neglected to include it in the archive.

Below is a comparison of corresponding diagrams for Buentgen (2021) and Esper (2024). Buentgen (2021) appears to have a reference period of 1961-1990 and Esper (2024) a reference period of 1851-1900.

Jan and Ulf’s Nature Trick: The Hottest Summer in 2000 Years

A couple of weeks ago, the New York Times and other institutional media proclaimed that “tree rings” (sometimes “ancient tree rings”) had “shown” that 2023 was the warmest summer in 2000 years (link link). Almost 25 years to the day since they had similarly proclaimed that 1998 was the warmest year in 1000 years. The recent proclamation, as in 1998, was based on an article in Nature (Esper et al, 2024); the proclamation in 1998 was similarly based on an article in Nature (Mann et al, 1998), together with its companion article in Geophysical Research Letters (link).

A “Confidence Interval” Trick

In addition to the similarity of the conclusions of the two articles, the structure of the money diagram in both cases is almost identical, as shown in the comparison below.

Figure 1. Left – On the left is the money diagram from Mann et al (1999) showing confidence intervals from the Mann reconstruction in light grey, the reconstruction in black. The reconstruction itself only went to 1980. The instrumental temperature (as an “anomaly”) for 1998 was shown as a point. A horizontal line was then drawn across the diagram to show that the 1998 instrumental temperature exceeded the confidence interval for all prior dates of the reconstruction – hence the “warmest year in 1000 years”. Right – the Esper et al 2024 reconstruction only went to 2010. The point estimate for 2023 exceeds the confidence intervals for all prior years.

Each diagram shows purported confidence intervals around the reconstruction in light/medium gray. Each diagram also denotes the recent instrumental estimate featured in the headline as a highlighted point. In each case, the highlighted instrumental point is more than 10 years after the end of the reconstruction. In each case, the conclusion is obtained by observing that the instrumental point is higher than any of the upper confidence limits of the corresponding reconstruction. In other words, both Esper et al (2024) and Mann et al (1998-99) used the same technique.

In the Climategate controversy, there was voluminous discussion of the term “Mike’s Nature trick”, but, to my recollection, there was little to no discussion of the term in relation to the use of confidence intervals to arrive at a “warmest year” conclusion.

In his notorious Climategate email, Phil Jones described “Mike’s Nature trick” as the splicing of proxy and instrumental temperatures that he had carried out for the World Meteorological Organization. Mann had indeed spliced proxy and instrumental temperatures for the calculation of the smoothed reconstruction illustrated in the article, but had cut the smoothed version back to 1980 (the end of the proxy data.) Mann vehemently denied that the splicing of proxy and instrumental data was “Mike’s Nature trick” and instead claimed that Mike’s Nature trick was nothing more than showing an estimate (reconstruction) and actual (observed temperature) on the same figure, clearly marked. But this is such a benign and commonplace statistical practice that it cannot be reasonably described – even by the statistically and mathematically challenged Jones – as a “trick” in the mathematical sense. A mathematical “trick” implies ingenuity or novelty, but showing estimate vs actual is trivial and commonplace.

However, the technique shown above – comparing a point estimate in a year without proxy data to the confidence envelope of the proxy reconstruction – was a novelty introduced by Mann and, in that sense, an actual candidate for the term “trick” (sensu mathematics) as opposed to the commonplace and trivial comparison of estimate and observed on the same figure.

The validity of this comparison (for Mann et al 1998-99 and Esper et al 2024) depends not just on the validity of the reconstruction, but the validity of the confidence intervals.

Some Comments on Mann et al Confidence Intervals

Mann et al went to considerable lengths to obfuscate analysis of their confidence intervals and, indeed, key information related to that calculation was discovered within the last year – see link. Mann’s reconstruction was done in 11 steps, with each step having a different reconstruction. The purported confidence interval for each step was twice the standard error in calibration period for that step. However, to this day, Mann never archived the results of each step, and, to its shame, Nature refused a request that Mann be required to archive the results for each step. Mann had also refused to provide us with the residuals that had been used to calculate the standard errors. (However, he did provide residuals to Tim Osborn in a Climategate email and this data became available as a result of Climategate.) Further complicating analysis, while Wahl and Ammann and ourselves could replicate one another’s results, neither of us could replicate Mann’s results. In the last couple of years, more than 20 years after the original study, Swedish engineer Hampus Soderqvist, by reverse engineering, finally accomplished an exact replication of Mann’s reconstruction – it turned out that Mann’s list of proxies used in early steps was inaccurate: several listed proxies were not actually used, and several unlisted proxies were used.

One of the long standing (and, unfortunately, under-appreciated) issues with the Mann et al 1998-99 reconstruction is overfitting in the calibration, due to a sort of inverse regression of instrumental temperature on large networks of proxies (a variation of Partial Least Squares). If there is overfitting in the calibration period, the standard error in the calibration period will be artificially small. To get a more realistic estimate of confidence intervals, one needs to use the standard error in the verification period. There is a direct relationship between the calibration period standard error and calibration period r2: a high r2 statistic necessarily means a small calibration standard error and vice versa.

If one is concerned about calibration period overfitting, the verification period r2 statistic is a simple and effective test: if there is a high calibration r2 and dismal verification r2, then there has almost certainly been statistical overfitting and a flawed model. In addition to withholding the results of the individual steps, Mann et al 1998-99 concealed extremely low verification r2 statistics.
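The relationship between calibration r2 and calibration standard error, and the alternative of using verification-period residuals, can be made concrete with a one-predictor sketch. The data are synthetic and the 50/50 split is arbitrary; r2 is taken here as the squared Pearson correlation between observed and fitted values, and the helper names are hypothetical. It is not a reproduction of any MBH step.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def pearson_r2(a, b):
    """Squared Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

def std_error(observed, fitted, dof):
    """Square root of the residual sum of squares divided by dof."""
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(ss_res / dof)

# Synthetic proxy and temperature with a genuine (noisy) linear relationship.
rng = random.Random(1)
proxy = [rng.gauss(0, 1) for _ in range(100)]
temp = [0.5 * p + rng.gauss(0, 0.5) for p in proxy]
p_cal, t_cal = proxy[:50], temp[:50]
p_ver, t_ver = proxy[50:], temp[50:]

intercept, slope = fit_line(p_cal, t_cal)
fit_cal = [intercept + slope * p for p in p_cal]
fit_ver = [intercept + slope * p for p in p_ver]

r2_cal = pearson_r2(t_cal, fit_cal)
se_cal = std_error(t_cal, fit_cal, len(t_cal) - 2)  # two fitted parameters
r2_ver = pearson_r2(t_ver, fit_ver)
se_ver = std_error(t_ver, fit_ver, len(t_ver))      # nothing fitted on these data

# Identity linking the two calibration statistics:
#   se_cal^2 * (n - 2) == (1 - r2_cal) * SS_tot
mean_cal = sum(t_cal) / len(t_cal)
ss_tot_cal = sum((t - mean_cal) ** 2 for t in t_cal)

# Candidate half-widths for a "two standard error" envelope.
half_width_cal = 2 * se_cal
half_width_ver = 2 * se_ver
```

For a fixed target series, the identity in the comment shows why a high calibration r2 forces a small calibration standard error. In a well-behaved case like this one, the two half-widths should be of similar size; the concern in the text is the overfitted case, where the verification figure is much the larger of the two.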

In the recent Mann-Steyn libel trial, the DC judge blocked both McKitrick and myself from presenting any technical evidence on Mann’s failed verification statistics or Mike’s Nature trick, even refusing to allow the defense to show a table of failed verification statistics that had been published in Geophysical Research Letters.

These controversies have been known for some time, but they connect directly to the confidence interval trick. Had the “money diagram” been calculated with standard errors from verification period residuals, the confidence interval envelope would almost certainly have been at least double the size of the confidence envelope shown in MBH98 and MBH99. (Late last year, Soderqvist made exact stepwise MBH results available for the first time. Re-doing the diagram with verification period standard errors would be a useful exercise.)

Some Comments on Esper et al Confidence Intervals

Needless to say, there are many issues with both the Esper et al 2024 reconstruction and its associated “confidence” intervals. But before getting into the details, I recommend that interested readers re-examine a truly excellent 2012 article by Esper, Buentgen and coauthors also in Nature (link), an article on which I favorably commented many years ago (link). Esper et al (2012) observed that tree ring chronologies (black below) dismally failed to record huge millennial-scale change in the high-latitude Northern Hemisphere summer insolation (and temperature), showing the following diagram in their SI.

Figure S1. Temperature trends recorded over the past 4000-7000 years in high latitude proxy and CGCM data. Multi-millennial TRW records from Sweden [1], Finland [2], and Russia [3] (all in grey) together with reconstructions of the glacier equilibrium line in Norway [4,5] (blue), northern treeline in Russia [3,6] (green), and JJA temperatures in the 60-70°N European/Siberian sector from orbitally forced ECHO-G [7,8] (red) and ECHAM5/MPIOM [9] (orange) CGCM runs [10]. All records, except for the treeline data (in km), were normalized relative to the AD 1500-2000 period. Resolution of model and TRW data were reduced (to ~30 years) to match the glacier data.

Esper et al (2012) observed that summer insolation at 50N decreased by more than 35 W m-2 (!!!) since the early Holocene and 6 W m-2 since Roman times – vastly more than the forcing of ~1.5 W m-2 associated with increased CO2 since pre-industrial times. They identified changes in the northern treeline at Yamal (green) and changes in the equilibrium altitude of a small ice cap in Norway (blue) as two proxies that were responsive to large-scale millennial-scale changes in insolation and high-latitude summer temperatures.

Conclusion

Whether or not the comparison of an observed temperature point to the confidence envelope of a reconstruction to draw conclusions about “warmest year in 1000 years” was precisely what either Mann or Jones defined as “Mike’s Nature trick”, it can be fairly described as a trick (sensu mathematics), whereas plotting an estimate and observed on same figure is so commonplace and trivial that it cannot reasonably be described as a trick (sensu mathematics.)

In that spirit, I think that it is fair to describe “Mike’s Nature trick” (and the similar trick employed by Esper et al 2024) as a confidence trick. In the mathematical sense, of course.

As a caveat, readers should note that the question of whether tree rings (ancient or otherwise) show that 2023 (1998) was the warmest summer (year) in 1000 or 2000 years is a different question than whether 2023 was the warmest summer in 1000 years. My elevator take is

that 20th and 21st century warming are both very real, but that the 19th century was probably the coldest century since the Last Glacial Maximum and that the warming since the 19th century has been highly beneficial for our societies – a view that was postulated in the 1930s by Guy Callendar, one of the canonical climate heroes;
per Esper et al 2012, given the failure of tree ring chronologies to reflect major millennial-scale changes in summer insolation and temperature, what possible reliance can be attached to pseudo-confidence intervals attached to 2000-year tree ring chronologies in Esper et al 2024 (or any other tree ring chronologies)?
in addition, we know that there is global-scale “greening” of the planet over the past 30-40 years that has been convincingly attributed to enhanced growth due to fertilization by higher CO2 levels. So, in addition to all other issues related to tree ring chronologies, it is necessary to disaggregate the contribution of CO2 fertilization from the contribution of increased warming – an effort not made by Esper et al 2024 (or its references.)

In a follow-up article, I will examine details of the Esper et al 2024 reconstruction, which, among other interesting features, connect back to Graybill bristlecone sites and the Briffa sites under discussion in the period leading up to the Climategate emails.

Twisted Tree Heartrot Hill Revisited

Recently, while re-examining PAGES2K, the current paleoclimate darling, I noticed that PAGES2K (2019) reverted to a variation of the Twisted Tree Heartrot Hill (Yukon) [TTHH] tree ring chronology that we had already criticized in 2003 as being obsolete when used by Mann et al 1998. PAGES2K was supposed to be an improvement on Mann et al 1998 data, but, in many ways, it’s even worse. So it was very strange to observe the 2019 re-cycling of a TTHH version, previously criticized in 2003 as being already obsolete in 1998.

MM2003

In McIntyre and McKitrick (2003), we had observed that MBH98 had used an obsolete version of the Twisted Tree Heartrot Hill (Yukon) [TTHH] tree ring chronology, for which measurement data ended in 1975, as compared to the chronology ending in 1992 available at the NOAA archive, as shown in the excerpt below. (I checked with NOAA and verified that the updated chronology was available at NOAA prior to submission of MBH98.)

The TTHH chronology declined precipitously in the late 1970s and 1980s, reaching its lowest value in the entire record in 1991. However, the MBH version ended in 1980 (extrapolating the 1975 value for the final 5 years). Below is a comparison from our 2003 article.

Mann et al 2003 Response

In our 2003 replication, we used the NOAA version of the TTHH chronology rather than the obsolete MBH version. In their contemporary (November 2003) response to our first article, Mann et al objected vehemently, claiming that we had wrongly substituted a “shorter version” of the TTHH chronology for the “longer” version used in MBH98. (The so-called “shorter” version used a larger dataset but began when 5 cores were available.)

Because the MBH98 proxy reconstruction ended in 1980, the difference between the two versions wasn’t an important issue in the main narrative of MBH hockey stick controversies, but it does become relevant for reconstructions ending in 2000 (such as PAGES2K).

PAGES2K (2019) Version

PAGES2K, including its 2019 version, reverted to the TTHH data version already obsolete in Mann et al 1998 – the data ending in 1975, not 1992. The figure below compares the TTHH version in PAGES2K (2019) – on the right – to the TTHH versions discussed above. The PAGES2K version uses the same measurement data (ending in 1975) as the MBH98 version. The PAGES2K chronology is very similar to the MBH98 version in the period of overlap (1550-1975) but is not exactly the same. Notice that the PAGES2K version (like MBH98) avoids the post-1975 data with the severe “decline”.

The precise provenance of PAGES2K chronology versions is not reported and figuring them out by reverse engineering is a herculean effort (e.g. Soderqvist’s work on PAGES2K Asian chronologies.) Amusingly, the PAGES2K version begins a little later (1550) than the version that Mann had criticized for being “shorter”.

Measurement Data

Although Jacoby and D’Arrigo’s contemporary NOAA archive included the TTHH chronology up to 1992 (with its decline), they never archived the measurement data corresponding to the 1992 chronology. Many years later (2014), as Jacoby was on his death bed, they filed a large archive of measurement data with NOAA, including data for a Yukon regional chronology (cana326), a subset of which was the 1975 TTHH measurement data. This archive included the TTHH update, but did not include a concordance identifying which identifiers belonged to the TTHH update and which identifiers belonged to other Yukon locations.

By coincidence, Tom Melvin, Briffa’s associate at the University of East Anglia, had used a TTHH measurement data version (74 cores) as a benchmark for testing “signal free” methodology in 2010 and this measurement data proved to be available in an archive identified by Hampus Soderqvist in his investigations of signal-free methodology. It contained the 1975 measurement data (34 cores), 25 cores from 1987-1992 and 15 cores from 1999 sampling.

As an exercise, I calculated a chronology using conventional methodology from the subset of cores collected in 1992 or earlier – shown below. It closely matches the chronology archived in NOAA in the mid-1990s.

TTHH 1999 Update

The TTHH measurement data was updated a second time in 1999. Its results were published in a 2004 article by D’Arrigo et al entitled “Thresholds for warming-induced growth decline at elevational tree line in the Yukon Territory, Canada” (link) in which they broached the problem of the “decline” in high latitude tree ring widths in late 20th century, despite observed warming in the Arctic. Climate Audit readers will recall D’Arrigo’s “explanation” of the “divergence problem” to the NAS panel in 2006, when she explained that you “have to pick cherries if you want to make cherry pie”.

The type case in D’Arrigo et al 2004 was TTHH as shown below.

D’Arrigo et al never archived the measurement data or chronology for their 2004 article on the divergence problem. As another exercise, I calculated a chronology for the Melvin data including cores from the 1999 update using a conventional methodology (dplR ModNegExp): it closely replicated the D’Arrigo diagram from 1575 on, but not in the earliest portion (when there are fewer than 5 cores anyway.)

Melvin Signal-Free Version

As a final exercise, I looked at Melvin’s “signal-free” methodology on the resulting tree ring chronology. (On previous occasions, we’ve discussed the perverse results of this methodology on multiple PAGES2K Asian tree ring chronologies, as articulated by Soderqvist.) In this case, the signal-free artifact at the end of the series increases closing values by about 20% – much less dramatic than the corresponding artifact for paki033 but an artifact nonetheless.

Conclusion

In any event, PAGES2K did not use the Melvin signal-free version – which went to 1999 and incorporated a decline after 1975. As noted above, PAGES2K reverted to the measurement data version already obsolete in Mann et al 1998, but in a novel version. At present, the provenance and exact methodology of the PAGES2K calculation are unknown. As readers are aware, it took a heroic effort by Soderqvist to deduce the methodology and provenance used in multiple PAGES2K Asian tree ring chronologies (a particular LDEO variation of “signal-free” methodology). My guess is that the TTHH chronology used in PAGES2K was also calculated by LDEO (Cook et al) using some variation of Melvin iteration: the closing uptrend in the difference is characteristic.

D’Arrigo et al 2006: NWNA Alaska

Today’s article is about one of the D’Arrigo et al 2006 datasets.

D’Arrigo et al 2006, then under submission, had been cited in drafts of the IPCC Fourth Assessment Report. I had been accepted as an IPCC reviewer and, as an IPCC reviewer, I asked IPCC to make the data available to me or to ask the lead author to make the data available. That prompted a vehement refusal that I documented in March 2007 (link). Readers unfamiliar with the severity of data obfuscation by the climate science community should read that exchange. (Some further light on the campaign emerged later in the Climategate emails.)

D’Arrigo et al 2006 calculated more than a dozen new regional chronologies, but refused to archive or provide the digital chronologies until April 2012, more than six years later (by which time the paleo field purported to have “moved on”). Also in April 2012, D’Arrigo et al provided information (somewhat sketchy) on which sites had been used in the various reconstructions, but measurement data for many of the sites was unavailable, including (and especially) the sites that had been sampled by D’Arrigo, Jacoby and their associates. Much of this data was archived in April 2014, a few months before Jacoby’s death. But even this archive was incomplete.

By then, D’Arrigo et al 2006 was well in the rear view mirror of the paleo community and there has been little, if any, commentary on the relationship of the belated and long delayed 2014 data archive to the 2006 article.

In several recent posts, I’ve discussed components of D’Arrigo’s Northwest Alaska (NWNA) regional chronology, which, prior to 2012, had only been available in the muddy form shown below.

The NWNA series goes from AD1297 to AD2000 and closes on a high note – as shown more clearly in the top panel below, which re-plots the post-1800 period of the NWNA chronology (RCS version; STD version is very similar.) Also shown in this figure (bottom panel) is the post-1800 period of the chronology (ModNegExp) for the Dalton Highway (ak104) site, the only component of the NWNA composite with values in the 1992-2000 period (shown to right of red dashed line.)

Look at the difference right of the dashed line at AD1990. In the underlying Dalton Highway data, the series ends at almost exactly the long-term average, whereas the same data incorporated into D’Arrigo’s NWNA regional composite closes at record or near-record highs for the post-1800 period.

If the 1992-2000 Dalton Highway data doesn’t show record highs for the site chronology, then it is implausible to claim that it shows record highs for the regional chronology. So what’s going on here?

My guess is that the regional chronology has mixed sites with different average widths and that their rudimentary statistical technique didn’t accommodate those differences. If so, this would be the same sort of error that we saw previously with Marcott et al 2013, in which there was a huge 20th century jump without any increase in the component series (simply because a low-value series ended earlier). Needless to say, these errors always go in a hockey stick direction.
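To make the suspected mechanism concrete, here is a minimal sketch with invented numbers. It is not D’Arrigo’s actual procedure (which I can only guess at); the site names, widths and dates are hypothetical. Two perfectly flat site series with different average ring widths are averaged, and the narrow-ring site ends a decade early.

```python
# Hypothetical illustration only: invented sites, widths and end dates.
years = list(range(1900, 2001))

# Site A: wide rings (flat at 2.0), covers 1900-2000.
# Site B: narrow rings (flat at 0.5), ends in 1990.
site_a = {y: 2.0 for y in years}
site_b = {y: 0.5 for y in years if y <= 1990}
sites = [site_a, site_b]


def naive_composite(sites, year):
    """Average whatever raw values happen to be available in a given year."""
    values = [s[year] for s in sites if year in s]
    return sum(values) / len(values)


def standardized_composite(sites, year):
    """Divide each site by its own long-term mean before averaging."""
    values = []
    for s in sites:
        if year in s:
            site_mean = sum(s.values()) / len(s)
            values.append(s[year] / site_mean)
    return sum(values) / len(values)


naive_1990 = naive_composite(sites, 1990)        # both sites present
naive_1991 = naive_composite(sites, 1991)        # only the wide-ring site left
std_1990 = standardized_composite(sites, 1990)
std_1991 = standardized_composite(sites, 1991)

print(naive_1990, naive_1991)  # naive composite jumps when site B drops out
print(std_1990, std_1991)      # standardized composite stays flat
```

Neither invented series changes at all, yet the naive composite steps up the moment the low-value series drops out. Scaling each site by its own mean first removes the artifact.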

Sheenjek, Alaska: A Jacoby-MBH Series

MBH98 used three Jacoby tree ring chronologies from Alaska: Four Twelve (ak031) – discussed here, Arrigetch (ak032) and Sheenjek (ak033). Sheenjek will be discussed in this article.

In our compilation of MBH98 in 2003, we observed that the Sheenjek chronology archived at NOAA Paleo was not the same as the “grey” version used in MBH98. While we used the MBH98 version to benchmark our emulation of the MBH98 algorithm, we used the version archived at NOAA in our sensitivity analysis, both in our 2003 article and in our early 2004 submission to Nature. In his reply to our submission, Mann vehemently protested that the “introduc[tion of] an extended version of another Northern Treeline series not available prior to AD 1500 at the time of MBH98” “introduce[d] problems into the important Northern Treeline dataset used by MBH98”:

Finally, MM04 introduce problems into the important Northern Treeline dataset used by MBH98. Aside from incorrectly substituting shorter versions of the “Kuujuag” and TTHH Northern Treeline series for those used by MBH98, and introducing an extended version of another Northern Treeline series not available prior to AD 1500 at the time of MBH98, they censored from the analysis the only Northern Treeline series in the MBH98 network available over the AD 1400-1500 interval, on the technicality that it begins only in AD 1404 (MBH98 accommodated this detail by setting the values for AD 1400-1404 equal)

The other “Northern Treeline series” referred to here was the Sheenjek chronology ak033.crn. I checked Mann’s assertion that the data was “not available prior to AD 1500 at the time of MBH98”. This was contradicted by NOAA, who confirmed that the chronology that we had used had been available since the early 1990s.

In the figure below, I’ve compared three Sheenjek chronology versions:

the MBH98 version from 1580-1979 (plus 1980 infill);
the ModNegExp chronology (dplR) calculated from measurement data (ak033.rwl), which, in this case, has been available since the 1990s and covers the period 1296-1979;
the archived chronology at NOAA (ak033.crn), also covering the period 1296-1979.

The issues relating to Sheenjek are different from those observed at Four Twelve.

The MBH98 version and the chronology freshly calculated from measurement data (ak033.rwl) using the ModNegExp option (emulating contemporary Jacoby technique) are very, very similar for their period of overlap (1580-1979). Neither shows elevated 20th century values or a closing uptick. If anything, there is a modest decline in the late 20th century.
However, the MBH98 version excludes all values prior to AD1580. There is no good reason for this exclusion. There are 28 cores in ak033.rwl in 1579, far above usual minimums. In the 15th century, there are more cores for Sheenjek than for the Gaspe series, which was used by MBH98 in its AD1400 network even when it had only one core. (And even no cores for the first five years.)
The Sheenjek chronology archived at NOAA (ak033.crn) was clearly derived from the ak033.rwl dataset, as the series in the middle and bottom panels are highly correlated. However, from its appearance, it looks like the archived Sheenjek chronology was calculated with very flexible splines (rather than “stiff” ModNegExp) and that this has attenuated the “low frequency” variability observed in the middle panel using the ModNegExp option.
We used the archived ak033.crn version in our sensitivity study. If the same exercise were repeated using the middle panel version, it would yield relatively high early 15th century results.
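The point about flexible versus stiff standardization can be illustrated with a toy example. This is only a sketch under loud assumptions: the ring-width series is synthetic, a constant mean line stands in for a “stiff” curve, and a short centered moving average stands in for a very flexible spline. It is not dplR’s ModNegExp or spline code, and it is not the procedure used for the archived chronology.

```python
import math

# Hypothetical ring-width series: a slow 200-year swing around a mean of 1.0.
n_years = 400
raw = [1.0 + 0.3 * math.sin(2 * math.pi * t / 200) for t in range(n_years)]


def stiff_curve(series):
    """Stand-in for a stiff growth curve: a constant line at the series mean."""
    mean = sum(series) / len(series)
    return [mean] * len(series)


def flexible_curve(series, half_window=10):
    """Stand-in for a flexible spline: a short centered moving average."""
    fitted = []
    for i in range(len(series)):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        fitted.append(sum(window) / len(window))
    return fitted


def to_index(series, curve):
    """Standardize by dividing each measurement by the fitted curve."""
    return [x / c for x, c in zip(series, curve)]


def spread(series):
    """Range of the index, used here as a crude gauge of retained variability."""
    return max(series) - min(series)


stiff_index = to_index(raw, stiff_curve(raw))
flex_index = to_index(raw, flexible_curve(raw))

print(spread(stiff_index), spread(flex_index))
```

The flexible curve follows the slow swing so closely that dividing by it leaves little of the swing in the index, whereas the stiff curve leaves it largely intact. That is the sort of attenuation that the comparison of the middle and bottom panels suggests.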

It is not presently known who chopped off Sheenjek values prior to AD1580 in the MBH98 version. Or why.

All cores in the Sheenjek dataset were included in the D’Arrigo et al 2006 NWNA Composite.