Absolute Change in Temperature

You can follow along in a Google Colab notebook.

1 Motivation

I was always a bit annoyed by the arbitrary choice of a baseline to compute the average temperature anomaly. Of course, I know that the plot itself doesn’t change, and a different baseline only shifts the vertical axis, since the zero depends on the time span used as baseline.

Still, not everyone can see through it easily, and I always thought it was a disservice to people trying to learn more about global warming, since it places an unnecessary burden on the reader.

Even worse, NOAA’s website makes it very confusing, even for seasoned professionals, by mentioning, literally, four different spans to refer to the same data!

On the page corresponding to the data we’re using here, the GHCN Gridded Products, we see:

Moreover, given the very nature of the dataset, NOAA recommends in the Note section:

“These gridded data sets were developed to produce the most accurate time series possible. However, this required that months and grid boxes be treated independently through time. The use of these data sets is most appropriate for analyzing the change in temperature within a particular grid box, or set of grid boxes, over a span of years.”

Tip

It makes sense, since different locations exhibit quite different weather and temperature patterns, and worse yet, they are being impacted by global warming differently over time.

The only reasonable way to compare them, and aggregate them, is to try to find their own baselines.

Tip

The baseline represents the typical temperature pattern in a given location before it was modified/disturbed by the effects of global warming.

Regardless of the time span used to compute the temperature anomalies, it should be possible to find out which (relative) temperature was typical of a given location for a certain period of time.

Once this temperature level is established, we can look for change points corresponding to a regime change in the temperature.

The (relative) temperature in the changed regime, when compared to the baseline (the first observed regime), gives us an indication of the absolute change in that region.

That’s our goal here!

Warning

DISCLAIMER: I am NOT a meteorologist or climate scientist! I am using the tools I have at my disposal, as a data scientist and developer, to try and understand better the dynamics of global warming, and, at the same time, address a pet peeve that has been bugging me for years!

For this reason, I am making all the code freely available, and I invite you to try it yourself using different parameters and assumptions if you wish.

Spoiler alert: I tried many different parameters for the change point detection and interpolation algorithms, and the results barely budged.

2 Data

We’re using data from NOAA’s National Centers for Environmental Information (NCEI).

The data, Land and Ocean Temperature Anomalies grid file using GHCN Version 5.0.0, shows temperature anomalies with respect to the 1971-2000 average, in degrees Celsius, for each 5° by 5° grid box, and it can be directly accessed here.

Code
import requests
import os

def download(url=None, cached_etag=None):
    try:
        if url is None:
            # Scrapes the directory listing to find the latest gridded file
            base = b'https://www.ncei.noaa.gov/data/noaa-global-surface-temperature/v5/access/gridded/'
            resp = requests.get(base)
            start = resp.content.find(b'href="NOAAGlobalTemp')
            end = resp.content.find(b'>', start)
            fname = resp.content[start+6:end-1]
            url = base + fname
        else:
            # If a URL is given, derives the local file name from it (URL expected as bytes)
            fname = url.split(b'/')[-1]
        if not os.path.exists(fname):
            resp = requests.head(url, allow_redirects=True)
            if resp.status_code == 200:
                etag = resp.headers.get('ETag')
                # Downloads the file only if it is new or has changed
                if (cached_etag is None) or (etag != cached_etag):
                    local_filename = fname
                    r = requests.get(url, stream=True, allow_redirects=True)
                    if r.status_code == 200:
                        with open(local_filename, 'wb') as f:
                            for chunk in r.iter_content(chunk_size=8192):
                                f.write(chunk)
        return fname.decode('utf-8')
    except Exception:
        return None
Code
import numpy as np
import netCDF4 as nc

def noaa_temperatures(fname, yearly=True, start=1971, 
                      span=30, stat='mean', grid_size=5):
    # Reads NetCDF file
    ds = nc.Dataset(fname)
    # Anomaly data (z)
    anom = ds['anom'][:].data
    # Value used to replace missing values
    nanval = ds['anom'].missing_value
    # Original shape is (n_months, 1, 36, 72): monthly values on a 36 x 72 grid
    gridded = anom.squeeze()
    gridded[gridded == nanval] = np.nan
    funcs = {'mean': np.mean, 'max': np.max, 'min': np.min}
    # Gets the number of full years in the file
    n_years = gridded.shape[0]//12

    end = start + span - 1
    if (start != 1971) or (end != 2000):
        polygons = surface_polygons(grid_size)
        surface_perc = surface_area(polygons, perc=True)
        # Computes weighted average using surface area of grid boxes
        adjustment = np.mean(surface_stat(gridded[(start-1880)*12:(end-1880+1)*12], 
                                          surface_perc))
        # Adjusts baseline
        gridded = gridded - adjustment

    # Computes the average over each year
    griddedy = np.array([funcs[stat](gridded[i*12:(i+1)*12], axis=0) 
                         for i in range(n_years)])
    # NetCDF has latitudes from -87.5 to 87.5 (it uses the centers of the 
    # grid boxes) but we need to flip it upside down to use in Plotly
    griddedy = griddedy[:, ::-1]
    # NetCDF has longitudes from 0 to 360, but we need to rearrange it from 
    # -180 to 180 to use in Plotly, so we concatenate the last 36 columns 
    # (180 to 360 = -180 to 0) to the first 36 columns (0 to 180)
    griddedy = np.concatenate([griddedy[:, :, 36:], griddedy[:, :, :36]], axis=2)
    return griddedy if yearly else gridded

The download() function retrieves the data from NOAA, and the noaa_temperatures() function returns a 3D Numpy array containing the yearly averages for each grid box (36 x 72 boxes). It is possible to adjust the baseline for the temperature anomalies from the default 1971-2000 to any other span of years.

Important

The noaa_temperatures() function uses np.mean (instead of np.nanmean) on purpose! If data from any month in a given year is missing, the whole year is deemed missing, as the average would be biased otherwise.
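Here is a minimal sketch of that behavior, using a made-up year of monthly anomalies with a single missing month:

# Toy example (hypothetical values): one missing month out of twelve
months = np.array([0.1, 0.2, np.nan, 0.3, 0.1, 0.2, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3])
np.mean(months), np.nanmean(months)
# np.mean propagates the NaN, flagging the whole year as missing,
# while np.nanmean would silently average only the eleven observed months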

fname = download()
griddedy = noaa_temperatures(fname, yearly=True)
griddedy.shape
(142, 36, 72)

The dataset covers a span of 142 years, from 1880 to 2021.
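As a small sketch, assuming the first index corresponds to 1880, the array indices map to calendar years as follows:

years = np.arange(1880, 1880 + griddedy.shape[0])
years[0], years[-1]   # first and last years covered by the dataset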

If you took a peek at the noaa_temperatures() function you may have noticed that the baseline adjustment uses the surface area of the grid boxes to compute a weighted average of the temperatures.

This is an important step, and the rationale behind the choice of projection for visualization, as we’ll see in the next section.

3 Grid and Projections

The default projection is the Mollweide projection because it is an equal-area projection, that is, it preserves the relative sizes of the grid boxes. A 5° by 5° grid box at the Equator is approximately 23 times larger than a 5° by 5° grid box at one of the poles, and this should be taken into account when averaging anomalies across the globe.

Figure 1: Mollweide Projection
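That factor of roughly 23 follows from the area of a grid box being approximately proportional to the cosine of the latitude at its center; a back-of-the-envelope check:

# Box centered at 2.5° (next to the Equator) vs. box centered at 87.5° (next to a pole)
np.cos(np.radians(2.5)) / np.cos(np.radians(87.5))   # roughly 23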

The functions below compute the percentage of the globe’s surface contained in each and every 5-by-5 degrees grid box:

Code
#!pip install area
from area import area
from functools import partial

def make_polygon(coords, size):
    left_long, bottom_lat = coords
    return np.array([[[left_long, bottom_lat],
                      [left_long, bottom_lat + size],
                      [left_long + size, bottom_lat + size],
                      [left_long + size, bottom_lat],
                      [left_long, bottom_lat]]], dtype=np.float32)

def surface_polygons(grid_size=5):
    long, lat = np.meshgrid(np.arange(-180, 180, grid_size), 
                            np.arange(90-grid_size, -90-grid_size, -grid_size))
    coords = np.dstack([long, lat])
    polygons = np.apply_along_axis(func1d=partial(make_polygon, size=grid_size), 
                                   axis=2, arr=coords)
    return polygons

def surface_area(polygons, perc=True):
    meridian_surface_area = np.array(list(map(area, 
                            map(lambda c: {'type': 'Polygon', 'coordinates': c.tolist()}, 
                            polygons[:, 0]))))/1e6
    surface_area = np.tile(meridian_surface_area.reshape(-1, 1), polygons.shape[1])
    surface_perc = (surface_area / surface_area.sum())
    return surface_perc if perc else surface_area
polygons = surface_polygons(grid_size=5)
surface_perc = surface_area(polygons, perc=True)
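As a quick sanity check, the percentages should add up to (roughly) one, and the largest (equatorial) grid box should be about 23 times the size of the smallest (polar) one:

surface_perc.sum(), surface_perc.max() / surface_perc.min()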

The same result can be achieved using a simple cosine function over the latitudes at the center of each grid box, referred to by NOAA as cosine weighting:

lat = np.linspace(-87.5, 87.5, 36)
cosines = np.cos(lat/180*np.pi)
cosines = cosines/(72*cosines.sum())
Figure 2: Surface area of a 5-by-5 degrees grid box
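As a quick check of that equivalence (a minimal sketch; any longitude column of surface_perc works, since the area depends only on latitude, and the cosine is symmetric about the Equator, so the row order doesn’t matter either):

np.abs(cosines - surface_perc[:, 0]).max()   # should be very close to zero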

4 Temperature Anomalies

Now we can compute the average global anomaly weighted by the surface area of the grid boxes, not counting missing data.

Code
def surface_stat(data, surface_perc, stat='mean'):
    # Average anomaly weighted by surface area, not counting missing data
    data_mean = np.array([np.nansum(ev * surface_perc) / 
                          (~np.isnan(ev) * surface_perc).sum() 
                          for ev in data])
    if stat == 'mean':
        return data_mean
    elif stat == 'std':
        data_var = [np.nansum((ev - ev_mean) ** 2 * surface_perc) / 
                              (~np.isnan(ev) * surface_perc).sum() 
                    for ev, ev_mean in zip(data, data_mean)]
        return np.sqrt(data_var)
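The weights of missing grid boxes are excluded from the denominator, so the observed boxes are effectively re-normalized. A toy example with hypothetical values:

toy = np.array([[1.0, np.nan], [2.0, 3.0]])     # a 2x2 grid with one missing box
toy_perc = np.array([[0.4, 0.4], [0.1, 0.1]])   # hypothetical surface percentages
surface_stat(toy[np.newaxis], toy_perc)         # weighted mean over the observed boxes only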

These are the average global anomalies with respect to the 1971-2000 average:

Code
avg_temp = surface_stat(griddedy, surface_perc)
avg_temp
array([-0.49950845, -0.437659  , -0.46099328, -0.55762834, -0.66325144,
       -0.55927735, -0.5296677 , -0.61390813, -0.52221157, -0.41688996,
       -0.64482477, -0.61349774, -0.65583242, -0.69150886, -0.65121656,
       -0.62195114, -0.44614686, -0.42208863, -0.61661892, -0.52107259,
       -0.39073078, -0.46883892, -0.62728571, -0.72594027, -0.76846173,
       -0.64734485, -0.55134399, -0.72619441, -0.76130973, -0.72785092,
       -0.69972305, -0.75875122, -0.66646819, -0.64330766, -0.47295105,
       -0.40953497, -0.64000969, -0.71576234, -0.65454381, -0.57661185,
       -0.55576413, -0.46010078, -0.54557568, -0.56007732, -0.5460153 ,
       -0.49508586, -0.37550539, -0.48066778, -0.49368916, -0.63079376,
       -0.40662835, -0.3620246 , -0.43417767, -0.55512125, -0.40984992,
       -0.45844197, -0.42133779, -0.31418626, -0.31750286, -0.2998834 ,
       -0.17384875, -0.0441839 , -0.19072324, -0.20550301, -0.03651162,
       -0.14753947, -0.29445738, -0.3460578 , -0.35940049, -0.3616485 ,
       -0.45255908, -0.30141405, -0.2614704 , -0.16359548, -0.4036462 ,
       -0.43811772, -0.50331773, -0.23410685, -0.1732868 , -0.21081136,
       -0.25137586, -0.20037309, -0.19060011, -0.1735941 , -0.44836399,
       -0.37708389, -0.31270695, -0.31256246, -0.34134812, -0.20479953,
       -0.2498834 , -0.38640911, -0.27439695, -0.11041791, -0.37162896,
       -0.3035485 , -0.39112852, -0.10629708, -0.19548852, -0.08478207,
       -0.04385898,  0.01332938, -0.11793198,  0.04759796, -0.14719811,
       -0.15664772, -0.07000922,  0.07627796,  0.08557043, -0.01269871,
        0.14363659,  0.081649  , -0.07584615, -0.02908075,  0.04030949,
        0.16373958,  0.01630613,  0.20947407,  0.35191738,  0.14264881,
        0.12752382,  0.26871631,  0.31656927,  0.33398008,  0.27793616,
        0.3629874 ,  0.32961243,  0.31112634,  0.23295334,  0.33479465,
        0.41345646,  0.26483886,  0.32429365,  0.36625862,  0.43520278,
        0.62679989,  0.69281455,  0.59971238,  0.51962469,  0.64176325,
        0.67709032,  0.53795811])
Note

If you compute the average from 1971 to 2000 (avg_temp[91:120].mean()), it won’t result in exactly zero, as one would expect, but in -0.0518 degrees instead. Unfortunately, I couldn’t figure out why this is the case in NOAA’s data.

avg_temp[91:120].mean()
-0.05189353308685528

Apart from this minor difference, the choice of window for taking the average against which the global anomaly is computed is arbitrary. As mentioned in the motivation, this is my pet peeve, because it makes it particularly difficult to make a point if the baseline is constantly changing.

As you can see in the plots below, changing from 1971-2000 to 1981-2010, for example, only changes the y axis: the whole plot shifts down about 0.18 degrees, but any difference between two points in the plot remains exactly the same.
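A minimal sketch of that invariance, assuming avg_temp indexes years starting at 1880:

# Mean anomaly over 1981-2010 under the current 1971-2000 baseline: the shift
shift = avg_temp[101:131].mean()
# Re-baselining only subtracts a constant, so differences between years don't change
rebased = avg_temp - shift
shift, np.isclose(rebased[-1] - rebased[0], avg_temp[-1] - avg_temp[0])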

Important

So, it doesn’t matter which baseline is used: the difference in global temperature between 2021 and 1880 is the same, 1.02 degrees.

Tip

Nonetheless, it would be very helpful to have an absolute measure of the changes in temperature over time. And that’s what we’re trying to accomplish here.

However, the averages do not tell the whole story. It is much more interesting to look at the temperature anomalies recorded on each and every grid box.

Code
year_ini = 2021
year_end = 2021
fig = plot_projection(griddedy[year_ini-1880:year_end-1880+1], 
                      seq_offset=year_ini, 
                      add_line=True, animate=True, 
                      colorscale='Jet', colorbar_title='oC', 
                      zmin=-4, zmax=4)

The plot above isn’t interactive, but if we look at individual grid boxes, we’ll see that many of them have plenty of missing data points. That’s totally expected, as data wasn’t collected in remote places in the late 19th or early 20th centuries.

5 Time Series

Let’s take a look at a grid box close to Antarctica. Its measurements started in the late 1950s, and there are some years missing.

Code
gy, gx = 33, 69
fig = plot_evolution(griddedy, gy, gx)

6 Missing Data Imputation

The first step we need to take is to impute some of the missing data. We’ll use some simple linear interpolation, but only in those cases where the gap in the data isn’t too long.

This is the Numpy array corresponding to the plot above:

Code
signal = griddedy[:, gy, gx]
signal
array([        nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan, -0.325     , -1.5       , -1.1808333 ,
       -1.1816666 ,  0.0975    , -1.9533334 , -0.9291666 , -0.53083336,
       -0.49250007,  0.5108333 ,  0.39000002, -1.1016666 ,         nan,
        1.0050001 ,  0.6675    ,  1.6266665 ,  0.31416667, -0.06500002,
               nan, -1.4924998 , -1.03      ,         nan, -1.1783333 ,
        0.5308334 , -0.175     , -1.6116667 ,  0.11999998,  1.2833333 ,
       -0.84583336, -0.355     ,  0.8141667 ,         nan,         nan,
       -0.44000006,  0.65250003, -0.07750002, -0.6775    , -0.26916668,
       -0.5958332 ,  0.9916668 , -0.1091666 , -0.14083336,  0.36166668,
        0.23416667, -0.40666673, -0.25416663,  0.8541667 , -1.5525001 ,
       -0.04999999, -0.23916666,  1.6108333 ,  0.8275001 ,  0.41333333,
        0.08916668,  2.3008335 ,  1.5166665 ,  0.705     ,  1.3366666 ,
       -1.1916666 , -1.1216667 ,  0.39083338,  0.4141667 ,  1.5300001 ,
        0.40250006,  0.22500002], dtype=float32)

Apart from the first 77 years, where there were no measurements whatsoever, there are only five missing measurements: three of those are single years, and the other two are consecutive years.
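A quick check of that structure (a small sketch):

first_valid = np.argmax(~np.isnan(signal))   # index of the first non-missing year
first_valid, np.isnan(signal[first_valid:]).sum()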

We obviously cannot do anything about the first 77 years, but we can use linear interpolation to fill in the other five missing points. The function below, bounds(), returns the first and last indices of a Numpy array that contain the longest sequence of data points that can be used as the basis for imputation.

Code
def bounds(data, max_contiguous_na=5):
    # Returns the start and end indices of the longest
    # valid sequence, that is, containing up to a given
    # number of contiguous missing points
    
    # Gets the indices of the non-null points
    idxs = np.arange(len(data))[~np.isnan(data)]
    max_size = 0
    max_ini = 0
    size = 1
    ini = 0
    # Calculates the size of the gaps of missing data
    gaps = np.diff(idxs) - 1
    for i, v in enumerate(gaps):
        # If there's no gap, the size of valid data is increased by 1
        if v == 0:
            size += 1
        # If that's the longest sequence of values containing tolerable
        # gaps so far, updates the max info
        if size > max_size:
            max_size = size
            max_ini = ini
        # If the gap is larger than tolerable, resets size and start
        if v > max_contiguous_na:
            ini = i + 1
            size = 1
        # If the gap is tolerable, adds one to the size
        # (that means the next idx)
        elif v > 0:
            size += 1
    # Computes the end of the longest sequence
    max_end = max_ini + max_size
    # Returns the start and end indices of the longest sequence
    ini, end = idxs[max_ini], idxs[max_end-1] + 1
    return ini, end

Let’s create three dummy arrays to illustrate how the function above works.

dummy1 = np.ones(15)
dummy1[2:5] = np.nan
dummy1, bounds(dummy1)
(array([ 1.,  1., nan, nan, nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.]),
 (0, 15))

That was an easy case: there’s only a gap of three consecutive data points (thus below the default threshold of five), so we can use the whole sequence (start=0, end=15) and impute those three values.

dummy2 = np.ones(15)
dummy2[2:8] = np.nan
dummy2, bounds(dummy2)
(array([ 1.,  1., nan, nan, nan, nan, nan, nan,  1.,  1.,  1.,  1.,  1.,
         1.,  1.]),
 (8, 15))

The second sequence has six consecutive missing points, so they won’t be imputed, and we’ll only consider the sequence from its 9th data point (index=8).

dummy3 = np.ones(15)
dummy3[2:5] = np.nan
dummy3[8:14] = np.nan
dummy3, bounds(dummy3)
(array([ 1.,  1., nan, nan, nan,  1.,  1.,  1., nan, nan, nan, nan, nan,
        nan,  1.]),
 (0, 8))

The third example is a bit more peculiar, since it has six consecutive missing points towards its end. Therefore, the longest sequence we can use actually ends at the 8th data point (index=7), which is why bounds() returns an (exclusive) end index of 8. This is a rather rare occurrence, since most grid boxes have missing data points in earlier years only.

In our real example, we can use the sequence from 1957 up to 2021:

ini, end = bounds(signal, max_contiguous_na=5)
ini, end, signal[ini:end]
(77,
 142,
 array([-0.325     , -1.5       , -1.1808333 , -1.1816666 ,  0.0975    ,
        -1.9533334 , -0.9291666 , -0.53083336, -0.49250007,  0.5108333 ,
         0.39000002, -1.1016666 ,         nan,  1.0050001 ,  0.6675    ,
         1.6266665 ,  0.31416667, -0.06500002,         nan, -1.4924998 ,
        -1.03      ,         nan, -1.1783333 ,  0.5308334 , -0.175     ,
        -1.6116667 ,  0.11999998,  1.2833333 , -0.84583336, -0.355     ,
         0.8141667 ,         nan,         nan, -0.44000006,  0.65250003,
        -0.07750002, -0.6775    , -0.26916668, -0.5958332 ,  0.9916668 ,
        -0.1091666 , -0.14083336,  0.36166668,  0.23416667, -0.40666673,
        -0.25416663,  0.8541667 , -1.5525001 , -0.04999999, -0.23916666,
         1.6108333 ,  0.8275001 ,  0.41333333,  0.08916668,  2.3008335 ,
         1.5166665 ,  0.705     ,  1.3366666 , -1.1916666 , -1.1216667 ,
         0.39083338,  0.4141667 ,  1.5300001 ,  0.40250006,  0.22500002],
       dtype=float32))

The function below, fill_values(), uses SciPy’s interpolate.interp1d() to impute the missing values in the longest sequence identified by the bounds() function:

Code
from scipy import interpolate

def fill_values(data, 
                max_contiguous_na=5, 
                periods_for_extrapolation=5, 
                method='slinear'):
    time = np.arange(len(data))
    signal = data.copy()
    # If there are no valid data points, there's nothing to fill
    if np.all(np.isnan(signal)):
        return signal
    elif np.any(np.isnan(signal)):
        # Gets minimum and maximum values
        minv, maxv = np.nanmin(signal), np.nanmax(signal)
        # Uses only the values in the longest sequence containing no more than
        # a given number of contiguous missing points (5)
        ini, end = bounds(data, max_contiguous_na)
        signal = signal[ini:end]
        
        # Creates a filler function to interpolate the missing values over time
        try:
            filler_func = interpolate.interp1d(time[ini:end][~np.isnan(signal)], 
                                               signal[~np.isnan(signal)], 
                                               kind=method)
        except ValueError:
            return signal
        
        # Passes the time as argument to the filler function
        filled = filler_func(time[ini:end])
        # Caps interpolated values at actually observed min and max values
        filled = np.minimum(filled, maxv)
        filled = np.maximum(filled, minv)
        if ini > 0:
            # Rebuilds the full sized array, if the longest sequence 
            # doesn't start at zero
            filled = np.concatenate([[np.nan] * ini, filled])
        if end < len(data):
            # If the longest sequence ends before the last time period, 
            # extrapolates the last missing values using the average of
            # a given number of periods
            avg = filled[-periods_for_extrapolation:].mean()
            filled = np.concatenate([filled, 
                                     np.nan_to_num(data[end:].copy(), nan=avg)])
        return filled
    else:
        return signal

We can apply the fill_values() function to our real example:

fill_values(signal[ini:end])
array([-0.32499999, -1.5       , -1.18083334, -1.18166661,  0.0975    ,
       -1.95333338, -0.92916662, -0.53083336, -0.49250007,  0.51083332,
        0.39000002, -1.10166657, -0.04833323,  1.00500011,  0.66750002,
        1.62666655,  0.31416667, -0.06500002, -0.77874992, -1.49249983,
       -1.02999997, -1.10416663, -1.17833328,  0.53083342, -0.175     ,
       -1.61166668,  0.11999998,  1.2833333 , -0.84583336, -0.35499999,
        0.81416672,  0.39611113, -0.02194446, -0.44000006,  0.65250003,
       -0.07750002, -0.67750001, -0.26916668, -0.59583318,  0.99166679,
       -0.1091666 , -0.14083336,  0.36166668,  0.23416667, -0.40666673,
       -0.25416663,  0.85416669, -1.55250013, -0.04999999, -0.23916666,
        1.61083329,  0.8275001 ,  0.41333333,  0.08916668,  2.30083346,
        1.51666653,  0.70499998,  1.33666658, -1.1916666 , -1.12166667,
        0.39083338,  0.41416669,  1.53000009,  0.40250006,  0.22500002])

Moreover, we can create yet another function, fill_series(), to apply the fill_values() function to the time series of each and every grid box.

Code
def fill_series(gridded, 
                max_contiguous_na=3, 
                periods_for_extrapolation=5,
                method='slinear'):
    # Applies the fill_values function over every grid box
    filled = np.apply_along_axis(func1d=lambda s: fill_values(s, 
                                                              max_contiguous_na, 
                                                              periods_for_extrapolation, 
                                                              method),
                                 arr=gridded.reshape(gridded.shape[0], -1).T,
                                 axis=1)
    filled = filled.T.reshape(gridded.shape)
    return filled
filled = fill_series(griddedy)
filled.shape
(142, 36, 72)

The variable filled has the same shape as the gridded data, and it contains imputed values for some of the missing points.
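As a quick check (a small sketch), we can count how many data points were actually imputed:

imputed = np.isnan(griddedy) & ~np.isnan(filled)   # missing before, filled afterwards
imputed.sum()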

Let’s visualize our real example, the grid box close to Antarctica:

Code
gy, gx = 33, 69
fig = plot_evolution(griddedy, gy, gx, filled=filled)

Excellent, there are no more missing points in the observed portion of this time series.

Now, we can use it to perform change point detection (CPD), our next step.

7 Change Point Detection (CPD)

We’re using the ruptures package to detect change points in the series of temperature anomalies corresponding to each grid box.

The change_points() function detects change points using linearly penalized segmentation (Pelt), having l1 as the model, and a penalty of three.

Code
#!pip install ruptures
import ruptures as rpt
from ruptures.exceptions import BadSegmentationParameters

def change_points(data, model='l1', pen=3):
    signal = data.copy()
    # If there are no valid data points, returns one regime
    # that starts at 0, ends at the full length, and is nan
    if np.all(np.isnan(signal)):
        return np.array([0, len(signal)]), np.array([np.nan])

    # Gets the index of the first non-missing data
    ini = np.argmax(~np.isnan(signal))

    # Uses the Pelt method of the ruptures package
    # to find regime changes/breaks
    algo = rpt.Pelt(model=model).fit(signal[ini:])
    try:
        breaks = algo.predict(pen=pen)
    except BadSegmentationParameters:
        # If no changes/breaks were found, assumes
        # the regime goes up to the end of the series
        breaks = [len(signal)-ini]

    # Prepends the series start and shifts the breaks to absolute indices
    breaks = np.array([0, *breaks]) + ini
    # Computes the average of each regime
    averages = np.array([signal[breaks[i-1]:breaks[i]].mean() 
                         for i in range(1, len(breaks))])

    # First regime must be at least 10 years long
    i = np.argmax(np.diff(breaks) >= 10)
    return breaks[i:], averages[i:]

For our real example, we feed the filled time series to the change_points() function, and this is the result:

change_points(filled[:, gy, gx])
(array([ 77, 127, 142]), array([-0.23727499,  0.6299445 ], dtype=float32))

It identified one change point, at index 127 (2007). From the start of the series, in 1957 (index 77), up until 2006, the average temperature anomaly was -0.237 degrees. From 2007 on, the new regime has an average temperature anomaly of 0.630 degrees.
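The difference between the two regime averages is the absolute change implied for this grid box; a quick check:

breaks, averages = change_points(filled[:, gy, gx])
averages[-1] - averages[0]   # latest regime relative to the first (baseline) regime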

We can use this information to construct a time series representing these regimes, and that’s what the function below, change_evolution(), does:

Code
def change_evolution(changes, offset=True):
    breaks, averages = changes
    # Creates a full series using the break points and using the average
    # of each regime
    avg_series = np.concatenate([[averages[i]] * (breaks[i + 1] - breaks[i]) 
                                 for i in range(len(averages))])
    # If offset, shifts the starting point to zero
    if offset:
        avg_series -= avg_series[0]
    # If the first break is other than zero, concatenates empty data at 
    # the start to make the array full size
    if breaks[0] > 0:
        avg_series = np.concatenate([[np.nan] * breaks[0], avg_series])
    return avg_series
change_evolution(change_points(filled[:, gy, gx]), offset=False)
array([        nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan,         nan,         nan,         nan,
               nan,         nan, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499, -0.23727499, -0.23727499, -0.23727499,
       -0.23727499, -0.23727499,  0.6299445 ,  0.6299445 ,  0.6299445 ,
        0.6299445 ,  0.6299445 ,  0.6299445 ,  0.6299445 ,  0.6299445 ,
        0.6299445 ,  0.6299445 ,  0.6299445 ,  0.6299445 ,  0.6299445 ,
        0.6299445 ,  0.6299445 ])

We can also set the offset argument to True to consider the first regime as the baseline, thus computing temperatures with respect to its average instead:

change_evolution(change_points(filled[:, gy, gx]), offset=True)
array([       nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan,        nan,        nan,        nan,
              nan,        nan, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.86721951, 0.86721951, 0.86721951,
       0.86721951, 0.86721951, 0.86721951, 0.86721951, 0.86721951,
       0.86721951, 0.86721951, 0.86721951, 0.86721951, 0.86721951,
       0.86721951, 0.86721951])

In the example above, the second regime exhibits a temperature roughly 0.87 degrees higher than the first, baseline, regime. We’ll use this approach to compute the absolute change in temperature with respect to the first observed regime (in the example above, the regime between 1957 and 2006).

The quality of the estimated absolute change is limited by the availability of data in earlier years, though. In our example, it could have been the case that, before 1957, there was yet another regime with a possibly lower temperature.

If that were the case, the actual change would be even higher than our estimate. Conversely, if that unobserved regime had a higher temperature, the actual change would be lower. Given that the latter case is rarely observed in practice, it’s rather safe to treat the overall estimate of the absolute change in temperature as a lower bound.

Next, we can create two other functions, series_changes() and series_evolution(), to apply the change_points() and change_evolution() functions, respectively, to the time series of each and every grid box.

Code
def series_changes(filled, model='l1', pen=3):
    # Applies the change_points function over every filled series
    changes = np.apply_along_axis(func1d=lambda s: change_points(s, model, pen),
                                  arr=filled.reshape(filled.shape[0], -1).T,
                                  axis=1)
    changes = changes.reshape(*filled.shape[1:], 2)
    return changes
Code
def series_evolution(gridded, changes, offset=True, keep_missing=True):
    # Applies the change_evolution function over every grid box
    missing = np.isnan(gridded.reshape(gridded.shape[0], -1).T)
    evolution = np.apply_along_axis(func1d=lambda s: change_evolution(s, offset),
                                    arr=changes.reshape(-1, 2),
                                    axis=1)
    if keep_missing:
        evolution[missing] = np.nan
    evolution = evolution.T.reshape(gridded.shape)
    return evolution
model = 'l1'
pen = 3
changes = series_changes(filled, model=model, pen=pen)
regimes = series_evolution(griddedy, changes, keep_missing=True)
regimes.shape
(142, 36, 72)

As expected, the regimes variable has the same shape as both filled and griddedy, representing one time series for each grid box.

Let’s visualize the regimes for our grid box close to Antarctica:

Code
gy, gx = 33, 69
fig = plot_evolution(griddedy, gy, gx, 
                     filled=filled, changes=changes, regime=regimes)

8 Absolute Changes in Temperature

Since we have time series of the estimated regimes, we can use them to compute a global average of the absolute change in temperature weighted by the surface area of each grid box.

avg_temp_regimes = surface_stat(regimes, surface_perc)
avg_temp_regimes
array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.00863404, 0.00869392, 0.0095288 , 0.00935041, 0.00905566,
       0.02286673, 0.02684635, 0.0284382 , 0.02915019, 0.03008991,
       0.03348921, 0.03161973, 0.03078799, 0.03283298, 0.03250496,
       0.03158139, 0.03060249, 0.02906777, 0.02820625, 0.02559448,
       0.02694036, 0.0305054 , 0.03330137, 0.03123806, 0.03423738,
       0.04043984, 0.0452272 , 0.0449563 , 0.04931425, 0.05106176,
       0.06527432, 0.06554802, 0.06675241, 0.07094976, 0.07198512,
       0.09538307, 0.101054  , 0.10183483, 0.10148122, 0.10393017,
       0.13250359, 0.13293051, 0.13482252, 0.13729236, 0.14118601,
       0.16000189, 0.16481473, 0.16978522, 0.18086563, 0.19344881,
       0.22351007, 0.22546735, 0.22649203, 0.24322801, 0.22974729,
       0.23892152, 0.24470963, 0.21670845, 0.2101918 , 0.20486365,
       0.20398545, 0.20527823, 0.20239556, 0.20231018, 0.20308164,
       0.20456688, 0.20370002, 0.20492318, 0.20478634, 0.20686934,
       0.20981335, 0.21204866, 0.2143356 , 0.21572274, 0.21344339,
       0.22046692, 0.22191109, 0.22190933, 0.22331023, 0.22627528,
       0.24055303, 0.24281383, 0.25411658, 0.25712333, 0.26394951,
       0.28275697, 0.30668972, 0.3361069 , 0.34702378, 0.34697437,
       0.3974189 , 0.40681937, 0.41492703, 0.41930001, 0.42136172,
       0.43838807, 0.45004648, 0.45576061, 0.4615243 , 0.46840125,
       0.48751201, 0.49167552, 0.49976581, 0.50548547, 0.51417622,
       0.60791707, 0.63340009, 0.65052288, 0.67185301, 0.68097021,
       0.75407967, 0.7549419 , 0.77265617, 0.78323735, 0.78799404,
       0.79857896, 0.80244634, 0.80621087, 0.80784861, 0.80927644,
       0.82207938, 0.82834137, 0.82949779, 0.83665699, 0.84323745,
       0.86435729, 0.86854999, 0.86850925, 0.87068298, 0.87258656,
       0.87397385, 0.87079911])
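The first nonzero value marks the first year in which a regime change was detected somewhere in the globe; a small sketch to find it, assuming the first index corresponds to 1880:

1880 + np.argmax(avg_temp_regimes > 0)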

It was only in the 11th year of the series (1890) that there was a regime change somewhere in the globe. Let’s visualize the data in a plot:

The change in temperature in the 90s is particularly striking!

Finally, let’s take a look at these absolute changes in temperature across the globe in 2021:

Code
year_ini = 2021
year_end = 2021
fig = plot_projection(regimes[year_ini-1880:year_end-1880+1], 
                      seq_offset=year_ini, 
                      add_line=True, animate=True, 
                      colorscale='Jet', colorbar_title='oC', 
                      zmin=-4, zmax=4)

Many regions in the northern hemisphere show regimes that are 3 degrees Celsius or more above their corresponding (observed) first regime since 1880.

To explore the evolution over time, simply select Absolute in the interactive plot on the main page of Heating Planet.

Tip

While exploring the interactive plot, pay attention to how some areas heat up earlier than others, and how warmer regions spread over time.

Did you notice that the landmasses are warming faster than the oceans? Let’s dig deeper into it!

9 Over Land and Sea

First, we need to define a mask for our grid boxes: a Numpy array indicating which grid boxes have more than 50% of their area covered by landmass (one indicates land, zero indicates sea). I’ve prepared this array and saved it in the land.npy file.

Code
import io

response = requests.get('https://heatingplanet.org/assets/data/land.npy')
response.raise_for_status()
land_mask = np.load(io.BytesIO(response.content))
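A quick check (a small sketch) that the mask matches the grid:

land_mask.shape, land_mask.mean()   # grid shape and the fraction of boxes flagged as land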

9.1 Anomalies

Now we can make deep copies of the anomalies (griddedy) and use the land_mask variable to mask either landmasses or oceans. This allows us to compute the average global anomalies over land and sea separately, using the surface_stat function once again. Let’s see the figures for 2021:

Code
from copy import deepcopy

griddedy_land = deepcopy(griddedy)
griddedy_sea = deepcopy(griddedy)
griddedy_land[:, land_mask==0] = np.nan
griddedy_sea[:, land_mask==1] = np.nan
avg_temp_land = surface_stat(griddedy_land, surface_perc)
avg_temp_sea = surface_stat(griddedy_sea, surface_perc)
avg_temp_land[-1], avg_temp_sea[-1]
(0.9967666810102901, 0.3257011776518976)

The average global temperature anomaly over land is roughly 1° Celsius, while over the oceans it is roughly 0.3° Celsius. We can see the evolution of these figures over time in the plots below:

Clearly, there’s a much steeper trend in the rise of temperature over landmasses.

What about the absolute change in temperature?

9.2 Absolute Changes

We can apply the same mask to the absolute changes we’ve previously computed (regimes) and, once again, compute the global averages:

Code
regimes_land = deepcopy(regimes)
regimes_sea = deepcopy(regimes)
regimes_land[:, land_mask==0] = np.nan
regimes_sea[:, land_mask==1] = np.nan
avg_temp_regimes_land = surface_stat(regimes_land, surface_perc)
avg_temp_regimes_sea = surface_stat(regimes_sea, surface_perc)
avg_temp_regimes_land[-1], avg_temp_regimes_sea[-1]
(1.4010793734655795, 0.625477496613651)

In 2021, the last data point in our series, landmasses are 1.4° Celsius warmer than they were in 1880 while the oceans are roughly 0.6° Celsius warmer. We can see the evolution of these figures over time in the plots below:

The rise in temperature over land in the 1980s, 1990s, and early 2000s is particularly striking. In the 1990s, especially, we see both land and sea warming really fast.

10 Saving to File

The interactive plot on the main page uses two .dat files (zvalues and evolution) containing the raw data from NOAA, the estimated regimes we discussed above, and the imputed data points.

The code below generates these two files.

# filled series -> array
n_years = griddedy.shape[0]
filled_locs, filled_years = np.where(np.isnan(griddedy.reshape(n_years, -1).T) 
                                     & ~np.isnan(filled.reshape(n_years, -1).T))
filled_vals = filled.reshape(n_years, -1).T[filled_locs, filled_years]
filled_arrays = np.array([filled_locs, filled_years, filled_vals])

# changes -> array
idxs_breaks = np.cumsum(list(map(len, changes[:, :, 0].reshape(-1))))
vals_breaks = np.array([v for s in changes[:, :, 0].reshape(-1, ) 
                        for v in s])
vals_avgs = np.array([v for s in changes[:, :, 1].reshape(-1, ) 
                      for v in np.concatenate([[np.nan], s])])
chg_arrays = np.zeros(shape=(3, vals_breaks.shape[0])) * np.nan
chg_arrays[0, :idxs_breaks.shape[0]] = idxs_breaks
chg_arrays[1] = vals_breaks
chg_arrays[2] = vals_avgs

save2dat([griddedy, regimes], 'zvalues', swap=False, nan=-999)
save2dat([filled_arrays, chg_arrays], 'evolution', swap=False)