
Seasonal trends with the GOZCARDS Dataset

Here we calculate seasonal trends for the GOZCARDS dataset by regressing the VMR monthly zonal means against predictors that include seasonal terms.

[2]:
import xarray as xr
import numpy as np
from LOTUS_regression.regression import regress_all_bins
from LOTUS_regression.predictors.seasonal import add_seasonal_components
from LOTUS_regression.predictors import load_data

The GOZCARDS data is split across multiple NetCDF4 files. Load them all and concatenate along the time dimension, keeping only values in the latitude range -60 to 60.

[3]:
GOZCARDS_FILES = r'/media/users/data/LOTUS/GOZCARDS/*.nc4'

data = xr.decode_cf(xr.open_mfdataset(GOZCARDS_FILES, concat_dim='time', group='Merged', engine='netcdf4'))

data = data.loc[dict(lat=slice(-60, 60))]

print(data)
<xarray.Dataset>
Dimensions:               (data_source: 6, lat: 12, lev: 25, overlap: 6, time: 384)
Coordinates:
  * lat                   (lat) float32 -55.0 -45.0 -35.0 ... 35.0 45.0 55.0
  * lev                   (lev) float32 1e+03 681.3 464.2 ... 0.2154 0.1468 0.1
  * time                  (time) datetime64[ns] 1979-01-15 ... 2012-12-15
  * data_source           (data_source) int32 1 2 3 4 5 6
  * overlap               (overlap) int32 1 2 3 4 5 6
Data variables: (12/13)
    data_source_name      (time, data_source) |S20 dask.array<chunksize=(12, 6), meta=np.ndarray>
    overlap_start_time    (time, overlap) datetime64[ns] dask.array<chunksize=(12, 6), meta=np.ndarray>
    overlap_end_time      (time, overlap) datetime64[ns] dask.array<chunksize=(12, 6), meta=np.ndarray>
    overlap_used_source   (time, overlap, data_source) int8 dask.array<chunksize=(12, 6, 6), meta=np.ndarray>
    overlap_source_total  (time, overlap, data_source, lev, lat) float64 dask.array<chunksize=(12, 6, 6, 25, 12), meta=np.ndarray>
    average               (time, lev, lat) float32 dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
    ...                    ...
    std_dev               (time, lev, lat) float32 dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
    std_error             (time, lev, lat) float32 dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
    minimum               (time, lev, lat) float32 dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
    maximum               (time, lev, lat) float32 dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
    offset                (time, data_source, lev, lat) float32 dask.array<chunksize=(12, 6, 25, 12), meta=np.ndarray>
    offset_std_error      (time, data_source, lev, lat) float32 dask.array<chunksize=(12, 6, 25, 12), meta=np.ndarray>
Attributes:
    OriginalFileVersion:           1.01
    OriginalFileProcessingCenter:  Jet Propulsion Laboratory / California Ins...
    InputOriginalFile:             All available original source data for the...
    DataQuality:                   All data screening according to the specif...
    DataSource:                    SAGE-I v5.9_rev, SAGE-II v6.2, HALOE v19, ...
    DataSubset:                    Full Dataset
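
Note that the output above was generated with an older version of xarray. In recent xarray releases, passing an explicit concat_dim to open_mfdataset also requires combine='nested'; a minimal sketch of an equivalent call under that assumption:

[ ]:
import xarray as xr

# Equivalent load for recent xarray releases, where an explicit
# concat_dim must be paired with combine='nested'.
data = xr.decode_cf(xr.open_mfdataset(GOZCARDS_FILES, combine='nested',
                                      concat_dim='time', group='Merged',
                                      engine='netcdf4'))
data = data.loc[dict(lat=slice(-60, 60))]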

Load some standard predictors. The baseline file already includes a constant term, as shown in the columns below.

[4]:
predictors = load_data('pred_baseline_pwlt.csv')

print(predictors.columns)
Index(['enso', 'solar', 'qboA', 'qboB', 'aod', 'linear_pre', 'linear_post',
       'constant'],
      dtype='object')

Currently our predictors have no seasonal dependence. Add seasonal terms to selected predictors; the dictionary maps each predictor name to the number of Fourier component pairs to add.

[5]:
predictors = add_seasonal_components(predictors, {'constant': 4, 'linear_pre': 4, 'linear_post': 4, 'qboA': 2, 'qboB': 2})

print(predictors[:5])
                enso     solar      qboA      qboB       aod  linear_pre  \
time
1979-01-01  0.499274  1.568564 -0.991782 -0.925437 -0.395539   -1.800000
1979-02-01  0.281624  1.622217 -1.024426 -1.028774 -0.396092   -1.791667
1979-03-01 -0.070762  1.312212 -1.147750 -0.659631 -0.397189   -1.783333
1979-04-01  0.219438  1.127112 -1.138165 -0.399319 -0.399997   -1.775000
1979-05-01  0.291988  1.002103 -1.205567  0.058949 -0.403881   -1.766667

            linear_post  constant  qboA_sin0  qboA_cos0  ...  \
time                                                     ...
1979-01-01          0.0       1.0  -0.000000  -0.991782  ...
1979-02-01          0.0       1.0  -0.520774  -0.882181  ...
1979-03-01          0.0       1.0  -0.974957  -0.605631  ...
1979-04-01          0.0       1.0  -1.137875  -0.025696  ...
1979-05-01          0.0       1.0  -1.061723   0.571085  ...

            linear_post_sin3  linear_post_cos3  constant_sin0  constant_cos0  \
time
1979-01-01               0.0               0.0       0.000000       1.000000
1979-02-01               0.0              -0.0       0.508356       0.861147
1979-03-01              -0.0              -0.0       0.849450       0.527668
1979-04-01              -0.0               0.0       0.999745       0.022576
1979-05-01               0.0              -0.0       0.880683      -0.473706

            constant_sin1  constant_cos1  constant_sin2  constant_cos2  \
time
1979-01-01       0.000000       1.000000       0.000000       1.000000
1979-02-01       0.875539       0.483147       0.999579      -0.029025
1979-03-01       0.896456      -0.443132       0.096613      -0.995322
1979-04-01       0.045141      -0.998981      -0.997707      -0.067683
1979-05-01      -0.834370      -0.551205      -0.090190       0.995925

            constant_sin3  constant_cos3
time
1979-01-01       0.000000       1.000000
1979-02-01       0.846029      -0.533137
1979-03-01      -0.794497      -0.607268
1979-04-01      -0.090190       0.995925
1979-05-01       0.919817      -0.392347

[5 rows x 40 columns]
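
As the output above suggests, each added column is the base predictor multiplied by sine and cosine harmonics of the annual cycle. A minimal sketch of how equivalent terms could be built by hand (an illustration only, not necessarily the exact add_seasonal_components implementation; the helper name add_seasonal_terms is hypothetical):

[ ]:
import numpy as np

def add_seasonal_terms(preds, components):
    # Illustrative only: multiply each chosen predictor by sin/cos
    # harmonics of the annual cycle (k+1 cycles per year for harmonic k).
    out = preds.copy()
    t = (out.index.dayofyear - 1) / 365.25   # approximate fraction of year
    for name, n_harmonics in components.items():
        for k in range(n_harmonics):
            out[f'{name}_sin{k}'] = out[name] * np.sin(2 * np.pi * (k + 1) * t)
            out[f'{name}_cos{k}'] = out[name] * np.cos(2 * np.pi * (k + 1) * t)
    return out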

Perform the regression and convert the coefficients to percent anomaly. We pass include_monthly_fits=True so that the seasonal fits are used to calculate monthly trends. The results will include 'linear_post_monthly' and 'linear_post_monthly_std', which are the monthly trend terms and their errors, respectively.

[6]:
results = regress_all_bins(predictors, data['average'], tolerance=0.1, include_monthly_fits=True)

# Convert coefficients to approximate percent anomaly relative to the time mean
results /= data['average'].mean(dim='time')
results *= 100

print(results)
<xarray.Dataset>
Dimensions:                  (lat: 12, lev: 25, month: 12)
Coordinates:
  * lev                      (lev) float32 1e+03 681.3 464.2 ... 0.1468 0.1
  * lat                      (lat) float32 -55.0 -45.0 -35.0 ... 35.0 45.0 55.0
  * month                    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables: (12/90)
    enso                     (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    enso_std                 (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    solar                    (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    solar_std                (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    qboA                     (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    qboA_std                 (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    ...                       ...
    constant_cos2            (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    constant_cos2_std        (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    constant_sin3            (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    constant_sin3_std        (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    constant_cos3            (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
    constant_cos3_std        (lev, lat) float64 dask.array<chunksize=(25, 12), meta=np.ndarray>
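
The monthly trend variables mentioned above can be pulled out of the results like any other variable. For example, to look at the monthly post-turnaround trend and its uncertainty in a single bin (the 2 hPa / 45N selection here is just for illustration):

[ ]:
# Select the bin nearest 2 hPa and 45N; the monthly variables are
# indexed by the 'month' coordinate (1-12).
bin_results = results.sel(lev=2, lat=45, method='nearest')

print(bin_results['linear_post_monthly'].values)
print(bin_results['linear_post_monthly_std'].values)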
[7]:
import LOTUS_regression.plotting.trends as trends
import matplotlib.pyplot as plt

# Plot the seasonal post trends at the level closest to 2 hPa (2.15 hPa)
trends.plot_with_confidence(results.sel(lev=2, method='nearest'), 'linear_post_monthly', x='lat', y='month')
plt.xlabel('Latitude')
plt.ylabel('Month')
[7]:
Text(0, 0.5, 'Month')
[Figure: seasonal 'linear_post_monthly' trend (percent) at the level nearest 2 hPa, plotted as latitude vs. month]