Glider Tools: profile data processing¶
Glider Tools is a Python 3.8+ package designed to process glider data from the first level of processing to a science-ready dataset.
The package imports data into a standard column format: numpy.ndarray, pandas.DataFrame or xarray.DataArray (we recommend the latter, which has full support for metadata).
Cleaning and smoothing functions are flexible and can be applied as required by the user.
We provide examples and demonstrate best practices as developed by the SOCCO Group.
For the original publication of this package see: https://doi.org/10.3389/fmars.2019.00738.
For recommendations or bug reports, please visit https://github.com/GliderToolsCommunity/GliderTools/issues/new
Installation¶
Notes on how to install
Conda¶
The easiest way to install the package is with conda: conda install -c conda-forge glidertools.
PyPI¶
You can also install with pip: pip install glidertools.
GitHub¶
For the most up to date version of GliderTools, you can install directly from the online repository hosted on GitHub.
- Clone glidertools to your local machine:
git clone https://github.com/GliderToolsCommunity/GliderTools
- Change to the parent directory of GliderTools
- Install glidertools with pip install -e ./GliderTools. This will allow changes you make locally to be reflected when you import the package in Python.
Cheat Sheet¶
cheat sheet image
Loading data¶
To start using Glider Tools you first need to import the package to the interactive workspace.
Import GliderTools
¶
# pylab for a more MATLAB-like environment; inline displays plots below cells
%pylab inline
# if gsw Warning shows, manually install gsw if possible - will still work without
import glidertools as gt
from cmocean import cm as cmo # we use this for colormaps
Populating the interactive namespace from numpy and matplotlib
Working with Seaglider base station files¶
GliderTools supports loading Seaglider files, including scicon
data (different sampling frequencies).
There is a function that makes it easier to find variable names that you’d like to load: gt.load.seaglider_show_variables
This function is demonstrated in the cell below.
The function accepts a list of file names; it can also receive a string with a wildcard placeholder (*), and basic regular expressions are supported. In the example below we use a simple asterisk placeholder for all the files.
Note that the function chooses only one file from the passed list or glob string - this file name will be shown. The returned table shows the variable name, dimensions, units and a brief comment if it is available.
filenames = '/Users/luke/Work/Data/sg542/p5420*.nc'
gt.load.seaglider_show_variables(filenames)
information is based on file: /Users/luke/Work/Data/sg542/p5420177.nc
<table will be displayed here>
Working with VOTO Seaexplorer files or xarray-datasets¶
GliderTools supports loading SeaExplorer files. This is currently implemented and tested with VOTO (https://observations.voiceoftheocean.org) datasets in mind, but we welcome feedback and pull requests on how it works for other SeaExplorer datasets. VOTO data can either be downloaded from the website using a browser or, more conveniently, from the ERDDAP server (https://erddap.observations.voiceoftheocean.org/erddap/index.html). See the demo notebook (https://github.com/voto-ocean-knowledge/download_glider_data) to get started with downloads over the API.
After download, a .nc file or an in-memory xarray Dataset can be read into GliderTools by calling gt.load.voto_seaexplorer_nc or gt.load.voto_seaexplorer_dataset respectively, as sketched below. Resulting datasets can be merged by calling gt.load.voto_concat_datasets. This completes the import of the data into GliderTools; the remaining steps on this page are optional.
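A minimal sketch of this workflow (the mission file names are hypothetical placeholders, and we assume that voto_concat_datasets accepts a list of datasets):
import glidertools as gt
# read each downloaded VOTO netCDF file into an xarray.Dataset
ds_m15 = gt.load.voto_seaexplorer_nc('nrt_SEA067_M15.nc')
ds_m17 = gt.load.voto_seaexplorer_nc('nrt_SEA067_M17.nc')
# an already-open xarray.Dataset can be passed to
# gt.load.voto_seaexplorer_dataset instead
# concatenate consecutive missions into a single dataset
ds = gt.load.voto_concat_datasets([ds_m15, ds_m17])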
Load variables¶
From the variable listing, one can choose multiple variables to load. Note that one only needs the variable name to load the data. Below, we’ve created a list of variables that we’ll be using for this demo.
The gt.load.seaglider_basestation_netCDFs
function is used to load a list of variables. It requires the filename string or list (as described above) and a list of variable names (keys). These variables may not all be sampled at the same frequency; in this case, the loading function loads each sampling-frequency dimension separately. The function will try to find a time variable for each sampling frequency/dimension.
Coordinates and automatic time fetching¶
All associated coordinate variables will also be loaded with the data if coordinates are documented. These may include latitude, longitude, depth and time (naming may vary). If time cannot be found for a dimension, a time variable from a different dimension with the same number of observations is used instead. This ensures that data can be merged based on the time of sampling.
Merging data based on time¶
If return_merged is set to True, the function will merge the dimensions, provided each dimension has an associated time variable.
The function returns a dictionary of xarray.Datasets. xarray is a Python package for coordinate-indexed multi-dimensional arrays; we recommend that you read its documentation (http://xarray.pydata.org/en/stable/), as it is used throughout GliderTools and allows the original metadata to be carried along with the data. The dictionary keys are the names of the dimensions. If return_merged is set to True, an additional entry under the key merged is included.
The structure of a dimension output is shown below. Note that the merged data will use the largest dimension as the primary dataset and the other data will be merged onto that time index. Data is linearly interpolated to the nearest time measurement of the primary index, but only by one measurement to ensure transparency.
names = [
'ctd_depth',
'ctd_time',
'ctd_pressure',
'salinity',
'temperature',
'eng_wlbb2flvmt_Chlsig',
'eng_wlbb2flvmt_wl470sig',
'eng_wlbb2flvmt_wl700sig',
'aanderaa4330_dissolved_oxygen',
'eng_qsp_PARuV',
]
ds_dict = gt.load.seaglider_basestation_netCDFs(
filenames, names,
return_merged=True,
keep_global_attrs=False
)
DIMENSION: sg_data_point
{
ctd_pressure, eng_wlbb2flvmt_wl470sig, eng_wlbb2flvmt_wl700sig, temperature,
ctd_time, ctd_depth, latitude, aanderaa4330_dissolved_oxygen, salinity,
eng_wlbb2flvmt_Chlsig, longitude
}
100%|██████████| 336/336 [00:04<00:00, 73.66it/s]
DIMENSION: qsp2150_data_point
{eng_qsp_PARuV, time}
100%|██████████| 336/336 [00:01<00:00, 181.67it/s]
Merging dimensions on time indicies: sg_data_point, qsp2150_data_point,
The returned dictionary contains an entry for each dimension of the requested variables; a merged object is also included if return_merged=True.
print(ds_dict.keys())
dict_keys(['sg_data_point', 'qsp2150_data_point', 'merged'])
Metadata handling¶
If the keyword argument keep_global_attrs=True, the global attributes from the original netCDF files (for all attributes that are the same across files) are passed on to the output Datasets. The variable attributes (units, comments, axis, ...) are passed on by default, but this can also be set to False if not wanted. GliderTools functions will automatically pass these attributes on to function outputs if an xarray.DataArray with attributes is given.
All functions applied to the data are also recorded under the variable attribute processing.
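As a quick sketch of inspecting this metadata with plain xarray (using the salinity variable from the dataset loaded above):
# variable attributes (units, comments, axis, ...) travel with the data;
# once GliderTools functions have been applied, their calls are appended
# under the 'processing' attribute
print(ds_dict['merged']['salinity'].attrs)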
The merged dataset contains all the data interpolated to the nearest observation of the longest dimension. The metadata is also shown in the example below.
ds_dict['merged']
<xarray.Dataset>
Dimensions: (merged: 382151)
Coordinates:
ctd_depth (merged) float64 -0.08821 0.018 ... -0.1422
latitude (merged) float64 -42.7 -42.7 ... -43.0 -43.0
longitude (merged) float64 8.744 8.744 ... 8.5 8.5
ctd_time_dt64 (merged) datetime64[ns] 2015-12-08T07:36:16 ...
Dimensions without coordinates: merged
Data variables:
ctd_pressure (merged) float64 -0.08815 0.01889 ... -0.1432
eng_wlbb2flvmt_wl470sig (merged) float64 375.0 367.0 ... 98.0 91.0
eng_wlbb2flvmt_wl700sig (merged) float64 2.647e+03 ... 137.0
temperature (merged) float64 11.55 11.54 ... 11.06 10.97
ctd_time (merged) float64 1.45e+09 ... 1.455e+09
aanderaa4330_dissolved_oxygen (merged) float64 nan nan nan ... 269.1 269.1
salinity (merged) float64 nan nan nan ... 34.11 34.11
eng_wlbb2flvmt_Chlsig (merged) float64 145.0 126.0 ... 215.0 215.0
dives (merged) float64 1.0 1.0 1.0 ... 344.5 344.5
eng_qsp_PARuV (merged) float64 0.551 0.203 ... 0.021 0.023
time (merged) float64 1.45e+09 ... 1.455e+09
time_dt64 (merged) datetime64[ns] 2015-12-08T07:36:16 ...
Attributes:
date_created: 2019-07-11 14:08:40
number_of_dives: 344.0
files: ['p5420001.nc', 'p5420002.nc', 'p5420004.nc', '...
time_coverage_start: 2015-12-08 07:36:16
time_coverage_end: 2016-02-08 04:39:04
geospatial_vertical_min: -0.6323553853732649
geospatial_vertical_max: 1011.1000623417478
geospatial_lat_min: -43.085757609206
geospatial_lat_max: -42.70088638031523
geospatial_lon_min: 8.29983279020758
geospatial_lon_max: 8.7753734452125
processing: [2019-07-11 14:08:40] imported data with Glider...
Renaming for ease of access¶
When renaming, just be sure that no existing variables already use the names you are renaming to. In the example below we remove time in case it exists in the files.
# Here we drop the time variables imported for the PAR variable
# we don't need these anymore. You might have to change this
# depending on the dataset
merged = ds_dict['merged']
if 'time' in merged:
merged = merged.drop(["time", "time_dt64"])
# To make it easier and clearer to work with, we rename the
# original variables to something that makes more sense. This
# is done with the xarray.Dataset.rename({}) function.
# We only use the merged dataset as this contains all the
# imported dimensions.
# NOTE: The renaming has to be specific to the dataset otherwise an error will occur
dat = merged.rename({
'salinity': 'salt_raw',
'temperature': 'temp_raw',
'ctd_pressure': 'pressure',
'ctd_depth': 'depth',
'ctd_time_dt64': 'time',
'ctd_time': 'time_raw',
'eng_wlbb2flvmt_wl700sig': 'bb700_raw',
'eng_wlbb2flvmt_wl470sig': 'bb470_raw',
'eng_wlbb2flvmt_Chlsig': 'flr_raw',
'eng_qsp_PARuV': 'par_raw',
'aanderaa4330_dissolved_oxygen': 'oxy_raw',
})
print(dat)
# variable assignment for convenient access
depth = dat.depth
dives = dat.dives
lats = dat.latitude
lons = dat.longitude
time = dat.time
pres = dat.pressure
temp = dat.temp_raw
salt = dat.salt_raw
par = dat.par_raw
bb700 = dat.bb700_raw
bb470 = dat.bb470_raw
fluor = dat.flr_raw
# name coordinates for quicker plotting
x = dat.dives
y = dat.depth
Quality Control¶
Note that this summary carries on from the Loading data page.
The cleaning
module contains several tools that help to remove erroneous data - profiles or points.
These filters can be applied globally (IQR and standard deviation limits), vertically (running average filters) or horizontally (horizontal filters on gridded data only).
There are also two approaches one can use to clean data: 1) filtering out bad points/dives; 2) smoothing data.
Original Data¶
Below we use salinity to demonstrate the different functions available to users.
dives = dat.dives
depth = dat.depth
salt = dat.salt_raw
x = np.array(dives) # ensures these are arrays
y = np.array(depth)
gt.plot(dives, depth, salt, cmap=cmo.haline, robust=True)
plt.xlim(50, 150)
plt.title('Original Data')
plt.show()
png
Global filtering: outlier limits (IQR & STD)¶
These functions find upper and lower limits for data outliers using the interquartile range or standard deviation of the entire dataset. Multipliers can be set to make the filters more or less strict.
salt_iqr = gt.cleaning.outlier_bounds_iqr(salt, multiplier=1.5)
salt_std = gt.cleaning.outlier_bounds_std(salt, multiplier=1.5)
# Plotting
gt.plot(x, y, salt_iqr, cmap=cmo.haline, robust=True)
plt.title('Outlier Bounds IQR Method')
plt.xlim(50,150)
gt.plot(x, y, salt_std, cmap=cmo.haline, robust=True)
plt.title('Outlier Bounds Stdev Method')
plt.xlim(50,150)
plt.show()
png
png
Horizontal filtering: differential outliers¶
Erroneous measurements often occur sequentially - i.e. in the vertical. Vertical filtering approaches, which typically use rolling windows, would thus miss these outliers. It is therefore useful to have an approach that compares dives in the horizontal. The horizontal_diff_outliers function first grids the data and then calculates where gradients (rolling mean - measurement) are outliers (using the same approach as outlier_bounds_std). If a certain fraction of measurements in a dive exceeds the threshold, that dive is deemed a bad dive. The example below shows three dives that have anomalous measurements. These fall well within the global bounds of acceptable data, but horizontally they are masked out.
salt_horz = gt.cleaning.horizontal_diff_outliers(
x, y, salt,
multiplier=3,
depth_threshold=400,
mask_frac=0.1
)
gt.plot(x, y, salt, cmap=cmo.haline)
plt.title('Original dataset')
plt.xlim(150,250)
plt.show()
gt.plot(x, y, salt_horz, cmap=cmo.haline)
plt.title('Horizontal Differential Outliers removed')
plt.xlim(150,250)
plt.show()
png
png
Vertical smoothing approaches¶
Despiking¶
This approach was used by Briggs et al. (2010). The idea is to apply a rolling filter to the data (along the time dimension). This forms the baseline. The difference between the original data and the baseline gives the spikes.
There are two rolling filters that can be applied to the data. The median approach is the equivalent of a rolling median. The minmax approach first applies a rolling minimum and then rolling maximum to data. This is useful particularly for optics data where spikes are particles in the water column and are not normally distributed.
In the case of salinity, the median approach is likely best, as “spikes” would be positive and negative (Gaussian distribution).
salt_base, salt_spike = gt.cleaning.despike(salt, window_size=5, spike_method='median')
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, salt_base, cmap=cmo.haline, ax=ax[0])
ax[0].set_title('Despiked using median filter')
ax[0].cb.set_label('Salinity baseline')
ax[0].set_xlim(50,150)
ax[0].set_xlabel('')
gt.plot(x, y, salt_spike, cmap=cm.RdBu_r, vmin=-6e-3, vmax=6e-3, ax=ax[1])
ax[1].cb.set_label('Salinity spikes')
ax[1].set_xlim(50,150)
plt.xticks(rotation=0)
plt.show()
png
Rolling window¶
The rolling window method simply applies an aggregating function (mean, median, std, min, max) to the dataset.
Because the above example is equivalent to a rolling median, we show what a rolling 75th percentile
looks like instead.
Users could use this approach to create additional filters. Note that in this more complex example we create a wrapper function for the percentile so that we can tell np.percentile that we want the 75th percentile and along which axis to calculate it.
def seventyfifth(x, axis=0):
    # wrapper function so we can pass the axis and percentile to
    # the input function
    return np.percentile(x, 75, axis=axis)
# other numpy functions also work: np.mean, np.median, np.std
salt_roll75 = gt.cleaning.rolling_window(salt, seventyfifth, window=5)
salt_rollavg = gt.cleaning.rolling_window(salt, np.mean, window=5)
# PLOTTING
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, salt_roll75, cmap=cmo.haline, ax=ax[0])
ax[0].set_title('75$^{th}$ for a rolling window with size 5')
ax[0].cb.set_label('Salinity baseline')
ax[0].set_xlim(50,150)
ax[0].set_xlabel('')
gt.plot(x, y, salt_roll75 - salt, cmap=cm.RdBu_r, vmin=-6e-3, vmax=6e-3, ax=ax[1])
ax[1].cb.set_label('Difference from original data')
ax[1].set_xlim(50,150)
plt.xticks(rotation=0)
plt.show()
png
Savitzky-Golay¶
The Savitzky-Golay function fits a low-order polynomial to a rolling window of the time series. This has the result of smoothing the data. A larger window with a lower order polynomial will have a smoother fit.
We recommend a 2nd order kernel. Here we use first order to show that the difference can be quite big.
salt_savgol = gt.cleaning.savitzky_golay(salt, window_size=11, order=1)
# PLOTTING
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, salt_savgol, cmap=cmo.haline, ax=ax[0])
ax[0].set_title('Smoothing the data with Savitzky-Golay')
ax[0].cb.set_label('Smoothed salinity')
ax[0].set_xlim(50,150)
ax[0].set_xlabel('')
gt.plot(x, y, salt_savgol - salt, cmap=cm.RdBu, vmin=-6e-3, vmax=6e-3, ax=ax[1])
ax[1].cb.set_label('Difference from original')
ax[1].set_xlim(50,150)
plt.show()
png
Wrapper functions¶
Wrapper functions have been designed to make this process more efficient, which is demonstrated below with temperature and salinity.
temp_qc = gt.calc_physics(temp, x, y,
iqr=1.5, depth_threshold=0,
spike_window=5, spike_method='median',
savitzky_golay_window=11, savitzky_golay_order=2)
# PLOTTING
fig, ax = plt.subplots(3, 1, figsize=[9, 8.5], sharex=True, dpi=90)
gt.plot(x, y, temp, cmap=cmo.thermal, ax=ax[0])
gt.plot(x, y, temp_qc, cmap=cmo.thermal, ax=ax[1])
gt.plot(x, y, temp_qc - temp, cmap=cm.RdBu_r, vmin=-0.05, vmax=0.05, ax=ax[2])
[a.set_xlabel('') for a in ax]
ax[0].cb.set_label('Original Data')
ax[1].cb.set_label('Cleaned Data')
ax[2].cb.set_label('Difference from Original')
plt.show()
==================================================
Physics Variable:
Removing outliers with IQR * 1.5: 0 obs
Removing spikes with rolling median (spike window=5)
Smoothing with Savitzky-Golay filter (window=11, order=2)
png
salt_qc = gt.calc_physics(salt, x, y,
mask_frac=0.2, iqr=2.5,
spike_window=5, spike_method='median',
savitzky_golay_window=11, savitzky_golay_order=2)
# PLOTTING
fig, ax = plt.subplots(3, 1, figsize=[9, 8.5], sharex=True, dpi=90)
gt.plot(x, y, salt, cmap=cmo.haline, ax=ax[0])
gt.plot(x, y, salt_qc, cmap=cmo.haline, ax=ax[1])
gt.plot(x, y, salt_qc - salt, cmap=cm.RdBu_r, vmin=-0.02, vmax=0.02, ax=ax[2])
[a.set_xlabel('') for a in ax]
ax[0].cb.set_label('Original Data')
ax[1].cb.set_label('Cleaned Data')
ax[2].cb.set_label('Difference from Original')
plt.show()
==================================================
Physics Variable:
Removing outliers with IQR * 2.5: 1551 obs
Removing spikes with rolling median (spike window=5)
Removing horizontal outliers (fraction=0.2, multiplier=2.5)
Smoothing with Savitzky-Golay filter (window=11, order=2)
png
dat['temp_qc'] = temp_qc
dat['salt_qc'] = salt_qc
Secondary physical variables¶
Density¶
GliderTools provides a wrapper to calculate potential density.
This is done by first calculating potential temperature and then calculating absolute salinity.
A reference depth of 0 is used by default.
dens0 = gt.physics.potential_density(salt_qc, temp_qc, pres, lats, lons)
dat['density'] = dens0
gt.plot(dat.dives, dat.depth, dens0, cmap=cmo.dense)
plt.xlim(50,150)
plt.show()
png
Mixed Layer Depth¶
import matplotlib.pyplot as plt
mld = gt.physics.mixed_layer_depth(dat, 'density', verbose=False)
mld_smoothed = mld.rolling(10, min_periods=3).mean()
mld_mask = gt.utils.mask_below_depth(dat, mld)
mld_grid = gt.grid_data(dat.dives, dat.depth, mld_mask, verbose=False)
fig, ax = plt.subplots(1, 2, figsize=[9, 3], dpi=100, sharey=True)
mld_smoothed.plot(ax=ax[0])
gt.plot(mld_grid, ax=ax[1])
[a.set_ylim(100, 0) for a in ax]
ax[0].set_ylabel('Depth (m)')
[a.set_xlabel('Dives') for a in ax]
plt.xticks(rotation=0)
fig.tight_layout()
/Users/luke/Git/GliderTools/glidertools/helpers.py:61: GliderToolsWarning:
Primary input variable is not xr.DataArray data type - no metadata to pass on.
png
Optics (BB, PAR, Chl)¶
The optics module contains functions that process backscatter, PAR and fluorescence.
There is a wrapper function for each of these variables that applies several functions related to cleaning and processing. We show each step of the wrapper function separately and then summarise with the wrapper function.
Backscatter¶
theta = 124
xfactor = 1.076
gt.plot(x, y, bb700, cmap=cmo.delta, vmin=60, vmax=200)
xlim(200,340)
title('Original Data')
show()
png
Outlier bounds method¶
See the cleaning section for more information on the gt.cleaning.outlier_bounds_* functions.
bb700_iqr = gt.cleaning.outlier_bounds_iqr(bb700, multiplier=3)
bb700_std = gt.cleaning.outlier_bounds_std(bb700, multiplier=3)
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, bb700_iqr, cmap=cmo.delta, ax=ax[0], vmin=60, vmax=200)
gt.plot(x, y, bb700_std, cmap=cmo.delta, ax=ax[1], vmin=60, vmax=200)
[a.set_xlabel('') for a in ax]
[a.set_xlim(200, 340) for a in ax]
ax[0].set_title('Outlier IQR')
ax[1].set_title('Outlier STD')
plt.show()
png
Removing bad profiles¶
This function masks bad dives based on mean + (std * multiplier) or median + (std * multiplier) at a reference depth; here the multiplier is 1.
# find_bad_profiles returns boolean mask and dive numbers
# we index only the mask
bad_profiles = gt.optics.find_bad_profiles(dives, depth, bb700,
ref_depth=300,
stdev_multiplier=1,
method='median')[0]
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
# ~ reverses True to False and vice versa - i.e. we mask out bad profiles
gt.plot(x, y, bb700, cmap=cmo.delta, ax=ax[0], vmin=60, vmax=200)
gt.plot(x, y, bb700.where(~bad_profiles), cmap=cmo.delta, ax=ax[1], vmin=60, vmax=200)
[a.set_xlabel('') for a in ax]
[a.set_xlim(40, 120) for a in ax]
ax[0].set_title('All backscatter data')
ax[1].set_title('Bad profiles masked')
plt.show()
png
Conversion from counts to total backscatter¶
The scale and offset function uses the factory calibration dark count and scale factor.
The bback total function uses the coefficients from Zhang et al. (2009) to convert the raw counts into total backscatter (m$^{-1}$), correcting for temperature and salinity. The $\chi$ factor and $\theta$ in this example were taken from Sullivan et al. (2013) and Slade & Boss (2015).
beta = gt.flo_functions.flo_scale_and_offset(bb700.where(~bad_profiles), 49, 3.217e-5)
bbp = gt.flo_functions.flo_bback_total(beta, temp_qc, salt_qc, theta, 700, xfactor)
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, beta, cmap=cmo.delta, ax=ax[0], robust=True)
gt.plot(x, y, bbp, cmap=cmo.delta, ax=ax[1], robust=True)
[a.set_xlabel('') for a in ax]
[a.set_xlim(200, 340) for a in ax]
[a.set_ylim(400, 0) for a in ax]
ax[0].set_title('$\u03B2$')
ax[1].set_title('b$_{bp}$ (m$^{-1}$)')
plt.show()
png
Correcting for an in situ dark count¶
Sensor drift from factory calibration requires an additional correction, the calculation of a dark count in situ. This is calculated from the 95th percentile of backscatter measurements between 200 and 400m.
bbp = gt.optics.backscatter_dark_count(bbp, depth)
gt.plot(x, y, bbp, cmap=cmo.delta, robust=True)
xlim(200,340)
title('b$_{bp}$ (m$^{-1}$)')
show()
png
Despiking¶
We follow the methods outlined in Briggs et al. (2011) to both identify spikes in backscatter and remove them from the baseline backscatter signal. The spikes are retained, as the data can be used to address specific science questions, but their presence can decrease the accuracy of the fluorescence quenching function.
bbp_horz = gt.cleaning.horizontal_diff_outliers(x, y, bbp, depth_threshold=10, mask_frac=0.05)
bbp_baseline, bbp_spikes = gt.cleaning.despike(bbp_horz, 7, spike_method='minmax')
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, bbp_baseline, cmap=cmo.delta, ax=ax[0], robust=True)
gt.plot(x, y, bbp_spikes, ax=ax[1], cmap=cm.Spectral_r, vmin=0, vmax=0.004)
[a.set_xlabel('') for a in ax]
[a.set_xlim(200, 340) for a in ax]
ax[0].set_title('Despiked b$_{bp}$ (m$^{-1}$)')
ax[1].set_title('b$_{bp}$ (m$^{-1}$) spikes')
plt.show()
png
Adding the corrected variables to the original dataframe¶
dat['bbp700'] = bbp_baseline
dat['bbp700_spikes'] = bbp_spikes
Wrapper function demonstration¶
A wrapper function was also designed, which is demonstrated below with the second wavelength (700 nm). The default option is for verbose to be True, which will provide an output of the different processing steps.
bbp_baseline, bbp_spikes = gt.calc_backscatter(
bb700, temp_qc, salt_qc, dives, depth, 700, 49, 3.217e-5,
spike_window=11, spike_method='minmax', iqr=2., profiles_ref_depth=300,
deep_multiplier=1, deep_method='median', verbose=True)
dat['bbp700'] = bbp_baseline
dat['bbp700_spikes'] = bbp_spikes
ax = gt.plot(x, y, dat.bbp700, cmap=cmo.delta)
ax.set_xlim(200, 340)
plt.show()
==================================================
bb700:
Removing outliers with IQR * 2.0: 8606 obs
Mask bad profiles based on deep values (depth=300m)
Number of bad profiles = 27/672
Zhang et al. (2009) correction
Dark count correction
Spike identification (spike window=11)
png
bbp_baseline, bbp_spikes = gt.calc_backscatter(
bb470, temp_qc, salt_qc, dives, depth, 470, 47, 1.569e-5,
spike_window=7, spike_method='minmax', iqr=3, profiles_ref_depth=300,
deep_multiplier=1, deep_method='median', verbose=True)
dat['bbp470'] = bbp_baseline
dat['bbp470_spikes'] = bbp_spikes
gt.plot(x, y, dat.bbp470, cmap=cmo.delta)
plt.show()
==================================================
bb470:
Removing outliers with IQR * 3: 2474 obs
Mask bad profiles based on deep values (depth=300m)
Number of bad profiles = 16/672
Zhang et al. (2009) correction
Dark count correction
Spike identification (spike window=7)
png
PAR¶
PAR Scaling¶
This function uses the factory calibration to convert from $\mu$V to $\mu$E m$^{-2}$ s$^{-1}$.
par_scaled = gt.optics.par_scaling(par, 6.202e-4, 10.8)
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, par, cmap=cmo.solar, ax=ax[0], robust=True)
gt.plot(x, y, par_scaled, cmap=cmo.solar, ax=ax[1], robust=True)
[a.set_xlabel('') for a in ax]
[a.set_xlim(200, 340) for a in ax]
[a.set_ylim(70, 0) for a in ax]
ax[0].set_title('PAR ($\mu$V)')
ax[1].set_title('PAR ($\mu$E m$^{-2}$ s$^{-1}$)')
plt.show()
png
Correcting for an in situ dark count¶
Sensor drift from factory calibration requires an additional correction, the calculation of a dark count in situ. This is calculated from the median of PAR measurements, with additional masking applied for values before 23:01 and outside the 90th percentile.
par_dark = gt.optics.par_dark_count(par_scaled, dives, depth, time)
gt.plot(x, y, par_dark, robust=True, cmap=cmo.solar)
xlim(200,340)
ylim(70,0)
title('PAR ($\mu$E m$^{-2}$ s$^{-1}$)')
show()
png
PAR replacement¶
This function removes the top 5 metres from each dive profile, and then algebraically recalculates the surface PAR using an exponential equation.
par_filled = gt.optics.par_fill_surface(par_dark, dives, depth, max_curve_depth=80)
par_filled[par_filled < 0] = 0
par_filled = par_filled.fillna(0)
i = dives == 232
fig, ax = subplots(1, 2, figsize=[6,6], dpi=100)
ax[0].plot(par_dark[i], depth[i], lw=0.5, marker='o', ms=5)
ax[0].plot(par_filled[i], depth[i], lw=0.5, marker='o', ms=3)
ax[1].plot(par_filled[i] - par_dark[i], depth[i], lw=0, marker='o')
ax[0].set_ylim(80,0)
ax[0].set_ylabel('Depth (m)')
ax[0].set_xlabel('PAR ($\mu$E m$^{-2}$ s$^{-1}$)')
ax[1].set_ylim(80,0)
ax[1].set_xlim(-350,350)
ax[1].set_yticklabels('')
ax[1].set_xlabel('Difference between profiles')
fig.tight_layout()
plt.show()
png
gt.plot(x, y, par_filled, robust=True, cmap=cmo.solar)
xlim(200,340)
ylim(100,0)
title('PAR ($\mu$E m$^{-2}$ s$^{-1}$)')
show()
png
Wrapper function demonstration¶
par_qc = gt.calc_par(par, dives, depth, time,
6.202e-4, 10.8,
curve_max_depth=80,
verbose=True).fillna(0)
gt.plot(x, y, par_qc, robust=True, cmap=cmo.solar)
ylim(80, 0)
show()
==================================================
PAR
Dark correction
Fitting exponential curve to data
png
Deriving additional variables¶
Euphotic Depth and Light attenuation coefficient¶
euphotic_depth, kd = gt.optics.photic_depth(
par_filled, dives, depth,
return_mask=False,
ref_percentage=1
)
fig, ax = subplots(1, 1, figsize=[6,4], dpi=100)
p1 = plot(euphotic_depth.index, euphotic_depth, label='Euphotic Depth')
ylim(120,0)
ylabel('Euphotic Depth (m)')
xlabel('Dives')
ax2 = ax.twinx()
p2 = plot(kd.index, kd, color='orange', lw=0, marker='o', ms=2, label='K$_d$')
ylabel('K$_d$', rotation=270, labelpad=20)
lns = p1+p2
labs = [l.get_label() for l in lns]
ax2.legend(lns, labs, loc=3, numpoints=1)
show()
png
Fluorescence¶
Quenching correction method as outlined in Thomalla et al. (2017).
gt.plot(x, y, fluor, cmap=cmo.delta, robust=True)
xlim(150,300)
title('Original Data')
show()
png
Outlier bounds method¶
flr_iqr = gt.cleaning.outlier_bounds_iqr(fluor, multiplier=3)
gt.plot(x, y, flr_iqr, cmap=cmo.delta, robust=True)
title('Outlier Bounds IQR Method')
xlim(150,300)
show()
png
Removing bad profiles¶
This function masks bad dives based on mean + (std * multiplier) or median + (std * multiplier) at a reference depth; here a multiplier of 4 with the mean method is used.
bad_profiles = gt.optics.find_bad_profiles(dives, depth, flr_iqr,
ref_depth=300,
stdev_multiplier=4,
method='mean')
flr_goodprof = flr_iqr.where(~bad_profiles[0])
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, flr_iqr, cmap=cmo.delta, ax=ax[0], robust=True)
gt.plot(x, y, flr_goodprof, cmap=cmo.delta, ax=ax[1], robust=True)
[a.set_xlabel('') for a in ax]
[a.set_xlim(90, 150) for a in ax]
[a.set_ylim(300, 0) for a in ax]
ax[0].set_title('Bad Profiles Included')
ax[1].set_title('Bad Profiles Discarded')
plt.show()
png
Correcting for an in situ dark count¶
Sensor drift from factory calibration requires an additional correction, the calculation of a dark count in situ. This is calculated from the 95th percentile of fluorescence measurements between 300 and 400m.
flr_dark = gt.optics.fluorescence_dark_count(flr_iqr, dat.depth)
gt.plot(x, y, flr_dark, cmap=cmo.delta, robust=True)
xlim(150,300)
show()
png
Despiking¶
flr_base, flr_spikes = gt.cleaning.despike(flr_dark, 11, spike_method='median')
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, flr_base, cmap=cmo.delta, ax=ax[0], robust=True)
gt.plot(x, y, flr_spikes, cmap=cm.RdBu_r, ax=ax[1], vmin=-5, vmax=5)
[a.set_xlabel('') for a in ax]
[a.set_xlim(150, 300) for a in ax]
[a.set_ylim(300, 0) for a in ax]
ax[0].set_title('Despiked Fluorescence')
ax[1].set_title('Fluorescence spikes')
plt.show()
png
Quenching Correction¶
This function uses the method outlined in Thomalla et al. (2017): briefly, it calculates the quenching depth and performs the quenching correction based on the fluorescence-to-backscatter ratio. The quenching depth is calculated from the difference between night and daytime fluorescence.
The default setting is for the preceding night to be used to correct the following day's quenching (night_day_group=True). This can be changed so that the following night is used to correct the preceding day. The quenching depth is then found from the difference between the night and daytime fluorescence, using the steepest gradient of the five minimum differences and the points where the difference changes sign (+ve/-ve).
The function takes the backscatter/fluorescence ratio between the quenching depth and the surface, and then calculates a mean nighttime ratio for each night. The quenching ratio is calculated from the nighttime and daytime ratios and is then applied to the fluorescence to correct for quenching. If the corrected value is less than the raw value, the function returns the original raw data.
flr_qc, quench_layer = gt.optics.quenching_correction(
flr_base, dat.bbp470, dives, depth, time, lats, lons,
sunrise_sunset_offset=1, night_day_group=True)
fig, ax = plt.subplots(2, 1, figsize=[9, 6], sharex=True, dpi=90)
gt.plot(x, y, flr_qc, cmap=cmo.delta, ax=ax[0], robust=True)
gt.plot(x, y, quench_layer, cmap=cm.RdBu_r, ax=ax[1], vmin=-.5, vmax=2)
[a.set_xlabel('') for a in ax]
[a.set_xlim(150, 300) for a in ax]
[a.set_ylim(100, 0) for a in ax]
ax[0].set_title('Quenching Corrected Fluorescence')
ax[1].set_title('Quenching Layer')
plt.show()
png
Wrapper function¶
flr_qnch, flr, qnch_layer, [fig1, fig2] = gt.calc_fluorescence(
fluor, dat.bbp700, dives, depth, time, lats, lons, 53, 0.0121,
profiles_ref_depth=300, deep_method='mean', deep_multiplier=1,
spike_window=11, spike_method='median', return_figure=True,
night_day_group=False, sunrise_sunset_offset=2, verbose=True)
dat['flr_qc'] = flr
==================================================
Fluorescence
Mask bad profiles based on deep values (ref depth=300m)
Number of bad profiles = 19/672
Dark count correction
Quenching correction
Spike identification (spike window=11)
Generating figures for despiking and quenching report
png
png
Calibration with bottle samples¶
Bottle calibration can also be done using the calibration
module.
The bottle file needs to be in a specific format with dates (datetime64 format), depth and the variable values. This can be imported with any method available. We recommend pandas.read_csv, as shown in the example below. Note that latitude and longitude are not taken into account, so the user needs to make sure that the CTD cast was at the correct location (time will be used to match up with the glider).
import pandas as pd
fname = '/Users/luke/Work/Publications/2019_Gregor_Front_glider/figures/SOSCEX 3 PS1.csv'
cal = pd.read_csv(fname, parse_dates=['datetime'], dayfirst=True)
The calibration.bottle_matchup
function returns an array that matches the size of the ungridded glider data.
The matching is done based on depth and time from both the glider and the CTD. The function will show how many samples have been matched and the smallest time difference between a CTD rosette cast and a dive (any time on the dive).
Using depth¶
dat['bottle_sal'] = gt.calibration.bottle_matchup(
dat.dives, dat.depth, dat.time,
cal.depth, cal.datetime, cal.sal)
model = gt.calibration.robust_linear_fit(dat.salt_qc, dat.bottle_sal, fit_intercept=True, epsilon=1.5)
dat['salinity_qc'] = model.predict(dat.salt_qc)
[stn 0/5] FAILED: 2015-07-28 10:25 Couldn't find samples within constraints
[stn 1/5] FAILED: 2015-07-28 16:15 Couldn't find samples within constraints
[stn 2/5] FAILED: 2015-12-08 03:23 Couldn't find samples within constraints
[stn 3/5] SUCCESS: 2016-01-05 17:46 (15 of 15 samples) match-up within 0.0 minutes
[stn 4/5] SUCCESS: 2016-02-08 03:14 (12 of 17 samples) match-up within 0.0 minutes
(13, 1) (100, 1)
png
Using Density¶
dat['bottle_sal'] = gt.calibration.bottle_matchup(
dat.dives, dat.density, dat.time,
cal.density, cal.datetime, cal.sal)
model = gt.calibration.robust_linear_fit(dat.salt_qc, dat.bottle_sal, fit_intercept=True, epsilon=1.5)
dat['salinity_qc'] = model.predict(dat.salt_qc)
[stn 0/5] FAILED: 2015-07-28 10:25 Couldn't find samples within constraints
[stn 1/5] FAILED: 2015-07-28 16:15 Couldn't find samples within constraints
[stn 2/5] FAILED: 2015-12-08 03:23 Couldn't find samples within constraints
[stn 3/5] SUCCESS: 2016-01-05 17:46 (15 of 15 samples) match-up within 0.0 minutes
[stn 4/5] SUCCESS: 2016-02-08 03:14 (16 of 17 samples) match-up within 0.0 minutes
(6, 1) (100, 1)
png
Gridding and interpolation¶
Vertical gridding¶
It is often more convenient and computationally efficient to work with data that has been gridded to a standard vertical grid (i.e. depths have been binned). GliderTools offers easy-to-use and efficient tools to grid data once all the processing has been completed.
The first task is to select the bin size of the data that will be gridded. GliderTools automatically selects bin sizes according to the sampling frequency of the dataset in each 50 m depth interval. This is shown in the figure below, where the 2D histogram shows the sampling frequency (by depth) and the line shows the automatically selected bin size rounded up to the nearest 0.5 m.
ax = gt.plot.bin_size(dat.depth, cmap=plt.cm.Blues)
ax.set_xlim(0, 6)
line = ax.get_children()[1]
line.set_linewidth(6)
line.set_color('orange')
legend = ax.get_children()[-2]
legend.set_visible(False)
png
Gridding with automatic bin sizes¶
Gridding the data then becomes easy with automatic binning. Note that the x-coordinate has to be semi-discrete, e.g. dive number or average dive time stamp. You'll see that the gridding function also reports the mean bin size and the average sampling frequency.
The function can return either an xr.DataArray or a pd.DataFrame. The DataArray is the default as metadata can be stored in these files (including coordinate information).
Gridded data can be passed to the plot function without x- and y-coordinates, as these are contained in the gridded data.
In fact, ungridded data is silently passed through the gridding function when x- and y-coordinates are included in the gt.plot call.
flr_gridded = gt.grid_data(dives, depth, flr)
ax = gt.plot(flr_gridded, cmap=cmo.delta)
ax.set_ylim(200, 0)
Mean bin size = 1.99
Mean depth binned (50 m) vertical sampling frequency = 2.53
png
Gridding with manually defined bins¶
There is also the option to manually define your bins if you'd prefer.
A custom bin array needs to be created.
Use np.arange
to create sections of the bins and combine them with np.r_
as shown below:
custom_bin = np.r_[
np.arange(0, 100, 0.5),
np.arange(100, 400, 1.0),
np.arange(400, 1000, 2.0)]
flr_gridded = gt.grid_data(x, y, flr, bins=custom_bin)
# The plot below is the standard plotting procedure for an xarray.DataArray
gt.plot(flr_gridded, cmap=cmo.delta)
ylim(200, 0)
Mean bin size = 1.25
Mean depth binned (50 m) vertical sampling frequency = 2.53
(200, 0)
png
2D interpolation with objective mapping (Kriging)¶
Users may want to interpolate data horizontally when working with finescale gradients.
Several studies have used the objmap
MATLAB function that uses objective mapping (a.k.a. Kriging).
Kriging is an advanced form of inverse-distance-weighted interpolation, where points influence the interpolation based on their distance from an interpolation point, with the influence falling off as a Gaussian function.
This is an expensive function when the dataset is large (due to a matrix inverse operation).
The computational cost is reduced by breaking the problem into smaller pieces using a quadtree that iteratively breaks data into smaller problems.
GliderTools provides a Python implementation of the MATLAB function. We have added parallel capability to speed the processing up, but this operation is still costly and could take several hours if an entire section is interpolated. We thus recommend that smaller sections are interpolated.
# first we select a subset of data (50k points)
subs = dat.isel(merged=slice(0, 50000))
# we then get time values - this makes creating the interpolation grid easier
var = subs.flr_qc
time = subs.time.values
depth = subs.depth
dives = subs.dives
dist = np.r_[0, gt.utils.distance(subs.longitude, subs.latitude).cumsum()]
Part 1: Semivariance¶
Interpolating any variable requires some knowledge about the spatial autocorrelation of that variable. A semivariogram allows one to get this information from the data. The basic idea of a semivariogram is to assess the similarity between data at different lengthscales (lags), where a low semivariance shows coherence and a large semivariance shows a mismatch. This information is required to interpolate data with sensible estimates and error estimates.
GliderTools offers a variogram tool (gt.mapping.variogram) that makes the process of finding these parameters a little easier, though a fair deal of subjectivity (depending on the scale of the question at hand) and some tinkering are required to make a sensible interpolation.
1.1. Choosing a subset of the data for semivariance estimation¶
The variogram function selects a number of dives (the number depends on max_points) and performs the analysis on that subset of dives rather than selecting random points. We thus recommend that a subset of the data is used to perform the analysis. In the example below, we take a subset of the data that has particularly high variability that we are interested in preserving: depths shallower than 150 m and dives 30 to 46. This should be tailored to the variable that you're interested in.
m = (depth<150) & (dives > 30) & (dives < 46)
ax = gt.plot(dives, depth, var)
ax.plot(dives[m], depth[m], '-m', ms=3, alpha=0.7)
[<matplotlib.lines.Line2D at 0x1c728526d8>]
png
1.2. Initial estimate of semivariance¶
We can now find an initial estimate of the semivariance. This initial estimate will not scale the x/y coordinates for anisotropy (different scales of variability). The variogram function also accepts a boolean mask as an keyword argument. This will reduce the input data to the subset of data that you’ve chosen.
The example below shows this initial estimate. We're looking for an estimate where the Gaussian model fits the semivariance as well as possible, given that the variance parameters are acceptable. These variance parameters are: sill, nugget, and x and y length-scales. The function automatically adjusts the range to be one and scales the x and y parameters accordingly.
The variogram function can take time (datetime64), but we use distance (in metres) to demonstrate the anisotropic scaling.
vargram = gt.mapping.variogram(var, dist, depth, dives, mask=m)
png
The example above shows that x and y are scaled, but the Gaussian model does not fit the semivariance very well. The range is 1, because it is scaled accordingly. The sill and nugget are very similar - this is not a good result.
1.3. Finding the correct x and y length scales (anisotropy)¶
We can now scale the data with the xy_ratio. The ratio represents the scaling of x/y. For example, if x and y are both in metres (as in this case), we need to set a small xy_ratio as x has a much longer lengthscale. With some trial and error we choose a ratio of 0.0005, which fits the semivariogram relatively well and has a reasonably low y scaling estimate.
You’ll see that the Gaussian model does not fit the semivariance exactly - this is OK. The important thing is that the first plateau matches the sill.
We can now use these values for interpolating.
vargram = gt.mapping.variogram(var, dist, depth, dives, mask=m, xy_ratio=0.0005)
png
2. Interpolation¶
2.1 Preparing the interpolation grid¶
To perform the interpolation we first need to create the grid onto which data will be interpolated.
In the example below we use distance from the origin as the x-coordinate.
Time can also be used and has to be in a np.datetime64
format - we show a commented example of this.
The y-coordinate is depth.
# creating the x- and y-interpolation coordinates
# and a 1m vertical grid and a horizontal grid with 500 points
xi = np.linspace(dist.min(), dist.max(), 500)
yi = np.arange(0, depth[var.notnull()].max(), 1, dtype=float)
# time can also be used. This is a commented example of how to create
# a time grid for interpolation.
# xi = np.arange(time.min(), time.max(), 30, dtype='datetime64[m]')
2.2 Interpolation with the semivariance parameters¶
The interpolation has a number of parameters that can be changed or adapted to the dataset at hand. The commented inputs below describe these inputs.
interpolated = gt.mapping.interp_obj(
dist, depth, var, xi, yi,
# Kriging interpolation arguments
partial_sill=1.1e4, # taken from the semivariogram (sill - nugget)
nugget=3e3, # taken from the semivariogram
lenscale_x=98942, # in hours if x and xi are in datetime64
lenscale_y=50, # the vertical gridding influence
detrend=True, # if True use linear regression (z - z_hat), if False use average (z - z_mean)
# Quadtree arguments
max_points_per_quad=65, # an optimisation setting; ~100 is good
min_points_per_quad=8, # if neighbours have < N points, look at their neighbours
# Parallel calculation inputs.
n_cpus=3, # the number of CPUs to use for the calculation - default is n-1
parallel_chunk_size=512, # when the dataset is very large, memory can become an issue
# this prevents large buildup of parallel results
)
Starting Interpolation with quadtree optimal interpolation
----------------------------------------------------------
Preparing for interpolations:
Finding and removing nans
Removing data trend with linear regression
Building QuadTree
Interpolation information:
basis points: 25226
interp grid: 500, 404
max_points_per_quad: 65
min_points_per_quad: 8
number of quads: 952
detrend_method: linear_regression
partial_sill: 11000.0
nugget: 3000.0
lengthscales: X = 98942
Y = 50 m
Processing interpolation chunks in 2 parts over 3 CPUs:
chunk 1/2 completed in 12s
chunk 2/2 completed in 10s
Finishing off interoplation
Adding back the trend
Creating xarray dataset for output
fig, ax = plt.subplots(3, 1, figsize=[9, 9], sharex=True, dpi=90)
error_mask = (interpolated.variance / interpolated.nugget) < 1.05
interp_robust = interpolated.z.where(error_mask)
props = dict(vmin=0, vmax=300, cmap=cmo.delta)
gt.plot.scatter(dist, depth, var, ax=ax[0], **props)
gt.plot.pcolormesh(interp_robust, ax=ax[1], **props)
gt.plot.pcolormesh(interpolated.variance, ax=ax[2], vmin=interpolated.nugget, vmax=interpolated.nugget*1.08)
ax[2].plot(dist, depth, 'w-', zorder=40, alpha=0.8, lw=0.4)
[a.set_ylim(400, 0) for a in ax]
[a.set_xlabel(' ') for a in ax]
ax[0].get_children()[0].set_sizes([20])
ax[0].set_title('Uninterpolated data')
ax[1].set_title('Interpolated data')
ax[2].set_title('Interpolation variance with dives shown in white')
ax[2].set_xlabel('Distance (m)')
tks = xticks(rotation=0)
png
Saving data¶
We have not created an explicit way to save data in GliderTools. This is primarily because the package is built on top of two packages that already do this very well: pandas and xarray. pandas is widely used and deals with tabular formatted data (2D). xarray is widely used in earth sciences as it supports multi-dimensional indexing (3D+). We highly recommend that you read through the documentation for these packages, as they are incredibly powerful and you would benefit from knowing these tools whether or not you use GliderTools!
We have written GliderTools primarily with xarray as the backend, due to the ability to store attributes (or metadata) alongside the data - something that pandas does not yet do. Moreover, we have also created the tool so that metadata is passed to the output of each function, while appending the function call to the history attribute. This ensures that the user of the data knows when and what functions (and arguments) were called and for which version of GliderTools this was done.
Examples¶
First we give an example of how to save and read files to netCDF (which we recommend).
import xarray as xr
# xds is an xarray.Dataset with a record of dimensions, coordinates and variables
xds.to_netcdf('data_with_meta.nc')
# this file can simply be loaded in the same way, without using GliderTools
# all the information that was attached to the data is still in the netCDF
xds = xr.open_dataset('data_with_meta.nc')
In this second example we show how to save the data to a CSV. While this is a common and widely used format, we do not recommend this as the go to format, as all metadata is lost when the file is saved.
import pandas as pd
# If you prefer to save your data as a text file, you can easily do this with Pandas
# note that converting the file to a dataframe discards all the metadata
df = xds.to_dataframe()
df.to_csv('data_without_meta.csv')
# this file can simply be loaded in the same way, without using GliderTools
# there will be no more metadata attached to each variable
df = pd.read_csv('data_without_meta.csv')
# finally, you can also convert the file back to an xarray.Dataset
# however, the metadata will still be lost
xds = df.to_xarray()
API Reference¶
The API reference is automatically generated from the function docstrings in the GliderTools package. Refer to the examples in the sidebar for reference on how to use the functions.
Loading Data¶
- load.seaglider_basestation_netCDFs(files, …): Load a list of variables from the SeaGlider object as a pandas.DataFrame.
- load.seaglider_show_variables(files)
- load.ego_mission_netCDF(filename): Loads an EGO formatted glider mission file.
- load.slocum_geomar_matfile(filename[, verbose]): Load .mat file generated with the geomar MATLAB script for Slocum data.
- load.voto_seaexplorer_nc
- load.voto_seaexplorer_dataset
- load.voto_concat_datasets
High level processing¶
- processing.calc_physics(variable, dives, depth): A standard setup for processing physics variables (temperature, salinity).
- processing.calc_oxygen(o2raw, pressure, …): This function processes oxygen.
- processing.calc_backscatter(bb_raw, tempC, …): The function processes the raw backscattering data in counts into total backscatter (bbp) in metres.
- processing.calc_fluorescence(flr_raw, bbp, …): This function processes raw fluorescence and corrects for quenching using the Thomalla et al. method.
- processing.calc_par(par_raw, dives, depth, …): Calculates the theoretical PAR based on an exponential curve fit.
Cleaning¶
- cleaning.outlier_bounds_std(arr[, multiplier]): Mask values outside the upper and lower outlier limits by standard deviation.
- cleaning.outlier_bounds_iqr(arr[, multiplier]): Mask values outside the upper/lower outlier limits by interquartile range.
- cleaning.horizontal_diff_outliers(dives, …): Find Z-score outliers (> 3) on the horizontal.
- cleaning.mask_bad_dive_fraction(mask, dives, var): Find bad dives - where more than a fraction of the dive is masked.
- cleaning.data_density_filter(x, y[, …]): Use the 2D density cloud of observations to find outliers for any variables.
- cleaning.despike(var, window_size[, …]): Return a smooth baseline of data and the anomalous spikes.
- cleaning.despiking_report(dives, depth, raw, …): A report for the results of cleaning.despike.
- cleaning.rolling_window(var, func, window): A rolling window function that is nan-resilient.
- cleaning.savitzky_golay(var, window_size, order): Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
Physics¶
- physics.mixed_layer_depth(dives, depth, …): Calculates the MLD for ungridded glider array.
- physics.potential_density(salt_PSU, temp_C, …): Calculate density from glider measurements of salinity and temperature.
- physics.brunt_vaisala
Optics¶
- optics.find_bad_profiles(dives, depth, var): Find profiles that exceed a threshold at a reference depth.
- optics.par_dark_count(par, dives, depth, time): Calculates an in situ dark count from the PAR sensor.
- optics.backscatter_dark_count(bbp, depth): Calculates an in situ dark count from the backscatter sensor.
- optics.fluorescence_dark_count(flr, depth): Calculates an in situ dark count from the fluorescence sensor.
- optics.par_scaling(par_uV, …): Scaling correction for par with factory calibration coefficients.
- optics.par_fill_surface(par, dives, depth[, …]): Algebraically calculates the top 5 metres of the par profile.
- optics.photic_depth(par, dives, depth[, …]): Algebraically calculates the euphotic depth.
- optics.sunset_sunrise(time, lat, lon): Calculates the local sunrise/sunset of the glider location.
- optics.quenching_correction(flr, bbp, dives, …): Corrects the fluorescence data based upon Thomalla et al.
- optics.quenching_report(flr, flr_corrected, …): A report for the results of optics.quenching_correction.
Calibration¶
- calibration.bottle_matchup(gld_dives, …[, …]): Performs a matchup between glider and bottle samples based on time and depth (or density).
- calibration.model_figs(bottle_data, …[, ax]): Creates the figure for a linear model fit.
- calibration.robust_linear_fit(gld_var, …): Perform a robust linear regression using a Huber Loss Function to remove outliers.
Gridding and Interpolation¶
- mapping.interp_obj(x, y, z, xi, yi[, …]): Performs objective interpolation (or Kriging) of a 2D field.
- mapping.grid_data(x, y, var[, bins, how, …]): Grids the input variable to bins for depth/dens (y) and time/dive (x).
- mapping.variogram(variable, horz, vert, dives): Find the interpolation parameters and the x and y scaling of the horizontal and vertical coordinate parameters for objective interpolation (Kriging).
Plotting¶
- plot.plot_functions: Plot data (gridded or not) as a section and more.
General Utilities¶
- utils.time_average_per_dive(dives, time): Gets the average time stamp per dive.
- utils.mask_above_depths
- utils.mask_below_depths
- utils.mask_profile_depth
- utils.merge_dimensions(df1, df2[, interp_lim]): Merges variables measured at different time intervals.
- utils.calc_glider_vert_velocity(time, depth): Calculate glider vertical velocity in cm/s.
- utils.calc_dive_phase(time, depth): Determine the glider dive phase.
- utils.calc_dive_number(time, depth): Determine the glider dive number (based on dive phase).
- utils.dive_phase_to_number(phase)
- utils.distance(lon, lat[, ref_idx]): Great-circle distance in m between lon, lat points.
- utils.group_by_profiles
Package Structure¶
Structure image
What’s New¶
v2023.xx (unreleased)¶
New Features¶
- added import for VOTO seaexplorer data (GH#170) By Martin Mohrmann.
- added versatile, depth dependent masking (GH#172) By Martin Mohrmann.
Breaking changes¶
- GliderTools defaults for Figure creation were changed. Automatic application of plt.tight_layout was dropped in favour of more flexible embedding of GliderTools plots into existing layouts/subplots.
- The mixed layer depth algorithm was corrected. (GH#169, GH#168). By Martin Mohrmann. API change! Existing mixed layer computation code must be adapted.
- Changed the behavior of find_dive_phase and calc_dive_number to use a smaller depth threshold when determining a valid dive (15 dbar, down from 200 dbar). This is also now adjustable. (GH#134) By Tom Hull.
Internal changes¶
- Some cleanup of old python2 dependencies (GH#166). By Martin Mohrmann.
v2022.12.13 (2022/12/13)¶
Internal changes¶
- Refactoring and update of testing and development framework, update of flake, black and almost all python dependencies
Breaking changes¶
- Fixed processing.calc_oxygen (GH#116, GH#112). By Callum Rollo.
Internal Changes¶
- Implemented code linting as part of the CI (GH#100) By Julius Busecke.
Documentation¶
- Added conda installation instructions + badge. (GH#94) By Julius Busecke.
Bug fixes¶
- Replaced skyfield dependency with astral, fixing sunrise/sunset problems at high latitudes. By Isabelle Sindiswa Giddy.
v2021.03 (2021/3/30)¶
Documentation¶
- Updated contributor guide for conda based workflow. (GH#81) By Julius Busecke.
Internal Changes¶
- Migration of CI to conda based workflow with multiple python versions. (GH#54) By Julius Busecke.
- Revamp distribution actions. (GH#82) By Julius Busecke.
- Migrate from astral to skyfield (GH#121). By Isabelle Giddy (https://github.com/isgiddy).
Citing GliderTools¶
If you would like to cite or reference Glider Tools, please use:
Gregor, L., Ryan-Keogh, T. J., Nicholson, S.-A., du Plessis, M., Giddy, I., & Swart, S. (2019). GliderTools: A Python Toolbox for Processing Underwater Glider Data. Frontiers in Marine Science, 6(December), 1–13. https://doi.org/10.3389/fmars.2019.00738
Project Contributors¶
The following people have made contributions to the project (in alphabetical order by last name) and are considered “The GliderTools Developers”. These contributors will be added as authors upon the next major release of GliderTools (i.e. new DOI release).
- Dhruv Balwada - University of Washington, USA. (ORCID: 0000-0001-6632-0187)
- Julius Busecke - Columbia University, USA. (ORCID: 0000-0001-8571-865X)
- Isabelle Giddy - University of Cape Town: Cape Town, Western Cape, South Africa. (ORCID: 0000-0002-8926-3311)
- Luke Gregor - Environmental Physics, ETH Zuerich: Zurich, Switzerland. (ORCID: 0000-0001-6071-1857)
- Tom Hull - Centre for Environment Fisheries and Aquaculture Science: Lowestoft, UK. (ORCID: 0000-0002-1714-9317)
- Martin Mohrmann - Voice of the Ocean Foundation, Gothenburg, Sweden. (ORCID: 0000-0001-8056-4866)
- Sarah-Anne Nicholson - Council for Scientific and Industrial Research: Cape Town, South Africa. (ORCID: 0000-0002-1226-1828)
- Marcel du Plessis - University of Cape Town: Cape Town, Western Cape, South Africa. (ORCID: 0000-0003-2759-2467)
- Callum Rollo - Voice of the Ocean Foundation, Gothenburg, Sweden. (ORCID: 0000-0002-5134-7886)
- Tommy Ryan-Keogh - Council for Scientific and Industrial Research: Cape Town, South Africa. (ORCID: 0000-0001-5144-171X)
- Sebastiaan Swart - University of Gothenburg: Gothenburg, Sweden. (ORCID: 0000-0002-2251-8826)
- Soeren Thomsen - LOCEAN/IPSL/CNRS/Sorbonne University: Paris, France. (ORCID: 0000-0002-0598-8340)
Contribution Guide¶
Contributions are highly welcomed and appreciated. Every little bit helps, so do not hesitate! You can have a high impact on glidertools just by using it, being involved in discussions and reporting issues.
The following sections cover some general guidelines
regarding development in glidertools
for maintainers and contributors.
Nothing here is set in stone and everything can be changed. Feel free to suggest improvements or changes to the workflow.
Feature requests and feedback¶
We are eager to hear about your requests for new features and any suggestions about the API, infrastructure, and so on. Feel free to start a discussion about these on the discussions tab on github under the “ideas” section.
After discussion with a few community members, and agreement that the feature should be added and who will work on it, a new issue should be opened. In the issue, please make sure to explain in detail how the feature should work and keep the scope as narrow as possible. This will make it easier to implement in small PRs.
Report bugs¶
Report bugs for glidertools
in the issue tracker
with the label “bug”.
If you can write a demonstration test that currently fails but should pass that is a very useful commit to make as well, even if you cannot fix the bug itself.
Fix bugs¶
Look through the GitHub issues for bugs.
Talk to developers to find out how you can fix specific bugs.
Preparing Pull Requests¶
Fork the glidertools GitHub repository. It's fine to use glidertools as your fork repository name because it will live under your username. Clone your fork locally using git, connect your repository to the upstream (main project), and create a branch:
$ git clone git@github.com:YOUR_GITHUB_USERNAME/glidertools.git  # clone to local machine
$ cd glidertools
$ git remote add upstream git@github.com:GliderToolsCommunity/GliderTools.git  # connect to upstream remote
# now, to fix a bug or add a feature, create your own branch off "master":
$ git checkout -b your-bugfix-feature-branch-name master  # create a new branch where you will make changes
If you need some help with Git, follow this quick start guide: https://git.wiki.kernel.org/index.php/QuickStart
Set up a conda environment with all the necessary dependencies:
$ conda env create -f ci/environment-py3.8.yml
Activate your environment:
$ conda activate test_env_glidertools
Make sure you are in this environment when working on changes in the future too.
Install the GliderTools package:
$ pip install -e . --no-deps
Before you modify anything, ensure that the setup works by executing all tests:
$ pytest
You want to see an output indicating no failures, like this:
$ ========================== n passed, j warnings in 17.07s ===========================
Install pre-commit and its hook on the glidertools repo:
$ pip install --user pre-commit
$ pre-commit install
Afterwards pre-commit will run whenever you commit. If some errors are reported by pre-commit you should format the code by running:
$ pre-commit run --all-files
and then try to commit again.
https://pre-commit.com/ is a framework for managing and maintaining multi-language pre-commit hooks to ensure code-style and code formatting is consistent.
You can now edit your local working copy and run/add tests as necessary. Please follow PEP-8 for naming. When committing, pre-commit will modify the files as needed, or will generally be quite clear about what you need to do to pass the commit test. Break your edits up into reasonably sized commits:
$ git commit -a -m "<commit message>"
$ git push -u
Committing will run the pre-commit hooks (isort, black and flake8). Pushing will run the pre-push hooks (pytest and coverage)
We highly recommend using test driven development, but our coverage requirement is low at the moment due to lack of tests. If you are able to write tests, please stick to xarray’s testing recommendations.
- Add yourself to the Project Contributors list via ./docs/authors.md.
Finally, submit a pull request (PR) through the GitHub website using this data:
head-fork: YOUR_GITHUB_USERNAME/glidertools
compare: your-branch-name
base-fork: GliderToolsCommunity/GliderTools
base: master
The merged pull request will undergo the same testing that your local branch had to pass when pushing.
After your pull request is merged into GliderTools/master, you will need to fetch those changes and rebase your master so that your master reflects the latest version of GliderTools. The changes should also be fetched and incorporated (rebased) right before you are planning to introduce changes:
$ git checkout master                            # switch back to the master branch
$ git fetch upstream                             # download all changes from the central upstream repo
$ git rebase upstream/master                     # apply the changes made to the central repo since your last fetch onto your master
$ git branch -d your-bugfix-feature-branch-name  # delete the branch after the PR is approved
Release Instructions¶
This is a documentation repo for people in the group on how to do the integrated deployment.
NB RULE! Never commit to master.
- Change the version in the setup.py file. It must be in the format YYYY.<release number>
- Create a release with a tag that has the same format as the version above.
- The distribution will be built automatically and pushed to PyPI
- The DOI will also be updated on Zenodo. (untested, see #165)
Wishlist¶
A list of things we’d love to add to GliderTools and the work involved.
- Support for raw files from Slocum gliders and Seagliders with the following additional functionality
- Thermal lag correction for each of the gliders supported in the suggestion above.
- Support for hardware modules by model and manufacturer
- Make final data output compatible with www.OceanGliders.org data format, https://www.oceangliders.org/taskteams/data-management/