datamanager - easy access to and manipulation of data
The datamanager classes and functions are useful for locating the correct data file for a particular day and manipulating data and subsets in a generic way.
Authors: Jon Niehof
Institution: University of New Hampshire
Contact: Jonathan.Niehof@unh.edu
Copyright 2015
About datamanager
Examples
Examples go here
Classes
DataManager(directories, file_fmt[, …])
    THIS CLASS IS NOT YET COMPLETE, doesn’t do much useful.
Functions
apply_index(data, idx)
    Apply an array of indices to data.
array_interleave(array1, array2, idx)
    Create an array containing all elements of both array1 and array2.
axis_index(shape[, axis])
    Returns array of indices along axis, for all other axes.
flatten_idx(idx[, axis])
    Convert multidimensional index into index on flattened array.
insert_fill(times, data[, fillval, tol, …])
    Populate gaps in data with fill.
rev_index(idx[, axis])
    From an index, return an index that reverses the action of that index.
values_to_steps(array[, axis])
    Transform values along an axis to their order in a unique sequence.
class spacepy.datamanager.DataManager(directories, file_fmt, descend=False, period=None)
THIS CLASS IS NOT YET COMPLETE, doesn’t do much useful.
Will need a mechanism that allows the config file to specify the regex and other settings, so that only the directory needs to be changed (since regex, etc.
Parameters: directories : list
A list of directories that might contain the data
file_fmt : string
Regular expression that matches the files desired. Will also recognize strftime parameters %w %d %m %y %Y %H %M %s %j %U %W, all zero-padded (see https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior). Can include a subdirectory reference, but the separator should be Unix-style, with no leading slash.
period : string
Size of file; can be a number followed by one of d, m, y, H, M, s. Anything else is assumed to be “irregular”, and files are treated as if there are neither gaps nor overlaps in the sequence. If not specified, it is assumed to be one count of the smallest unit in the format string.
Examples
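Because DataManager is incomplete, no example of its own API is given here. As a hypothetical illustration of the file_fmt idea only, the sketch below fills in the strftime directives of a regex-style pattern for one day and then matches candidate relative paths. The pattern, the file names and the helper pattern_for_day are all made up for this example; this is not spacepy's implementation.

```python
import datetime
import re

# Hypothetical file_fmt-style pattern: a regular expression that also
# contains strftime directives, with a Unix-style subdirectory and no
# leading slash. Illustration only, not spacepy's implementation.
FILE_FMT = r'%Y/data_%Y%m%d_v\d+\.cdf'


def pattern_for_day(file_fmt, day):
    """Fill in the strftime directives for one day; return a compiled regex."""
    # strftime copies characters that are not directives (such as the
    # regex escapes \d and \.) through unchanged.
    return re.compile(day.strftime(file_fmt))


candidates = [
    '2012/data_20120101_v1.cdf',
    '2012/data_20120101_v12.cdf',
    '2012/data_20120102_v1.cdf',   # wrong day
    'data_20120101_v1.cdf',        # missing the year subdirectory
]
pattern = pattern_for_day(FILE_FMT, datetime.date(2012, 1, 1))
matches = [name for name in candidates if pattern.fullmatch(name)]
print(matches)
```

Substituting the date first and matching afterwards keeps the regex features (such as the version wildcard) available while pinning the date fields to a single day.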
spacepy.datamanager.apply_index(data, idx)
Apply an array of indices to data.
Most useful in dealing with the output from numpy.argsort(), and best explained by the example.
Parameters: data : array
Input data, at least two-dimensional. The 0th dimension is treated as a “time” or “record” dimension.
idx : sequence
2D index to apply to the input data. The 0th dimension must be the same size as data’s 0th dimension. Dimension 1 must be the same size as one other dimension in data (the first match found is used); this is referred to as the “index dimension.”
Returns: data : sequence
View of data, with index applied. For each index of the 0th dimension, the values along the index dimension are obtained by applying the value of idx at the same index in the 0th dimension. This is repeated across any other dimensions in data.
Warning
No guarantee is made whether the returned data is a copy of the input data. Modifying values in the returned data may change the values of the input. Call copy() if a copy is required.
Raises: ValueError : if the shape of data cannot be matched to the indices
Examples
Assume flux is a 3D array of fluxes, with a value for each of time, pitch angle, and energy. Assume energy is not necessarily constant in time, nor ordered in the energy dimension. If energy is a 2D array of the energies as a function of energy step for each time, then the following will sort the flux at each time and pitch angle in energy order.
>>> idx = numpy.argsort(energy, axis=1)
>>> flux_sorted = spacepy.datamanager.apply_index(flux, idx)
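To make the semantics concrete for the 3D case above, here is a minimal NumPy sketch using numpy.take_along_axis. It assumes the index dimension is the last axis of flux and that idx has shape (time, energy). It illustrates the described behavior and is not spacepy's implementation; the small energy and flux arrays are made up for the demonstration.

```python
import numpy

# Small synthetic data: 2 times, 3 pitch angles, 4 energy steps.
energy = numpy.array([[300., 100., 400., 200.],
                      [50., 250., 150., 350.]])            # (time, energy)
flux = numpy.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # (time, pitch, energy)

idx = numpy.argsort(energy, axis=1)  # per-time ordering of the energy steps

# Add a length-1 pitch-angle axis so idx broadcasts across pitch angle,
# then gather along the energy axis (axis 2).
flux_sorted = numpy.take_along_axis(flux, idx[:, numpy.newaxis, :], axis=2)

# Explicit-loop version of the same operation, for comparison.
expected = numpy.empty_like(flux)
for t in range(flux.shape[0]):
    for p in range(flux.shape[1]):
        expected[t, p, :] = flux[t, p, idx[t]]

print(numpy.array_equal(flux_sorted, expected))
```

The same index applied to energy itself yields energies in increasing order at each time, which is a quick sanity check that flux has been reordered consistently.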
spacepy.datamanager.array_interleave(array1, array2, idx)
Create an array containing all elements of both array1 and array2.
idx is an index on the output array which indicates which elements will be populated from array1, i.e., out[idx] == array1 (in order). The other elements of out will be filled, in order, from array2.
Parameters: array1 : array
Input data.
array2 : array
Input data. Must have same number of dimensions as array1, and all dimensions except the zeroth must also have the same length.
idx : array
A 1D array of indices on the zeroth dimension of the output array. Must have the same length as the zeroth dimension of array1.
Returns: out : array
All elements from array1 and array2, interleaved according to idx.
Examples
>>> import numpy
>>> import spacepy.datamanager
>>> a = numpy.array([10, 20, 30])
>>> b = numpy.array([1, 2])
>>> idx = numpy.array([1, 2, 4])
>>> spacepy.datamanager.array_interleave(a, b, idx)
array([ 1, 10, 20,  2, 30])
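The relation out[idx] == array1 suggests a direct construction: assign array1 to the positions in idx, then fill the remaining positions, in order, from array2 using a boolean mask. The function name interleave_sketch is hypothetical; this illustrates the documented behavior rather than reproducing spacepy's code.

```python
import numpy


def interleave_sketch(array1, array2, idx):
    """Hypothetical re-implementation of the array_interleave semantics."""
    array1 = numpy.asarray(array1)
    array2 = numpy.asarray(array2)
    n = array1.shape[0] + array2.shape[0]
    out = numpy.empty((n,) + array1.shape[1:],
                      dtype=numpy.result_type(array1, array2))
    # Mark which output records come from array1.
    from_1 = numpy.zeros(n, dtype=bool)
    from_1[idx] = True
    out[idx] = array1        # out[idx] == array1
    out[~from_1] = array2    # remaining records filled, in order, from array2
    return out


a = numpy.array([10, 20, 30])
b = numpy.array([1, 2])
print(interleave_sketch(a, b, numpy.array([1, 2, 4])))
```

Because only the zeroth dimension is indexed, the same code handles multidimensional records, provided all other dimensions match as required above.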
spacepy.datamanager.axis_index(shape, axis=-1)
Returns array of indices along axis, for all other axes.
Parameters: shape : tuple
Shape of the output array
Returns: idx : array
An array of indices. The value of each element is that element’s index along axis.
Other Parameters: axis : int
Axis along which to return indices, defaults to the last axis.
See also
numpy.mgrid: this function is a special case
Examples
For a shape of (i, j, k, l) and axis = -1, idx[i, j, k, :] = range(l) for all i, j, k.
Similarly, for the same shape and axis = 1, idx[i, :, k, l] = range(j) for all i, k, l.
>>> import numpy
>>> import spacepy.datamanager
>>> spacepy.datamanager.axis_index((5, 3))
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])
>>> spacepy.datamanager.axis_index((5, 3), 0)
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])
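Since each element's value is simply its own index along axis, the same arrays can be read off numpy.indices, which stacks one such array per axis (numpy.mgrid gives the same result for integer ranges starting at 0). A minimal sketch, assuming only NumPy; axis_index_sketch is a hypothetical name, not part of spacepy.

```python
import numpy


def axis_index_sketch(shape, axis=-1):
    """Each element's index along `axis`, taken from numpy.indices."""
    # numpy.indices(shape) has one leading entry per axis; entry k holds
    # every element's index along axis k. Negative axis values also work.
    return numpy.indices(shape)[axis]


print(axis_index_sketch((5, 3)))
print(axis_index_sketch((5, 3), 0))
```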
spacepy.datamanager.flatten_idx(idx, axis=-1)
Convert multidimensional index into index on flattened array.
Convert a multidimensional index, that is, values along a particular axis, so that it can dereference the flattened array properly. Note this is not the same as ravel_multi_index().
Parameters: idx : array
Input index, i.e. a list of elements along a particular axis, in the style of argsort().
Returns: flat : array
A 1D array of indices suitable for indexing the flat version of the array
Other Parameters: axis : int
Axis along which idx operates, defaults to the last axis.
Examples
>>> import numpy
>>> import spacepy.datamanager
>>> data = numpy.array([[3, 1, 2], [3, 2, 1]])
>>> idx = numpy.argsort(data, -1)
>>> idx_flat = spacepy.datamanager.flatten_idx(idx)
>>> data.ravel()  # flat array
array([3, 1, 2, 3, 2, 1])
>>> idx_flat  # indices into the flat array
array([1, 2, 0, 5, 4, 3])
>>> data.ravel()[idx_flat]  # index applied to the flat array
array([1, 2, 3, 1, 2, 3])
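flatten_idx takes values along a single axis, whereas numpy.ravel_multi_index needs a full tuple of coordinates. A minimal sketch of the conversion, assuming idx has the same shape as the array it indexes (as argsort() output does), builds that tuple with numpy.indices and then calls ravel_multi_index. flatten_idx_sketch is a hypothetical name, and this is an illustration rather than spacepy's implementation.

```python
import numpy


def flatten_idx_sketch(idx, axis=-1):
    """Turn per-axis indices (argsort style) into indices on the flat array."""
    idx = numpy.asarray(idx)
    # One coordinate array per axis; replace the chosen axis with idx.
    coords = list(numpy.indices(idx.shape))
    coords[axis] = idx
    return numpy.ravel_multi_index(tuple(coords), idx.shape).ravel()


data = numpy.array([[3, 1, 2], [3, 2, 1]])
idx = numpy.argsort(data, -1)
idx_flat = flatten_idx_sketch(idx)
print(idx_flat)
print(data.ravel()[idx_flat])
```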
spacepy.datamanager.insert_fill(times, data, fillval=nan, tol=1.5, absolute=None, doTimes=True)
Populate gaps in data with fill.
Continuous data are often treated differently from discontinuous data, e.g., matplotlib will draw lines connecting data points but break the line at fill. Often data will be irregularly sampled but also contain large gaps that are not explicitly marked as fill. This function adds a single record of explicit fill to each gap, defined as a place where the spacing between input times exceeds a certain multiple of the median spacing.
Parameters: times : sequence
Values representing when the data were taken. Must be one-dimensional, i.e., each value must be scalar. Not modified.
data : sequence
Input data.
Returns: times, data : tuple of sequence
Copies of input times and data, fill added in gaps (doTimes True)
data : sequence
Copy of input data, with fill added in gaps (doTimes False)
Other Parameters: fillval :
Fill value, same type as data. Default is numpy.nan. If scalar, will be repeated to match the shape of data (minus the time axis).
Note
The default value of nan will not produce good results with integer input.
tol : float
Tolerance. A single fill value is inserted between adjacent values where the spacing in times is strictly greater than tol times the median of the spacing across all times. The inserted time for fill is halfway between the time on each side. (Default 1.5)
absolute :
An absolute value for maximum spacing, of a type that would result from a difference in times. If specified, tol is ignored and any gap strictly larger than absolute will have fill inserted.
doTimes : boolean
If True (default), will return a tuple of the times (with new values inserted for the fill records) and the data with new fill values. If False, will only return the data, which is useful for applying fill to multiple arrays of data on the same timebase.
Raises: ValueError : if the time axis of data cannot be identified
Try using numpy.rollaxis() to put the time axis first in both data and times.
Examples
This example shows simple hourly data with a gap, populated with fill. Note that only a single fill value is inserted, to break the sequence of valid data rather than trying to match the existing cadence.
>>> import datetime
>>> import numpy
>>> import spacepy.datamanager
>>> t = [datetime.datetime(2012, 1, 1, 0), datetime.datetime(2012, 1, 1, 1),
...      datetime.datetime(2012, 1, 1, 2), datetime.datetime(2012, 1, 1, 5),
...      datetime.datetime(2012, 1, 1, 6)]
>>> temp = [30.0, 28, 27, 32, 35]
>>> filled_t, filled_temp = spacepy.datamanager.insert_fill(t, temp)
>>> filled_t
array([datetime.datetime(2012, 1, 1, 0, 0),
       datetime.datetime(2012, 1, 1, 1, 0),
       datetime.datetime(2012, 1, 1, 2, 0),
       datetime.datetime(2012, 1, 1, 3, 30),
       datetime.datetime(2012, 1, 1, 5, 0),
       datetime.datetime(2012, 1, 1, 6, 0)], dtype=object)
>>> filled_temp
array([ 30., 28., 27., nan, 32., 35.])
This example plots “gappy” data with and without explicit fill values.
>>> import matplotlib.pyplot as plt
>>> import numpy
>>> import spacepy.datamanager
>>> x = numpy.append(numpy.arange(0, 6, 0.1), numpy.arange(12, 18, 0.1))
>>> y = numpy.sin(x)
>>> xf, yf = spacepy.datamanager.insert_fill(x, y)
>>> fig = plt.figure()
>>> ax0 = fig.add_subplot(211)
>>> ax0.plot(x, y)
>>> ax1 = fig.add_subplot(212)
>>> ax1.plot(xf, yf)
>>> plt.show()
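The gap rule described under tol can be sketched in a few lines of NumPy. This simplified, hypothetical version (insert_fill_sketch) assumes numeric times and float data with time as the first axis, and it omits absolute and doTimes; it is not spacepy's implementation. Applied to the hourly temperature example, with times expressed as hours, it inserts one fill record at 3.5 h, halfway across the 2 h to 5 h gap.

```python
import numpy


def insert_fill_sketch(times, data, fillval=numpy.nan, tol=1.5):
    """Insert one fill record in each gap wider than tol * median spacing."""
    times = numpy.asarray(times, dtype=float)
    data = numpy.asarray(data, dtype=float)
    spacing = numpy.diff(times)
    # Positions i where the step from times[i] to times[i + 1] is a gap.
    gaps = numpy.nonzero(spacing > tol * numpy.median(spacing))[0]
    # Fill time is halfway between the times on each side of the gap.
    fill_times = times[gaps] + spacing[gaps] / 2.0
    # numpy.insert places values before the given index, so use gaps + 1.
    new_times = numpy.insert(times, gaps + 1, fill_times)
    new_data = numpy.insert(data, gaps + 1, fillval, axis=0)
    return new_times, new_data


hours = [0, 1, 2, 5, 6]
temp = [30.0, 28, 27, 32, 35]
filled_hours, filled_temp = insert_fill_sketch(hours, temp)
print(filled_hours)
print(filled_temp)
```

Using the median rather than the mean keeps a few large gaps from inflating the reference spacing.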
spacepy.datamanager.rev_index(idx, axis=-1)
From an index, return an index that reverses the action of that index.
Essentially, a[idx][rev_index(idx)] == a
Note
This becomes more complicated in multiple dimensions, due to the vagaries of applying a multidimensional index.
Parameters: idx : array
Indices onto an array, often the output of argsort().
Returns: rev_idx : array
Indices that, when applied to an array after idx, will return the original array (before the application of idx).
Other Parameters: axis : int
Axis along which to return indices, defaults to the last axis.
Examples
>>> import numpy
>>> import spacepy.datamanager
>>> data = numpy.array([7, 2, 4, 6, 3])
>>> idx = numpy.argsort(data)
>>> data[idx]  # sorted
array([2, 3, 4, 6, 7])
>>> data[idx][spacepy.datamanager.rev_index(idx)]  # original
array([7, 2, 4, 6, 3])
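When idx is a permutation, as argsort() output is, the reversing index is the inverse permutation, and numpy.argsort applied to idx itself produces it. A minimal sketch under that assumption; rev_index_sketch is a hypothetical name, not spacepy's implementation. In the 2D part the indices are applied with numpy.take_along_axis rather than plain a[idx], which avoids the multidimensional indexing complications mentioned in the note.

```python
import numpy


def rev_index_sketch(idx, axis=-1):
    """Inverse permutation of idx along axis (argsort of a permutation)."""
    return numpy.argsort(idx, axis=axis)


# 1D case from the example above.
data = numpy.array([7, 2, 4, 6, 3])
idx = numpy.argsort(data)
rev = rev_index_sketch(idx)
restored = data[idx][rev]

# 2D case: apply per-row permutations with take_along_axis.
data2 = numpy.array([[3, 1, 2], [9, 7, 8]])
idx2 = numpy.argsort(data2, axis=-1)
sorted2 = numpy.take_along_axis(data2, idx2, axis=-1)
restored2 = numpy.take_along_axis(sorted2, rev_index_sketch(idx2), axis=-1)
print(restored, restored2, sep='\n')
```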
spacepy.datamanager.values_to_steps(array, axis=-1)
Transform values along an axis to their order in a unique sequence.
Useful in, e.g., converting a list of energies to their steps.
Parameters: array : array
Input data.
Returns: steps : array
An array, the same size as array, with values along axis corresponding to the position of the value in array in a unique, sorted set of the values in array along that axis. Differs from argsort() in that identical values will have identical step numbers in the output.
Other Parameters: axis : int
Axis along which to find the steps.
Examples
>>> import numpy
>>> import spacepy.datamanager
>>> data = [[10., 12., 11., 9., 10., 12., 11., 9.],
...         [10., 12., 11., 9., 14., 16., 15., 13.]]
>>> spacepy.datamanager.values_to_steps(data)
array([[1, 3, 2, 0, 1, 3, 2, 0],
       [1, 3, 2, 0, 5, 7, 6, 4]])
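One way to compute step numbers, sketched here under the assumption that NumPy alone is used, is numpy.unique with return_inverse=True applied to each 1D slice along the axis: the inverse maps every value to its position among the sorted unique values, so equal values share a step. values_to_steps_sketch is a hypothetical name, and this is not necessarily how spacepy computes it.

```python
import numpy


def values_to_steps_sketch(array, axis=-1):
    """Step number of each value within the sorted unique values along axis."""
    def steps_1d(values):
        # return_inverse maps each value to its position in the sorted
        # unique values, so identical values get identical step numbers.
        return numpy.unique(values, return_inverse=True)[1]
    return numpy.apply_along_axis(steps_1d, axis, numpy.asarray(array))


data = [[10., 12., 11., 9., 10., 12., 11., 9.],
        [10., 12., 11., 9., 14., 16., 15., 13.]]
print(values_to_steps_sketch(data))
```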