Module netCDF4_classic
Introduction
Python interface to the netCDF version 4 library that maintains
backward compatibility with netCDF version 3 clients. It can read and
write netCDF 3 files, as well as netCDF 4 files that are backward
compatible with netCDF 3 clients. netCDF version 4 has many features not found in
earlier versions of the library and is implemented on top of HDF5. This
module does not implement any of the new features of netCDF 4, except
zlib compression. To use the other new features of netCDF 4, use the
companion netCDF4 module
(which produces netCDF 4 files that can only be read by netCDF 4
clients). The API is modelled after Scientific.IO.NetCDF, and should be familiar to users
of that module.
Download
Requires
Install
- install the requisite python modules and C libraries (see above).
- set the HDF5_DIR environment variable to point to where HDF5 is installed (the libs in $HDF5_DIR/lib, the headers in $HDF5_DIR/include).
- set the NETCDF4_DIR environment variable to point to where the netCDF version 4 library and headers are installed.
- run 'python setup.py install'
- run some of the tests in the 'test_classic' directory.
Tutorial
1) Creating/Opening/Closing a netCDF file
To create a netCDF file from python, you simply call the Dataset
constructor. This is also the method used to open an existing netCDF
file. If the file is open for write access (w, r+ or a), you may
write any type of data including new dimensions, variables and
attributes. netCDF files come in several flavors (NETCDF3_CLASSIC,
NETCDF3_64BIT, NETCDF4_CLASSIC, and NETCDF4). The first two flavors
are supported by version 3 of the netCDF library. NETCDF4_CLASSIC
files use the version 4 disk format (HDF5), but do not use any
features not found in the version 3 API, except zlib compression.
They can be read by netCDF 3 clients only if they have been relinked
against the netCDF 4 library. They can also be read by HDF5 clients,
using the HDF5 API. NETCDF4 files use the HDF5 file format and the
new features of the netCDF version 4 API, and thus cannot be read by
netCDF 3 clients. The netCDF4_classic module can read and write
NETCDF3_CLASSIC, NETCDF3_64BIT and NETCDF4_CLASSIC files. To write
NETCDF4 files, use the netCDF4 module. To see how a given file is
formatted, you can examine the file_format Dataset attribute. Closing
the netCDF file is accomplished via the close method of the Dataset
instance.
Here's an example:
>>> import netCDF4_classic as netCDF
>>> dataset = netCDF.Dataset('test.nc', 'w')
>>> print dataset.file_format
NETCDF4_CLASSIC
>>>
>>> dataset.close()
2) Dimensions in a netCDF file
netCDF defines the sizes of all variables in terms of dimensions,
so before any variables can be created the dimensions they use must
be created first. A special case, not often used in practice, is that
of a scalar variable, which has no dimensions. A dimension is created
using the createDimension method of a Dataset instance. A Python
string is used to set the name of the dimension, and an integer value
is used to set the size. To create an unlimited dimension (a
dimension that can be appended to), the size value is set to None.
In this example, the time dimension is unlimited. Only one unlimited
dimension per file is allowed in netCDF 3, and it must be the first
(or leftmost) dimension. NETCDF4 formatted files may have multiple
unlimited dimensions (see the netCDF4 documentation).
>>> dataset = netCDF.Dataset('test.nc', 'a')
>>> dataset.createDimension('time', None)
>>> dataset.createDimension('level', 10)
>>> dataset.createDimension('lat', 73)
>>> dataset.createDimension('lon', 144)
All of the Dimension instances are stored in a python
dictionary.
>>> print dataset.dimensions
{'lat': <netCDF4_classic.Dimension object at 0x24a5f7b0>,
'time': <netCDF4_classic.Dimension object at 0x24a5f788>,
'lon': <netCDF4_classic.Dimension object at 0x24a5f7d8>,
'level': <netCDF4_classic.Dimension object at 0x24a5f760>}
>>>
Calling the python len function with a Dimension instance returns
the current size of that dimension. The isunlimited() method of a
Dimension instance can be used to determine if the dimension is
unlimited, or appendable.
>>> for dimname, dimobj in dataset.dimensions.iteritems():
>>> print dimname, len(dimobj), dimobj.isunlimited()
lat 73 False
time 0 True
lon 144 False
level 10 False
>>>
Dimension names can be changed using the renameDimension method of
a Dataset instance.
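For example, a dimension can be renamed and then given its old name
back (a sketch; it assumes renameDimension takes the old name followed
by the new name, and 'lev' is just an illustrative new name):
>>> dataset.renameDimension('level', 'lev')
>>> print 'lev' in dataset.dimensions, 'level' in dataset.dimensions
True False
>>> dataset.renameDimension('lev', 'level')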
3) Variables in a netCDF file
netCDF variables behave much like python multidimensional array
objects supplied by the numpy module. However, unlike numpy arrays,
netCDF variables can be appended to along the 'unlimited' dimension.
To create a netCDF variable, use the createVariable method of a
Dataset instance. The createVariable method has two mandatory
arguments, the variable name (a Python string), and the variable
datatype. The variable's dimensions are given by a tuple containing
the dimension names (defined previously with createDimension). To
create a scalar variable, simply leave out the dimensions keyword.
The variable primitive datatypes correspond to the dtype.str
attribute of a numpy array, and can be one of 'f4' (32-bit floating
point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer),
'i2' (16-bit signed integer), 'i1' (8-bit signed integer) or 'S1'
(single-character string). The old single character Numeric typecodes
('f','d','i','h','b','c') are also accepted for compatibility with
Scientific.IO.NetCDF. The dimensions themselves are usually also
defined as variables, called coordinate variables. The createVariable
method returns an instance of the Variable class whose methods can be
used later to access and set variable data and attributes.
>>> times = dataset.createVariable('time','f8',('time',))
>>> levels = dataset.createVariable('level','i4',('level',))
>>> latitudes = dataset.createVariable('latitude','f4',('lat',))
>>> longitudes = dataset.createVariable('longitude','f4',('lon',))
>>> temp = dataset.createVariable('temp','f4',('time','level','lat','lon',))
All of the variables in the file are stored in a Python
dictionary, in the same way as the dimensions:
>>> print dataset.variables
{'temp': <netCDF4_classic.Variable object at 0x24a61068>,
'level': <netCDF4_classic.Variable object at 0x24a61098>,
'longitude': <netCDF4_classic.Variable object at 0x24a61030>,
'time': <netCDF4_classic.Variable object at 0x24a610d0>,
'latitude': <netCDF4_classic.Variable object at 0x24a610f8>}
>>>
Variable names can be changed using the renameVariable method of a
Dataset instance.
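For example (a sketch; it assumes renameVariable takes the old name
followed by the new name, and 'temperature' is just an illustrative
new name):
>>> dataset.renameVariable('temp', 'temperature')
>>> dataset.renameVariable('temperature', 'temp')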
4) Attributes in a netCDF file
There are two types of attributes in a netCDF file, global and
variable. Global attributes provide information about the dataset as
a whole. Variable attributes provide information about one of the
variables in the dataset. Global attributes are set by assigning
values to Dataset instance variables. Variable attributes are set by
assigning values to Variable instance variables. Attributes can be
strings, numbers or sequences. Returning to our example,
>>> import time
>>> dataset.description = 'bogus example script'
>>> dataset.history = 'Created ' + time.ctime(time.time())
>>> dataset.source = 'netCDF4_classic python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> levels.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since January 1, 0001'
>>> times.calendar = 'proleptic_gregorian'
The ncattrs() method of a Dataset or Variable instance can be used
to retrieve the names of all the netCDF attributes. This method is
provided as a convenience, since using the built-in dir Python
function will return a bunch of private methods and attributes that
cannot (or should not) be modified by the user.
>>> for name in dataset.ncattrs():
>>> print 'Global attr', name, '=', getattr(dataset,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov 7 10:30:56 2005
Global attr source = netCDF4_classic python module tutorial
The __dict__ attribute of a Dataset or Variable instance provides
all the netCDF attribute name/value pairs in a python dictionary:
>>> print dataset.__dict__
{'source': 'netCDF4_classic python module tutorial',
'description': 'bogus example script',
'history': 'Created Mon Nov 7 10:30:56 2005'}
Attributes can also be python objects. netCDF4_classic tries to
convert attributes to numpy arrays before saving them to the netCDF
file. If the attribute is cast to an object array by numpy, it is
pickled and saved as a text attribute (and then automatically
unpickled when the attribute is accessed). So, an attribute which is
a list of integers will be saved as an array of integers, while an
attribute that is a python dictionary will be saved as a pickled
string, then unpickled automatically when it is retrieved. For
example,
>>> from datetime import datetime
>>> dataset.timestamp = datetime.now()
>>> print 'Global attr timestamp =',dataset.timestamp
Global attr timestamp = 2006-03-06 09:20:21.520926
Note that data saved as pickled strings will not be very useful if
the data is to be read by a non-python client (the data will appear
to the client as an ugly looking binary string).
Attributes can be deleted from a netCDF Dataset or Variable using
the python del statement (i.e. del dset.foo removes the attribute
foo from the dataset dset).
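For instance, a throwaway attribute can be created and then deleted
(a sketch; the attribute name 'scratch' is hypothetical):
>>> dataset.scratch = 'temporary'
>>> del dataset.scratch
>>> print 'scratch' in dataset.ncattrs()
False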
5) Writing data to and retrieving data from a netCDF variable
Now that you have a netCDF Variable
instance, how do you put data into it? You can just treat it like an
array and assign data to a slice.
>>> import numpy as NP
>>> latitudes[:] = NP.arange(-90,91,2.5)
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90. -87.5 -85. -82.5 -80. -77.5 -75. -72.5 -70. -67.5 -65. -62.5
-60. -57.5 -55. -52.5 -50. -47.5 -45. -42.5 -40. -37.5 -35. -32.5
-30. -27.5 -25. -22.5 -20. -17.5 -15. -12.5 -10. -7.5 -5. -2.5
0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5 25. 27.5
30. 32.5 35. 37.5 40. 42.5 45. 47.5 50. 52.5 55. 57.5
60. 62.5 65. 67.5 70. 72.5 75. 77.5 80. 82.5 85. 87.5
90. ]
>>>
Unlike numpy array objects, netCDF Variable
objects with an unlimited dimension will grow along that dimension if
you assign data outside the currently defined range of indices.
>>>
>>> nlats = len(dataset.dimensions['lat'])
>>> nlons = len(dataset.dimensions['lon'])
>>> nlevs = len(dataset.dimensions['level'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data = (0, 10, 73, 144)
>>>
>>> from numpy.random.mtrand import uniform
>>> temp[0:5,:,:,:] = uniform(size=(5,10,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data = (5, 10, 73, 144)
>>>
>>>
>>> print 'times shape after adding temp data = ',times.shape
times shape after adding temp data = (5,)
>>>
Note that the size of the times variable grows when data is
appended along the time dimension of the variable temp, even though
no data has yet been assigned to the variable times.
Time coordinate values pose a special challenge to netCDF users.
Most metadata standards (such as CF and COARDS) specify that time
should be measured relative to a fixed date using a certain calendar,
with units specified like hours since YYYY-MM-DD hh:mm:ss. These
units can be awkward to deal with, without a utility to convert the
values to and from calendar dates. A module called netcdftime is
provided with this package to do just that. Here's an example of how
it can be used:
>>>
>>> from datetime import timedelta
>>> from netcdftime import utime
>>> cdftime = utime(times.units,calendar=times.calendar,format='%B %d, %Y')
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = cdftime.date2num(dates)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since January 1, 0001):
[ 17533056. 17533068. 17533080. 17533092. 17533104.]
>>>
>>> dates = cdftime.num2date(times[:])
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
2001-03-02 12:00:00 2001-03-03 00:00:00]
>>>
Values of time in the specified units and calendar are converted
to and from python datetime instances using the num2date and
date2num methods of the utime class. See the netcdftime
documentation for more details.
6) Efficient compression of netCDF variables
Data stored in netCDF Variable objects is compressed on disk by
default, if the file format is NETCDF4_CLASSIC. This is a new feature
of netCDF 4, but the resulting files can still be read by netCDF 3
clients that have been linked against the netCDF 4 library. The
parameters for the compression are determined by the zlib, complevel
and shuffle keyword arguments to the createVariable method. The
default values are zlib=True, complevel=6 and shuffle=True. To turn
off compression, set zlib=False. complevel regulates the speed and
efficiency of the compression (1 being fastest, but lowest
compression ratio, 9 being slowest but best compression ratio).
shuffle=False will turn off the HDF5 shuffle filter, which
de-interlaces a block of data by reordering the bytes. The shuffle
filter can significantly improve compression ratios. Setting the
fletcher32 keyword argument to createVariable to True (it's False by
default) enables the Fletcher32 checksum algorithm for error
detection.
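For example, the compression settings can be passed explicitly when a
variable is created (a sketch; the variable name 'temp2' is
hypothetical, and the keyword arguments are the ones described above):
>>> temp2 = dataset.createVariable('temp2','f4',('time','level','lat','lon',),
    zlib=True, complevel=9, shuffle=True, fletcher32=True)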
If your data only has a certain number of digits of precision (say
for example, it is temperature data that was measured with a
precision of 0.1 degrees), you can dramatically improve compression
by quantizing (or truncating) the data using the
least_significant_digit keyword argument to createVariable. The least
significant digit is the power of ten of the smallest decimal place
in the data that is a reliable value. For example, if the data has a
precision of 0.1, then setting least_significant_digit=1 will cause
the data to be quantized using NP.around(scale*data)/scale, where
scale = 2**bits, and bits is determined so that a precision of 0.1 is
retained (in this case bits=4). Effectively, this makes the
compression 'lossy' instead of 'lossless', that is some precision in
the data is sacrificed for the sake of disk space.
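As an illustration of the formula above, here is a small sketch with
made-up sample values (each quantized value agrees with the original
to within the stated precision of 0.1; the exact printed formatting
depends on the numpy version):
>>> import numpy as NP
>>> data = NP.array([20.04, 20.08, 20.12], 'f4')
>>> bits = 4   # enough bits to retain a precision of 0.1
>>> scale = 2.**bits
>>> print NP.around(scale*data)/scale
[ 20.0625  20.0625  20.125 ]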
In our example, try replacing the line
>>> temp = dataset.createVariable('temp','f4',('time','level','lat','lon',))
with
>>> temp = dataset.createVariable('temp','f4',('time','level','lat','lon',),
least_significant_digit=3)
and see how much smaller the resulting file is. If the file format
is not NETCDF4_CLASSIC, using the least_significant_digit keyword
will not result in a smaller file, since on-the-fly zlib compression
will not be done. However, the resulting file will still be smaller
when gzipped.
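One way to check this is to compare file sizes on disk after closing
the files (a sketch; 'test_lsd.nc' is a hypothetical copy of the
tutorial file written with the quantized temp variable):
>>> import os
>>> # the quantized NETCDF4_CLASSIC file should be substantially smaller
>>> print os.path.getsize('test.nc'), os.path.getsize('test_lsd.nc')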
7) Converting netCDF 3 files to netCDF 4 files (with compression)
A command line utility (nc3tonc4) is provided which can convert a
netCDF 3 file (in NETCDF3_CLASSIC or NETCDF3_64BIT format) to a
NETCDF4_CLASSIC file, optionally unpacking variables packed as short
integers (with scale_factor and add_offset) to floats, and adding
zlib compression (with the HDF5 shuffle filter and fletcher32
checksum). Data may also be quantized (truncated) to a specified
precision to improve compression.
>>> os.system('nc3tonc4 -h')
nc3tonc4 [-h] [-o] [--zlib=(0|1)] [--complevel=(1-9)] [--shuffle=(0|1)]
[--fletcher32=(0|1)] [--unpackshort=(0|1)]
[--quantize=var1=n1,var2=n2,..] netcdf3filename netcdf4filename
-h -- Print usage message.
-o -- Overwite destination file
(default is to raise an error if output file already exists).
--zlib=(0|1) -- Activate (or disable) zlib compression (default is activate).
--complevel=(1-9) -- Set zlib compression level (6 is default).
--shuffle=(0|1) -- Activate (or disable) the shuffle filter
(active by default).
--fletcher32=(0|1) -- Activate (or disable) the fletcher32 checksum
(not active by default).
--unpackshort=(0|1) -- Unpack short integer variables to float variables
using scale_factor and add_offset netCDF
variable attributes (active by default).
--quantize=(comma separated list of "variable name=integer" pairs) --
Truncate the data in the specified variables to a given decimal precision.
For example, 'speed=2, height=-2, temp=0' will cause the variable
'speed' to be truncated to a precision of 0.01,
'height' to a precision of 100 and 'temp' to 1.
This can significantly improve compression. The default
is not to quantize any of the variables.
If --zlib=1, the resulting NETCDF4_CLASSIC file will take up less
disk space than the original netCDF 3 file (especially if the
--quantize option is used), and will be readable by netCDF 3 clients
as long as they have been linked against the netCDF 4 library.
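A typical invocation might look like this (a sketch; the filenames
are hypothetical, and the options are the ones listed in the usage
message above):
>>> os.system('nc3tonc4 -o --complevel=9 --quantize=temp=2 old3.nc new4.nc')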
All of the code in this tutorial is available in
examples_classic/tutorial.py, along with several other examples. Unit
tests are in the test_classic directory.
Contact:
Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>
Copyright:
2006 by Jeffrey Whitaker.
License:
Permission to use, copy, modify, and distribute this software
and its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both the copyright notice and this permission
notice appear in supporting documentation. THE AUTHOR DISCLAIMS ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
Functions
- _get_att(...): Private function to get an attribute value given its name
- _get_att_names(...): Private function to get all the attribute names in a group
- _get_dims(...): Private function to create Dimension instances for all the dimensions in a Dataset
- _get_format(...): Private function to get the netCDF file format
- _get_vars(...): Private function to create Variable instances for all the variables in a Dataset
- _set_att(...): Private function to set an attribute name/value pair
- _set_default_format(...): Private function to set the netCDF file format
Variables
- __version__ = '0.6.2'
- _key = 'f8'
- _nctonptype = {1: 'i1', 2: 'S1', 3: 'i2', 4: 'i4', 5: 'f4', 6: 'f8'}
- _nptonctype = {'B': 1, 'S1': 2, 'b': 1, 'c': 2, 'd': 6, 'f': 5, 'f4': 5, 'f8': 6, ...}
- _npversion = '1.0.1'
- _private_atts = ['_dsetid', '_dset', '_varid', 'dimensions', 'variables', 'dtype', 'file_format']
- _supportedtypes = ['f4', 'i1', 'S1', 'i2', 'i4', 'f8']
- _value = 6