Module netCDF4_classic



Introduction

Python interface to the netCDF version 4 library that maintains backward compatibility with netCDF version 3 clients. It can read and write netCDF 3 files, as well as netCDF 4 files that are backward compatible with netCDF 3 clients. netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module does not implement any of the new features of netCDF 4, except zlib compression. To use the other new features of netCDF 4, use the companion netCDF4 module (which produces netCDF 4 files that can only be read by netCDF 4 clients). The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.

Download

Requires

Install

Tutorial

1) Creating/Opening/Closing a netCDF file

To create a netCDF file from python, you simply call the Dataset constructor. This is also the method used to open an existing netCDF file. If the file is open for write access (w, r+ or a), you may write any type of data including new dimensions, variables and attributes. netCDF files come in several flavors (NETCDF3_CLASSIC, NETCDF3_64BIT, NETCDF4_CLASSIC, and NETCDF4). The first two flavors are supported by version 3 of the netCDF library. NETCDF4_CLASSIC files use the version 4 disk format (HDF5), but do not use any features not found in the version 3 API, except zlib compression. They can be read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be read by HDF5 clients, using the HDF5 API. NETCDF4 files use the HDF5 file format and use the new features of the netCDF version 4 API, and thus cannot be read by netCDF 3 clients. The netCDF4_classic module can read and write NETCDF3_CLASSIC, NETCDF3_64BIT and NETCDF4_CLASSIC files. To write NETCDF4 files, use the netCDF4 module. To see how a given file is formatted, you can examine the file_format Dataset attribute. Closing the netCDF file is accomplished via the close method of the Dataset instance.

Here's an example:
>>> import netCDF4_classic as netCDF
>>> dataset = netCDF.Dataset('test.nc', 'w')
>>> print dataset.file_format
NETCDF4_CLASSIC
>>>
>>> dataset.close()
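Since the same constructor is also used to open existing files, here is a minimal sketch of reopening the file just created (read-only mode 'r' is assumed here) and checking how it is formatted:
>>> dataset = netCDF.Dataset('test.nc', 'r') # reopen read-only
>>> print dataset.file_format
NETCDF4_CLASSIC
>>> dataset.close()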

2) Dimensions in a netCDF file

netCDF defines the sizes of all variables in terms of dimensions, so before any variables can be created the dimensions they use must be created first. A special case, not often used in practice, is that of a scalar variable, which has no dimensions. A dimension is created using the createDimension method of a Dataset instance. A Python string is used to set the name of the dimension, and an integer value is used to set the size. To create an unlimited dimension (a dimension that can be appended to), the size value is set to None. In this example, the time dimension is unlimited. Only one unlimited dimension per file is allowed in netCDF 3, and it must be the first (or leftmost) dimension. NETCDF4 formatted files may have multiple unlimited dimensions (see the netCDF4 documentation).
>>> dataset = netCDF.Dataset('test.nc', 'a')
>>> dataset.createDimension('time', None)
>>> dataset.createDimension('level', 10)
>>> dataset.createDimension('lat', 73)
>>> dataset.createDimension('lon', 144)
All of the Dimension instances are stored in a python dictionary.
>>> print dataset.dimensions
{'lat': <netCDF4_classic.Dimension object at 0x24a5f7b0>, 
 'time': <netCDF4_classic.Dimension object at 0x24a5f788>, 
 'lon': <netCDF4_classic.Dimension object at 0x24a5f7d8>, 
 'level': <netCDF4_classic.Dimension object at 0x24a5f760>}
>>>
Calling the python len function with a Dimension instance returns the current size of that dimension. The isunlimited() method of a Dimension instance can be used to determine if the dimension is unlimited, or appendable.
>>> for dimname, dimobj in dataset.dimensions.iteritems():
>>>    print dimname, len(dimobj), dimobj.isunlimited()
lat 73 False
time 0 True
lon 144 False
level 10 False
>>>
Dimension names can be changed using the renameDimension method of a Dataset instance.
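For example, a small sketch of such a rename (assuming renameDimension takes the old name followed by the new name, both python strings):
>>> dataset.renameDimension('level', 'lev')  # rename the 'level' dimension to 'lev'
>>> dataset.renameDimension('lev', 'level')  # rename it back, so the rest of the tutorial is unchanged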

3) Variables in a netCDF file

netCDF variables behave much like python multidimensional array objects supplied by the numpy module. However, unlike numpy arrays, netCDF variables can be appended to along the 'unlimited' dimension. To create a netCDF variable, use the createVariable method of a Dataset instance. The createVariable method has two mandatory arguments, the variable name (a Python string), and the variable datatype. The variable's dimensions are given by a tuple containing the dimension names (defined previously with createDimension). To create a scalar variable, simply leave out the dimensions keyword. The variable primitive datatypes correspond to the dtype.str attribute of a numpy array, and can be one of 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i1' (8-bit signed integer), or 'S1' (single-character string). The old single character Numeric typecodes ('f','d','i','h','b','c') are also accepted for compatibility with Scientific.IO.NetCDF. The dimensions themselves are usually also defined as variables, called coordinate variables. The createVariable method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.
>>> times = dataset.createVariable('time','f8',('time',))
>>> levels = dataset.createVariable('level','i4',('level',))
>>> latitudes = dataset.createVariable('latitude','f4',('lat',))
>>> longitudes = dataset.createVariable('longitude','f4',('lon',))
>>> temp = dataset.createVariable('temp','f4',('time','level','lat','lon',))
All of the variables in the file are stored in a Python dictionary, in the same way as the dimensions:
>>> print dataset.variables
{'temp': <netCDF4_classic.Variable object at 0x24a61068>,
 'level': <netCDF4_classic.Variable object at 0x24a61090>, 
 'longitude': <netCDF4_classic.Variable object at 0x24a61030>,
 'time': <netCDF4_classic.Variable object at 0x24a610d8>, 
 'latitude': <netCDF4_classic.Variable object at 0x24a61058>}
>>>
Variable names can be changed using the renameVariable method of a Dataset instance.
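For example, a small sketch of such a rename (assuming renameVariable, like renameDimension, takes the old name followed by the new name):
>>> dataset.renameVariable('temp', 'temperature')  # rename the 'temp' variable
>>> dataset.renameVariable('temperature', 'temp')  # rename it back, so the rest of the tutorial is unchanged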

4) Attributes in a netCDF file

There are two types of attributes in a netCDF file, global and variable. Global attributes provide information about the dataset as a whole. Variable attributes provide information about one of the variables in the dataset. Global attributes are set by assigning values to Dataset instance variables. Variable attributes are set by assigning values to Variable instance variables. Attributes can be strings, numbers or sequences. Returning to our example,
>>> import time
>>> dataset.description = 'bogus example script'
>>> dataset.history = 'Created ' + time.ctime(time.time())
>>> dataset.source = 'netCDF4_classic python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> levels.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since January 1, 0001'
>>> times.calendar = 'proleptic_gregorian'
The ncattrs() method of a Dataset or Variable instance can be used to retrieve the names of all the netCDF attributes. This method is provided as a convenience, since using the built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.
>>> for name in dataset.ncattrs():
>>>     print 'Global attr', name, '=', getattr(dataset,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov  7 10:30:56 2005
Global attr source = netCDF4_classic python module tutorial
The __dict__ attribute of a Dataset or Variable instance provides all the netCDF attribute name/value pairs in a python dictionary:
>>> print dataset.__dict__
{'source': 'netCDF4_classic python module tutorial',
'description': 'bogus example script', 
'history': 'Created Mon Nov  7 10:30:56 2005'}
Attributes can also be python objects. netCDF4_classic tries to convert attributes to numpy arrays before saving them to the netCDF file. If the attribute is cast to an object array by numpy, it is pickled and saved as a text attribute (and then automatically unpickled when the attribute is accessed). So, an attribute which is a list of integers will be saved as an array of integers, while an attribute that is a python dictionary will be saved as a pickled string, then unpickled automatically when it is retrieved. For example,
>>> from datetime import datetime
>>> dataset.timestamp = datetime.now()
>>> print 'Global attr timestamp =',dataset.timestamp
Global attr timestamp = 2006-03-06 09:20:21.520926

Note that data saved as pickled strings will not be very useful if the data is to be read by a non-python client (the data will appear to the client as an ugly looking binary string).
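To make the distinction concrete, here is a small sketch (the attribute names int_list and options are made up for illustration):
>>> dataset.int_list = [1, 2, 3]             # converted to a numpy integer array
>>> dataset.options = {'foo': 1, 'bar': 2}   # cast to an object array, so saved as a pickled string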

Attributes can be deleted from a netCDF Dataset or Variable using the python del statement (i.e. del dset.foo removes the attribute foo from the dataset dset).
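For instance, to remove the two illustration attributes added in the sketch above:
>>> del dataset.int_list
>>> del dataset.options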

5) Writing data to and retrieving data from a netCDF variable

Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.
>>> import numpy as NP
>>> latitudes[:] = NP.arange(-90,91,2.5)
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5
 -60.  -57.5 -55.  -52.5 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5
 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5 -10.   -7.5  -5.   -2.5
   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5
  60.   62.5  65.   67.5  70.   72.5  75.   77.5  80.   82.5  85.   87.5
  90. ]
>>>
Unlike numpy array objects, netCDF Variable objects with an unlimited dimension will grow along that dimension if you assign data outside the currently defined range of indices.
>>> # append along the unlimited dimension by assigning to a slice.
>>> nlats = len(dataset.dimensions['lat'])
>>> nlons = len(dataset.dimensions['lon'])
>>> nlevs = len(dataset.dimensions['level'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data =  (0, 10, 73, 144)
>>>
>>> from numpy.random.mtrand import uniform
>>> temp[0:5,:,:,:] = uniform(size=(5,nlevs,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data =  (5, 10, 73, 144)
>>>
>>> # times have grown, but no values yet assigned.
>>> print 'times shape after adding temp data = ',times.shape
times shape after adding temp data =  (5,)
>>>

Note that the size of the times variable grows when data is appended along the time dimension of the variable temp, even though no data has yet been assigned to the variable times.

Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF and COARDS) specify that time should be measured relative to a fixed date using a certain calendar, with units specified like 'hours since YYYY-MM-DD hh:mm:ss'. These units can be awkward to deal with, without a utility to convert the values to and from calendar dates. A module called netcdftime is provided with this package to do just that. Here's an example of how it can be used:
>>> # fill in times.
>>> from datetime import timedelta
>>> from netcdftime import utime
>>> cdftime = utime(times.units,calendar=times.calendar,format='%B %d, %Y') 
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = cdftime.date2num(dates)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since January 1, 0001): 
[ 17533056.  17533068.  17533080.  17533092.  17533104.]
>>>
>>> dates = cdftime.num2date(times[:])
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
 2001-03-02 12:00:00 2001-03-03 00:00:00]
>>>
Values of time in the specified units and calendar are converted to and from python datetime instances using the num2date and date2num methods of the utime class. See the netcdftime.netcdftime documentation for more details.

6) Efficient compression of netCDF variables

Data stored in netCDF Variable objects is compressed on disk by default, if the file format is NETCDF4_CLASSIC. This is a new feature of netCDF 4, but the resulting files can still be read by netCDF 3 clients that have been linked against the netCDF 4 library. The parameters for the compression are determined by the zlib, complevel and shuffle keyword arguments to the createVariable method. The default values are zlib=True, complevel=6 and shuffle=True. To turn off compression, set zlib=False. complevel regulates the speed and efficiency of the compression (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). shuffle=False will turn off the HDF5 shuffle filter, which de-interlaces a block of data by reordering the bytes. The shuffle filter can significantly improve compression ratios. Setting the fletcher32 keyword argument to createVariable to True (it's False by default) enables the Fletcher32 checksum algorithm for error detection.
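Here is a sketch of how these keyword arguments might be passed to createVariable (the variable name 'temp2' is made up for illustration; the keywords are the ones described above):
>>> temp2 = dataset.createVariable('temp2','f4',('time','level','lat','lon',),
    zlib=True, complevel=9, shuffle=True, fletcher32=True)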

If your data only has a certain number of digits of precision (say, for example, it is temperature data that was measured with a precision of 0.1 degrees), you can dramatically improve compression by quantizing (or truncating) the data using the least_significant_digit keyword argument to createVariable. The least significant digit is the power of ten of the smallest decimal place in the data that is a reliable value. For example, if the data has a precision of 0.1, then setting least_significant_digit=1 will cause the data to be quantized using NP.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Effectively, this makes the compression 'lossy' instead of 'lossless', that is, some precision in the data is sacrificed for the sake of disk space.
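The quantization formula itself is easy to try out with numpy. A small worked example, assuming a desired precision of 0.1 (so bits=4, since 1./2**4 = 0.0625 is the first power-of-two step smaller than 0.1):
>>> import numpy as NP   # already imported earlier in the tutorial
>>> data = NP.array([273.15, 273.18, 273.22])
>>> scale = 2.**4
>>> NP.around(scale*data)/scale  # roughly array([ 273.125, 273.1875, 273.25]), each within 0.1 of the original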

In our example, try replacing the line
>>> temp = dataset.createVariable('temp','f4',('time','level','lat','lon',))
with
>>> temp = dataset.createVariable('temp','f4',('time','level','lat','lon',),
least_significant_digit=3)
and see how much smaller the resulting file is. If the file format is not NETCDF4_CLASSIC, using the least_significant_digit keyword will not result in a smaller file, since on-the-fly zlib compression will not be done. However, the resulting file will still be smaller when gzipped.
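One rough way to see the effect is to check the file size before and after the change (a sketch using only the standard library; test.nc is the file from this tutorial):
>>> import os
>>> dataset.close()   # make sure everything is flushed to disk first
>>> print 'test.nc is %d bytes' % os.path.getsize('test.nc')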

7) Converting netCDF 3 files to netCDF 4 files (with compression)

A command line utility (nc3tonc4) is provided which can convert a netCDF 3 file (in NETCDF3_CLASSIC or NETCDF3_64BIT format) to a NETCDF4_CLASSIC file, optionally unpacking variables packed as short integers (with scale_factor and add_offset) to floats, and adding zlib compression (with the HDF5 shuffle filter and fletcher32 checksum). Data may also be quantized (truncated) to a specified precision to improve compression.
>>> import os
>>> os.system('nc3tonc4 -h')
nc3tonc4 [-h] [-o] [--zlib=(0|1)] [--complevel=(1-9)] [--shuffle=(0|1)]
         [--fletcher32=(0|1)] [--unpackshort=(0|1)]
         [--quantize=var1=n1,var2=n2,..] netcdf3filename netcdf4filename
-h -- Print usage message.
-o -- Overwrite destination file
      (default is to raise an error if output file already exists).
--zlib=(0|1) -- Activate (or disable) zlib compression (default is activate).
--complevel=(1-9) -- Set zlib compression level (6 is default).
--shuffle=(0|1) -- Activate (or disable) the shuffle filter
                   (active by default).
--fletcher32=(0|1) -- Activate (or disable) the fletcher32 checksum
                      (not active by default).
--unpackshort=(0|1) -- Unpack short integer variables to float variables
                       using scale_factor and add_offset netCDF 
                       variable attributes (active by default).
--quantize=(comma separated list of "variable name=integer" pairs) --
  Truncate the data in the specified variables to a given decimal precision.
  For example, 'speed=2, height=-2, temp=0' will cause the variable
  'speed' to be truncated to a precision of 0.01, 
  'height' to a precision of 100 and 'temp' to 1.
  This can significantly improve compression. The default
  is not to quantize any of the variables.

If --zlib=1, the resulting NETCDF4_CLASSIC file will take up less disk space than the original netCDF 3 file (especially if the --quantize option is used), and will be readable by netCDF 3 clients as long as they have been linked against the netCDF 4 library.
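For example, a typical invocation might look like this (the file names old.nc and new.nc are placeholders), run from python in the same style as the help example above:
>>> os.system('nc3tonc4 -o --complevel=6 --shuffle=1 --quantize=temp=2 old.nc new.nc')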

All of the code in this tutorial is available in examples_classic/tutorial.py, along with several other examples. Unit tests are in the test_classic directory.


Contact: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>

Copyright: 2006 by Jeffrey Whitaker.

License: Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Classes
  Dataset
A netCDF Dataset is a collection of dimensions, variables and attributes.
  Dimension
A netCDF Dimension is used to describe the coordinates of a Variable.
  Variable
A netCDF Variable is used to read and write netCDF data.

Functions
  _get_att(...)
Private function to get an attribute value given its name
  _get_att_names(...)
Private function to get all the attribute names in a group
  _get_dims(...)
Private function to create Dimension instances for all the dimensions in a Dataset
  _get_format(...)
Private function to get the netCDF file format
  _get_vars(...)
Private function to create Variable instances for all the variables in a Dataset
  _set_att(...)
Private function to set an attribute name/value pair
  _set_default_format(...)
Private function to set the netCDF file format

Variables
  __version__ = '0.6.2'
  _key = 'f8'
  _nctonptype = {1: 'i1', 2: 'S1', 3: 'i2', 4: 'i4', 5: 'f4', 6: 'f8'}
  _nptonctype = {'c': 2, 'b': 1, 'f4': 5, 'd': 6, 'f': 5, 'i1': 1, '...
  _npversion = '1.0.1'
  _private_atts = ['_dsetid', '_dset', '_varid', 'dimensions', 'variab...
  _supportedtypes = ['f4', 'i1', 'S1', 'i2', 'i4', 'f8']
  _value = 6


Variables Details

__version__
Value:
'0.6.2'

_key
Value:
'f8'

_nctonptype
Value:
{1: 'i1', 2: 'S1', 3: 'i2', 4: 'i4', 5: 'f4', 6: 'f8'}

_nptonctype
Value:
{'B': 1,
 'S1': 2,
 'b': 1,
 'c': 2,
 'd': 6,
 'f': 5,
 'f4': 5,
 'f8': 6,
...

_npversion
Value:
'1.0.1'

_private_atts
Value:
['_dsetid',
 '_dset',
 '_varid',
 'dimensions',
 'variables',
 'dtype',
 'file_format']

_supportedtypes
Value:
['f4', 'i1', 'S1', 'i2', 'i4', 'f8']

_value
Value:
6