das2, Module Reference

Federated Catalog

Factory Functions

das2.get_node(sPathId, sUrl=None)

Read a single das2 catalog node from a local or remote file.

Parameters:
  • sPathId (str) – A string providing the catalog Path ID for the object to load. Since the das2 root node at ‘tag:das2.org,2012:’ contains the most commonly loaded items, any sPathId starting with ‘site:/’ or ‘test:/’ is assumed to be a sub-item of the das2 root node.

    Note: This value is required even if an explicit URL is provided because, much like files on a disk, das2 catalog objects do not embed the name of the node within the contents of the node.
  • sUrl (str, optional) – A string providing a direct load URL. URLs must start with one of ‘file://’, ‘http://’, or ‘https://’.
Returns:

Either a das2.Catalog, das2.Collection or das2.HttpStreamSrc object depending on the contents of the file. If the file does not describe one of these object types or can’t be read then None is returned.
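
The two parameter rules above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's code: the helper names expand_path_id and is_load_url are hypothetical and are not part of das2.

```python
# Root node URI quoted in the parameter description above.
DAS2_ROOT = 'tag:das2.org,2012:'
# The three URL prefixes the text says a load URL must start with.
URL_SCHEMES = ('file://', 'http://', 'https://')

def expand_path_id(sPathId):
    """Hypothetical helper: treat 'site:/' and 'test:/' IDs as children of the das2 root."""
    if sPathId.startswith(('site:/', 'test:/')):
        return DAS2_ROOT + sPathId
    return sPathId  # already a full catalog path URI

def is_load_url(sUrl):
    """Hypothetical helper: True if sUrl starts with one of the accepted prefixes."""
    return sUrl.startswith(URL_SCHEMES)

print(expand_path_id('site:/uiowa'))   # tag:das2.org,2012:site:/uiowa
print(is_load_url('file:///home/person/my_source.json'))   # True
```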

Examples

Load an http stream source for a das2 Voyager dataset:

>>> src = get_node('site:/uiowa/voyager/1/pws/uncalibrated/waveform/das2')
>>> print(src.__class__.__name__)
HttpStreamSrc

Load the catalog of das2 production sites:

>>> cat = get_node('site:/uiowa')
>>> print(cat.__class__.__name__)
Catalog

Load the same item as above but provide the full catalog path URI:

>>> cat = get_node('tag:das2.org,2012:site:/uiowa')

Load a SPASE record for the JAXA listing:

>>> rec = get_node('tag:spase-group.org,2018:spase://GBO')
>>> print(rec.__class__.__name__)
SpaseRecord

Load a data source definition from a local file:

>>> src = get_node('tag:place.org:2019:/hidden/my_source',
                   'file:///home/person/my_source.json')

Load a dataset collection from a specific remote URL:

>>> cat = get_node('tag:place.org:2019:/hidden/my_source',
                   'http://place.org/catalog/my_source.json')

das2.get_catalog()

Read a single directory definition from a local or remote file.

This function is essentially a wrapper around get_node() that makes sure the returned object has the sub() method.

Parameters:
  • sPathId (str) – A string providing the catalog Path ID for the object to load. Since the das2 root node at ‘tag:das2.org,2012:’ contains the most commonly loaded items, any sPathId starting with ‘site:/’ or ‘test:/’ is assumed to be a sub-item of the das2 root node.

    Note: This value is required even if an explicit URL is provided because, much like files on a disk, das2 catalog objects do not embed the name of the node within the contents of the node.
  • sUrl (str, optional) – A string providing a direct load URL. URLs must start with one of ‘file://’, ‘http://’, or ‘https://’.
Returns:

Either a Catalog or Collection object is returned depending on the contents of the file.

Raises:
  • CatalogError – If the file does not describe one of these object types.
  • _das2.error – If a low-level read error occurs.

das2.get_source(sPathId, sUrl=None)

Read a single data source definition from a local or remote file.

This function is essentially a wrapper around get_node() that makes sure the returned object has the get() method.

Parameters:
  • sPathId (str) –

    Provides the catalog Path ID for the object to load. Since the das2 root node at ‘tag:das2.org,2012:’ contains the most commonly loaded items, any sPathId starting with ‘site:/’ or ‘test:/’ is assumed to be a sub-item of the das2 root node.

    Note: This value is required even if an explicit URL is provided because, much like files on a disk, das2 catalog objects do not embed the name of the node within the contents of the node.
  • sUrl (str, optional) – A string providing a direct load URL. URLs must start with one of ‘file://’, ‘http://’, or ‘https://’.
Returns:

Either an HttpStreamSrc or FileAggSrc object is returned depending on the contents of the file.

Return type:

Node

Raises:
  • CatalogError – If the file does not describe one of these object types.
  • _das2.error – If a low-level read error occurs.

Class Source

class das2.Source(dDef, bStub, bGlobal)

This class exists to define the interface for Source objects and to hook into the two-phase node construction mechanism.

examples()

Return a list of named examples.

Every data source is required to provide at least one example dataset, for testing and evaluation purposes.

Returns: A dictionary of example IDs and descriptions.

Note

The keys in the returned dictionary can be supplied to get() to retrieve the named example dataset.

get(where=None)

Get data from a Source.

To get the list of examples by name for this data source use the function examples(). In addition to the queries defined below, example names as returned from examples() can be used as the where argument. For example:

source = das2.get_source('site:/uiowa/mars_express/marsis/spectrogram/das2')
examples = source.examples()
datasets = source.get(examples[0][0])

For general purpose data queries supply a dictionary with the following form:

query = {
   'coord':{
      'VARIABLE_NAME':{
         'ASPECT_NAME':ASPECT_VALUE,
         ...
      },
      ...
   },
   'data':{
      'VARIABLE_NAME':{
         'ASPECT_NAME':ASPECT_VALUE,
         ...
      },
      ...
   },
   'option':{
      'OPTION_NAME':OPTION_VALUE,
      ...
   }
}
source.get(query)

Where unneeded sections are omitted.

For example, a simple query by start and stop time of a data source would be:

query = {'coord':{'time':{'minimum':'2017-01-01', 'maximum':'2017-01-02'}}}
source.get(query)

Gathering data in the default time range with a hypothetical filter named ‘no_spikes’ for a data variable named ‘electric’ would be:

query = {'data':{'electric':{'no_spikes':True}}}
source.get(query)

The following example sets the time range and turns on the ‘no_spikes’ filter:

query = {
   'coord':{'time':{'minimum':'2017-01-01', 'maximum':'2017-01-02'}},
   'data':{'electric':{'no_spikes':True}}
}
source.get(query)

Shortcuts:

As a convenience, query dictionary keys and values can be under-specified as long as the intent is clear.

  • If a data, coordinate or general option name is unique within the data source definition, then the section qualifiers ‘coord’, ‘data’, and ‘option’ may be omitted. The combined query dictionary above could thus be shortened to:

    query = {
       'time':{'minimum':'2017-01-01', 'maximum':'2017-01-02'},
       'electric':{'no_spikes':True}
    }
    source.get(query)
    
  • Coordinate subset dictionaries may be replaced by a tuple. The members of the tuple will be taken to provide the minimum, maximum, and resolution, in that order. The value None can be used to skip a spot and the tuple need not be three elements long. In combination with the shortcut above, the query could be given as:

    query = {'time':('2017-01-01', '2017-01-02'), 'electric':{'no_spikes':True}}
    source.get(query)
    
  • Boolean options can be set to True just by providing their name alone in a list:

    query = {'time':('2017-01-01', '2017-01-02'), 'electric':['no_spikes']}
    source.get(query)
    
  • And the list can be omitted if it contains only a single item:

    query = {'time':('2017-01-01', '2017-01-02'), 'electric':'no_spikes'}
    source.get(query)
    
  • Finally, variables with the ‘enabled’ aspect may have their enabled state changed by using True or False in place of the entire aspect dictionary. For example, assume a source that can output both ‘magnetic’ and ‘electric’ data variables. The following query dictionary would enable output of ‘electric’ data but not ‘magnetic’:

    source.get({ 'magnetic':False, 'electric':True})
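
The value shortcuts listed above can be read as a mapping from short forms to full aspect dictionaries. The sketch below shows one way to express that mapping. The helper expand_aspects is hypothetical and is not part of das2, and it does not attempt the first shortcut (omitting the ‘coord’, ‘data’ and ‘option’ qualifiers), since that depends on the data source definition.

```python
def expand_aspects(value):
    """Hypothetical helper: turn a shortcut value into a full aspect dictionary."""
    if isinstance(value, bool):        # True/False toggles the 'enabled' aspect
        return {'enabled': value}
    if isinstance(value, str):         # a single boolean option given by name
        return {value: True}
    if isinstance(value, list):        # several boolean options given by name
        return {name: True for name in value}
    if isinstance(value, tuple):       # (minimum, maximum, resolution), None skips a spot
        names = ('minimum', 'maximum', 'resolution')
        return {k: v for k, v in zip(names, value) if v is not None}
    return dict(value)                 # already a full aspect dictionary

short = {'time': ('2017-01-01', '2017-01-02'), 'electric': 'no_spikes'}
full = {name: expand_aspects(val) for name, val in short.items()}
print(full)
```

Here full holds the same aspect dictionaries as the long-form query shown earlier, minus the section qualifiers.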
    
Parameters: where (str, dict, None) – Either the name of a predefined example, a query dictionary, or None to indicate download of the default example dataset.
Returns: List of Dataset objects.

info()

Get a pretty-print string of query options for a data Source.

params()

Get a standardized dictionary describing how to query a data source.

Each data collection in das2 defines one or more named coordinate variables and zero or more named data variables in those coordinates. For example a magnetometer collection could be defined as:

  • Time (coordinate variable)
  • Payload_X_Magnitude (data variable)
  • Payload_Y_Magnitude (data variable)
  • Payload_Z_Magnitude (data variable)

In order to get data, Collections contain one or more data Sources which can be used to obtain values from the collection. Most data Sources are controllable. This function provides the control parameters for a data source.

There are three sections in the Source JSON (or XML) definition that can define control parameters:

  • coord Each sub-item in this key corresponds to a single coordinate variable for the dataset. Each variable can have one or more aspects that are settable.
  • data Similar to the coordinates section, except each item here represents a single data variable for the overall dataset. Data variables are the items measured by an instrument or the values computed by a model.
  • options This section contains extra options for the data source that are not directly associated with any particular data or coordinate variable. Items such as the output format appear here.

Each settable aspect of a variable, or settable option, contains the sub-key ‘set’. If this sub-key is not present the aspect is not settable and will not appear in the query dictionary output by this function. Though nearly any non-whitespace string can be used to name a variable aspect or general option, certain aspect names have a special meaning and may receive special handling in end user code. Special aspect names are listed below:

  • minimum Used to state the smallest desired value of a Variable. Typically available as a settable coordinate variable aspect.
  • maximum Used to state the largest desired value. Typically available as a settable coordinate variable aspect.
  • resolution Used to state the width of the desired averaging bins, typically as a settable coordinate variable aspect.
  • units Used to state the desired physical units for output values. This is typically available as a settable data Variable aspect.
  • enabled Used to toggle the output state of a variable, typically encountered with data variables.

Other settable aspects are often available as well, though no attempt has been made to standardize their names. The following example output for the Voyager PWS Spectrum Analyzer data source demonstrates both common and customized variable aspects:

{
  'coord':{
    'time':{
      'minimum':{
        'name':'Min Time',
        'title':'Minimum Time Value to stream',
        'default':'2014-08-31',
        'type':'isotime',
        'range': ['1977-08-20','2019-03-01']
      },
      'maximum':{
        'name':'Max Time',
        'title':'Maximum Time Value to stream',
        'default':'2014-09-01',
        'type':'isotime',
        'range': ['1977-08-20','2019-03-01']
      },
      'resolution':{
        'name':'Time Bin Size',
        'title':'Maximum width of output time bins, use 0.0 for intrinsic values',
        'default':0.0,
        'type':'real',
        'range': [0.0, None]
      }
    }
  },
  'data':{
    'efield':{
      'units':{
         'name':'Calibration Units',
         'title':"Set the calibration table, 'raw' means no calibration",
         'default': 'V m**-1',
         'type':'string',
         'enum':['V m**-1', 'raw', 'V**2 m**-2 Hz**-1', 'W m**-2 Hz**-1']
       },
       'negative':{
         'name':'Keep Negative',
         'title':'Negative values are used as a noise flag. Use this option to keep them.',
         'default':False,
         'type':'boolean',
         'enum':[True, False],
       },
       'channel':{
         'name':'SA Channel',
         'title':"Spectrum analyzer channel to output, 'all' outputs 16 channels",
         'default':'all',
         'enum':['all',
           '10.0Hz','17.8Hz','31.1Hz','56.2Hz','100Hz','178Hz','311Hz','562Hz',
           '1.00kHz','1.78kHz','3.11kHz','5.62kHz','10.0kHz','17.8kHz','31.1kHz','56.2kHz'
         ]
       }
     }
  },
  'option':{
    'text':{
      'name':'Text',
      'title':'Ensure output stream is formatted as UTF-8 text',
      'default':False,
      'enum':[True, False]
    }
  }
}
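
Since every aspect description in this output carries a ‘default’ entry, a query that spells out the defaults can be assembled by walking the dictionary. The sketch below does this on a small hand-written stand-in for the output of params(). The helper default_query is hypothetical and is not part of das2.

```python
def default_query(dParams):
    """Hypothetical helper: collect each 'default' value into a query dictionary."""
    query = {}
    # Coordinate and data variables nest one level deeper than general options.
    for section in ('coord', 'data'):
        for var, aspects in dParams.get(section, {}).items():
            for aspect, desc in aspects.items():
                if 'default' in desc:
                    query.setdefault(section, {}).setdefault(var, {})[aspect] = desc['default']
    for opt, desc in dParams.get('option', {}).items():
        if 'default' in desc:
            query.setdefault('option', {})[opt] = desc['default']
    return query

# Trimmed stand-in for a params() result, written by hand for this example.
params = {
    'coord': {'time': {
        'minimum': {'default': '2014-08-31', 'type': 'isotime'},
        'maximum': {'default': '2014-09-01', 'type': 'isotime'},
    }},
    'option': {'text': {'default': False, 'enum': [True, False]}},
}
query = default_query(params)
print(query)
```

The resulting dictionary has the same layout as the general purpose query form described under get().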

See get()

Class Catalog

class das2.Catalog(dDef, sSubKey, bStub, bGlobal)

Catalog objects. May contain other Catalogs, Data Source Collections or SpaseRec objects.

type()

Just returns the output of self.__class__.__name__ for shorter code

Streams to Numpy Arrays

Class Dataset

class das2.Dataset(sId, group=None)

Dataset - Sets of arrays correlated in index space.

The Dataset object exists to unambiguously define data read from a das2 stream (or other supported serialization protocol) in memory so that it may be either used directly or handed over to native structures used in larger analysis packages.

Special members of this class are:

  • .id - A string identifying this dataset within its group

  • .group - A string identifying which group this dataset belongs to. All datasets in a group may be plotted on the same graph.

  • .shape - The overall iteration shape of all members of this dataset.

Note: All arrays from all the Dimensions in this dataset can be used with the same iteration indices, no matter their internal storage. The variables within this Dataset are thus correlated in index space. For example, given the index set [i : j : k], if some set of i’s are valid for one Variable within the Dataset, then the same range of i’s are valid for all Variables within the dataset. The same is true for j, k, and so on. Array broadcasting is used to conserve memory. See the Variable.degenerate() function for more information.

Datasets contain Dimensions.

__getitem__(key)

Get dataset subsets

Parameters: key (str, tuple) – A string to get a dimension, a slice or tuple of slices to get a subset along an axis.
Returns: Dimension or Dataset

__str__()

Get a string summarizing the dataset

coord(sId)

Create or get a coordinate dimension

data(sId)

Create or get a data dimension

dim(sId)

Create or get a dimension.

getVar(sVar)

Get a variable from a dataset using a Variable path string

Parameters:sVar (str) –

The variable path string. These have the form:

[CATEGORY:]DIMENSION[:VARIABLE]

for example:

coords:time:center

would specify the center positions of the time coordinate. If the CATEGORY portion is not supplied, both coords and data are searched for the given DIMENSION, which will be used if it’s unique within the dataset. If the VARIABLE portion is not supplied, ‘center’ is assumed.

Returns: tuple(sAbsPath, Variable)
A tuple containing the expanded path name and the Variable.
Raises KeyError:
 If no variable can be found with the given name.
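
The [CATEGORY:]DIMENSION[:VARIABLE] form can be split as in the sketch below. It assumes the only category names are ‘coords’ and ‘data’ (the two named in the text) and that a leading token matching one of them is always a category. The helper split_var_path is hypothetical, and the dataset lookup that getVar() performs afterwards is not shown.

```python
CATEGORIES = ('coords', 'data')   # assumed category names, taken from the text above

def split_var_path(sVar):
    """Hypothetical helper: return (category, dimension, variable) for a path string."""
    parts = sVar.split(':')
    category = None
    if len(parts) > 1 and parts[0] in CATEGORIES:
        category = parts.pop(0)          # optional leading CATEGORY
    if not 1 <= len(parts) <= 2 or not all(parts):
        raise ValueError('malformed variable path: %r' % sVar)
    dimension = parts[0]
    variable = parts[1] if len(parts) == 2 else 'center'   # default role
    return category, dimension, variable

print(split_var_path('coords:time:center'))   # ('coords', 'time', 'center')
print(split_var_path('time'))                 # (None, 'time', 'center')
```

A category of None stands for the case where both coords and data would need to be searched.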
ravel()

Force internal arrays to be rank 1.

If the internal arrays of a dataset are already rank 1, this is a no-op

sort(*tSortOn, **kwargs)

Sort ascending all values in all variables in a dataset based on the values in a single variable, and then on the values in a second variable and so on.

The sorting algorithm varies depending on the data layout. Variables that are degenerate in an axis will not alter the ordering of other variables along that axis. For example, in a common 2-D time-frequency cube, sorting on time will not affect frequency values and vice versa.

Parameters: lSortOn (str) –

A list of Variable path strings stating the first sort parameter, the second sort parameter and so on. Variable path strings have the form:

CATEGORY:DIMENSION:VARIABLE

for example:

coords:time:center

would specify the center positions of the time coordinate. To sort first on one variable and then on another, specify more than one string. For example:

['coords:time:center', 'coords:frequency:center']

would sort first on time and then on frequency, subject to the degenerate-axis behavior described above.

Returns: None
There is no return value, data are sorted in place.
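
For the simple case where every variable is a rank 1 array of the same length, sorting on several keys amounts to computing one index ordering and applying it to every array. The sketch below uses plain Python lists to show the idea. The helper sort_together is hypothetical, and the real method also has to handle degenerate axes, which this sketch does not.

```python
def sort_together(columns, sort_on):
    """Hypothetical helper: reorder equal-length lists in place by one or more key columns."""
    n = len(columns[sort_on[0]])
    # One ordering, built from the first key, then the second key, and so on.
    order = sorted(range(n), key=lambda i: tuple(columns[k][i] for k in sort_on))
    for values in columns.values():
        values[:] = [values[i] for i in order]   # apply the same ordering everywhere

cols = {
    'time':      [2, 1, 2, 1],
    'frequency': [10, 20, 5, 10],
    'amplitude': [0.4, 0.3, 0.2, 0.1],
}
sort_together(cols, ['time', 'frequency'])
print(cols['time'])        # [1, 1, 2, 2]
print(cols['frequency'])   # [10, 20, 5, 10]
print(cols['amplitude'])   # [0.1, 0.3, 0.2, 0.4]
```

Values that shared an index before the sort still share an index afterwards, which is what keeps the variables correlated in index space.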

Class Dimension

class das2.Dimension(dataset, sName)

A physical or organizational dimension in a Dataset.

This object does not represent an index dimension, but rather a category, such as time, frequency, electric field amplitudes, cities in Australia, etc.

Dimensions contain Variables.

__getitem__(key)

Dimension acts as a dictionary of variables

__setitem__(key, item)

Add a Variable to the dimension

center(values, units, axis=None, fill=None)

Shortcut for das2.Dimension.var() for center values

offset(values, units, axis=None, fill=None)

Shortcut for das2.Dimension.var() for offset values

propEq(sKey, sValue)

Does this dimension have a given property and is that property equal to the given value

reference(values, units, axis=None, fill=None)

Shortcut for das2.Dimension.var() for reference values

var(role, values, units, axis=None, fill=None)

Create or replace a variable in this dimension.

Adding a variable to a dataset can trigger broadcasting of other variables to fill the required index space.

Class Variable

class das2.Variable(dim, role, values, units, axis=None, fill=None)

Data arrays with a stated purpose and units

Special members of this class are:

  • .array - The underlying ndarray object

  • .units - The units as a string

  • .name - A name for this variable, defaults to its role in the dimension

  • .unique - A list of indices in which these values are (potentially) unique.

Variables are very similar to Quantities in AstroPy. Users of both das2py and astropy are encouraged to use the astrohelp.py to generate Quantity objects from das2py Variables.

__getitem__(tSlice)

Return a Quantity of values

containsAny(quant)

Are any of the given values within the range of this Variable

Parameters: quant (number, Quantity) –

The value to check must be of a type that is comparable to the values in the backing array but not necessarily in the same units. If a Quantity is supplied then unit conversions are applied before comparing values.

Returns: bool
True if variable.min <= value <= variable.max, for all supplied values or False otherwise

degenerate(*indexes)

Return true if this variable is degenerate in all the given indices

sorted()

Determine if the values in this variable are sorted in ascending order by index.

For Variables built from reference and offset arrays, this function will return false if the data are sorted in the reference array and in the offset array, but the offsets bump final values past the next reference point. Almost all instrument cycles prevent this from being the case, but it is listed here as a possible failure mode.
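
The failure mode described above is easy to reproduce with small numbers. In the sketch below both input lists are ascending, but the last offset is larger than the spacing between reference points, so the combined values are not. The numbers and the helper is_ascending are made up for illustration.

```python
def is_ascending(values):
    """True if each value is <= the one that follows it."""
    return all(a <= b for a, b in zip(values, values[1:]))

reference = [0.0, 10.0]       # e.g. start time of each instrument cycle
offset = [0.0, 5.0, 12.0]     # e.g. sample delay within a cycle

# Final values in index order: reference[i] + offset[j]
combined = [r + o for r in reference for o in offset]

print(combined)                  # [0.0, 5.0, 12.0, 10.0, 15.0, 22.0]
print(is_ascending(reference))   # True
print(is_ascending(offset))      # True
print(is_ascending(combined))    # False
```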

Parsing times

class das2.DasTime(nYear=0, nMonth=0, nDom=0, nHour=0, nMin=0, fSec=0.0)

A wrapper for the old daslib functions parsetime and tnorm that also adds features for comparing DasTimes and for using them as dictionary keys.

__add__(other)

Add a floating point time in seconds to the current time point

__hash__()

Compute a 64-bit hash that is useable down to microsecond resolution over the years +/- 9,999. So, not great for geologic time, nor for signal propagation times on microchips
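
A quick calculation shows why microsecond resolution over that span fits in 64 bits. The sketch only counts microseconds using a generous upper bound on the length of a year. It makes no claim about how the hash value itself is computed.

```python
YEARS = 2 * 9999 + 1                  # years -9999 through +9999
SECONDS_PER_YEAR = 366 * 86400        # upper bound: treat every year as a leap year
total_us = YEARS * SECONDS_PER_YEAR * 1_000_000

print(total_us < 2**63)               # True
print(total_us.bit_length())          # 60
```

Roughly 60 bits are enough to give each microsecond in the range a distinct integer, which leaves headroom in a signed 64-bit value.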

__long__()

Cast the dastime to a long value, mostly only good as a hash key, but placed here so that you can get the hash easily

__nonzero__()

Used for the boolean ‘is true’ test

__str__()

Prints an ISO 8601 standard day-of-month time string to microsecond resolution.

__sub__(other)

WARNING: This function works very differently depending on the type of the other object. If the other item is a DasTime, then the difference in floating point seconds is returned.

If the other type is a simple numeric type then a new DasTime is returned which is smaller than the initial one by ‘other’ seconds.

Time subtractions between two DasTime objects are handled in a way that is sensitive to small differences. Differences as small as the smallest possible positive floating point value times 60 should be preserved.

Time difference in seconds is returned. This method should be valid as long as you are using the Gregorian calendar, but doesn’t account for leap seconds. Leap second handling could be added via a table if needed.
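
The two behaviors can be illustrated with a small stand-in class built on the standard datetime module. TimePoint is hypothetical and is not DasTime. It only mirrors the type-dependent dispatch described above and, like the text, ignores leap seconds.

```python
from datetime import datetime, timedelta

class TimePoint:
    """Hypothetical stand-in for DasTime, wrapping a datetime for illustration."""
    def __init__(self, dt):
        self.dt = dt

    def __sub__(self, other):
        if isinstance(other, TimePoint):
            # time - time gives a duration in floating point seconds
            return (self.dt - other.dt).total_seconds()
        # time - number gives an earlier time point
        return TimePoint(self.dt - timedelta(seconds=other))

a = TimePoint(datetime(2017, 1, 2))
b = TimePoint(datetime(2017, 1, 1))
print(a - b)           # 86400.0
print((a - 60).dt)     # 2017-01-01 23:59:00
```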

adjust(nYear, nMonth=0, nDom=0, nHour=0, nMin=0, fSec=0.0)

Adjust one or more of the fields, either positive or negative, calls self.norm internally

ceil(nSecs)

Find the nearest time, evenly divisible by nSecs, that is greater than the current time value.

dom()

Get the calendar day of month

doy()

Get the day of year, Jan. 1st = 1
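
Day of year and the month and day-of-month pair are two views of the same date. The sketch below converts in both directions with the standard library. The helper names are hypothetical and this is not the DasTime implementation.

```python
from datetime import date, timedelta

def day_of_year(nYear, nMonth, nDom):
    """Hypothetical helper: day of year with Jan. 1st = 1."""
    return date(nYear, nMonth, nDom).timetuple().tm_yday

def from_day_of_year(nYear, nDoy):
    """Hypothetical helper: (month, day of month) for a given day of year."""
    d = date(nYear, 1, 1) + timedelta(days=nDoy - 1)
    return d.month, d.day

print(day_of_year(2017, 3, 1))       # 60
print(from_day_of_year(2016, 60))    # (2, 29)
```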

floor(nSecs)

Find the nearest time, evenly divisible by nSecs, that is less than the current time value.
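
A minimal sketch of floor and ceiling on a seconds grid follows, using the standard datetime module. It assumes the grid is counted from midnight of the same day and that a time already on a grid point is returned unchanged. Neither detail is stated above, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta

def floor_time(dt, nSecs):
    """Hypothetical helper: latest grid time at or before dt (grid starts at midnight)."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (dt - midnight).total_seconds()
    return midnight + timedelta(seconds=(elapsed // nSecs) * nSecs)

def ceil_time(dt, nSecs):
    """Hypothetical helper: earliest grid time at or after dt."""
    low = floor_time(dt, nSecs)
    return low if low == dt else low + timedelta(seconds=nSecs)

t = datetime(2017, 1, 1, 12, 34, 56)
print(floor_time(t, 600))   # 2017-01-01 12:30:00
print(ceil_time(t, 600))    # 2017-01-01 12:40:00
```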

classmethod from_string(sTime)

Static method to generate a DasTime from a string, uses the C parsetime to get the work done

hour()

Get the hour of day on a 24 hour clock (no am/pm)

isLeapYear()

Returns true if the year field indicates a leap year on the Gregorian calendar, otherwise returns false.
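
For reference, the Gregorian leap year rule is: divisible by 4, except century years, unless the century year is also divisible by 400. A standalone sketch (the function name is hypothetical):

```python
def is_leap_year(nYear):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return nYear % 4 == 0 and (nYear % 100 != 0 or nYear % 400 == 0)

print(is_leap_year(2000))   # True
print(is_leap_year(1900))   # False
print(is_leap_year(2016))   # True
```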

minute()

Get the minute of the hour

mj1958()

Get the current time value as seconds since January 1st 1958, ignoring leap seconds

month()

Get the month of year, January = 1

norm()

Normalize the time fields so that all contain legal values.
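
Normalizing means carrying overflow from one field into the next, so that for example 60 seconds becomes one extra minute. The sketch below shows the idea with the standard datetime module for the day, hour, minute and second fields only. It assumes the month is already in the range 1 to 12, and the helper normalize is hypothetical, not the tnorm-based implementation.

```python
from datetime import datetime, timedelta

def normalize(nYear, nMonth, nDom, nHour=0, nMin=0, fSec=0.0):
    """Hypothetical helper: carry overflow in the day, hour, minute and second fields."""
    base = datetime(nYear, nMonth, 1)   # first day of the (already legal) month
    return base + timedelta(days=nDom - 1, hours=nHour, minutes=nMin, seconds=fSec)

print(normalize(2016, 2, 30))                    # 2016-03-01 00:00:00
print(normalize(2017, 12, 31, 23, 59, 60.0))     # 2018-01-01 00:00:00
```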

classmethod now()

Static method to generate a DasTime for right now.

round(nWhich)

Round off times to seconds, milliseconds, or microseconds.

nWhich - One of the constants: SEC, MILLISEC, MICROSEC

Returns a string to the desired precision in Year-Month-Day format.

round_doy(nWhich)

Round off times to seconds, milliseconds, or microseconds.

nWhich - One of the constants: SEC, MILLISEC, MICROSEC

Returns a string to the desired precision in Year-Day format.

sec()

Get floating point seconds of the minute

set(**dArgs)

Set one or more fields, calls self.norm internally. Keywords are:

  • year = integer
  • month = integer
  • dom = integer (day of month)
  • doy = integer (day of year, can’t be used with month and dom)
  • hour = integer
  • minute = integer
  • seconds = float

year()

Get the year value for the time