Storage

This submodule is a user-friendly storage driver which relies on two major classes from the user perspective: StorageIODriver and StorageInterface

StorageIODriver

The StorageIODriver is the abstract base class which handles IO operations on disk with the real data. Derived classes from this handle the specific storage medium, like NetCDF. This class tracks all the known variables, and where they are on the disk. However, because the abstract class cannot know how the derived class actually interacts with the disk, it is up to the derived class to know how each variable writes to disk.

The NetCDFIODriver is the derived StorageIODriver for NetCDF storage. The NetCDFIODriver handles the top level file operations and keeps track of where each variable and group (equivalent to a directory) is on the disk. Read/Write operations are handed off to the individual NCVariableCodec classes which interpret and write to file.

The NCVariableCodec is an abstract base class which defines how data is passed to and from the disk. Its derived classes handled interpreting the specific types of data we want to store and read from disk, e.g. ints, lists, np.arrays etc. Each derived NCVariableCodec enacts is own codec to know how to format the data type for storage on disk, and how to read that data back from disk, converting it to the correct type.

The StorageIODriver’s and the StorageInterface work on the principal of not knowing or caring what is on the disk until the user first attempt to access it, the process of initial interaction with the disk is called “Binding.” Variables and directories are considered “unbound” if they have not accessed the disk yet, and “bound” if they have. This bound/unbound mechanism is to reduce the amount of IO actions to disk, which is a slow process relative to the main code.

StorageIODriver

Abstract class to define the basic functions any storage driver needs to read/write to the disk.

NetCDFIODriver

Driver to handle all NetCDF IO operations, variable creation, and other operations.

NCVariableCodec

Pointer class which provides instructions on how to handle a given nc_variable

Binding

Unbound variables and directories do not know what type of data they will handle, and only store where on the disk data will be accessed. Upon the first attempt to read/write/append, a binding action occurs. The variables check if there is already data on the disk at the known location, what happens next depends on what operation was called: * If read and on disk:

  1. Determine the codec the variable will use.

  2. Fetch data, only accept data the codec can interpret.

  • If read and NOT on disk: Raise error.

  • If write/append and on disk:

    1. Ensure data to write is compatible with codec that was used to store data.

    2. Ensure data to store is of the same shape (for non-scalar data)

    3. Store new data.

  • If write/append and not on disk:

    1. Allocate storage on disk

    2. Store new data

The variable is now considered “bound” and there are some checks which ensure new data can now be stored on this variable.

StorageInterface

StorageInterface (SI) is a layer which runs on top of a provided StorageIODriver to create an way for users to interface with the disk with as minimal effort as possible. Variables and directories are treated as user defined properties of the SI, which then those properties can also be given user defined properties to point to other variables below it. E.g. SI.mydir.myvar creates a directory object called mydir at the top level of the SI object on disk, then myvar is the variable inside mydir on disk. The depth of this can be arbitrary. None of the user defined properties are bound until the first read/write/append operation, which is done with .read() .write() and .append() functions respectively.

StorageInterfaceDirVar

StorageInterfaceDirVar (SIDV) is the class which is assigned to each of the user defined properties in the SI are attached to. This class is what hooks into the StorageIODriver and passes the instructions to create/manage variables and handle any other sub-directories/variables attached to it.

StorageInterface

This class interfaces with a StorageIODriver class to internally hold what folders and variables are known to the file on the disk, and dynamically creates them on the fly.

StorageInterfaceDirVar

Storage Interface Directory/Variable (SIDV) class is a versatile, dynamic class which gives structure to the variables stored on disk by representing them as methods of the StorageInterface and other instances of itself.