
DFS File System

Introduction

The DFS (Data File System) is used extensively within the MIKE Powered by DHI software.

It is a binary file format that provides a general way of handling spatially distributed and time dependent data, ranging from measurements of temperature at a single point (Figure 1.1) to water levels in the North Sea in a 2D grid generated by DHI’s flow models (MIKE 21) (Figure 1.2).

Dfs0Example.png

Figure 1.1 Example from dfs0 file, water level measurements from Station 3

Dfs2Example.png

Figure 1.2 Example from dfs2 file, first time step of a 2D water level item

The MIKE SDK installation also includes a number of examples using the DFS .NET API in C#, IronPython and CsScript. These example files are contained in a zip file in the default installation folder:

./MIKE SDK/Examples/SDK_Examples.zip

For Matlab users a number of examples can be found in the DHI Matlab Toolbox. The toolbox can be downloaded from the MIKE Powered by DHI website:

http://www.mikepoweredbydhi.com/download/mike-by-dhi-tools

DFS File Contents

A DFS file is a binary file that contains data for a number of quantities at a number of times.

A DFS file is conceptually split into:

  • A header section, containing general information for the file, such as start time, geographic map projection, etc.

  • A section with static data, containing data for a number of items. Static data has no notion of time.

  • A section with dynamic data, containing data for a number of time steps and items.

Figure 2.1 DFS file contents

The dynamic data section usually takes up by far the most disk space.

The header contains metadata describing the file, its contents and especially the contents of the two data sections.

  • File title; user defined title of the file.

  • Application title; title of the application that created the file.

  • Application version number; version of the application that created the file.

  • Data type; used to tag the file as a special DFS file type, see (1) below.

  • Type of DFS file storing format, see (2) below.

  • Type of statistics; the level of statistics stored for dynamic items.

  • Delete values, see (3) below.

  • Geographic map projection information.

  • Time axis information.

  • Custom blocks; a number of (small) arrays of a certain type, each identified by its name.

  • Compression encoding; when the file is compressed, defines where each compressed data point belongs.

Furthermore, the header contains descriptions of each of the dynamic items. The header does not contain any information on the static items, nor does it contain the number of static items in the file. Static items must be read one by one until there are no more. A read operation provides both the static item data and the information on the static item.

The type of statistics, geographic map projection information, time axis information, custom blocks and compression encoding are described in the next sections.

  1. The data type tag is a user specified integer. The data type tag is used as an identification tag for the type of DFS file at hand. The user should tag bathymetries, result files, input files etc. matching those tags required for the DFS file type at hand. A wrong data type tag in some contexts is an error. Not all DFS files and tools handling DFS use the tag.

    In Appendix G there is a list of data type tags currently used within the DHI model complex. A programmer writing a new type of DFS file should choose the tag carefully, such that it does not interfere with existing DFS file types.

  2. The storing format specifies whether all items are stored in all time steps, and whether time-varying spatial axes are used. Currently the DFS file system only supports files with all items in all time steps and without any time-varying spatial axes. Thus this is not used, and is not presently planned to be explored.

  3. A delete value represents a not-defined value. If a value is not available, or if it does not make sense to set a value, setting the delete value indicates that there is no value. There is a delete value for data of type float, double, integer (32 bit), unsigned integer and byte (8 bit).
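To illustrate how code consuming DFS data might treat delete values, the following minimal sketch replaces them with NaN before computing a mean. The constant `FLOAT_DELETE_VALUE` and both helper functions are assumptions made for this example; the actual delete value should be taken from the file header.

```python
import math

# Assumed delete value for this example only; read the real one from the header.
FLOAT_DELETE_VALUE = -1e-35


def mask_delete_values(values, delete_value=FLOAT_DELETE_VALUE):
    """Return a copy of values with delete values replaced by NaN."""
    return [math.nan if v == delete_value else v for v in values]


def mean_ignoring_nan(values):
    """Mean of the defined values, or NaN if none are defined."""
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined) if defined else math.nan


raw = [1.5, FLOAT_DELETE_VALUE, 2.5, FLOAT_DELETE_VALUE]
masked = mask_delete_values(raw)
mean_value = mean_ignoring_nan(masked)
```

Comparing against the delete value with `==` works here because the delete value is stored exactly, not computed.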

DFS Items

An item is the smallest data unit that is read from or written to a DFS file. It can either be static, in which case it only has one set of values, or it can be dynamic, in which case it has a set of values for each time step in the file. Both types of items are described by:

  • Name; user defined description of the item

  • EUM quantity, i.e., EUM type and EUM unit

  • Spatial axis

  • Type of data, being float, double, integer, etc.

  • Reference coordinates and orientation

EUM quantity

An item defines the data being stored by use of the DHI EUM system. EUM is short for Engineering Unit Management. The EUM system specifies a combination of a type and a unit. The EUM type could be ‘water level’, and the unit ‘meters’. The EUM system ensures that the type and unit match, i.e. it is not possible to specify a unit of square meters for a water level.
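The idea of a type and unit that must match can be sketched as a simple lookup. The table `ALLOWED_UNITS` and the function `make_quantity` are hypothetical and cover only two types; they are not the real EUM tables or API.

```python
# Hypothetical subset of type-to-unit compatibility; not the real EUM tables.
ALLOWED_UNITS = {
    "water level": {"meter", "feet"},
    "temperature": {"degree Celsius", "degree Fahrenheit"},
}


def make_quantity(eum_type, eum_unit):
    """Return (type, unit) if the unit is valid for the type, else raise."""
    allowed = ALLOWED_UNITS.get(eum_type)
    if allowed is None:
        raise ValueError(f"unknown type: {eum_type!r}")
    if eum_unit not in allowed:
        raise ValueError(f"unit {eum_unit!r} is not valid for type {eum_type!r}")
    return (eum_type, eum_unit)


quantity = make_quantity("water level", "meter")
```

In this sketch, calling `make_quantity("water level", "square meter")` raises a `ValueError`, mirroring the rule that a water level cannot be given in square meters.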

Spatial axis

The size and dimension of an item are defined by its spatial axis. An item can store data in the form of:

  • Scalars – dfs0 data

  • Vectors – dfs1 data

  • Matrices – dfs2 data

  • Cubes/3D matrices – dfs3 data

The items of a DFS file need not all have the same spatial axis. However, many of the specialised DFS file formats require that all items have the same spatial axis. See Spatial axes for details on the different spatial axes.

Type of data

Types of data that can be stored in a DFS item (the DfsSimpleType) are:

  • float

  • double

  • byte (8 bit integer, char in C++)

  • short (16 bit integer)

  • unsigned short (16 bit integer)

  • integer (32 bit integer)

  • unsigned integer (32 bit integer)

Reference coordinates and orientation

An item also holds a set of reference coordinates and an orientation, which can be used to translate and rotate the spatial axis of the item relative to the user coordinate system in the file. However, these are not used by most DFS file types: only in some versions of the dfs0 file are the reference coordinates set. See details on coordinate systems and geographic map projections.

Item unit conversion

The EUM quantity defines the item type and item unit that are used when storing item data in the file. As an example, the type could be water level and the unit meters. It is possible to specify a conversion unit, in case a unit other than the one stored in the file is preferable. Setting a conversion unit of feet for an item means that whenever data for that item is read from or written to the file, it is in feet. The data is still stored in the file in meters, and converted on the fly.

Two types of unit conversion are available:

  1. UBG (Unit Base Group) conversion

  2. Free conversion

The UBG conversion converts data to the unit specified in the Unit Base Group settings. The UBG system is used to set the default unit for various item types, and depends on your configuration. For example, a user can choose to use Imperial units, in which lengths are in feet or miles.

Using the free conversion, the user needs to specify the unit that the data is to be converted to.

Unit conversion can be specified not only for the item data but also for the spatial axis of the item, thus converting the data of the spatial axis. For example, for a 1D equidistant axis the starting point x0 and the axis interval dx will be converted.

Unit conversions are not stored in the file, but must be set every time the file is opened.

Note when using unit conversions: EUM types and units reported from the file are not changed; only the data is changed. For example, given a file with a spatial axis in meters, setting the unit conversion for the spatial axis to feet and then requesting the spatial axis will return an axis that reports the unit meters, but whose data is in feet.
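The behaviour described above, where data is converted on the fly while the reported unit stays unchanged, can be sketched as follows. The class `ConvertingItem` and its methods are hypothetical and only mimic the described behaviour; they are not part of the DFS API.

```python
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


class ConvertingItem:
    """Hypothetical item wrapper: data stored in meters, optionally read in feet."""

    def __init__(self, stored_values_m):
        self._stored = list(stored_values_m)
        self.unit = "meter"      # unit reported from the file; never changes
        self._to_feet = False

    def set_conversion_to_feet(self):
        # The conversion is a per-session setting and is not written to the file.
        self._to_feet = True

    def read(self):
        """Return the data, converted on the fly if a conversion is set."""
        if self._to_feet:
            return [v / METERS_PER_FOOT for v in self._stored]
        return list(self._stored)


item = ConvertingItem([0.3048, 3.048])
item.set_conversion_to_feet()
values_ft = item.read()
```

After the conversion is set, `item.unit` still reports meters while `values_ft` holds the data in feet, which is the pitfall the note above warns about.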

Static items

A static item stores one set of values for the item. A static item has no notion of time.

A DFS file can have any number of static items. To access the static items, they must be read one by one, i.e. it is not known in advance how many static items a file has. Static item data and information on its unit, data type, etc. are stored together.

In addition to the generic DFS item description given previously, a static item includes the actual data of the static item.
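Reading static items one by one until none remain can be sketched with a loop like the one below. `FakeDfsFile` and `read_next_static_item` are hypothetical stand-ins invented for this example, not the real DFS API; the item names and values are made up.

```python
class FakeDfsFile:
    """Hypothetical stand-in for an open DFS file; not a real API."""

    def __init__(self, static_items):
        self._pending = list(static_items)

    def read_next_static_item(self):
        # Returns (info, data) for the next static item, or None when exhausted.
        return self._pending.pop(0) if self._pending else None


def read_all_static_items(dfs_file):
    """Read static items one by one until the file reports no more."""
    items = []
    while True:
        item = dfs_file.read_next_static_item()
        if item is None:
            break
        items.append(item)
    return items


fake = FakeDfsFile([
    ({"name": "Bathymetry", "unit": "meter"}, [-5.0, -4.2, -3.1]),
    ({"name": "Land mask", "unit": "-"}, [0, 0, 1]),
])
static_items = read_all_static_items(fake)
```

Each read returns the item information together with its data, since the two are stored together.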

Dynamic items

A dynamic item varies in time. Information on the dynamic item, such as unit, data type, etc., is stored in the header, while the data is stored separately on a time-step and item basis.

In addition to the generic DFS item description given previously, a dynamic item also includes:

  • Value type, being instantaneous, forward step, etc.

  • A list of associated static item numbers

  • Statistics of the item data

Each of these is described in the following.

Value type

The Value type in time specifies how each value is to be interpreted between two time step values.

  • Instantaneous; the value is defined at the time specified.

  • Accumulated; the value is an accumulated value from the start time of the file to the time specified.

  • Step-accumulated; the value is accumulated from the previous time step to the current time step.

  • Mean-step-backward; mean value from previous time step time to current time step time. This is also sometimes called ‘mean-step-accumulated’.

  • Mean-step-forward; mean value from current time step time to next time step time. This is also sometimes called ‘reverse-mean-step-accumulated’.

Currently only the dfs0 file format utilises the different value types. The remaining DHI DFS files use the instantaneous type. See Appendix F for examples of the different value types.
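One way to read the instantaneous and mean-step definitions above is as rules for evaluating a series between two time steps. The sketch below is an interpretation under stated assumptions: linear interpolation for instantaneous values is an assumption, and the function `value_at` and its string tags are hypothetical names. The accumulated types are left out.

```python
from bisect import bisect_right


def value_at(times, values, t, value_type):
    """Evaluate a series at time t; times must be strictly increasing."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the time range of the series")
    i = bisect_right(times, t) - 1          # largest i with times[i] <= t
    if t == times[i]:
        return values[i]                    # exactly on a time step
    if value_type == "instantaneous":
        # Assumption: interpolate linearly between neighbouring time steps.
        w = (t - times[i]) / (times[i + 1] - times[i])
        return values[i] + w * (values[i + 1] - values[i])
    if value_type == "mean_step_backward":
        return values[i + 1]                # mean over (times[i], times[i + 1]]
    if value_type == "mean_step_forward":
        return values[i]                    # mean over [times[i], times[i + 1])
    raise ValueError(f"unsupported value type: {value_type!r}")


times = [0.0, 10.0, 20.0]
values = [1.0, 3.0, 5.0]
mid_instantaneous = value_at(times, values, 5.0, "instantaneous")
mid_backward = value_at(times, values, 5.0, "mean_step_backward")
mid_forward = value_at(times, values, 5.0, "mean_step_forward")
```

For the same stored numbers, the three value types give different results halfway between the first two time steps, which is why the value type must be known when interpreting dfs0 data.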

Associated static item numbers

A dynamic item can have a list of static item numbers that are in some way associated with the dynamic item. These are not used very often, and there is no predefined definition of the properties of the association; it depends on the type of DFS file.

Statistics

Together with each dynamic item, some statistics of its data can be stored, e.g., the max and min value for all data values and time steps. See section 2.8 for details.

Temporal Axes

The time in the DFS file can be specified as a relative time, starting from zero, or as an absolute time, starting from a specified date and time. The former is called a time axis, the latter a calendar axis.

Each of the two exists in an equidistant and a non-equidistant version.

For the equidistant temporal axes you can specify a start time offset. It defines the time of the first time step relative to the start time.

For the non-equidistant temporal axes the actual time of each time step is stored together with the dynamic item data. The times are therefore not available in the temporal axis definition, but are retrieved when reading data for an item-time step. The time stored with each time step is the time, in a given time unit, relative to the start of the file.

The equidistant time axis is defined by:

  • Time unit

  • Time step size

  • Start time offset

  • Number of time steps

The non-equidistant time axis is defined by:

  • Time unit

  • Time span – difference between first and last time step.

The equidistant calendar axis is defined by:

  • Start date and time

  • Time unit

  • Time step size

  • Start time offset

  • Number of time steps

The non-equidistant calendar axis is defined by:

  • Start date and time

  • Time unit

  • Time span – difference between first and last time step

The two non-equidistant temporal axes also provide a start time offset. This start time offset cannot be set, but has the time value of the first time step stored in the file, and is for information only.
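How the calendar axis parameters combine into absolute times can be sketched as below. The function names are hypothetical, and the time unit is represented simply as a number of seconds, which is an assumption made for this example.

```python
from datetime import datetime, timedelta


def equidistant_times(start, unit_seconds, step, offset, count):
    """Absolute times of an equidistant calendar axis."""
    return [start + timedelta(seconds=(offset + i * step) * unit_seconds)
            for i in range(count)]


def non_equidistant_times(start, unit_seconds, relative_times):
    """Absolute times from the per-time-step times stored with the data."""
    return [start + timedelta(seconds=t * unit_seconds) for t in relative_times]


start = datetime(2020, 1, 1)
# Time unit of hours, time step of 2, start time offset of 1, 3 time steps.
eq = equidistant_times(start, unit_seconds=3600, step=2, offset=1, count=3)
# Time unit of minutes, with relative times as read with each time step.
neq = non_equidistant_times(start, unit_seconds=60, relative_times=[0, 5, 30])
```

For a time axis (relative time) the same arithmetic applies without adding the start date and time.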

See also Appendix A for details on how the temporal axis parameters are handled.

Spatial Axes

The spatial axis defines the dimension of the data, and the size of the data in each dimension for an item.

The axes that are currently available fall into three categories:

  • The equidistant axes

  • The non-equidistant axes

  • The curve-linear axis

All axes coordinates are specified in the user defined coordinate system, see Section 2.6 for details.

The data in an item can either be ‘node based’ or ‘element based’. The difference can be seen in Figure 2.2 and Figure 2.3, which both define an item with 9 values ordered in a 3 by 3 grid. Figure 2.2 defines the values on the nodes and Figure 2.3 defines the values in the centre of each element.

DfsNodeValues.png

Figure 2.2 Node based 3 by 3 values

DfsElementValues.png

Figure 2.3 Element based 3 by 3 values

There is also a set of time-varying axes, which are currently not supported by DFS.

Equidistant axes

The equidistant axes define a structured orthogonal grid of a certain dimension and size. The axes specify for each dimension:

  • The start coordinate offset

  • The grid spacing

  • The number of data points in that dimension

There are equidistant axes ranging in dimensions from 1D to 3D.

The equidistant axes do not specify whether the values are ‘node based’ or ‘element based’. See Figure 2.2 and Figure 2.3 for the difference.

Currently all files in DHI software with an equidistant axis are element based.
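The three parameters per dimension are enough to generate the coordinates of every data point. The sketch below uses hypothetical function names and assumes that values are ordered with x varying fastest; it does not decide whether the coordinates denote nodes or elements, since the axis itself does not specify that.

```python
def axis_coordinates(start, spacing, count):
    """Coordinates along one dimension of an equidistant axis."""
    return [start + i * spacing for i in range(count)]


def grid_2d(x0, dx, nx, y0, dy, ny):
    """(x, y) coordinates of a 2D equidistant grid, x varying fastest (assumed)."""
    xs = axis_coordinates(x0, dx, nx)
    ys = axis_coordinates(y0, dy, ny)
    return [(x, y) for y in ys for x in xs]


points = grid_2d(x0=0.0, dx=10.0, nx=3, y0=100.0, dy=5.0, ny=2)
```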

Non-equidistant axes

The 1D non-equidistant axis defines a line in 2D/3D space and specifies a number of (x,y,z) coordinates where the data values belong. The number of coordinates matches the number of data values, and the values are defined on the coordinates, i.e. they are ‘node values’. The 1D non-equidistant axis is conceptually more like a 1D curve-linear axis, but historically it has not been named as such.

The 2D and 3D non-equidistant axes define an orthogonal grid with non-equidistant grid spacing. The coordinates are specified for each dimension. The number of coordinates in each dimension is one larger than the number of data values in that dimension, i.e. the values are ‘element values’.

DfsNeq2D.png

Figure 2.4 Non-equidistant 2D grid axis

Figure 2.4 shows a 2D non-equidistant axis with 6 ‘x’ coordinates and 5 ‘y’ coordinates. The number of data values in the item is (6-1) x (5-1) = 20.
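The relation between coordinates and element values can be checked with a short sketch. The coordinate values below are made up for illustration, the function names are hypothetical, and placing each value at the midpoint of its element is an assumption.

```python
def element_count(x_coords, y_coords):
    """Number of element values for a 2D non-equidistant axis."""
    return (len(x_coords) - 1) * (len(y_coords) - 1)


def element_centres(coords):
    """Midpoints between consecutive coordinates along one dimension."""
    return [(a + b) / 2 for a, b in zip(coords, coords[1:])]


x_coords = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]   # 6 'x' coordinates (made up)
y_coords = [0.0, 2.0, 4.0, 7.0, 11.0]         # 5 'y' coordinates (made up)
n_values = element_count(x_coords, y_coords)
x_centres = element_centres(x_coords)
```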

Curve-linear axes

There is a 1D curve-linear axis, but it is called the 1D non-equidistant axis and is described in the previous section.

The 2D and 3D curve-linear axes describe a grid that can bend, i.e. it is no longer Cartesian.

The grids are specified by a number of node coordinates. The number of coordinates in each dimension is one larger than the number of data values in each dimension, i.e. the values are ‘element values’, cf. Figure 2.3. Nodes are numbered as shown in the figure below, in x-direction first, then y, and for 3D then z.
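Numbering nodes in x-direction first, then y, then z corresponds to a simple mapping between grid indices and a linear node number. The sketch below assumes zero-based indices and node numbers; the function names are hypothetical.

```python
def node_index(i, j, k, nx, ny):
    """Linear node number for grid indices (i, j, k); x runs fastest, then y, then z."""
    return i + j * nx + k * nx * ny


def node_ijk(index, nx, ny):
    """Inverse of node_index: recover (i, j, k) from a linear node number."""
    k, rest = divmod(index, nx * ny)
    j, i = divmod(rest, nx)
    return i, j, k


# A grid with 4 nodes in x and 3 nodes in y.
n = node_index(1, 2, 0, nx=4, ny=3)
ijk = node_ijk(n, nx=4, ny=3)
```

For a 2D grid, `k` is simply zero.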

Custom Blocks

A custom block is a (small) vector containing data of a certain type. It is identified by its name. The vector is stored in the header section, as opposed to the static items.

A custom block contains a name and the vector data.

The vector type can be any of the DfsSimpleType types, see Section 2.2.

Geographic Map Projection and Coordinate Systems

The projection information stored in a DFS file consists of:

  • A projection string

  • A reference longitude and latitude coordinate

  • A reference orientation

The DFS file works with three coordinate systems:

  1. Geographical coordinate system, containing longitude and latitude coordinates

  2. Projected coordinate system, containing easting and northing coordinates

  3. User defined coordinate system, containing x and y coordinates

All coordinates in a DFS file are stored in the user defined coordinate system.

Note that every item defines the unit used within the user defined coordinate system, being meters, feet etc., overriding the unit in the projection definition. Each axis can specify its own unit to use in the user defined coordinate system; however, many tools assume that all axes of one file use the same unit.

The projection string defines the conversion from geographical coordinates to projected coordinates and back.

The user defined coordinate system is a translated and rotated version of the projected coordinate system. The reference longitude and latitude coordinates define the origin of the user defined coordinate system. The orientation defines the clockwise rotation from true geographical north to the y-axis of the user defined coordinate system, i.e. the compass heading of the y-axis. Note that the orientation is defined based on north of the geographical coordinate system, not the projected coordinate system.

If the reference longitude and latitude are set to match the origin of the projected coordinate system, and the orientation is set to match north in the projection (zero for most map projections), then the projected coordinate system equals the user defined coordinate system. Note that the origin of a projected coordinate system is usually influenced by its false-easting and false-northing parameters; hence the projected coordinate system and the user defined coordinate system often differ by exactly the false-easting and false-northing values.
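The translation and rotation between user defined and projected coordinates can be sketched as below. This is a simplified illustration under two assumptions: the origin is already given in projected coordinates (easting, northing), and the orientation is measured from north of the projection, i.e. the difference between true north and projection north is ignored. The function names are hypothetical.

```python
import math


def user_to_projected(x, y, origin_east, origin_north, orientation_deg):
    """Rotate and translate user coordinates (x, y) to (easting, northing)."""
    theta = math.radians(orientation_deg)   # clockwise heading of the y-axis
    east = origin_east + x * math.cos(theta) + y * math.sin(theta)
    north = origin_north - x * math.sin(theta) + y * math.cos(theta)
    return east, north


def projected_to_user(east, north, origin_east, origin_north, orientation_deg):
    """Inverse of user_to_projected."""
    theta = math.radians(orientation_deg)
    de, dn = east - origin_east, north - origin_north
    x = de * math.cos(theta) - dn * math.sin(theta)
    y = de * math.sin(theta) + dn * math.cos(theta)
    return x, y


# With an orientation of 90 degrees, the user y-axis points due east.
east, north = user_to_projected(0.0, 1.0, 500000.0, 6000000.0, 90.0)
x_back, y_back = projected_to_user(east, north, 500000.0, 6000000.0, 90.0)
```

With an orientation of zero and the origin at (0, 0), both functions return their input unchanged, matching the case where the two coordinate systems coincide.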

Older dfs files may not contain any projection information at all. New dfs files cannot be created without projection information.

Compression

Files from certain application areas tend to have many values that are delete values. A result file from MIKE 3 contains data in a 3D matrix, and it is not uncommon that 80-90% of the values are delete values.

It is possible to specify which data values are to be stored, and which are not necessary to store, thereby eliminating values that are always delete values from the DFS file and minimising the DFS file size. It is assumed that those data values not being stored are delete values.

When enabling compression, a set of encoding keys is specified that defines which indices in the data set are to be stored. The encoding keys are 3 integer arrays specifying the x, y, and z indices to be stored. The indices stored in the encoding key arrays are zero based.

Example: Assume that data is a 3D matrix, and that a file contains data that are not delete values (10), positioned at z-layer 0 as in this figure.

The encoding keys for this file will be

xkey = [0,0,1,0,2,0,1];

ykey = [0,1,2,3,3,4,4];

zkey = [0,0,0,0,0,0,0];

The encoding key array lengths match the number of data values being stored in the file.

If the data is only 2D, then the z key array should contain all zeros. If the data is only 1D, then the y and z key arrays should contain all zeros.

When writing and reading dynamic data items of a compressed file, compression and decompression are handled automatically on the fly, i.e. the user must provide the full 3D array when writing, and is given the full 3D array when reading.
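What the automatic compression and decompression do conceptually can be sketched with the encoding keys from the example above. The grid size (3 x 5 x 1), the delete value, the flat storage order with x varying fastest, and the function names are all assumptions made for this illustration.

```python
DELETE_VALUE = -1e-35  # assumed delete value for this example


def compress(full, xkey, ykey, zkey, nx, ny):
    """Pick the values at the encoded indices out of a full, flat 3D array."""
    return [full[i + j * nx + k * nx * ny] for i, j, k in zip(xkey, ykey, zkey)]


def decompress(stored, xkey, ykey, zkey, nx, ny, nz, delete_value=DELETE_VALUE):
    """Rebuild the full array; positions not stored become delete values."""
    full = [delete_value] * (nx * ny * nz)
    for value, i, j, k in zip(stored, xkey, ykey, zkey):
        full[i + j * nx + k * nx * ny] = value
    return full


xkey = [0, 0, 1, 0, 2, 0, 1]
ykey = [0, 1, 2, 3, 3, 4, 4]
zkey = [0, 0, 0, 0, 0, 0, 0]
nx, ny, nz = 3, 5, 1                 # assumed grid size
stored = [10.0] * len(xkey)          # one stored value per encoding key entry

full = decompress(stored, xkey, ykey, zkey, nx, ny, nz)
roundtrip = compress(full, xkey, ykey, zkey, nx, ny)
```

Only 7 of the 15 values are stored in this sketch; the remaining positions are filled with the delete value when the full array is rebuilt.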

Compression currently works under the following conditions:

  • All the dynamic items in the file must have the same spatial axis.

  • All the dynamic items must store their data as floats.

Compression and decompression of static items is not supported at the DFS file level. Some file types still store only the compressed data, but then define a dummy 1D spatial axis having the size of the compressed data. Manual compression and decompression by the user is required when writing and reading such static item data.

Statistics Stored in the DFS File

When writing item data to a DFS file, the DFS keeps track of some statistical properties for each item. There are two levels of statistics for each item: global and local.

The global level is the default level. It stores for each item the minimum, the maximum and the number of delete values over all data values, and all time steps.

The local level stores, for each data point/grid point/element of an item:

  • Minimum value.

  • Maximum value.

  • Mean value.

  • Standard deviation.

  • Auto correlation

  • Number of non-delete values

  • Number of delete values

  • Number of non-delete value pairs, being the number of times two consecutive time steps both contain non-delete values.
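The two levels of statistics can be illustrated with the sketch below, which computes the global statistics and a subset of the local statistics from in-memory data. The delete value, the function names and the data are assumptions for this example; standard deviation and auto correlation are left out for brevity.

```python
DELETE_VALUE = -1e-35  # assumed delete value for this example


def global_stats(time_steps, delete_value=DELETE_VALUE):
    """Min, max and number of delete values over all points and time steps."""
    defined = [v for step in time_steps for v in step if v != delete_value]
    n_total = sum(len(step) for step in time_steps)
    return {
        "min": min(defined) if defined else None,
        "max": max(defined) if defined else None,
        "n_delete": n_total - len(defined),
    }


def local_stats(time_steps, point, delete_value=DELETE_VALUE):
    """Statistics over time for a single data point of an item."""
    series = [step[point] for step in time_steps]
    defined = [v for v in series if v != delete_value]
    n_pairs = sum(1 for a, b in zip(series, series[1:])
                  if a != delete_value and b != delete_value)
    return {
        "min": min(defined) if defined else None,
        "max": max(defined) if defined else None,
        "mean": sum(defined) / len(defined) if defined else None,
        "n_non_delete": len(defined),
        "n_delete": len(series) - len(defined),
        "n_pairs": n_pairs,
    }


D = DELETE_VALUE
# Four time steps of an item with two data points.
steps = [[1.0, D], [3.0, 4.0], [D, 6.0], [5.0, 8.0]]
g = global_stats(steps)
p0 = local_stats(steps, point=0)
```

For point 0 the series is 1.0, 3.0, delete value, 5.0, so only the first two time steps form a pair of consecutive non-delete values.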