Usage

Background

DataLogs is a Python package for logging array and dictionary data from scientific experiments. These logs are stored in files (netCDF for array data and JSON for dictionary data). The log files are organized within a nested directory structure and tagged with metadata, such as a timestamp and, optionally, a commit ID from a ParamDB database.

The original purpose of DataLogs was to store logs from graph calibration experiments, where directories correspond to nodes in a graph, so the examples below are based on this application. However, the core functionality is very general.

Logger Setup

Root Logger

To log data, we first have to create a root Logger object, passing the path (either relative or absolute) to the root directory. This directory will be created if it does not exist.

from datalogs import Logger

root_logger = Logger("data_logs")

Our current working directory now contains a directory called data_logs.

data_logs/

Tip

The root Logger should typically be defined in one place, then passed to or imported by the parts of the code that use it.

Sub-Loggers

We can also create sub-Logger objects, which correspond to subdirectories within the root directory. By default, a sub-Logger creates a new directory with a timestamp in its name. However, passing timestamp=False creates a sub-Logger that, like the root Logger, has no timestamp in its directory name and creates its directory immediately if it does not exist.

For example, here we create a sub-Logger with no timestamp to contain all calibration experiments, and then timestamped sub-Loggers for a particular calibration graph and for one node within it.

calibration_logger = root_logger.sub_logger("calibrations", timestamp=False)
graph_logger = calibration_logger.sub_logger("calibration_graph")
node_logger = graph_logger.sub_logger("q1_spec_node")

We can see that the directory calibrations is created immediately, while the timestamped directories are not created yet.

data_logs/
└── calibrations/

Important

Sub-Loggers with timestamps wait to create their directories until their directory path is accessed, either explicitly via Logger.directory or internally, e.g. to create a log file.

This is done so that timestamps in directory names can reflect when the first file within them was created (often when that part of the experiment is being run), not when the Logger object was created (often when the entire experiment is being set up).
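The deferred-creation behavior described above can be illustrated with a minimal standard-library sketch. This is not the DataLogs implementation: the class LazyTimestampedDir is hypothetical, and the name format "%y-%m-%d-%H%M" and the use of UTC are assumptions inferred from the example directory names in this guide.

```python
import tempfile
from datetime import datetime, timezone
from functools import cached_property
from pathlib import Path


class LazyTimestampedDir:
    """Hypothetical illustration of lazy directory creation (not the Logger class)."""

    def __init__(self, parent: Path, name: str) -> None:
        # Nothing is created on disk here; only the intended location is stored.
        self._parent = parent
        self._name = name

    @cached_property
    def directory(self) -> Path:
        # The timestamp is taken on first access, not at construction time.
        # Format and time zone are assumptions made for this sketch.
        stamp = datetime.now(timezone.utc).strftime("%y-%m-%d-%H%M")
        path = self._parent / f"{stamp}_{self._name}"
        path.mkdir(parents=True, exist_ok=True)
        return path


root = Path(tempfile.mkdtemp())
node = LazyTimestampedDir(root, "q1_spec_node")

created_before = list(root.iterdir())  # still empty: no access yet
node_dir = node.directory  # first access creates the directory
created_after = [p.name for p in root.iterdir()]
```

Because the property is cached, later accesses return the same path, so the timestamp in the directory name stays fixed once the directory exists.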

Logging

Data Logs

The first type of log that can be created is a data log, which contains multidimensional array data. This type of log stores data in an xarray.Dataset, which contains data variables, coordinates, and attributes. The log is saved to a netCDF file via xarray.Dataset.to_netcdf().

See also

To learn more about Xarray data, see Data Structures in the Xarray user guide.

To aid in creating xarray.Dataset objects and to enforce certain conventions, DataLogs provides Coord as a wrapper for an Xarray coordinate and DataVar as a wrapper for an Xarray data variable. We can create a data log using these objects and Logger.log_data().

from datalogs import Coord, DataVar

times = [1, 2, 3]
signal = [10, 20, 30]

node_logger.log_data(
    "q1_spec_signal",
    [Coord("time", data=times, long_name="Time", units="s")],
    [DataVar("signal", dims="time", data=signal, long_name="Signal", units="V")],
)
<DataLog 'data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node/q1_spec_signal.nc'>
Data:
  <xarray.Dataset> Size: 48B
  Dimensions:  (time: 3)
  Coordinates:
    * time     (time) int64 24B 1 2 3
  Data variables:
      signal   (time) int64 24B 10 20 30
Metadata:
  directory      data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node
  timestamp      2024-08-24 22:59:00.443174+00:00
  description    q1_spec_signal
  commit_id      None
  param_db_path  None

The directories for the graph and node have now been created, along with the netCDF log file.

data_logs/
└── calibrations/
    └── 24-08-24-2259_calibration_graph/
        └── 24-08-24-2259_q1_spec_node/
            └── q1_spec_signal.nc

Dictionary Logs

Dictionary logs store dict data in JSON files. The data stored in the dictionary log will be converted to JSON-serializable types according to Logger.convert_to_json(). We can create a dictionary log using Logger.log_dict().

node_logger.log_dict(
    "q1_spec_frequency",
    {"f_rf": 3795008227, "f_if": 95008227, "f_lo": 3700000000},
)
<DictLog 'data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node/q1_spec_frequency.json'>
Data:
  {'f_rf': 3795008227, 'f_if': 95008227, 'f_lo': 3700000000}
Metadata:
  directory      data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node
  timestamp      2024-08-24 22:59:00.521049+00:00
  description    q1_spec_frequency
  commit_id      None
  param_db_path  None

The log file has now been created within the node directory.

data_logs/
└── calibrations/
    └── 24-08-24-2259_calibration_graph/
        └── 24-08-24-2259_q1_spec_node/
            ├── q1_spec_frequency.json
            └── q1_spec_signal.nc
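To show what converting to JSON-serializable types means in general, here is a minimal sketch using only the standard json module. It does not reproduce the rules of Logger.convert_to_json(), which are not described in this guide. The function to_jsonable and the extra keys "channels" and "measured_at" are hypothetical and exist only for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Hypothetical fallback converter; NOT the DataLogs conversion rules."""
    if isinstance(value, (set, frozenset)):
        return sorted(value)  # JSON has no set type, so use a sorted list
    if isinstance(value, datetime):
        return value.isoformat()  # store datetimes as ISO 8601 strings
    if isinstance(value, Path):
        return str(value)
    raise TypeError(f"Cannot convert {type(value).__name__} to JSON")


raw = {
    "f_rf": 3795008227,
    "channels": {2, 1},  # illustrative value, not from the guide
    "measured_at": datetime(2024, 8, 24, 22, 59, tzinfo=timezone.utc),
}

# json.dumps calls to_jsonable only for values it cannot serialize itself.
text = json.dumps(raw, default=to_jsonable)
restored = json.loads(text)
```

Note that such a conversion is lossy: after a round trip the set comes back as a list and the datetime as a string, so loaded data may differ in type from what was logged.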

Property Logs

Property logs automatically store the properties of an object within a dictionary log. Only properties marked with the type hint LoggedProp will be saved. We can create a property log using Logger.log_props().

Note

LoggedProp optionally takes a type parameter giving the type of the variable. This parameter is used only by code analysis tools.

from typing import Optional
from datalogs import LoggedProp

class SpecNode:
    _element: LoggedProp
    xy_f_rf: LoggedProp[int]
    xy_f_if: LoggedProp[Optional[int]]

    def __init__(self, element: str) -> None:
        self._element = element
        self.xy_f_rf = 379500822
        self.xy_f_if = None
        self.xy_f_lo = 3700000000  # No LoggedProp type hint, so this is not logged

q1_spec_node = SpecNode("q1")

node_logger.log_props("q1_spec_node_props", q1_spec_node)
<DictLog 'data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node/q1_spec_node_props.json'>
Data:
  {'_element': 'q1', 'xy_f_rf': 379500822, 'xy_f_if': None}
Metadata:
  directory      data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node
  timestamp      2024-08-24 22:59:00.558246+00:00
  description    q1_spec_node_props
  commit_id      None
  param_db_path  None

The log file has now been created within the node directory.

data_logs/
└── calibrations/
    └── 24-08-24-2259_calibration_graph/
        └── 24-08-24-2259_q1_spec_node/
            ├── q1_spec_frequency.json
            ├── q1_spec_node_props.json
            └── q1_spec_signal.nc
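The idea of selecting attributes by their type hint can be sketched with the standard typing module. This is an illustration of the general technique, not the mechanism DataLogs uses: the marker class Marked, the helper marked_props, and the class ExampleNode are hypothetical stand-ins.

```python
from typing import Any, Generic, Optional, TypeVar, get_origin, get_type_hints

T = TypeVar("T")


class Marked(Generic[T]):
    """Hypothetical marker type hint, standing in for something like LoggedProp."""


def marked_props(obj: Any) -> dict[str, Any]:
    """Collect attributes whose class-level type hint is Marked or Marked[...]."""
    result: dict[str, Any] = {}
    for name, hint in get_type_hints(type(obj)).items():
        is_marked = hint is Marked or get_origin(hint) is Marked
        if is_marked and hasattr(obj, name):
            result[name] = getattr(obj, name)
    return result


class ExampleNode:
    _element: Marked
    xy_f_rf: Marked[int]
    xy_f_if: Marked[Optional[int]]

    def __init__(self, element: str) -> None:
        self._element = element
        self.xy_f_rf = 379500822
        self.xy_f_if = None
        self.xy_f_lo = 3700000000  # no marker hint, so it is skipped


props = marked_props(ExampleNode("q1"))
```

Here props contains _element, xy_f_rf, and xy_f_if but not xy_f_lo, mirroring the contents of the property log shown above.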

Loading

Logs can be loaded by passing a file path to load_log(). We can also use Logger.file_path() to aid in creating the file paths to logs. (The full path can also be passed in directly if known.)

from datalogs import load_log

q1_spec_signal_log = load_log(node_logger.file_path("q1_spec_signal.nc"))
q1_spec_frequency_log = load_log(node_logger.file_path("q1_spec_frequency.json"))
q1_spec_node_props_log = load_log(node_logger.file_path("q1_spec_node_props.json"))

Alternatively, logs can be loaded using DataLog.load() for data logs or DictLog.load() for dictionary logs. This is not necessary, since load_log() already infers the log type from the file extension, but it is useful for static type checking when the log type is known.

from datalogs import DataLog, DictLog

q1_spec_signal_log = DataLog.load(node_logger.file_path("q1_spec_signal.nc"))
q1_spec_frequency_log = DictLog.load(node_logger.file_path("q1_spec_frequency.json"))
q1_spec_node_props_log = DictLog.load(node_logger.file_path("q1_spec_node_props.json"))
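Inferring a log type from a file extension, as described above, can be sketched in a few lines. The mapping LOG_KINDS and the function log_kind are hypothetical and only illustrate the dispatch idea; the two extensions are the ones used in this guide, and how load_log() treats other extensions is not covered here.

```python
from pathlib import Path

# Hypothetical mapping for illustration; extensions taken from the examples above.
LOG_KINDS = {".nc": "data", ".json": "dict"}


def log_kind(path: str) -> str:
    """Return which kind of log a file path refers to, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return LOG_KINDS[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized log file extension: {suffix!r}") from None


signal_kind = log_kind("q1_spec_signal.nc")
frequency_kind = log_kind("q1_spec_frequency.json")
```

A function like this can only return a general type, which is why loading through a specific class is more useful to a static type checker when the log type is already known.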

Accessing Data

Logs are represented as objects (DataLog or DictLog depending on the log type). Data can be accessed using DataLog.data or DictLog.data.

For a DataLog, data is returned as an xarray.Dataset object.

q1_spec_signal_log.data
<xarray.Dataset> Size: 48B
Dimensions:  (time: 3)
Coordinates:
  * time     (time) int64 24B 1 2 3
Data variables:
    signal   (time) int64 24B 10 20 30

For a DictLog, data is returned as a dictionary.

q1_spec_frequency_log.data
{'f_rf': 3795008227, 'f_if': 95008227, 'f_lo': 3700000000}

Accessing Metadata

Metadata is loaded along with the data and can be accessed using DataLog.metadata or DictLog.metadata. Metadata is stored in a LogMetadata object.

q1_spec_signal_log.metadata
directory      data_logs/calibrations/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node
timestamp      2024-08-24 22:59:00.443174+00:00
description    q1_spec_signal
commit_id      None
param_db_path  None

Metadata properties can be accessed as properties of this object. For example, we can get the timestamp using LogMetadata.timestamp.

q1_spec_signal_log.metadata.timestamp
datetime.datetime(2024, 8, 24, 22, 59, 0, 443174, tzinfo=datetime.timezone.utc)
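Since the timestamp is a timezone-aware datetime, it can be used with the standard datetime tools directly. The sketch below uses the two timestamps printed earlier in this guide, written out as literals so that it runs on its own. The UTC-7 offset is an arbitrary choice for illustration.

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the data log and dictionary log output above.
signal_time = datetime(2024, 8, 24, 22, 59, 0, 443174, tzinfo=timezone.utc)
frequency_time = datetime(2024, 8, 24, 22, 59, 0, 521049, tzinfo=timezone.utc)

# Time elapsed between the creation of the two logs.
elapsed = frequency_time - signal_time

# The same instant expressed in a fixed UTC-7 offset (arbitrary example offset).
local_time = signal_time.astimezone(timezone(timedelta(hours=-7)))
```

Aware datetimes compare by instant, so local_time and signal_time are equal even though they display differently.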

ParamDB Integration

Optionally, a ParamDB can be passed to a root Logger, in which case it will be used to automatically tag logs with the latest commit ID.

from paramdb import ParamDB

param_db = ParamDB[int]("param.db")
param_db.commit("Initial commit", 123)

root_logger = Logger("data_logs", param_db)
graph_logger = root_logger.sub_logger("calibration_graph")
node_logger = graph_logger.sub_logger("q1_spec_node")

node_logger.log_dict(
    "q1_spec_frequency",
    {"f_rf": 3795008227, "f_if": 95008227, "f_lo": 3700000000},
)
<DictLog 'data_logs/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node/q1_spec_frequency.json'>
Data:
  {'f_rf': 3795008227, 'f_if': 95008227, 'f_lo': 3700000000}
Metadata:
  directory      data_logs/24-08-24-2259_calibration_graph/24-08-24-2259_q1_spec_node
  timestamp      2024-08-24 22:59:00.802550+00:00
  description    q1_spec_frequency
  commit_id      1
  param_db_path  param.db