Usage¶
Background¶
DataLogs is a Python package for logging array and dictionary data from scientific experiments. These logs are stored in files (netCDF for array data and JSON for dictionary data). The log files are organized within a nested directory structure and tagged with metadata, such as a timestamp and, optionally, a commit ID from a ParamDB database.
The original purpose of DataLogs was to store logs from graph calibration experiments, where directories correspond to nodes in a graph, so the examples below are based on this application. However, the core functionality is very general.
Logger Setup¶
Root Logger¶
To log data, we first have to create a root Logger object, passing the path
(either relative or absolute) to the root directory. This directory will be created if it
does not exist.
from datalogs import Logger
root_logger = Logger("data_logs")
Our current working directory now contains a directory called data_logs.
data_logs/
Tip
The root Logger should typically be defined in one place, and passed or
imported to parts of the code that use it.
Sub-Loggers¶
We can also create sub-Logger objects, which correspond to subdirectories
within the root directory. By default, a sub-Logger creates a new directory
whose name includes a timestamp. Passing timestamp=False instead creates a
sub-Logger that, like the root Logger, has no timestamp and immediately
creates its directory if it does not exist.
For example, here we create a sub-Logger with no timestamp to contain all
calibration experiments, and then timestamped sub-Loggers for a particular
experiment graph and for the single node it contains.
calibration_logger = root_logger.sub_logger("calibrations", timestamp=False)
graph_logger = calibration_logger.sub_logger("calibration_graph")
node_logger = graph_logger.sub_logger("q1_spec_node")
We can see that the directory calibrations is created immediately, while the timestamped
directories are not created yet.
data_logs/
└── calibrations/
Important
Sub-Loggers with timestamps wait to create their directories until their
directory path is accessed, either explicitly via Logger.directory or
internally, e.g. to create a log file.
This is done so that timestamps in directory names can reflect when the first file within
them was created (often when that part of the experiment is being run), not when the
Logger object was created (often when the entire experiment is being set up).
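The deferred creation described above can be sketched with the standard library alone. The LazyTimestampedDir class below is hypothetical and is not how DataLogs is implemented; the %y-%m-%d-%H%M prefix format is inferred from the directory names shown in this guide, and the use of UTC is an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Optional


class LazyTimestampedDir:
    """Hypothetical illustration of lazy, timestamped directory creation."""

    def __init__(self, parent: Path, name: str) -> None:
        self._parent = parent
        self._name = name
        self._path: Optional[Path] = None  # Resolved on first access

    @property
    def directory(self) -> Path:
        if self._path is None:
            # The timestamp is taken at first access, not at construction
            prefix = datetime.now(timezone.utc).strftime("%y-%m-%d-%H%M")
            self._path = self._parent / f"{prefix}_{self._name}"
            self._path.mkdir(parents=True, exist_ok=True)
        return self._path


with TemporaryDirectory() as tmp:
    root = Path(tmp)
    lazy_dir = LazyTimestampedDir(root, "q1_spec_node")
    exists_before = any(root.iterdir())  # Nothing has been created yet
    created = lazy_dir.directory  # First access creates the directory
    exists_after = created.is_dir()
    created_name = created.name
    same_on_second_access = lazy_dir.directory == created
```

Caching the resolved path means later accesses reuse the same directory rather than creating a new one with a later timestamp.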
Logging¶
Data Logs¶
The first type of log that can be created is a data log, which contains multidimensional
array data. This type of log stores data in an xarray.Dataset, which contains data
variables, coordinates, and attributes. The log is saved to a netCDF file via
xarray.Dataset.to_netcdf().
See also
To learn more about Xarray data, see Data Structures in the Xarray user guide.
To aid in creating xarray.Dataset objects and to enforce certain conventions,
DataLogs provides Coord as a wrapper for an Xarray coordinate and
DataVar as a wrapper for an Xarray data variable. We can create a data log
using these objects and Logger.log_data().
from datalogs import Coord, DataVar
times = [1, 2, 3]
signal = [10, 20, 30]
node_logger.log_data(
    "q1_spec_signal",
    [Coord("time", data=times, long_name="Time", units="s")],
    [DataVar("signal", dims="time", data=signal, long_name="Signal", units="V")],
)
<DataLog 'data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node/q1_spec_signal.nc'>
Data:
<xarray.Dataset> Size: 48B
Dimensions:  (time: 3)
Coordinates:
  * time     (time) int64 24B 1 2 3
Data variables:
    signal   (time) int64 24B 10 20 30
Metadata:
directory data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node
timestamp 2024-08-24 22:54:58.881421+00:00
description q1_spec_signal
commit_id None
param_db_path None
The directories for the graph and node have now been created, along with the netCDF log file.
data_logs/
└── calibrations/
    └── 24-08-24-2254_calibration_graph/
        └── 24-08-24-2254_q1_spec_node/
            └── q1_spec_signal.nc
Dictionary Logs¶
Dictionary logs store dict data in JSON files. The data stored in the dictionary log
will be converted to JSON-serializable types according to
Logger.convert_to_json(). We can create a dictionary log using
Logger.log_dict().
node_logger.log_dict(
    "q1_spec_frequency",
    {"f_rf": 3795008227, "f_if": 95008227, "f_lo": 3700000000},
)
<DictLog 'data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node/q1_spec_frequency.json'>
Data:
{'f_rf': 3795008227, 'f_if': 95008227, 'f_lo': 3700000000}
Metadata:
directory data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node
timestamp 2024-08-24 22:54:58.925245+00:00
description q1_spec_frequency
commit_id None
param_db_path None
The log file has now been created within the node directory.
data_logs/
└── calibrations/
    └── 24-08-24-2254_calibration_graph/
        └── 24-08-24-2254_q1_spec_node/
            ├── q1_spec_frequency.json
            └── q1_spec_signal.nc
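The exact conversion rules applied by Logger.convert_to_json() are not listed in this guide. As a rough illustration of what converting to JSON-serializable types can involve, the sketch below uses only the standard json module with a fallback function. The to_jsonable helper and the example keys are hypothetical and are not part of DataLogs.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def to_jsonable(obj: object) -> object:
    """Hypothetical fallback for values that json cannot serialize natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # Assumes the elements can be ordered
    raise TypeError(f"cannot convert {type(obj).__name__} to JSON")


data = {
    "f_rf": 3795008227,
    "measured_at": datetime(2024, 8, 24, 22, 54, 58, tzinfo=timezone.utc),
    "root": Path("data_logs"),
    "qubits": {"q1", "q2"},
}

# json.dumps calls the fallback only for objects it cannot handle itself
encoded = json.dumps(data, default=to_jsonable)
decoded = json.loads(encoded)
```

After the round trip, the datetime and path come back as strings and the set as a list, so a conversion of this kind is generally one-way.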
Property Logs¶
Property logs automatically store the properties of an object within a dictionary log.
Only properties marked with the type hint LoggedProp will
be saved. We can create a property log using Logger.log_props().
Note
LoggedProp can optionally take in a type parameter representing the type of
the variable, which is only used by code analysis tools.
from typing import Optional
from datalogs import LoggedProp
class SpecNode:
    _element: LoggedProp
    xy_f_rf: LoggedProp[int]
    xy_f_if: LoggedProp[Optional[int]]

    def __init__(self, element: str) -> None:
        self._element = element
        self.xy_f_rf = 379500822
        self.xy_f_if = None
        # Not marked with LoggedProp, so it is not included in the property log
        self.xy_f_lo = 3700000000
q1_spec_node = SpecNode("q1")
node_logger.log_props("q1_spec_node_props", q1_spec_node)
<DictLog 'data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node/q1_spec_node_props.json'>
Data:
{'_element': 'q1', 'xy_f_rf': 379500822, 'xy_f_if': None}
Metadata:
directory data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node
timestamp 2024-08-24 22:54:58.945980+00:00
description q1_spec_node_props
commit_id None
param_db_path None
The log file has now been created within the node directory.
data_logs/
└── calibrations/
    └── 24-08-24-2254_calibration_graph/
        └── 24-08-24-2254_q1_spec_node/
            ├── q1_spec_frequency.json
            ├── q1_spec_node_props.json
            └── q1_spec_signal.nc
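To show how selecting properties by type hint can work in general, here is a minimal sketch using typing.get_type_hints(). The Marked class and the marked_props function are hypothetical stand-ins written for this illustration; they are not the DataLogs implementation of LoggedProp or Logger.log_props().

```python
from typing import Any, Generic, Optional, TypeVar, get_origin, get_type_hints

T = TypeVar("T")


class Marked(Generic[T]):
    """Hypothetical marker type, standing in for a hint like LoggedProp."""


def marked_props(obj: object) -> dict[str, Any]:
    """Collect attributes whose class-level type hint is Marked or Marked[...]."""
    props: dict[str, Any] = {}
    for name, hint in get_type_hints(type(obj)).items():
        # Bare Marked is the class itself; Marked[int] has Marked as its origin
        if (hint is Marked or get_origin(hint) is Marked) and hasattr(obj, name):
            props[name] = getattr(obj, name)
    return props


class SpecNode:
    _element: Marked
    xy_f_rf: Marked[int]
    xy_f_if: Marked[Optional[int]]

    def __init__(self, element: str) -> None:
        self._element = element
        self.xy_f_rf = 379500822
        self.xy_f_if = None
        self.xy_f_lo = 3700000000  # No Marked hint, so it is skipped


logged = marked_props(SpecNode("q1"))
```

Because the selection relies on class-level annotations, attributes that are only assigned in __init__ without a marker hint are left out.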
Loading¶
Logs can be loaded by passing a file path to load_log(). We can also use
Logger.file_path() to aid in creating the file paths to logs. (The full path can
also be passed in directly if known.)
from datalogs import load_log
q1_spec_signal_log = load_log(node_logger.file_path("q1_spec_signal.nc"))
q1_spec_frequency_log = load_log(node_logger.file_path("q1_spec_frequency.json"))
q1_spec_node_props_log = load_log(node_logger.file_path("q1_spec_node_props.json"))
Alternatively, logs can be loaded using DataLog for data logs or
DictLog for dictionary logs. This is not necessary since load_log()
already infers the log type from the file extension, but is useful for static type
checking when the log type is known.
from datalogs import DataLog, DictLog
q1_spec_signal_log = DataLog.load(node_logger.file_path("q1_spec_signal.nc"))
q1_spec_frequency_log = DictLog.load(node_logger.file_path("q1_spec_frequency.json"))
q1_spec_node_props_log = DictLog.load(node_logger.file_path("q1_spec_node_props.json"))
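Inferring the log type from the file extension can be pictured as a simple lookup. The log_kind function below is a hypothetical sketch based only on the two extensions used in this guide (.nc and .json); it does not reproduce the internals of load_log(), and the case-insensitive matching is an assumption.

```python
from pathlib import Path

# Extensions used in this guide, mapped to the kind of log they hold
_KINDS = {".nc": "data", ".json": "dict"}


def log_kind(path: str) -> str:
    """Hypothetical helper: classify a log file by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in _KINDS:
        raise ValueError(f"unrecognized log file extension: {suffix!r}")
    return _KINDS[suffix]


kinds = [log_kind(name) for name in ("q1_spec_signal.nc", "q1_spec_frequency.json")]
```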
Accessing Data¶
Logs are represented as objects (DataLog or DictLog depending on
the log type). Data can be accessed using DataLog.data or
DictLog.data.
For a DataLog, data is returned as an xarray.Dataset object.
q1_spec_signal_log.data
<xarray.Dataset> Size: 48B
Dimensions:  (time: 3)
Coordinates:
  * time     (time) int64 24B 1 2 3
Data variables:
    signal   (time) int64 24B 10 20 30
For a DictLog, data is returned as a dictionary.
q1_spec_frequency_log.data
{'f_rf': 3795008227, 'f_if': 95008227, 'f_lo': 3700000000}
Accessing Metadata¶
Metadata is also loaded in and can be accessed using DataLog.metadata or
DictLog.metadata. Metadata is stored using a LogMetadata object.
q1_spec_signal_log.metadata
directory data_logs/calibrations/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node
timestamp 2024-08-24 22:54:58.881421+00:00
description q1_spec_signal
commit_id None
param_db_path None
Metadata properties can be accessed as properties of this object. For example, we can get
the timestamp using LogMetadata.timestamp.
q1_spec_signal_log.metadata.timestamp
datetime.datetime(2024, 8, 24, 22, 54, 58, 881421, tzinfo=datetime.timezone.utc)
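Since the timestamp is a timezone-aware datetime, the usual standard library operations apply. The sketch below recreates the two timestamps shown in this guide by hand, rather than loading them, so that it runs on its own. The %y-%m-%d-%H%M format string is inferred from the directory names above and is an assumption about how those names are built.

```python
from datetime import datetime, timezone

# Timestamps copied from the metadata output shown in this guide
signal_ts = datetime(2024, 8, 24, 22, 54, 58, 881421, tzinfo=timezone.utc)
frequency_ts = datetime(2024, 8, 24, 22, 54, 58, 925245, tzinfo=timezone.utc)

# ISO 8601 string, convenient for reports or filenames
iso_string = signal_ts.isoformat()

# Time elapsed between the two logs, as a timedelta
elapsed = frequency_ts - signal_ts

# Inferred format of the directory name prefix
prefix = signal_ts.strftime("%y-%m-%d-%H%M")
```

With these values, the prefix matches the one in the example directory name 24-08-24-2254_q1_spec_node.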
ParamDB Integration¶
Optionally, a ParamDB can be passed to a root Logger, in which case it
will be used to automatically tag logs with the latest commit ID.
from paramdb import ParamDB
param_db = ParamDB[int]("param.db")
param_db.commit("Initial commit", 123)
root_logger = Logger("data_logs", param_db)
graph_logger = root_logger.sub_logger("calibration_graph")
node_logger = graph_logger.sub_logger("q1_spec_node")
node_logger.log_dict(
    "q1_spec_frequency",
    {"f_rf": 3795008227, "f_if": 95008227, "f_lo": 3700000000},
)
<DictLog 'data_logs/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node/q1_spec_frequency.json'>
Data:
{'f_rf': 3795008227, 'f_if': 95008227, 'f_lo': 3700000000}
Metadata:
directory data_logs/24-08-24-2254_calibration_graph/24-08-24-2254_q1_spec_node
timestamp 2024-08-24 22:54:59.095279+00:00
description q1_spec_frequency
commit_id 1
param_db_path param.db