.. _datasets:

Datasets

This section contains descriptions of the available datasets in the hwm.datasets module. These datasets are synthetic and designed for use in financial and system modeling applications.

make_system_dynamics

The make_system_dynamics function generates a synthetic control systems dataset with realistic features, modeling how a control system responds to input signals, external disturbances, and nonlinear factors. Designed for supervised learning tasks in control systems analysis, the dataset includes both dynamic and performance-related features, making it suitable for modeling system dynamics and behavior over time.

Parameters

samples

Number of time points in the dataset, representing discrete observations of the control system over the specified duration. Default is 1000.

end_time

Total duration of the simulation in seconds, defining the time range from 0 to end_time across the specified number of samples. Default is 10.

input_noise_level

Standard deviation of Gaussian noise added to the input signal, simulating real-world input variability. Default is 0.05.

control_noise_level

Standard deviation of Gaussian noise added to the control system’s output, modeling external disturbances and control noise. Default is 0.02.

nonlinear_response

Whether to apply a nonlinear transformation to the linear output using a hyperbolic tangent function (tanh). Set to True to simulate systems with nonlinear responses. Default is True.

input_amplitude

Base amplitude of the input signal, defining its initial strength prior to modulation or noise addition. Default is 1.0.

input_frequency

Frequency of the input signal in Hertz (Hz), determining the rate of oscillation in the sinusoidal input. Default is 0.5.
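As a quick sanity check on how samples, end_time, and input_frequency interact, the following sketch works out the number of oscillations and the points available per oscillation for the default values. It uses plain arithmetic only and does not call hwm; the variable names period, n_cycles, and points_per_cycle are illustrative.

```python
# Default values taken from the parameter descriptions above.
samples = 1000
end_time = 10.0          # seconds
input_frequency = 0.5    # Hz

period = 1.0 / input_frequency           # seconds per oscillation
n_cycles = input_frequency * end_time    # oscillations in the simulated window
points_per_cycle = samples / n_cycles    # time points covering one oscillation

print(f"period: {period} s")
print(f"cycles in window: {n_cycles}")
print(f"points per cycle: {points_per_cycle}")
```

With the defaults, each oscillation is covered by about 200 points. Raising input_frequency without raising samples lowers that resolution.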

system_gain

Gain applied to the input signal to simulate the linear response of the control system. Represents the system’s linear amplification factor. Default is 0.9.

response_sensitivity

Sensitivity applied in the nonlinear response calculation if nonlinear_response is True, controlling the strength of the nonlinear effect on the linear output. Default is 0.7.
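To see what response_sensitivity does, the sketch below applies the tanh transformation from the Formulation section to a few linear output values at several sensitivities. It is a standalone illustration built on math.tanh, not the library's implementation, and the sample values are arbitrary.

```python
import math

linear_outputs = [0.1, 0.5, 0.9, 1.8]   # arbitrary example values
sensitivities = [0.3, 0.7, 1.5]         # 0.7 is the documented default

# Map each sensitivity to the responses tanh(sensitivity * linear_output).
table = {
    s: [math.tanh(s * y) for y in linear_outputs]
    for s in sensitivities
}

for s, responses in table.items():
    print(s, [round(r, 3) for r in responses])
```

Because tanh is bounded by ±1, larger sensitivities push the response toward saturation sooner, while small ones keep it close to a scaled copy of the linear output.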

as_frame

If True, returns the dataset as a pandas DataFrame. If False, returns it as a dictionary, or in another format determined by return_X_y and split_X_y. Default is True.

return_X_y

If True, returns feature data X and target y separately. Default is False.

split_X_y

If True, splits data into training and test sets based on test_size. Default is False.

target_names

Names of the target variable(s) to be returned in the dataset. Defaults to ["output"], representing the final output signal of the system.

test_size

Proportion of the dataset to include in the test split when split_X_y is True. Default is 0.3.
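The effect of test_size on the split sizes can be sketched as follows. This illustration rests on assumptions: it uses a simple ordered split on a list of row indices, and this section does not say whether hwm shuffles rows or how it rounds, so the actual behavior may differ.

```python
samples = 1000
test_size = 0.3

rows = list(range(samples))            # stand-in for the dataset rows
n_test = round(samples * test_size)    # assumed rounding rule
n_train = samples - n_test

# Assumed: ordered split, earliest rows for training, latest for testing.
train_rows, test_rows = rows[:n_train], rows[n_train:]

print(len(train_rows), len(test_rows))
```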

seed

Seed for random number generation to ensure reproducibility in noise addition and random operations. Default is None.
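The role of seed can be illustrated with Python's standard random module: the same seed reproduces the same Gaussian noise, while a different seed does not. The helper gaussian_noise below is hypothetical and only mimics the idea; hwm's own random number handling is not shown in this section.

```python
import random

def gaussian_noise(n, sigma, seed=None):
    """Return n Gaussian noise values with standard deviation sigma (hypothetical helper)."""
    rng = random.Random(seed)   # seed=None lets Python pick a seed from system sources
    return [rng.gauss(0.0, sigma) for _ in range(n)]

a = gaussian_noise(5, 0.05, seed=42)
b = gaussian_noise(5, 0.05, seed=42)
c = gaussian_noise(5, 0.05, seed=7)

print(a == b)   # same seed: identical noise
print(a == c)   # different seed: different noise
```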

Returns

The function returns a structured dataset based on the parameters. The format of the dataset depends on the as_frame, return_X_y, and split_X_y arguments. Possible return formats include:

  • pandas.DataFrame: If as_frame=True, a DataFrame with time-indexed records of simulated control system features and target values.

  • tuple (X, y): If return_X_y=True, a tuple containing the features (X) and target (y).

  • dictionary: Otherwise, a dictionary containing the features and target.

The dataset includes features like the input signal, linear and nonlinear outputs, control effort, error signals, power consumption, response rate, stability metrics, and the final system output.

Formulation

The dataset is generated based on a combination of linear and nonlinear transformations on the input signal. Several features capture the control system’s behavior over time:

  • Input Signal: The input is modeled as a sinusoidal wave with added Gaussian noise, where A is input_amplitude, f is input_frequency, and t is time:

    \[\text{Input Signal} = A \cdot \sin(2 \pi f t) + \text{noise}\]

  • Linear Output: Represents the system’s linear response to the input after applying system_gain:

    \[\text{Linear Output} = \text{system\_gain} \cdot \text{Input Signal}\]

  • Nonlinear Response: If nonlinear_response is True, applies a tanh function whose strength is controlled by response_sensitivity:

    \[\text{Response Output} = \tanh(\text{response\_sensitivity} \cdot \text{Linear Output})\]

  • Control Effort: Estimated as the absolute value of the product of system_gain and the input signal, providing insight into the effort required to control the system:

    \[\text{Control Effort} = \left| \text{system\_gain} \cdot \text{Input Signal} \right|\]

  • Power Consumption: Approximates the energy expenditure as a function of control effort:

    \[\text{Power Consumption} = \text{Control Effort}^2\]

  • Stability Metric: Measures system stability by comparing the nonlinear response to the linear output:

    \[\text{Stability Metric} = 1 - \left| \text{Response Output} - \text{Linear Output} \right|\]
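The formulas above can be sketched in a few lines of standard-library Python. This is a minimal reading of the formulation, not the hwm source: the function name simulate_system is hypothetical, the time grid is assumed to be evenly spaced from 0 to end_time inclusive, and the control noise, error signal, response rate, and final output are omitted because this section does not define how they are computed.

```python
import math
import random

def simulate_system(samples=1000, end_time=10.0, input_amplitude=1.0,
                    input_frequency=0.5, input_noise_level=0.05,
                    system_gain=0.9, response_sensitivity=0.7,
                    nonlinear_response=True, seed=None):
    """Hypothetical re-implementation of the documented formulas (needs samples >= 2)."""
    rng = random.Random(seed)
    rows = []
    for i in range(samples):
        # Assumed: evenly spaced time grid that includes both endpoints.
        t = end_time * i / (samples - 1)
        # Input Signal = A * sin(2*pi*f*t) + noise
        input_signal = (
            input_amplitude * math.sin(2 * math.pi * input_frequency * t)
            + rng.gauss(0.0, input_noise_level)
        )
        # Linear Output = system_gain * Input Signal
        linear_output = system_gain * input_signal
        if nonlinear_response:
            # Response Output = tanh(response_sensitivity * Linear Output)
            response_output = math.tanh(response_sensitivity * linear_output)
        else:
            # Assumed: without the nonlinearity the response is the linear output.
            response_output = linear_output
        # Control Effort = |system_gain * Input Signal|
        control_effort = abs(system_gain * input_signal)
        rows.append({
            "time": t,
            "input_signal": input_signal,
            "linear_output": linear_output,
            "response_output": response_output,
            "control_effort": control_effort,
            "power_consumption": control_effort ** 2,
            "stability_metric": 1 - abs(response_output - linear_output),
        })
    return rows

rows = simulate_system(samples=5, seed=0)
for row in rows:
    print({name: round(value, 4) for name, value in row.items()})
```

A list of dictionaries keeps the sketch dependency-free; each dictionary corresponds to one row of the DataFrame that the real function returns.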

Methods

  • manage_data: Utility function for structuring and returning the dataset. Used internally to create datasets.

Examples

The following examples demonstrate how to generate and utilize the control systems dynamics dataset.

Basic Example:

In this example, we generate a simple control systems dataset to understand its basic functionality.

from hwm.datasets import make_system_dynamics
import pandas as pd

# Generate a dataset with default parameters
data = make_system_dynamics()
print(data.head())
# Example output (values vary between runs because seed defaults to None):
#        time  input_signal  linear_output  response_output  control_effort  \
# 0  0.000000       0.000000        0.000000         0.000000         0.000000
# 1  0.010101       0.031416        0.028275         0.026581         0.025383
# 2  0.020202       0.062523        0.056271         0.052927         0.050767
# 3  0.030303       0.093243        0.083918         0.075527         0.084237
# 4  0.040404       0.123544        0.111189         0.099343         0.111544
#
#    error_signal  power_consumption  response_rate  stability_metric     output
# 0       0.000000            0.000000        0.000000          1.000000  0.000000
# 1      -0.001694            0.000643        2.653534          0.983419  0.026581
# 2      -0.000927            0.002575        3.255982          0.947073  0.052927
# 3      -0.007391            0.007113        3.541963          0.915933  0.075527
# 4      -0.011201            0.012494        3.798764          0.905825  0.099343

Complex Example:

This example calls make_system_dynamics with custom parameters to simulate a longer, noisier run with a stronger input signal, and selects response_output as the target.

from hwm.datasets import make_system_dynamics
import pandas as pd
import numpy as np

# Generate control systems data with custom parameters
data = make_system_dynamics(
    samples=1500,
    end_time=20,
    input_noise_level=0.1,
    control_noise_level=0.05,
    nonlinear_response=True,
    input_amplitude=2.0,
    input_frequency=1.0,
    system_gain=1.2,
    response_sensitivity=0.8,
    as_frame=True,
    return_X_y=False,
    split_X_y=False,
    target_names=["response_output"],
    test_size=0.25,
    seed=42
)

# Inspect the first few rows of the dataset
print(data.head())

# Access specific features
print(data[['input_signal', 'power_consumption']].head())

Explanation:

  • Parameter Customization:

      - Increased samples to 1500 and end_time to 20 seconds to simulate a longer duration.
      - Enhanced noise levels (input_noise_level=0.1, control_noise_level=0.05) to model more realistic variability.
      - Adjusted input_amplitude and input_frequency to change the input signal characteristics.
      - Modified system_gain and response_sensitivity to simulate different system dynamics.

  • Generated Features:

      - input_signal: Enhanced amplitude and frequency with added noise.
      - linear_output: Scaled input signal reflecting system gain.
      - response_output: Nonlinear transformation applied to the linear output with added control noise.
      - control_effort: Increased due to higher system gain and input amplitude.
      - power_consumption: Reflects the squared control effort, indicating higher energy usage.
      - stability_metric: Measures the deviation between nonlinear response and linear output.
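The statements about control effort and power consumption can be checked with noise-free arithmetic: ignoring noise, the peak control effort is system_gain times input_amplitude, and the power is its square. The sketch below compares the default settings with those of the complex example. It is a back-of-the-envelope calculation rather than output from hwm, and the helper peak_effort_and_power is hypothetical.

```python
def peak_effort_and_power(system_gain, input_amplitude):
    """Noise-free peak control effort and the corresponding power (illustrative)."""
    effort = abs(system_gain * input_amplitude)
    return effort, effort ** 2

# Documented defaults versus the values used in the complex example.
default_effort, default_power = peak_effort_and_power(0.9, 1.0)
custom_effort, custom_power = peak_effort_and_power(1.2, 2.0)

print(f"default: effort={default_effort:.2f}, power={default_power:.2f}")
print(f"custom:  effort={custom_effort:.2f}, power={custom_power:.2f}")
print(f"power ratio: {custom_power / default_power:.1f}x")
```

Under these assumptions the peak power rises from about 0.81 to about 5.76, roughly a sevenfold increase.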

Notes

  • The make_system_dynamics dataset is ideal for training and testing models in control systems analysis, especially those focusing on system identification, dynamics, and response prediction in the presence of both linear and nonlinear behaviors.

  • Proper selection of parameters such as system_gain and response_sensitivity is crucial for simulating realistic system dynamics.

  • The dataset can be returned in various formats (DataFrame, tuple, dictionary) based on the user’s needs, facilitating flexibility in downstream tasks.

See Also

  • hwm.utils.manage_data(): A utility for managing dataset structures.