rcognita.controllers.CtrlRLStab

class rcognita.controllers.CtrlRLStab(dim_input, dim_output, mode='JACS', ctrl_bnds=[], action_init=[], t0=0, sampling_time=0.1, Nactor=1, pred_step_size=0.1, sys_rhs=[], sys_out=[], state_sys=[], prob_noise_pow=1, is_est_model=0, model_est_stage=1, model_est_period=0.1, buffer_size=20, model_order=3, model_est_checks=0, gamma=1, Ncritic=4, critic_period=0.1, critic_struct='quad-nomix', actor_struct='quad-nomix', stage_obj_struct='quadratic', stage_obj_pars=[], observation_target=[], safe_ctrl=[], safe_decay_rate=[])

Class of reinforcement learning agents with stabilizing constraints.

Sampling here is similar to that of the predictive controller agent CtrlOptPred.

Requires a nominal controller object safe_ctrl with a corresponding Lyapunov function.

w_actor : actor weights.

The feature structure is defined via the string flag actor_struct. Read more on features in the class description of controllers.CtrlOptPred.

w_critic : critic weights.

The feature structure is defined via the string flag critic_struct. Read more on features in the class description of controllers.CtrlOptPred.

mode

Controller mode. Currently, only JACS (joint actor-critic, stabilizing) is available.

Type: string

Read more

---------

Osinenko, P., Beckenbach, L., Göhrt, T., & Streif, S. (2020). A reinforcement learning method with closed-loop stability guarantee. IFAC-PapersOnLine

__init__(dim_input, dim_output, mode='JACS', ctrl_bnds=[], action_init=[], t0=0, sampling_time=0.1, Nactor=1, pred_step_size=0.1, sys_rhs=[], sys_out=[], state_sys=[], prob_noise_pow=1, is_est_model=0, model_est_stage=1, model_est_period=0.1, buffer_size=20, model_order=3, model_est_checks=0, gamma=1, Ncritic=4, critic_period=0.1, critic_struct='quad-nomix', actor_struct='quad-nomix', stage_obj_struct='quadratic', stage_obj_pars=[], observation_target=[], safe_ctrl=[], safe_decay_rate=[])

Parameter specification largely resembles that of the CtrlOptPred class.

Parameters

dim_input (: integer) – Dimension of the input, which should comply with the system to be controlled.
dim_output (: integer) – Dimension of the output, which should comply with the system to be controlled.
ctrl_bnds (: array of shape [dim_input, 2]) – Box control constraints. First element in each row is the lower bound, the second - the upper bound. If empty, control is unconstrained (default).
action_init (: array of shape [dim_input, ]) – Initial action to initialize optimizers.
t0 (: number) – Initial value of the controller’s internal clock.
sampling_time (: number) – Controller’s sampling time (in seconds).
sys_rhs (: function) – Function that represents the right-hand side of the exogenously passed model. This model could be, for instance, the true model of the system. In turn, state_sys represents the (true) current state of the system and should be updated accordingly. Parameters sys_rhs, sys_out, state_sys are used in those controller modes that rely on them.
sys_out (: function) – Function that represents the output of the exogenously passed model. See sys_rhs for how sys_rhs, sys_out and state_sys are used together.
prob_noise_pow (: number) – Power of probing noise during an initial phase to fill the estimator’s buffer before applying optimal control.
is_est_model (: number) – Flag whether to estimate a system model. See _estimate_model().
model_est_stage (: number) – Initial time segment to fill the estimator’s buffer before applying optimal control (in seconds).
model_est_period (: number) – Time between model estimate updates (in seconds).
buffer_size (: natural number) – Size of the buffer to store data.
model_order (: natural number) –
Order of the state-space estimation model

\[\begin{array}{ll} \hat x^+ & = A \hat x + B action, \\ observation^+ & = C \hat x + D action. \end{array}\]

See _estimate_model(). This is just one particular model estimator. When customizing, _estimate_model() may be changed, and the parameter model_order along with it. For instance, you might want to use an artificial neural net and specify its layers and numbers of neurons, in which case model_order could be replaced by, say, Nlayers, Nneurons.
model_est_checks (: natural number) – Estimated model parameters can be stored in stacks and the best among the model_est_checks last ones is picked. May improve the prediction quality somewhat.
gamma (: number in (0, 1]) – Discounting factor. Characterizes the fading of stage objectives along the horizon.
Ncritic (: natural number) – Critic stack size \(N_c\). The critic optimizes the temporal error which is a measure of critic’s ability to capture the optimal infinite-horizon objective (a.k.a. the value function). The temporal errors are stacked up using the said buffer.
critic_period (: number) – Time between critic updates (in seconds); same meaning as model_est_period, but for the critic.
critic_struct (: string) –
Choice of the structure of the critic’s and actor’s features.

Currently available:

Feature structures
Mode	Structure
’quad-lin’	Quadratic-linear
’quadratic’	Quadratic
’quad-nomix’	Quadratic, no mixed terms

Add your specification into the table when customizing the actor and critic.
actor_struct (: string) –
Choice of the structure of the critic’s and actor’s features.

Currently available:

Feature structures
Mode	Structure
’quad-lin’	Quadratic-linear
’quadratic’	Quadratic
’quad-nomix’	Quadratic, no mixed terms

Add your specification into the table when customizing the actor and critic.

stage_obj_struct (: string) –

Choice of the stage objective structure.

Currently available:

Running objective structures
Mode	Structure
’quadratic’	Quadratic \(\chi^\top R_1 \chi\), where \(\chi = [observation, action]\), `stage_obj_pars` should be `[R1]`
’biquadratic’	4th order \(\left( \chi^\top \right)^2 R_2 \left( \chi \right)^2 + \chi^\top R_1 \chi\), where \(\chi = [observation, action]\), `stage_obj_pars` should be `[R1, R2]`
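
The two stage objective structures above can be illustrated with a short NumPy sketch. This is not the library's code: the free function `stage_obj_sketch` is a hypothetical stand-in, and it assumes that the squares in the biquadratic form are taken element-wise on \(\chi\) and that `stage_obj_pars` is ordered as stated in the table (`[R1]` or `[R1, R2]`).

```python
import numpy as np

def stage_obj_sketch(observation, action, stage_obj_struct="quadratic", stage_obj_pars=()):
    """Hypothetical stand-alone version of a stage objective evaluation."""
    # chi stacks observation and action into one vector
    chi = np.concatenate([np.asarray(observation, dtype=float),
                          np.asarray(action, dtype=float)])
    R1 = np.asarray(stage_obj_pars[0], dtype=float)
    quadratic_term = chi @ R1 @ chi
    if stage_obj_struct == "quadratic":
        return quadratic_term
    if stage_obj_struct == "biquadratic":
        R2 = np.asarray(stage_obj_pars[1], dtype=float)
        chi_sq = chi ** 2  # assumption: element-wise square of chi
        return chi_sq @ R2 @ chi_sq + quadratic_term
    raise ValueError(f"Unknown stage_obj_struct: {stage_obj_struct!r}")

# Example: two observations, one action, identity weight matrices
obs, act = [1.0, 2.0], [3.0]
print(stage_obj_sketch(obs, act, "quadratic", [np.eye(3)]))
print(stage_obj_sketch(obs, act, "biquadratic", [np.eye(3), np.eye(3)]))
```

Note that the weight matrices must be square with side `dim_output + dim_input`, since \(\chi\) contains both the observation and the action.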

Methods

`__init__`(dim_input, dim_output[, mode, …])	Parameter specification largely resembles that of `CtrlOptPred` class.
`compute_action`(t, observation)
`receive_sys_state`(state)	Fetch exogenous model state.
`reset`(t0)	Resets agent for use in multi-episode simulation.
`stage_obj`(observation, action)	Stage (equivalently, instantaneous or running) objective.
`upd_accum_obj`(observation, action)	Sample-to-sample accumulated (summed up or integrated) stage objective.
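
To make the feature structures named by critic_struct and actor_struct more concrete, here is a minimal sketch of how such feature vectors could be built from a vector `chi`. The function `features_sketch` is hypothetical and not part of the documented API, and the ordering of the quadratic terms (upper triangle of the outer product, row by row) is an assumption made for illustration.

```python
import numpy as np

def features_sketch(chi, struct="quad-nomix"):
    """Hypothetical feature map for the three structure flags."""
    chi = np.asarray(chi, dtype=float)
    if struct == "quad-nomix":
        # Squares only, no mixed products chi_i * chi_j
        return chi * chi
    # Upper-triangular entries of chi chi^T: squares plus mixed products
    upper = np.triu_indices(chi.size)
    quad = np.outer(chi, chi)[upper]
    if struct == "quadratic":
        return quad
    if struct == "quad-lin":
        # Quadratic terms followed by the linear terms
        return np.concatenate([quad, chi])
    raise ValueError(f"Unknown feature structure: {struct!r}")

chi = [1.0, 2.0, 3.0]
for flag in ("quad-nomix", "quadratic", "quad-lin"):
    print(flag, features_sketch(chi, flag))
```

With features of this kind, a critic or actor that is linear in its weights would evaluate as an inner product such as `w_critic @ features_sketch(chi, critic_struct)`; the weight vector length then depends on the chosen structure.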
