- class rcognita.controllers.CtrlOptPred(dim_input, dim_output, mode='MPC', ctrl_bnds=[], action_init=[], t0=0, sampling_time=0.1, Nactor=1, pred_step_size=0.1, sys_rhs=[], sys_out=[], state_sys=[], prob_noise_pow=1, is_est_model=0, model_est_stage=1, model_est_period=0.1, buffer_size=20, model_order=3, model_est_checks=0, gamma=1, Ncritic=4, critic_period=0.1, critic_struct='quad-nomix', stage_obj_struct='quadratic', stage_obj_pars=[], observation_target=[])
Class of predictive optimal controllers, primarily model-predictive control and predictive reinforcement learning, that optimize a finite-horizon cost.
Currently, the actor model is trivial: an action is generated directly without additional policy parameters.
- dim_input, dim_output
Dimensions of the input and output, which should match those of the system to be controlled.
- Type
: integer
- mode
Controller mode. Currently available (\(\rho\) is the stage objective, \(\gamma\) is the discounting factor):
Controller modes
| Mode | Cost function |
| --- | --- |
| 'MPC' - Model-predictive control (MPC) | \(J_a \left( y_1, \{action\}_1^{N_a} \right) = \sum_{k=1}^{N_a} \gamma^{k-1} \rho(y_k, u_k)\) |
| 'RQL' - RL/ADP via \(N_a-1\) roll-outs of \(\rho\) | \(J_a \left( y_1, \{action\}_1^{N_a} \right) = \sum_{k=1}^{N_a-1} \gamma^{k-1} \rho(y_k, u_k) + \hat Q^{\theta}(y_{N_a}, u_{N_a})\) |
| 'SQL' - RL/ADP via stacked Q-learning | \(J_a \left( y_1, \{action\}_1^{N_a} \right) = \sum_{k=1}^{N_a-1} \gamma^{k-1} \hat Q^{\theta}(y_{N_a}, u_{N_a})\) |
Here, \(\theta\) are the critic parameters (neural network weights, say) and \(y_1\) is the current observation.
Add your specification into the table when customizing the agent.
- Type
: string
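The 'MPC' and 'RQL' costs can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: actor_cost, stage_obj and q_hat are hypothetical names, and the observation and action sequences are assumed to be already predicted over the horizon.

```python
import numpy as np

def actor_cost(mode, obs_seq, act_seq, stage_obj, q_hat=None, gamma=1.0):
    """Finite-horizon cost J_a for a predicted sequence (illustrative sketch).

    obs_seq, act_seq: arrays with N_a rows (y_1..y_Na and u_1..u_Na).
    stage_obj: callable rho(y, u); q_hat: callable critic Q(y, u), used by 'RQL'.
    """
    n_a = len(act_seq)
    if mode == "MPC":
        # Sum of discounted stage objectives over the whole horizon.
        return sum(gamma**k * stage_obj(obs_seq[k], act_seq[k]) for k in range(n_a))
    if mode == "RQL":
        # N_a - 1 roll-outs of the stage objective plus a critic terminal term.
        rollout = sum(gamma**k * stage_obj(obs_seq[k], act_seq[k]) for k in range(n_a - 1))
        return rollout + q_hat(obs_seq[-1], act_seq[-1])
    raise ValueError(f"unsupported mode in this sketch: {mode}")

# Toy data (hypothetical): scalar observation and action, horizon N_a = 3.
obs = np.array([[1.0], [0.5], [0.25]])
act = np.array([[0.0], [1.0], [2.0]])
rho = lambda y, u: float(y @ y + u @ u)   # simple quadratic stage objective
q = lambda y, u: 10.0                     # constant stand-in for the critic
j_mpc = actor_cost("MPC", obs, act, rho, gamma=0.5)
j_rql = actor_cost("RQL", obs, act, rho, q_hat=q, gamma=0.5)
```

Note that Python's zero-based index k corresponds to the exponent k-1 in the formulas.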
- ctrl_bnds
Box control constraints. The first element in each row is the lower bound, the second is the upper bound. If empty, control is unconstrained (default).
- Type
: array of shape [dim_input, 2]
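As an illustration of the expected layout, the following sketch builds a bounds array for a hypothetical two-input system and clips a candidate action into the box. How the controller itself enforces the bounds is not stated here, so the clipping step is only an assumption for demonstration.

```python
import numpy as np

# Hypothetical two-input system: row i holds [lower, upper] for input i.
ctrl_bnds = np.array([[-1.0, 1.0],
                      [-5.0, 5.0]])
lower, upper = ctrl_bnds[:, 0], ctrl_bnds[:, 1]

# Clip a candidate action into the box (illustration only).
action = np.array([2.0, -3.0])
clipped = np.clip(action, lower, upper)
```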
- action_init
Initial action to initialize optimizers.
- Type
: array of shape [dim_input, ]
- t0
Initial value of the controller’s internal clock.
- Type
: number
- sampling_time
Controller’s sampling time (in seconds).
- Type
: number
- Nactor
Size of prediction horizon \(N_a\).
- Type
: natural number
- pred_step_size
Prediction step size in \(J_a\) as defined above (in seconds). Should be a multiple of sampling_time. Commonly equals it, but is left adjustable here for convenience. A larger prediction step size leads to a longer effective horizon.
- Type
: number
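The relation between these timing parameters can be checked with a few lines of arithmetic. The sketch below assumes that the horizon spans Nactor steps of length pred_step_size; the numeric values are hypothetical.

```python
sampling_time = 0.1   # controller sampling time, s
pred_step_size = 0.2  # prediction step, s
Nactor = 5            # prediction horizon N_a, steps

# The ratio of prediction step to sampling time should be (close to) an integer.
ratio = pred_step_size / sampling_time
is_multiple = abs(ratio - round(ratio)) < 1e-9

# Time span covered by the prediction horizon, s.
horizon_seconds = Nactor * pred_step_size
```

With the same Nactor, doubling pred_step_size doubles the time span the controller looks ahead.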
- sys_rhs, sys_out
Functions that represent the right-hand side and the output, respectively, of the exogenously passed model. This model could be, for instance, the true model of the system. In turn, state_sys represents the (true) current state of the system and should be updated accordingly. The parameters sys_rhs, sys_out, state_sys are used only in those controller modes that rely on them.
- Type
: functions
- prob_noise_pow
Power of probing noise during an initial phase to fill the estimator’s buffer before applying optimal control.
- Type
: number
- is_est_model
Flag whether to estimate a system model. See _estimate_model().
- Type
: number
- model_est_stage
Initial time segment to fill the estimator’s buffer before applying optimal control (in seconds).
- Type
: number
- model_est_period
Time between model estimate updates (in seconds).
- Type
: number
- buffer_size
Size of the buffer to store data.
- Type
: natural number
- model_order
Order of the state-space estimation model
\[\begin{array}{ll} \hat x^+ & = A \hat x + B \, \text{action}, \\ \text{observation}^+ & = C \hat x + D \, \text{action}. \end{array}\]
See _estimate_model(). This is just one particular model estimator. When customizing, _estimate_model() may be changed, and the parameter model_order with it. For instance, you might want to use an artificial neural net and specify its layers and numbers of neurons, in which case model_order could be replaced by, say, Nlayers.
- Type
: natural number
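One step of such a state-space model can be sketched as below. The function model_step and the matrices are hypothetical examples for a model of order 2 with one input and one output; the sketch follows the two equations as written and does not reproduce how the library estimates A, B, C, D.

```python
import numpy as np

def model_step(A, B, C, D, x_hat, action):
    """One step of the estimated model as written above (sketch)."""
    x_next = A @ x_hat + B @ action      # state update
    obs_next = C @ x_hat + D @ action    # predicted observation
    return x_next, obs_next

# Hypothetical model of order 2 with one input and one output.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
x_next, obs_next = model_step(A, B, C, D, np.array([1.0, 2.0]), np.array([3.0]))
```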
- model_est_checks
Estimated model parameters can be stored in stacks, and the best among the last model_est_checks ones is picked. May improve the prediction quality somewhat.
- Type
: natural number
- gamma
Discounting factor. Characterizes the fading of stage objectives along the horizon.
- Type
: number in (0, 1]
- Ncritic
Critic stack size \(N_c\). The critic optimizes the temporal error, which measures the critic’s ability to capture the optimal infinite-horizon cost (a.k.a. the value function). The temporal errors are stacked up using the data buffer.
- Type
: natural number
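A stacked temporal-error cost of this kind might look as follows. This is a sketch under explicit assumptions: the critic is taken to be linear in its weights, and the temporal error is the standard temporal-difference form \(\hat Q_k - \rho_k - \gamma \hat Q_{k+1}\); the library's exact definition is not given in this section, and critic_cost is a hypothetical name.

```python
import numpy as np

def critic_cost(w, features, stage_costs, gamma=1.0):
    """Half the sum of squared temporal errors over a stack (sketch).

    features: array [N_c + 1, n_features], critic features along the buffer.
    stage_costs: array [N_c], stage objectives for the first N_c samples.
    """
    q = features @ w                            # linear-in-weights critic values
    td = q[:-1] - stage_costs - gamma * q[1:]   # one temporal error per stacked pair
    return 0.5 * float(td @ td)

# Hypothetical stack of N_c = 2 temporal errors.
w = np.array([1.0, 2.0])
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
stage_costs = np.array([0.5, 0.5])
cost = critic_cost(w, features, stage_costs, gamma=0.5)
```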
- critic_period
The same meaning as model_est_period.
- Type
: number
- critic_struct
Choice of the structure of the critic’s features.
Currently available:
Critic structures:
Quadratic, no mixed terms
Quadratic, no mixed terms in input and output, i.e., \(w_1 y_1^2 + \dots + w_p y_p^2 + w_{p+1} y_1 u_1 + \dots + w_{\bullet} u_1^2 + \dots\), where \(w\) is the critic’s weight vector
Add your specification into the table when customizing the critic.
- Type
: string
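A feature map without mixed terms can be sketched as the elementwise squares of the stacked observation and action. The function name features_quad_nomix and the weights are hypothetical, and associating this structure with the default value 'quad-nomix' is an assumption based on the name.

```python
import numpy as np

def features_quad_nomix(observation, action):
    """Squares of each observation and action component, no cross terms."""
    chi = np.concatenate([observation, action])
    return chi * chi

phi = features_quad_nomix(np.array([1.0, -2.0]), np.array([3.0]))
w = np.array([0.5, 0.5, 1.0])      # hypothetical critic weights
q_value = float(w @ phi)           # critic value as a weighted sum of features
```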
- stage_obj_struct
Choice of the stage objective structure.
Currently available:
Stage objective structures:
Quadratic \(\chi^\top R_1 \chi\), where \(\chi = [observation, action]\); stage_obj_pars should be [R1]
4th order \(\left( \chi^\top \right)^2 R_2 \left( \chi \right)^2 + \chi^\top R_1 \chi\), where \(\chi = [observation, action]\); stage_obj_pars should be [R1, R2]
Pass the correct stage objective parameters in stage_obj_pars (as a list).
When customizing the stage objective, add your specification into the table above.
- Type
: string
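Both stage objective structures can be sketched in a few lines. The sketch assumes that \(\left( \chi \right)^2\) denotes the elementwise square of \(\chi\); the function signature with a quartic flag and the matrices are hypothetical and serve only to illustrate how stage_obj_pars is laid out.

```python
import numpy as np

def stage_obj(observation, action, stage_obj_pars, quartic=False):
    """Quadratic or 4th-order stage objective (illustrative sketch)."""
    chi = np.concatenate([observation, action])
    R1 = stage_obj_pars[0]
    value = chi @ R1 @ chi               # quadratic part chi^T R1 chi
    if quartic:
        R2 = stage_obj_pars[1]
        chi_sq = chi**2                  # elementwise square (assumption)
        value += chi_sq @ R2 @ chi_sq    # 4th-order part
    return float(value)

R1 = np.diag([1.0, 2.0])
R2 = np.diag([0.5, 0.0])
y, u = np.array([2.0]), np.array([1.0])
rho_quad = stage_obj(y, u, [R1])
rho_quartic = stage_obj(y, u, [R1, R2], quartic=True)
```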
- 1
Osinenko, Pavel, et al. “Stacked adaptive dynamic programming with unknown system model.” IFAC-PapersOnLine 50.1 (2017): 4150-4155
- __init__(dim_input, dim_output, mode='MPC', ctrl_bnds=[], action_init=[], t0=0, sampling_time=0.1, Nactor=1, pred_step_size=0.1, sys_rhs=[], sys_out=[], state_sys=[], prob_noise_pow=1, is_est_model=0, model_est_stage=1, model_est_period=0.1, buffer_size=20, model_order=3, model_est_checks=0, gamma=1, Ncritic=4, critic_period=0.1, critic_struct='quad-nomix', stage_obj_struct='quadratic', stage_obj_pars=[], observation_target=[])
- Parameters
dim_input (: integer) – Dimension of input and output which should comply with the system-to-be-controlled.
dim_output (: integer) – Dimension of input and output which should comply with the system-to-be-controlled.
mode (: string) –
Controller mode. Currently available (\(\rho\) is the stage objective, \(\gamma\) is the discounting factor):
Controller modes
| Mode | Cost function |
| --- | --- |
| 'MPC' - Model-predictive control (MPC) | \(J_a \left( y_1, \{action\}_1^{N_a} \right) = \sum_{k=1}^{N_a} \gamma^{k-1} \rho(y_k, u_k)\) |
| 'RQL' - RL/ADP via \(N_a-1\) roll-outs of \(\rho\) | \(J_a \left( y_1, \{action\}_1^{N_a} \right) = \sum_{k=1}^{N_a-1} \gamma^{k-1} \rho(y_k, u_k) + \hat Q^{\theta}(y_{N_a}, u_{N_a})\) |
| 'SQL' - RL/ADP via stacked Q-learning | \(J_a \left( y_1, \{action\}_1^{N_a} \right) = \sum_{k=1}^{N_a-1} \gamma^{k-1} \hat Q^{\theta}(y_{N_a}, u_{N_a})\) |
Here, \(\theta\) are the critic parameters (neural network weights, say) and \(y_1\) is the current observation.
Add your specification into the table when customizing the agent.
ctrl_bnds (: array of shape [dim_input, 2]) – Box control constraints. The first element in each row is the lower bound, the second is the upper bound. If empty, control is unconstrained (default).
action_init (: array of shape [dim_input, ]) – Initial action to initialize optimizers.
t0 (: number) – Initial value of the controller’s internal clock
sampling_time (: number) – Controller’s sampling time (in seconds)
Nactor (: natural number) – Size of prediction horizon \(N_a\)
pred_step_size (: number) – Prediction step size in \(J_a\) as defined above (in seconds). Should be a multiple of sampling_time. Commonly equals it, but is left adjustable here for convenience. A larger prediction step size leads to a longer effective horizon.
sys_rhs (: functions) – Functions that represent the right-hand side and the output, respectively, of the exogenously passed model. This model could be, for instance, the true model of the system. In turn, state_sys represents the (true) current state of the system and should be updated accordingly. The parameters sys_rhs, sys_out, state_sys are used only in those controller modes that rely on them.
sys_out (: functions) – See sys_rhs.
prob_noise_pow (: number) – Power of probing noise during an initial phase to fill the estimator’s buffer before applying optimal control.
is_est_model (: number) – Flag whether to estimate a system model. See _estimate_model().
model_est_stage (: number) – Initial time segment to fill the estimator’s buffer before applying optimal control (in seconds).
model_est_period (: number) – Time between model estimate updates (in seconds).
buffer_size (: natural number) – Size of the buffer to store data.
model_order (: natural number) –
Order of the state-space estimation model
\[\begin{array}{ll} \hat x^+ & = A \hat x + B \, \text{action}, \\ \text{observation}^+ & = C \hat x + D \, \text{action}. \end{array}\]
See _estimate_model(). This is just one particular model estimator. When customizing, _estimate_model() may be changed, and the parameter model_order with it. For instance, you might want to use an artificial neural net and specify its layers and numbers of neurons, in which case model_order could be replaced by, say, Nlayers.
model_est_checks (: natural number) – Estimated model parameters can be stored in stacks, and the best among the last model_est_checks ones is picked. May improve the prediction quality somewhat.
gamma (: number in (0, 1]) – Discounting factor. Characterizes the fading of stage objectives along the horizon.
Ncritic (: natural number) – Critic stack size \(N_c\). The critic optimizes the temporal error, which measures the critic’s ability to capture the optimal infinite-horizon cost (a.k.a. the value function). The temporal errors are stacked up using the data buffer.
critic_period (: number) – The same meaning as model_est_period.
critic_struct (: string) –
Choice of the structure of the critic’s features.
Currently available:
Critic feature structures:
Quadratic, no mixed terms
Quadratic, no mixed terms in input and output, i.e., \(w_1 y_1^2 + \dots + w_p y_p^2 + w_{p+1} y_1 u_1 + \dots + w_{\bullet} u_1^2 + \dots\), where \(w\) is the critic’s weight vector
Add your specification into the table when customizing the critic.
stage_obj_struct (: string) –
Choice of the stage objective structure.
Currently available:
Running objective structures:
Quadratic \(\chi^\top R_1 \chi\), where \(\chi = [observation, action]\); stage_obj_pars should be [R1]
4th order \(\left( \chi^\top \right)^2 R_2 \left( \chi \right)^2 + \chi^\top R_1 \chi\), where \(\chi = [observation, action]\); stage_obj_pars should be [R1, R2]
(dim_input, dim_output[, mode, …]) – Constructor. dim_input and dim_output are the dimensions of input and output, which should comply with the system-to-be-controlled.
(t, observation) – Main method.
(state) – Fetch exogenous model state.
(t0) – Resets agent for use in multi-episode simulation.
(observation, action) – Stage (equivalently, instantaneous or running) objective.
(observation, action) – Sample-to-sample accumulated (summed up or integrated) stage objective.
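The sample-to-sample accumulation of the stage objective can be sketched as a rectangle-rule integration. The helper upd_accum and the sample values are hypothetical; whether the library sums or integrates in exactly this way is not specified here.

```python
def upd_accum(accum_obj, stage_value, sampling_time):
    """Add one sample's contribution (rectangle-rule integration)."""
    return accum_obj + stage_value * sampling_time

accum = 0.0
for rho_k in [4.0, 2.0, 1.0]:     # hypothetical stage objective samples
    accum = upd_accum(accum, rho_k, sampling_time=0.1)
```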