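- I'm compiling what I study into notebooks.
- Every notebook is self-contained.
- Since I'm still learning, the content is not guaranteed to be correct. Anything dubious (e.g., notation issues) is marked with TODO.
- Implementations often use jax for speed.
Use poetry: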
git clone git@github.com:syuntoku14/Shumi-Note.git && cd Shumi-Note
poetry install
- Items that can't be shown here because of NDAs appear as ****. (They may show up here once they can be published.)
- (2023/1/18) Linear MDP: notebooks/RL_linear_MDP.ipynb
- (2023/1/21) Measure-theoretic probability (TODO): notebooks/PROB_measure_theoretic_probability.ipynb
- (2023/1/26) Stochastic processes (TODO): notebooks/PROB_probability_process.ipynb
- (2023/1/28) Lebesgue integration: notebooks/PROB_lebesgue_integral.ipynb
- (2023/1/28) Limit superior and limit inferior: notebooks/PROB_liminf_limsup.ipynb
- (2023/1/29) Extreme value distributions: notebooks/PROB_extreme_value_distribution.ipynb
- (2023/1/29) Characteristic functions: notebooks/PROB_characteristic_function.ipynb
- (2023/1/29) Stochastic integration (TODO): notebooks/PROB_stochastic_integration.ipynb
- (2023/1/29) Basics of bandit algorithms: notebooks/BANDIT_basics.ipynb
- (2023/1/30) Taylor's theorem: notebooks/MATH_taylor_theorem.ipynb
- (2023/1/31) Itô integral & experiments with stochastic differential equations (TODO): notebooks/PROB_stochastic_integration.ipynb
- (2023/2/1) Girsanov's theorem: notebooks/PROB_stochastic_integration.ipynb
- (2023/2/4) Gaussian process regression: notebooks/PROB_gp_regression.ipynb
- (2023/2/5) Adversarial bandits (TODO): notebooks/BANDIT_basics.ipynb
- (2023/2/5) Multi-step RL (on-policy estimation): notebooks/RL_multi_step.ipynb
- (2023/2/6) Multi-step RL (off-policy estimation): notebooks/RL_multi_step.ipynb
- (2023/2/6) Multi-step RL (multi-step control): notebooks/RL_multi_step.ipynb
- (2023/2/7) About MDPs (Garnet MDPs): notebooks/RL_Markov_Decision_Process.ipynb
- (2023/2/8) Contextual bandits (TODO): notebooks/BANDIT_General_contextual.ipynb
- (2023/2/9) Linear bandits: notebooks/BANDIT_General_contextual.ipynb
- (2023/2/10) Fitted value iteration: notebooks/RL_General_fitted_Q_iteration.ipynb
- (2023/2/12) Generalized RL (generalizing fitted Q-learning and related methods): notebooks/generalied_RL.ipynb
- (2023/2/13) Generalized RL (generalization via stochastic operators): notebooks/generalied_RL.ipynb
- (2023/2/14~17) Slides on multi-step RL: slides/RL_multi_step.pdf
- (2023/2/19) Policy gradient methods (multi-step RL): notebooks/RL_PolicyGrad_multi-step_experiment.ipynb
- (2023/2/20~21) Transformer: notebooks/NN_transformer.ipynb
- (2023/2/22) Reward Free RL: notebooks/RL_reward_free.ipynb
- (2023/2/23) Educational RL training notebook (dynamic programming in matrix form): notebooks/RL_Exercise.ipynb
- (2023/2/23) Reward Free RL (RF-EXPRESS): notebooks/RL_reward_free.ipynb
- (2023/2/24) **** : ****.ipynb
- (2023/2/24) Self-Normalized Bound for Vector Valued Martingales (in progress): notebooks/BANDIT_General_linear_improved.ipynb
- (2023/2/25) Comparing the bandit algorithms of Abbasi-Yadkori and Dani: notebooks/BANDIT_General_linear_improved.ipynb
- (2023/2/26) Theory of exploration (UCB): notebooks/BANDIT_UCB_regret_proof.ipynb
- (2023/2/27-28) Theory of exploration (UCB-VI): notebooks/RL_UCB_VI_regret_proof.ipynb
- (2023/2/28-3/3) **** : ****
- (2023/3/4) Theory of exploration (Bernstein version of UCB-VI): notebooks/RL_UCB_VI_regret_proof.ipynb
- (2023/3/4) Theory of Q-learning (UCB-H): notebooks/RL_UCB_H_regret_proof.ipynb
- (2023/3/5) On estimating transition probabilities: notebooks/RL_transition_estimation_proofs.ipynb
- (2023/3/6) **** : ****.ipynb
- (2023/3/7) Theory of task-agnostic exploration: notebooks/RL_reward_free_task_agnostic.ipynb
- (2023/3/8) Robust MDPs: notebooks/RL_robust_MDP.ipynb
- (2023/3/10) Theory of robust MDPs (model-based & generative model): notebooks/RL_robust_MDP.ipynb
- (2023/3/12) RL and linear programming: notebooks/RL_Convex_as_LP.ipynb
- (2023/3/12) Handy RL utility functions: notebooks/RL_utils.ipynb
- (2023/3/13) Slides on multi-step RL (simplified version): slides/RL_multi_step_easy.pdf
- (2023/3/13) TODO: Theory of robust MDPs (relation to regularization): notebooks/RL_robust_MDP_and_regularization.ipynb
- (2023/3/14) TODO: Imitation learning: notebooks/RL_imitation_learning.ipynb
- (2023/3/17) Maximum likelihood estimation: notebooks/PROB_maximum_likelihood.ipynb
- (2023/3/20) Convex sets: notebooks/CVX_convex_sets.ipynb
- (2023/3/21) Convex sets: notebooks/CVX_convex_sets.ipynb
- (2023/3/22) Lower bounds on the sample efficiency of RL: notebooks/RL_LowerBound_statistical_limits.ipynb
- (2023/3/20~2023/3/23): Reading: Software Estimation: Demystifying the Black Art (Japanese edition: ソフトウェア見積り 人月の暗黙知を解き明かす)
- (2023/3/24) **** : ****.ipynb
- (2023/3/25) Lower bounds on the sample efficiency of RL (linear realizable case): notebooks/RL_LowerBound_statistical_limits.ipynb
- (2023/3/27) TODO: RL and entropy regularization (in progress): notebooks/RL_entropy_regularization.ipynb
- (2023/3/28) Convex functions (conjugate functions, etc.): notebooks/CVX_convex_functions.ipynb
- (2023/3/29) Approximate Dynamic Programming: notebooks/RL_approximate_dynamic_programming.ipynb
- (2023/3/30) TODO: Approximate Dynamic Programming (with regularization): notebooks/RL_approximate_dynamic_programming.ipynb
- (2023/4/01) The minimum-volume ellipsoid problem: notebooks/CVX_minimum_volume_ellipsoids.ipynb
- (2023/4/02) Derivatives of multivariate functions: notebooks/MATH_multivariate_derivative.ipynb
- (2023/4/03) Algorithms for the minimum-volume ellipsoid problem: notebooks/CVX_MVEE_algorithm.ipynb
- (2023/4/04) The minimum-volume ellipsoid problem and core-sets: notebooks/CVX_MVEE_algorithm.ipynb
- (2023/4/05) **** : ****.ipynb
- (2023/4/06) **** : ****.ipynb
- (2023/4/07) Matrices and determinants (in progress): notebooks/LA_matrix_determinant.ipynb
- (2023/4/07) Entropy maximization and exploration (in progress): notebooks/RL_reward_free_max_ent.ipynb
- (2023/4/08) Handy RL utility functions (finite horizon): notebooks/RL_utils.ipynb
- (2023/4/09) Handy RL utility functions (for exploration): notebooks/RL_utils.ipynb
- (2023/4/09) Entropy maximization and exploration (EntGame algorithm): notebooks/RL_reward_free_max_ent.ipynb
- (2023/4/10) Convex functions (Bregman divergence, etc.): notebooks/CVX_convex_functions.ipynb
- (2023/4/11) Convex functions (projections): notebooks/CVX_convex_functions.ipynb
- (2023/4/12) Handy bandit utility functions: notebooks/BANDIT_utils.ipynb
- (2023/4/13) Adversarial bandits: notebooks/BANDIT_adversarial.ipynb
- (2023/4/14) Translation of the RL Theory Book: MDPs with features
- (2023/4/16) Fenchel's duality theorem: notebooks/CVX_convex_functions.ipynb
- (2023/4/17) RL and linear programming (policy evaluation): notebooks/RL_Convex_as_LP.ipynb
- (2023/4/19) Permutations and elementary matrices: notebooks/LA_matrix_determinant.ipynb
- (2023/4/20) Kronecker products: notebooks/LA_matrix_determinant.ipynb
- (2023/4/21) Determinants: notebooks/LA_matrix_determinant.ipynb
- (2023/4/22) Lower bounds on sample efficiency in linear MDPs: notebooks/RL_LowerBound_linearMDP.ipynb
- (2023/4/26) Adjugate matrices: notebooks/LA_matrix_determinant.ipynb
- (2023/4/27) Robust Q-learning: notebooks/RL_robust_Q_learning.ipynb
- (2023/4/28) Gershgorin circle theorem: notebooks/LA_Gershgorin_circle_theorem.ipynb
- (2023/4/30) The minimum-volume ellipsoid problem and least squares: notebooks/CVX_minimum_volume_ellipsoids.ipynb
- (2023/5/01 ~ 2023/5/07) ****.ipynb
- (2023/5/08) Adversarial linear bandits: notebooks/BANDIT_adversarial_linear.ipynb
- (2023/5/09) Added CPI slides: slides/RL_CPI.pdf
- (2023/5/09) Added CVI slides: slides/RL_CVI.pdf
- (2023/5/09) Compact sets: notebooks/MATH_compact_set.ipynb
- (2023/5/09) The Kiefer-Wolfowitz theorem: notebooks/BANDIT_Kiefer_Wolfowitz.ipynb
- (2023/5/10) Fixes to the imitation learning proofs: notebooks/RL_imitation_learning.ipynb
- (2023/5/11) Constrained MDPs (in progress): notebooks/RL_CMDP.ipynb
- (2023/5/11) Algorithms for the minimum-volume ellipsoid problem and the KY initialization: notebooks/CVX_MVEE_algorithm.ipynb
- (2023/5/14) Episodic finite-horizon RL (in progress): notebooks/RL_episodic_finite_horizon.ipynb
- (2023/5/17) Statistical learning theory (in progress): notebooks/MATH_statistical_learning_theory.ipynb
- (2023/5/18) VC dimension: notebooks/MATH_complexity_of_hypothesis.ipynb
- (2023/5/18) The Binet-Cauchy formula, etc.: notebooks/LA_matrix_determinant.ipynb
- (2023/5/19) VC dimension (continued): notebooks/MATH_complexity_of_hypothesis.ipynb
- (2023/5/19) Matrix rank: notebooks/LA_matrix_rank.ipynb
- (2023/5/21) Rademacher complexity and Talagrand's lemma: notebooks/MATH_complexity_of_hypothesis.ipynb
- (2023/5/21) Positive definite symmetric matrices: notebooks/LA_matrix_definite.ipynb
- (2023/5/22) Linear MDPs and MDVI: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
- (2023/5/24) Constrained MDPs (OptCMDP): notebooks/RL_CMDP_explore_exploit_LP.ipynb
- (2023/5/24) Schur normal form and the Rayleigh quotient: notebooks/LA_matrix_definite.ipynb
- (2023/5/25) Linear programming in the finite-horizon setting: notebooks/RL_Convex_as_LP_finite_horizon.ipynb
- (2023/5/25) Implementation of OptCMDP: notebooks/RL_CMDP_explore_exploit_LP.ipynb
- (2023/5/27) Vector spaces: notebooks/LA_vector_space.ipynb
- (2023/5/28) Metrics and inner products: notebooks/LA_vector_space.ipynb
- (2023/5/28) Normal forms: notebooks/LA_normal_form.ipynb
- (2023/5/29) PAC-Bayes: notebooks/MATH_PAC_Bayes.ipynb
- (2023/5/29) Stability and generalization error bounds: notebooks/MATH_PAC_stability.ipynb
- (2023/5/30) Implementation of the rank normal form: notebooks/LA_normal_form.ipynb
- (2023/5/31) CMDPs without constraint violation: notebooks/RL_CMDP_explore_exploit_LP.ipynb
- (2023/6/01) Integer matrices: notebooks/LA_integer_matrix.ipynb
- (2023/6/01) LP in matrix form for the finite-horizon setting: notebooks/RL_Convex_as_LP_finite_horizon.ipynb
- (2023/6/02) Solving CMDPs with dual methods: notebooks/RL_CMDP_dual.ipynb
- (2023/6/06) Robust optimization: notebooks/OPT_robust_optimization.ipynb
- (2023/6/07) LP in matrix form for the finite-horizon setting (primal problem): notebooks/RL_Convex_as_LP_finite_horizon.ipynb
- (2023/6/08) Robust MDPs as convex optimization: notebooks/RL_robust_MDP_convex..ipynb
- (2023/6/08) Constrained MDPs and strong duality: notebooks/RL_CMDP_zero_duality_gap.ipynb
- (2023/6/09) Convexity of the set of occupancy measures: notebooks/RL_occupancy_measure.ipynb
- (2023/6/11) Theory of robust MDPs (relation to regularization): notebooks/RL_robust_MDP_and_regularization.ipynb
- (2023/6/12) Note on reformulating the linear program via the contraction lemma: notebooks/RL_Convex_as_LP.ipynb
- (2023/6/13) Optimization and dual problems: notebooks/CVX_duality.ipynb
- (2023/6/14) Supplementary references on strong duality in robust MDPs: notebooks/RL_Convex_as_LP_finite_horizon and notebooks/RL_robust_MDP
- (2023/6/15) Subgradient methods: notebooks/OPT_subgradient.ipynb
- (2023/6/16) Gradient methods: notebooks/OPT_gradient.ipynb
- (2023/6/21) Robust MDPs and chance constraints: notebooks/OPT_robust_chance_constraint.ipynb
- (2023/7/01) Constrained optimization (optimality conditions): notebooks/OPT_constraint.ipynb
- (2023/7/05) Sensitivity analysis: notebooks/CVX_sensitivity_analysis.ipynb
- (2023/7/06) Rectangularity in robust MDPs: notebooks/RL_robust_rectangularity.ipynb
- (2023/7/12) Danskin's theorem: notebooks/CVX_Danskin_theorem.ipynb
- (2023/7/13) Continuity of convex functions: notebooks/CVX_convex_functions.ipynb
- (2023/7/24) Constrained MDPs and the Lagrangian (in progress): notebooks/RL_CMDP_Lagrange.ipynb
- (2023/7/25) Constrained MDPs and the Lagrangian: notebooks/RL_CMDP_Lagrange.ipynb
- (2023/7/29) Federated Q-learning: notebooks/RL_federated_Q.ipynb
- (2023/7/30) PAC-Bayes control: notebooks/CONTROL_PAC_Bayes.ipynb
- (2023/8/04) Constrained MDPs and strong duality (revised): notebooks/RL_CMDP_zero_duality_gap.ipynb
- (2023/8/07) Robust MDPs and strong duality: notebooks/RL_robust_MDP_zero_duality.ipynb
- (2023/8/11) Non-convexity of the value function: notebooks/RL_value_function.ipynb
- (2023/8/15) PAC-Bayes and meta-learning: notebooks/MATH_PAC_Bayes_Meta_Learning.ipynb
- (2023/8/16) Various minimax duality theorems and their proofs: notebooks/MATH_minimax_theorems.ipynb
- (2023/8/17) On Fan's minimax theorem: notebooks/MATH_minimax_theorems.ipynb
- (2023/8/20) Necessary and sufficient conditions for minimax duality: notebooks/MATH_minimax_conditions.ipynb
- (2023/8/21) Barrier function methods: notebooks/MATH_barrier_function_method.ipynb
- (2023/8/24) Robust MDPs and NP-hardness: notebooks/RL_robust_MDP_NP_hard.ipynb
- (2023/8/25) Policy search and NP-hardness: notebooks/RL_Policy_search_NP_hard.ipynb
- (2023/8/26) Addendum on the non-convexity of the value function: notebooks/RL_value_function.ipynb
- (2023/8/28) Multi-task imitation learning and representation learning: notebooks/RL_multi_task_imitation_learning.ipynb
- (2023/8/30) Safe exploration in MDPs: notebooks/RL_CMDP_safe_exploration.ipynb
- (2023/9/01) Dynamic programming for CMDPs: notebooks/RL_CMDP_by_DP.ipynb
- (2023/9/03) Dynamic programming for CMDPs (non-stationary): notebooks/RL_CMDP_by_DP_non_stationary.ipynb
- (2023/9/04) Fixes to dynamic programming for CMDPs (non-stationary): notebooks/RL_CMDP_by_DP_non_stationary.ipynb
- (2023/9/08) Constrained MDPs and strong duality (entropy): notebooks/RL_CMDP_zero_duality_gap_entropy.ipynb
- (2023/9/15) Integer programming and NP-hardness: notebooks/MATH_integer_programming_is_NP_hard.ipynb
- (2023/9/16) Finding a feasible solution of a CMDP is NP-hard: notebooks/RL_CMDP_feasibility_NP_hard.ipynb
- (2023/9/17) Finding a feasible solution of a CMDP is NP-hard (updated): notebooks/RL_CMDP_feasibility_NP_hard.ipynb
- (2023/9/24) LQR: notebooks/RL_LQR.ipynb
- (2023/9/25) LQR and semidefinite programming: notebooks/RL_LQR_as_SDP.ipynb
- (2023/9/25) Constrained LQR: notebooks/RL_LQR_safe.ipynb
- (2023/9/28) Mean-variance MDPs and NP-hardness: notebooks/RL_mean_variance_MDP_NP_hard.ipynb
- (2023/9/28) The mathematics of domain randomization (in progress): notebooks/RL_multi_task_domain_randomization.ipynb
- (2023/9/29) On convex MDPs: notebooks/RL_Convex_MDP.ipynb
- (2023/10/1) Policy gradient methods for LQR: notebooks/RL_LQR_policy_gradient.ipynb
- (2023/10/1) System Level Synthesis for linear systems: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/1) Path tracking with LQR (abandoned partway): notebooks/RL_LQR_path_tracking.ipynb
- (2023/10/2) Dynamics estimation and robust control in LQR: notebooks/RL_LQR_estimation_and_robustness.ipynb
- (2023/10/3) Contextual MDPs: notebooks/RL_multi_task_contextual_MDP.ipynb
- (2023/10/4) Multi-task RL: notebooks/RL_multi_task.ipynb
- (2023/10/4) Latent MDP: notebooks/RL_multi_task_latent_MDP.ipynb
- (2023/10/4) Context estimation in multi-task MDPs: notebooks/RL_multi_task_context_identification.ipynb
- (2023/10/8) Handy LQR utility functions: notebooks/RL_LQR_utils.ipynb
- (2023/10/9) On deriving System Level Synthesis: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/10) Robust System Level Synthesis: notebooks/RL_LQR_robust_synthesis.ipynb
- (2023/10/13) Explanation of System Level Synthesis: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/16) H∞ control via SLS: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/16) Robust control via SLS: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/16) Constrained control under noise via SLS: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/18) The intractability of generalization in RL: notebooks/RL_multi_task_generalization_intractable.ipynb
- (2023/10/18) Imitation learning from observation only: notebooks/RL_imitation_from_observation.ipynb
- (2023/10/18) Improving sample efficiency through reward shaping: notebooks/RL_reward_shaping.ipynb
- (2023/10/19) Average-reward RL: notebooks/RL_AverageReward.ipynb
- (2023/10/20) RL and Lipschitz continuity: notebooks/RL_Continuous_Lipschitz.ipynb
- (2023/10/24) Theory of RLHF: notebooks/RL_RLHF.ipynb
- (2023/10/25) Offline RL with multiple data sources: notebooks/RL_offline_perturbed_data.ipynb
- (2023/10/26) Data-driven robust optimization (questionable): notebooks/OPT_robust_learning_uncertainty_set_deprecated.ipynb
- (2023/10/29) Data-driven robust optimization: notebooks/OPT_robust_learning_based.ipynb
- (2023/11/1) Kernel-based RL: notebooks/RL_Continuous_Kernel.ipynb
- (2023/11/3) Bellman rank: notebooks/RL_General_Bellman_rank.ipynb
- (2023/11/3) Model-free average-reward RL: notebooks/RL_AverageReward_model_free.ipynb
- (2023/11/3) Bilinear class: notebooks/RL_Bilinear_class.ipynb
- (2023/11/4) CMDPs without constraint violation: notebooks/RL_CMDP_zero_constraint_violation.ipynb
- (2023/11/5) Witness rank: notebooks/RL_witness_rank.ipynb
- (2023/11/6) RL, Fenchel-Rockafellar duality, and DICE: notebooks/RL_Convex_Fenchel_Duality_and_DICE.ipynb
- (2023/11/7) Proof for CMDPs without constraint violation, continued (still in progress): notebooks/RL_CMDP_zero_constraint_violation.ipynb
- (2023/11/8) Useful theorems for RL proofs: notebooks/RL_useful_lemma.ipynb
- (2023/11/8) Fixes to the proof of exploration in CMDPs: notebooks/RL_CMDP_explore_exploit_LP.ipynb
- (2023/11/9) Regret of dual methods in CMDPs: notebooks/RL_CMDP_explore_exploit_dual.ipynb
- (2023/11/10) Regret analysis of kernel-based RL (in progress): notebooks/RL_Continuous_Kernel.ipynb
- (2023/11/16) Uniform PAC: notebooks/RL_uniform_PAC.ipynb
- (2023/11/19) Regret of primal-dual methods in CMDPs: notebooks/RL_CMDP_explore_exploit_primal_dual.ipynb
- (2023/11/19) Last-iterate convergence in CMDPs: notebooks/RL_CMDP_last_iterate_convergence.ipynb
- (2023/11/20) Uniform-PAC algorithms (continued): notebooks/RL_uniform_PAC.ipynb
- (2023/11/23) Tail probabilities and sub-Gaussianity: notebooks/PROB_sub_gaussian.ipynb
- (2023/11/23) On uniform concentration inequalities: notebooks/MATH_uniform_concentration_inequality.ipynb (too hard; I've barely gotten through it. TODO)
- (2023/11/24) Proof of uniform PAC: notebooks/RL_uniform_PAC.ipynb
- (2023/11/25) Supplement to the proof of last-iterate convergence: notebooks/RL_CMDP_last_iterate_convergence.ipynb
- (2023/11/28) An algorithm for CMDPs (DOPE): notebooks/RL_CMDP_DOPE.ipynb
- (2023/12/01) Generalization of the Bilinear Class: notebooks/RL_Extended_Bilinear_class.ipynb
- (2023/12/05) Implementation of primal-dual methods for CMDPs: notebooks/RL_CMDP_explore_exploit_primal_dual.ipynb
- (2023/12/29) Fixes to statistical learning theory: notebooks/MATH_statistical_learning_theory.ipynb
- (2023/12/30) Statistical learning theory in the realizable setting with an infinite hypothesis class: notebooks/MATH_statistical_learning_theory.ipynb
- (2024/01/11) Bandits and the Eluder dimension: notebooks/BANDIT_General_Eluder_dimension.ipynb
- (2024/01/12) RL and the Eluder dimension: notebooks/RL_General_Eluder_dimension.ipynb
- (2024/01/25) Lagrangian methods and regularization: notebooks/RL_CMDP_Lagrange_regularization.ipynb
- (2024/01/29) Model-based Eluder dimension: notebooks/RL_General_Eluder_dimension.ipynb
- (2024/01/31) Bellman Eluder dimension: notebooks/RL_General_Eluder_dimension.ipynb
- (2024/02/01) Proof of strong duality: notebooks/CVX_duality.ipynb
- (2024/02/02) Notes on Slater's condition and Sion's minimax theorem: notebooks/CVX_duality.ipynb
- (2024/02/03) Decision Estimation Coefficient: notebooks/RL_General_DEC.ipynb
- (2024/02/04) The E2D algorithm: notebooks/RL_General_DEC.ipynb
- (2024/02/07) Fixes to bandits and the Eluder dimension: notebooks/BANDIT_General_Eluder_dimension.ipynb
- (2024/02/10) Introductory slides on dynamic programming: slides/RL_DP_introduction.ipynb
- (2024/02/10) Introductory notebook on dynamic programming: notebooks/RL_Exercise_DP.ipynb
- (2024/02/13) RL and the ABC class: notebooks/RL_General_ABC_class.ipynb
- (2024/02/14) Fixes to the ABC class proofs: notebooks/RL_General_ABC_class.ipynb
- (2024/02/15) The GOLF algorithm and its proof: notebooks/RL_General_Eluder_dimension.ipynb
- (2024/02/21) Average reward and the TUCRL algorithm (in progress): notebooks/RL_AverageReward_TUCRL.ipynb
- (2024/02/21) Average reward and the UCRL2 algorithm (in progress): notebooks/RL_AverageReward_UCRL2.ipynb
- (2024/03/1) Bellman Eluder dimension proof (continued): notebooks/RL_General_Eluder_dimension.ipynb
- (2024/03/2) Lower bounds on the sample efficiency of RL (revised): notebooks/RL_LowerBound_statistical_limits.ipynb
- (2024/03/6) Translation of Foundation of RL: notebooks/RL_General_Foundation_of_RL.ipynb
- (2024/03/7) Translation of Foundation of RL: notebooks/RL_General_Foundation_of_RL.ipynb
- (2024/03/10) Average-reward RL: notebooks/RL_AverageReward.ipynb
- (2024/03/11) Low Inherent Bellman Error: notebooks/RL_General_linear_Bellman_completeness.ipynb
- (2024/03/13) Supplementary examples for Bellman rank and the Bilinear class
- (2024/03/14) Markov chain terminology: notebooks/RL_AverageReward.ipynb
- (2024/03/15) Proof of the relation between Bellman completeness and $\pi$-realizability: notebooks/RL_General_linear_Bellman_completeness.ipynb
- (2024/03/17) Various notions of safety in safe RL: notebooks/RL_CMDP_general_safe_problem.ipynb
- (2024/03/19) Addendum on model-free lower bounds: notebooks/RL_General_witness_rank.ipynb
- (2024/03/20) Proof of model-free lower bounds: notebooks/RL_General_witness_rank.ipynb
- (2024/03/20) Realizability in model-based, policy-based, and value-based settings (in progress): notebooks/RL_Model_based_vs_Model_free.ipynb
- (2024/04/01) Addendum to various notions of safety in safe RL: notebooks/RL_CMDP_general_safe_problem.ipynb
- (2024/04/05) Regularization and optimization in RL: notebooks/RL_Convex_regularized_optimization.ipynb
- (2024/04/11) Normed linear spaces: notebooks/LA_normed_linear_space.ipynb
- (2024/04/12) Gauss-Seidel value iteration: notebooks/RL_Gauss_Seidel_VI.ipynb
- (2024/04/12) Occupancy measures and non-Markovian policies: notebooks/RL_occupancy_measure.ipynb
- (2024/04/16) Addendum on V-Bellman rank: notebooks/RL_General_Bellman_rank.ipynb
- (2024/05/01) Value iteration with unbounded rewards or states (in progress): notebooks/RL_value_unbounded.ipynb
- (2024/05/04) The R-contamination model is equivalent to lowering the discount factor: notebooks/RL_robust_R_contamination.ipynb
- (2024/05/04) Policy gradients in SA-rectangular robust MDPs (in progress): notebooks/RL_robust_sa_gradient.ipynb
- (2024/05/06) TODO: Reorganize RL_useful_lemma
- (2024/05/07) Mirror descent and the simulation lemma: notebooks/RL_useful_lemma/Mirror_descent.ipynb and notebooks/RL_useful_lemma/RL_simulation_lemma.ipynb
- (2024/05/10) A better bound for mirror descent: notebooks/RL_useful_lemma/Mirror_descent.ipynb
- (2024/05/13) Convergence of soft NPG: notebooks/RL_PolicyGrad_convergence_rate.ipynb
- (2024/05/14) Natural policy gradient in robust MDPs: notebooks/RL_robust_sa_gradient.ipynb
- (2024/05/22) Supplement to "The R-contamination model is equivalent to lowering the discount factor": notebooks/RL_robust_R_contamination.ipynb
- (2024/05/23) Convergence of policy gradient methods: notebooks/RL_PolicyGrad_convergence_rate.ipynb and notebooks/RL_useful_lemma/Optimization_standard.ipynb
- (2024/05/24) Total derivatives and Fréchet derivatives: notebooks/OPT_gradient.ipynb
- (2024/05/26) Penalty function methods: notebooks/OPT_constraint.ipynb
- (2024/05/26) Convergence of robust policy gradient methods: notebooks/RL_robust_policy_gradient.ipynb
- (2024/05/28) Rademacher complexity and hypothesis classes: notebooks/MATH_Generalization_infinite_hypothesis_Rademacher.ipynb
- (2024/05/28) Rademacher complexity of regularized hypothesis classes: notebooks/MATH_Generalization_L2_L1_regularization.ipynb
- (2024/05/31) Improved convergence of robust policy gradient methods: notebooks/RL_robust_policy_gradient.ipynb
- (2024/05/31) Weak convexity and the Moreau envelope: notebooks/CVX_weakly_convex_and_Moreau_envelope.ipynb
- (2024/06/01) Improved convergence of robust policy gradient methods: notebooks/RL_robust_policy_gradient.ipynb
- (2024/06/03) Generalization error of SVMs: notebooks/MATH_Generalization_margin_loss_and_SVM.ipynb
- (2024/06/12) Generalization error of multi-class classification: notebooks/MATH_Generalization_multi_class_classification.ipynb
- (2024/06/16) Derivation of policy gradient methods: notebooks/RL_PolicyGrad_convergence_rate.ipynb
- (2024/06/18) Gradient boosting: notebooks/MATH_Generalization_Boosting.ipynb
- (2024/06/25) Generalization error analysis of neural networks: notebooks/MATH_Generalization_NN.ipynb
- (2024/07/02) Topological spaces, open sets, etc.: notebooks/MATH_FA_basic.ipynb
- (2024/07/10) Topological linear spaces: notebooks/MATH_FA_linear_topological_space.ipynb
- (2024/07/20) (In progress) Proximal gradient methods: notebooks/OPT_smooth_proximal_gradient.ipynb
- (2024/07/28) (In progress) The law of large numbers and the central limit theorem: notebooks/PROB_low_of_large_numbers.ipynb
- (2024/08/04) (In progress) Generalized gradients: notebooks/OPT_generalized_gradient.ipynb
- (2024/08/11) The mixed-policy approach in CMDPs: notebooks/RL_CMDP_mixed_policy.ipynb
- (2024/08/15) Optimal solutions of RMDPs: notebooks/RL_robust_MDP_stationary_and_randomized.ipynb
- (2024/08/19) Softmax policy gradients: notebooks/RL_PolicyGrad_softmax_convergence.ipynb
- (2024/08/20) Stochastic policy gradients: notebooks/RL_PolicyGrad_baseline.ipynb
- (2024/08/22) Switching costs in RL: notebooks/RL_switching_cost_model-free.ipynb
- (2024/08/30) Notes on Puterman's book: notebooks/RL_Puterman_memo.ipynb
- (2024/09/2) Limits of matrices (in progress): notebooks/RL_Puterman_memo.ipynb
- (2024/09/5) Average-reward RL in the ergodic case: notebooks/RL_AverageReward_model_free.ipynb
- (2024/09/6) Average-reward RL in the weakly communicating case: notebooks/RL_AverageReward_model_free.ipynb
- (2024/09/8) Average-reward RL and the bias: notebooks/RL_AverageReward.ipynb
- (2024/09/9) Supplement to the TUCRL algorithm: notebooks/RL_AverageReward_TUCRL.ipynb
- (2024/09/13) Implementation of the KL uncertainty set: notebooks/RL_robust_MDP.ipynb
- (2024/10/10) Regret bounds for RL with switching costs: notebooks/RL_switching_cost_model-free.ipynb
- (2024/10/12) Iteration complexity of value iteration: notebooks/RL_VI_vs_PI.ipynb
- (2024/10/13) Regret analysis of model-based linear MDPs: notebooks/RL_General_linearMDP-Model-based.ipynb
- (2024/10/14) Bounds on matrix products and norms: notebooks/RL_General_linearMDP-Model-based.ipynb
- (2024/11/19) Occupancy-based policy gradient methods: notebooks/RL_PolicyGrad_occupancy.ipynb
- (2024/11/20) Moments and moment-generating functions: notebooks/PROB_moment.ipynb
- (2024/11/23) Distributionally robust optimization: notebooks/OPT_robust_distributionally.ipynb
- (2024/11/26) Switching costs in linear MDPs: notebooks/RL_switching_cost_linear-MDP.ipynb
- (2024/11/27) Proof of the bound on the number of switches: notebooks/RL_switching_cost_linear-MDP.ipynb
- (2024/11/28) Average-reward policy gradients (in progress): notebooks/RLRL_AverageReward_policy-grad.ipynb
- (2023/4/07) Matrices and determinants: notebooks/LA_matrix_determinant.ipynb
- (2023/4/28) Gershgorin circle theorem: notebooks/LA_Gershgorin_circle_theorem.ipynb
- (2023/5/21) Positive definite symmetric matrices: notebooks/LA_matrix_definite.ipynb
- (2023/5/24) Schur normal form and the Rayleigh quotient: notebooks/LA_matrix_definite.ipynb
- (2023/5/27) Vector spaces: notebooks/LA_vector_space.ipynb
- (2023/5/28) Normal forms: notebooks/LA_normal_form.ipynb
- (2023/6/01) Integer matrices: notebooks/LA_integer_matrix.ipynb
- (2024/04/11) Normed linear spaces: notebooks/LA_normed_linear_space.ipynb
- (2024/07/02) Topological spaces, open sets, etc.: notebooks/MATH_FA_basic.ipynb
- (2024/07/10) Topological linear spaces: notebooks/MATH_FA_linear_topological_space.ipynb
- (2023/1/21) Introduction to measure-theoretic probability: notebooks/PROB_measure_theoretic_probability.ipynb
- (2023/1/26) Stochastic processes: notebooks/PROB_probability_process.ipynb
- (2023/1/28) Lebesgue integration: notebooks/PROB_lebesgue_integral.ipynb
- (2023/1/28) Limit superior and limit inferior: notebooks/PROB_liminf_limsup.ipynb
- (2023/1/29) Extreme value distributions: notebooks/PROB_extreme_value_distribution.ipynb
- (2023/1/29) Characteristic functions: notebooks/PROB_characteristic_function.ipynb
- (2023/1/29) Stochastic integration (TODO): notebooks/PROB_stochastic_integration.ipynb
- (2023/1/31) Itô integral & experiments with stochastic differential equations: notebooks/PROB_stochastic_integration.ipynb
- (2023/2/1) Girsanov's theorem: notebooks/PROB_stochastic_integration.ipynb
- (2023/2/4) Gaussian process regression: notebooks/PROB_gp_regression.ipynb
- (2023/3/17) Maximum likelihood estimation: notebooks/PROB_maximum_likelihood.ipynb
- (2024/11/20) Moments and moment-generating functions: notebooks/PROB_moment.ipynb
- (2023/3/20~21) Convex sets: notebooks/CVX_convex_sets.ipynb
- (2023/3/28) Convex functions: notebooks/CVX_convex_functions.ipynb
- (2023/4/01) The minimum-volume ellipsoid problem: notebooks/CVX_minimum_volume_ellipsoids.ipynb
- (2023/4/03) Algorithms for the minimum-volume ellipsoid problem: notebooks/CVX_MVEE_algorithm.ipynb
- (2023/6/13) Optimization and dual problems: notebooks/CVX_duality.ipynb
- (2023/7/05) Sensitivity analysis: notebooks/CVX_sensitivity_analysis.ipynb
- (2023/7/12) Danskin's theorem: notebooks/CVX_Danskin_theorem.ipynb
- (2024/05/31) Weak convexity and the Moreau envelope: notebooks/CVX_weakly_convex_and_Moreau_envelope.ipynb
- (2023/6/06) Robust optimization: notebooks/OPT_robust_optimization.ipynb
- (2023/6/15) Subgradient methods: notebooks/OPT_subgradient.ipynb
- (2023/6/16) Gradient methods: notebooks/OPT_gradient.ipynb
- (2023/6/21) Robust MDPs and chance constraints: notebooks/OPT_robust_chance_constraint.ipynb
- (2023/10/29) Data-driven robust optimization: notebooks/OPT_robust_learning_based.ipynb
- (2024/05/23) Theorems on gradient descent, etc.: notebooks/RL_useful_lemma/Optimization_standard.ipynb
- Bandits:
- (2023/1/29) Basics of bandit algorithms: notebooks/BANDIT_basics.ipynb
- (2023/2/24) Self-Normalized Bound for Vector Valued Martingales: notebooks/BANDIT_General_linear_improved.ipynb
- (2023/2/8) Contextual bandits: notebooks/BANDIT_General_contextual.ipynb
- (2023/2/26) Theory of exploration (UCB): notebooks/BANDIT_UCB_regret_proof.ipynb
- (2023/4/13) Adversarial bandits: notebooks/BANDIT_adversarial.ipynb
- (2023/4/12) Handy bandit utility functions: notebooks/BANDIT_utils.ipynb
- (2023/5/08) Adversarial linear bandits: notebooks/BANDIT_adversarial_linear.ipynb
- (2023/5/09) The Kiefer-Wolfowitz theorem: notebooks/BANDIT_Kiefer_Wolfowitz.ipynb
- (2024/01/11) Bandits and the Eluder dimension: notebooks/BANDIT_General_Eluder_dimension.ipynb
- Linear MDP:
- (2023/1/18) Linear MDP: notebooks/RL_General_linearMDP.ipynb
- (2023/4/22) Lower bounds on sample efficiency in linear MDPs: notebooks/RL_LowerBound_linearMDP.ipynb
- Reward Free RL:
- (2023/2/22) Reward Free RL: notebooks/RL_reward_free.ipynb
- (2023/3/7) Theory of task-agnostic exploration: notebooks/RL_reward_free_task_agnostic.ipynb
- (2023/4/09) Entropy maximization and exploration (EntGame algorithm): notebooks/RL_reward_free_max_ent.ipynb
- Robust MDPs:
- (2023/3/8) Robust MDPs: notebooks/RL_robust_MDP.ipynb
- (2023/3/10) Theory of robust MDPs (model-based & generative model): notebooks/RL_robust_MDP.ipynb
- (2023/3/13) Theory of robust MDPs (relation to regularization): notebooks/RL_robust_MDP_and_regularization.ipynb
- (2023/4/27) Robust Q-learning: notebooks/RL_robust_Q_learning.ipynb
- (2023/6/08) Robust MDPs as convex optimization: notebooks/RL_robust_MDP_convex..ipynb
- (2023/8/07) Robust MDPs and strong duality: notebooks/RL_robust_MDP_zero_duality.ipynb
- (2023/8/24) Robust MDPs and NP-hardness: notebooks/RL_robust_MDP_NP_hard.ipynb
- (2024/05/04) The R-contamination model is equivalent to lowering the discount factor: notebooks/RL_robust_R_contamination.ipynb
- (2024/05/14) Natural policy gradient in robust MDPs: notebooks/RL_robust_sa_gradient.ipynb
- (2024/05/26) Convergence of robust policy gradient methods: notebooks/RL_robust_policy_gradient.ipynb
- Constrained MDPs:
- (2023/5/24) Constrained MDPs (OptCMDP): notebooks/RL_CMDP_explore_exploit_LP.ipynb
- (2023/6/02) Solving CMDPs with dual methods: notebooks/RL_CMDP_dual.ipynb
- (2023/6/08) Constrained MDPs and strong duality: notebooks/RL_CMDP_zero_duality_gap.ipynb
- (2023/8/30) Safe exploration in MDPs: notebooks/RL_CMDP_safe_exploration.ipynb
- (2023/9/01) Dynamic programming for CMDPs: notebooks/RL_CMDP_by_DP.ipynb
- (2023/9/03) Dynamic programming for CMDPs (non-stationary): notebooks/RL_CMDP_by_DP_non_stationary.ipynb
- (2023/9/08) Constrained MDPs and strong duality (entropy): notebooks/RL_CMDP_zero_duality_gap_entropy.ipynb
- (2023/9/16) Finding a feasible solution of a CMDP is NP-hard: notebooks/RL_CMDP_feasibility_NP_hard.ipynb
- (2023/11/4) CMDPs without constraint violation: notebooks/RL_CMDP_zero_constraint_violation.ipynb
- (2023/11/9) Regret of dual methods in CMDPs: notebooks/RL_CMDP_explore_exploit_dual.ipynb
- (2023/11/19) Regret of primal-dual methods in CMDPs: notebooks/RL_CMDP_explore_exploit_primal_dual.ipynb
- (2023/11/19) Last-iterate convergence in CMDPs: notebooks/RL_CMDP_last_iterate_convergence.ipynb
- (2023/11/28) An algorithm for CMDPs (DOPE): notebooks/RL_CMDP_DOPE.ipynb
- (2024/01/25) Lagrangian methods and regularization: notebooks/RL_CMDP_Lagrange_regularization.ipynb
- (2024/03/17) Various notions of safety in safe RL: notebooks/RL_CMDP_general_safe_problem.ipynb
- Average-reward RL:
- (2023/10/19) Average-reward RL: notebooks/RL_AverageReward.ipynb
- (2023/11/3) Model-free average-reward RL: notebooks/RL_AverageReward_model_free.ipynb
- (2024/02/21) Average reward and the TUCRL algorithm: notebooks/RL_AverageReward_TUCRL.ipynb
- Federated RL:
- (2023/7/29) Federated Q-learning: notebooks/RL_federated_Q.ipynb
- Multi-step RL:
- (2023/2/5) Multi-step RL: notebooks/RL_multi_step.ipynb
- (2023/2/14~17) Slides on multi-step RL: slides/RL_multi_step.pdf
- (2023/2/19) Policy gradient methods (multi-step RL): notebooks/RL_PolicyGrad_multi-step_experiment.ipynb
- Imitation learning:
- (2023/3/14) Imitation learning: notebooks/RL_imitation_learning.ipynb
- (2023/8/28) Multi-task imitation learning and representation learning: notebooks/RL_multi_task_imitation_learning.ipynb
- (2023/10/18) Imitation learning from observation only: notebooks/RL_imitation_from_observation.ipynb
- Offline RL:
- (2023/10/25) Offline RL with multiple data sources: notebooks/RL_offline_perturbed_data.ipynb
- Mean-Variance MDP:
- (2023/9/28) Mean-variance MDPs and NP-hardness: notebooks/RL_mean_variance_MDP_NP_hard.ipynb
- Continuous MDPs:
- (2023/10/20) RL and Lipschitz continuity: notebooks/RL_Continuous_Lipschitz.ipynb
- (2023/11/1) Kernel-based RL: notebooks/RL_Continuous_Kernel.ipynb
- General RL:
- (2023/2/7) About MDPs (Garnet MDPs): notebooks/RL_Markov_Decision_Process.ipynb
- (2023/2/27-28) Theory of UCB-VI (model-based): notebooks/RL_UCB_VI_regret_proof.ipynb
- (2023/3/4) Theory of UCB-Hoeffding (model-free): notebooks/RL_UCB_H_regret_proof.ipynb
- (2023/3/5) On estimating transition probabilities: notebooks/RL_transition_estimation_proofs.ipynb
- (2023/2/10) Fitted value iteration: notebooks/RL_General_fitted_Q_iteration.ipynb
- (2023/3/12) RL and linear programming: notebooks/RL_Convex_as_LP.ipynb
- (2023/2/12) Generalized RL: notebooks/generalied_RL.ipynb
- (2023/3/22) Lower bounds on the sample efficiency of RL: notebooks/RL_LowerBound_statistical_limits.ipynb
- (2023/3/29) Approximate Dynamic Programming: notebooks/RL_approximate_dynamic_programming.ipynb
- (2023/5/09) CPI slides: slides/RL_CPI.pdf
- (2023/5/09) CVI slides: slides/RL_CVI.pdf
- (2023/5/11) Constrained MDPs: notebooks/RL_CMDP.ipynb
- (2023/5/14) Episodic finite-horizon RL (in progress): notebooks/RL_episodic_finite_horizon.ipynb
- (2023/5/25) Linear programming in the finite-horizon setting: notebooks/RL_Convex_as_LP_finite_horizon.ipynb
- (2023/6/09) Convexity of the set of occupancy measures: notebooks/RL_occupancy_measure.ipynb
- (2023/8/11) Non-convexity of the value function: notebooks/RL_value_function.ipynb
- (2023/8/25) Policy search and NP-hardness: notebooks/RL_Policy_search_NP_hard.ipynb
- (2023/9/29) On convex MDPs: notebooks/RL_Convex_MDP.ipynb
- (2023/10/18) Improving sample efficiency through reward shaping: notebooks/RL_reward_shaping.ipynb
- (2023/10/24) Theory of RLHF: notebooks/RL_RLHF.ipynb
- (2023/11/3) Bellman rank: notebooks/RL_General_Bellman_rank.ipynb
- (2023/11/3) Bilinear class: notebooks/RL_Bilinear_class.ipynb
- (2023/11/5) Witness rank: notebooks/RL_witness_rank.ipynb
- (2023/11/6) RL, Fenchel-Rockafellar duality, and DICE: notebooks/RL_Convex_Fenchel_Duality_and_DICE.ipynb
- (2023/11/8) Useful theorems for RL proofs: notebooks/RL_useful_lemma.ipynb
- (2023/11/16) Uniform PAC: notebooks/RL_uniform_PAC.ipynb
- (2023/12/01) Generalization of the Bilinear Class: notebooks/RL_Extended_Bilinear_class.ipynb
- (2024/01/12) RL and the Eluder dimension: notebooks/RL_General_Eluder_dimension.ipynb
- (2024/02/03) Decision Estimation Coefficient: notebooks/RL_General_DEC.ipynb
- (2024/02/13) RL and the ABC class: notebooks/RL_General_ABC_class.ipynb
- (2024/03/6) Translation of Foundation of RL: notebooks/RL_General_Foundation_of_RL.ipynb
- (2024/03/11) Low Inherent Bellman Error: notebooks/RL_General_linear_Bellman_completeness.ipynb
- (2024/04/12) Gauss-Seidel value iteration: notebooks/RL_Gauss_Seidel_VI.ipynb
- (2024/05/13) Convergence of soft NPG: notebooks/RL_PolicyGrad_convergence_rate.ipynb
- (2024/05/23) Convergence of policy gradient methods: notebooks/RL_PolicyGrad_convergence_rate.ipynb
- (2024/06/16) Derivation of policy gradient methods: notebooks/RL_PolicyGrad_convergence_rate.ipynb
- Multi-task:
- (2023/10/3) Contextual MDPs: notebooks/RL_multi_task_contextual_MDP.ipynb
- (2023/9/28) The mathematics of domain randomization (in progress): notebooks/RL_multi_task_domain_randomization.ipynb
- (2023/8/28) Multi-task imitation learning and representation learning: notebooks/RL_multi_task_imitation_learning.ipynb
- (2023/10/4) Multi-task RL: notebooks/RL_multi_task.ipynb
- (2023/10/4) Latent MDP: notebooks/RL_multi_task_latent_MDP.ipynb
- (2023/10/4) Context estimation in multi-task MDPs: notebooks/RL_multi_task_context_identification.ipynb
- (2023/10/18) The intractability of generalization in RL: notebooks/RL_multi_task_generalization_intractable.ipynb
- LQR:
- (2023/9/24) LQR (finite horizon): notebooks/RL_LQR.ipynb
- (2023/9/25) LQR and semidefinite programming: notebooks/RL_LQR_as_SDP.ipynb
- (2023/9/25) Constrained LQR: notebooks/RL_LQR_safe.ipynb
- (2023/10/1) Policy gradient methods for LQR: notebooks/RL_LQR_policy_gradient.ipynb
- (2023/10/1) System Level Synthesis for linear systems: notebooks/RL_LQR_SLS_finite_horizon.ipynb
- (2023/10/1) Path tracking with LQR (abandoned partway): notebooks/RL_LQR_path_tracking.ipynb
- (2023/10/2) Dynamics estimation and robust control in LQR: notebooks/RL_LQR_estimation_and_robustness.ipynb
- (2023/10/8) Handy LQR utility functions: notebooks/RL_LQR_utils.ipynb
- (2023/10/10) Robust System Level Synthesis: notebooks/RL_LQR_robust_synthesis.ipynb
- General control:
- (2023/7/30) PAC-Bayes control: notebooks/CONTROL_PAC_Bayes.ipynb
- Educational:
- (2023/2/23) Educational RL notebook (dynamic programming in matrix form): notebooks/RL_Exercise.ipynb
- (2023/3/12) Handy RL utility functions: notebooks/RL_utils.ipynb
- (2023/4/12) Handy bandit utility functions: notebooks/BANDIT_utils.ipynb
- (2024/02/10) Introductory slides on dynamic programming: slides/RL_DP_introduction.ipynb
- (2024/02/10) Introductory notebook on dynamic programming: notebooks/RL_Exercise_DP.ipynb
- (2023/2/20~21) Transformer: notebooks/NN_transformer.ipynb
- (2023/1/30) Taylor's theorem: notebooks/MATH_taylor_theorem.ipynb
- (2023/5/09) Compact sets: notebooks/MATH_compact_set.ipynb
- (2023/5/17) Statistical learning theory: notebooks/MATH_statistical_learning_theory.ipynb
- (2023/5/18) Complexity of hypothesis classes: notebooks/MATH_complexity_of_hypothesis.ipynb
- (2023/5/29) PAC-Bayes: notebooks/MATH_PAC_Bayes.ipynb
- (2023/5/29) Stability and generalization error bounds: notebooks/MATH_PAC_stability.ipynb
- (2023/8/15) PAC-Bayes and meta-learning: notebooks/MATH_PAC_Bayes_Meta_Learning.ipynb
- (2023/8/16) Various minimax duality theorems and their proofs: notebooks/MATH_minimax_theorems.ipynb
- (2023/8/17) On Fan's minimax theorem: notebooks/MATH_minimax_theorems.ipynb
- (2023/8/20) Necessary and sufficient conditions for minimax duality: notebooks/MATH_minimax_conditions.ipynb
- (2023/8/21) Barrier function methods: notebooks/MATH_barrier_function_method.ipynb
- (2023/9/15) Integer programming and NP-hardness: notebooks/MATH_integer_programming_is_NP_hard.ipynb
- (2023/11/23) Tail probabilities and sub-Gaussianity: notebooks/PROB_sub_gaussian.ipynb
- (2023/11/23) On uniform concentration inequalities: notebooks/MATH_uniform_concentration_inequality.ipynb (too hard; I've barely gotten through it. TODO)
- (2024/06/03) Generalization error of SVMs: notebooks/MATH_Generalization_margin_loss_and_SVM.ipynb
- (2024/06/18) Gradient boosting: notebooks/MATH_Generalization_Boosting.ipynb
- Convergence in probability
- To read on Integral Probability Metrics: On the empirical estimation of integral probability metrics
- (2023/3/27) RL and entropy regularization: notebooks/RL_entropy_regularization.ipynb
- (2023/3/30) Approximate Dynamic Programming (with regularization): notebooks/RL_approximate_dynamic_programming.ipynb
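- (2023/2/27-28) Theory of exploration (UCB-VI): notebooks/RL_UCB_VI_regret_proof.ipynb
- If the UCB bonus is too large, the bonus term ends up dominating action selection. Is there any theory about this?
- I don't understand why Hölder's inequality is used partway through the proof.
- I still don't really understand why we take a union bound over $f$ with the $H^S$ factor... Wouldn't simply applying Hoeffding be enough?
- This is because $\widehat{P}_h^k$ and $\widehat{V}_{h+1}^{\pi^k}$ are not independent. Bounding $\widehat{P}_h^k$ against $P_h^\star$ directly introduces an extra factor of $\sqrt{S}$.
- Around Eq. (44) of Near-optimal Regret Bounds for Reinforcement Learning may be a useful reference. That said, I suspect the bound also follows from Hoeffding plus a sum over $S$.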
- (2023/3/8) Robust MDPs: notebooks/RL_robust_MDP.ipynb
- How should we evaluate whether robust RL is actually working?
- (2023/3/14) Imitation learning: notebooks/RL_imitation_learning.ipynb
- I couldn't prove $\mathbb{E}_{s \sim d^{\pi^{\star}}}\left\|\widehat{\pi}(\cdot \mid s)-\pi^{\star}(\cdot \mid s)\right\|_{TV}^2 \leq \frac{2 \log (|\Pi| / \delta)}{M}$. Empirical Processes in M-Estimation may contain a proof.
- I recall that in maximum-entropy inverse RL there was a principled reason for adding the entropy term. Look it up.
- Regret proof for AggreVaTe
- Is there a better way to implement AggreVaTe? Its computational cost may be too high.
- (2023/3/22) Lower bounds on the sample efficiency of RL: notebooks/RL_LowerBound_statistical_limits.ipynb
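- My understanding of matrix properties is surprisingly shaky. This one, for example, is iffy for me: (since $\|\phi(s, a)\|_2 \leq 1$, it follows that $\sigma_{\min}\left(\mathbb{E}_{(s, a) \sim \widetilde{\mu}_h}\left[\phi(s, a) \phi(s, a)^{\top}\right]\right) \leq 1/d$).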
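On the matrix-property note in the lower-bound entry: assuming the intended claim is $\sigma_{\min} \le 1/d$ for $d$-dimensional features (this is an assumption, not quoted from the notebook), the reasoning is short. $\Sigma = \mathbb{E}_{(s,a) \sim \widetilde{\mu}_h}[\phi\phi^\top]$ is positive semidefinite, so its singular values are its eigenvalues, and the smallest eigenvalue is at most their average: $\sigma_{\min}(\Sigma) \le \frac{1}{d}\operatorname{tr}(\Sigma) = \frac{1}{d}\mathbb{E}\|\phi(s,a)\|_2^2 \le \frac{1}{d}$. The numpy sketch below checks this on random features with an arbitrary distribution over a finite set of $(s, a)$ pairs; the helper names are hypothetical.

```python
import numpy as np


def feature_covariance(phi: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Sigma = E_{x ~ mu}[phi(x) phi(x)^T] over a finite set of (s, a) pairs.

    phi: (n, d) array, one feature vector per pair (hypothetical data).
    mu:  (n,) probability weights over the pairs.
    """
    return (phi * mu[:, None]).T @ phi


def min_eigenvalue(sigma: np.ndarray) -> float:
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(sigma)[0])


rng = np.random.default_rng(0)
d, n = 5, 1000

# Random features, rescaled so that ||phi(s, a)||_2 <= 1 for every row.
phi = rng.normal(size=(n, d))
phi /= np.maximum(1.0, np.linalg.norm(phi, axis=1, keepdims=True))
mu = rng.dirichlet(np.ones(n))  # an arbitrary distribution over the n pairs

sigma = feature_covariance(phi, mu)
print("trace:", np.trace(sigma), "min eigenvalue:", min_eigenvalue(sigma), "1/d:", 1.0 / d)

# The bound is tight: the standard basis under the uniform distribution gives Sigma = I / d.
sigma_tight = feature_covariance(np.eye(d), np.full(d, 1.0 / d))
print("tight case:", min_eigenvalue(sigma_tight))
```

The trace is at most 1 because it equals $\mathbb{E}\|\phi\|_2^2$, which is why the minimum eigenvalue can never exceed $1/d$; the orthonormal-basis example shows $1/d$ is actually attained.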
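On the UCB-VI notes (why Hölder's inequality is used, the union bound over $H^S$, and the extra $\sqrt{S}$), here is a sketch of the standard argument in the notes' notation. It assumes $\widehat{V}_{h+1}^{\pi^k} \in [0, H]^S$ and $n = N_h^k(s,a)$ i.i.d. next-state samples; these are assumptions, not taken from the notebook. For a *fixed* $V$, Hoeffding gives a bound with no $S$. Because $\widehat{V}_{h+1}^{\pi^k}$ is computed from $\widehat{P}_h^k$, that bound does not apply to it directly. Either Hölder's inequality followed by an $\ell_1$ concentration bound, or a union bound over a finite set $\mathcal{F}$ of candidate value functions, removes the dependence, and both pay roughly a $\sqrt{S}$ factor:

$$
\begin{aligned}
\text{fixed } V:&\quad \left|\left(\widehat{P}_h^k - P_h^\star\right)(\cdot \mid s,a)^{\top} V\right| \le H\sqrt{\frac{\log(2/\delta)}{2n}} \\
\text{Hölder:}&\quad \left|\left(\widehat{P}_h^k - P_h^\star\right)(\cdot \mid s,a)^{\top} \widehat{V}_{h+1}^{\pi^k}\right| \le \left\|\widehat{P}_h^k(\cdot \mid s,a) - P_h^\star(\cdot \mid s,a)\right\|_1 \left\|\widehat{V}_{h+1}^{\pi^k}\right\|_\infty \le H\sqrt{\frac{2\left(S\log 2 + \log(1/\delta)\right)}{n}} \\
\text{union bound:}&\quad \max_{f \in \mathcal{F}} \left|\left(\widehat{P}_h^k - P_h^\star\right)(\cdot \mid s,a)^{\top} f\right| \le H\sqrt{\frac{\log(2|\mathcal{F}|/\delta)}{2n}}, \qquad \log|\mathcal{F}| = S\log H \ \text{ when } |\mathcal{F}| = H^S
\end{aligned}
$$

The second line uses the $\ell_1$ deviation inequality $\Pr\left(\|\widehat{P} - P\|_1 \ge \varepsilon\right) \le (2^S - 2)e^{-n\varepsilon^2/2}$ (Weissman et al., 2003), whose union bound over subsets of next states is where $S$ enters. The third line ignores the discretization error of approximating $\widehat{V}_{h+1}^{\pi^k}$ by an element of $\mathcal{F}$. UCB-VI-style analyses typically avoid the extra $\sqrt{S}$ in the leading term by applying the fixed-$V$ bound to $V_{h+1}^\star$, which does not depend on the data.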
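On the question of what happens when the UCB bonus is too large: the simulation sketch below illustrates the effect on a two-armed Bernoulli bandit with a UCB1-style bonus $c\sqrt{2\ln t / n_a}$, where the scale `c` is a hypothetical knob (not from the notebooks). With a large `c`, the bonus dominates the empirical mean and the suboptimal arm keeps being pulled. In the usual analysis, the suboptimal arm stops being chosen once its bonus drops below about $\Delta/2$, i.e. after roughly $8c^2\ln T/\Delta^2$ pulls, so inflating `c` inflates the regret bound roughly by a factor of $c^2$.

```python
import math
import random


def ucb_pseudo_regret(means, horizon, c, seed=0):
    """Run UCB with bonus c * sqrt(2 ln t / n_a) on Bernoulli arms; return pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize the estimates
        else:
            # Optimistic index: empirical mean + scaled exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] + c * math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret: gap of the arm actually pulled
    return regret


for c in (0.5, 1.0, 2.0, 10.0):
    print(c, ucb_pseudo_regret([0.7, 0.3], horizon=2000, c=c))
```

Pseudo-regret (the sum of gaps of the pulled arms) is used instead of realized regret so that the comparison across `c` reflects only the arm choices.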