Curiosity-driven Exploration by Self-supervised Prediction

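Reinforcement learning problems with sparse environment rewards often suffer from inefficient learning. One remedy is an intrinsic reward mechanism that steers the agent toward exploring unfamiliar parts of the environment, which raises its chance of success. The paper proposes the Intrinsic Curiosity Module (ICM) architecture for computing this intrinsic reward. ICM has two parts: (1) a forward dynamics model, which learns to predict the next featurized state from the current state and the action taken, and (2) an inverse dynamics model, which learns to infer the action the agent took from two consecutive featurized states. The inverse model thereby trains, in a self-supervised way, a representation of the raw state that captures what is actually relevant to the agent's behavior. Using the prediction error in the featurized space as the intrinsic reward (rather than the prediction error on raw states) keeps random or agent-irrelevant information from interfering with the curiosity reward that encourages the agent to explore.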
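A minimal sketch of how the forward model's feature-space prediction error can serve as an intrinsic reward. Everything here is an assumption made for illustration: the linear maps `W_enc` and `W_fwd` stand in for the learned encoder and forward model, the dimensions are arbitrary, and the reward is taken to be a scaled squared error with a hypothetical scale `eta`. It is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
STATE_DIM, FEATURE_DIM, NUM_ACTIONS = 8, 4, 3

# Random linear maps standing in for the learned networks (assumption).
W_enc = rng.normal(size=(STATE_DIM, FEATURE_DIM))                  # feature encoder
W_fwd = rng.normal(size=(FEATURE_DIM + NUM_ACTIONS, FEATURE_DIM))  # forward model


def encode(state):
    """Map a raw state to its featurized state."""
    return state @ W_enc


def one_hot(action):
    """Encode a discrete action index as a one-hot vector."""
    vec = np.zeros(NUM_ACTIONS)
    vec[action] = 1.0
    return vec


def predict_next_features(features, action):
    """Forward model: predict the next featurized state."""
    return np.concatenate([features, one_hot(action)]) @ W_fwd


def intrinsic_reward(state, action, next_state, eta=1.0):
    """Scaled squared prediction error measured in feature space."""
    predicted = predict_next_features(encode(state), action)
    actual = encode(next_state)
    return 0.5 * eta * float(np.sum((predicted - actual) ** 2))


state = rng.normal(size=STATE_DIM)
next_state = rng.normal(size=STATE_DIM)
reward = intrinsic_reward(state, 1, next_state)
```

Because the error is computed on `encode(next_state)` rather than on `next_state` itself, anything the encoder discards cannot inflate the reward.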

3 min read

Meta-Critic Networks for Sample Efficient Learning

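Extends the actor-critic architecture in reinforcement learning by using a single meta-critic to guide many different tasks of the same type. The goal is to train a meta-critic that can effectively guide new few-shot learning problems. So that the meta-critic can tell tasks apart, the "learning traces" must be encoded and passed to it as an additional input.

The meta-critic therefore has two parts: (1) a Meta-Value Network (MVN), which corresponds to the conventional value function, and (2) a Task-Action Encoder Network, which encodes the learning traces (the history of states, actions, and rewards) and feeds the result to the MVN. Experiments cover sine and linear function regression (supervised learning) and the cartpole control problem (reinforcement learning).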

2 min read

Neural Architecture Search with RL

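Trains a Controller (an LSTM) to design good neural network architectures automatically. The Controller outputs the architecture hyperparameters of a child model. The child model is then trained on the data, and its validation accuracy is fed back to the Controller as the reward. The Controller updates itself with the policy gradient method from reinforcement learning so that the next architecture it generates is better (the resulting child network reaches a higher validation accuracy).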
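The feedback loop can be sketched with a toy policy gradient (REINFORCE) update. This is a heavily simplified illustration under stated assumptions: the controller is a single softmax over one hypothetical hyperparameter (`CHOICES`) instead of an LSTM, and `fake_validation_accuracy` is a made-up stand-in for training a child model and measuring its validation accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

CHOICES = [16, 32, 64, 128]       # hypothetical options for one hyperparameter
logits = np.zeros(len(CHOICES))   # controller parameters (softmax policy)


def softmax(x):
    """Numerically stable softmax."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def fake_validation_accuracy(width):
    """Made-up reward standing in for a trained child model's accuracy."""
    return 1.0 - 0.2 * abs(np.log2(width) - 6)


baseline = 0.0        # moving average of rewards, reduces gradient variance
learning_rate = 0.5
for step in range(500):
    probs = softmax(logits)
    idx = rng.choice(len(CHOICES), p=probs)         # sample an architecture
    reward = fake_validation_accuracy(CHOICES[idx])
    # Gradient of log pi(idx) w.r.t. the logits is one_hot(idx) - probs.
    grad_log_prob = -probs
    grad_log_prob[idx] += 1.0
    logits += learning_rate * (reward - baseline) * grad_log_prob
    baseline = 0.9 * baseline + 0.1 * reward

final_probs = softmax(logits)
preferred = CHOICES[int(np.argmax(final_probs))]
```

Choices that score above the running baseline have their probability raised and those below it lowered, which is the sense in which the reward lets the controller correct itself.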

1 min read

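Deduction and Induction

Deduction and induction are both common methods of reasoning, but I always find it confusing to tell them apart. The Merriam-Webster dictionary website has an article explaining the differences between deduction, induction, and abduction.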

~1 min read

Matrix Factorization with NN

Approaching the problem of matrix factorization from a neural network point of view is quite interesting. The idea is that, when biases and activation functions are ignored, a fully-connected network is just a series of successive linear transformations that can be expressed as successive matrix multiplications. So finding two matrices \(\mathbf{M}_1\) and \(\mathbf{M}_2\) such that \(\mathbf{M}_1 \mathbf{M}_2 = \mathbf{M} \) is analogous to training a one-hidden-layer network to produce a net transformation \(\mathbf{M}\). When training succeeds, the two weight matrices \(\mathbf{W}_1\) and \(\mathbf{W}_2\) are our (non-unique) answer.
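A minimal NumPy sketch of this idea, using plain gradient descent on the squared reconstruction error in place of a deep learning framework. The matrix sizes, hidden width, learning rate, and step count are all assumptions chosen for illustration, and the target is built as a low-rank product so that an exact factorization with this hidden width exists.

```python
import numpy as np

rng = np.random.default_rng(0)

N, HIDDEN = 6, 3  # hypothetical sizes: M is N x N, hidden layer has 3 units

# Target matrix, built with rank 3 and scaled to unit Frobenius norm.
M = rng.normal(size=(N, HIDDEN)) @ rng.normal(size=(HIDDEN, N))
M /= np.linalg.norm(M)

# Weights of a bias-free, activation-free one-hidden-layer network.
W1 = 0.1 * rng.normal(size=(N, HIDDEN))
W2 = 0.1 * rng.normal(size=(HIDDEN, N))


def loss(W1, W2):
    """Half the squared Frobenius norm of the reconstruction error."""
    return 0.5 * np.sum((W1 @ W2 - M) ** 2)


initial_loss = loss(W1, W2)
learning_rate = 0.1
for step in range(2000):
    error = W1 @ W2 - M
    grad_W1 = error @ W2.T   # d(loss)/d(W1)
    grad_W2 = W1.T @ error   # d(loss)/d(W2)
    W1 -= learning_rate * grad_W1
    W2 -= learning_rate * grad_W2

final_loss = loss(W1, W2)
```

The answer is non-unique because for any invertible `R`, the pair `W1 @ R` and `np.linalg.inv(R) @ W2` gives the same product.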

Exploring this as an exercise actually led me to another interesting question:

What is the difference between (1) a matrix that is randomly generated, and (2) a matrix that is the product of two matrices that are randomly generated?

~1 min read

Hosting a Jekyll Site on GitHub Pages

On August 6, 2018, I came back to the idea of using Jekyll to set up a site to host some of my notes. Both Jekyll and the so-simple-theme have evolved since I first used them in March 2017, so I decided to rebuild the site with the updated theme.

1 min read