
Commit

Quartz sync: Dec 26, 2023, 10:40 AM
ps4vs committed Dec 26, 2023
1 parent cb3575c commit 4381a54
Showing 7 changed files with 70 additions and 6 deletions.
Empty file added Untitled.md
Empty file.
37 changes: 37 additions & 0 deletions content/notes/RL Alpha.md
@@ -0,0 +1,37 @@
---
title: "Reinforcement Learning: Alpha"
tags:
- seed
enableToc: false
---
### Introduction
Reinforcement Learning (RL) is a computational approach uniquely focused on goal-directed agents that learn by interacting with an uncertain environment, i.e., a direction that points more towards AGI. It is **"learning from interaction"**, featuring "trial and error" and "delayed reward".

The four main subelements of RL are the Policy, Reward Signal, Value Function, and Model (a minimal sketch of how they fit together follows this list):
- **Policy**: The agent's strategy of action at a given time.
- **Reward Signal**: An indicator of immediate success.
- **Value Function**: A measure of long-term success.
- **Model**: The agent's internal representation of the environment, aiding in planning and predicting future outcomes.
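
A minimal sketch of the standard agent-environment loop with these four subelements made explicit; the `env` interface, the `TabularAgent` class, and the parameter names are illustrative assumptions, not taken from any particular library or from my notebook:

```python
import random
from collections import defaultdict

class TabularAgent:
    """Toy agent with the four RL subelements made explicit (illustrative only)."""

    def __init__(self, model, epsilon=0.1, alpha=0.1):
        self.model = model                 # model: predicts which state an action leads to
        self.values = defaultdict(float)   # value function: long-term success per state
        self.epsilon = epsilon             # exploration rate used by the policy
        self.alpha = alpha                 # step size for value updates

    def policy(self, state, actions):
        """Policy: mostly greedy w.r.t. predicted next-state values, occasionally random."""
        if random.random() < self.epsilon:
            return random.choice(actions)  # trial and error
        return max(actions, key=lambda a: self.values[self.model(state, a)])

    def td_update(self, state, next_state, reward=0.0):
        """Move V(state) toward the immediate reward plus the next state's value."""
        target = reward + self.values[next_state]
        self.values[state] += self.alpha * (target - self.values[state])

def run_episode(env, agent):
    """One episode of the interaction loop: act, observe the reward signal, update values."""
    state, done = env.reset(), False
    while not done:
        action = agent.policy(state, env.legal_actions(state))
        next_state, reward, done = env.step(action)  # reward signal: immediate success
        agent.td_update(state, next_state, reward)   # delayed reward folded into the values
        state = next_state
```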
### **Flip!!!**
According to [Reinforcement Learning, Fast and Slow](https://www.cell.com/action/showPdf?pii=S1364-6613%2819%2930061-0), Machine Learning methods are ***sample inefficient*** for the following reasons:
1. **Incremental parameter adjustment**
    - The adjustments made during learning must be small in order to maximize generalization and avoid overwriting the effects of earlier learning.
2. **Weak inductive bias**
    - According to learning theory, every learning procedure necessarily faces a bias-variance tradeoff.
    - Generic neural networks are extremely low-bias learning systems, i.e., they can master a wider range of patterns (higher variance) but are in general less sample-efficient.
Furthermore, RL's ***sample inefficiency*** is exacerbated as the agent must self-determine labels (i.e., actions leading to higher rewards) through environmental interaction, unlike in supervised learning where labeled data is readily available.

### Tic-Tac-Toe Case Study
In a practical application, I trained an RL agent to play Tic-Tac-Toe using the Temporal Difference (TD) method; the code is in [this notebook](https://github.com/ps4vs/Deep-RL/blob/main/Chapter-1/TicTacToe.ipynb).
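
As a rough illustration of the update rule: the 0.5 initialization, the win-probability interpretation of the values, and the step size below are assumptions in the style of Sutton and Barto's Chapter 1 example, not necessarily exactly what the notebook does:

```python
from collections import defaultdict

# V(s): estimated probability of winning from board state s.
# Non-terminal states start at 0.5; terminal states are assumed to be set
# directly (1.0 for a win, 0.0 for a loss or draw), so no explicit reward term is needed.
values = defaultdict(lambda: 0.5)
ALPHA = 0.1  # step size

def td_backup(state, next_state):
    """TD(0) backup after a greedy move: V(s) <- V(s) + alpha * (V(s') - V(s))."""
    values[state] += ALPHA * (values[next_state] - values[state])
```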

Key insights include:
1. **Adaptive Strategy**: The agent learns to counter repetitive strategies, such as blocking repeated moves.
2. **Learning and Unlearning**: The agent's learning is dynamic; what it has learned can be reinforced or undone depending on whether the temporal-difference updates are positive or negative.
3. **Challenge with Random Play**: Learning efficiency drops significantly against a randomly moving opponent due to limited informative feedback.
4. **Learning Through Self-Play**: When the RL agent plays Tic-Tac-Toe against itself, both players (first and second) are the same agent, yet each updates the value function over a different set of states depending on its position in the game (see the sketch below).
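
A rough sketch of that last point, assuming a shared value table where each role only backs up the afterstates created by its own moves; `env`, `agent.policy`, and `agent.td_update` are hypothetical names following the sketches above:

```python
def self_play_episode(env, agent):
    """One self-play game: both sides are the same agent, but each side records and
    backs up only the afterstates produced by its own moves, so the first and second
    player effectively learn values for largely disjoint sets of states."""
    state, done = env.reset(), False
    seen = {"X": [], "O": []}
    player = "X"
    while not done:
        action = agent.policy(state, env.legal_actions(state))
        state, _, done = env.step(action)
        seen[player].append(state)               # each role sees only its own afterstates
        player = "O" if player == "X" else "X"
    for states in seen.values():                 # TD backups stay within each role's states
        for s, s_next in zip(states, states[1:]):
            agent.td_update(s, s_next)
```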






16 changes: 16 additions & 0 deletions content/notes/Reinforcement Learning.md
@@ -0,0 +1,16 @@
---
title: Reinforcement Learning
tags:
- seed
enableToc: false
---
These are some notes/projects I made related to Reinforcement Learning.
## Blogs
- [[RL Alpha |Reinforcement Learning: Alpha]]

## Resources
- [Deep RL Hugging Face](https://huggingface.co/learn/deep-rl-course)
- [Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto](http://incompleteideas.net/book/RLbook2020.pdf)
- [Foundations of Deep RL Series, by Pieter Abbeel](https://youtu.be/Psrhxy88zww)


@@ -1,5 +1,5 @@
---
title: anySTOCKnow
title: any.STOCK.now
tags:
- seed
enableToc: false
17 changes: 14 additions & 3 deletions content/notes/epitomes.md
@@ -4,20 +4,31 @@ tags:
- seed
---
Some of the notes, blogs, and projects that I look up to.
### AI maniacs
- [neural-link project foundation](https://projectfoundation.notion.site/projectfoundation/Project-69-cd70c0c75c00430fb2441222f579eacc)
- https://ayushtues.github.io/
### Blogs
- [lilianweng](https://lilianweng.github.io/)

- https://ayushtues.medium.com/
### Notes
- [ishan.coffee](https://www.ishan.coffee/)
- [param.codes](https://notes.param.codes/)
- [divyanshgarg](https://divyanshgarg.com/)

Talks
### Talks
- [Transformers United](https://www.youtube.com/watch?v=ylEk1TE1uBo)

SE
### SE
- https://medium.com/@narenarya

### Websites
- https://zhuanzhi.ai/topic/2001320766352755
- https://www.zhihu.com/people/dilab-46
- https://note.com/npaka

### Repositories
- [OpenDILab - RL, MultiModality, AGI, Diffusion](https://github.com/opendilab)




2 changes: 1 addition & 1 deletion content/notes/mymains.md
@@ -4,7 +4,7 @@ tags:
- evergreen
---


[neurallink notes](https://projectfoundation.notion.site/projectfoundation/Project-69-cd70c0c75c00430fb2441222f579eacc)



2 changes: 1 addition & 1 deletion content/notes/web development.md
@@ -6,4 +6,4 @@ enableToc: false
---
These are some notes/projects I made while hacking around web development asynchronously and on an as-needed basis.

[[anySTOCKnow| Blog on node.js, express.js and express-handlebars.js ]]
[[any.STOCK.now| Blog on node.js, express.js and express-handlebars.js ]]
