Notes for Stanford CS229 Machine Learning

Stanford CS229 | Machine Learning | Building Large Language Models (LLMs)

What matters when training LLMs?

  • Architecture
  • Training algorithm/loss
  • Data
  • Evaluation
  • Systems
All LLMs are neural networks, so the first question is what architecture you are using. The training algorithm and loss determine how you actually train these models. Data is what you train the models on. Evaluation is how you know whether you are actually making progress toward the goal of LLMs. The systems component is how you actually make these models run efficiently on modern hardware.



This lecture will not focus much on the architecture or the training algorithm/loss.
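The five components above can be mapped onto a generic training loop. The sketch below is purely illustrative (the names `ToyModel`, `evaluate`, and the bigram count model are assumptions, not from the lecture): a toy bigram model stands in for the architecture, a count update stands in for the training algorithm, a token list is the data, and held-out negative log-likelihood is the evaluation. The systems component (running efficiently on real hardware) has no analogue in a toy this small.

```python
import math
import random

random.seed(0)

# 1) Architecture: a bigram count table standing in for a neural network.
class ToyModel:
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        # Laplace-smoothed counts so every transition has nonzero probability.
        self.counts = [[1] * vocab_size for _ in range(vocab_size)]

    def prob(self, prev, nxt):
        row = self.counts[prev]
        return row[nxt] / sum(row)

    # 2) Training algorithm/loss: here just a count update per observed pair.
    def update(self, prev, nxt):
        self.counts[prev][nxt] += 1

# 3) Data: a toy token stream over a 3-token vocabulary.
data = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1]

# 4) Evaluation: average negative log-likelihood over consecutive pairs.
def evaluate(model, tokens):
    nll = [-math.log(model.prob(a, b)) for a, b in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)

model = ToyModel(vocab_size=3)
before = evaluate(model, data)
for a, b in zip(data, data[1:]):  # one training pass over the data
    model.update(a, b)
after = evaluate(model, data)

print(after < before)  # → True: training lowers the loss on this toy data
```

The point is only the decomposition: swapping the count table for a transformer, the count update for gradient descent on a loss, and the token list for web-scale text gives the real pipeline the lecture describes.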


Overview of the LLM

  • Pre-training
  • Post-training
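The pre-training stage above is typically next-token prediction: minimizing the cross-entropy of the model's predicted distribution against the actual next token at each position. A minimal sketch of that loss, with a hypothetical hand-picked distribution (the numbers in `probs` are made up for illustration):

```python
import math

def next_token_loss(probs, targets):
    """Average cross-entropy: -log p(target) averaged over positions."""
    return sum(-math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

# Two positions over a 3-token vocabulary; targets are the true next tokens.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]

print(round(next_token_loss(probs, targets), 4))  # → 0.2899
```

A model that puts more probability mass on the correct next tokens gets a lower loss, which is exactly what pre-training optimizes at scale.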