Which parameter changes the shape of the probability distribution over next tokens?

Prepare for the GARP Risk and AI (RAI) Exam with targeted quizzes. Utilize flashcards, multiple-choice questions, and detailed explanations to enhance learning. Ace your exam with our comprehensive quiz!

Multiple Choice

Which parameter changes the shape of the probability distribution over next tokens?

Explanation:
Temperature controls how sharp or flat the next-token probability distribution is. It works by dividing the logits by T before softmax is applied, so the probability of token i is proportional to exp(logit_i / T). When T is below 1, the differences between logits are amplified, so the top tokens take most of the probability mass and the distribution becomes sharply peaked; as T approaches 0, sampling approaches always picking the highest-logit token. When T is above 1, the differences are damped and probability spreads more evenly across tokens. At T = 1 the distribution is unchanged. Temperature is therefore the parameter that directly reshapes the distribution over next tokens.
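As a rough illustration of the scaling described above, here is a minimal Python sketch of softmax with temperature. The function name `softmax_with_temperature` and the example logits are assumptions made for this illustration, not part of any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return probabilities proportional to exp(logit_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
example_logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running this should show the top token's probability growing as T drops below 1 and shrinking as T rises above 1, while the ranking of the tokens stays the same.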

Top-K sampling and Top-P (nucleus) sampling work differently: they restrict which tokens can be chosen. Top-K keeps only the K most probable tokens, and Top-P keeps the smallest set of most probable tokens whose cumulative probability reaches P; in both cases the surviving probabilities are renormalized. This does alter the distribution you sample from, but by truncating it rather than by rescaling the logits. Statelessness does not affect the distribution's shape at all; it describes whether the model carries context across inputs.
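To make the contrast with temperature concrete, the following Python sketch shows one common way Top-K and Top-P truncation can be applied to an already-computed probability list. The helper names `top_k_filter` and `top_p_filter` and the sample probabilities are hypothetical, and real inference libraries may differ in details such as tie-breaking.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the vocabulary size")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, renormalize."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [probs[i] / cumulative if i in kept else 0.0 for i in range(len(probs))]

# Hypothetical next-token probabilities for four tokens.
example_probs = [0.5, 0.3, 0.15, 0.05]

print(top_k_filter(example_probs, 2))    # only the two most likely tokens survive
print(top_p_filter(example_probs, 0.9))  # the least likely token is dropped
```

In both helpers the relative proportions among the surviving tokens are preserved; only the tail is cut off, which is why these settings are usually described as truncating the distribution rather than reshaping it.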
