Which method uses the entire training dataset to calculate the gradient for each update, ensuring a smooth path to the optimum but requiring significant memory?

Prepare for the GARP Risk and AI (RAI) Exam with targeted quizzes. Utilize flashcards, multiple-choice questions, and detailed explanations to enhance learning. Ace your exam with our comprehensive quiz!

Explanation:

The correct answer is batch gradient descent. The question is about how the gradient is computed during training. Batch gradient descent uses every training example to compute the gradient before each update, so each step follows the exact gradient of the total training loss. With a suitably small learning rate, this yields a smooth, steady path toward the minimum because the update direction reflects the full dataset. The price is memory and computation: the entire dataset must be processed, and in a straightforward implementation held in memory, for every single update.
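As a minimal sketch of the idea, assuming a toy one-parameter model y ≈ w * x with mean squared error, the loop below recomputes the gradient over all examples before every update. The data, the learning rate, and the function names are illustrative assumptions, not part of the exam material.

```python
# Full-batch gradient descent on a toy one-parameter model y ≈ w * x.
# Data and hyperparameters are illustrative assumptions.

def batch_gradient(w, xs, ys):
    """Exact gradient of the mean squared error over ALL examples."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def batch_gradient_descent(xs, ys, lr=0.05, steps=200):
    w = 0.0
    losses = []
    for _ in range(steps):
        # Record the full-dataset loss so the smooth descent can be inspected.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        losses.append(loss)
        # One parameter update per full pass over the data.
        w -= lr * batch_gradient(w, xs, ys)
    return w, losses

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the optimum is w = 2
w_batch, batch_losses = batch_gradient_descent(xs, ys)
```

Because every update sees the whole dataset, the recorded losses never increase in this setup; the cost is that each update touches all of `xs` and `ys`.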

In contrast, stochastic gradient descent estimates the gradient from a single randomly chosen example, which gives fast updates and low memory usage but noisy, zigzagging progress. Mini-batch gradient descent sits between the two: it estimates the gradient from a small subset of the data, trading a modest increase in memory for smoother updates than the purely stochastic approach.
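The three variants differ only in how many examples feed each gradient estimate, which the following sketch makes explicit through a single `batch_size` parameter. The slightly noisy toy data, the learning rate, and the function name are assumptions chosen for illustration.

```python
import random

def minibatch_gradient_descent(xs, ys, batch_size, lr=0.02, epochs=50, seed=0):
    """batch_size=1 gives SGD; batch_size=len(xs) gives full-batch descent."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the squared error, estimated from this batch only.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

# Toy data roughly following y = 2x, with small hand-picked perturbations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_sgd = minibatch_gradient_descent(xs, ys, batch_size=1)         # stochastic
w_mini = minibatch_gradient_descent(xs, ys, batch_size=2)        # mini-batch
w_full = minibatch_gradient_descent(xs, ys, batch_size=len(xs))  # full batch
```

With a constant learning rate, the full-batch run settles on the least-squares solution for this data, while the stochastic and mini-batch runs keep moving within a small neighbourhood of it because each estimate reflects only part of the data.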

A dynamic learning rate adjusts how large each update step is over time. It does not determine how the gradient is computed from the data, so it does not describe a gradient calculation method.
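To illustrate that separation, the sketch below applies one possible schedule (inverse-time decay, picked here as an assumption) to whatever gradient function it is handed. The names `decayed_learning_rate` and `descend` and the toy objective are hypothetical.

```python
def decayed_learning_rate(initial_lr, step, decay=0.01):
    """Inverse-time decay: the step size shrinks as training progresses."""
    return initial_lr / (1.0 + decay * step)

def descend(grad_fn, w, initial_lr=0.1, steps=100):
    for step in range(steps):
        lr = decayed_learning_rate(initial_lr, step)
        # grad_fn could be a batch, mini-batch, or stochastic estimate;
        # the schedule only controls the size of the step taken.
        w -= lr * grad_fn(w)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = descend(lambda w: 2.0 * (w - 3.0), w=0.0)
```

The schedule never looks at the data; swapping in a different `grad_fn` changes how the gradient is obtained without touching the learning-rate logic.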
