Okay, I found it myself. Please check the answer to ensure I understand it.
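The optimizer state holds quantities such as the momentum and variance estimates, i.e. the information that has to persist from one step to the next in order to update the parameters.
There are also intermediate values produced during the computation. These are only needed until the update is done, and they are freed afterwards.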
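
To check my understanding, here is a minimal sketch of one Adam-style update in plain Python. It is only an illustration under my own assumptions, not how any particular framework implements it: the function name `adam_step` and the `state` dictionary layout are hypothetical. The point is that `m`, `v` and `step` survive between calls, while `m_hat` and `v_hat` exist only inside a single update.

```python
import math

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update on plain Python lists (hypothetical helper).

    Mutates `params` and `state` in place and returns `params`.
    """
    state["step"] += 1
    t = state["step"]
    for i, g in enumerate(grads):
        # Persistent optimizer state: running estimates kept across steps.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # momentum
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # variance

        # Temporary intermediates: only needed for this update, never stored.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)

        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Usage: two parameters, one step with made-up gradients.
params = [1.0, -2.0]
state = {"step": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
grads = [0.5, -0.25]
adam_step(params, grads, state)
```

After the call, `state` still contains only `step`, `m` and `v`, which is what would need to be saved in a checkpoint. The bias-corrected values are gone as soon as the function returns. In a real training loop the activations and gradients play the same temporary role, just at a much larger memory cost.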