
Hey, I need help with the following machine learning questions about gradient estimators:

2. Gradient estimators, 20 points. All else being equal, it's useful for a gradient estimator to be unbiased. The unbiasedness of a gradient estimator guarantees that, if we decay the step size and run stochastic gradient descent for long enough (see Robbins & Monro), it will converge to a local optimum.

The standard REINFORCE, or score-function, estimator is defined as:

$$\hat{g}_{\mathrm{SF}}[f] = f(b)\,\frac{\partial}{\partial\theta}\log p(b \mid \theta), \qquad b \sim p(b \mid \theta) \tag{2.1}$$

(a) [5 points] First, let's warm up with the score function. Prove that the score function has zero expectation, i.e. $\mathbb{E}_{p(x \mid \theta)}\left[\nabla_\theta \log p(x \mid \theta)\right] = 0$. Assume that you can swap the derivative and integral operators.

(b) [5 points] Show that REINFORCE is unbiased: $\mathbb{E}_{p(b \mid \theta)}\left[f(b)\,\frac{\partial}{\partial\theta}\log p(b \mid \theta)\right] = \frac{\partial}{\partial\theta}\mathbb{E}_{p(b \mid \theta)}\left[f(b)\right]$.

(c) [5 points] Show that REINFORCE with a fixed baseline is still unbiased, i.e. show that $\mathbb{E}_{p(b \mid \theta)}\left[\left[f(b) - c\right]\frac{\partial}{\partial\theta}\log p(b \mid \theta)\right] = \frac{\partial}{\partial\theta}\mathbb{E}_{p(b \mid \theta)}\left[f(b)\right]$ for any fixed $c$.

(d) [5 points] If the baseline depends on $b$, then REINFORCE will in general give biased gradient estimates. Give an example where $\mathbb{E}_{p(b \mid \theta)}\left[\left[f(b) - c(b)\right]\frac{\partial}{\partial\theta}\log p(b \mid \theta)\right] \neq \frac{\partial}{\partial\theta}\mathbb{E}_{p(b \mid \theta)}\left[f(b)\right]$ for some function $c(b)$, and show that it is biased.

The takeaway is that you can use a baseline to reduce the variance of REINFORCE, but not one that depends on the current action.
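Before writing the proofs, a quick numerical sanity check may help. The sketch below assumes a toy setting that is not part of the problem: $b \sim \mathrm{Bernoulli}(\theta)$ with a hypothetical objective $f(b) = (b - 0.45)^2$, for which $\frac{\partial}{\partial\theta}\mathbb{E}[f(b)] = f(1) - f(0) = 0.1$. It computes the exact mean of each estimator by summing over $b \in \{0, 1\}$, compares it with a finite-difference gradient, and also draws Monte Carlo samples. The helper names (`prob`, `score`, `expect`, `reinforce_mean`, `reinforce_mc`) and the action-dependent baseline $c(b) = b$ used for part (d) are illustrative choices only.

```python
import random

# Toy setting (an assumption for illustration, not from the problem):
# b ~ Bernoulli(theta), so p(b | theta) = theta**b * (1 - theta)**(1 - b).

def prob(b, theta):
    """Probability mass p(b | theta) for b in {0, 1}."""
    return theta if b == 1 else 1.0 - theta

def score(b, theta):
    """Score function d/dtheta log p(b | theta) for a Bernoulli."""
    return 1.0 / theta if b == 1 else -1.0 / (1.0 - theta)

def f(b):
    """Hypothetical objective; any function of b (not of theta) would do."""
    return (b - 0.45) ** 2

def expect(g, theta):
    """Exact expectation E_{p(b|theta)}[g(b)] by summing over both outcomes."""
    return sum(prob(b, theta) * g(b) for b in (0, 1))

def true_grad(theta, h=1e-5):
    """Central finite difference of E[f(b)] with respect to theta."""
    return (expect(f, theta + h) - expect(f, theta - h)) / (2 * h)

def reinforce_mean(theta, baseline=lambda b: 0.0):
    """Exact mean of the estimator [f(b) - c(b)] * score(b, theta)."""
    return expect(lambda b: (f(b) - baseline(b)) * score(b, theta), theta)

def reinforce_mc(theta, n, baseline=lambda b: 0.0, seed=0):
    """Monte Carlo average of the same estimator over n samples of b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        b = 1 if rng.random() < theta else 0
        total += (f(b) - baseline(b)) * score(b, theta)
    return total / n

theta = 0.3
print(expect(lambda b: score(b, theta), theta))            # (a) should be close to 0
print(true_grad(theta))                                     # reference gradient, close to 0.1
print(reinforce_mean(theta))                                # (b) close to 0.1
print(reinforce_mean(theta, baseline=lambda b: 0.25))       # (c) constant baseline, close to 0.1
print(reinforce_mean(theta, baseline=lambda b: float(b)))   # (d) c(b) = b, close to -0.9
print(reinforce_mc(theta, 100_000))                         # noisy estimate near 0.1
```

With $c(b) = b$ the mean comes out at $-0.9$ rather than $0.1$. The gap is $-\mathbb{E}\left[c(b)\,\frac{\partial}{\partial\theta}\log p(b \mid \theta)\right] = -\frac{\partial}{\partial\theta}\mathbb{E}[b] = -1$, which is the kind of bias part (d) asks you to exhibit. A constant $c$ drops out because of the zero-mean property from part (a).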