The idea of providing a static instance of OptimizerGD (which is just a singleton in disguise) looks flawed, since optimizers are stateful. Suppose one wants one OptimizerGD object "without Nesterov" and a second one "with Nesterov". In the current design this is not possible, because there is only ever one OptimizerGD object, with nesterov set to either true or false.
I don't want the user to be able to [...] instantiate OptimizerGD
But why? If I had to implement this, I would actually want the user to do exactly that: create individual Optimizer objects with individual settings. That alone would solve two of your three issues, simply by removing overcomplicated machinery.
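As a minimal sketch of that alternative: OptimizerGD gets an ordinary constructor and no static instance, so each caller owns its own object. The fields learningRate and nesterov and the accessor names are assumptions for illustration, not taken from the original code.

```java
// Hypothetical sketch: no static/shared instance, just a normal constructor.
final class OptimizerGD {
    private final double learningRate; // assumed setting, illustrative only
    private final boolean nesterov;    // the flag discussed in the question

    OptimizerGD(double learningRate, boolean nesterov) {
        this.learningRate = learningRate;
        this.nesterov = nesterov;
    }

    double learningRate() { return learningRate; }
    boolean isNesterov() { return nesterov; }
}

class IndependentOptimizersDemo {
    public static void main(String[] args) {
        // Two independent objects with different settings can now coexist.
        OptimizerGD plain = new OptimizerGD(0.01, false);
        OptimizerGD withNesterov = new OptimizerGD(0.01, true);
        System.out.println(plain.isNesterov() + " " + withNesterov.isNesterov()); // prints "false true"
    }
}
```

Any per-optimizer state (for example momentum buffers) would then live in each instance as well, instead of being shared through one global object.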
For the problem of mixing inheritance with a fluent interface, I found this short article. It shows how to use generics to solve the problem. Applied to your case, this results in something along the lines of:
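```java
public abstract class Optimizer<T extends Optimizer<T>> {
    // Reference to this object, typed as the concrete subclass
    protected final T self;
    protected double learningRate;

    protected Optimizer(final Class<T> selfClass) {
        // Checked cast: fails fast if a subclass passes the wrong class literal
        this.self = selfClass.cast(this);
    }

    // Returns the concrete subtype, so subclass-specific setters can be chained
    public T withLearningRate(final double learningRate) {
        this.learningRate = learningRate;
        return self;
    }
}
```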
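To show what the generic self type buys you, here is a self-contained sketch with a hypothetical OptimizerGD subclass. The withNesterov, getLearningRate and isNesterov methods and the default learning rate are assumptions for illustration. Because withLearningRate returns T (here OptimizerGD), a subclass-only setter can still be chained after it, and every optimizer is a separate object.

```java
// Base class with the recursive generic bound (self-contained copy for this example)
abstract class Optimizer<T extends Optimizer<T>> {
    protected final T self;
    protected double learningRate = 0.01; // assumed default, illustrative only

    protected Optimizer(final Class<T> selfClass) {
        this.self = selfClass.cast(this);
    }

    public T withLearningRate(final double learningRate) {
        this.learningRate = learningRate;
        return self;
    }

    public double getLearningRate() { return learningRate; }
}

// Hypothetical subclass; the nesterov flag comes from the question
final class OptimizerGD extends Optimizer<OptimizerGD> {
    private boolean nesterov;

    public OptimizerGD() {
        super(OptimizerGD.class);
    }

    public OptimizerGD withNesterov(final boolean nesterov) {
        this.nesterov = nesterov;
        return this;
    }

    public boolean isNesterov() { return nesterov; }
}

class FluentOptimizerDemo {
    public static void main(String[] args) {
        // withLearningRate returns OptimizerGD, so withNesterov compiles after it
        OptimizerGD sgd = new OptimizerGD().withLearningRate(0.1);
        OptimizerGD nag = new OptimizerGD().withLearningRate(0.05).withNesterov(true);
        System.out.println(sgd.getLearningRate() + " " + sgd.isNesterov()); // prints "0.1 false"
        System.out.println(nag.getLearningRate() + " " + nag.isNesterov()); // prints "0.05 true"
    }
}
```

Without the type parameter, withLearningRate would return a plain Optimizer and the call to withNesterov would not compile without a cast.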