I am working on a sparse autoencoder, but Andrew Ng's notes are hard to understand.
My question is about the loss function in those notes.
In a sparse autoencoder, the goal is to deactivate some hidden units so that the network learns low-level, more precise features. If you check Andrew Ng's notes on sparse autoencoders (easy to find via Google Scholar; the loss function from the notes is attached as an image and transcribed below), a sparsity term is added to the loss function. My question is: how does the sparsity term deactivate the hidden units? Whatever the value of the KL divergence (which is essentially the sparsity term), it is added on top of the base loss and makes the total loss larger. In this way we are not reducing the loss but increasing it.
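For reference, here is the penalized loss from the notes as I understand it (transcribed here in case the attached image does not render):

$$J_{\text{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where

$$\mathrm{KL}(\rho \,\|\, \hat\rho_j) = \rho \log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}, \qquad \hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}\big(x^{(i)}\big).$$

Here $\rho$ is the target sparsity (e.g. 0.05), $\hat\rho_j$ is the average activation of hidden unit $j$ over the $m$ training examples, and $\beta$ weights the penalty. And a minimal Python sketch of how I understand the penalty is computed (the names `kl_penalty`, `rho`, and `rho_hat` are my own, just for illustration):

```python
import numpy as np

def kl_penalty(rho, rho_hat):
    """Sum of KL(rho || rho_hat_j) over the hidden units."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho = 0.05                            # desired average activation
rho_hat = np.array([0.5, 0.3, 0.05])  # example average activations of 3 hidden units

penalty = kl_penalty(rho, rho_hat)
print(penalty)  # always >= 0, so it only ever adds to the loss
```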
I am hoping for an answer that is clear and easy to understand.