$\begingroup$

I am working on a sparse autoencoder, but Andrew Ng's notes are hard to understand. My question is about the loss function
$$J_\text{sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat\rho_j\right),$$
where $J(W,b)$ is the usual reconstruction loss and $\hat\rho_j$ is the average activation of hidden unit $j$.

In a sparse autoencoder, the goal is to deactivate most of the hidden units so that the network learns low-level, more precise features. If you check Andrew Ng's notes on sparse autoencoders (easily found via Google Scholar), a sparsity term is added to the loss function. My question is: how does the sparsity term deactivate hidden units? Whatever the value of the KL divergence (which is essentially the sparsity term), it still adds to the actual loss and makes the loss value bigger. This way we are not reducing the loss, we are increasing it.

I am expecting an answer which is a bit clear and easy to understand.

$\endgroup$
  • $\begingroup$ I can't understand what you are asking. Please edit your question to provide more background and to state the question more clearly. What "sparsity term" are you referring to? What do you mean by "inactives the hidden units" and why do you think it does that? What KL divergence are you referring to? Some context seems missing. $\endgroup$ Commented Jan 3, 2023 at 20:01
  • $\begingroup$ please check again. $\endgroup$ Commented Jan 4, 2023 at 6:35

1 Answer

$\begingroup$

I suggest that you think about this differently. An autoencoder maps the input to a "representation". The representation is a vector that contains the output of the hidden units. (It is also sometimes called an "embedding".) A vector is said to be "sparse" if most of its entries are zero. The goal of a sparse autoencoder is to map the input to a representation that is a sparse vector.

This loss function encourages the network to compute a sparse representation because of the penalty term. The KL term is small when the average activation of each hidden unit is close to the sparsity target $\rho$ (i.e., when most units are near zero on most inputs), and large otherwise. Training searches for weights that make the total loss small, and since the penalty is part of that total, stochastic gradient descent drives the weights toward values that shrink the penalty, i.e., toward a sparse representation. The key point is that the loss is compared *across candidate weights*: yes, adding the penalty raises the loss value, but weights that produce sparse representations are penalized less than weights that do not, so the minimum of the penalized loss sits at sparser solutions.
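To make this concrete, here is a minimal NumPy sketch of the per-unit KL sparsity penalty from Ng's notes, $\mathrm{KL}(\rho\|\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$. The function name and the example activation vectors are my own illustration, not from the notes:

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j), the sparsity
    penalty from Ng's sparse-autoencoder notes. `rho` is the target
    average activation; `rho_hat` holds each unit's average activation."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho = 0.05                                     # target: units active ~5% of the time
sparse_acts = np.array([0.05, 0.06, 0.04])     # average activations near the target
dense_acts  = np.array([0.50, 0.60, 0.40])     # units firing strongly on most inputs

print(kl_sparsity_penalty(rho, sparse_acts))   # tiny penalty
print(kl_sparsity_penalty(rho, dense_acts))    # much larger penalty
```

Running this shows the asymmetry that answers the question: the penalty is nearly zero for the sparse activations and large for the dense ones, so gradient descent prefers weights whose hidden units stay mostly inactive.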

See also https://en.wikipedia.org/wiki/Penalty_method for the use of penalty terms in loss functions.

$\endgroup$
