Sigmoid Activation Function

A sigmoid activation function has a characteristic "S"-shaped curve defined, using e (Euler's number), by the formula:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

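The resulting curve rises gradually: it approaches 0 for large negative inputs, passes through 0.5 at x = 0, and approaches 1 for large positive inputs.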
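As a rough illustration of the formula and its gradual rise, here is a minimal Python sketch. The function name `sigmoid` is our own choice, and the split into two branches is an assumed implementation detail, not part of the definition: it rewrites the formula for negative inputs so that `math.exp` never overflows.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), computed without overflow."""
    if x >= 0:
        # Here e^(-x) <= 1, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x);
    # math.exp(x) then only underflows toward 0.0 and never overflows.
    z = math.exp(x)
    return z / (1.0 + z)


# Sample the curve to see the gradual rise from 0 toward 1.
for x in (-6, -4, -2, 0, 2, 4, 6):
    print(f"x = {x:+d}  sigmoid(x) = {sigmoid(x):.4f}")
```

Sampling a few points shows the output moving from about 0.0025 at x = -6, through exactly 0.5 at x = 0, to about 0.9975 at x = 6.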

The derivative of the function is:

$$\sigma'(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

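The identity σ'(x) = σ(x)(1 − σ(x)) makes the derivative cheap to evaluate once σ(x) is known. The sketch below computes it that way and compares it against a central finite-difference estimate as a sanity check. The helper names `sigmoid_derivative` and `central_difference`, and the step size `h`, are hypothetical choices for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Closed form sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical estimate of f'(x); used here only to check the closed form."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    analytic = sigmoid_derivative(x)
    numeric = central_difference(sigmoid, x)
    print(f"x = {x:+.1f}  analytic = {analytic:.6f}  numeric = {numeric:.6f}")
```

The largest value, 0.25, occurs at x = 0. Because the closed form needs only σ(x), backpropagation code commonly caches σ(x) from the forward pass and reuses it in the backward pass.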
Common criticisms of the sigmoid activation function include:

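  • Sigmoids can saturate and kill gradients. At the tails, where the output is close to 0 or 1, the curve is nearly flat, so its gradient (rate of change) is almost zero and little signal flows back through that unit during backpropagation.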

  • Sigmoid outputs are always positive (between 0 and 1), so they are not centered on zero. This can bias the weight updates of the layers that receive them. The effect can be mitigated by not using sigmoids in the final layers of a network.
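Both criticisms can be seen numerically. The following plain-Python sketch first prints the sigmoid's local gradient at increasingly large inputs, then shows that when all-positive sigmoid outputs aᵢ feed a hypothetical next-layer neuron z = Σ wᵢaᵢ, every weight gradient ∂L/∂wᵢ = (∂L/∂z)·aᵢ has the same sign as ∂L/∂z. The input values and the upstream gradient are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), stable for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# 1) Saturation: the local gradient collapses as |x| grows.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"sigmoid'({x:>4}) = {sigmoid_derivative(x):.2e}")

# 2) Positive outputs: for a next-layer neuron z = sum(w_i * a_i),
#    dL/dw_i = dL/dz * a_i. Every a_i from a sigmoid is > 0, so all
#    weight gradients share the sign of the single upstream value dL/dz.
inputs = [-3.0, -0.5, 0.7, 2.5]          # hypothetical pre-activations
activations = [sigmoid(v) for v in inputs]
upstream_grad = -1.3                     # hypothetical dL/dz from the layer above
weight_grads = [upstream_grad * a for a in activations]
print("activations:     ", [round(a, 3) for a in activations])
print("weight gradients:", [round(g, 3) for g in weight_grads])
```

Because all weight gradients for one example point the same way, a single update can only increase all of those weights or decrease all of them together.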
