This is a confusing subject for most people, and I had difficulty with it myself while learning how to set up Keras NN models, since adding or removing layers and neurons produces vastly different results from the model. Normally I wouldn’t just link out to others, but there is a very well-written synopsis on StackExchange (linked below) that lays it out simply. Very brief summary:

  1. Input (first) layer: Neurons = Number of features in the dataset
  2. Hidden layer(s): Neurons = Somewhere between 1 and the number in the input layer (the mean is a reasonable starting point); Number of hidden layers: 1 works for *most* applications, and sometimes none is needed.
  3. Output (last) layer: Neurons = exactly 1, unless it’s a classification problem and you use the softmax activation, in which case the number equals the number of classes you are predicting

https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw
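The rules of thumb summarized above can be sketched as a small helper. This is only an illustration: `suggest_layer_sizes` is a hypothetical name, and rounding the mean of 1 and the feature count down to an integer is my own assumption, not something the linked answer prescribes.

```python
def suggest_layer_sizes(n_features, n_classes=None):
    """Return (input, hidden, output) neuron counts from the rule of thumb.

    n_classes=None means a single output neuron (regression, or binary
    classification with a sigmoid). Pass n_classes when using softmax.
    """
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    # Hidden layer: mean of 1 and the input size, rounded down (assumption).
    hidden = max(1, (1 + n_features) // 2)
    # Output layer: 1 neuron, or one neuron per class for softmax.
    output = 1 if n_classes is None else n_classes
    return n_features, hidden, output


print(suggest_layer_sizes(20))     # (20, 10, 1)
print(suggest_layer_sizes(20, 3))  # (20, 10, 3)
```

Treat the result as a starting point to tune from, not a fixed answer.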

For a dataset with 20 features, that means:

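# Imports assume TensorFlow's bundled Keras; with standalone Keras, import from `keras` instead
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Example Keras binary classification model
model = Sequential()

model.add(Input(shape=(20,)))              # 20 features per sample
model.add(Dense(20, activation='relu'))    # first layer: one neuron per feature
model.add(Dense(10, activation='relu'))    # hidden layer: about half the input size
model.add(Dense(1, activation='sigmoid'))  # single output: probability of the positive class

model.compile(loss='binary_crossentropy', optimizer='adam')

# Example Keras multi-class model
model = Sequential()

model.add(Input(shape=(20,)))              # 20 features per sample
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax'))  # one output per class (3 classes here)

model.compile(loss='categorical_crossentropy', optimizer='adam')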
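
One detail the multi-class model relies on: `categorical_crossentropy` expects the targets as one-hot vectors, with one column per class to match the width of the softmax layer. If your labels are plain integers, you either convert them first or switch the loss to `sparse_categorical_crossentropy`. Below is a minimal NumPy sketch of the conversion; `one_hot` is a hypothetical helper written for illustration (Keras ships its own `to_categorical` utility for the same job), and the labels are made up.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Convert integer class labels (0 .. n_classes-1) into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype="float32")[labels]


# Made-up labels for 4 samples and 3 classes
y = one_hot([0, 2, 1, 2], n_classes=3)
print(y.shape)  # (4, 3)
```

The second dimension of `y` should equal the number of neurons in the softmax output layer.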