Advanced Neural Networks in Keras: 10 Code-Along Examples

Learn the Keras Functional API by running it. Ten copy-and-run Python examples covering multiple inputs and outputs, shared layers, embeddings, concatenation, chained regression and classification heads, model plotting, and saving and loading.

The Functional API article explains the ideas: layers as functions you call on tensors, models with several inputs or outputs, shared weights, embeddings as lookup tables, and stitching heads together. This workbook turns those ideas into ten small models you can actually train. The running theme is a league where we predict the score margin between two competitors, but every example generates its own data with NumPy so nothing external is required. Run each one, read the shapes it prints, then change a layer and run it again.

The main article is here: https://datalad.co.uk/advanced-neural-networks-in-keras-the-functional-api/

1. The functional pattern: Input, layer, Model

The Functional API has one core move: create an Input, call each layer on the tensor produced by the layer before it, then wrap the first and last tensors into a Model. Unlike a Sequential model, you hold a reference to every intermediate tensor, and that is what makes branching and merging possible.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# A symbolic placeholder describing one sample: 4 features
inputs = Input(shape=(4,))
# Call each layer on the previous tensor to connect them
hidden = Dense(8, activation="relu")(inputs)
outputs = Dense(1)(hidden)
# Wrap the graph from inputs to outputs into a trainable model
model = Model(inputs=inputs, outputs=outputs)
model.summary()

summary() prints each layer with its output shape and parameter count. Notice that Dense(8, activation="relu")(inputs) does two things in one line: Dense(...) creates the layer, and calling it on inputs connects it to the graph.
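The parameter count that summary() reports can be checked by hand. A Dense layer with n_in inputs and n_out units stores an n_in x n_out weight matrix plus n_out biases. The sketch below is plain NumPy, with a hypothetical helper dense_param_count; it reproduces the count for the model above and writes out the forward pass the two layers perform. The random weights are placeholders, not the values Keras would initialise.

```python
import numpy as np

def dense_param_count(n_in: int, n_out: int, use_bias: bool = True) -> int:
    """Weights in a fully connected layer: an n_in x n_out kernel plus n_out biases."""
    return n_in * n_out + (n_out if use_bias else 0)

# Layer sizes from the example above: 4 inputs -> Dense(8) -> Dense(1)
hidden_params = dense_param_count(4, 8)   # 4*8 + 8 = 40
output_params = dense_param_count(8, 1)   # 8*1 + 1 = 9
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)

# The computation the two layers perform on one batch, written out in NumPy
rng = np.random.default_rng(0)
x = rng.random((2, 4))                    # a batch of 2 samples with 4 features each
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
hidden = np.maximum(x @ W1 + b1, 0.0)     # Dense(8, activation="relu")
y = hidden @ W2 + b2                      # Dense(1) with the default linear activation
print(y.shape)
```

If your summary() shows a different total, the layer sizes in the code and in your head have drifted apart, which is exactly what the printout is there to catch.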

2. Compile, fit, and predict

A model is just a graph until you compile it with an optimizer and a loss, then fit it to data. Here we make a simple linear target so the network has something real to learn.

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
X = np.random.rand(500, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) # a known linear relationship
inputs = Input(shape=(4,))
hidden = Dense(16, activation="relu")(inputs)
outputs = Dense(1)(hidden)
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mean_absolute_error")
model.fit(X, y, epochs=5, batch_size=32, verbose=1)
print(model.predict(X[:3])) # predictions for the first three samples

Watch the loss fall across the five epochs. mean_absolute_error is a natural choice for a regression target like a score margin because it reports the average miss in the target's own units.
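To see what that number means, here is a small NumPy sketch of mean absolute error next to mean squared error on the same predictions. The helper functions are written by hand to mirror the Keras loss names; they are illustrations, not the Keras implementations. MAE stays in points of margin, while MSE squares each miss, so a single blowout result dominates it.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average size of the miss, in the same units as the target (points of margin)
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mean_squared_error(y_true, y_pred):
    # Squares each miss, so one large error dominates the average
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

y_true = [3.0, -2.0, 10.0, 1.0]
y_pred = [2.0, -2.0, 0.0, 2.0]   # the third prediction is a big miss

mae = mean_absolute_error(y_true, y_pred)   # (1 + 0 + 10 + 1) / 4 = 3.0
mse = mean_squared_error(y_true, y_pred)    # (1 + 0 + 100 + 1) / 4 = 25.5
print(mae, mse)
```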

3. Two inputs joined with Concatenate

Real problems often have inputs of different shapes, such as a block of numeric stats plus a single flag for home advantage. Concatenate glues them into one tensor that the rest of the network can process.

import numpy as np
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model
stats_in = Input(shape=(3,), name="stats") # three numeric features
flag_in = Input(shape=(1,), name="home_flag") # one binary flag
merged = Concatenate()([stats_in, flag_in]) # combine the two inputs
hidden = Dense(8, activation="relu")(merged)
out = Dense(1)(hidden)
model = Model(inputs=[stats_in, flag_in], outputs=out)
model.compile(optimizer="adam", loss="mean_absolute_error")
stats = np.random.rand(200, 3)
flags = np.random.randint(0, 2, size=(200, 1))
y = stats.sum(axis=1, keepdims=True) + flags * 2.0 # home flag adds to the margin
model.fit([stats, flags], y, epochs=3, verbose=1)

A multi-input model takes a list for both inputs= and the data you pass to fit. The order of the list must match the order of the Input layers.
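Concatenate joins tensors along the last axis by default, so the merge above behaves like np.concatenate(..., axis=-1) on each batch. The NumPy sketch below uses random stand-in data shaped like the example. It shows that the order of the list fixes which columns the flag ends up in, which is why the data list passed to fit has to follow the same order as the Input layers.

```python
import numpy as np

rng = np.random.default_rng(0)
stats = rng.random((200, 3))               # three numeric features per sample
flags = rng.integers(0, 2, size=(200, 1))  # one binary home flag per sample

# Concatenate() in Keras joins along the last axis (axis=-1) by default
merged = np.concatenate([stats, flags], axis=-1)
print(merged.shape)

# List order decides column order: the three stats come first, then the flag
stats_cols_match = np.array_equal(merged[:, :3], stats)
flag_col_matches = np.array_equal(merged[:, 3:], flags)

# Reversing the list moves the flag to column 0, so downstream weights would see different columns
swapped = np.concatenate([flags, stats], axis=-1)
print(stats_cols_match, flag_col_matches, np.array_equal(swapped, merged))
```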

4. A shared layer applied to both sides

When two inputs are symmetric, such as two competitors rated on the same scale, you want the same weights to score both of them. Define the layer once, then call it twice. Subtract then gives the difference between the two scores.

from tensorflow.keras.layers import Input, Dense, Subtract
from tensorflow.keras.models import Model
# One transformation, reused for both competitors
rate_strength = Dense(1, use_bias=False, name="strength")
comp_a = Input(shape=(5,), name="competitor_a")
comp_b = Input(shape=(5,), name="competitor_b")
strength_a = rate_strength(comp_a) # same weights here...
strength_b = rate_strength(comp_b) # ...and reused here
margin = Subtract()([strength_a, strength_b]) # difference in strength
model = Model(inputs=[comp_a, comp_b], outputs=margin)
model.compile(optimizer="adam", loss="mean_absolute_error")
model.summary()

Look at the parameter count: there is only one strength layer with five weights, so competitor A and competitor B are scored by identical weights by construction. That is weight sharing, and a Sequential model, which has a single input and a single stack of layers, cannot express it.
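Weight sharing gives the margin a useful property: swap the two competitors and the prediction flips sign, and a competitor against itself is a predicted draw. Because the shared layer here is linear, the margin also depends only on the difference between the two feature vectors, and any bias would cancel in the subtraction anyway (which may be why the example sets use_bias=False). The NumPy sketch below checks these properties with a random stand-in for the layer's kernel; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(5, 1))   # stand-in for the single shared kernel of Dense(1, use_bias=False)

def strength(features):
    # The shared layer: the same w scores every competitor
    return features @ w

def margin(a, b):
    # Equivalent of Subtract()([strength_a, strength_b])
    return strength(a) - strength(b)

a = rng.random((4, 5))
b = rng.random((4, 5))

antisymmetric = np.allclose(margin(a, b), -margin(b, a))   # swapping sides flips the sign
self_is_draw = np.allclose(margin(a, a), 0.0)              # same competitor on both sides
difference_only = np.allclose(margin(a, b), (a - b) @ w)   # holds because the layer is linear
print(antisymmetric, self_is_draw, difference_only)
```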

5. Embeddings as a learnable lookup table

An Embedding turns an integer id into a learnable vector. Give it the number of possible ids and the size of each vector, and it behaves like a table the network tunes during training. Flatten drops the extra sequence dimension so the result is a plain row per sample.

import numpy as np
from tensorflow.keras.layers import Input, Embedding, Flatten
from tensorflow.keras.models import Model
n_teams = 50
team_in = Input(shape=(1,), name="team_id")
# Map each of the 50 team ids to a single learnable strength value
lookup = Embedding(input_dim=n_teams, output_dim=1, name="strength_lookup")(team_in)
flat = Flatten()(lookup) # shape (None, 1, 1) -> (None, 1)
model = Model(team_in, flat)
ids = np.random.randint(0, n_teams, size=(10, 1))
print(model.predict(ids).shape) # (10, 1)

Embeddings are how you feed categorical things like team ids, product ids, or user ids into a network without one-hot encoding them. The output_dim is how many numbers describe each category.
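An Embedding lookup gives exactly the result you would get by one-hot encoding the id and multiplying by a weight matrix with no bias, but it skips building the mostly-zero one-hot rows. The NumPy sketch below demonstrates the equivalence with a random table standing in for the layer's weights; Embedding(input_dim=50, output_dim=1) stores such a table of shape (50, 1), i.e. 50 parameters.

```python
import numpy as np

n_teams, output_dim = 50, 1
rng = np.random.default_rng(2)
table = rng.normal(size=(n_teams, output_dim))   # stand-in for the Embedding weight matrix

ids = np.array([3, 17, 3, 42])                   # integer team ids, with one repeat

# What Embedding does: select one row of the table per id
looked_up = table[ids]                           # shape (4, 1)

# The equivalent but wasteful route: one-hot encode, then multiply by the same table
one_hot = np.eye(n_teams)[ids]                   # shape (4, 50), mostly zeros
via_one_hot = one_hot @ table                    # shape (4, 1)

print(np.allclose(looked_up, via_one_hot))
```

Note that both occurrences of id 3 get the same row, which is what makes the table a per-category parameter rather than a per-sample one.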

6. The strength model: a shared embedding for two teams

Now combine the last two ideas. A single embedding, reused for both teams, learns one strength value per team, and Subtract predicts the margin from the difference. This is a complete, trainable model in a handful of lines.

import numpy as np
from tensorflow.keras.layers import Input, Embedding, Flatten, Subtract
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
n_teams = 50
# One embedding reused for both teams = a shared strength table
strength = Embedding(input_dim=n_teams, output_dim=1, name="strength")
team_a = Input(shape=(1,), name="team_a")
team_b = Input(shape=(1,), name="team_b")
a_strength = Flatten()(strength(team_a))
b_strength = Flatten()(strength(team_b))
predicted_margin = Subtract()([a_strength, b_strength])
model = Model([team_a, team_b], predicted_margin)
# A larger learning rate than Adam's default (0.001) lets the strengths move far enough in a few epochs
model.compile(optimizer=Adam(learning_rate=0.05), loss="mean_absolute_error")
a = np.random.randint(0, n_teams, (1000, 1))
b = np.random.randint(0, n_teams, (1000, 1))
# Simulate results from hidden "true" strengths plus noise, so there is a real signal to learn
true_strength = np.random.randn(n_teams) * 3
margins = true_strength[a] - true_strength[b] + np.random.randn(1000, 1)
model.fit([a, b], margins, epochs=20, batch_size=64, verbose=1)

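After training, you can read the embedding weights directly to get a strength for every team and sort them into a ranking. Those strengths are only pinned down up to a shared constant, because adding the same amount to every team leaves every predicted margin unchanged, so compare teams with each other rather than reading the raw values as absolute. The model has effectively learned a rating system from match results alone.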
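In Keras the learned table is available as model.get_layer("strength").get_weights()[0], an array of shape (n_teams, 1). To show what that table represents without needing a trained network, the NumPy sketch below swaps in a different fitting method, ordinary least squares, on simulated noise-free results; it is not the article's MAE-trained model. Each game becomes a row with +1 for team a and -1 for team b, and solving for one number per team recovers the hidden strengths up to the shared constant mentioned above. The true_strength values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_teams, n_games = 50, 1000
true_strength = rng.normal(scale=3.0, size=n_teams)   # hypothetical "real" ratings

a = rng.integers(0, n_teams, n_games)
b = rng.integers(0, n_teams, n_games)
margins = true_strength[a] - true_strength[b]         # noise-free results, for clarity

# One row per game: +1 for team a and -1 for team b, so row @ strengths = margin
design = np.zeros((n_games, n_teams))
design[np.arange(n_games), a] += 1.0
design[np.arange(n_games), b] -= 1.0                  # a team playing itself gives an all-zero row

# Least-squares fit of one strength per team (a squared-error stand-in for the MAE model)
estimated, *_ = np.linalg.lstsq(design, margins, rcond=None)

# Only differences are identifiable, so centre both sets of strengths before comparing
recovered = estimated - estimated.mean()
target = true_strength - true_strength.mean()
print(np.allclose(recovered, target))

ranking = np.argsort(-recovered)                       # team ids from strongest to weakest
print(ranking[:5])
```

With real, noisy margins the recovered values would only approximate the hidden ones, but the ranking logic stays the same: sort the table, strongest first.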

7. Two outputs, two losses

A model can produce several outputs, each with its own loss. Here one body feeds two heads: a regression head for the margin and a classification head for the win probability. Pass a list of outputs and a matching list of losses.

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
inputs = Input(shape=(6,))
body = Dense(16, activation="relu")(inputs)
margin_out = Dense(1, name="margin")(body) # regression head
win_out = Dense(1, activation="sigmoid", name="win")(body) # classification head
model = Model(inputs=inputs, outputs=[margin_out, win_out])
model.compile(optimizer="adam",
              loss=["mean_absolute_error", "binary_crossentropy"])  # one loss per head
X = np.random.rand(300, 6)
y_margin = np.random.randn(300, 1) * 5
y_win = (y_margin > 0).astype("float32")
model.fit(X, [y_margin, y_win], epochs=3, verbose=1)

The regression head uses a linear activation and mean absolute error, while the classification head uses a sigmoid and binary crossentropy. Keras trains both at once and reports each loss separately.
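The number the optimizer actually minimises is a weighted sum of the per-head losses. Keras weights every head by 1.0 unless you pass loss_weights to compile. The NumPy sketch below computes both losses by hand for two made-up samples; the helper functions are illustrative and only approximate Keras's own implementations, which apply similar clipping. One thing it makes visible: a margin loss measured in points can be much larger than a crossentropy near 0.7, so loss_weights is the knob to rebalance the heads if one dominates.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error for the regression head
    return float(np.mean(np.abs(y_true - y_pred)))

def binary_crossentropy(y_true, p, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log() stays finite
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

y_margin = np.array([4.0, -3.0])
pred_margin = np.array([2.0, -1.0])
y_win = np.array([1.0, 0.0])
pred_win = np.array([0.9, 0.2])

margin_loss = mae(y_margin, pred_margin)          # (2 + 2) / 2 = 2.0
win_loss = binary_crossentropy(y_win, pred_win)   # -(log 0.9 + log 0.8) / 2

# Weighted sum of the head losses; 1.0 per head mirrors Keras's default without loss_weights
loss_weights = [1.0, 1.0]
total = loss_weights[0] * margin_loss + loss_weights[1] * win_loss
print(margin_loss, round(win_loss, 4), round(total, 4))
```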

8. Chained heads: classification built on the regression output

Outputs can feed each other. If a positive margin means a win, the win probability can be derived from the predicted margin rather than from the body directly. You simply call the second head on the first head’s tensor.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
inputs = Input(shape=(6,))
body = Dense(16, activation="relu")(inputs)
margin_out = Dense(1, name="margin")(body) # predict the margin first
win_out = Dense(1, activation="sigmoid", name="win")(margin_out) # then map margin to a probability
model = Model(inputs=inputs, outputs=[margin_out, win_out])
model.compile(optimizer="adam",
              loss=["mean_absolute_error", "binary_crossentropy"])
model.summary()

The win head has just two parameters, one weight and one bias, because its only input is a single number: the predicted margin. This is the in-graph version of stacking one model on top of another, and the Functional API makes the dependency explicit.
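Those two parameters turn the win head into a logistic regression on the predicted margin: sigmoid(w * margin + b). The NumPy sketch below uses illustrative values for w and b, not trained ones, to show the shape of the mapping. With w > 0 the win probability rises with the margin, and with b = 0 a predicted margin of zero maps to exactly 0.5; a trained bias would shift that break-even point.

```python
import numpy as np

def win_head(margin, w, b):
    # Dense(1, activation="sigmoid") applied to a single input: sigmoid(w * margin + b)
    return 1.0 / (1.0 + np.exp(-(w * margin + b)))

w, b = 0.5, 0.0   # illustrative values for the head's two parameters
margins = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probs = win_head(margins, w, b)
print(np.round(probs, 3))

increasing = bool(np.all(np.diff(probs) > 0))   # bigger predicted margin -> higher win probability
print(increasing, probs[2])                      # probs[2] is the zero-margin case
```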

9. Inspect the graph with summary and plot_model

Once a model branches and merges, a picture helps. summary() prints the structure as text, and plot_model saves a diagram showing how the tensors flow and where they join.

from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model
a = Input(shape=(3,), name="a")
b = Input(shape=(2,), name="b")
merged = Concatenate()([a, b])
out = Dense(1)(merged)
model = Model([a, b], out)
model.summary() # text description
plot_model(model, to_file="model.png", show_shapes=True) # saves a diagram

plot_model needs the pydot package and Graphviz installed (pip install pydot plus a system Graphviz). With show_shapes=True the diagram also shows the tensor shapes at each layer, which is the fastest way to catch a mismatch.

10. Save the whole model and load it back

A trained model should be reusable without redefining its architecture. model.save writes the structure, weights, and optimizer state to one file, and load_model brings it all back.

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, load_model
inputs = Input(shape=(4,))
out = Dense(1)(Dense(8, activation="relu")(inputs))
model = Model(inputs, out)
model.compile(optimizer="adam", loss="mean_absolute_error")
model.fit(np.random.rand(100, 4), np.random.rand(100, 1), epochs=1, verbose=0)
# Save architecture + weights + optimizer state in a single file
model.save("strength_model.keras")
# Reload later with no need to rebuild the layers
restored = load_model("strength_model.keras")
print(restored.predict(np.random.rand(2, 4)))

The .keras extension is the current recommended format. The reloaded model is ready to predict immediately, and because the optimizer state is saved too, you can even resume training from where you left off.

Work through these in order and you will have built every pattern from the main article: shared layers, embeddings, multiple inputs, multiple outputs, and chained heads. The Functional API rewards experimentation, so once a model runs, try adding a head, sharing a layer you were not sharing, or widening an embedding, and watch how the summary and the loss respond. That is how the API stops feeling abstract and starts feeling like a set of building blocks.

See you soon.
