The Functional API article explains the ideas: layers as functions you call on tensors, models with several inputs or outputs, shared weights, embeddings as lookup tables, and stitching heads together. This workbook turns those ideas into ten small models you can actually train. The running theme is a league where we predict the score margin between two competitors, but every example generates its own data with NumPy so nothing external is required. Run each one, read the shapes it prints, then change a layer and run it again.
Main article is here: https://datalad.co.uk/advanced-neural-networks-in-keras-the-functional-api/
1. The functional pattern: Input, layer, Model
The Functional API has one core move: create an Input, call each layer on the tensor produced by the previous one, then wrap the start and end into a Model. Unlike with a Sequential model, you hold a reference to every intermediate tensor, which is what makes branching and merging possible.
```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# A symbolic placeholder describing one sample: 4 features
inputs = Input(shape=(4,))

# Call each layer on the previous tensor to connect them
hidden = Dense(8, activation="relu")(inputs)
outputs = Dense(1)(hidden)

# Wrap the graph from inputs to outputs into a trainable model
model = Model(inputs=inputs, outputs=outputs)
model.summary()
```
summary() prints the layer stack, each layer's output shape, and the parameter count. Notice that Dense(8, activation="relu")(inputs) does two things in one expression: Dense(8, ...) creates the layer, and calling it on inputs connects it to the graph.
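You can check the parameter count in the summary by hand. A Dense layer with n_in inputs and n_out units holds an n_in x n_out weight matrix plus n_out biases. The helper below, dense_params, is a hypothetical name for this sketch, not a Keras function.

```python
def dense_params(n_in: int, n_out: int, use_bias: bool = True) -> int:
    """Trainable parameters in a Dense layer: weight matrix plus optional bias vector."""
    return n_in * n_out + (n_out if use_bias else 0)

# The model above: 4 features -> Dense(8) -> Dense(1)
hidden_params = dense_params(4, 8)   # 4*8 weights + 8 biases = 40
output_params = dense_params(8, 1)   # 8*1 weights + 1 bias = 9
total = hidden_params + output_params
print(hidden_params, output_params, total)  # 40 9 49
```

If your arithmetic and the summary disagree, the usual culprit is an input shape that is different from the one you intended.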
2. Compile, fit, and predict
A model is just a graph until you compile it with an optimizer and a loss, then fit it to data. Here we make a simple linear target so the network has something real to learn.
```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

X = np.random.rand(500, 4)
y = X @ np.array([1.5, -2.0, 0.5, 3.0])  # a known linear relationship

inputs = Input(shape=(4,))
hidden = Dense(16, activation="relu")(inputs)
outputs = Dense(1)(hidden)
model = Model(inputs, outputs)

model.compile(optimizer="adam", loss="mean_absolute_error")
model.fit(X, y, epochs=5, batch_size=32, verbose=1)

print(model.predict(X[:3]))  # predictions for the first three samples
```
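Watch the loss fall across the five epochs. mean_absolute_error is a sensible choice for a regression target like a score margin: it reports the average miss in the target's own units, and it is less sensitive than squared error to the occasional lopsided result.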
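To see what "less sensitive" means in numbers, here is a small NumPy comparison. The margins are made-up values: four close games the model nearly gets right and one blowout it misses by 20 points.

```python
import numpy as np

def mae(y_true, y_pred):
    # Average absolute miss, in points
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # Average squared miss, in points squared
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical margins: four close games and one badly missed blowout
y_true = np.array([3.0, -2.0, 5.0, 1.0, 40.0])
y_pred = np.array([2.0, -1.0, 4.0, 2.0, 20.0])

print(mae(y_true, y_pred))  # 4.8
print(mse(y_true, y_pred))  # 80.8
```

The blowout accounts for 20 of the 24 absolute error points, but 400 of the 404 squared error points, so a squared loss would spend most of its effort on that single game.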
3. Two inputs joined with Concatenate
Real problems often have inputs of different shapes, such as a block of numeric stats plus a single flag for home advantage. Concatenate glues them into one tensor that the rest of the network can process.
```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

stats_in = Input(shape=(3,), name="stats")      # three numeric features
flag_in = Input(shape=(1,), name="home_flag")   # one binary flag

merged = Concatenate()([stats_in, flag_in])     # combine the two inputs
hidden = Dense(8, activation="relu")(merged)
out = Dense(1)(hidden)

model = Model(inputs=[stats_in, flag_in], outputs=out)
model.compile(optimizer="adam", loss="mean_absolute_error")

stats = np.random.rand(200, 3)
flags = np.random.randint(0, 2, size=(200, 1))
y = stats.sum(axis=1, keepdims=True) + flags * 2.0  # home flag adds to the margin

model.fit([stats, flags], y, epochs=3, verbose=1)
```
A multi-input model takes a list for both inputs= and the data you pass to fit. The order of the list must match the order of the Input layers.
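To see what Concatenate produces, here is the same join written in NumPy, assuming Concatenate's default axis=-1: each sample's features are laid end to end, in the order the tensors are listed. The arrays mirror the example's stats and flags but are regenerated here with a seeded generator.

```python
import numpy as np

rng = np.random.default_rng(0)
stats = rng.random((200, 3))               # three numeric features per sample
flags = rng.integers(0, 2, size=(200, 1))  # one binary flag per sample

# Join along the last axis, sample by sample; list order sets column order
merged = np.concatenate([stats, flags.astype("float64")], axis=-1)
print(merged.shape)  # (200, 4)
```

Because flags is listed second, it lands in the last column. Swap the list and it would become the first column, which is why the list order in Model(inputs=...) and in fit must agree.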
4. A shared layer applied to both sides
When two inputs are symmetric, such as two competitors rated on the same scale, you want the same weights to score both of them. Define the layer once, then call it twice. Subtract then gives the difference between the two scores.
```python
from tensorflow.keras.layers import Input, Dense, Subtract
from tensorflow.keras.models import Model

# One transformation, reused for both competitors
rate_strength = Dense(1, use_bias=False, name="strength")

comp_a = Input(shape=(5,), name="competitor_a")
comp_b = Input(shape=(5,), name="competitor_b")

strength_a = rate_strength(comp_a)  # same weights here...
strength_b = rate_strength(comp_b)  # ...and reused here

margin = Subtract()([strength_a, strength_b])  # difference in strength

model = Model(inputs=[comp_a, comp_b], outputs=margin)
model.compile(optimizer="adam", loss="mean_absolute_error")
model.summary()
```
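Look at the parameter count: there is only one strength layer, holding 5 weights (one per feature, no bias), so the weights applied to competitor A and competitor B are identical by construction. That is weight sharing, and a Sequential model cannot express it, because it is a single linear stack with one input and no second branch to reuse a layer on.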
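Sharing also gives the model a useful symmetry. The NumPy sketch below mirrors the strength layer as a plain matrix product with one weight vector w, an assumption standing in for the layer's kernel (a Dense layer with no bias and no activation computes x @ kernel). Swapping the competitors flips the sign of the margin, and a competitor against itself scores exactly zero. Two separate Dense layers would not guarantee either property.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(5, 1))  # one shared weight vector, standing in for the "strength" kernel

def predicted_margin(comp_a, comp_b):
    # Same w on both sides, then subtract: equivalent to (comp_a - comp_b) @ w
    return comp_a @ w - comp_b @ w

a = rng.random((4, 5))  # four hypothetical competitors
b = rng.random((4, 5))  # and their four opponents

print(np.allclose(predicted_margin(a, b), -predicted_margin(b, a)))  # True
print(np.allclose(predicted_margin(a, a), 0.0))                      # True
```

A shared bias would cancel in the subtraction anyway, so use_bias=False removes a parameter that could never affect the margin.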
5. Embeddings as a learnable lookup table
An Embedding turns an integer id into a learnable vector. Give it the number of possible ids and the size of each vector, and it behaves like a table the network tunes during training. Flatten drops the extra sequence dimension so the result is a plain row per sample.
```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Flatten
from tensorflow.keras.models import Model

n_teams = 50
team_in = Input(shape=(1,), name="team_id")

# Map each of the 50 team ids to a single learnable strength value
lookup = Embedding(input_dim=n_teams, output_dim=1, name="strength_lookup")(team_in)
flat = Flatten()(lookup)  # shape (None, 1, 1) -> (None, 1)

model = Model(team_in, flat)

ids = np.random.randint(0, n_teams, size=(10, 1))
print(model.predict(ids).shape)  # (10, 1)
```
Embeddings are how you feed categorical things like team ids, product ids, or user ids into a network without one-hot encoding them. The output_dim is how many numbers describe each category.
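The phrase "lookup table" can be taken literally: an embedding lookup gives the same result as one-hot encoding the id and multiplying by a weight matrix, but it skips building the mostly zero one-hot rows. In the NumPy sketch below, a random table stands in for trained Embedding weights.

```python
import numpy as np

rng = np.random.default_rng(2)
n_teams, output_dim = 50, 1
table = rng.normal(size=(n_teams, output_dim))  # stands in for the Embedding weights

ids = rng.integers(0, n_teams, size=10)

looked_up = table[ids]          # row lookup, which is what an Embedding layer does
one_hot = np.eye(n_teams)[ids]  # (10, 50) one-hot encoding of the same ids
via_matmul = one_hot @ table    # the one-hot-then-multiply equivalent

print(looked_up.shape, np.allclose(looked_up, via_matmul))  # (10, 1) True
```

With 50 teams the difference is negligible, but with millions of product or user ids the one-hot route would be a huge, almost entirely zero matrix.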
6. The strength model: a shared embedding for two teams
Now combine the last two ideas. A single embedding, reused for both teams, learns one strength value per team, and Subtract predicts the margin from the difference. This is a complete, trainable model in a handful of lines.
```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Flatten, Subtract
from tensorflow.keras.models import Model

n_teams = 50

# One embedding reused for both teams = a shared strength table
strength = Embedding(input_dim=n_teams, output_dim=1, name="strength")

team_a = Input(shape=(1,), name="team_a")
team_b = Input(shape=(1,), name="team_b")

a_strength = Flatten()(strength(team_a))
b_strength = Flatten()(strength(team_b))
predicted_margin = Subtract()([a_strength, b_strength])

model = Model([team_a, team_b], predicted_margin)
model.compile(optimizer="adam", loss="mean_absolute_error")

a = np.random.randint(0, n_teams, (1000, 1))
b = np.random.randint(0, n_teams, (1000, 1))
margins = np.random.randn(1000, 1) * 5

model.fit([a, b], margins, epochs=3, batch_size=64, verbose=1)
```
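After training you could read the embedding weights directly, with model.get_layer("strength").get_weights()[0], to get one strength value per team and rank them. In this example the margins are random noise, so the learned values carry no meaning; trained on real match results, the same model effectively learns a rating system from those results alone. Because only differences between strengths enter the prediction, the ratings are determined only up to a constant shift, which does not change the ranking.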
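To watch the rating idea actually work, the sketch below generates margins from hidden team strengths (hypothetical values, plus noise) instead of pure noise, trains the same shared-embedding model, and reads the table back. It assumes a small league of 10 teams and swaps in mean squared error and Adam with learning_rate=0.05 so it converges in 20 quick epochs; these are assumptions of this sketch, not settings from the article.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Flatten, Subtract
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import set_random_seed

set_random_seed(0)                  # seeds Python, NumPy and the backend
rng = np.random.default_rng(0)

n_teams, n_games = 10, 2000
true_strength = rng.normal(scale=3.0, size=n_teams)  # hypothetical hidden ratings

a = rng.integers(0, n_teams, size=(n_games, 1))
b = rng.integers(0, n_teams, size=(n_games, 1))
# Margin = strength difference plus match-day noise
margins = (true_strength[a] - true_strength[b]) + rng.normal(size=(n_games, 1))

strength = Embedding(input_dim=n_teams, output_dim=1, name="strength")
team_a = Input(shape=(1,), name="team_a")
team_b = Input(shape=(1,), name="team_b")
predicted = Subtract()([Flatten()(strength(team_a)), Flatten()(strength(team_b))])

model = Model([team_a, team_b], predicted)
model.compile(optimizer=Adam(learning_rate=0.05), loss="mean_squared_error")
model.fit([a, b], margins, epochs=20, batch_size=64, verbose=0)

# The embedding table is the learned rating: one row per team
learned = model.get_layer("strength").get_weights()[0][:, 0]
ranking = np.argsort(-learned)  # team ids, strongest first
corr = np.corrcoef(learned, true_strength)[0, 1]
print("ranking:", ranking)
print("correlation with hidden strengths:", corr)
```

Expect a correlation close to 1. Two teams with nearly equal hidden strengths may still swap places in the ranking because of the noise in the margins.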
7. Two outputs, two losses
A model can produce several outputs, each with its own loss. Here one body feeds two heads: a regression head for the margin and a classification head for the win probability. Pass a list of outputs and a matching list of losses.
```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(6,))
body = Dense(16, activation="relu")(inputs)

margin_out = Dense(1, name="margin")(body)                  # regression head
win_out = Dense(1, activation="sigmoid", name="win")(body)  # classification head

model = Model(inputs=inputs, outputs=[margin_out, win_out])
model.compile(optimizer="adam",
              loss=["mean_absolute_error", "binary_crossentropy"])  # one loss per head

X = np.random.rand(300, 6)
y_margin = np.random.randn(300, 1) * 5
y_win = (y_margin > 0).astype("float32")

model.fit(X, [y_margin, y_win], epochs=3, verbose=1)
```
The regression head keeps Dense's default linear activation and uses mean absolute error, while the classification head uses a sigmoid and binary crossentropy. Keras trains both heads at once by minimizing the sum of the two losses and reports each loss separately.
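Because the total is a plain sum, a margin loss measured in points can dominate a crossentropy that sits around 0.7, leaving the win head with relatively little say. The loss_weights argument of compile rebalances the sum. In this sketch the weight of 5.0 on the win head is an arbitrary example value, not a recommendation.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(6,))
body = Dense(16, activation="relu")(inputs)
margin_out = Dense(1, name="margin")(body)
win_out = Dense(1, activation="sigmoid", name="win")(body)

model = Model(inputs=inputs, outputs=[margin_out, win_out])
# Total loss = 1.0 * MAE(margin) + 5.0 * BCE(win); 5.0 is an arbitrary example weight
model.compile(
    optimizer="adam",
    loss=["mean_absolute_error", "binary_crossentropy"],
    loss_weights=[1.0, 5.0],
)

rng = np.random.default_rng(0)
X = rng.random((300, 6))
y_margin = rng.normal(size=(300, 1)) * 5
y_win = (y_margin > 0).astype("float32")

history = model.fit(X, [y_margin, y_win], epochs=3, verbose=0)
print(history.history["loss"])  # the weighted total, one value per epoch
```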
8. Chained heads: classification built on the regression output
Outputs can feed each other. If a positive margin means a win, the win probability can be derived from the predicted margin rather than from the body directly. You simply call the second head on the first head’s tensor.
```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(6,))
body = Dense(16, activation="relu")(inputs)

margin_out = Dense(1, name="margin")(body)                        # predict the margin first
win_out = Dense(1, activation="sigmoid", name="win")(margin_out)  # then map margin to a probability

model = Model(inputs=inputs, outputs=[margin_out, win_out])
model.compile(optimizer="adam",
              loss=["mean_absolute_error", "binary_crossentropy"])
model.summary()
```
The win head has exactly two parameters, one weight and one bias, because its only input is a single number: the predicted margin. This is the in-graph version of stacking one model on top of another, and the Functional API makes the dependency explicit.
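To confirm those counts, the sketch below rebuilds the chained model and asks each layer for count_params(). The name "body" is added here only so the layer can be looked up with get_layer. The win head computes sigmoid(w * margin + b), so with a positive w the predicted probability crosses 0.5 where the margin equals -b / w.

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(6,))
body = Dense(16, activation="relu", name="body")(inputs)
margin_out = Dense(1, name="margin")(body)
win_out = Dense(1, activation="sigmoid", name="win")(margin_out)
model = Model(inputs=inputs, outputs=[margin_out, win_out])

counts = {name: model.get_layer(name).count_params() for name in ["body", "margin", "win"]}
for name, n in counts.items():
    print(name, n)
# body 112   (6*16 weights + 16 biases)
# margin 17  (16 weights + 1 bias)
# win 2      (1 weight + 1 bias)
```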
9. Inspect the graph with summary and plot_model
Once a model branches and merges, a picture helps. summary() prints the structure as text, and plot_model saves a diagram showing how the tensors flow and where they join.
```python
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model

a = Input(shape=(3,), name="a")
b = Input(shape=(2,), name="b")
merged = Concatenate()([a, b])
out = Dense(1)(merged)
model = Model([a, b], out)

model.summary()                                           # text description
plot_model(model, to_file="model.png", show_shapes=True)  # saves a diagram
```
plot_model needs the pydot package and Graphviz installed (pip install pydot plus a system Graphviz). With show_shapes=True the diagram shows the input and output shapes of every layer, which is the fastest way to catch a mismatch.
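If Graphviz is not available, you can still check shapes in plain Python by walking model.layers and reading each layer's output tensor. This is a minimal sketch. The explicit names "merged" and "out" are added here so the layers are easy to find, and the batch dimension shows up as None.

```python
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

a = Input(shape=(3,), name="a")
b = Input(shape=(2,), name="b")
merged = Concatenate(name="merged")([a, b])
out = Dense(1, name="out")(merged)
model = Model([a, b], out)

# Functional models list their Input layers too, so every node in the graph appears
shapes = {layer.name: tuple(layer.output.shape) for layer in model.layers}
for name, shape in shapes.items():
    print(name, shape)
```

The merged layer should report a last dimension of 5, the 3 features from a plus the 2 from b.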
10. Save the whole model and load it back
A trained model should be reusable without redefining its architecture. model.save writes the structure, weights, and optimizer state to one file, and load_model brings it all back.
```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, load_model

inputs = Input(shape=(4,))
out = Dense(1)(Dense(8, activation="relu")(inputs))
model = Model(inputs, out)
model.compile(optimizer="adam", loss="mean_absolute_error")
model.fit(np.random.rand(100, 4), np.random.rand(100, 1), epochs=1, verbose=0)

# Save architecture + weights + optimizer state in a single file
model.save("strength_model.keras")

# Reload later with no need to rebuild the layers
restored = load_model("strength_model.keras")
print(restored.predict(np.random.rand(2, 4)))
```
The .keras extension is the current recommended format. The reloaded model is ready to predict immediately, and because the optimizer state is saved too, you can even resume training from where you left off.
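A quick way to trust a save is to check that the reloaded model gives the same predictions as the original and then keep training it. This sketch writes to a temporary directory (an assumption, so it leaves no file behind in your working folder) and uses a seeded generator for the data.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, load_model

rng = np.random.default_rng(0)
X, y = rng.random((100, 4)), rng.random((100, 1))

inputs = Input(shape=(4,))
out = Dense(1)(Dense(8, activation="relu")(inputs))
model = Model(inputs, out)
model.compile(optimizer="adam", loss="mean_absolute_error")
model.fit(X, y, epochs=1, verbose=0)

path = os.path.join(tempfile.mkdtemp(), "strength_model.keras")
model.save(path)
restored = load_model(path)

# Same weights, so the same predictions
same = np.allclose(model.predict(X[:5], verbose=0), restored.predict(X[:5], verbose=0))
print("predictions match:", same)

# The reloaded model is already compiled, so training can simply continue
history = restored.fit(X, y, epochs=2, verbose=0)
print(history.history["loss"])
```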
Work through these in order and you will have built every pattern from the main article: shared layers, embeddings, multiple inputs, multiple outputs, and chained heads. The Functional API rewards experimentation, so once a model runs, try adding a head, sharing a layer you were not sharing, or widening an embedding, and watch how the summary and the loss respond. That is how the API stops feeling abstract and starts feeling like a set of building blocks.
See you soon.