What to read in Haykin's first edition
1. Introduction
1.1 What Is a Neural Network?
Important
1.2 Structural Levels of Organization in the Brain
Browse
1.3 Models of a Neuron
Important
1.4 Neural Networks Viewed as Directed Graphs
Browse
1.5 Feedback
Browse
1.6 Network Architectures
Important (but the Lattice network category is redundant)
1.7 Knowledge Representation
Defines four general rules of knowledge representation, which should
be followed independently of the application and ANN architecture.
The rules are important. The rest of the section can be browsed.
1.8 - 1.10
Browse
2. Learning Process
The first time you read this chapter, browse through sections
2.1-2.10 on a conceptual level (sections 2.11-2.13 can be
skipped). When you're through, you should be able to give a general
answer to the question "What is X?", where X is any section heading,
e.g. X="The Credit-Assignment Problem". The details can be skipped at
this time.
Later you should re-read section 2.8 (important), since it is the
only place in this edition of the book where reinforcement learning is
addressed. You may, if you wish, skip the details on the Adaptive
Heuristic Critic, though.
3. Correlation Matrix Memory
Skip the whole chapter at this time. You may want to return here,
though, if you get stuck with the matrix algebra used later in the
book.
4. The Perceptron
4.1 Introduction
Read
4.2 Basic Considerations
Important
4.3 The Perceptron Convergence Theorem
Important, but you may browse or skip the proof.
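For reference, the learning rule whose convergence the theorem
guarantees can be sketched in a few lines. The variable names and the
AND example below are my own, not Haykin's:

```python
import numpy as np

# Sketch of the perceptron learning rule. Inputs carry a fixed bias
# component; targets are in {-1, +1}. The convergence theorem says this
# loop terminates when the classes are linearly separable.
def train_perceptron(X, y, eta=1.0, max_epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(X, y):
            if np.sign(w @ x) != t:   # misclassified (sign 0 counts as wrong)
                w += eta * t * x      # move w toward the correct side
                errors += 1
        if errors == 0:               # converged: every pattern is correct
            break
    return w

# The AND function, with a bias input of 1 (linearly separable):
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = train_perceptron(X, y)
```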
4.4 Performance Measure
Skip
4.5 Maximum-Likelihood Gaussian Classifier
Skip
4.6 Discussion
Read
5. Least-Mean-Square Algorithm
You should read section 5.4. The rest of the chapter can be browsed.
6. Multilayer Perceptrons
6.1 Introduction
Read
6.2 Some Preliminaries
Important
6.3 Derivation of the Back-Propagation Algorithm
Read
Note: When the back-propagation algorithm was introduced in the PDP
books by Rumelhart et al., they called it the generalized delta
rule (the name backprop came later). Haykin, however, defines the
generalized delta rule as the delta rule extended by a momentum
term (pages 149-150).
6.4 Summary of the Back-Propagation Algorithm
Read
6.5 Initialization
Browse
6.6 The XOR Problem
Read
6.7 Some Hints for Making the Back-Propagation Algorithm Perform Better
Read
6.8 Output Representation and Decision Rule
Browse
6.9 Computer Experiment
Skip
6.10 Generalization
Read
6.11 Cross-Validation
Read
6.12 Approximations of Functions
Skip
6.13 Back-Propagation and Differentiation
Skip
6.14 Virtues and Limitations of Back-Propagation Learning
Read
6.15 Accelerated Convergence of Back-Propagation Through Learning-Rate Adaptation
Read the introduction. Skip from the 'Delta-Bar-Delta' subsection onwards.
6.16 Fuzzy Control of Back-Propagation Learning
Skip
6.17 Network-Pruning Techniques
Read up to and including the section on "Weight Decay". You may skip
the rest.
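The weight-decay idea can be sketched as one extra term in the
gradient step. The quadratic penalty (lam/2 * w^2 added to the cost)
and the parameter names below are my own assumptions:

```python
# Gradient step with weight decay: every step shrinks each weight by a
# factor (1 - eta*lam), so weights that the error gradient does not
# support drift toward zero and become candidates for pruning.
def decay_step(w, grad, eta=0.1, lam=0.01):
    return w - eta * grad - eta * lam * w

# A weight whose error gradient is zero decays geometrically:
w = 1.0
for _ in range(100):
    w = decay_step(w, grad=0.0)
```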
6.18 - 6.20
Skip
6.21 Discussion
Read
6.22 Applications
Skip
7. Radial-Basis Function Networks
Read the introduction (7.1) and the comparison of RBF and MLP
(7.9). Skip the rest. See the handed-out material (excerpt from
Hassoun) for a better description of RBFs.
8. Recurrent Networks Rooted in Statistical Physics
8.1 Introduction
Read
8.2 Dynamical Considerations
Browse
8.3 The Hopfield Network
Read
Good description, but to the casual reader it appears to differ from
the version presented at the lecture in some places.
Haykin includes a threshold term in the formal definition (Eq 8.4),
but it is never used (always set to 0).
In Equation 8.6, the transfer function is given as a signum function,
but this is not the transfer function used in the following text. In
the text and in all experiments, he introduces the special case for
v=0 (stay in the state you're already in), as if it were
unimportant. It is, however, necessary for proving convergence, and
should be included in the formal definition.
When setting the weights, he divides all weights by N (the number
of nodes). This does not affect the behaviour of the network, but it
has the potential disadvantage that the weights are no longer integers.
Nitpicking: The end of the second paragraph on page 293 states
that "We may therefore make the statement that the Hopfield network
will always converge to a stable state when the retrieval operation is
performed asynchronously". The statement is true, but it does not
follow from the earlier discussion, as the sentence suggests it does.
In the example on page 294, Haykin seems to have missed the fact
that storing a pattern implicitly also stores its inverse. In the
example network he has explicitly stored a pattern and its inverse,
which is the same thing as storing the same pattern twice. The example
would have been better if the second stored pattern had been something
other than the inverse of the first.
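The point about the inverse pattern is easy to check in code. A
minimal Hopfield sketch (my own variable names; Hebbian outer-product
weights divided by N as Haykin does, zero diagonal, and the
stay-at-zero rule made explicit) shows that a stored pattern and its
inverse are both fixed points:

```python
import numpy as np

# Hebbian storage: W = (1/N) * sum of outer products, zero diagonal.
def store(patterns):
    N = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / N
    np.fill_diagonal(W, 0.0)
    return W

# One update pass: sgn of the induced field, keeping the old state
# where the field is exactly 0 (the rule the text uses implicitly).
def update(W, s):
    v = W @ s
    return np.where(v > 0, 1, np.where(v < 0, -1, s))

p = np.array([1, -1, 1, -1, 1])
W = store(p[None, :])
# Both p and -p are stable states of the network storing only p.
```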
8.4 Computer Experiment I
Browse
Haykin uses the notion of an iteration here, for asynchronous
updating. It is not obvious what an iteration is in this case, so it
should be defined. Judging from the experiments, an iteration seems to
be one node changing its value. In other words, the Hamming
distance between two successive iterations is always 1.
This definition also explains Haykin's statement at the bottom of
page 299 that: "Since, in theory, one-quarter of the 120 neurons of
the Hopfield network end up changing state for each corrupted pattern,
the number of iterations needed for recall, on average, is 30."
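Under this reading of "iteration", a recall sketch might look like the
following (my own code, counting only the updates that actually flip a
node):

```python
import numpy as np

rng = np.random.default_rng(0)

# Asynchronous recall: visit nodes in random order; one "iteration" is
# one node actually changing its state, so successive network states
# differ by Hamming distance 1.
def recall_async(W, s, max_sweeps=100):
    s = s.copy()
    iterations = 0
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            v = W[i] @ s
            new = s[i] if v == 0 else (1 if v > 0 else -1)
            if new != s[i]:
                s[i] = new
                iterations += 1        # count only actual state changes
                changed = True
        if not changed:                # stable state reached
            break
    return s, iterations

p = np.array([1, 1, 1, -1, -1, -1, 1, -1])
W = np.outer(p, p) / len(p)
np.fill_diagonal(W, 0.0)

probe = p.copy()
probe[0] = -probe[0]                   # corrupt one bit
s, iters = recall_async(W, probe)      # recovers p in one iteration
```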
8.5 Energy Function
Read
The rightmost column in Table 8.1 is a bit confusing - it shows the
difference in energy between the state on that row and the state in
the previous row. Why this is interesting is a mystery.
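The property that makes the energy function useful can be checked
directly: under asynchronous updates, E(s) = -1/2 s^T W s never
increases, which is what guarantees convergence to a stable state. A
small sketch (my own code, with arbitrary stored patterns):

```python
import numpy as np

# Hopfield energy function for state s and symmetric weights W.
def energy(W, s):
    return -0.5 * s @ W @ s

p1 = np.array([1, -1, 1, 1, -1, 1, -1, -1])
p2 = np.array([1, 1, -1, 1, -1, -1, 1, -1])
N = len(p1)
W = (np.outer(p1, p1) + np.outer(p2, p2)) / N
np.fill_diagonal(W, 0.0)

rng = np.random.default_rng(1)
s = rng.choice([-1, 1], size=N)       # random initial state
energies = [energy(W, s)]
for _ in range(5):                    # a few asynchronous sweeps
    for i in rng.permutation(N):
        v = W[i] @ s
        if v != 0:                    # stay-at-zero rule
            s[i] = 1 if v > 0 else -1
        energies.append(energy(W, s))
# energies is a non-increasing sequence.
```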
8.6 - 8.15
Skip
9. Self-Organizing Systems I: Hebbian Learning
When you're finished with this chapter, you should go back to the
subsection "Hidden Units" on page 187 for comparison.
9.1 Introduction
Read
9.2 Some Intuitive Principles of Self-Organization
Read
9.3 Self-Organized Feature Analysis
Skip
9.4 Discussion
Skip
9.5 Principal Component Analysis
Browse the introductory section.
Skip the eigenstructure and data representation subsections.
Browse the subsection on Dimensionality Reduction.
Read Example 1 on page 369.
9.6 - 9.9
Skip
9.10 How Useful is Principal Component Analysis
Browse
10. Self-Organizing Systems II: Competitive Learning
10.1 Introduction
Read
10.2 Computational Maps in the Cerebral Cortex
Skip
10.3 Two Basic Feature-Mapping Models
Read
10.4 Modification of Stimulus by Lateral Feedback
Read the introductory subsection. Skip from the computer experiment onwards.
10.5 Self-Organizing Feature-Mapping Algorithm
Read
Also, return to section 2.4 for comparison.
10.6 Properties of the SOFM Algorithm
Skip
10.7 Reformulation of the Topological Neighborhood
Read
10.8 - 10.10
Skip
11. Self-Organizing Systems III: Information-Theoretic Models
Skip
12. Modular Networks
Skip
13. Temporal Processing
Skip
14. Neurodynamics
Skip
15. VLSI Implementations of Neural Networks
Skip
Olle Gällmo
Department of Information Technology Email: crwth@docs.uu.se
Box 337 Phone: +46 18 471 10 09
S-751 05 Uppsala Fax: +46 18 55 02 25
Sweden