What to read in Haykin's first edition

1. Introduction

1.1 What Is a Neural Network?

Important

1.2 Structural Levels of Organization in the Brain

Browse

1.3 Models of a Neuron

Important

1.4 Neural Networks Viewed as Directed Graphs

Browse

1.5 Feedback

Browse

1.6 Network Architectures

Important (but the Lattice network category is redundant)

1.7 Knowledge Representation

Defines four general rules of knowledge representation, which should be followed independently of the application and the ANN architecture.

The rules are important. The rest of the section can be browsed.

1.8 - 1.10

Browse

2. Learning Process

The first time you read this chapter, browse through sections 2.1-2.10 on a conceptual level (sections 2.11-2.13 can be skipped). When you're through, you should be able to give a general answer to the question "What is X?", where X is any section heading, e.g. X="The Credit-Assignment Problem". The details can be skipped at this time.

Later you should re-read section 2.8 (important), since it is the only place in this edition of the book where reinforcement learning is addressed. You may, if you wish, skip the details on the Adaptive Heuristic Critic, though.


3. Correlation Matrix Memory

Skip the whole chapter at this time. You may want to return here, though, if you get stuck with the matrix algebra used later in the book.

4. The Perceptron

4.1 Introduction

Read

4.2 Basic Considerations

Important

4.3 The Perceptron Convergence Theorem

Important, but you may browse or skip the proof.
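
The rule the theorem is about is small enough to sketch (assuming bipolar {-1, +1} targets and the bias handled as an extra constant input; the training data below is an illustrative, linearly separable problem, not from the book):

```python
# Minimal perceptron learning rule sketch. The convergence theorem
# guarantees this loop terminates if the classes are linearly separable.
def train_perceptron(samples, eta=1.0, max_epochs=100):
    """samples: list of (x, d) with x a tuple of inputs, d in {-1, +1}."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                      # last weight acts as the bias
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:
            xa = list(x) + [1.0]             # augment input with constant 1
            v = sum(wi * xi for wi, xi in zip(w, xa))
            y = 1 if v >= 0 else -1
            if y != d:                       # update only on misclassification
                w = [wi + eta * d * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                      # converged: all patterns correct
            return w
    return w

# Illustrative linearly separable problem (logical AND in {-1, +1} coding)
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(data)
```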

4.4 Performance Measure

Skip

4.5 Maximum-Likelihood Gaussian Classifier

Skip

4.6 Discussion

Read

5. Least-Mean-Square Algorithm

You should read section 5.4. The rest of the chapter can be browsed.
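
Section 5.4 covers the LMS (Widrow-Hoff) update w <- w + eta*e*x with e = d - w.x. A minimal sketch (the target function, learning rate, and input distribution are illustrative choices, not from the book):

```python
import random

# One LMS step: error of the linear combiner, then a gradient-style update.
def lms_step(w, x, d, eta):
    e = d - sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * e * xi for wi, xi in zip(w, x)], e

# Illustrative task: learn the noiseless linear mapping d = 2*x1 - x2.
random.seed(0)
w = [0.0, 0.0]
for _ in range(2000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    d = 2 * x[0] - 1 * x[1]
    w, e = lms_step(w, x, d, eta=0.1)
# w now approximates [2, -1]
```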

6. Multilayer Perceptrons

6.1 Introduction

Read

6.2 Some Preliminaries

Important

6.3 Derivation of the Back-Propagation Algorithm

Read

Note: When the back-propagation algorithm was introduced in the PDP books by Rumelhart et al., they called it the generalized delta rule (the name "backprop" came later). Haykin, however, defines the generalized delta rule as the delta rule extended with a momentum term (pages 149-150).
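
Haykin's definition amounts to dw(n) = alpha*dw(n-1) + eta*delta(n)*y(n). A minimal sketch (the constant gradient signal and the values of eta and alpha are illustrative choices, not from the book):

```python
# Delta-rule weight change plus a momentum term, as in Haykin's
# "generalized delta rule": dw(n) = alpha*dw(n-1) + eta*delta(n)*y(n).
def momentum_update(dw_prev, delta, y, eta=0.1, alpha=0.9):
    return alpha * dw_prev + eta * delta * y

dw = 0.0
history = []
for _ in range(5):
    dw = momentum_update(dw, delta=1.0, y=1.0)   # constant gradient signal
    history.append(dw)
# With a constant gradient the momentum term accumulates the step,
# approaching eta / (1 - alpha) = 1.0 in the limit.
```

This illustrates why momentum accelerates learning on long, consistent slopes of the error surface while damping oscillations when the gradient keeps changing sign.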

6.4 Summary of the Back-Propagation Algorithm

Read

6.5 Initialization

Browse

6.6 The XOR Problem

Read

6.7 Some Hints for Making the Back-Propagation Algorithm Perform Better

Read

6.8 Output Representation and Decision Rule

Browse

6.9 Computer Experiment

Skip

6.10 Generalization

Read

6.11 Cross-Validation

Read

6.12 Approximations of Functions

Skip

6.13 Back-Propagation and Differentiation

Skip

6.14 Virtues and Limitations of Back-Propagation Learning

Read

6.15 Accelerated Convergence of Back-Propagation Through Learning-Rate Adaptation

Read the introduction. Skip from the 'Delta-Bar-Delta' subsection onwards.

6.16 Fuzzy Control of Back-Propagation Learning

Skip

6.17 Network-Pruning Techniques

Read up to and including the section on "Weight Decay". You may skip the rest.

6.18 - 6.20

Skip

6.21 Discussion

Read

6.22 Applications

Skip

7. Radial-Basis Function Networks

Read the introduction (7.1) and the comparison of RBF and MLP (7.9). Skip the rest. See the handed-out material (excerpt from Hassoun) for a better description of RBFs.
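
For orientation, the core idea of 7.1 can be sketched as exact RBF interpolation: one Gaussian basis function per training point, with the output weights obtained by solving a linear system. The data points and the width are illustrative choices, not from the book:

```python
import math

def rbf_fit(xs, ds, width=1.0):
    """Solve the exact interpolation equations phi * w = ds."""
    n = len(xs)
    phi = [[math.exp(-((xs[i] - xs[j]) ** 2) / (2 * width ** 2))
            for j in range(n)] for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix
    a = [row[:] + [d] for row, d in zip(phi, ds)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (a[r][n] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return w

def rbf_eval(xs, w, x, width=1.0):
    return sum(wi * math.exp(-((x - xi) ** 2) / (2 * width ** 2))
               for wi, xi in zip(w, xs))

xs = [0.0, 1.0, 2.0, 3.0]       # centres placed at the training points
ds = [0.0, 1.0, 0.0, -1.0]      # illustrative target values
w = rbf_fit(xs, ds)
```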

8. Recurrent Networks Rooted in Statistical Physics

8.1 Introduction

Read

8.2 Dynamical Considerations

Browse

8.3 The Hopfield Network

Read

A good description, but to the casual reader it appears in some places to differ from the version presented in the lecture.

Haykin includes a threshold term in the formal definition (Eq 8.4), but it is never used (always set to 0).

In Equation 8.6, the transfer function is given as a signum function, but this is not the transfer function actually used in the text that follows. In the text and in all experiments, he introduces the special case for v = 0 (stay in the current state) as if it were unimportant. In fact, it is necessary for the convergence proof, and should be included in the formal definition.

When setting the weights, he divides all weights by N (the number of nodes). This does not affect the behaviour of the network, but it has the potential disadvantage that the weights are no longer integers.

Nit-picking: the end of the second paragraph on page 293 states that "We may therefore make the statement that the Hopfield network will always converge to a stable state when the retrieval operation is performed asynchronously". The statement is true, but it does not follow from the earlier discussion, as the sentence suggests.

In the example on page 294, Haykin seems to have missed that storing a pattern implicitly stores its inverse as well. In the example network he has explicitly stored a pattern and its inverse, which is the same as storing the same pattern twice. The example would have been better if the second stored pattern had been something other than the inverse of the first.
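
The points above can be sketched in code: integer outer-product weights (no division by N), the v = 0 "keep the current state" case made explicit, and a check that the inverse of a stored pattern is itself a stable state. The five-bit pattern is an illustrative choice:

```python
def store(patterns):
    """Outer-product (Hebbian) weights, zero diagonal, kept as integers
    (no division by N, which would not change the dynamics)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def update_node(w, s, i):
    """Asynchronous update of node i; v == 0 keeps the current state,
    which is exactly what the convergence proof needs."""
    v = sum(w[i][j] * s[j] for j in range(len(s)))
    if v > 0:
        return 1
    if v < 0:
        return -1
    return s[i]

def is_stable(w, s):
    return all(update_node(w, s, i) == s[i] for i in range(len(s)))

p = [1, -1, 1, -1, 1]            # illustrative stored pattern
w = store([p])
inv = [-x for x in p]
# Both p and inv are stable: storing a pattern implicitly stores
# its inverse as well.
```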

8.4 Computer Experiment I

Browse

Haykin uses the notion of an iteration here, for asynchronous updating. It is not obvious what an iteration is in this case, so it should have been defined. Judging from the experiments, an iteration seems to be one node changing its value; in other words, the Hamming distance between two successive iterations is always 1.

This definition also explains Haykin's statement at the bottom of page 299 that: "Since, in theory, one-quarter of the 120 neurons of the Hopfield network end up changing state for each corrupted pattern, the number of iterations needed for recall, on average, is 30."

8.5 Energy Function

Read

The rightmost column in Table 8.1 is a bit confusing: it shows the difference in energy between the state in that row and the state in the previous row. Why this would be interesting is a mystery.
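
The energy function itself is easy to check numerically. A sketch (toy three-node network and starting state are illustrative) that computes E = -(1/2) * sum_i sum_j w_ij * s_i * s_j with the thresholds set to 0, and verifies that asynchronous updates never increase it:

```python
# Hopfield energy with thresholds omitted (Haykin always sets them to 0).
def energy(w, s):
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))

# Toy network: outer-product weights from storing the pattern [1, -1, 1]
p = [1, -1, 1]
n = len(p)
w = [[p[i] * p[j] if i != j else 0 for j in range(n)] for i in range(n)]

s = [-1, -1, 1]                  # start one bit away from the stored pattern
energies = [energy(w, s)]
for i in range(n):               # one asynchronous sweep over the nodes
    v = sum(w[i][j] * s[j] for j in range(n))
    s[i] = 1 if v > 0 else (-1 if v < 0 else s[i])   # v == 0: keep state
    energies.append(energy(w, s))
# energies is non-increasing, and s has converged to the stored pattern
```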

8.6 - 8.15

Skip

9. Self-Organizing Systems I: Hebbian Learning

When you're finished with this chapter, you should go back to the subsection "Hidden Units" on page 187 for comparison.

9.1 Introduction

Read

9.2 Some Intuitive Principles of Self-Organization

Read

9.3 Self-Organized Feature Analysis

Skip

9.4 Discussion

Skip

9.5 Principal Component Analysis

Browse the introductory section.
Skip the eigenstructure and data representation subsections.
Browse the subsection on Dimensionality Reduction.
Read Example 1 on page 369.
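
For the dimensionality-reduction idea, a minimal 2-D sketch may help: the first principal component is the eigenvector of the sample covariance matrix with the largest eigenvalue, and projecting onto it gives the 1-D representation with the largest variance. The closed-form 2x2 eigen-solution and the data are illustrative, not from the book:

```python
import math

def pca_2d(data):
    """First principal component of 2-D points, pure Python."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # sample covariance matrix [[cxx, cxy], [cxy, cyy]]
    cxx = sum((x - mx) ** 2 for x, _ in data) / n
    cyy = sum((y - my) ** 2 for _, y in data) / n
    cxy = sum((x - mx) * (y - my) for x, y in data) / n
    # largest eigenvalue of a symmetric 2x2 matrix, closed form
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # corresponding (normalized) eigenvector
    if abs(cxy) > 1e-12:
        v = (lam - cyy, cxy)
    else:
        v = (1.0, 0.0) if cxx >= cyy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm), lam

# Illustrative points spread mainly along the line y = x
data = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0)]
(e1, e2), lam = pca_2d(data)
# (e1, e2) is close to (0.707, 0.707): the direction of maximum variance
```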

9.6 - 9.9

Skip

9.10 How Useful is Principal Component Analysis

Browse

10. Self-Organizing Systems II: Competitive Learning

10.1 Introduction

Read

10.2 Computational Maps in the Cerebral Cortex

Skip

10.3 Two Basic Feature-Mapping Models

Read

10.4 Modification of Stimulus by Lateral Feedback

Read the introductory subsection. Skip from the computer experiment onwards.

10.5 Self-Organizing Feature-Mapping Algorithm

Read

Also, return to section 2.4 for comparison.
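
A minimal sketch of the algorithm for a 1-D map and 1-D inputs (the network size, schedules, and parameter values are illustrative choices, not Haykin's):

```python
import math
import random

# 1-D self-organizing feature map: find the best-matching unit, then pull
# it and its topological neighbours toward the input, with both the
# learning rate and the neighbourhood width shrinking over time.
random.seed(1)
n_units = 10
w = [random.random() for _ in range(n_units)]    # 1-D weight per unit

T = 2000
for t in range(T):
    x = random.random()                          # input drawn from U(0, 1)
    bmu = min(range(n_units), key=lambda i: abs(x - w[i]))
    eta = 0.5 * (1 - t / T)                      # decaying learning rate
    sigma = 1 + (n_units / 2) * (1 - t / T)      # shrinking neighbourhood
    for i in range(n_units):
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        w[i] += eta * h * (x - w[i])
# After training the weights spread over the input range, and neighbouring
# units tend to respond to neighbouring inputs (topological ordering).
```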

10.6 Properties of the SOFM Algorithm

Skip

10.7 Reformulation of the Topological Neighborhood

Read

10.8 - 10.10

Skip

11. Self-Organizing Systems III: Information-Theoretic Models

Skip

12. Modular Networks

Skip

13. Temporal Processing

Skip

14. Neurodynamics

Skip

15. VLSI Implementations of Neural Networks

Skip
Olle Gällmo
Department of Information Technology  Email: crwth@docs.uu.se
Box 337				      Phone: +46 18 471 10 09
S-751 05 Uppsala		      Phax:  +46 18 55 02 25
Sweden