By Gustavo Deco, Dragan Obradovic

Neural networks provide a powerful new technology for modeling and controlling nonlinear and complex systems. In this book, the authors present a detailed formulation of neural networks from the information-theoretic viewpoint. They show how this perspective provides new insights into the design theory of neural networks. In particular, they show how these methods may be applied to the topics of supervised and unsupervised learning, including feature extraction, linear and nonlinear independent component analysis, and Boltzmann machines. Readers are assumed to have a basic understanding of neural networks, but all the relevant concepts from information theory are carefully introduced and explained. Consequently, readers from several different scientific disciplines, notably cognitive scientists, engineers, physicists, statisticians, and computer scientists, will find this to be a very valuable introduction to this topic.

As examples of neural learning, we describe in detail two supervised learning algorithms and architectures and one unsupervised learning paradigm. The supervised methods are the well-known backpropagation algorithm for deterministic feedforward networks and the Boltzmann machine learning algorithm for a stochastic recurrent network. As an example of unsupervised learning, we present the competitive learning paradigm. Finally, the biologically motivated learning rules of Hebb are introduced at the end of the section.

[...] Using the derivative identities [...] It is easy to see that $A^2 = A$, i.e. the LLSE of $x$ is the projection of $x$ by $A$. However, since in general $A^T \neq A$, the matrix $A$ is not an orthogonal projection. We now seek the matrix $W$ which minimizes the reconstruction error LSE. Before the formulation of the theorem which defines the optimal $W$, the following auxiliary lemma is presented. [...] Let $S$ be an $N \times M$ matrix with $1 \le M < N$ and $\operatorname{rank}(S) = M$, and let $D$ be an $N \times N$ diagonal matrix. [...] where $P$ is an $N \times M$ matrix with orthonormal vectors in its columns, which span the same space as the one spanned by the column vectors of the matrix $S$.
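Two facts from this excerpt can be checked numerically: an idempotent matrix need not be symmetric (so it is a projection, but not an orthogonal one), and a full-column-rank matrix S has an orthonormal basis P for its column space. The sketch below is illustrative only. The 2 x 2 matrix `A`, the random matrix `S`, and the use of a QR factorization to obtain `P` are assumptions made for the example, not the book's own construction.

```python
import numpy as np

# Hypothetical 2x2 example (not from the book): an oblique projection.
A = np.array([[1.0, 1.0],
              [0.0, 0.0]])

is_idempotent = np.allclose(A @ A, A)  # A^2 = A, so A is a projection
is_symmetric = np.allclose(A.T, A)     # A^T != A, so it is not orthogonal

# Assumed test matrix S (N x M with M < N); a Gaussian random matrix
# has full column rank with probability one.
rng = np.random.default_rng(0)
S = rng.standard_normal((5, 2))

# Reduced QR factorization: P is N x M with orthonormal columns that
# span the same space as the columns of S.
P, _ = np.linalg.qr(S)

cols_orthonormal = np.allclose(P.T @ P, np.eye(2))
same_span = np.allclose(P @ (P.T @ S), S)  # projecting S onto span(P) returns S

print(is_idempotent, is_symmetric, cols_orthonormal, same_span)
```

QR is only one way to build such a P. Any orthonormalization of the columns of S, for example Gram-Schmidt or the left singular vectors from an SVD, spans the same subspace.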

Chapter 3 and Chapter 4 focus on the case of linear feature extraction. Linear feature extraction removes redundancy from the data in a linear fashion. [...] This chapter introduces PCA from two perspectives: the standard definition of PCA as a Karhunen-Loève transform (statistical approach) and the information-theory-based formulations. [...] The information-theory-based approaches can be formulated in two different ways by modeling one of the two tasks associated with PCA: optimal compression or decorrelation of the output components.
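The two tasks associated with PCA, decorrelation and optimal compression, can be illustrated with a minimal sketch of the statistical (Karhunen-Loève) approach: an eigendecomposition of the sample covariance matrix. The synthetic data, the mixing matrix `mix`, and all variable names are assumptions made for the example. This is not the book's information-theoretic formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data (illustrative only).
z = rng.standard_normal((1000, 2))
mix = np.array([[2.0, 0.0],
                [1.5, 0.5]])
x = z @ mix.T

# Center the data and estimate the covariance matrix.
x_centered = x - x.mean(axis=0)
cov = np.cov(x_centered, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix; eigh returns
# eigenvalues in ascending order, so reorder them to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Decorrelation: in the eigenvector basis the output covariance is diagonal.
y = x_centered @ eigvecs
cov_y = np.cov(y, rowvar=False)

# Compression: keep only the first principal component and reconstruct.
W = eigvecs[:, :1]
x_hat = (x_centered @ W) @ W.T
mse = np.mean(np.sum((x_centered - x_hat) ** 2, axis=1))

print("off-diagonal output covariance:", cov_y[0, 1])
print("reconstruction error:", mse, "discarded eigenvalue:", eigvals[1])
```

In this sketch the mean squared reconstruction error equals the discarded eigenvalue, up to the factor (n - 1)/n that comes from the unbiased covariance estimate. This is the sense in which keeping the leading components gives the best linear compression in the least-squares sense.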

