**Question**

Without citing sources, Wikipedia defines the cross-entropy of discrete distributions $P$ and $Q$ to be

$$H(P, Q) = -\sum_{x} P(x) \log Q(x).$$

Who was first to start using this quantity? And who invented this term? I looked in:

- A. Wehrl, "General properties of entropy," Reviews of Modern Physics.
- S. Kullback and R. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, 1951.
- T. Cover and J. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing).
- I. Good, "Maximum Entropy for Hypothesis Formulation, Especially for Multidimensional Contingency Tables," The Annals of Mathematical Statistics, vol. 34, pp. 911-934, 1963.

But both papers define cross-entropy to be synonymous with KL-divergence.

- C. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, 1948. It doesn't mention cross entropy (and has a strange definition of "relative entropy": "The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols").

Finally, I looked in some old books and papers by Tribus.

Does anyone know what the equation above is called, and who invented it or has a nice presentation of it?

**Answer**

The concept seems to come from Shannon's work, with Kullback & Leibler's 1951 AMS note being the origin of the current use of the term. See also:

- J. Shore and R. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Transactions on Information Theory, 1980.

*Comment:* Thanks for this, a good summary of the background literature. The 1980 Shore and Johnson article in IEEE is a good start, but the pointer to the Good monograph from 1956 is even better.

**Answer**

As far as the origin of the term "cross entropy" relates to artificial neural networks: there is a paper in Science, submitted 1994 and published 1995, by G. Hinton, P. Dayan, B. Frey, and R. Neal, in which there is an early use of the term "Helmholtz Machine", possibly the first. In that paper, "The Wake-sleep algorithm for unsupervised neural networks", the note before equation #5 says: "When there are many alternative ways of describing an input vector it is possible to design a stochastic coding scheme that takes advantage of the entropy across alternative descriptions. [...] The second term is then the entropy of the distribution that the recognition weights assign to the various alternative representations."
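The thread notes that several of the older papers use "cross-entropy" as a synonym for KL-divergence, while the Wikipedia definition is a distinct quantity. A minimal sketch in plain Python (helper names are my own; I use the natural logarithm, though the definition is sometimes stated in base 2) illustrating how the two are related, namely $H(P, Q) = H(P) + \mathrm{KL}(P \,\|\, Q)$:

```python
import math

def cross_entropy(p, q):
    # H(P, Q) = -sum_x P(x) * log Q(x), the quantity defined in the question.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # Shannon entropy H(P) = -sum_x P(x) * log P(x).
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    # KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two example distributions over three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# Cross-entropy decomposes as H(P, Q) = H(P) + KL(P || Q); the KL term is
# the "extra" cost of coding P with a code optimized for Q, which may be
# why the two quantities are conflated in some of the older literature.
assert abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12
```

Note that cross-entropy equals the entropy of $P$ exactly when $Q = P$, since the KL term then vanishes.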