An intelligent compressor might recognize the digits of π and encode them as a description meaning "the first million digits of pi", or as a program that reconstructs the data when run. With our previous model, the best we could do is 8,241 bytes. Yet there are very small programs that can output π, some as small as 52 bytes (a readable spigot program that prints digits of π is sketched below).

The counting argument says that most strings are not compressible. Nevertheless, it is a rather remarkable fact that most strings we care about, for example English text, images, software, sensor readings, and DNA, are in fact compressible. These strings generally have short descriptions, whether they are described in English or as a program in C or x86 machine code.

Solomonoff, Kolmogorov, and Chaitin independently proposed a universal a priori probability distribution over strings based on their minimum description length. The algorithmic probability of a string x is defined as the fraction of random programs in some language L that output x, where each program M is weighted by 2^-|M| and |M| is the length of M in bits. This probability is dominated by the shortest such program. We call this length the Kolmogorov complexity K_L(x) of x (the definitions are restated in symbols below).

Algorithmic probability and complexity of a string x depend on the choice of language L, but only by a constant that is independent of x. Suppose that M1 and M2 are encodings of x in languages L1 and L2, respectively. For example, if L1 is C++, then M1 would be a program in C++ that outputs x. If L2 is English, then M2 would be a description of x with just enough detail that it would allow you to write down x exactly. For any pair of languages it is possible to write in one language a compiler, an interpreter, or a set of rules for understanding the other language. For example, you could write a description of the C++ language in English so that you could read any C++ program and predict its output by running it in your head. Conversely, you could write a program in C++ that takes an English language description as input and translates it into C++. The size of the language description or compiler does not depend on x in any way. Then for any description M1 in any language L1 of any x, it is possible to find a description M2 in any other language L2 of x by appending to M1 a fixed-length description of L1 written in L2.

It is not proven that algorithmic probability is a true universal prior probability. Nevertheless, it is widely accepted on empirical grounds because of its success in sequence prediction and machine learning over a wide range of data types. In machine learning, the minimum description length principle of choosing the simplest hypothesis consistent with the training data applies to a wide range of algorithms (a toy example appears below). It formalizes Occam's Razor. Occam noted in the 14th century that the simplest answer is usually the correct answer. Occam's Razor is universally applied in all of the sciences because we know from experience that the simplest or most elegant theory that explains the data tends to be the best predictor of future experiments.

To summarize, the best compression we can achieve for any string x is to encode it as the shortest program M in some language L that outputs x. Furthermore, the choice of L becomes less important as the strings get longer. All that remains is to find a procedure that finds M for any x in some language L. However, Kolmogorov proved that there is no such procedure in any language. In fact, there is no procedure that even gives you |M|, the length of the shortest possible description of x, for all x. Suppose there were. Then it would be possible to describe "the first string that cannot be described in less than a million bits", leading to the paradox that we have just done so. Nor is there a general test for randomness.
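The contradiction can be made concrete as code. The sketch below is ours, not from the text, and it assumes the impossible: a computable complexity oracle passed in as K. The helper names nthBinaryString and firstWithComplexityAtLeast are invented for illustration, and the oracle actually supplied in main() is only a stand-in (the length of x in bits, an upper bound on K up to a constant), since no true K can be computed.

    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>

    // Enumerate binary strings in order: "", "0", "1", "00", "01", "10", "11", ...
    std::string nthBinaryString(uint64_t i) {
        std::string s;
        for (i += 1; i > 1; i /= 2) s = char('0' + i % 2) + s;
        return s;
    }

    // If a computable K existed, this function plus the number n would be a
    // complete description of its own output. For n = 1,000,000 the description
    // is a few hundred bytes, far less than a million bits, yet the output is
    // by definition not describable in less than a million bits: a contradiction.
    std::string firstWithComplexityAtLeast(
            const std::function<uint64_t(const std::string&)>& K, uint64_t n) {
        for (uint64_t i = 0;; ++i) {
            std::string x = nthBinaryString(i);
            if (K(x) >= n) return x;   // such x exists by the counting argument
        }
    }

    int main() {
        // Stand-in oracle: length of x in bits (an upper bound on K plus a constant).
        auto lengthBound = [](const std::string& x) { return (uint64_t)x.size(); };
        // With the true K and n = 1000000 this call would embody the paradox;
        // with the stand-in it simply prints the first 8-bit string, "00000000".
        std::cout << firstWithComplexityAtLeast(lengthBound, 8) << "\n";
        return 0;
    }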
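Returning to the earlier claim that a very short program can output π: the code below is a readable expansion, written for illustration, of a well-known few-line C spigot program (often attributed to Dik Winter) based on the Rabinowitz-Wagon algorithm. It is not the 52-byte program mentioned above, but it makes the same point: a few hundred bytes of source text regenerate 800 digits of π exactly.

    #include <cstdio>

    // Spigot algorithm: prints the first 800 digits of pi, four at a time.
    int main() {
        const int N = 2800;                 // 2800/14*4 = 800 digits
        static int f[N + 1];                // zero-initialized
        int a = 10000, b, c = N, d, e = 0, g;
        for (b = 0; b < c; ++b) f[b] = a / 5;
        while (c > 0) {
            d = 0;
            g = 2 * c;
            for (b = c; ; ) {
                d += f[b] * a;
                f[b] = d % --g;
                d /= g--;
                if (--b == 0) break;
                d *= b;
            }
            std::printf("%.4d", e + d / a); // emit four digits, keep the carry
            e = d % a;
            c -= 14;
        }
        std::printf("\n");
        return 0;
    }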
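In symbols (conventional notation, not taken from the text), the definitions above read:

    P_L(x) = sum of 2^-|M| over all programs M in L that output x      (algorithmic probability)
    K_L(x) = min { |M| : M is a program in L that outputs x }          (Kolmogorov complexity)

The sum is dominated by its largest term, 2^-K_L(x), which is why the shortest program matters most. The compiler argument gives the invariance bound K_L2(x) <= K_L1(x) + c, where the constant c is the length of a description of L1 written in L2 and does not depend on x.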
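As a toy illustration of the minimum description length principle (our own example; the functions and bit costs below are illustrative assumptions, not a real coder): a hypothesis that fits the data is charged for stating itself plus stating the data given the hypothesis, and the cheapest total description wins.

    #include <cmath>
    #include <cstdio>
    #include <string>

    // Cost of stating the data literally: 8 bits per byte.
    double literalBits(const std::string& s) {
        return 8.0 * s.size();
    }

    // Cost under the hypothesis "the data is one byte repeated n times":
    // 8 bits to name the byte plus roughly log2(n) bits for the count.
    double runLengthBits(const std::string& s) {
        for (char ch : s)
            if (ch != s[0]) return 1e18;   // hypothesis does not fit the data
        return 8.0 + std::log2((double)s.size() + 1.0);
    }

    int main() {
        std::string data(1000, 'a');       // 1000 copies of 'a'
        double lit = literalBits(data), rle = runLengthBits(data);
        std::printf("literal: %.1f bits, repeat hypothesis: %.1f bits\n", lit, rle);
        std::printf("MDL selects the %s description\n", rle < lit ? "repeat" : "literal");
        return 0;
    }

Here the "one repeated byte" hypothesis describes the 1000 bytes in about 18 bits, so MDL prefers it to the 8000-bit literal description.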
The impossibility of computing K(x), or of testing strings for randomness, is closely related to Gödel's first incompleteness theorem. Li and Vitanyi give a simple proof of a stronger variation of that theorem, namely that in any consistent formal system powerful enough to describe statements in arithmetic, there is at least one true statement that cannot be proven. Li and Vitanyi prove that if the system is sound, then there are an infinite number of true but unprovable statements. In particular, there are only a finite number of statements of the form "x is random" (K(x) ≥ |x|) that can be proven, out of an infinite number of possible finite strings x. Suppose otherwise. Then it would be possible to enumerate all proofs and describe "the string x such that it is the first to be proven to be a million bits long and random", in spite of the fact that we just gave a short description of it. If F describes