06-06-2016, 11:53 AM
(This post was last modified: 06-06-2016, 04:36 PM by gill1109. *Edit Reason: minor correction*)
It seems that FrediFizzx does not realise that some insight into statistics is needed in order to understand exactly what it means to test the Bell-CHSH inequality by experiment. It is important to distinguish between empirical (observed) averages and theoretical mean values (expectation values).

Below I will write O(a, b), where "O" stands for "observed", for a correlation computed from a finite amount of data obtained from an experiment, and I will write E(a, b), where "E" stands for "expected", for a theoretical correlation derived in some theory.

In a typical CHSH experiment we end up with four data sets, one for each pair of settings (a, b), (a, b'), (a', b), (a', b'). Each data set consists of a number of outcome pairs, each outcome being +/-1; the four data sets can have different sizes. We compute four empirical correlations, the average of the product of the two outcomes for each of the four setting pairs, and denote them O(a, b), O(a, b'), O(a', b) and O(a', b').

Now we compute the CHSH quantity O(a, b) - O(a, b') + O(a', b) + O(a', b').

Given what has been said so far, it is obvious that the result can lie anywhere between -4 and +4. Here is the result of a tiny experiment leading to a CHSH value of +4: one pair of observations for each setting pair

a,b: +1, +1

a, b': +1, -1

a', b: +1, +1

a', b': +1, +1

A huge experiment can also deliver CHSH = +4: for instance, an experiment in which the outcome pair just listed for each setting pair is repeated a billion times.
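The computation behind this tiny example can be sketched in a few lines (the data layout and function names are my own, purely for illustration):

```python
# Computing the observed CHSH quantity from the tiny four-trial
# "experiment" above: one (+1/-1, +1/-1) outcome pair per setting pair.

def observed_correlation(pairs):
    """Average of the product of outcomes over a list of (+1/-1, +1/-1) pairs."""
    return sum(x * y for x, y in pairs) / len(pairs)

data = {
    ("a", "b"):   [(+1, +1)],
    ("a", "b'"):  [(+1, -1)],
    ("a'", "b"):  [(+1, +1)],
    ("a'", "b'"): [(+1, +1)],
}

# O(a,b) - O(a,b') + O(a',b) + O(a',b')
chsh = (observed_correlation(data[("a", "b")])
        - observed_correlation(data[("a", "b'")])
        + observed_correlation(data[("a'", "b")])
        + observed_correlation(data[("a'", "b'")]))
print(chsh)  # 1 - (-1) + 1 + 1 = 4
```

Duplicating each outcome pair a billion times leaves every average, and hence the CHSH value, unchanged.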

Now consider an experiment in which the following actually happens, many times over: Alice chooses a setting a or a' and Bob chooses a setting b or b'; independently of those choices, "Nature" draws a value lambda at random according to a probability distribution rho over some set; Alice observes outcome A(a, lambda) or A(a', lambda) and Bob observes outcome B(b, lambda) or B(b', lambda). Here I assume that the functions A and B take values +/-1, and that A, B and rho remain the same for all trials (one trial = one pair of settings plus one pair of outcomes). We again compute the four averages of products O(a, b) etc., each on the appropriate subset of trials (those with the corresponding setting pair), and look at the CHSH quantity O(a, b) - O(a, b') + O(a', b) + O(a', b').
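A minimal simulation of this setup can make the point concrete. The particular choices below (lambda uniform on [0, 2*pi), sign-of-cosine outcome functions, these four angles) are my own illustrative assumptions, not anything dictated by the argument; any rho, A and B would do.

```python
# Simulating the local hidden-variables setup: per trial, Alice and Bob
# pick settings, Nature draws lambda from rho, outcomes are A(a,lambda)
# and B(b,lambda) in {-1,+1}.
import math
import random

def A(setting, lam):
    return 1 if math.cos(lam - setting) >= 0 else -1

def B(setting, lam):
    return 1 if math.cos(lam - setting) >= 0 else -1

random.seed(0)
a, a2 = 0.0, math.pi / 2          # Alice's two settings
b, b2 = math.pi / 4, 3 * math.pi / 4  # Bob's two settings

totals = {pair: [0, 0] for pair in [(a, b), (a, b2), (a2, b), (a2, b2)]}
for _ in range(100_000):
    sa = random.choice([a, a2])           # Alice's free choice
    sb = random.choice([b, b2])           # Bob's free choice
    lam = random.uniform(0, 2 * math.pi)  # Nature's hidden variable (rho)
    cell = totals[(sa, sb)]
    cell[0] += A(sa, lam) * B(sb, lam)
    cell[1] += 1

O = {pair: cell[0] / cell[1] for pair, cell in totals.items()}
chsh = O[(a, b)] - O[(a, b2)] + O[(a2, b)] + O[(a2, b2)]
print(chsh)  # close to the classical bound of 2 for these settings
```

For this particular model and these angles the theoretical CHSH value is exactly 2, so the simulated value hovers at the classical boundary, up to statistical noise.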

Obviously, the result can lie anywhere between -4 and +4!

But if the experiment is large, then with high probability the four observed correlations will be close to the four theoretical correlations, which by elementary probability theory are E(a, b) = int A(a, lambda) B(b, lambda) rho(lambda) d lambda, and so on. By the usual simple algebra, E(a, b) - E(a, b') + E(a', b) + E(a', b') lies between -2 and +2. So if the experiment is large enough, the probability that O(a, b) - O(a, b') + O(a', b) + O(a', b') lies substantially above +2 or below -2 is negligible.
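The "usual simple algebra" can be spelled out in one line. For every value of lambda, since B(b, lambda) and B(b', lambda) are each +/-1, one of the two brackets below vanishes and the other equals +/-2; so the integrand is +/-2 pointwise and the integral lies in [-2, +2]. In LaTeX notation:

```latex
\begin{align*}
&E(a,b) - E(a,b') + E(a',b) + E(a',b')\\
&\quad= \int \Big[\, A(a,\lambda)\bigl(B(b,\lambda) - B(b',\lambda)\bigr)
   + A(a',\lambda)\bigl(B(b,\lambda) + B(b',\lambda)\bigr) \Big]\,
   \rho(\lambda)\, \mathrm{d}\lambda \;\in\; [-2, +2].
\end{align*}
```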

Experimenters do not observe E(a, b) - E(a, b') + E(a', b) + E(a', b').

They observe and publish a realised value of O(a, b) - O(a, b') + O(a', b) + O(a', b').

They do some statistics (calculate error bars, i.e. standard deviations, or whatever) in order to show that the observed value of O(a, b) - O(a, b') + O(a', b) + O(a', b') lies so far above +2 as to discredit any theory according to which E(a, b) - E(a, b') + E(a', b) + E(a', b') is less than or equal to +2.
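One simple version of that error-bar computation: estimate the standard error of each O as the sample standard deviation of the per-trial products divided by sqrt(n), and combine the four independent errors in quadrature. The per-trial data below are made up solely to illustrate the arithmetic.

```python
# Standard error of an observed CHSH value, from per-trial ±1 products.
import math

def correlation_and_se(products):
    """Mean of ±1 products and its standard error (sample std / sqrt(n))."""
    n = len(products)
    mean = sum(products) / n
    var = sum((x - mean) ** 2 for x in products) / (n - 1)
    return mean, math.sqrt(var / n)

# Hypothetical per-trial products (1000 trials per setting pair).
data = {
    "ab":   [+1] * 850 + [-1] * 150,   # O(a,b)   ≈ +0.70
    "ab'":  [+1] * 150 + [-1] * 850,   # O(a,b')  ≈ -0.70
    "a'b":  [+1] * 850 + [-1] * 150,   # O(a',b)  ≈ +0.70
    "a'b'": [+1] * 850 + [-1] * 150,   # O(a',b') ≈ +0.70
}

O, SE = {}, {}
for key, products in data.items():
    O[key], SE[key] = correlation_and_se(products)

chsh = O["ab"] - O["ab'"] + O["a'b"] + O["a'b'"]
# The four data sets are independent, so errors add in quadrature.
se_chsh = math.sqrt(sum(se ** 2 for se in SE.values()))
print(f"CHSH = {chsh:.3f} ± {se_chsh:.3f}")
print(f"standard deviations above 2: {(chsh - 2) / se_chsh:.1f}")
```

With these (invented) numbers the observed value 2.8 sits many standard deviations above the classical bound of 2, which is the kind of statement experimenters publish.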

The classic approach to CHSH outlined here depends on a rather restrictive understanding of a hidden variables theory: for each new trial, lambda is drawn anew with the same probability distribution rho. Moreover, the functions A and B are assumed to remain the same throughout the experiment. It is possible to relax these assumptions.

By the way, one might like to think of the hidden variable lambda as being carried by the particles. That would seem to exclude local hidden variables theories in which some new randomness also arises in the two measurement devices. But there is no reason why we shouldn't add to lambda, as further components of a vector, hidden variables thought of as belonging to the measurement devices as well; and these different components needn't be statistically independent of one another.
