It is a crisp early morning as you step down from the night train onto the platform. It is your first time at this station and you are a bit disoriented. Which way is the main hall?
Your instinct is to go where most of the crowd alighting from the train goes. Success! But does this truth-discovery strategy always work? We seem to be very fond of it, not only at train stations, but also when we choose governments.
Democracy comes from the Greek word demokratia, meaning «rule of the commoners». Implicitly we equate democracy with fairness and justice, as we interpret the «rule of the commoners» to be the most common individual choice. This idea, that what is commonly held to be true is most likely to be true, is not scientifically unfounded, but terms and conditions apply.
Should you look for the truth by checking which links to articles judging the implications of an event are most commonly shared among your Facebook friends? We might find the answer in a small but influential work known as Condorcet’s Jury Theorem.
Condorcet’s jury theorem, terms and conditions
Marquis de Condorcet was a French philosopher and mathematician of the late 18th century who worked in political science. He was a contemporary and friend of Leonhard Euler and Benjamin Franklin, an abolitionist, and an early defender of human rights and of equal rights for women and people of colour. Condorcet was one of the first people to apply mathematics to the social sciences. He is best known for his work on majority voting. Majority voting, or majority rule, is a collective decision method that selects the option that has the majority of votes. Although intuitively simple, when there are more than two options or an even number of voters it can be operationalized by different mathematical functions. Condorcet’s Essay on the Application of Analysis to the Probability of Majority Decisions yielded two results that we are still discussing and using today: the Condorcet majority voting method and the Condorcet Jury Theorem.
In his Jury Theorem, Condorcet identifies how good majority voting is at discovering the true answer to a yes-or-no question. Let us say there is an unknown fact about the world, such as whether the weather will be nice tomorrow, or whether there is intelligent life in the universe outside the Solar System. The truth about these facts exists, but for various reasons it is not available to us, or not available to us yet. Condorcet proved that if we ask sufficiently many people to answer a binary question, that is, to judge whether yes, the weather will be nice tomorrow, or no, it will not, then the answer given by the majority is more likely to be the true answer than the answer of any single person. Further, the more people we ask, the higher the probability that the majority will «find» the truth. However, the theorem holds only under certain conditions, and these conditions are rather strict.
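To make the theorem concrete, here is a minimal Python sketch. It assumes the textbook setting: an odd number n of voters, each independently right with the same probability p. The helper name `p_majority_correct` is our own and hypothetical. It adds up the binomial probabilities of every outcome in which more than half of the voters are right.

```python
from math import comb


def p_majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is right.

    Assumes every voter is right with the same probability p, independently
    of the others, and that n is odd so there can be no ties.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # Sum P(exactly k voters are right) over all k that form a majority.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 11, 101, 501):
    print(n, round(p_majority_correct(n, 0.6), 4))
```

With p = 0.6 the printed probability starts at 0.6 for a single voter and climbs towards 1 as the group grows. The sum is computed exactly with `math.comb` rather than simulated, so the result is deterministic.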
The first, implicit condition is that we are looking to establish the truth about one question, not several questions at the same time that might relate to each other. The second, rather obvious condition is that our question must admit a true answer; answers to such questions are called value judgements in the literature (see Rabinowicz 2016). Not all questions admit value judgements as answers; some are questions of preference. When you and I consider whether we would like to eat salad or pasta for dinner, that is a question of preference. I may prefer the salad while you prefer the pasta, and neither of us is wrong. If, however, we were discussing which of these dishes is better for our health, then we would be making a value judgement. If I believe salad is better health-wise, and you believe pasta is better, then one of us must be wrong. There is no logical inconsistency in knowing that salad is better for my health but still preferring to eat pasta. Since questions of preference have no true answer, there is no truth for the majority to find.
The third condition necessary for Condorcet’s theorem to hold is that the sources of the value judgements must be experts. Fortunately, the definition of what constitutes an expert is rather loose in mathematics and economics. An expert is a person who can guess better than a coin toss; formally, an expert is a person who has a higher than 50% probability of giving the correct answer to a yes-or-no question. A coin is not an expert: if I am trying to decide whether to go left or right and toss a coin one hundred times, even if 80 of those tosses say to go left, going left is no more likely to be the right decision. When the reliability of the sources is below 50%, things are worse still: the majority becomes less reliable than a single source, so rather than taking the majority decision, it is better to pick one of the sources at random and follow their decision.
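The last claim can be checked with a small Monte Carlo simulation. This is a sketch under stated assumptions: 25 independent sources, each right with probability 0.4, answering the same yes-or-no question. The function `simulate` and its parameters are hypothetical, chosen only for illustration. It counts how often the majority is right and how often a single randomly picked source is right.

```python
import random


def simulate(p: float, n_sources: int, trials: int = 20_000, seed: int = 1):
    """Estimate how often (a) the majority and (b) one random source is right.

    Each of n_sources answers correctly with probability p, independently of
    the others; n_sources is assumed odd so the majority is always defined.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    majority_right = 0
    single_right = 0
    for _ in range(trials):
        answers = [rng.random() < p for _ in range(n_sources)]  # True = correct
        majority_right += sum(answers) > n_sources // 2
        single_right += rng.choice(answers)
    return majority_right / trials, single_right / trials


majority, single = simulate(p=0.4, n_sources=25)
print(f"majority right: {majority:.3f}, random single source right: {single:.3f}")
```

With sources that are worse than a coin toss, the single random source is right about 40% of the time, while the majority of 25 is right far less often.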
The fourth, and strictest, condition for applying Condorcet’s jury theorem is statistical independence among the sources. Two sources are statistically independent if the binary (yes or no) answer from one of the sources does not affect the probability with which the second source would answer yes (or no) to the same question. Intuitively, independence requires that the sources do not confer with each other, are not swayed by some influential opinion leader, do not have the same or similar experience or training and do not share common information (see Ladha 1995).
The condition of source independence is very strict, but it is satisfied by the crowd of people alighting from the train with you. The people in this crowd are virtual strangers to each other and did not coordinate their decisions. They have also learned independently where the exit is, by taking that train often or by having spotted an exit sign that you missed. The train crowd thus effectively behaves as a set of independent sources of information. Most of them are also experts: they know where the exit is.
Social media and the independence of input
When it comes to your Facebook friends sharing links to articles, the independence condition is, by design, not satisfied.
Quite obviously, your friends are people who influence each other’s opinions by virtue of being in a social network with each other. They are often also people who share similar experiences and backgrounds. Their judgements are therefore necessarily correlated, and these correlations shape how opinions and views are formed among friends. This is something of which we as a society are slowly becoming aware.
The concept that is slowly permeating our online lives is the filter bubble.
A filter bubble is the state created for a web user when a personalised search algorithm guesses which information the user would like to see. As a result the user becomes separated from information that does not support her or his viewpoint. Filter bubbles can cause a user to be unaware even of widespread relevant information.
If you should not look to your friends for the truth, perhaps you can look to strangers on the Internet. Whether the digital crowd of strangers will point to the truth depends on how they discover new content which they can later share, like or abhor.
To be sure that the majority of considered judgements points you towards the truth, you must make sure that the people judging do not talk to each other and draw their information from different, independent sources. Let us assume, for the sake of argument, that the news reported on the web consists of truly original articles rather than opinions on, and summaries of, a handful of news reports.
The tools that preselect your Web
The Internet has achieved unprecedented connectivity among people. It has also given many people a voice and an opportunity to be heard and seen. In this sea of content, far more than we can cognitively process as a whole, how do we find what we would most like to consume? This is done by recommender systems.
A recommender system is software that predicts how likely it is that an item is the one you are looking for. Depending on the context where it is deployed, this means how well a query response answers your question, or how likely you are to like a new product, piece of content or service. Recommender systems are a very important tool for content discovery and, among other things, the reason why we today have highly efficient search engines.
One of the first and most famous recommender systems is PageRank. PageRank is an algorithm used by the Google search engine to rank query responses in order of relevance. It is named after Larry Page, one of the founders of Google. PageRank determines the importance of a web page by counting how many web pages link to it, giving more weight to links from pages that are themselves important. The underlying assumption is that the more relevant a web page is, the more web pages link to it. The intuition behind this assumption actually comes from academia: the most relevant articles in a scientific discipline are those that have the most citations.
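The idea of links as weighted votes can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration, not Google’s production ranking. The damping factor 0.85, the fixed number of iterations, the even spreading of rank from pages without outgoing links, and the tiny four-page web are all assumptions made for the example; the function name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to (hypothetical data).
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link is a "vote" worth an equal slice of this page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c", "a"]}
scores = pagerank(web)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

In this toy web, page c, which three pages link to, ends up with the highest score, while b and d, which no page links to, receive only the random-jump share.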
PageRank is an algorithm designed specifically for ranking web pages by relevance. To predict how much you would like an item you do not yet know, collaborative filtering methods are typically used. To decide whether you will be interested in an item, two types of information are taken into consideration: first, how much you liked similar items, and second, how much the customers most similar to you liked the item in question. The assumption here is that you are most likely to like what your friends and peers like.
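The second kind of information, the opinions of similar customers, is what user-based collaborative filtering uses. Below is a minimal sketch, assuming ratings from 1 to 5, cosine similarity with unrated items treated as 0, and a prediction equal to the similarity-weighted average rating of the k most similar users who rated the item. The users, items, ratings and the helper names `cosine` and `predict` are all hypothetical; real systems use far larger data and more refined similarity measures.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two users' rating vectors (unrated items count as 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings: dict, user: str, item: str, k: int = 2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    neighbours = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    top = neighbours[:k]
    total_similarity = sum(sim for sim, _ in top)
    if total_similarity == 0:
        return None  # nobody similar has rated the item
    # The neighbours "vote" on the item, each vote weighted by similarity.
    return sum(sim * rating for sim, rating in top) / total_similarity


ratings = {
    "you": {"salad": 5, "soup": 4, "pasta": 1},
    "ann": {"salad": 5, "soup": 5, "pasta": 1, "quinoa": 5},
    "bob": {"salad": 1, "soup": 2, "pasta": 5, "quinoa": 1},
    "cat": {"salad": 4, "soup": 4, "quinoa": 4},
}
print(round(predict(ratings, "you", "quinoa"), 2))
```

The two users whose tastes are closest to yours (ann and cat) both liked quinoa, so the predicted rating is high, about 4.5; bob, whose tastes run the other way, is left out of the vote.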
Recommender systems and majority voting
What is perhaps not so obvious is that recommender systems, in particular PageRank and collaborative filtering, have majority voting at their core. Perhaps we overlook this because we think of voting as the serious once-every-four-years affair when we go to the polls and elect a government. On the Internet we “vote” constantly, by choosing one option from a selection of alternatives: the most promising link to give us what we are looking for, the cutest video, the most intriguing news article. But the top of the list of choices we are offered, and actually choose from, is itself determined by the choices of the strangers on the Internet who are most like us. The independent “vote” is not a reality online. But is this a problem for finding the truth?
The independence condition of Condorcet’s jury theorem is very strong and difficult to satisfy in practice, even offline. Krishna Ladha, among others, explored how much correlation can exist between voters before Condorcet’s theorem stops holding. His conclusion is that the effectiveness of majority-rule voting decreases as the correlation between voters increases. This means that the more your friends are like you, the less likely their aggregated judgements are to point to the truth. Ladha also shows that the probability that the majority of correlated votes is correct depends on the number of votes. Large groups are relatively robust and tolerate higher average correlation coefficients. This means that the more history, experiences and information sources you share with your friends, the larger the group of friends you need in order for their majority-supported judgement to point towards the truth. In a small group of very close friends, the majority is probabilistically unlikely to find the true answer to a binary question.
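The effect of correlation can be illustrated with a deliberately simple model. Note that this is a swapped-in, hypothetical model, not Ladha’s: Ladha works with correlation coefficients between voters, whereas here the correlation comes from a shared opinion leader. The leader is right with probability p; each voter independently copies the leader with probability rho and otherwise judges alone, also being right with probability p. Conditioning on whether the leader is right makes the voters independent again, so the majority’s accuracy can be computed exactly. The names `p_majority` and `p_majority_with_leader` are our own.

```python
from math import comb


def p_majority(n: int, q: float) -> float:
    """P(more than half of n independent voters are right), each right with prob q (n odd)."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n // 2 + 1, n + 1))


def p_majority_with_leader(n: int, p: float, rho: float) -> float:
    """Majority accuracy when every voter copies a shared opinion leader with probability rho.

    Hypothetical model: the leader is right with probability p; a voter who does
    not copy the leader is right with probability p, independently of everyone else.
    """
    if_leader_right = p_majority(n, rho + (1 - rho) * p)  # copiers are right too
    if_leader_wrong = p_majority(n, (1 - rho) * p)  # only non-copiers can be right
    return p * if_leader_right + (1 - p) * if_leader_wrong


for n in (11, 101):
    print(n, [round(p_majority_with_leader(n, 0.6, rho), 3) for rho in (0.0, 0.1, 0.3, 0.6)])
```

As rho grows, the majority’s accuracy falls, and with a strong pull towards the leader even a group of 101 is only about as reliable as the leader alone. With weak correlation, the larger group still does clearly better than the smaller one, which is in the spirit of Ladha’s observation that large groups tolerate more correlation.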
Luckily, there are many strangers on the Internet. This is the observation that Masterton, Olsson and Angere make. They saw the shadow of Condorcet’s Jury Theorem in PageRank: the probability that a web page is relevant increases with the number of «votes» it receives from other web pages. Masterton, Olsson and Angere explored how good PageRank is at finding the true answer to a query. Their answer: pretty good. More precisely, they show that link-based ranking on the web, as done by PageRank, has an epistemological justification. Of course, their empirical analysis makes certain assumptions about the independence of web pages.
How collaborative filtering contributes to turning the choices of the majority into the truth is an unexplored question, but one we must take seriously. Majority voting is a very intuitive and simple way to implement democracy, and it is at the core of many election procedures around the world. If a group preference is needed, the preference of the majority ensures that as few people as possible are unhappy with the group’s choice. Condorcet gave further legitimacy to majority voting by showing that it is also good at pointing to the truth, something we are intuitively aware of when we find ourselves on train platforms in new cities. But there are limits to the truth-tracking powers of the majority vote, and this is also something we must be aware of.
Online, as in real life, our instinct is to trust the views of the many, but are the many views we see truly different? Perhaps they have simply been selected by an algorithm to reflect what it calculated we want to see. Knowing this, perhaps we can take a hint from the jury theorems, from the work of Condorcet and of others who followed, and do better.
Włodek Rabinowicz (2016). Aggregation of Value Judgments Differs from Aggregation of Preferences. doi:10.1163/9789004312654_003
Krishna K. Ladha (1995). Information pooling through majority-rule voting: Condorcet’s jury theorem with correlated votes. doi:10.1016/0167-2681(94)00068-P
George Masterton, Erik J. Olsson and Staffan Angere (2016). Linking as voting: how the Condorcet jury theorem in political science is relevant to webometrics. doi:10.1007/s11192-016-1837-1
Franz Dietrich and Kai Spiekermann (2016). Jury theorems. Working paper.