Condorcet’s jury theorem and the truth on the web

Facebook friends or Internet strangers, where should you look to learn what is really going on in the news?

It is a crisp early morning as you step down from the night train onto the platform of the train station. It is your first time there and you are a bit disoriented. Which direction leads to the main hall?

Your instinct is to go where most of the crowd alighting from the train goes. Success! But does this truth-discovery strategy always work? We seem to be very fond of it, not only at train stations but also when we choose governments.

Democracy comes from the Greek word demokratia, meaning “rule of the commoners”. Implicitly we equate democracy with fairness and justice, as we interpret the “rule of the commoners” to be the most common individual choice. This idea, that what is commonly held to be true is most likely to be true, is not scientifically unfounded, but terms and conditions apply.

Should you look for the truth by seeing which links to articles judging the implications of an event are most commonly shared among your Facebook friends? We might be able to find the answer in a small but influential result known as Condorcet’s Jury Theorem.

Condorcet’s jury theorem, terms and conditions

The Marquis de Condorcet was a French philosopher and mathematician of the late 18th century who worked in political science. He was a contemporary and friend of Leonhard Euler and Benjamin Franklin, an abolitionist, and an early defender of human rights and of equal rights for women and people of colour. Condorcet was one of the first people to apply mathematics in the social sciences. He is best known for his work on majority voting. Majority voting, or the majority rule, is a collective decision method that selects the choice that has the majority of votes. Although intuitively simple, in the presence of more than two choices or an even number of voters it can be operationalized by different mathematical functions. Condorcet’s Essay on the Application of Analysis to the Probability of Majority Decisions yielded two results that we are still discussing and using today: the Condorcet majority voting method and the Condorcet Jury Theorem.

A filter bubble: the state created for a web user when a personalised search algorithm guesses which information the user would like to see. (Image: Håvard Legreid/Vox Publica)

In his Jury Theorem, Condorcet identifies how good majority voting is at discovering the true answer to a yes-or-no question. Let us say there is an unknown fact about the world, for example whether the weather will be nice tomorrow, or whether there is intelligent life in the universe outside of the Solar System. The truth of these facts exists, but it is not available to us, or not available to us at present, for various reasons. Condorcet proved that if we ask sufficiently many people to answer a binary question, say to judge whether yes, the weather will be nice tomorrow, or no, it will not, then the answer given by the majority is the one most likely to be true. Further, the more people we ask, the higher the probability that the majority will “find” the truth. However, the theorem holds only under certain conditions, and these are rather strict.
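The growth in the majority's reliability can be checked with a few lines of arithmetic. The sketch below assumes the simplest textbook setting, which the text above does not spell out: an odd number n of voters, each independently right with the same probability p. The function name and the value p = 0.6 are illustrative choices only.

```python
from math import comb

def majority_correct_probability(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is right,
    assuming every voter is right with the same probability p."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that ties cannot occur")
    smallest_majority = n // 2 + 1
    # Sum the binomial probabilities of every outcome in which
    # at least a strict majority of the voters is right.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )

# Illustrative assumption: every voter is right 60% of the time.
for n in (1, 11, 101):
    print(n, round(majority_correct_probability(n, 0.6), 4))
```

With p above one half the printed probability rises with n, which is the pattern the theorem describes.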


The first, implicit condition is that we are looking to establish the truth about one question, not about several questions at the same time that might relate to each other. The second, rather obvious condition is that our question should admit a true answer; an answer to such a question is called a value judgement in the literature (see Rabinowicz 2016). Not all questions admit value judgements as answers; some are questions of preference. When you and I consider whether we would like to eat salad or pasta for dinner, that is a question of preference. I may prefer the salad while you prefer the pasta, and neither of us is wrong. If, however, we were discussing which of these dishes is better for our health, then we would be making a value judgement. If I believe salad is better health-wise and you believe pasta is better, then one of us must be wrong. There is no logical inconsistency in knowing that salad is better for my health but still preferring to eat pasta. Since there is no truth when considering preferences, no truth can be found by aggregating them.

Nicolas de Condorcet (1743–94) can still teach us important things. The portrait hangs in Versailles palace. (Photo: Wikimedia Commons)

The third condition necessary for Condorcet’s theorem to hold is that the sources of the value judgements must be experts. Fortunately, the definition of what constitutes an expert is rather loose in mathematics and economics. An expert is defined to be a person who can guess better than a coin toss or, formally, a person who has a higher than 50% probability of giving the correct answer to a yes-or-no question. A coin is therefore not an expert: if I am trying to decide whether to go left or right and toss a coin one hundred times, then even if 80 of those tosses say to go left, going left is no more likely to be the right decision. When the reliability of the sources is below 50%, matters are worse still: the majority is then more likely to be wrong than any single source, so rather than taking the majority decision it is better to pick one of the sources at random and follow its decision.
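The role of the 50% threshold can be seen in a small simulation. This is only a sketch under assumed conditions: 101 sources that answer independently and all share the same reliability. The function name, the number of trials and the fixed random seed are illustrative choices, not part of the theorem.

```python
import random

def simulate_majority_accuracy(n_sources, reliability, trials=2000, seed=0):
    """Estimate how often a strict majority of independent sources is right."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    majority_right = 0
    for _ in range(trials):
        # Each source is right with probability `reliability`.
        correct_votes = sum(rng.random() < reliability for _ in range(n_sources))
        if correct_votes > n_sources / 2:
            majority_right += 1
    return majority_right / trials

# Compare sources slightly worse than, equal to, and slightly better than a coin.
for reliability in (0.45, 0.50, 0.55):
    estimate = simulate_majority_accuracy(101, reliability)
    print(f"single source: {reliability:.2f}   majority of 101: {estimate:.2f}")
```

Under these assumptions the majority amplifies whatever the individual sources are: slightly better than a coin becomes much better, and slightly worse becomes much worse.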

The fourth, and strictest, condition for applying Condorcet’s jury theorem is that of statistical independence among the sources. Two sources are statistically independent if the binary (yes or no) answer from one of them does not affect the probability with which the other would answer yes (or no) to the same question. Intuitively, independence requires that the sources do not confer with each other, are not swayed by some influential opinion leader, do not have the same or similar experience or training, and do not share common information (see Ladha 1995).

The condition of source independence is very strict, but it was satisfied by the crowd of people alighting from the train with you. The people in this crowd were virtual strangers to each other and did not coordinate when making their decision. They had also learned independently where the exit is, by taking that train often or by spotting an exit sign that you had missed. The train crowd thus effectively behaves as a set of independent sources of information. They are also experts, at least most of them, in knowing where the exit is.

Social media and the independence of input

When it comes to your Facebook friends sharing links to articles, the independence condition is by design not satisfied.

Quite obviously, your friends are people who influence each other’s opinions by virtue of being in a social network with each other. They are often also people who share similar experiences and backgrounds. Correlations therefore necessarily exist between their judgements and influence how opinions and views are formed among friends. This is something of which we as a society are slowly becoming aware.

The concept that is slowly permeating our e-lives is the filter bubble.

A filter bubble is the state created for a web user when a personalised search algorithm guesses which information the user would like to see. As a result the user becomes separated from information that does not support her or his viewpoint. Filter bubbles can cause a user to be unaware even of widespread relevant information.

Perhaps, if you should not look to your friends for the truth, you can look to strangers on the Internet. Whether the digital crowd of strangers will point to the truth depends on how they discover the new content that they can later share, like or abhor.

To be sure that the majority of the judgements you consider points you towards the truth, you must make sure that the people making them do not talk to each other and that they take their information from different, independent sources. Let us assume that news reported on the web truly consists of original articles rather than opinions and summaries of a handful of news reports.

The tools that preselect your Web

The Internet has achieved unprecedented connectivity among people. It has also given many a voice and an opportunity to be heard and seen. In that sea of infinite content, which we cannot cognitively process as a whole, how do we find what we would most likely want to consume? This is done by recommender systems.


A recommender system is software that predicts how likely it is that an item is the item you are looking for. Depending on the context in which it is deployed, this means how adequately the query response answers your question, or how likely you are to like a new product, piece of content or service. Recommender systems are a very important tool for content discovery and, among other things, the reason why we have highly efficient search engines today.

One of the first and most famous recommender systems is PageRank. PageRank is an algorithm used by the Google search engine to rank the query responses in order of relevance. It is named after Larry Page, one of the founders of Google. PageRank determines the importance of a web-page by counting how many web-pages link to it, with links from important pages counting for more. The underlying assumption is that the more relevant a web-page is, the more web-pages link to it. The intuition behind this assumption actually comes from academia: the most relevant articles in a scientific discipline are those that have the most citations.
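The idea of links as votes can be made concrete with a simplified sketch of the PageRank iteration. It is a textbook-style approximation, not Google's production system. The four-page web, the function name and the number of iterations are hypothetical, and the damping factor of 0.85 is the commonly quoted value, taken here as an assumption.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from equal ranks
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page splits its current rank equally among the pages it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c", "a"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web the page with the most inbound links comes out on top, and a page linked from that top page also ranks highly, since a "vote" from an important page weighs more than one from an obscure page.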

PageRank is an algorithm designed specifically for ranking web-pages with respect to relevance. For predicting how much you would like an item unknown to you, the methods of collaborative filtering are typically used. To decide whether you will be interested in an item, two types of information are taken into consideration. The first is how much you liked similar items, and the second is how much the customers most similar to you liked the item in question. The assumption here is that you are most likely to like what your friends and peers like.
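A minimal sketch of the second kind of information, how much similar users liked an item, could look as follows. The ratings, user names and item names are invented for illustration. Cosine similarity over the items two users have both rated is only one of several possible similarity measures, and real recommender systems are considerably more elaborate.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; "you" have not yet seen article_4.
ratings = {
    "you":   {"article_1": 5, "article_2": 4, "article_3": 1},
    "alice": {"article_1": 5, "article_2": 5, "article_3": 1, "article_4": 5},
    "bob":   {"article_1": 1, "article_2": 2, "article_3": 5, "article_4": 1},
}

def cosine_similarity(a, b):
    """Similarity of two users, computed over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, item, all_ratings):
    """Average of other users' ratings for `item`, weighted by similarity to `user`."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in all_ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine_similarity(all_ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

print(predict_rating("you", "article_4", ratings))
```

The prediction leans towards the rating given by the user whose past ratings look most like yours, so the "votes" that count most are those of the people most similar to you.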

Recommender systems and majority voting

What is perhaps not so obvious is that recommender systems, in particular PageRank and collaborative filtering, have majority voting at their core. Perhaps we overlook this because we think of voting as the once-in-four-years serious affair when we go to the polls and elect a government. On the Internet we “vote” constantly by choosing one from a selection of alternatives: the most promising link to give us what we are looking for, the cutest video, the most intriguing news article. But the top of the list of choices that is offered to us, and from which we actually choose, is also determined by the choices of the strangers on the Internet who are most like us. The independent “vote” is not a reality on-line. But is this a problem for finding the truth?

Illustration: Moshanin/Wikimedia Commons

Example of the collaborative filtering process.

The independence condition of Condorcet’s jury theorem is very strong and difficult to satisfy in practice, even off-line. Krishna Ladha, among others, explored how much correlation can exist between voters before Condorcet’s theorem stops holding. His conclusion is that the effectiveness of majority-rule voting decreases as the correlation between voters increases. This means that the more your friends are like you, the less likely their aggregated judgements are to point to the truth. Ladha also shows that the probability that the majority of correlated votes is the truth depends on the number of votes. Large groups are relatively robust and tolerate higher average correlation coefficients. This means that the more history, experiences and information sources you share with your friends, the larger the group of friends you need in order for their majority-supported judgement to point towards the truth. The majority in a small group of very close friends is unlikely to find the true answer to a binary question.
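The effect of correlation can be illustrated with a toy simulation. The model below is an assumption made for illustration and is not Ladha's own: each voter either copies a single opinion leader, with a given probability, or judges independently, and the leader and the independent voters are all right 60% of the time. All names and numbers are hypothetical.

```python
import random

def majority_accuracy_with_leader(n_voters, competence, copy_probability,
                                  trials=2000, seed=0):
    """Estimate how often the majority is right when voters may copy one leader."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    majority_right = 0
    for _ in range(trials):
        leader_is_right = rng.random() < competence
        correct_votes = 0
        for _ in range(n_voters):
            if rng.random() < copy_probability:
                vote_is_right = leader_is_right  # correlated vote
            else:
                vote_is_right = rng.random() < competence  # independent vote
            correct_votes += vote_is_right
        if correct_votes > n_voters / 2:
            majority_right += 1
    return majority_right / trials

# More copying means more correlation between the votes.
for copy_probability in (0.0, 0.1, 0.3, 0.9):
    accuracy = majority_accuracy_with_leader(101, 0.6, copy_probability)
    print(f"copy probability {copy_probability:.1f}: majority right {accuracy:.2f}")
```

In this toy model, the more the voters copy the leader, the closer the majority's reliability falls towards that of the leader alone. The sketch shows only this qualitative pattern and makes no attempt to reproduce Ladha's results on group size.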


Luckily there are many strangers on the Internet. This is the observation that Masterton, Olsson and Angere make. They saw the shadows of Condorcet’s Jury Theorem in PageRank: the probability that a web-page is relevant increases with the number of “votes” it receives from other web-pages. Masterton, Olsson and Angere explored how good PageRank is at finding the true answer to a query. Their answer: pretty good. More precisely, they show that there is an epistemological justification for PageRank’s link-based ranking on the web. Of course, their empirical analysis makes certain assumptions about the independence of web-pages.

How collaborative filtering contributes towards transforming the choices of the majority into the truth is an unexplored question, but one we must take seriously. Majority voting is a very intuitive and simple way to implement democracy, and it is at the core of many election procedures in the world. If a group preference is needed, the preference of the majority ensures that as few people as possible are unhappy with the group’s choice. Condorcet gave further legitimacy to majority voting by showing that it is also good at pointing to the truth, something that we are intuitively aware of when we find ourselves on train platforms in new cities. But there are limits to the truth-tracking powers of the majority vote, and this is also something we must be aware of.

On-line, as in real life, our instincts are to trust the views of the many, but are the many views we see truly different? Perhaps they are just carefully selected by an algorithm to reflect what the algorithm calculated that we want to see. Knowing this, perhaps we can take a helping hint from the jury theorems, the work of Condorcet and of others who followed, and do better.

Literature

Włodek Rabinowicz (2016). Aggregation of Value Judgments Differs from Aggregation of Preferences. DOI: 10.1163/9789004312654_003

Krishna K. Ladha (1995). Information pooling through majority-rule voting: Condorcet’s jury theorem with correlated votes. DOI: http://dx.doi.org/10.1016/0167-2681(94)00068-P

George Masterton, Erik J. Olsson and Staffan Angere (2016). Linking as voting: how the Condorcet jury theorem in political science is relevant to webometrics. DOI: 10.1007/s11192-016-1837-1

Franz Dietrich and Kai Spiekermann (2016). Jury theorems. Working paper.
