Condorcet’s jury theorem and the truth on the web

Facebook friends or Internet strangers, where should you look to learn what is really going on in the news?

It is a crisp early morning as you step down from the night train onto the platform of the train station. It is your first time there and you are a bit disoriented. Which way is the main hall?

Your instinct is to go where most of the crowd alighting from the train goes. Success! But does this truth-discovery strategy always work? We seem to be very fond of it, not only at train stations, but also when we choose governments.

Democracy comes from the Greek word demokratia, meaning «rule of the commoners». Implicitly we equate democracy with fairness and justice, as we interpret the «rule of the commoners» to be the most common individual choice. The idea that what is commonly held to be true is most likely to be true is not scientifically unfounded, but terms and conditions apply.

Should you look for the truth by seeing which articles passing judgement on the implications of an event are most commonly shared among your Facebook friends? We might be able to find the answer in a small but influential work known as Condorcet’s jury theorem.

Condorcet’s jury theorem, terms and conditions

The Marquis de Condorcet was a French philosopher and mathematician of the late 18th century who worked in political science. He was a contemporary and friend of Leonhard Euler and Benjamin Franklin, an abolitionist, and an early defender of human rights and of equal rights for women and people of colour. Condorcet was one of the first people to apply mathematics in the social sciences. He is best known for his work on majority voting. Majority voting, or the majority rule, is a collective decision method that selects the choice that has the majority of votes. Although intuitively simple, in the presence of more than two choices, or an even number of voters, it can be operationalized by different mathematical functions. Condorcet’s Essay on the Application of Analysis to the Probability of Majority Decisions yielded two results that we are still discussing and using today: the Condorcet majority voting method and the Condorcet jury theorem.

A filter bubble: the state created for a web user when a personalised search algorithm guesses which information the user would like to see. (Image: Håvard Legreid/Vox Publica)

In his jury theorem, Condorcet identifies how good majority voting is at discovering the true answer to a yes-or-no question. Let us say there is an unknown fact about the world, for example whether the weather will be nice tomorrow, or whether there is intelligent life in the universe outside the Solar System. The truth about these facts exists, but it is not available to us, or not available to us at present, for various reasons. Condorcet proved that if we ask sufficiently many people to answer a binary question, that is, to judge either that yes, the weather will be nice tomorrow, or that no, it will not, then the answer given by the majority of the people has the highest probability of being the true answer. Further, the more people we ask, the higher the probability that the majority will «find» the truth. However, this theorem holds only under certain conditions, and these conditions are rather strict.
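The claim can be checked with a little arithmetic. What follows is a minimal sketch, assuming that every person answers independently and is right with the same probability (0.6 here, an illustrative value that is not taken from Condorcet). The function name majority_correct_probability is made up for this example.

```python
from math import comb


def majority_correct_probability(n_voters: int, competence: float) -> float:
    """Probability that a strict majority of independent voters is right.

    Assumes each voter is right with the same probability `competence`
    and that n_voters is odd, so a tie cannot occur.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * competence**k * (1 - competence) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


if __name__ == "__main__":
    # Illustrative competence of 0.6: slightly better than a coin toss.
    for n in (1, 11, 101, 501):
        print(n, round(majority_correct_probability(n, 0.6), 4))
```

With these assumed numbers the printed probability rises towards 1 as the group grows, which is the behaviour the theorem describes.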

The first, implicit condition is that we are looking to establish the truth about one question, not about several questions at the same time that might relate to each other. The second, rather obvious condition is that the question must be one that admits a true answer. Answers to such questions are called value judgements in the literature (see Rabinowicz 2016). Not all questions admit value judgements as answers; some are questions of preference. When you and I consider whether we would like to eat salad or pasta for dinner, that is a question of preference. I may prefer the salad while you prefer the pasta, and neither of us is wrong. If, however, we were discussing which of these dishes is better for our health, then we would be making a value judgement. If I believe salad is better health-wise, and you believe pasta is better, then one of us must be wrong. There is no logical inconsistency in knowing that salad is better for my health but still preferring to eat pasta. Since there is no truth when considering preferences, no truth can be found.

Nicolas de Condorcet (1743–94) can still teach us important things. The portrait hangs in the Palace of Versailles. (Photo: Wikimedia Commons)

The third condition necessary for Condorcet’s theorem to hold is that the sources of the value judgements must be experts. Fortunately, the definition of what constitutes an expert is rather loose in mathematics and economics. An expert is defined to be a person who can guess better than a coin toss, or formally, a person who has a higher than 50% probability of giving the correct answer to a yes-or-no question. A coin is not an expert: it is right exactly half of the time. This means that if I am trying to decide whether to go left or right and toss a coin one hundred times, then even if 80 of those tosses say to go left, going left is no more likely to be the right decision. When the reliability of the sources is below 50%, things are worse still: the more sources we ask, the more likely the majority is to be wrong, so rather than taking the majority decision it is better to pick one of the sources at random and hear their decision.
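A small simulation makes the role of the 50% threshold visible. This is a sketch under stated assumptions: 101 independent sources, all with the same reliability, and the three reliability values below are chosen only for illustration. The function name simulate_majority is hypothetical.

```python
import random


def simulate_majority(n_sources: int, reliability: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the majority of sources gives the true answer.

    Each source is simulated as independently right with probability
    `reliability`; the seed keeps the run reproducible.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < reliability for _ in range(n_sources))
        if correct_votes > n_sources / 2:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Below, at, and above the coin-toss threshold (illustrative values).
    for reliability in (0.4, 0.5, 0.6):
        print(reliability, simulate_majority(101, reliability, trials=2000))
```

Under these assumptions the majority does much worse than any single source when reliability is 0.4, no better than a coin at 0.5, and much better than a single source at 0.6.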

The fourth, and strictest, condition for applying Condorcet’s jury theorem is that of statistical independence among the sources. Two sources are statistically independent if the binary (yes or no) answer from one of the sources does not affect the probability with which the second source would answer yes (or no) to the same question. Intuitively, independence requires that the sources do not confer with each other, are not swayed by some influential opinion leader, do not have the same or similar experience or training, and do not share common information (see Ladha 1995).

The condition of source independence is very strict, but it was satisfied by the crowd of people alighting from the train with you. The people in this crowd were virtual strangers to each other and did not coordinate when making their decision. They had also learned independently where the exit is, by taking that train often, or by having spotted an exit sign which you had not. The train crowd thus effectively behaves as a set of independent sources of information. They are also experts, at least most of them, knowing where the exit is.

Social media and the independence of input

When it comes to your Facebook friends sharing links to articles, the independence condition is by design not satisfied.

Quite obviously your friends are people who influence each other’s opinions by virtue of being in a social network with each other. They are often also people who share similar experiences and backgrounds. Correlations therefore necessarily exist between their judgements and influence how opinions and views are formed among friends. This is something of which we as a society are slowly becoming aware.

The concept that is slowly permeating our e-lives is the filter bubble.

A filter bubble is the state created for a web user when a personalised search algorithm guesses which information the user would like to see. As a result the user becomes separated from information that does not support her or his viewpoint. Filter bubbles can cause a user to be unaware even of widespread relevant information.

Perhaps, if you should not look to your friends for the truth, you can look to strangers on the Internet. Whether the digital crowd of strangers will point to the truth depends on how they discover the new content which they can later share, like or abhor.

To be sure that the majority of considered judgements points you towards the truth, you must make sure that their sources do not talk to each other and that they take their information from different, independent sources. Let us assume that news reported on the web truly consists of original articles rather than opinions and summaries of a handful of news reports.

The tools that preselect your Web

The Internet has accomplished an unprecedented connectivity among people. It has also given many a voice and the opportunity to be heard and seen. In that sea of infinite content, which we cannot cognitively process as a whole, how do we find what we would most likely want to consume? This is done by recommender systems.

A recommender system is software that predicts how likely it is that an item is the item you are looking for. Depending on the context where it is deployed, this means how adequately a query response answers your question, or how likely you are to like a new product, piece of content or service. Recommender systems are a very important tool for content discovery and, among other things, the reason why we have highly efficient search engines today.

One of the first and most famous recommender systems is PageRank. PageRank is an algorithm used by the Google search engine to rank query responses in order of relevance. It is named after Larry Page, one of the founders of Google. PageRank determines the importance of a web page from the links that point to it: roughly, the more pages link to it, and the more important those linking pages are, the higher it ranks. The underlying assumption is that the more relevant a web page is, the more web pages link to it. The intuition behind this assumption actually comes from academia: the most relevant articles in a scientific discipline are those that have the most citations.
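The link-counting idea can be sketched on a toy web of four pages. This is a simplified power-iteration version of PageRank written for illustration, not Google’s production algorithm. The damping value of 0.85 is the commonly quoted default, and the page names and the TOY_WEB graph are invented for this example.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b and d all link to c; c links back to a.
TOY_WEB = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(TOY_WEB).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph page c collects the most «votes» and ends up on top, and page a outranks b and d because its single incoming link comes from the highly ranked c.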

PageRank is an algorithm designed specifically for ranking web pages with respect to relevance. For predicting how much you would like an item unknown to you, the methods of collaborative filtering are typically used. To decide whether you will be interested in an item, two types of information are taken into consideration. The first is how much you liked similar items, and the second is how much the customers most similar to you liked the item in question. The assumption here is that you are most likely to like what your friends and peers like.
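The second kind of information, what similar users liked, can be sketched as follows. This is one simple variant of user-based collaborative filtering, assuming cosine similarity over the items two users have both rated and a similarity-weighted average as the prediction. Real systems differ in many details. The users, items and ratings in TOY_RATINGS are invented.

```python
from math import sqrt
from typing import Optional


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two users, computed over items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: dict[str, dict[str, float]], user: str, item: str) -> Optional[float]:
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / weight_total


# Hypothetical ratings on a 1-5 scale; "you" have not seen news_c yet.
TOY_RATINGS = {
    "you": {"news_a": 5, "news_b": 1},
    "ann": {"news_a": 5, "news_b": 1, "news_c": 5},
    "bob": {"news_a": 1, "news_b": 5, "news_c": 1},
}

if __name__ == "__main__":
    print(predict_rating(TOY_RATINGS, "you", "news_c"))
```

Because ann’s past ratings match yours and bob’s do not, ann’s opinion of news_c counts for more, and the prediction lands above the plain average of the two ratings.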

Recommender systems and majority voting

What is perhaps not so obvious is that recommender systems, in particular PageRank and collaborative filtering, have majority voting at their core. Perhaps we overlook this because we think of voting as the serious, once-every-four-years affair when we go to the polls and elect a government. On the Internet we “vote” constantly, by repeatedly choosing one from a selection of alternatives: the most promising link to give us what we are looking for, the cutest video, the most intriguing news article. But the top of the list of choices that is offered to us, and from which we actually choose, is itself determined by the choices of the strangers on the Internet who are most like us. The independent “vote” is not a reality online. But is this a problem for finding the truth?

Example of the collaborative filtering process. (Illustration: Moshanin/Wikimedia Commons)

The independence condition of Condorcet’s jury theorem is very strong and difficult to satisfy in practice, even offline. Krishna Ladha, among others, explored how much correlation there can be between voters before Condorcet’s theorem stops holding. His conclusion is that the effectiveness of majority-rule voting decreases as the correlation between voters increases. This means that the more your friends are like you, the less likely their aggregated judgements are to point to the truth. Ladha also shows that the probability that the majority of correlated votes is the truth depends on the number of votes. Large groups are relatively robust and tolerate higher average correlation coefficients. This means that the more history, experiences and information sources you share with your friends, the larger the group of friends you need in order for their majority-supported judgement to point towards the truth. The majority in a small group of very close friends is probabilistically unlikely to find the true answer to a binary question.
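The effect of correlation can be illustrated with a deliberately simple model. It is not Ladha’s model, only an assumption made for this sketch: each voter copies a shared opinion leader with probability copy_prob and otherwise answers independently, and both the leader and the independent voters are right with probability 0.6. All names and numbers are illustrative.

```python
from math import comb


def majority_accuracy(n: int, q: float) -> float:
    """Chance that more than half of n independent voters, each right with probability q, are right."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n // 2 + 1, n + 1))


def correlated_majority_accuracy(n: int, competence: float, copy_prob: float) -> float:
    """Majority accuracy when voters sometimes copy one shared opinion leader.

    Assumed model: with probability copy_prob a voter repeats the leader's
    answer, otherwise the voter is independently right with `competence`.
    The leader is right with probability `competence` as well.
    """
    # Given the leader's answer, the voters are independent of each other again.
    right_if_leader_right = copy_prob + (1 - copy_prob) * competence
    right_if_leader_wrong = (1 - copy_prob) * competence
    return (
        competence * majority_accuracy(n, right_if_leader_right)
        + (1 - competence) * majority_accuracy(n, right_if_leader_wrong)
    )


if __name__ == "__main__":
    for copy_prob in (0.0, 0.1, 0.3, 1.0):
        row = [round(correlated_majority_accuracy(n, 0.6, copy_prob), 3) for n in (11, 101, 501)]
        print(copy_prob, row)
```

In this toy model the majority becomes less reliable as copying increases, and when copying is mild a larger group still recovers most of the lost accuracy, which matches the direction of the results described above.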

Luckily there are many strangers on the Internet. This is the observation that Masterton, Olsson and Angere make. They saw the shadows of Condorcet’s jury theorem in PageRank: the probability that a web page is relevant increases with the number of «votes» it receives from other web pages. Masterton, Olsson and Angere explored how good PageRank is at finding the true answer to a query. Their answer: pretty good. More precisely, they show that there is an epistemological justification for link-based ranking on the web. Of course, their analysis makes certain assumptions about the independence of web pages.

How collaborative filtering contributes towards transforming the choices of the majority into the truth is an unexplored question, but one we must take seriously. Majority voting is a very intuitive and simple way to implement democracy and it is at the core of many election procedures in the world. If a group preference is needed, the preference of the majority ensures that as few people as possible are unhappy with the group’s choice. Condorcet gave further legitimacy to majority voting by showing that it is also good at pointing to the truth, something that we are intuitively aware of when finding ourselves on train platforms in new cities. But there are limits to the truth-tracking powers of the majority vote, and this is also something we must be aware of.

Online, as in real life, our instinct is to trust the views of the many, but are the many views we see truly different? Perhaps they are just carefully selected by an algorithm to reflect what the algorithm has calculated that we want to see. Knowing this, perhaps we can take a helpful hint from the jury theorems, the work of Condorcet and of others who followed, and do better.

Literature

Włodek Rabinowicz (2016). Aggregation of Value Judgments Differs from Aggregation of Preferences. doi:10.1163/9789004312654_003

Krishna K. Ladha (1995). Information pooling through majority-rule voting: Condorcet’s jury theorem with correlated votes. doi:10.1016/0167-2681(94)00068-P

George Masterton, Erik J. Olsson and Staffan Angere (2016). Linking as voting: how the Condorcet jury theorem in political science is relevant to webometrics. doi:10.1007/s11192-016-1837-1

Franz Dietrich and Kai Spiekermann (2016). Jury theorems. Working paper.
