Jury Size and the Hung-Jury Paradox

In Williams v. Florida (399 U.S. 78 [1970]), the U.S. Supreme Court decided a case addressing the constitutionality of juries composed of fewer than 12 jurors, ruling that smaller juries are not inconsistent with the Sixth Amendment. In an effort to speed deliberation and reduce the rate of mistrials, 11 states have subsequently adopted juries of fewer than 12 in state felony trials, and 40 states have diminished their jury sizes for state misdemeanor trials. Curiously, however, contrary to the predictions of probability theory and common sense, these reductions in jury sizes have failed to deliver the expected reduction in mistrial rates. In this paper, we offer two interrelated explanations for this fact: informational cascades and the heterogeneity of jurors.

BARBARA LUPPI is Assistant Professor in Political Economics, University of Modena and Reggio Emilia, and Visiting Professor of Law, University of St. Thomas School of Law. FRANCESCO PARISI is Oppenheimer Wolff and Donnelly Professor of Law, University of Minnesota Law School, and Professor of Economics, University of Bologna. An early incarnation of this idea developed in 1999 during conversations with Cass Sunstein after a presentation on group polarization. The idea remained dormant for more than a decade. We would like to thank our colleague and friend Davide Ferrari for having worked through the formulation of this problem. His generous contributions to the model would warrant full coauthorship. We would also like to thank Daniel Pi for his valuable research and editorial assistance. Gillian Hadfield and Jennifer Reinganum and other participants at the 2011 annual conference of the American Law and Economics Association provided valuable comments and suggestions on an early draft of this paper. We would also like to thank the Max Schmidheiny Foundation and Anne van Aaken for providing Francesco Parisi with the opportunity to complete this project as a visiting professor in the Law and Economics program at the University of St. Gallen.
The 12-person jury is a timeworn institution; a staple of novels, stage, and cinema; and well established in the common-law tradition, the historical tendrils of which extend from the medieval courts of England. From as far back as the twelfth century, trials were decided by juries of 12, a number that by the fourteenth century had become so firmly cemented in the legal tradition that it subsequently acquired near mystical significance. The 12-person jury later made its way to the American colonies and persisted after the Revolution as a constitutional right in the United States.
However, after more than 6 centuries without challenge, jury trials have recently undergone several substantial transformations, including notably a reevaluation of jury size. Policy makers have grown increasingly concerned with the cost of hung juries, 1 which has led to the lowering of minimum jury sizes in a majority of states. The intuition motivating the reduction in jury size is that smaller juries are less likely to contain outliers who might obstruct convergence around a unanimous verdict.
In this paper, we discover the counterintuitive result that in the presence of informational cascades, reductions in jury size may increase the probability of hung juries. In Section 1, we explore the legal background and review the existing law and economics literature on jury deliberation. In Section 2, we consider how accuracy and decisiveness in jury deliberation change when the process is subject to informational cascades. In Section 3, we analyze the results of our model, describe situations in which unanimous verdicts are more easily reached, and show that under certain conditions unanimity may be more easily reached by large groups than by small groups. We refer to this counterintuitive result as the "hung-jury paradox." We also consider the effects of heterogeneity and the interaction between heterogeneity and jury size on jury deliberation. We conclude in Section 4 with some policy recommendations and suggestions for future research.

OF THE LITERATURE
The framers of the U.S. Constitution viewed the right to a jury trial as a fundamental pillar of liberty, a form of protection against the threat of judicial tyranny. Defendants in criminal trials were therefore guaranteed the right to a trial by jury under the Sixth Amendment of the Constitution. In the leading 1898 case Thompson v. Utah (170 U.S. 343, 349 [1898]) the U.S. Supreme Court construed the Sixth Amendment to require that in all criminal cases "a jury [be] comprised of 12 persons, neither more nor less." 2 However, in Williams v. Florida (399 U.S. 78 [1970]), the Court reconsidered its earlier stance, overturning Thompson. The defendant in Williams argued that the trial court's rejection of a motion to impanel a 12-person jury was unconstitutional under the Sixth Amendment (the trial was decided by a jury of six), 3 which guarantees the criminally accused the right to "an impartial jury of the State and district wherein the crime shall have been committed."
1. When a jury cannot reach consensus on a given verdict after an extended period of deliberation, the jury is said to be hung or deadlocked. In most jurisdictions, a hung jury results in a mistrial, after which the case may be retried by the prosecution, thereby resulting in duplicative and wasteful efforts by both parties. Rule 31 of the Federal Rules of Criminal Procedure states: "If the jury cannot agree on a verdict on one or more counts, the court may declare a mistrial on those counts. A hung jury does not imply either the defendant's guilt or innocence. The government may retry any defendant on any count on which the jury could not agree." In practice, only a small percentage of cases that end in mistrial are retried. Thus, from a defendant's perspective, a mistrial is often regarded as the practical equivalent of an acquittal. For an extensive and general discussion of American juries, see Kalven and Zeisel (1966).
2. Miller (1998) provides a thorough historical analysis of the rise and fall of the 12-person jury in the United States.
3. The High Court of Australia rejected an analogous claim, that the constitutional right to a trial by jury (Australian Constitution, c. 3, sec. 80) required a jury of 12, in Brownlee v. The Queen (207 CLR 278 [2001]).
Williams argued that the right implicitly required that the jury comprise 12 jurors. The Court rejected Williams's argument, reasoning that the language and legislative history of the Constitution contained no explicit reference to jury size. Instead, the Court held that the test of the constitutionality of jury size was whether a reduction would undermine the jury's essential function: grounding the judicial process in the common-sense judgment of the community. While recognizing the historical imprimatur of the 12-person-jury requirement, the Court repudiated the mystical connotations associated with the number 12, declaring the traditional jury size a "historical accident unrelated to the great purposes which gave rise to the jury in the first place" (Williams v. Florida, 399 U.S. 89). In particular, the ruling in Williams v. Florida upheld the validity of the six-person jury's verdict convicting defendant Williams, establishing that even a jury of six could satisfy the requirements of the Sixth Amendment. This ruling gave state courts the freedom to vary the size of juries in felony and misdemeanor criminal cases. The decision triggered an uproar, but the principle that the Constitution does not require 12-member juries in all cases was upheld in subsequent decisions and remains good law. 4 In Ballew v. Georgia (435 U.S. 223 [1978]), the Court set a lower limit on jury size, perhaps anticipating the consequence of Williams taken to the extreme, holding that any number less than six would be unconstitutional. The Court reasoned that such a jury would be too small to be representative of the relevant community. 5 Although federal courts continue to use juries composed of 12 members voting unanimously (Fed. R. Crim. Proc. 23[b]), 6 in criminal cases the vast majority of trials are heard in state courts, which now often use juries of fewer than 12.
4. In a series of cases beginning in 1970, the Court reaffirmed the principle in Williams that the federal Constitution does not require 12-member juries in all cases.
A total of 11 states currently use juries composed of fewer than 12 jurors in felony and misdemeanor trials (see Table 1; Bureau of Justice Statistics 2004). An additional 29 states allow juries of fewer than 12 in misdemeanor cases (see Table A1; Bureau of Justice Statistics 2004). 7 This brings the number of states that allow smaller juries in some criminal cases to 40.
5. Soon afterward, in Burch v. Louisiana (441 U.S. 130 [1979]), the Court further elaborated that states cannot allow six-person juries to deliver nonunanimous verdicts in criminal cases. See Neilson and Winter (2005) for an analysis of nonunanimity and verdict accuracy.
6. The requirement of unanimity for verdicts leading to criminal convictions has generally been viewed as an important feature of criminal trials. Unanimity is meant to preserve the public's confidence in the criminal justice system by imposing the highest standard of persuasion for conviction. Rule 31 of the Federal Rules of Criminal Procedure requires verdicts to be unanimous in criminal cases. The U.S. Supreme Court ruled that nonunanimous verdicts do not violate the Constitution (Duncan v. Louisiana, 391 U.S. 145 [1968]; Johnson v. Louisiana, 406 U.S. 356 [1972]; Apodaca v. Oregon, 406 U.S. 404 [1972]). This finding gave states the flexibility to allow nonunanimous verdicts. Only two states, Louisiana and Oregon, have exercised this new possibility and currently allow nonunanimous verdicts in felony cases, while one state (Oklahoma) allows them in misdemeanor cases. Nonunanimous verdicts are not allowed when smaller juries are used, as per Burch v. Louisiana.
7. Still more states allow the use of juries with fewer than 12 members in civil trials.
Unsurprisingly, adopting smaller jury sizes alters the dynamics of jury deliberation. Scholars and policy makers anticipated that the use of smaller juries would reduce the cost of trials and decrease mistrial rates. As we discuss in detail in Section 3.1, according to probability theory, the probability of a trial resulting in a hung jury should decrease from the historic average of 5.5 percent for a jury of 12 to an expected average of 2.1 percent. However, statistics indicate that overall mistrial rates have remained fairly steady. A comparison of the rates reported by Kalven and Zeisel (1966) and Hannaford-Agor et al. (2002) shows no decrease (indeed, there was actually an increase from 5.5 to 6.2 percent) in hung-jury rates in state criminal courts, comparing the years 1955-58 (when 12-person juries were uniformly adopted) and 1980-98 (after most states had adopted smaller juries). A handful of empirical studies have delved somewhat deeper in attempting to evaluate how jury size affects trial results. Most also conclude that there are no detectable differences between six-person and 12-person juries with respect to mistrial rates. Early studies include Institute of Judicial Administration (1972), Bermant and Coppock (1973), Kessler (1973), Mills (1973), Zeisel and Diamond (1974), Davis et al. (1975), Lempert (1975), Valenti and Downing (1975), Buckhout et al. (1977), and Padawer-Singer, Singer, and Singer (1977). 8 Black (1979, p. 1114) concluded in his review of the empirical literature that "there are indeed no verdict differences between six- and twelve-member juries." Roper (1980) identifies a slightly higher incidence of hung juries in 12-person mock juries (than in six-person mock juries), but the result is not statistically significant. 9 More recent studies, including Kerr and MacCoun (1985), Saks and Marti (1997), Hannaford-Agor et al. (2002), and Eisenberg et al. (2005), also fail to uncover any significant differences in hung-jury rates when comparing six-person and 12-person juries. 10 These findings raise the question at the heart of the hung-jury paradox: why did the widespread reductions in jury size from 12 to six fail to reduce mistrial rates, contrary to the predictions of probability theory?
8. For a complete review of the literature, see also Hastie, Penrod, and Pennington (1983), who similarly report no statistically significant difference in hung-jury rates between six-person and 12-person juries under a unanimity rule. In addition, empirical evidence provides support for the existence of effects of jury size on participation rates, length of deliberations, and recall of evidence. Jury representativeness may also worsen significantly when jury size is reduced (see Lempert 1975).
9. The only empirical evidence revealing some difference between different-sized juries looks at jury verdicts under a setting that allows for nonunanimous verdicts. Zeisel (1971) reports a hung-jury rate of 2.4 percent in a sample of 290 six-person criminal juries, slightly lower than the 5 percent rate observed in his national sample of 12-person juries. Still with reference to nonunanimous verdicts, Padawer-Singer, Singer, and Singer (1977) report a hung-jury rate of 8.7 percent for six-person mock juries and of 21.7 percent for 12-person groups. These findings have little practical relevance after the decision of Burch v. Louisiana, which prohibits convictions with nonunanimous verdicts by juries composed of fewer than 12 jurors.
10. The data reported by Hannaford-Agor et al. (2002, p. 25) show no difference in hung-jury rates between the three courts belonging to jurisdictions adopting six-person juries and the 27 jurisdictions adopting 12-person juries.
The empirical findings of Davis et al. (1975), Grofman (1976), Tanford and Penrod (1983), and Myers and Lamm (1976) hint at a possible explanation: larger juries sometimes exacerbate the effects of jury polarization during deliberation. We build on that intuition and explore the extent to which jury polarization can explain the hung-jury paradox. We find that polarization through informational cascades may offset the effects of large jury size and reduce the gap between the hung-jury rates of small and large juries.
To aid our effort, we draw from (and build on) the existing political economy literature to resolve the discrepancy between the theoretical predictions and the empirical observations of the jury size effects. According to the Condorcet ([1785] 1976) theorem, if individual jurors have a greater than 50 percent chance of being correct, increasing the number of voters increases the probability that the majority decision will be correct. One aim of our investigation is to develop a simple model to analyze how the presence of herding behavior may affect the well-known results of Condorcet's jury theorem.
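The Condorcet result for independent jurors under majority rule can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: the function name `majority_correct` and the accuracy value 0.6 are assumptions chosen for illustration, and jurors are taken to be independent with a common accuracy.

```python
from math import comb


def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent jurors,
    each correct with probability p, votes for the correct option."""
    need = n // 2 + 1  # smallest number of votes that forms a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    # Odd jury sizes avoid ties; 0.6 is an illustrative accuracy only.
    for n in (1, 3, 7, 13):
        print(n, round(majority_correct(n, 0.6), 4))
```

With an accuracy above one-half the printed probabilities rise with the number of voters, and with an accuracy below one-half they fall, which is the sense in which the theorem depends on jurors being more likely right than wrong.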

THE MODEL
Consider a jury of $n$ individuals making a binary choice, $X_i \in \{0, 1\}$, $i = 1, \ldots, n$, such that 1 represents the correct decision and 0 is the incorrect alternative, and let $S_n = \sum_{i=1}^{n} X_i$. Let $p_i = P(X_i = 1)$, $i = 1, \ldots, n$, denote the probability that the $i$th juror decides correctly, and let $P(n) = P(S_n = n)$ denote the probability that the jury unanimously reaches the correct decision.
The possible presence of informational cascades violates the independence assumption, and therefore Condorcet's theorem might not hold even in cases in which the jury choice involves only two alternatives. This is because Condorcet's theorem rests crucially on the assumption that individual jurors' erroneous decisions do not affect the decisions of other members of the jury. With independence, each juror will always express the same judgment, regardless of the opinions expressed by other jurors and regardless of the order in which those opinions are expressed. 11 By contrast, cascade behavior occurs in situations where people act sequentially and allow other people's choices to influence their decision making. 12 The putative purpose of jury deliberation is to allow jurors to exchange views and reconsider their prior beliefs in light of the perspectives expressed by others. It is difficult to imagine any such deliberation process free from sequential decision making; jurors cannot talk simultaneously, and the opinion of one juror will necessarily have to be expressed before or after the opinion of another juror. Thus, the necessary temporal structure of jury deliberation may undermine independent decision making. According to Sunstein (2004, p. 17), cascade behavior principally takes two forms: "[G]roup members tend to become far more confident of their judgments after they speak with one another. A significant effect of group interactions is a greater sense that one postdeliberation conclusion is correct, whether or not it actually is. Corroboration by others increases confidence in one's judgments. It follows that members of deliberating groups will usually converge on a position on which members have a great deal of confidence."
11. This hypothesis is usually called independence in probability of the individual judgments (or independence of the judgment errors).
12. Consider, for example, the choice of a restaurant made on the basis of the restaurant's popularity or a voting decision based on opinion polls.
We model informational cascades in a simplified way, capturing the spirit of Banerjee (1992) and Sunstein (2005). In order to model the described dynamics of informational cascades, we introduce the parameter $a_i$, which measures juror $i$'s confidence in his own beliefs. A higher value of $a_i$ means that juror $i$ regards his own opinion on the case as more accurate than one expressed by the other members in the jury. We assume that the voters express judgments sequentially according to the following process:
[Displayed equation not recoverable from the source.]
where $0 \le a_i \le 1$, $i = 1, \ldots, n$, and $a_1 = 1$. Juror $i + 1$ will express a voting preference as a function of his beliefs, weighted by self-confidence $a_{i+1}$, and the mean of the votes of the $i$ jurors who have expressed their voting preferences prior to juror $i + 1$ in the deliberation process, weighted by $(1 - a_{i+1})$. Cascade behavior is captured by the evolution of the sequence of the $a_i$ parameter in the deliberation process for each juror $i$ in the jury composed of $n$ members. In the presence of cascade behavior, jurors tend to assign a greater weight to the observations of the other jurors' voting preferences, measured by the average jury vote given the number of jurors who have previously expressed their decisions. Analytically, this dynamic corresponds to the assumption of a decreasing sequence of $a_i$, $i = 1, \ldots, n$. The deliberation process described by the equation models the dynamics of herding behavior and group polarization analyzed in Sunstein and Hastie (2008). The self-confidence variable $a_i$ captures the convincibility and persuasiveness of jurors. With high values of $a_i$, juror $i$ is harder to convince by others. Jurors with low values of $a_i$ will be easier to persuade.
The assumption of a decreasing sequence of $a_i$ values captures the idea that people tend to become less confident in their own prior belief and more confident in the emerging group consensus (independent of whether the nascent consensus is correct) as the deliberation progresses and the opinions of other jurors become known. On the other hand, the role of the average vote expressed by the jury implies a lower variance of group deliberation: jurors tend to converge to the same vote as the deliberation process unfolds. 13 Informational cascades play a role in jury decisions because the use of others' voting preferences makes each person's decision less responsive to her own information and hence less informative to others. The contamination dynamics of the jury deliberation process captures the essence of the problem discussed by Sunstein and Hastie (2008, p. 4), according to which the convergence of the group "is not disturbing if that position is also likely to be correct, but if it is not, then many group members will end up sharing a view in which they firmly believe, but which turns out to be wrong (a most unfortunate and sometimes quite destructive situation)." It should be noted that informational cascades are a special case of dependent or correlated decisions in jury decision making. Ladha (1992) extends the Condorcet jury theorem to allow for correlated votes and points out that under a majority voting rule, the Condorcet jury theorem holds when the jury is sufficiently large and/or the correlation between votes is not excessively high, while it fails for small juries and highly correlated voting. The maximum level of correlation that can be sustained without undermining the results of the Condorcet jury theorem increases with jury size. However, whereas Ladha (1992) focuses on majority voting, our analysis considers the effect of correlated voting under a unanimity rule. 14
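The sequential process described verbally above can be explored by simulation. The sketch below rests on an assumption that should be stated loudly: it reads the description as saying that each juror after the first votes correctly with probability equal to a weighted average of a common individual accuracy `p` (weight $a_i$) and the share of correct votes cast so far (weight $1 - a_i$). This is one possible reading, not the authors' equation, and the names `simulate_votes` and `unanimity_rate`, the accuracy 0.9, and the weights $a_i = 1/i$ are all hypothetical.

```python
import random


def simulate_votes(n, p, a, rng):
    """Simulate one sequential vote; returns a list of n votes (1 = correct).

    ASSUMED rule (a reading of the text, not the authors' equation):
    juror i+1 is correct with probability
    a[i] * p + (1 - a[i]) * (share of correct votes so far), with a[0] = 1.
    """
    votes = []
    for i in range(n):
        if i == 0:
            prob_correct = p  # the first juror relies on own belief only
        else:
            share_correct = sum(votes) / i
            prob_correct = a[i] * p + (1 - a[i]) * share_correct
        votes.append(1 if rng.random() < prob_correct else 0)
    return votes


def unanimity_rate(n, p, a, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that all n votes agree."""
    rng = random.Random(seed)
    unanimous = 0
    for _ in range(trials):
        total = sum(simulate_votes(n, p, a, rng))
        if total == 0 or total == n:
            unanimous += 1
    return unanimous / trials


if __name__ == "__main__":
    n, p = 6, 0.9  # illustrative values only
    independent = [1.0] * n                       # no cascade: a_i = 1
    cascade = [1.0 / (i + 1) for i in range(n)]   # hypothetical a_i = 1/i
    print("independent:", unanimity_rate(n, p, independent))
    print("cascade:    ", unanimity_rate(n, p, cascade))
```

Setting every weight to 1 recovers independent voting, while a decreasing sequence makes later jurors lean on the votes already cast, so the estimated unanimity rate is higher in the cascade case for the same jury size.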

MAIN RESULTS
Here we first consider the effects of jury size and the effects of informational cascades on jury deliberation. We then consider the interaction between heterogeneity and jury size in the presence of informational cascades.
13. Note that the deliberation process modeled here does not include the discussion by the jury prior to the deliberation. The model is compatible with jurors discussing their opinions openly in one or subsequent rounds of exchanged opinions. Here we model only the final round, in which jurors express sequentially their votes during their voting procedure to reach unanimity.
14. Other authors have investigated dependence patterns in jury (or organizational) decision making from different perspectives. Nitzan and Paroush (1984) develop an analysis of optimal jury decision making in the presence of correlated votes and identify the optimal decision rule as that which maximizes the probability of reaching the correct verdict. Ben-Yashar and Nitzan (2001) characterize the relationship between the optimal decision rule and jury size in the presence of correlated votes.
In theory, mistrials should arise more frequently with larger juries than with smaller juries. However, in the presence of informational cascades, the effect of jury size on the probability of hung juries may be reduced or even reversed. This implies that unanimous deliberations may just as easily be achieved in larger groups as in smaller ones. This could explain the paradoxical empirical evidence, that jurisdictions allowing smaller juries have not seen the anticipated reduction in the frequency of mistrials.
As per the Condorcet jury theorem, the probability of achieving a unanimous verdict decreases with jury size, when jurors' decisions are independent. Consequently, the probability of a hung jury increases with jury size. 15 To illustrate, consider a jury composed of $n$ jurors, each of whom deliberates accurately with an error rate of $1 - p$. When jurors decide the case independently, there is no possibility of an informational cascade. Hence, the probability of a unanimous decision, equal to $p^n + (1 - p)^n$, is decreasing in jury size, since $p < 1$. The probability of mistrial is increasing in $n$ at a rate of $p^n (1 - p) + (1 - p)^n p$. 16 Empirical studies, including among others Kalven and Zeisel (1966) and Hannaford-Agor et al. (2002), tell us that the probability of a hung jury is estimated to range from 5.5 to 6.2 percent for a jury of 12. 17 If we use our simple formula, this variation implies an error rate ranging from .45 to .55 percent (that is, each juror deliberates correctly in the range of 99.45 to 99.55 percent), assuming independence of juror decision making. The reduction to six-person juries should therefore result in hung juries with a probability ranging from 2.1 to 2.8 percent. This theoretical prediction finds no support in the empirical literature. We now consider whether informational cascades allow us to generate predictions that more closely approximate the empirical observations.
15. Note that this is a restatement of the Condorcet jury theorem under unanimity. See Ladha (1992) for a formal analysis.
16. For illustrative purposes and with no loss of generality, the rate of change of the mistrial probability is calculated as the first derivative of the probability of a mistrial, under the simplifying assumption that n is a continuous variable and approximating ln(p) with p.
17. Very few empirical studies assess the size of hung-jury rates. In the most classic study on American juries, Kalven and Zeisel (1966) estimate a hung-jury rate of 5.5 percent in a sample of 3,500 criminal cases. In a more recent study, Hannaford-Agor et al. (2002) analyze hung-jury rates in federal versus state courts. While the rates in federal courts were stable in the range of 2-3 percent, the rates in state courts were on average around 6.2 percent in the period 1980-98 and were characterized by high volatility. See also Eisenberg et al. (2005).
Proposition 3.1: Hung-Jury Paradox. The probability of a unanimous verdict may increase with jury size in the presence of informational cascades.
Proof. See the Appendix.
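The independence benchmark that motivates the proposition can be checked numerically. The sketch below assumes the mistrial probability $1 - p^n - (1 - p)^n$ for independent jurors with common accuracy $p$, backs out the accuracy implied by an observed 12-person hung-jury rate, and then evaluates the same formula for six jurors. The function names and the bisection routine are illustrative choices, and exact arithmetic under this formula may differ somewhat from the rounded figures quoted in the text.

```python
def hung_probability(n, p):
    """Mistrial probability for n independent jurors with accuracy p:
    one minus the probability that all are right or all are wrong."""
    return 1.0 - p**n - (1.0 - p) ** n


def implied_accuracy(hung_rate, n, tol=1e-12):
    """Find p in [0.5, 1] such that hung_probability(n, p) == hung_rate.

    Uses bisection; on this interval the mistrial probability
    falls monotonically as p rises, so the root is unique.
    """
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if hung_probability(n, mid) > hung_rate:
            lo = mid  # too many mistrials: accuracy must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for observed in (0.055, 0.062):  # 12-person rates cited in the text
        p = implied_accuracy(observed, 12)
        print(f"observed {observed:.3f}: implied error rate {1 - p:.4%}, "
              f"six-person prediction {hung_probability(6, p):.4f}")
```

Under independence, halving the jury roughly halves the predicted hung-jury rate, which is the reduction that the empirical studies cited above do not find.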
Cascade behavior renders jurors' choices correlated and makes it easier for a jury to converge to unanimity. With informational cascades, jurors are more prone to agree with the observed choice adopted by the other jurors, who express their voting preferences earlier in the deliberation. In reaching a decision, values of $a_i \neq 0$ introduce a drift that helps the convergence of the decisions for all jurors in the direction of unanimity. 18 A change in jury size produces two countervailing effects. The first effect is the one identified by the Condorcet jury theorem under unanimity, according to which increasing jury size reduces the probability of a unanimous decision. The second effect relates to the magnitude of the cascade effect, which grows larger in $n$. Under certain conditions, the latter effect dominates the former, such that having a smaller group size results in hung juries more frequently than having a larger group. This explains the hung-jury paradox. In the presence of an informational cascade, unanimity can be more easily reached in a larger jury.
The presence of informational cascades may, therefore, explain why the adoption of smaller juries has not substantially reduced mistrial rates. The point we make is intuitively plausible: as jury size increases, the potential for divergent beliefs to result in a hung jury goes up, but at the same time, if judgments are expressed sequentially, jurors can adjust their judgments in light of what others have expressed and an informational cascade may be triggered. As a result, an increase in jury size increases the likelihood that by the time a dissenting juror is called to express her judgments, her voting preference will have changed in response to the judgments previously expressed by other jurors. Hence, the probability of a unanimous deliberation may increase with jury size.
Unanimity is more likely to be achieved in larger juries if jurors are sufficiently open to the opinions of the other jurors. In other words, the majority of jurors should be able to persuade dissenting minorities of jurors and contaminate beliefs across the group to reach a common judgment. Analytically, this requires that the self-confidence parameter, $a_i$, which measures the weight assigned by each juror to his or her own opinion, be decreasing as the number of other jurors expressing an opinion grows. 19 As a consequence of proposition 3.1, the effect of jury size on the probability of a hung jury is reduced and possibly even reversed by informational cascades. These points are illustrated in Figures 1 and 2. Figure 1 shows how the probability of reaching a unanimous verdict varies with jury size. 20 The curves in Figure 1 show the different dynamics of unanimous decision making for four speeds of informational cascade. Trivially, unanimity is certain in the case of a single juror.
[Figure 1. Unanimity with informational cascades]
18. Note that such a result is symmetric: it also applies to the probability of convergence toward the wrong decision; that is, $P(X_1 = 0, \ldots, X_n = 0) = 1$.
19. Analytically, the persuasiveness (or openness) of jurors is measured as the speed of the informational cascade, that is, the speed at which $a_i$ values decrease during the decision-making process. Proposition 3.1 requires a condition of asymptotic dominance on the sequence of $a_i$ values to generate the hung-jury paradox. Asymptotic dominance can be interpreted as the speed at which the sequence of $a_i$ values decreases during the deliberation process and can be used as a proxy for the convincibility of jurors. Analytically, $P(n)$ approaches one for sufficiently large $n$ when the $a_i$ values decrease sufficiently quickly. See equation (A5) for the sufficient condition for the hung-jury paradox.
As we can see in Figure 1, the probability of unanimity initially decreases for sufficiently small juries (for any speed of informational cascade and $n > 1$). In the initial range, when the size of the jury grows above 1 but remains sufficiently small, the jury size effect dominates, and unanimity becomes more difficult to achieve as the number of jurors increases. However, as $n$ continues to increase, the probability of unanimity may start to increase, depending on the speed of informational cascades. As predicted by proposition 3.1, in the upper two curves of Figure 1 (representing higher speeds of informational cascade), as the jury grows larger, we start observing the cascade effect, which mitigates and ultimately dominates the jury size effect. With jury polarization, the probability of unanimity might, therefore, be at its lowest with small juries, as is shown in Figure 1. The U-shaped plotting of the probability of a unanimous decision in Figure 1 captures the nonlinear trade-offs between jury size and cascade decision making. As the speed of informational cascade increases, the convergence toward unanimity is accelerated because of the process of contamination.
20. We illustrate the probability of a unanimous verdict as a function of jury size $n$ in Figure 1. We consider a jury size of up to $n = 20$ for illustrative purposes only, that is, to show how the probability of unanimity changes with $n$ because of the cascade effect versus the jury size effect.
It is worth noting that proposition 3.1 bears on whether the contamination process leads jurors to converge toward the right or the wrong decision. The contamination process introduces drift, which may theoretically cause even large juries to converge around a wrong decision. Figure 2 disaggregates the probability of unanimity into two components: the probability of reaching a unanimous correct verdict and the probability of reaching a unanimous wrong verdict. Figure 2 shows the dynamics of these two components as a function of jury size, illustrating that the probability of a unanimous correct verdict dominates when the value of $n$ is sufficiently large. This is a reassuring result: cascade behavior is much more likely to foster consensus toward a correct verdict than toward a wrong verdict. These results provide a tentative answer to the concerns raised by Sunstein (2000) regarding the possible polluting effect of cascade decision making in group deliberations. Informational cascades in sufficiently large juries foster consensus toward correct unanimous verdicts. The wedge between the upper and lower plotted lines in Figure 2 indicates that even for moderately sized juries, correct cascades far outnumber wrong cascades. Obviously, the extent to which even a minimal increase in the probability of wrongful convictions may be tolerated in the interest of procedural economy and sound convictions is a question that falls beyond the scope of the present study. 21 Our results, however, suggest that this delicate trade-off, although relevant qualitatively, is quantitatively scaled in favor of unanimous correct decision making, as long as the standard assumption of $p > .5$ (that is, jurors are more likely to decide correctly than mistakenly) is satisfied. Hence, cascade decision making can reduce the cost of mistrials while imposing little or no increase in the rate of wrongful convictions.
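The split of unanimity into a correct and a wrong component can be sketched in closed form under a simple reading of the deliberation process. The code below assumes, as an illustration only, that each juror after the first votes correctly with probability $a_i p + (1 - a_i)$ times the share of correct votes so far; this is not the authors' equation, and the names `unanimous_probs`, `hung_probability`, and `harmonic_weights`, the accuracy 0.9, and the weights $a_i = 1/i$ are hypothetical. Under this reading each additional juror multiplies both components by a factor no greater than one, so the sketch shows only how cascades narrow the gap between small and large juries; the reversal stated in proposition 3.1 rests on the conditions in the Appendix, which are not reproduced here.

```python
def unanimous_probs(n, p, a):
    """Return (P[all n votes correct], P[all n votes wrong]).

    ASSUMED rule (a reading of the text, not the authors' equation):
    juror i+1 votes correctly with probability
    a[i] * p + (1 - a[i]) * (share of correct votes so far), with a[0] = 1.
    """
    all_correct, all_wrong = p, 1.0 - p
    for i in range(1, n):
        # If every earlier vote was correct, the observed share is 1.
        all_correct *= a[i] * p + (1.0 - a[i])
        # If every earlier vote was wrong, the share is 0, so the juror
        # votes correctly with probability a[i] * p and wrongly otherwise.
        all_wrong *= 1.0 - a[i] * p
    return all_correct, all_wrong


def hung_probability(n, p, a):
    """Probability that the n votes are not unanimous."""
    return 1.0 - sum(unanimous_probs(n, p, a))


def harmonic_weights(n):
    """Hypothetical decreasing self-confidence sequence a_i = 1/i."""
    return [1.0 / (i + 1) for i in range(n)]


if __name__ == "__main__":
    p = 0.9  # illustrative individual accuracy, not an estimate
    for n in (6, 12):
        independent = hung_probability(n, p, [1.0] * n)
        cascade = hung_probability(n, p, harmonic_weights(n))
        correct, wrong = unanimous_probs(n, p, harmonic_weights(n))
        print(n, round(independent, 3), round(cascade, 3),
              round(correct, 3), round(wrong, 3))
```

For these illustrative values the unanimous-correct component is far larger than the unanimous-wrong component, and the difference in hung-jury probability between six and 12 jurors is much smaller with cascades than under independence.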
Our analysis has shown that informational cascades can increase the probability of reaching a correct unanimous decision while decreasing the probability of a mistrial. This result seems to run against the established view in the literature. Nitzan and Paroush (1984) show that interdependent decision making (of which informational cascades are a special case) reduces the probability of a correct jury verdict. When the optimal voting rule is adopted, the institutional design of group decision making should, therefore, favor independence among decision makers. However, Nitzan and Paroush (1984) rely on the premise that the optimal voting rule is adopted. 22 Unanimity is not the optimal decision rule, 23 and our result is therefore consistent with the results of Nitzan and Paroush (1984) inasmuch as it shows that when the decision rule is not optimal, interdependence is not necessarily harmful. Put more simply, informational cascades can be desirable under a unanimity rule, given the fact that unanimity is not the optimal decision rule. 24
21. For recent literature on the Blackstonian ratio problems, see Rizzolli and Saraceno (2013).
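Footnote 23 describes the optimal decision rule as a weighted majority rule whose weights are the logarithms of individual odds ratios. The following minimal sketch shows what such a rule looks like in practice. It assumes independent jurors and a symmetric dichotomous choice, and the function names and the example abilities are hypothetical choices made here, not values from the paper.

```python
from math import log


def log_odds_weight(p: float) -> float:
    """Weight of a juror with ability p: the log of the odds p / (1 - p)."""
    return log(p / (1 - p))


def weighted_majority(votes, abilities) -> int:
    """Aggregate +1/-1 votes with log-odds weights.

    Returns +1 or -1 for the winning alternative, or 0 if the weighted
    score is exactly tied.
    """
    score = sum(v * log_odds_weight(p) for v, p in zip(votes, abilities))
    return (score > 0) - (score < 0)


if __name__ == "__main__":
    # One well-informed juror outvoted two-to-one by weaker jurors.
    print(weighted_majority([+1, -1, -1], [0.9, 0.6, 0.6]))
```

In this example the single juror with ability 0.9 carries more weight than the two jurors with ability 0.6 combined, so the weighted rule sides with the minority vote. A unanimity rule ignores such differences in ability, which is one way to see why it is not the optimal rule.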

Hung-Jury Paradox: Jury Size and Heterogeneity
In Ballew v. Georgia, the U.S. Supreme Court emphasized the importance of juries being both heterogeneous and representative of their communities. In particular, the Court ruled that heterogeneity is of utmost concern when jurisdictions opt to employ smaller juries. This decision reflects the conventional wisdom expressed in Sunstein (2002, p. 4) that "homogeneity can be breeding grounds for unjustified extremism, even fanaticism. To work well, deliberating groups should be appropriately heterogeneous and should contain a plurality of articulate people with reasonable views - an observation with implications for the design of regulatory commissions, legislative committees, White House working groups, and even multimember courts." In jury deliberation, heterogeneity matters to the extent that jurors will have differing views on the case. In the determination of criminal liability, the differing views of jurors may entail divergent probabilities of achieving a correct verdict, as well as divergent probabilities of achieving a unanimous verdict. 25 We show that it is not always the case that a heterogeneous jury is more likely to reach a correct unanimous decision than a same-sized homogeneous jury. The reason, we discover, is that heterogeneity slows the process of contamination through the cascade. On reflection, we find that this result has much intuitive appeal. The more divergent jurors are in their initial impressions of the evidence, the less likely we suspect they will be to adopt herding behavior during deliberation.
Analytically, we model a heterogeneous jury of n jurors as consisting of k subgroups of individuals, of sizes $n_1, \ldots, n_k$, such that $n_1 + \cdots + n_k = n$. Within each subgroup i, the jurors have equal abilities (that is, probabilities of reaching a correct decision, $p_i$) and equal self-confidence (measured by the same $a_i$). Hence, when jurors are called to deliberate, each juror's confidence is reinforced by the agreement of his or her subgroup (assuming $n_i > 1$) and may be challenged by jurors in different subgroups (assuming $k > 1$). In the deliberation process, there is no mitigation of juror self-confidence until a member of a different subgroup presents a position. 26 Armed with this qualitatively precise characterization of heterogeneity, we are now able to deepen our understanding of the effects of heterogeneity on jury deliberation and to give an analytical framing of the Court's holding in Ballew v. Georgia. 27
Proposition 3.2: Jury Size and Heterogeneity. The effect of heterogeneity on dissenting jurors can be more easily mitigated by informational cascades in larger juries than in smaller juries. 28
Proof. See the Appendix.
In heterogeneous juries, the contamination process slows as the subgroups grow more diverse from one another, and persuasion efforts may end in gridlock. More gridlock entails a higher (possibly excessive) rate of mistrials and a lower probability of reaching a correct unanimous verdict. When we consider the relationship between jury size (n) and the number of subgroups (k), it becomes apparent that when jurors are open to each other's ideas, a higher degree of jury heterogeneity is sustainable and, indeed, desirable. As informational cascades weaken in response to a reduction in jury size, heightened levels of heterogeneity are more likely to result in deadlock and lower rates of correct unanimous verdicts. Dissenting jurors may balkanize if the number of other jurors is not large enough to trigger a change in the views of the dissenting subgroup. The lesson to be learned here is that diversity is desirable in juries to promote sound and balanced deliberation, if differences of opinion provoke an exchange of ideas and debate, leading to a convergence of views. The value of heterogeneity hinges on the presence of persuasion, which allows the members of a diverse group to reach a consensus despite their initial differences. However, absent amenability to persuasion through large-group informational cascades, heterogeneity could actually render consensus among jurors more difficult to reach in small juries. 29
22. Nitzan and Paroush (1984) characterize the optimal voting rule (defined as the one that maximizes the probability of the correct collective decision) when the jury is faced with a dichotomous choice in the presence of interdependent juror decisions.
23. The optimal decision rule is a weighted majority rule in which weights are the logarithms of individual odds ratios of identifying the correct alternative. Note that the unanimity rule is suboptimal in the presence of strategic voting (see Feddersen and Pesendorfer 1998).
24. The optimality of the decision rule (either unanimity or a qualified majority) is highly sensitive to small changes in jury size, as Ben-Yashar and Nitzan (2001) point out.
25. The discussion of the model therefore uses interchangeably the terms "abilities" and "probabilities of coming to a correct decision" as proxies for the different views held by the heterogeneous jurors.
26. Note that a homogeneous jury is simply a heterogeneous jury possessing only one subgroup in which all the members have the same ability, $p^*$, and self-confidence, $a^*$.
27. Our results merely offer a positive analysis of the Court's holding in Ballew v. Georgia. Assuredly, there is no evidence that the Court conducted the analysis that we present here in reaching its decision.
28. The threshold at which heterogeneity ceases to produce correct verdicts is $a^*(1 - p^*) > 1 - c \exp\{-KL\}$. See the Appendix for an explanation and a proof.

CONCLUSIONS
It is difficult to imagine jury deliberations free from the problem of sequential decision making. Jurors form voting preferences on the basis of both their private information and the preferences expressed by other jurors. When jurors have little information on a subject, they instinctively depend on the judgments of others. Thus, the process of jury deliberation may engender consensus, but at the cost of potentially amplifying the errors of some jurors, thereby leading to incorrect judgments.
In this paper, we considered the aggregation of information in a jury in the presence of informational cascades. We examined the validity of Condorcet's jury theorem in situations in which informational cascades are likely to occur and discovered that in the presence of informational cascades, unanimity can be reached just as easily in a jury of 12 as in a jury of six. These results shed light on the empirical literature, which thus far has failed to find any correlation between jury size and mistrial rates. By recognizing the countervailing effects of informational cascades and heterogeneity, our more nuanced model improves on the simple probabilistic prediction and thus explains the empirical data.
29. Hannaford-Agor et al. (2002) identify higher hung-jury rates in those jurisdictions characterized by higher heterogeneity. For example, the authors estimate hung-jury rates of around 9.5 percent in Washington, D.C., and the New York area but around 15 percent in Los Angeles.
Our results provide a natural stepping stone toward a normative analysis of optimal jury size and optimal decision rules in the presence of informational cascades. In this paper, we focused our attention on the relationship between the probability of a hung jury and jury size, without investigating the possibility of simultaneously changing the decision rule. The specific focus of our paper was dictated by the principles set by the U.S. Supreme Court in Burch v. Louisiana (441 U.S. 130 [1979]), which holds that state courts are free to reduce jury size or to modify the decision rule but not both: under current U.S. law, states cannot allow smaller juries to decide nonunanimously. Future extensions of our analysis should consider the impact of informational cascades on the choice of the optimal decision rule, as well as the impact of various procedural and evidentiary rules on jury deliberation with behavioral cascades.
. This is verified under the assumptions of the model. 1, . . . , n In addition, proposition 3.1 requires us to verify that is asymptotically equivalent to if and only if c(n)/[1 − c(n)] the fact that the uniform distribution over k outcomes (1/k, . . . , 1/k) has maximum entropy. The inequality is equivalent to $a^*(1 - p^*) > 1 - c \exp\{-KL\}$. This expression provides the sufficient condition for the homogeneous jury to deliberate correctly with a higher probability than for the heterogeneous jury. By inspection of this condition, it follows that any given level of heterogeneity is more problematic in terms of achieving a unanimous decision for a smaller n value. Q.E.D.
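The proof invokes the fact that the uniform distribution over k outcomes has maximum entropy. The short check below illustrates only that fact. How the paper's KL term is defined cannot be recovered from this excerpt, so the function names and the reading of `dist` as a vector of subgroup shares are assumptions made for illustration.

```python
from math import log


def entropy(dist) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * log(x) for x in dist if x > 0)


def kl_from_uniform(dist) -> float:
    """Kullback-Leibler divergence of dist from the uniform distribution
    over the same number of outcomes; equals log(k) - entropy(dist)."""
    k = len(dist)
    return sum(x * log(x * k) for x in dist if x > 0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # hypothetical: four equal subgroups
    skewed = [0.7, 0.1, 0.1, 0.1]        # hypothetical: one dominant subgroup
    print(entropy(uniform), entropy(skewed))
    print(kl_from_uniform(uniform), kl_from_uniform(skewed))
```

Because the divergence from the uniform distribution is never negative, the entropy of any distribution over k outcomes is at most log k, with equality only for the uniform case (1/k, . . . , 1/k).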