Judgments Concerning the Valence of Inter-Turn Silence Across Speakers of American English, Italian, and Japanese

The fact that people with minimal linguistic skill can manage in unfamiliar or reduced linguistic environments suggests that there are universal mechanisms of meaning construction that operate at a level well beyond the particular structure or semantics of any one language. The authors examine this possibility in the domain of discourse by focusing on how gaps arising at the juncture between two persons' turns-at-talk (inter-turn silences) are evaluated by speakers of typologically distinct languages: English, Italian, and Japanese. This cross-linguistic design makes it possible to test both universal and relative aspects of orientation to silence. The effects of inter-turn silence are tested using study participants' ratings of speakers' willingness to comply with requests or agree with assessments that were embedded in conversations. In a 3 × 2 × 3 mixed design, three silence lengths (0 ms, 600 ms, or 1200 ms) were crossed with two speech act types (requests and assessments) as within-participant factors, with language group as the between-groups factor, in manipulations of telephone conversations that were modeled on an actual telephone call between friends. Native-speaking study participants, in their home countries, provided ratings on Likert-type scales. Within each language group, ratings decreased significantly as inter-turn silences lengthened, indicating a generalized response to the gaps; however, means also differed significantly between groups, indicating different expectations for agreement.

glitches as some form of trouble, as displayed in their pursuit of responses or reformulations of prior talk (Davidson, 1984; Pomerantz, 1984).
This project is grounded in these descriptive findings and builds on them to develop an experimental approach that is based on actual interaction. The aim is to capture a possible overhearer's perception of others' conduct with regard to inter-turn silence (cf. Fox Tree, 2002) and to compare these judgments directly across language groups. This can help show whether overhearers' responses to disruptions in the forward motion of talk generalize across groups.¹ Admittedly, this is an "external view" of the phenomenon (Heritage, 1995, p. 406), not an analysis of members' methods for organizing the "occasions and resources of understanding" (Schegloff, 1992, p. 1299). Nonetheless, the cross-linguistic approach to the study of inter-turn silence affords a glimpse of one possible entry point for understanding the seamless interrelation of the social/relative and the cognitive/universal in language use (Levinson & Enfield, 2006). We begin by reviewing these two approaches to silence, then provide a rationale for the choice of the additional languages under study (Italian and Japanese), and finally turn to the experimental design and study results.

CULTURAL, COGNITIVE, AND CONVERSATION ANALYTIC CONTRIBUTIONS
Anthropologists and sociolinguists have described cultural differences in the tolerance for silence in conversation, demonstrating and theorizing how those differences can affect intercultural interactions (e.g., Basso, 1970; Clancy, 1986; Lehtonen & Sajavaara, 1984; Nakane, 2006; Nwoye, 1985; Scollon & Scollon, 1979; Tryggvason, 2006; for edited collections about silence in conversation, see Jaworski, 1997, and Tannen & Saville-Troike, 1985). Viewing orientations to silence from the perspective of norms that are culturally relative and distinct, these studies provide grounds for assuming that members of different language groups (as a surrogate for cultural groups) will perceive the valence of inter-turn silence (i.e., the degree of negative or positive attribution to gaps in conversation) differently. By contrast, psychologists and psycholinguists have studied silence in terms of response latency, primarily in the context of factual questions. In this processing paradigm, latency is considered a cue to uncertainty or deception (for an overview, see Andersen, 1999). This cognitive approach, built on the conceptualization of latency to respond as a symptom or signal of processing effort in finding an answer (Glucksberg & McCloskey, 1981; Smith & Clark, 1993), assumes that answerers, in the moments of hesitancy, are both doing a search and monitoring the search process (Nelson, 1993). The outward appearance of this searching/monitoring process can be characterized as uncertainty (Brennan & Williams, 1995), possibly construed by listeners as a clue to deception in the making (Andersen, 1999) or, conversely, simply as thoughtfulness (Burgoon, Buller, & Guerrero, 1995). Although its findings are less robust cross-culturally, this research suggests that interpretations of inter-turn silence as "trouble" or as thoughtfulness may be grounded in generalized cognitive processes related to memory, retrieval, integration of cues, and so on.
Most of this cognitively oriented research has studied silence (and disfluency) in terms of individual rather than collaborative activity, although alternative approaches have been advocated (cf. Clark, 1994; Schober & Bloom, 2004).
What differentiates this study from both anthropological and psychological studies is that it entertains the possibility of a cultural difference in orientation to silence, but examines it at the level of specific collaborative conversational activities (i.e., requests and assessments). Rather than addressing the orientation to silence as either culturally specific or cognitively generalized, we address perception of silence in terms of both and, more important, in terms of particular discourse environments where specific socially appropriate actions are expected. This is distinct from research on response latency and filled pauses in which study participants are evaluating responses to factual questions (e.g., Brennan & Williams, 1995) or direct interrogatives concerning beliefs (e.g., Fox Tree, 2002).
We base our design on the turn-taking model as empirically derived by Sacks, Schegloff, and Jefferson (1974). From that vantage point, it is clear that silence in conversation is a great deal more than an absence of speech. Indeed, Conversation Analysts² have demonstrated, as briefly noted earlier, that when inter-turn silence grows subsequent to a speaker's utterance that sets up conditional relevance, it is indicative of possible trouble at that point in the interaction (Davidson, 1984; Pomerantz, 1984). This trouble is empirically available through speakers' routine practice of producing "subsequent versions" (Davidson, 1984, p. 104) of their turn at talk. The generalization that dispreferred actions can be structured around inter-turn silence (i.e., realized as delays in launching turns at talk) is borne out in qualitative studies of conversation in typologically disparate languages such as Japanese (e.g., Mori, 1999; Tanaka, 2005), Finnish (e.g., Sorjonen, 1996), and Italian (e.g., Monzoni, 2007), to name a few.
What is not known, however, is how particular courses of action in the context of particular inter-turn silence lengths are perceived by native speakers of different languages. Such comparative research will help us refine our understanding of the extent to which interactional norms are part of the variable "vocabulary" of language (Chomsky, 1995) or whether tolerances for specific lengths of silence are universal and, therefore, have implications for understanding the social (as well as cognitive) processing of speech. To explore these possibilities, we designed and implemented this study around the following exploratory questions:
1. Do linguistic/cultural groups differ significantly in their judgments of addressees' willingness to comply with requests or agree with assessments as a function of inter-turn silence?
2. Do groups differ in their judgments of addressees' willingness to comply with requests or agree with assessments based on the sequential environment (i.e., requests vs. assessments) in the context of inter-turn silence?
The second question brings a unique perspective to the study of inter-turn silence. Because we chose the adjacency pair as the relevant level of analysis, we move the study of silence away from descriptions of generalized norms and toward a comparative investigation based on judgments concerning the same courses of action (requests and assessments). We are, therefore, better positioned to draw conclusions about the orientation to inter-turn silence at interactionally relevant and comparable moments.

LANGUAGE GROUPS: WHY ITALIAN AND JAPANESE?
Stereotypes abound about the interpersonal styles of many linguistic/cultural groups. In terms of orientation to inter-turn silence (or propensity for overlapping talk), one relatively unexplored explanation for differing conversational styles is suggested in Schegloff, Ochs, and Thompson (1996). They noted that structures of projectability are likely to differ across typologically disparate languages and that, therefore, opportunities for overlap would be more or less available (pp. 28-29). They suggested a possible "confluence of grammar, culture and turn-taking organization," which could "raise the possibility ... of early entry [into a preceding turn] as a common practice ..." (p. 30). We therefore chose to study speakers of typologically distinct languages that also have distinct conversational stereotypes attached to them, to provide an initial snapshot of the degree to which culture/language background is a relevant construct in orientation to inter-turn silence.
Two disparately stereotyped groups, from an Anglo-centric standpoint, are Italian speakers and Japanese speakers: One is characterized as voluble and possibly less tolerant of silence (in that speakers are more prone to overlapping talk), whereas the other is viewed as less voluble and more tolerant of silence. These distinctly opposed stereotypes, for better or worse, have generated interest among scholars, although with mixed results.
Italian literature on silence and its functions in conversation is scarce; the general claim that Italians do not tolerate silence and that they tend to enter conversation in overlap more frequently than speakers of other languages is often mentioned or implied, but rarely demonstrated. A prefatory chapter in Bazzanella (2002) is promisingly dedicated to silence, but contains few references to empirical studies. The studies of Italian that consider features of the turn-taking system with respect to silence (Monzoni, 2004, 2007; Shultz, Florio, & Erickson, 1982) provide contrasting analyses: one draws on culture as an explanation (Shultz et al., 1982) and the other on participation structure (Monzoni, 2004, 2007). Nonetheless, these empirical studies do indicate that Italians show some orientation to minimizing inter-turn silence in multi-party settings and that this orientation tends to produce overlapping talk.
It is possible to draw quite a different conclusion about Japanese speakers. Based on anthropological and sociolinguistic research, one could infer that Japanese speakers differ from both Americans and Italians in their perceptions of silence. Sociocultural norms have been described that stress both hierarchy and harmony in interpersonal relationships (Watanabe, 1993). Thus, Japanese speakers may have social and psychological preferences for agreement in addition to a more generic structural preference (Pomerantz, 1984) for minimized inter-turn gaps at sequentially relevant moments.³ Such a confluence could lead to less tolerance for long gaps. Conversely, a strong underlying social orientation to agreement might predict more tolerance for long gaps: if agreement is presumed, delayed affirmative responses would not necessarily elicit negative judgments.
In sum, from the few studies of Italian and Japanese that focus on silence, it is not clear whether norms for orientation to silence are operating at the level of participation structure or at a broader cultural level or whether, in fact, there is any stable cultural norm. This study is certainly not the last word on cultural orientation to silence, but it does provide a stable footing of reasonably controlled, consequential discourse environments (requests and assessments) on which to test and build further investigation.

METHOD
Overview
We devised a series of telephone conversations that were based on actual telephone calls between college-aged friends. These dialogues were then performed by American, Italian, and Japanese university students who were native speakers. Digital recordings were made for subsequent sound editing. We inserted inter-turn silences between requests or assessments and the affirmative responses that followed them. Undergraduates at study sites in their home countries were recruited as study participants to listen to the dialogues and provide their judgments (as ordinal ratings) of addressees' willingness, within the recorded dialogues, to comply with requests or agree with assessments.
A central challenge in designing this study was to control, as much as possible, for the vocal qualities of the American, Italian, and Japanese speakers who performed the dialogues. To fully account for any vocal differences between the speakers, we would have had to enlist trilingual speakers to perform all of the dialogues, but even this "matched-guise" approach (Lambert, Hodgson, Gardner, & Fillenbaum, 1960) has been called into question (Bradac, Cargile, & Halett, 2001). That technique only works under the assumption that the multilingual speaker is equally skilled in each language or language variety, when in fact the speaker may present idiosyncratic differences of fluency in the various guises that could still affect hearer judgments (Bradac et al., 2001, pp. 139-141). Thus, we proceeded by controlling for production features within each language group, as detailed later, but we did not control paralinguistic qualities to the extent of finding one speaker who could perform in all three languages.

Study Participants
Native-speaking American English (n = 70), Italian (n = 72), and Japanese (n = 80) undergraduate students, all living in their home countries at the time of data collection, were recruited and compensated according to an approved institutional review board protocol. The sample populations did not differ reliably in age (American: M = 19.97, SD = 2.57; Italian: M = 21.96, SD = 2.60; Japanese: M = 20.14, SD = 1.26), F(1, 217) = 2.35, ns, but they differed in the representation of the genders. For the American English group, 60% were women (n = 42); for the Italian group, 74% were women (n = 53); and for the Japanese group, 84% were women (n = 67), χ²(1, N = 222) = 54.50, p < .01. However, as discussed later, gender did not affect results across or within language groups.

Materials
To fully replicate the design of the earlier study of American English (Roberts et al., 2006), the same nine dialogues were used as the foundation for the Japanese and Italian study materials. The dialogues are based on the transcription of the opening of an actual telephone conversation between two female American college-aged friends (Roberts & Robinson, 2004). For this study, one native speaker for each language group translated the dialogues from English into the target languages. These were then back-translated into English by a second native speaker from the target language groups. This was to check that the dialogues were roughly equivalent in the different languages, yet still idiomatic and comprehensible for their local contexts. Each conversation culminated with a request or an assessment by the caller; the call recipient responded with an affirmative one-word response (further details follow).
Of the nine dialogues, six were manipulated for purposes of the study and three remained as they had been produced by the actors. These three dialogues ("lures") were interspersed among the six target stimuli to mask the manipulations. Although the stimuli with long gaps clearly had a different flavor than the un-manipulated dialogues, study participants had a variety of guesses concerning the purpose of the study: In response to our verbally posed debriefing question, "What do you think this study was about?," study participants mostly said "tone of voice" or "emotion in conversation," but some (roughly one quarter) were able to pinpoint that it was about "pauses" in speech. This elicitation of study aims was done as a precursor to disclosing the actual aims of the study and was not designed as part of the data collection. Thus, although our conclusion about study participants' awareness of study aims is impressionistic, we can say that it was not generally obvious to participants that the study had targeted gaps in conversation.
Construction and recording of dialogues. Each conversation, whether target or lure, included a mundane request or assessment made by the caller. These were about everyday topics concerning school flyers, going to the gym, or picking up a new computer. In response to the requests and assessments surrounding these three themes, the call recipient responded in the affirmative.
The target stimuli controlled for theme, but differed on speech act. For example, if the theme of the call concerned flyers for a school function, it ended with the caller either formulating a request in terms of that topic (e.g., getting a ride to pick up the flyers) or offering an opinion on the topic at hand (e.g., reporting that the flyers look good). The call recipient answered in the affirmative for both sequence types: In English, the token "sure" followed requests, and "yeah" displayed agreement with the assessments. The roughly equivalent tokens used for Italian were certo for "sure" and sì for "yeah." For Japanese, ii yo was used for "sure" and soo da ne for "yeah." Appendix A contains examples in English, and Appendix B contains the targeted translated sequences along with notes on similarity of syntactic projectability.
All of the dialogues (target and lure) were enacted by native-speaking students in the same age range as the target study population. To control for gender and voice quality, the target stimuli in each language were performed by the same two women, who maintained the same caller-call recipient identity. Familiarity in all of the conversations was maintained by features such as omission of the caller's name, use of an informal register, and/or truncating the greeting sequence (Hopper, 1992;Schegloff, 1979). The person portraying the call recipient in the recording was directed to make responses to requests and assessments sound agreeable, but not unusually enthusiastic. The student actor was instructed to maintain a normal register and to respond affirmatively, with no sense of hesitation (i.e., to sound willing and agreeable in an everyday sort of way).
Clearly, one cannot control for all intervening variables when using human voices; some voices just sound friendlier or more sincere or more needy and those qualities can influence judgments. However, to perfectly match the acoustic characteristics through machine produced language would seriously undercut face validity and, as mentioned earlier, a matched-guise approach was neither possible nor completely reliable. Our best approach was to competently direct actors in their native languages to produce the kind of friendly, familiar demeanor we were seeking. We did not undertake a full study, either before or after choosing our actors, of whether their voices were considered "friendly" within their own cultural contexts. Such a study might yield some reassurance that each of the voices was, indeed, friendly, whatever that would mean in each culture, but whether they were equally so across the groups is simply impossible to calibrate or control. Readers concerned with issues of voice quality are invited to request digital files of the experimental stimuli by contacting Felicia Roberts.
Agreement token manipulations. To control for possible confounding from acoustic differences among the agreement tokens each actor produced during recording, we identified, following procedures detailed in Roberts et al. (2006), two median agreement tokens (one "yeah" and one "sure," or their equivalents in the other languages) from all of the agreement tokens produced by each actor. The "sure" token and the "yeah" token that fell in the middle in terms of duration, pitch range, and pitch change were chosen as the default agreement tokens and edited into the appropriate dialogues. In other words, within each language condition the agreement tokens were identical in each dialogue, so that only the inter-turn silence between the request or assessment and the response differed. Thus, the acoustic quality of the responses was fully controlled within each language group.
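Choosing a "median" take can be sketched in code. The actual procedure is the one given in Roberts et al. (2006) and is not reproduced here; this minimal sketch assumes one plausible reading: each recorded take is ranked separately on duration, pitch range, and pitch change, and the take whose ranks sit closest to the middle rank overall is kept. The `Token` class, its field names, and the measurements are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    label: str             # hypothetical identifier for one recorded take
    duration_ms: float
    pitch_range_hz: float
    pitch_change_hz: float

FEATURES = ("duration_ms", "pitch_range_hz", "pitch_change_hz")

def most_central_token(tokens):
    """Return the take whose ranks lie closest to the middle rank on every feature.

    Ranks (rather than raw values) are used so that features measured in
    different units contribute on the same scale.
    """
    distance = {t.label: 0.0 for t in tokens}
    for feature in FEATURES:
        ordered = sorted(tokens, key=lambda t: getattr(t, feature))
        middle = (len(ordered) - 1) / 2  # 0-based median rank
        for rank, tok in enumerate(ordered):
            distance[tok.label] += abs(rank - middle)
    return min(tokens, key=lambda t: distance[t.label])

# Hypothetical measurements for five takes of "sure" by one actor.
takes = [
    Token("take1", 310, 40, 25), Token("take2", 280, 35, 20),
    Token("take3", 350, 55, 40), Token("take4", 295, 38, 22),
    Token("take5", 330, 48, 30),
]
print(most_central_token(takes).label)
```

Summing rank distances is only one way to combine the three criteria; a rule that requires the take to be near the median on each feature separately would be an equally reasonable assumption.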
Inter-turn silence manipulations. Once the median agreement tokens were edited into the dialogues, silences were inserted between the first pair part (the request or assessment) and the agreement token ("sure" or "yeah"). Three lengths of inter-turn silence were used to test the interaction between silence and sequence type: 0 ms (no lag time), 600 ms, and 1200 ms. These lengths were chosen to provide a baseline simulation of no gap between the request/assessment and the affirmative response, and then equal increments leading up to Jefferson's (1989) proposed limit of "approximately 1 s" as a standard maximum of silence in conversation. Silence was taken from other dead space in the dialogue (i.e., it was not a machine-produced silence) to best maintain the natural acoustic environment. These natural silences were spliced to the end of phonation on the request/assessment utterance as visually apparent from the sound wave.
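At the level of samples, inserting a gap means concatenating the first pair part (trimmed at the end of phonation), a slice of recorded room tone of the required length, and the response token. The sketch below holds audio as plain lists of samples and uses a hypothetical `splice_gap` helper. The study does not report its sample rate, so the 44.1 kHz figure in the last line is only an illustrative assumption.

```python
def splice_gap(first_part, response, room_tone, gap_ms, sample_rate):
    """Join first_part + gap + response, cutting the gap from natural room tone.

    first_part: samples ending at the end of phonation of the request/assessment.
    room_tone: a stretch of "dead space" taken from elsewhere in the recording,
               used instead of digital (machine-produced) silence.
    """
    n_gap = round(gap_ms * sample_rate / 1000)  # gap length in samples
    if n_gap > len(room_tone):
        raise ValueError("room tone segment is shorter than the requested gap")
    return list(first_part) + list(room_tone[:n_gap]) + list(response)

# The three gap conditions used in the study.
GAPS_MS = (0, 600, 1200)
# Illustration only: at an assumed 44.1 kHz, the 1200 ms gap spans 52,920 samples.
samples_for_1200_ms = round(1200 * 44_100 / 1000)
```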

Design
In a 3 × 2 × 3 mixed model with repeated measures, silence length (0 ms, 600 ms, and 1200 ms) and sequence type (request and assessment) were within-participant factors; language group (English, Italian, or Japanese) was the between-groups factor. These independent variables were manipulated in the context of constructed dialogues that simulated telephone conversations between two college-aged female friends. The dependent variable was raters' perceptions of an addressee's willingness to comply with requests or agree with assessments, as elicited by a written question following each dialogue. Judgments were captured using an ordered scale ranging from 1 (not willing or not in agreement) to 6 (very willing and very much in agreement). Raters were encouraged to use the whole scale, and some study participants circled adjacent values (e.g., they could provide a rating of 2.5 by circling the 2 and the 3 on the scale).
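The scoring of circled responses can be sketched as follows. The text states only that circling two adjacent values (e.g., 2 and 3) yields a midpoint rating such as 2.5. How other patterns (non-adjacent or more than two circles) were handled is not reported, so this hypothetical `score_response` helper simply rejects them as an assumption.

```python
def score_response(circled):
    """Convert circled point(s) on the 1-6 scale into a numeric rating.

    A single circled value is used as-is; two adjacent values are averaged
    (circling 2 and 3 gives 2.5). Any other pattern raises ValueError
    (an assumption, since the source does not say how such cases were coded).
    """
    values = sorted(set(circled))
    if not values or any(v not in range(1, 7) for v in values):
        raise ValueError("ratings must be integers from 1 to 6")
    if len(values) == 1:
        return float(values[0])
    if len(values) == 2 and values[1] - values[0] == 1:
        return (values[0] + values[1]) / 2
    raise ValueError("only one value or two adjacent values can be scored")
```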
Three different orders of presentation were used. These were counterbalanced based on the inter-turn silence manipulations of the target stimuli. Admittedly, this was not a full counterbalancing of sequence types (2 levels) and inter-turn silence length (3 levels), which would have produced six presentation orders. Given our recruitment and administration procedure (in classrooms of consenting instructors; see the Procedure section) and the absence of evidence from an item analysis that ratings differed significantly across the six counterbalanced orders for English (see Roberts et al., 2006, Experiment 1), we chose to take this more economical path. Thus, across the three presentation orders, each inter-turn silence appeared in each position an equal number of times, but not every Speech Act × Gap combination was heard in every possible position.
The dialogues intended to mask the purpose of the study (lures) always appeared in the same place in the audio presentation. Each presentation order began with a masking dialogue to get study participants oriented to the task. The other masking dialogues appeared fourth and seventh in the series of nine conversations.
Each of the three audio presentation orders consisted of nine conversations: Three ended with the assessment/response pairing ("I think it/they look[s] pretty good," followed by "Yeah"), which was separated by one of the three inter-turn silence lengths (0 ms, 600 ms, or 1200 ms); three conversations ended with the request/response pairing ("Can you give me a ride over there," followed by "Sure"), separated by the three inter-turn silence lengths; and three non-target conversations ended with an assessment or request (also paired with the affirmative response tokens "sure" and "yeah"), but they were un-manipulated. The natural inter-turn silence, as originally produced by the actors, was left intact for the non-target stimuli. Thus, nine stimuli were heard by each study participant, and each Silence Length × Speech Act manipulation was heard once.
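The counterbalancing scheme can be sketched as a rotation. The actual orders are not listed in the text, so the base layout `BASE_TARGETS` below is hypothetical; what the sketch reproduces are the stated properties: lures fixed in positions 1, 4, and 7; each order containing every Speech Act × Gap combination once; and each gap occupying each target position once across the three orders. Because the speech act in each slot never changes, not every combination reaches every position, matching the limitation acknowledged above.

```python
GAPS_MS = (0, 600, 1200)
LURE_POSITIONS = {1, 4, 7}  # 1-based positions of the un-manipulated dialogues

# Hypothetical base layout of the six target slots: speech act fixed per slot,
# gap given as an index into GAPS_MS. Each act appears once with each gap.
BASE_TARGETS = [("request", 0), ("assessment", 1), ("request", 2),
                ("assessment", 0), ("request", 1), ("assessment", 2)]

def presentation_order(k):
    """Build order k (0, 1, or 2) by rotating every target's gap index by k."""
    targets = iter([(act, GAPS_MS[(g + k) % 3]) for act, g in BASE_TARGETS])
    return [("lure", None) if pos in LURE_POSITIONS else next(targets)
            for pos in range(1, 10)]

orders = [presentation_order(k) for k in range(3)]
for order in orders:
    print(order)
```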

Procedure
Study participants were recruited by the study authors from classrooms where instructors had given permission for the last 10 min of the class to be used for student participation in "a study about communication." Students who were willing remained in the classroom and were given consent forms to review and sign. Thus, from these classrooms, independent samples of convenience were drawn. Different classrooms were used for each administration such that no student ever participated twice.
For each sample group, students were given the same recorded instructions: that they would be hearing "several telephone conversations among a group of friends" and that each friend "was just relaxing at home." The recording also said that after each conversation there would be a question to answer about the conversation. Including the consent process and debriefing, the process took about 15 min. About 6 min were required for the actual administration of the experimental protocol. Study participants (and the student actors who recorded the dialogues) were compensated for their time.

RESULTS
Because the study participants were homogeneous in terms of age (see the Study Participants section) and exploratory analysis revealed that gender had no significant effect on their judgments of the silences, F(1, 218) = 0.896, ns, the analyses reported here do not include gender or age as covariates.⁴ In the remainder of this section, therefore, we report results from an omnibus analysis of variance (ANOVA), including calculation and interpretation of effect size for relevant variables. Key interactions and major findings are then explored in subsections. For ease of presentation, the inter-turn silence length variable is referred to as "gap," and the sequence type (request or assessment) is referred to as "speech act." The ANOVA indicated statistically significant main effects for all three independent variables: gap, F(2, 438) = 529.76, p < .01; language group, F(2, 219) = 51.82, p < .01; and speech act, F(1, 219) = 50.32, p < .01. The interaction of gap and language group was statistically significant, F(4, 438) = 10.57, p < .01, as were the interactions of speech act and language group, F(2, 219) = 44.54, p < .01, and of gap and speech act, F(2, 438) = 3.21, p < .01. There was no three-way interaction (p = .381).
Effect size is one way of estimating the contribution of a particular factor to an observed effect. Although the ANOVA indicated statistically significant main and two-way interaction effects for all three independent variables, the effect sizes, calculated using partial eta-squared (ηp²), were small for Gap × Language (ηp² = .08) and Gap × Speech Act (ηp² = .01). Although effect sizes must be interpreted cautiously, subject to several limitations (Cohen, 1988), gap length (ηp² = .70) and language group (ηp² = .32) do appear to account for large proportions of variance (70% and 32%, respectively). Speech act accounts for 19% (ηp² = .19) and the Speech Act × Language Group interaction for 29% (ηp² = .29). Although these values are not additive (i.e., partial eta-squared does not provide estimates that sum to 100%), they nonetheless indicate the contribution of each factor (or interaction) as if it were the only variable (Young, 1993). The salient result from the effect size calculations is that inter-turn silence length (gap) has, by far, the strongest effect on study participants' ratings of an addressee's willingness to comply with requests and agree with assessments.
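Partial eta-squared can be recovered from each reported F ratio and its degrees of freedom through the standard identity ηp² = SS_effect / (SS_effect + SS_error) = (F · df_effect) / (F · df_effect + df_error). This is why the percentages above are proportions of effect-plus-error variance for each term rather than shares of total variance. The sketch below applies the identity to the F values reported in the omnibus ANOVA; the results agree with the reported ηp² values to within 0.01 (small discrepancies may reflect rounding or truncation in the original reporting).

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared from an F ratio: (F * df1) / (F * df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)

# F ratios and degrees of freedom as reported in the omnibus ANOVA.
reported = {
    "gap": (529.76, 2, 438),
    "language group": (51.82, 2, 219),
    "speech act": (50.32, 1, 219),
    "gap x language": (10.57, 4, 438),
    "gap x speech act": (3.21, 2, 438),
    "speech act x language": (44.54, 2, 219),
}
for effect, (f, df1, df2) in reported.items():
    print(f"{effect}: {partial_eta_squared(f, df1, df2):.3f}")
```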

American, Italian, and Japanese Raters All Judged Longer Silences More Negatively, but Differed Sufficiently to Suggest Disparate Cultural Orientations to the Inter-Turn Silences
As Figure 1 illustrates, regardless of language background, all raters judged speakers to be less willing to comply with requests or agree with assessments the longer the speaker paused before agreeing: linear trend, F(1, 219) = 794.08, p < .01. This indicates that, across all three language groups, perceived agreement or compliance decreases as silence increases. The effect of language group is also salient in Figure 1. Overall, the Japanese study participants rated the speakers as more agreeable than either the Italian or the American participants did. Between-groups differences were statistically reliable in Bonferroni-corrected post hoc comparisons across all gap conditions when comparing the Japanese and American groups (Mdiff = 0.721, p < .01) and the Japanese and Italian groups (Mdiff = 0.935, p < .01). The only statistically reliable mean difference between the American and Italian raters was in the 600 ms gap condition (Mdiff = 0.46, p < .01).
The distinctly different ratings of the Japanese and the other groups suggest that what constitutes "agreeable" in the context of conversational gaps is calibrated somewhat differently for Japanese raters, who may be entering the scale at a different point. The statistically reliable differences among all three groups in the 600 ms gap condition suggest that a gap of roughly half a second may be enough to distinguish cultural differences in orientation to inter-turn silence.
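Two ingredients of these analyses can be illustrated briefly: the linear-trend contrast across the three equally spaced gaps (weights −1, 0, +1 for 0, 600, and 1200 ms) and the Bonferroni adjustment for the three pairwise group comparisons. The sketch uses hypothetical ratings and helper names. It does not reproduce the ANOVA's trend test, which pools error variance within language groups.

```python
LINEAR_WEIGHTS = (-1, 0, 1)  # linear contrast for 3 equally spaced gaps (0, 600, 1200 ms)

def linear_trend_score(ratings):
    """Linear contrast for one rater's ratings at the 0, 600, and 1200 ms gaps.

    A negative score means ratings fall as the gap lengthens.
    """
    return sum(w * r for w, r in zip(LINEAR_WEIGHTS, ratings))

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical rater: ratings of 5.5, 4.0, and 3.0 at the three gap lengths.
print(linear_trend_score((5.5, 4.0, 3.0)))
# Hypothetical raw p-values for American-Italian, American-Japanese, Italian-Japanese.
print(bonferroni([0.01, 0.02, 0.4]))
```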

Italian Raters Judged the Smaller Inter-Turn Gaps as More Problematic, Whereas the Japanese Judged the Longer Inter-Turn Gaps as More Problematic
To further explore the interaction of gap length and language group, we examined raters' judgments of the inter-turn silences within their language group and then compared the groups descriptively. Most notable is the comparison between the Japanese and the Italian raters. As visualized in Figure 1, the drop in the Italians' ratings from 0 ms to 600 ms is double that of the Japanese, whereas the drop in the Japanese ratings from 600 ms to 1200 ms is double that of the Italians. This pattern of difference ratings points to the possibility that the language groups do not simply differ in where they enter the scale of "agreeableness," but may also have different thresholds at which they find silence problematic.
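The comparison of "difference ratings" can be expressed as simple arithmetic on group means. Figure 1's values are not reproduced here, so the means below are hypothetical and chosen only to show the pattern described: group A's first drop is twice group B's, while group B's second drop is twice group A's.

```python
def successive_drops(means):
    """Drops in mean rating from 0 to 600 ms and from 600 to 1200 ms."""
    return [a - b for a, b in zip(means, means[1:])]

def drop_ratios(means_a, means_b):
    """Per-step ratio of group A's drop to group B's drop."""
    return [da / db for da, db in zip(successive_drops(means_a),
                                      successive_drops(means_b))]

# Hypothetical group means at the 0, 600, and 1200 ms gaps.
group_a = (5.0, 3.8, 3.5)   # steep early decline
group_b = (5.2, 4.6, 4.0)   # steadier decline
print(drop_ratios(group_a, group_b))  # first ratio > 1, second ratio < 1
```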

Americans, Italians, and Japanese Respond Slightly Differently to the Sequence Types (Requests vs. Assessments)
A unique contribution of our approach to the valence of inter-turn silence was to examine the effects of sequence type ("speech act") on raters' judgments. Because most prior research on gaps in conversation is situated in the environment of question-asking (factual/trivia questions), we chose to manipulate the stakes of the face threat by using more naturalistic, collaborative, and consequential social actions. To further refine our understanding of the significant main effect of speech act and the interaction of speech act and language group, we explored differences and similarities within each condition.
Overall, raters judged the request sequences as sounding more agreeable (i.e., the ratings were generally higher in this condition; see Table 1), but only the Japanese rated requests substantially higher than they did assessments (Mdiff = 1.04, p < .01). For all of the groups, this is likely due to the relative strength of the agreement carried in the different tokens ("sure" vs. "yeah," as we discuss further later). Although the comparison of raters' judgments of the speech act types may be obscured by the semantics of the different response tokens, the result for the request condition between the three language groups is clear: Participants' ratings are significantly different, F(2, 221) = 102.731, p < .01, a finding that was reliable in all post hoc pairwise comparisons. This may reflect different underlying cultural orientations to compliance with requests in general, but this is an initial speculation and remains to be systematically explored in further studies.
In sum, what seems to be driving the interaction effect of sequence type and language group is the fact that (a) the Japanese participants rated requests and assessments quite differently and (b) across the three language groups, it was primarily the requests that were rated significantly differently from one group to the next.

DISCUSSION
This study was designed to examine the effect of inter-turn silence and sequence type (speech act) on native speakers' judgments of an addressee's willingness to comply with requests or agree with assessments. We tested for these effects across identical "friendly telephone conversations," which were idiomatically translated from American English for Italian and Japanese study participants. The limitations of this exploratory experimental approach are acknowledged and addressed, but we begin with the strengths of our design, a discussion of the main findings, and their relevance for building research that could help synthesize the contributions of cognitive and social approaches to discourse studies.
The basis for the experimental design was the empirically derived turn-taking model described in Sacks et al. (1974). Empirical research based on that model indicates both that there is a preference for progressivity in talk (Stivers & Robinson, 2006) and that gaps arising in the context of conditional relevance (utterances designed for, and expecting, responses of some sort) will prove problematic for speakers (Davidson, 1984; Pomerantz, 1984). What the prior research could not help us tease apart is the extent to which this perception of "trouble" would be truly universal (all speakers responding in the same way to the same silence lengths) or relative (cultural groups displaying different norms), or whether there was some intersection of these two dimensions. Our aim, therefore, was to control as far as possible for a variety of acoustic, thematic, and speech act factors so that we could explore potential relations among language/cultural group, sequence type, and inter-turn silence.
Among the American, Italian, and Japanese undergraduate students in our sample, length of the inter-turn silent gap matters. Across language groups, the longer the inter-turn silence length, the lower the ratings of willingness to comply with requests or agree with assessments. This supports emerging evidence of a widely shared social orientation to silence as problematic (Stivers et al., 2009).
A socially oriented theory might posit that a preference for progressivity in conversation underlies this "distaste" for the growing silence. A more individualistic (cognitive) theory would claim that elapsed time reflects searching and monitoring activity as participants spin out attributions in their minds. We underscore the argument presented elsewhere (e.g., Clark, 1996; Levinson, 2006) that these are complementary dimensions of analysis rather than either/or explanations. These levels must be integrated in future research as discourse scholars become more familiar with, and find ways to explore and test, the notion of a socially embedded interaction engine (Levinson, 2006). This study provides an initial step in that direction.
Despite the negative judgments of longer silences across language groups, results also indicate that the American, Italian, and Japanese raters differed in their tolerance for the different gaps. By tolerance, we simply mean that ratings differ between language groups in a way that reflects slightly different underlying expectations for agreement (among peers, in this study) and for the rapidity with which agreement is offered. The Japanese average ratings were consistently highest on ratings of willingness and agreement, the Italians tended to give the lowest ratings, and the American judgments were generally in between (although tending closer to the Italian group's ratings).
An alternative explanation for the different ratings among groups would rest on the assumption that the different linguistic structures of the languages differentially delay a projectable possible completion point (Schegloff et al., 1996). Briefly, projectability refers to the hearer's ability to anticipate what type of utterance is unfolding. Because the target stimuli in all of the languages were similar on this dimension (see Appendix B), we are confident that responses to the inter-turn silences were not based on differential syntactic processing of the utterance underway. We suggest, therefore, that, projectability being equal in these dialogues, it is the inter-turn silence itself that drives the disparity in judgments, not hesitation due to the completion of syntactic processing.
Thus, any group distinction in orientation to silence is not about a differential need for syntactic processing based on formal features of the language, but about cultural expectations for the speed with which agreement is offered. If there is a baseline sense that agreement is expected (e.g., in the Japanese context), it makes sense that, as we found, ratings of willingness and agreement among the Japanese are less attenuated from 0 ms to 600 ms. Conversely, if there is less expectation of agreement, then as soon as some silence is heard, judgments are affected, and any additional increments, although still negatively valenced, are perhaps less marked (e.g., in the Italian context, as we found). This pattern merits further investigation using additional, more finely graded lengths of silence. In this way, it might be possible to get a sense of a dose-response relation and to refine the overall finding of culturally differentiated response patterns within a universally evident sense of decreasing agreement.
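The proposed follow-up with finer increments of silence could be summarized by locating, for each group, the step at which mean ratings fall most steeply, as one crude indicator of where tolerance for silence runs out. The sketch below assumes hypothetical gap lengths and means (the 300 ms and 900 ms conditions in the example were not part of this study), and the function name is illustrative; a fitted dose-response model would be a more principled alternative.

```python
from typing import Sequence, Tuple


def steepest_decline(gaps_ms: Sequence[int], means: Sequence[float]) -> Tuple[int, int, float]:
    """Return (start_ms, end_ms, drop) for the adjacent pair of gap lengths
    with the largest decrease in mean rating."""
    if len(gaps_ms) != len(means) or len(gaps_ms) < 2:
        raise ValueError("need matching gap and mean sequences of length >= 2")
    steps = [
        (a, b, m_a - m_b)
        for (a, m_a), (b, m_b) in zip(zip(gaps_ms, means), zip(gaps_ms[1:], means[1:]))
    ]
    # The step with the largest positive drop is the candidate "threshold" interval.
    return max(steps, key=lambda step: step[2])


# Hypothetical, finely graded conditions for one group (NOT the study's data).
print(steepest_decline([0, 300, 600, 900, 1200], [6.0, 5.9, 5.7, 5.0, 4.8]))
```

Comparing where this interval falls for each language group would give a first, descriptive test of the idea that groups differ in their thresholds as well as in their overall level of ratings.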
That addressees' responses to requests were rated as reflecting greater willingness than their responses to assessments is best explained by the fact that the response to the requests ("sure") is a stronger form of agreement than the response used for the assessments ("yeah"; on the full character of weak agreements, see Davidson, 1984, and Pomerantz, 1984). Thus, although the response tokens were prosodically similar, they were semantically distinct. We therefore cannot conclusively state that it was the sequential environment itself (the conditional relevance set up by the request or the assessment) that was driving the perceptions. It could well be that the weaker form of agreement, which is broader in scope, is also more ambiguous and, therefore, did not elicit equally strong positive responses.
It is also possible that the Japanese study participants' significantly higher ratings of the "sure" condition (ii yo in Japanese) simply reflect an even stronger distinction between the tokens ii yo and so da ne than between English "sure" and "yeah" or Italian certo and sì. It would be possible to retest the findings about sequence type using parallel terms for English and Italian (e.g., "yeah" and "sì," respectively, for both requests and assessments), but it would be harder to match them prosodically. For Japanese, other tokens such as "eh" might be appropriate for gauging responses to sequence type without risk of confounding from the semantics of the agreement tokens. This is clearly an area for further exploration within each language.
Whether the sequence types were tapping a semantic difference (based on response token) or a pragmatic difference (based on assessment vs. request condition) or even a difference in face threat, it is important to note that the requests not only received consistently higher ratings across all groups and gap lengths, but they also were more distinctive between groups. Because requests can be considered face-threatening acts (i.e., in Brown & Levinson's, 1987, terms, a threat to negative face or the desire for autonomy), study participants may be reflecting cultural attitudes about not infringing on others' autonomy. As a more obvious threat, this may have been a more salient concern than that represented in the threat posed by an assessment (i.e., agreement with another's opinion).
In sum, the somewhat murky results for speech act open up the possibility for further clarification of the relative weight of the threat posed by the utterance versus the semantics of the response. Because all responses were affirmative, we were able to isolate the effect of the inter-turn silence, which was clearly the most powerful factor coloring study participants' judgments; however, the question does remain open as to the role of the response token itself in shading response to the sequence types.
Given our finding that ratings are more similar across all language groups at 1200 ms than at 600 ms, there is reason to consider Jefferson's (1989) proposal of "approximately 1 s" (she recognized a span of 0.9 to 1.2 s) as a possible maximum for silence in conversation. However, because our silences were machine timed (whereas Jefferson used an analogue procedure, measuring silence in relation to the surrounding talk), it may be that in absolute terms we overshot the mark. Inasmuch as the Italian and American ratings were similar, while the Japanese ratings remained significantly different from both of those groups, we do not yet see strong support for the proposal.
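The contrast between machine-timed silences and silence measured relative to the surrounding talk can be made concrete by re-expressing a fixed gap in units of the local speaking rate. This is a rough sketch under stated assumptions, not Jefferson's procedure: the function name and the example rates are hypothetical, and the choice of unit (stress foot, syllable, or mora) is itself contested across these languages (see Fletcher, 2010).

```python
def gap_in_rhythm_units(gap_ms: float, units_per_second: float) -> float:
    """Express a silence as the number of rhythm units (e.g., syllables or
    morae) that talk at the surrounding rate would fill in the same time."""
    if units_per_second <= 0:
        raise ValueError("speaking rate must be positive")
    return gap_ms * units_per_second / 1000.0


# Hypothetical speaking rates for illustration only (not measured from the stimuli).
for rate in (5.0, 8.0):
    print(rate, gap_in_rhythm_units(1200, rate))
```

On this view, the same machine-timed 1200 ms gap counts as a relatively longer silence after fast talk than after slow talk, which is one way an absolute threshold could be overshot.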

Challenges for Cross-Linguistic Study of Discourse
There are numerous weaknesses in any experimental study of human social interaction. One cannot control for all of the natural, in-the-moment inventiveness and complexity of real conversation, particularly in a cross-linguistic study. One can question the representativeness of the study population (undergraduate students) and the comparability of the recorded speakers' voices, recognize possible cultural differences in preference for or aversion to the types of requests and assessments being made, and ask whether true translation of the base dialogues from English into the other languages is even possible. All of these caveats are implicit in our discussion of the findings and limit their generalizability.
Indeed, one could argue that the robust result concerning different cultural responses to silence could simply be an artifact of the method; although the dialogues were parallel in terms of semantic content, the quality of the voices of the different speakers in the different languages may have had some effect. Unfortunately, there is no body of acoustics literature comparing issues of "tone" (e.g., friendliness of voice) in these languages, nor sufficient research on what such acoustic correlates might be for each language. Thus, researchers must privilege their native-speaker auditory perception over acoustic measurements until there are, if ever, reliable cross-cultural measures of the paralinguistic correlates of speaker affect.
As noted in the Method section overview, only a matched-guise approach using trilingual actors might have addressed such concerns about paralinguistic comparability, but even that technique is not perfect (Bradac et al., 2001). Multilingual speakers may not have full command over their affective orientation to the languages they speak and may, therefore, have differences of fluency or vocal quality in the various guises they are asked to perform, which then can affect hearer judgments.
The most relevant and potentially feasible comparison to make, given the aims of this study, would be of speaking rates, because a silence might sound long or short depending on the speed of the surrounding talk. However, speaking rates are calculated on different units of analysis for the languages in this study: Japanese is a mora-timed language, English a stress-timed language, and Italian is generally considered a syllable-timed language, although such classifications are not uncontroversial (see Fletcher, 2010).
From our vantage point as scholars of language and social interaction, the major weakness of this study is the reliance on contrived conversations and assessment of external perceptions (listener judgments), rather than internally motivated analyses of natural interaction. We recognize that our approach is not a substitute for the internal proof procedures of qualitative analysis (Heritage, 1995) and that we tread dangerously by treating "sequence type" (request and assessment) as an abstract category (i.e., a speech act), rather than as a particular and real course of action (see Schegloff, 1995, pp. xxviii-xxix).
However, within the confines of the experimental method, we have worked to ground the design in findings from studies of talk-in-interaction and based the materials on actual telephone calls. Our aim was to investigate a measurable and manipulable aspect of the operation of the turn-taking machinery (gaps) within specific courses of action; we were not attempting to provide an account or analysis of the course of action itself. We forced, in fact, a particular hearing of the silences: one in which agreement was at stake rather than, for example, friendliness, thoughtfulness, comfort, or comprehension. We make no claim that disagreement is the only interpretation that study participants might have drawn from the silence.

CONCLUSION
There is a striking overall similarity in judgments of inter-turn silence across American, Italian, and Japanese raters, yet there are also compelling fine-grained differences between the groups. This lends support to further investigation of the complex interrelation among cognition, social practices, and discourse processes (see, inter alia, Gee, 1992; Lave & Wenger, 1991; Levinson & Enfield, 2006; Ochs, 1996; Rogoff, 1990; for discussions of usage-based approaches to cross-linguistic studies of language universals, see Sidnell, 2007, and Tomasello, 2003).
Clearly, something generalized about human perceptual processing generates the observable-reportable phenomenon of silence as indicative of trouble in conversation. In turn, the valence of that silence (as good, bad, long, or short) is, as has been argued, culturally conditioned. This type of pan-human, ontological basis for social cues has been strongly argued for by researchers of nonvocal communication, as in the (albeit controversial) proposals on the universality of several facial displays of emotion (Ekman, Sorenson, & Friesen, 1969), the deployment of which is nonetheless governed by cultural display rules (e.g., Ekman & Friesen, 1975; Saarni, 1993).
Tolerance for silence is, of course, a dynamic factor that will vary significantly depending on the types of communicative activities underway and the identities being enacted in and through the deployment of silence. Although we assume that norms are flexible, deployable or avoidable on an "as needed" basis as resources for accomplishing and enacting identity and solidarity, we also maintain that there is room for considering the more universal or generalized cognitive underpinnings of these practices. This study allows us to move beyond descriptive comparisons between languages to a position where the "adaptations and inflections" of generic features of talk-in-interaction (Sidnell, 2007, p. 230) can be specifically examined. This involves breaking down the artificial boundary between the social and the cognitive. Although it may be somewhat tidier to study them separately, that bifurcation is difficult to maintain as we move toward more comprehensive understandings of human communication in terms of discourse processes and processing.