MEMORY, 2000, 8 (3), 145–157

Phonological similarity and the irrelevant speech effect: Implications for models of short-term verbal memory

Janet D. Larsen
John Carroll University, Cleveland, OH, USA

Alan Baddeley
University of Bristol, UK

Jackie Andrade
University of Sheffield, UK

Three experiments studied the interaction between irrelevant speech and phonological similarity within both the remembered and the irrelevant auditory material. Phonological similarity within the remembered list impaired performance in both baseline and irrelevant speech conditions, whereas phonological similarity between the remembered and ignored irrelevant items did not influence performance. Although there was a tendency for similarity within the irrelevant items to reduce interference, this proved to be a less robust finding. Implications for the theoretical interpretation of the irrelevant speech effect are discussed.

Colle and Welsh (1976) were the first to describe a phenomenon that they called the “acoustic masking in primary memory” (p. 17), whereby the immediate serial recall of a string of visually presented letters was impaired by the simultaneous presentation of continuous spoken text in an unfamiliar language, which the subject was instructed to ignore. Colle (1980) proposed that this effect must be a central process because it occurred both for loud speech and for speech at the level of a whisper. The irrelevant speech effect occurs not only when the speech is in a language that the subjects do not know (Baddeley & Salame, 1986; Colle & Welsh, 1976; Salame & Baddeley, 1986, 1989), but also when the speech is nonsense syllables (Salame & Baddeley, 1982).
Furthermore, Salame and Baddeley (1982) demonstrated that the effect did not operate at a lexical level, as subjects who were remembering visually presented digit sequences were no more disrupted by streams of irrelevant digits than they were by sequences of items comprising the same phonemes but in a different order (e.g. TUN, WOO rather than ONE, TWO).

Colle and Welsh referred to their results as a masking effect, but they observed that a simple masking interpretation fails to account for two further results, namely the failure of irrelevant Gaussian noise to impair memory performance (Colle & Welsh, 1976), together with the insensitivity of the irrelevant speech effect to the intensity of the material that was assumed to be masking the memory trace (Colle, 1980). While accepting that a simple masking interpretation was ruled out, Salame and Baddeley (1982, 1987) suggested that something analogous to a masking hypothesis could be incorporated into the Working Memory model (Baddeley, 1986) by assuming that the phonological short-term store allowed spoken material direct access, while visually presented material gained access to the store only if subvocally articulated. They proposed that the store was protected by an auditory filter which prevented sounds that were not speech-like from being registered. Provided that the store is assumed to register information regardless of auditory intensity, the lack of an effect of loudness can readily be explained.

Requests for reprints should be sent to Janet D. Larsen, Department of Psychology, John Carroll University, University Heights, OH 44118, USA. Email [email protected]
© 2000 Psychology Press Ltd http://www.tandf.co.uk/journals/pp/09658211.html

Positive, if not particularly strong, evidence for something analogous to masking came from the observation that, although the memory for digits was no more disrupted by the need to ignore digits than to ignore other words made up from the same phonemes, a third condition in which the irrelevant items were disyllables did reduce the magnitude of the irrelevant speech effect. Salame and Baddeley (1982) suggested that this is because the disyllabic irrelevant words were phonologically dissimilar to the monosyllabic digits. As we shall see, the robustness of this result has subsequently been questioned (Jones & Macken, 1995b; Jones, Madden, & Miles, 1992; Le Compte & Shaibe, 1997).

Salame and Baddeley (1986) attempted to test this memory masking hypothesis more directly by studying the effect of irrelevant speech on memory for letter sequences differing in degree of intra-list phonological similarity and in the number of items in the list. Sequences of similar letters were assumed to be harder to recall accurately (the phonological similarity effect), because the remembered items had fewer distinguishing phonological features, and hence were more vulnerable to trace decay. It was argued that if irrelevant speech added noise to the memory trace, then there should be an interaction between the effects of irrelevant speech and phonological similarity, such that similar items, having fewer distinguishing cues, should be more dramatically impaired. There was no evidence of the predicted interaction at lists of intermediate length, although on lists exceeding seven letters the influence of both irrelevant speech and phonological similarity was abolished. This result is broadly consistent with an earlier report by Colle and Welsh (1976) who used only lists of eight similar or dissimilar consonants, observing that irrelevant speech abolished the effect of phonological similarity. A broadly similar result was obtained by Jones and Macken (1995b).
On the other hand, Surprenant, Neath, and Le Compte (1999) report that the phonological similarity effect is not removed by irrelevant speech, when the remembered items are presented auditorily. This pattern of results forms an important component of the attempt by Neath (in press) to test the feature model of the irrelevant speech effect. An alternative interpretation of this pattern of results is to suggest that, although subjects preferentially use phonological coding for immediate serial recall, when performance drops below some critical level, they abandon phonological coding and attempt to use other strategies such as semantic coding in the case of words, or using initials or association in the case of letters. Baddeley (1966a,b) showed that, whereas immediate memory for sequences of five words relied on phonological coding, with little influence of semantic similarity, for sequences of ten words the pattern was completely reversed. The tendency for acoustic coding to be abandoned at longer list lengths was at the root of a major controversy concerning the relationship between immediate memory and reading. Initial research by Mann, Liberman, and Shankweiler (1980) noted that children who were identified as poor readers tended to show reduced evidence of phonological coding in immediate recall of consonant sequences. It subsequently emerged, however, that their result was dependent on using similar list lengths for both good and poor readers; poor readers tend to have shorter memory spans, and hence were operating at a much higher error rate than the good readers. 
When performance was studied across a range of sequence lengths, it became clear that both good and poor readers coded phonologically, provided the sequences were not too far beyond their span (Hall et al., 1983; Johnston, Rugg, & Scott, 1987); a similar interaction between length and phonological coding was shown for a group of children with specific language impairment by Gathercole and Baddeley (1990). However, while we regard strategy switching as a plausible interpretation of the existing results, given that the study by Salame and Baddeley (1986) is the only report of acoustic similarity effects for visually presented items in the presence of irrelevant speech, there is clearly a need to investigate this issue further.

A second problem with the initial working memory model stemmed from the observation that immediate serial recall was disrupted not only by speech, but also by music (Salame & Baddeley, 1989), and even by a noise stimulus, when that stimulus fluctuated not only in intensity, but also in pitch (Salame, 1990). The auditory filter hypothesis could still handle these results, given a plausible further assumption, namely that a filter that admitted speech would be unlikely to be a perfect filter, and hence would also allow in other sounds that were speech-like in nature.

Jones and his colleagues have extensively investigated kinds of material other than speech that may disrupt serial recall. Jones and Macken (1993) have shown that even pure tones are capable of disrupting performance, provided they fluctuate in time, whereas babble, in which many speakers talk simultaneously to produce a relatively homogeneous background, does not (Hellbruck & Kilcher, 1993; Jones & Macken, 1995a; Kilcher & Hellbruck, 1993). Furthermore, they showed that much less disruption is caused by a single repeated item, such as the letter C, than by a set of letters, such as C, H, J, U, repeated randomly (Jones et al., 1992).
Jones et al. proposed what they term the changing state hypothesis to account for these results. The changing state hypothesis proposes that memory will be disrupted by any fluctuation in the state of the auditory irrelevant stimulus. While the changing state hypothesis specifies the type of irrelevant stimulus that will disrupt serial recall, it does not imply a specific mechanism. In suggesting a possible mechanism, Jones (1993) abandons the idea of a specific phonological store in favour of what he terms the Object-Oriented Episodic Record (O-OER) hypothesis. This proposes a common memory store that contains the to-be-remembered items together with clues about their order of presentation. In the case of auditory material, the “objects” are spoken sounds which presumably may be identified by the listener as words or possibly longer prosodic units such as phrases and sentences. When irrelevant speech is present, a different set of items and their associated order cues also enter the memory store. Memory disruption is attributed to the confusion among the order cues within the store, rather than to any disruption of the items being stored. Furthermore, Jones and Macken (1995b) suggest that if the spoken disrupting words are similar in sound, then they will tend to cohere into a single auditory object, causing much less interference than the many objects represented by different sounding words.

This prediction was tested by Jones and Macken (1995b, Experiment 3) in a study in which the to-be-remembered lists of letters were either highly confusable (B, C, D, G, T, V, and P), or different (F, K, L, M, Q, R, and Y). The irrelevant speech comprised lists of seven words that rhymed with the confusable letters, that each rhymed with one of the different letters, that rhymed with each other but not with any of the letters, or that did not rhyme with any of the letters or with each other.
They observed no effect of similarity between the letters being recalled and the words being ignored. This result is inconsistent with Salame and Baddeley’s (1982) finding of reduced disruption of digit recall from disyllabic spoken words, one of the few pieces of evidence in favour of a masking hypothesis. In contrast, Jones and Macken found that acoustic similarity within the list of irrelevant spoken words reduced the amount of interference, as the O-OER hypothesis had predicted.

We are left, therefore, with three questions concerning the relationship between phonological similarity and the irrelevant speech effect. The first is the question of whether irrelevant speech abolishes the phonological similarity effect for visually presented letter sequences (Jones & Macken, 1995b; Surprenant et al., 1999), or whether, as the acoustic coding strategy hypothesis suggests, the effect will be found provided sequence lengths are relatively short (Salame & Baddeley, 1986). The second issue concerns the question of whether the irrelevant speech effect is greater when the items remembered and those ignored are phonologically similar. The third issue concerns whether phonological similarity within the sequence of letters to be ignored will reduce the capacity of such letters to disrupt recall, as the O-OER hypothesis predicts (Jones, 1993). Our first study, therefore, involved combining the recall of letters varying in degree of within-list phonological similarity with the effect of disruption by irrelevant sequences that are similar or dissimilar to those being recalled.

EXPERIMENT 1

We used two types of phonologically similar sequences: one comprised the letters C, D, G, P, T, and V, a second comprised the set F, L, M, N, S, and X, while the dissimilar set comprised the letters B, F, H, J, Q, and R. These were combined with irrelevant spoken words that were phonologically either similar or dissimilar to the remembered letters.
This design allowed us to ask three questions, namely whether an acoustic similarity effect occurred and survived the effect of irrelevant speech, whether the similarity between the irrelevant and the remembered material was an important variable, and finally whether similar words in the irrelevant speech caused less interference than dissimilar words.

Method

Participants. A total of 96 undergraduate students, 44 men and 52 women, participated in partial fulfilment of an Introductory Psychology course requirement.

Three sets of letters were used as the to-be-remembered material. One set, C, D, G, P, T, and V, all end with the long E sound. Another set, F, L, M, N, S, and X, all start with the short E sound. The final set, B, F, H, J, Q, and R, do not have any sounds in common. The six letters were each presented in the centre of the screen of a personal computer for 600 ms, followed by a blank screen for 240 ms. After the sixth letter there was a delay of about 10 s, during which the screen was blank. Then the word RECALL appeared on the screen as a cue for the participants to write down the letters in the order they had seen them. The computer created different random orders of the letters for each trial for each participant.

The irrelevant speech consisted of three sets of six words. Words in one set, FEE, HE, KNEE, LEE, ME, and SHE, end in a long E sound. A second set of words, EBB, ECHO, EDGE, EGG, ET, and ETCH, begin with a short E sound. The third set of words, BAY, HOE, IT, ODD, SHY, and UP, do not have any sound in common and do not have either a short E or long E sound. These words were recorded in a female voice by computer in digitised sound files with 8-bit resolution and a sampling rate of 11 kHz. The files for all the words were edited to be exactly the same length, with silence being inserted at the end of a file when necessary to make it the correct length.
A set of files of the same length containing no sound was also created for use in the quiet condition. Each file played for the 600 ms that the letter was on the screen and there was no sound during the 240 ms delay between letters. The sound was played at an average sound level of approximately 75 dB through a Labtec stereo computer speaker system (Model CS-150) attached to the computer. Each irrelevant speech condition and the quiet condition was presented once in each block of four trials, with the order within each block randomly determined by the computer. For the trials with irrelevant speech, the six words were presented three times each, in a different random order on each trial. As soon as each letter was displayed on the screen, a sound file was played, making the presentation of the letter and sound as close to simultaneous as possible. The words continued to be played at the same rate during the 10 s delay period before the word RECALL appeared on the screen as a cue to the participant to begin writing down the letters.

Procedure. Participants were tested individually. They initiated each trial by pressing the space bar. Each participant first did 12 practice trials, followed by 80 experimental trials. Each participant saw the same six letters throughout the experiment, so 32 participants studied each of the three sets of letters. Participants were instructed to try to ignore the words they heard and just remember the order of the letters. As soon as the word RECALL appeared on the screen, the participant wrote the six letters on an answer sheet. The participant covered each response with a masking paper before initiating the next trial. An experimenter was present throughout the trials to ensure that the participant did not begin to write the letters prior to the RECALL cue and covered the response to one trial before initiating the next trial.
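
The trial structure described in the Method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function and variable names are hypothetical, and the 18-word stream is assumed to be a single shuffle of all 18 tokens (the text says only that the six words were presented three times each in a different random order on each trial). The timing arithmetic shows that 18 words at one word per 840 ms last 15.12 s, which is consistent with six letters (about 5 s) followed by a delay of about 10 s.

```python
import random

# Timing values taken from the Method section.
LETTER_MS = 600                  # each letter is on screen for 600 ms
BLANK_MS = 240                   # blank screen between letters
SOA_MS = LETTER_MS + BLANK_MS    # one item (letter or word) every 840 ms

# Irrelevant speech sets from the Method; "quiet" uses silent files.
WORDS = {
    "long_e": ["FEE", "HE", "KNEE", "LEE", "ME", "SHE"],
    "short_e": ["EBB", "ECHO", "EDGE", "EGG", "ET", "ETCH"],
    "unrelated": ["BAY", "HOE", "IT", "ODD", "SHY", "UP"],
    "quiet": [],
}
CONDITIONS = list(WORDS)


def make_word_stream(words, rng, repeats=3):
    """Each word occurs `repeats` times; ASSUMPTION: one shuffle of all tokens."""
    stream = list(words) * repeats
    rng.shuffle(stream)
    return stream


def make_schedule(letters, n_blocks=20, seed=None):
    """Build 4-trial blocks in which every background condition occurs once."""
    rng = random.Random(seed)
    trials = []
    for block in range(n_blocks):
        order = CONDITIONS[:]
        rng.shuffle(order)               # random condition order within a block
        for condition in order:
            trial_letters = list(letters)
            rng.shuffle(trial_letters)   # new random letter order on each trial
            trials.append({
                "block": block,
                "condition": condition,
                "letters": trial_letters,
                "words": make_word_stream(WORDS[condition], rng),
            })
    return trials


def stream_duration_ms(n_items):
    """Duration of a stream with one item per 840 ms slot."""
    return n_items * SOA_MS


if __name__ == "__main__":
    schedule = make_schedule(["C", "D", "G", "P", "T", "V"], seed=1)
    print(len(schedule), "experimental trials")
    print("word stream:", stream_duration_ms(18) / 1000, "s")
    print("letter presentation:", stream_duration_ms(6) / 1000, "s")
```

With 20 blocks of four trials, the sketch yields the 80 experimental trials and 20 lists per background condition reported in the paper.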
Results

The average number of letters per trial, out of six, that participants wrote down in the correct serial position with each of the different types of background, broken down by the set of letters participants studied, is shown in Table 1, along with standard deviations.

TABLE 1
Experiment 1

Sounds shared by letters            Type of words in irrelevant speech
to be remembered            Long E        Short E       Unrelated     None
Long E                      4.04 (1.16)   4.05 (1.17)   3.97 (1.09)   4.47 (1.03)
Short E                     4.54 (1.04)   4.58 (1.08)   4.56 (1.15)   4.80 (1.01)
None                        4.82 (0.90)   4.63 (1.06)   4.53 (0.96)   5.34 (0.65)

Average number (and standard deviations) of letters remembered in the correct position in a list, out of six letters, in Experiment 1. N = 32 in each letter condition. All participants studied 20 lists in each background condition.

The average number of letters recalled in the correct serial position was analysed with a 3 × 4 (letter set by irrelevant speech type) mixed factorial ANOVA with irrelevant speech type as the within-subject factor. There was a main effect for the letter set studied, F(2,93) = 4.324, p = .016. Mauchly’s test indicated that sphericity could not be assumed for irrelevant speech, so a multivariate approach was used where irrelevant speech was involved in the analysis. Pillai’s Trace showed that there was a main effect for irrelevant speech, F(3,91) = 22.144, p < .001, and an interaction between letter set studied and irrelevant speech type, F(6,184) = 2.467, p = .026. In view of the significant interaction, a test of simple main effects was conducted for each irrelevant speech condition to determine whether the acoustic similarity effect was present. There was a simple effect of the letter set studied in the long E, unrelated, and quiet irrelevant speech conditions, Fs(2,93) = 4.663, 3.084 and 7.384 with ps of .012, .050 and .001 respectively.
The simple effect of the letter set studied in the short E irrelevant speech condition was only marginally significant, F(2,93) = 2.688, p = .073. These simple main effects were further examined, using the Bonferroni correction to adjust for multiple comparisons. In the presence of irrelevant speech consisting of words ending with the long E sound, people who studied the unrelated letters remembered more letters than those studying the long E letters, p = .010, but the scores of those studying the short E letters were not different from either of these conditions. With unrelated words, the difference in the recall of long E letters and short E letters was marginally significant (p = .089). In the quiet condition, Dunnett’s C was used because there was not homogeneity of variances in this condition. In quiet, people who studied the unrelated letters remembered more letters than those studying the long E letters, p = .001, and marginally more than those studying the short E letters, p = .063, but those studying the short E letters were not different from the long E letter condition, p = .440. A follow-up test for the marginally significant main effect for letter set within short E irrelevant speech showed that the letter sets with the greatest difference, the long E and the unrelated letters, were not significantly different (p = .115). Thus the acoustic similarity effect was not the same across all of the irrelevant speech conditions. In the long E and quiet conditions, the unrelated letters were remembered better than the long E letters, and the short E letters were not different from either letter set. In the other two irrelevant speech conditions, the phonological similarity effect was only marginally significant but there was a trend in the data for better recall of the unrelated letters than the long E letters here as well.

Next, the question of whether the effects of the different types of irrelevant speech were similar within each letter set was examined.
There was a simple effect for irrelevant speech conditions within the long E letter set, F(3,93) = 10.539, p < .001, and within the unrelated letter condition, F(3,93) = 26.76, p < .001. Mauchly’s test indicated that sphericity could not be assumed for the short E letter set. Pillai’s Trace showed that, for short E letters, there was not a simple effect for irrelevant speech, F(3,29) = 1.674, p = .194. When studying the long E letters, people remembered better in the quiet condition than with any of the kinds of irrelevant speech (long E, p = .005, short E, p = .001, and unrelated words, p < .001). None of the irrelevant speech conditions differed from each other, providing no evidence that similar sounding words will cohere into a single auditory object. When studying unrelated letters, people remembered the letters significantly better in the quiet condition than with the long E, short E, and unrelated words irrelevant speech, all ps < .001. They also remembered them better with long E words than with unrelated words, p = .033, but there was no difference between the long E and the short E, p = .290, or the short E and the unrelated words, p = 1.00. This does provide some evidence of auditory streaming, as predicted by the changing state hypothesis.

The prediction that the long E irrelevant speech would interfere more with recall of the long E letters and short E irrelevant speech would interfere more with recall of the short E letters was not confirmed. When only these two levels of irrelevant speech and letter sets were examined, the interaction was not significant, F(1,62) < 1.

Discussion

The effects of both phonological similarity and irrelevant speech were different, depending on the type of letters people were trying to recall. Surprisingly, when people studied the letters starting with the short E sound, there was no irrelevant speech effect in any condition.
The prediction that long E irrelevant speech would interfere more with recall of the long E letters and that short E irrelevant speech would interfere more with recall of the short E letters was not confirmed. The prediction of the O-OER hypothesis that people will be better able to ignore irrelevant speech if the words have a sound in common than if they are all different was supported only when people studied unrelated letters. Even then, this difference was found only between words ending in long E and unrelated words. Words starting with short E did not show any tendency to cohere into a unitary auditory object. There was also no difference between any of the sets of words when people studied the long E or the short E letters.

EXPERIMENT 2

Comments from participants indicated that they used various mnemonic strategies to remember the order of the letters. Certain letter combinations, such as TV and FM, are meaningful because they are common abbreviations in the language. Other letter combinations were of personal significance to individual participants, such as the initials of a friend. The logic of this experiment as a test of the phonological masking hypothesis is compromised if people were not remembering the letters with a phonological code, but as the concepts for which letter combinations might stand. It is possible that the degree of recoding may have differed across letter sets, resulting in the absence of a phonological similarity effect for the letters with a common initial sound, a result at variance with earlier studies (Conrad & Hull, 1964). As the presence of a standard phonological similarity effect in the baseline control condition is necessary for a satisfactory test of our hypothesis, we opted in Experiment 2 to concentrate on the long E letter set, and to encourage our subjects to encode phonologically.
One way to induce people to rely on a phonological code is to have people recite the letters aloud while they are waiting for the recall signal. Preliminary work suggested that people use one of two rehearsal strategies. Some people read each letter as it appeared and began to recite the string of six letters in order only after they had seen and read the sixth letter. Others rehearsed cumulatively, saying the first letter, then the first and second letter as the second letter appeared, and so forth. To keep the rehearsal strategy consistent, we instructed participants to use the cumulative strategy, providing training during the practice trials. To be sure that the participant’s own voice did not mask the irrelevant speech, participants heard the irrelevant speech through headphones. In this study we manipulated similarity within subjects, using the long E and dissimilar letter sets. There were three irrelevant speech conditions: words with the long E sound, words with no sounds in common, and silence.

Method

Participants. A total of 22 undergraduate students at John Carroll University, 13 men and 9 women, participated to partially fulfil a course requirement.

Materials. The stimulus materials were the same as those used in the long E and unrelated letter conditions and the long E, unrelated and quiet irrelevant speech conditions in Experiment 1. There were 14 trials for each letter set in each irrelevant speech condition. Participants heard the irrelevant speech over Sony stereo headphones (Model MDR-V200). For each participant the computer determined a different random order of trials of irrelevant speech conditions and letter set, in blocks of six trials. The order of the letters within each trial was also randomly determined for each participant.

Procedure. Participants were instructed to say each letter aloud as they saw it on the screen and, after the first letter, to recite the previously seen letters while seeing each new letter.
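
The cumulative rehearsal instruction can be made concrete with a short Python sketch. It is an illustration only: the function names are hypothetical, and the 840 ms window per letter is an assumption carried over from the Experiment 1 timing (600 ms letter plus 240 ms blank), which the text does not restate for Experiment 2. The sketch lists what a participant would say as each letter appears and the speaking rate that would be needed to finish before the next letter.

```python
# ASSUMPTION: one letter every 840 ms, as in Experiment 1 (600 ms + 240 ms).
SOA_S = 0.84


def cumulative_rehearsal(letters):
    """Return, for each new letter, the full string of letters recited so far."""
    return [list(letters[: i + 1]) for i in range(len(letters))]


def required_rate(n_letters, window_s=SOA_S):
    """Letters per second needed to recite n_letters within one window."""
    return n_letters / window_s


if __name__ == "__main__":
    for step, spoken in enumerate(cumulative_rehearsal("BFHJQR"), start=1):
        rate = required_rate(len(spoken))
        print(f"letter {step}: say {' '.join(spoken)}  ({rate:.1f} letters/s)")
```

Under the assumed timing, the required rate grows with every letter, which is one way to see why a cumulative strategy becomes hard to sustain late in a six-letter list.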
During the 12 practice trials participants were corrected if they failed to follow this procedure. By the time the fifth letter was being displayed, some subjects could not speak fast enough to say all the letters before the sixth letter appeared on the screen, so they were permitted to just add the sixth letter at that point and to continue reciting the six letters in order while waiting for the recall cue. Participants were permitted to whisper the letters rather than speak them loudly if they preferred, as long as the experimenter could hear them reciting the letters. Following 12 practice trials, each participant completed 84 experimental trials.

Results and discussion

The results are shown in Table 2.

TABLE 2
Experiment 2

Type of words in            Sounds shared by letters to be remembered
irrelevant speech           Long E        None          Overall
Long E                      3.42 (1.00)   4.63 (0.76)   4.03 (0.73)
Unrelated                   3.27 (0.86)   4.28 (0.89)   3.78 (0.81)
None                        3.69 (0.87)   4.95 (0.46)   4.33 (0.55)
Overall                     3.46 (0.86)   4.62 (0.65)

Average number (and standard deviations) of letters remembered in the correct position in a list, out of six letters, in Experiment 2. The 22 participants studied 14 lists in each background condition.

A 2 × 3 (letter set by irrelevant speech type) within-subject ANOVA showed that there was a main effect for letter set, F(1,21) = 54.342, p < .001. The expected phonological similarity effect was found. Participants recalled more letters in the correct position for the letters with no sounds in common than for long E letters. There was a main effect for irrelevant speech condition, F(2,42) = 18.28. Using the Bonferroni correction for multiple comparisons to examine the irrelevant speech conditions, participants recalled more letters in quiet than in the long E, p = .002, or unrelated word conditions, p < .001, and significantly more in the long E than in the unrelated word condition, p = .049.
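
As a reading aid for Table 2, the following Python sketch recomputes the "Overall" entries from the six cell means. It assumes only that, in this fully within-subject design with equal numbers of lists per cell, each marginal mean is the unweighted average of the corresponding cell means; the dictionary keys are hypothetical labels. Because the published cells are rounded to two decimals, agreement is expected only to within about 0.01.

```python
# Cell means copied from Table 2 (rows: irrelevant speech; columns: letter set).
CELLS = {
    "long_e_words": {"long_e_letters": 3.42, "unrelated_letters": 4.63},
    "unrelated_words": {"long_e_letters": 3.27, "unrelated_letters": 4.28},
    "quiet": {"long_e_letters": 3.69, "unrelated_letters": 4.95},
}

# "Overall" values as printed in Table 2.
REPORTED_BY_SPEECH = {"long_e_words": 4.03, "unrelated_words": 3.78, "quiet": 4.33}
REPORTED_BY_LETTERS = {"long_e_letters": 3.46, "unrelated_letters": 4.62}


def means_by_speech(cells):
    """Average across letter sets within each irrelevant speech condition."""
    return {speech: sum(row.values()) / len(row) for speech, row in cells.items()}


def means_by_letters(cells):
    """Average across irrelevant speech conditions within each letter set."""
    letter_sets = next(iter(cells.values())).keys()
    return {
        letters: sum(row[letters] for row in cells.values()) / len(cells)
        for letters in letter_sets
    }


if __name__ == "__main__":
    for name, value in means_by_speech(CELLS).items():
        print(f"{name}: computed {value:.3f}, reported {REPORTED_BY_SPEECH[name]}")
    for name, value in means_by_letters(CELLS).items():
        print(f"{name}: computed {value:.3f}, reported {REPORTED_BY_LETTERS[name]}")
```

The recomputed margins preserve the ordering reported in the text: quiet above long E words above unrelated words, and unrelated letters above long E letters.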
On this occasion, therefore, there was support for the prediction of the O-OER hypothesis that similar items would cohere into a single auditory “object”, hence causing less disruption of recall. There was no significant interaction between letter type and irrelevant speech conditions, F(2,42) = 1.41, p = .256. This absence of an interaction has two separate theoretical implications. First of all, there is no evidence that similarity between remembered items and irrelevant speech influences performance, as the simple mnemonic masking hypothesis predicted. Second, the failure of the irrelevant speech effect to remove the effect of phonological similarity, which is similar to the finding of Surprenant et al. (1999) that the phonological similarity effect is not abolished by irrelevant speech when participants say the letters as they see them, suggests that the relation of irrelevant speech to articulatory suppression must be more complicated than simply being the same phenomenon.

Experiments 1 and 2 are consistent in showing first of all that the effect of phonological similarity withstands the influence of irrelevant speech, at least when sequence lengths are relatively short. Second, we found no effect of phonological similarity between the items remembered and those ignored, a result that replicates other findings in the literature (Jones & Macken, 1995b; Le Compte & Shaibe, 1997). The question of whether phonologically similar streams interfere less than dissimilar ones shows some inconsistency between the two studies. Experiment 1 found a significant effect of similarity within the unattended material for only one of the three conditions, namely that involving recall of unrelated letters, whereas an effect was found for both types of remembered letters used in Experiment 2. In an attempt to resolve the discrepancy we conducted a third experiment using a new set of words that either shared a common initial speech sound, had a common rhyme, or were dissimilar.
Each was combined with the requirement to remember sequences of dissimilar consonants.

EXPERIMENT 3

The O-OER hypothesis predicts that strings of phonologically related words are likely to cohere into a single object, hence causing less interference. In Experiment 3, in order to optimise the chance of producing an effect, we used two types of similar words, one list having a common rhyme such as LAY and DAY, and another list having a common onset such as ALE and AID. It has been shown by Conrad, using consonants, that similarity of both onset and rhyme influences both perceptual errors of auditorily presented letters and immediate recall of letter sequences (Conrad, 1964; Conrad & Hull, 1964). We chose words made up by re-ordering approximately the same phonemes as occurred in each of the words on the other list, so that differences other than the onset–rhyme difference were minimal. In addition to these two sets of similar irrelevant material, there were also irrelevant sequences that had no sounds in common, and a quiet control condition. The O-OER hypothesis would presumably predict that both of the similar irrelevant sequences would interfere less than the sequence comprising many different sounds.

Method

Participants. A total of 39 undergraduate students at John Carroll University participated in partial fulfilment of a course requirement.

Materials. The irrelevant speech was recorded in the manner described for Experiment 1. The rhyming words were DAY, JAY, LAY, MAY, PAY, and SAY, the corresponding words with similar onset were AID, AGE, ALE, AIM, APE, and ACE, while the dissimilar words, with minimal phonemic overlap, were HOE, IT, KNEE, ODD, SHY, and UP. The material to be remembered comprised the letters B, F, H, K, Q, and R.

Procedure. The letters were presented visually as in Experiments 1 and 2, and the irrelevant speech was presented over headphones as in Experiment 2.
After 12 practice trials, participants did 80 trials with the three types of irrelevant speech and silence, randomised within blocks of four trials.

Results and Discussion

Table 3 shows the mean number of letters correct out of six for each of the four conditions, along with the standard deviations. A one-way within-subject ANOVA indicated a significant effect of conditions, F(3,114) = 27.80, p < .001. Follow-up tests with the Bonferroni correction for multiple comparisons showed that more letters were remembered in the quiet condition than in each irrelevant speech condition, all ps < .001, but that there was no difference in the number of letters recalled in the three irrelevant speech conditions. The absence of any effect of similarity of either onset or rhyme on the magnitude of the irrelevant speech effect fails to support the O-OER hypothesis.

TABLE 3
Experiment 3

        Type of words in irrelevant speech
        Similar onset   Rhyming   Dissimilar   None
Mean    4.80            4.78      4.69         5.37
SD      0.74            0.75      0.80         0.51

Average number (and standard deviation) of letters remembered in the correct position in a list, out of six unrelated letters, in Experiment 3. The 39 participants studied 20 lists in each background condition.

GENERAL DISCUSSION

Experiments 1 and 2 corroborate the results of Jones and Macken (1995b) and Le Compte and Shaibe (1997) in finding no influence on the magnitude of the irrelevant speech effect of the degree of phonological similarity between the material to be remembered and the items being ignored. In the present study, similarity was maximised within and between both remembered and ignored sets. It therefore seems unlikely that the previous failures of Jones and Macken to find an effect could be attributable to a lower degree of similarity obtainable using a design in which similarity within lists is minimised.
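As an arithmetic cross-check on the Experiment 3 figures reported in the Results and in Table 3, the following minimal Python sketch recomputes the ANOVA degrees of freedom, the number of lists per condition, and the size of the irrelevant speech effect for each speech condition. The final step is an assumption for illustration only: the text does not state how many follow-up comparisons were made or the familywise alpha, so the sketch assumes all pairwise comparisons among the four conditions and an alpha of .05.

```python
from itertools import combinations

# Values taken from the Method, Results, and Table 3 of Experiment 3.
n_participants = 39
conditions = ["similar onset", "rhyming", "dissimilar", "quiet"]
total_trials = 80
means = {"similar onset": 4.80, "rhyming": 4.78, "dissimilar": 4.69, "quiet": 5.37}

# One-way within-subject ANOVA: df = (k - 1, (n - 1)(k - 1)).
k = len(conditions)
df_effect = k - 1
df_error = (n_participants - 1) * (k - 1)

# Trials are spread evenly over the four background conditions.
lists_per_condition = total_trials // k

# Irrelevant speech effect: quiet mean minus each speech-condition mean.
speech_cost = {
    name: round(means["quiet"] - mean, 2)
    for name, mean in means.items()
    if name != "quiet"
}

# ASSUMPTION (not stated in the paper): all pairwise comparisons, alpha = .05.
pairs = list(combinations(conditions, 2))
assumed_familywise_alpha = 0.05
bonferroni_alpha = assumed_familywise_alpha / len(pairs)

print(df_effect, df_error)      # degrees of freedom for the reported F test
print(lists_per_condition)      # lists per background condition
print(speech_cost)              # drop in letters recalled relative to quiet
print(len(pairs), bonferroni_alpha)
```

Under these figures the degrees of freedom agree with the reported F(3,114), and the three speech conditions cost between roughly 0.6 and 0.7 of a letter relative to quiet, which is consistent with the finding that they did not differ from one another.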
Finally, the failure by Le Compte and Shaibe (1997) to replicate the observation by Salame and Baddeley (1982) of reduced disruption in memory for digit sequences when irrelevant material was disyllabic removes even this rather slender support for the masking hypothesis.

The mnemonic masking model was introduced largely because it was one of the apparently few detailed explanations possible within the limited specification of the phonological loop model. As the model contained no explanation of how serial order was retained, it was clearly not in a position to give a detailed interpretation of the irrelevant speech effect in terms of the serial order component of the process, unlike the approach of Jones, which emphasises serial order as a primary feature. However, the nature of the influence of phonological similarity was itself largely unspecified, and paradoxically, a rival interpretation presented by Neath (in press) offers a way in which the mnemonic masking hypothesis could in fact be preserved, despite the clear lack of any effect of similarity between the remembered and ignored material.

Neath (in press) has used the feature model (Nairne, 1990; Neath & Nairne, 1995) to produce simulated data that are similar to the effects of irrelevant speech on serial recall. The feature model assumes that the memory trace of an item is carried by a number of features, making up the cue in primary memory. Forgetting occurs through interference when the cue’s features are re-set by disrupting material, which might result from either overwriting by subsequent items or feature adaptation from irrelevant speech. At first sight this seems very similar to the mnemonic masking hypothesis proposed by Salame and Baddeley (1982), but it differs in one crucial way. The degree of disruption is assumed to be determined by the overlap between the specific features in the item to be remembered, and those in the irrelevant sound.
Note that when the remembered and irrelevant items are acoustically similar, the maximal overlap, and hence maximal interference, will occur on the features that are common, in the present case, the vowel sounds. Note also that these are highly redundant. Thus, similar irrelevant speech will differ from dissimilar irrelevant speech only in its capacity to disrupt the common, and hence totally redundant, vowel sound. Those components of the memory trace that are not dependent on the common vowel will presumably be disrupted just the same amount by similar and dissimilar words. The overall disruption is thus likely to be approximately the same in the similar and dissimilar irrelevant speech conditions.

The crucial difference between the model used by Neath (in press) and the unspecified interference process assumed by Salame and Baddeley (1982) thus lies in the detailed assumptions as to the way in which the mnemonic masking might occur. Salame and Baddeley made the apparently plausible assumption that similar items would mask more than dissimilar, whereas Neath and Nairne’s model defines similarity much more precisely. There is, of course, no reason why Salame and Baddeley could not simply accept Neath’s interpretation of the effect of similarity, and hence continue to maintain a modified mnemonic masking hypothesis. Although such a step might preserve the mnemonic masking hypothesis, it emphasises the danger of opting for simple verbal models of rich and complex tasks such as serial memory span. A more constructive step for a phonological loop model might be to attempt to incorporate the data on similarity within a more complete and more precisely specified model of serial recall, one that at the very least has the capacity to explain how serial order is maintained, a problem that has never been adequately tackled within the original phonological loop hypothesis.
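The redundancy argument above can be made concrete with a toy Python sketch. It is an illustration under stated assumptions, not Neath's simulation and not the equations of the feature model: each remembered letter is reduced to two hypothetical features (an onset and a vowel), the six long E letters are chosen for illustration only, and a feature that has been re-set is simply marked as lost. The point shown is that losing a feature shared by every list item leaves the item identifiable, whereas losing a distinctive feature does not.

```python
# Toy illustration only. The letter set and the two-feature coding are
# hypothetical; they are not the stimuli or the model used in the paper.
MEMORY_SET = {
    "B": {"onset": "b", "vowel": "ee"},
    "D": {"onset": "d", "vowel": "ee"},
    "G": {"onset": "j", "vowel": "ee"},
    "P": {"onset": "p", "vowel": "ee"},
    "T": {"onset": "t", "vowel": "ee"},
    "V": {"onset": "v", "vowel": "ee"},
}


def degrade(features, lost):
    """Copy a trace, re-setting the named features to None (overwritten)."""
    return {name: (None if name in lost else value)
            for name, value in features.items()}


def candidates(trace, memory_set):
    """Return the letters consistent with every surviving feature of the trace."""
    return [
        letter
        for letter, features in memory_set.items()
        if all(value is None or features[name] == value
               for name, value in trace.items())
    ]


target = MEMORY_SET["B"]

# Similar irrelevant speech is assumed to re-set the shared vowel feature.
vowel_lost = candidates(degrade(target, {"vowel"}), MEMORY_SET)

# Any irrelevant speech is assumed able to re-set a distinctive feature.
onset_lost = candidates(degrade(target, {"onset"}), MEMORY_SET)

print(vowel_lost)  # the trace still picks out a single letter
print(onset_lost)  # the trace no longer discriminates among the letters
```

On this toy coding, the extra damage that similar irrelevant speech can do falls on a feature that carries no information about which list item was presented, which is the sense in which the overall disruption would be about the same for similar and dissimilar irrelevant speech.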
We will return to this issue after considering what other constraints the evidence from our three experiments might place on theories of the subsequent irrelevant speech effect.

All three experiments showed effects of both irrelevant speech and of phonological similarity in the material to be recalled, although the detailed patterns were not always straightforward. In Experiment 1, for example, while the long vowel sound items consistently produced a phonological similarity effect, the short vowel sounds were much less clear, both in the effect of phonological similarity and in their interaction with other variables. We suspect that this might be because the particular letters (F, L, M, N, S, and X) could have lent themselves to semantic coding of the type mentioned by some of the subjects in this experiment, to a greater extent than the letters with long E sounds, although it is also possible that they may simply have been less mutually phonologically similar. In the case of the long E letters, however, a clear and significant difference from dissimilar letters occurred in the silent control conditions for both Experiments 1 and 2.

A second question concerns the influence of irrelevant speech on this difference. It may be recalled that Colle and Welsh (1976) found no acoustic similarity effect in the presence of irrelevant speech, a result also reported by Jones and Macken (1995b), leading Surprenant et al. (1999) to conclude that irrelevant speech obliterates the phonological similarity effect for visually presented items. In contrast, Salame and Baddeley (1986) report a slightly more complex picture, whereby the effects of both irrelevant speech and acoustic similarity interact with list length. For short lists, neither is present because of ceiling effects. When lists become long enough for errors to appear, then clear and additive effects of similarity and irrelevant speech are found, both disappearing when list length increases.
It is suggested that this interaction with list length reflects a tendency for subjects to abandon phonological coding when their performance drops beyond a critical point, an effect that has been observed in a number of other comparable situations (Gathercole & Baddeley, 1990; Hall et al., 1983). The presence of a clear effect of phonological similarity across irrelevant speech conditions in both Experiments 1 and 2 lends further weight to the conclusion that, at moderate list lengths, effects of both phonological similarity and irrelevant speech are detectable.

A third issue concerns the effect of phonological similarity among the irrelevant spoken items. The phonological masking hypothesis makes no prediction on this point, whereas the O-OER hypothesis predicts that similar items will be combined into a smaller number of auditory objects, hence leading to less disruption. Our evidence on this point is somewhat mixed. In Experiment 1, the only statistically reliable effect of similarity in the unattended words occurred when subjects were studying dissimilar letters: they recalled significantly more items when ignoring the long E words than when ignoring dissimilar words (p = .033). Four of the other five potential comparisons were in the right direction, suggesting the possibility of a genuine but rather weak effect. A statistically significant effect was observed in Experiment 2, but was notably absent from Experiment 3, which focused on this issue. The most likely interpretation of this rather mixed pattern of results would seem to be that an effect probably does exist but is far from robust. If, as Jones plausibly argues, the effect stems from a tendency for subjects to incorporate similar letters into a unitary auditory percept, then it is not implausible to assume that this may be a relatively fragile effect which requires more careful control than simply specifying the nature of the irrelevant material.
It clearly merits more detailed investigation.

What are the broader theoretical implications of our results? One should perhaps begin by noting the comparative dearth of carefully worked out models of the irrelevant speech effect. This is most clearly illustrated in the case of the phonological loop interpretation, which had virtually no detailed specification at the modelling level. As we have seen, even the tentative suggestion of some form of mnemonic masking (Salame & Baddeley, 1982), which was subsequently discarded (Salame & Baddeley, 1986), does not give rise to clear predictions in the absence of a specific model of the way in which similarity influences performance. Hence the model is consistent with either an interaction, as suggested by Salame and Baddeley (1982), or with the absence of such an interaction, in the case of a feature-based interpretation of similarity such as that proposed by Neath (in press). Clearly, the phonological loop hypothesis is in need of much more precise specification. Fortunately, clearly specified models are now beginning to appear (Burgess & Hitch, 1996; Henson, 1998; Page & Norris, 1998a,b), and it is surely only a matter of time before they are applied to the irrelevant sound effect.

The O-OER hypothesis proposed by Jones (1993) meets with mixed success in accounting for the results obtained. Experiment 2 clearly supported the prediction that similar irrelevant items would disrupt performance less than dissimilar, but only mixed support was provided by the first experiment, while Experiment 3 clearly did not support the hypothesis. Furthermore, although the O-OER hypothesis explicitly refers to serial order, the exact manner in which it operates remains poorly specified, making it difficult to test directly. The model does, however, have two assumptions that explicitly differentiate it from the phonological loop approach.
The first concerns the assumption that the effects of irrelevant speech and of articulatory suppression have a common source, an assumption that is also made by Macken and Jones (1995) and also Neath (in press). The second concerns the assumption that both visual and verbal memory have a similar basis (Jones, Farrand, Stuart, & Morris, 1995). Although we take issue with both of these assumptions, discussion is beyond the limits of the present study. A third theoretical approach is that recently outlined by Neath (in press). As this implementation of the feature model expressly predicts that the effects of phonological similarity on recall of visually presented sequences will be abolished by irrelevant speech, our finding of a phonological similarity effect in some of the irrelevant speech conditions in Experiment 1 appears to be inconsistent with this model. Although our data do not offer resounding support for either a masking hypothesis or the O-OER hypothesis, the clear absence of an interaction between the similarity within the remembered and irrelevant lists clearly favours the latter hypothesis over the Salame and Baddeley (1982) version of mnemonic masking. Furthermore, although the effects are relatively weak, the tendency in several conditions for similar irrelevant items to disrupt recall less also offers some support to Jones’s position. However, while in the absence of well worked-out alternative explanations of the irrelevant speech effect the O-OER hypothesis is clearly the front-runner, if one moves beyond this phenomenon, there are a number of reasons for being reluctant to abandon the phonological loop model in favour of the O-OER hypothesis. First of all, the phonological loop hypothesis gives a good account of a range of effects including those of phonological similarity, word length, articulatory suppression, and the interaction of these variables. 
It is not clear that the O-OER hypothesis can give a good account of this rich array of robust results. For example, in attempting to extend the generality of the O-OER hypothesis, Macken and Jones (1995) have argued for the functional equivalence of articulatory suppression and the irrelevant speech effect. However, Gupta and MacWhinney (1995) have presented evidence for a separation of these two effects. Furthermore, if as Jones (1993) proposes, repetition of a single unattended item has virtually no effect on concurrent memory performance, then one might expect that articulatory suppression would be ineffective when a single item is repeated. However, this is one of the most common methods of suppression, with frequent repetition of a single word such as “the” causing marked disruption (e.g., Murray, 1968).

A further extension of the O-OER hypothesis has been to suggest that visual and verbal memory may have a similar basis, with both showing evidence of disruption from concurrent activity (Jones et al., 1995). However, such a view is quite inconsistent with the neuropsychological evidence, which indicates that digit span may be disrupted while spatial span is preserved (Basso, Spinnler, Vallar, & Zanobio, 1982), and performance on the spatial Corsi block span test may be impaired while digit span is well preserved (Hanley, Young, & Pearson, 1991). Further evidence for a dissociation between verbal and spatial span comes from the recent work contrasting performance on visual and spatial immediate memory of people with Down’s syndrome, for whom verbal span is particularly poor relative to spatial span, and those with Williams syndrome, who show the opposite pattern (Wang & Bellugi, 1994). Finally, PET studies suggest that quite different areas of the brain are involved in visual and verbal immediate memory performance (see Smith, Jonides, & Koeppe, 1996, for a review).
The failure of the O-OER hypothesis to account for the neuropsychological and other interference data may, of course, reflect incompleteness rather than inadequacy. While there is abundant evidence for separate visual and auditory memory systems, there is also evidence for a degree of interaction, such that serial verbal recall of visually presented material may be influenced by such characteristics as visual complexity (Chincotta & Underwood, 1997) and visual similarity (Della Sala et al., 1999). Similarly, it is clear that verbal serial recall can be influenced by semantic (Baddeley, 1970; Baddeley & Levy, 1971) and lexical factors (Hulme, Maughan, & Brown, 1991). In so far as working memory is capable of storing and manipulating multimodal information, then there will be a need for a level of storage that extends beyond the specific inputs.

As discussed earlier, our data are still compatible with a modified mnemonic masking hypothesis, but even if this in turn were disproved, it would not make it necessary to abandon the whole phonological loop model of verbal recall. One could concur with Jones’s proposal that the irrelevant speech effect operates through disruption of order cues, and still maintain the remainder of the model. The simple verbal model of working memory, presented by Baddeley (1986), does not contain a mechanism capable of reproducing serial order, and this omission has stimulated a number of recent attempts to provide a more detailed specification of the phonological loop. The first such model to be formulated was that of Burgess and Hitch (1992). This model explains serial order by separating out the phonological store from the series of contextual cues that specify the order in which the items are retrieved. As a result of criticism of some of its assumptions, the model has been modified (Burgess & Hitch, 1996), but both models separate the storage of item information from that of order, a process that relies on contextual cues.
It is entirely plausible to assume that irrelevant speech might disrupt this process of contextual order cueing. It remains to be seen whether this model can give an account of the full range of irrelevant speech data.

Another attempt to give an account of serial verbal recall is provided by the primacy model of Page and Norris (1998a,b). This also involves two separate stages, one in which the incoming items are associated with a cue representing the beginning of the list, while the second is a read-out mechanism that selects the strongest trace, reproduces it, and subsequently inhibits it to prevent its reappearance as an intrusion. Logically, the irrelevant speech effect could occur at either the item store or the read-out stage. It should be possible to use the model to explain irrelevant speech effects, once there is general agreement about the phenomena that must be explained.

What are these phenomena? Unfortunately the area has so far suffered from being studied by a sequence of different investigators. First Colle, then Salame and Baddeley, more recently Jones, and subsequently Le Compte have each tended to concentrate on a somewhat different question, and to use their own favoured techniques. Consequently, while there is no doubt about the robustness of the basic phenomena, there has simply not yet been sufficient cross-laboratory replication of some of the subtler effects. The present study suggests that there is clearly a need for a concerted effort to establish which effects are sufficiently robust to be used to test the computational models, and which should be addressed when we have a better understanding of the empirical phenomena.
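The two-stage read-out cycle attributed to the primacy model earlier in this discussion (select the strongest trace, reproduce it, then inhibit it) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Page and Norris's implementation: the linear gradient, the parameter values, and the `disturbance` argument are all hypothetical, and the disturbance is only a stand-in for whatever disruption irrelevant speech might cause, since the text leaves open whether that disruption acts on the store or on read-out.

```python
def primacy_gradient(items, peak=1.0, step=0.1, disturbance=None):
    """Assign each item an activation that declines with list position.

    `disturbance` (hypothetical) maps items to an amount subtracted from
    their activation, standing in for some unspecified source of disruption.
    """
    disturbance = disturbance or {}
    return {
        item: peak - step * position - disturbance.get(item, 0.0)
        for position, item in enumerate(items)
    }


def read_out(activations):
    """Repeatedly select the strongest trace, output it, then inhibit it."""
    working = dict(activations)
    recalled = []
    for _ in range(len(working)):
        strongest = max(working, key=working.get)
        recalled.append(strongest)
        working[strongest] = float("-inf")  # inhibited: cannot win again
    return recalled


letters = ["B", "F", "H", "K", "Q", "R"]  # the Experiment 3 letter set

intact = read_out(primacy_gradient(letters))
disrupted = read_out(primacy_gradient(letters, disturbance={"F": 0.15}))

print(intact)     # items come out in presentation order
print(disrupted)  # weakening one trace produces a transposition
```

In this toy version, serial order is carried entirely by relative activation strength, so lowering a single trace is enough to exchange two neighbouring items at recall while leaving every item in the output.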
Fortunately, there are signs that the irrelevant speech or sound effect is being studied in an increasingly wide range of laboratories, suggesting that it should not be long before we achieve a sufficiently rich and detailed data base to allow the various computational models to be adequately tested.

Manuscript received 10 June 1998
Manuscript accepted 3 November 1999

REFERENCES

Baddeley, A.D. (1966a). The influence of acoustic and semantic similarity on long-term memory for word sequences. Quarterly Journal of Experimental Psychology, 18, 302–309.
Baddeley, A.D. (1966b). Short-term memory for word sequences as a function of acoustic, semantic, and formal similarity. Quarterly Journal of Experimental Psychology, 18, 362–365.
Baddeley, A.D. (1970). Simultaneous acoustic and semantic coding in short-term memory. Nature, 227, 288–289.
Baddeley, A.D. (1986). Working memory. Oxford: Oxford University Press.
Baddeley, A.D., & Levy, B.A. (1971). Semantic coding and short-term memory. Journal of Experimental Psychology, 89, 132–136.
Baddeley, A.D., & Salame, P. (1986). The unattended speech effect: Perception or memory? Journal of Experimental Psychology, 12, 525–529.
Basso, A., Spinnler, H., Vallar, G., & Zanobio, E. (1982). Left hemisphere damage and selected impairment of auditory verbal short-term memory: A case study. Neuropsychologia, 20, 263–274.
Burgess, N., & Hitch, G.J. (1992). Towards a network model of the articulatory loop. Journal of Memory and Language, 31, 429–460.
Burgess, N., & Hitch, G.J. (1996). A connectionist model of STM for serial order. In S.E. Gathercole (Ed.), Models of short-term memory. Hove, UK: Psychology Press.
Chincotta, D., & Underwood, G. (1997). Bilingual memory span advantage for Arabic numerals over digit words. British Journal of Psychology, 88, 295–310.
Colle, H.A. (1980). Auditory encoding in visual short-term recall: Effects of noise intensity and spatial location. Journal of Verbal Learning and Verbal Behavior, 19, 722–735.
Colle, H.A., & Welsh, A. (1976). Acoustic masking in primary memory. Journal of Verbal Learning and Verbal Behavior, 15, 17–32.
Conrad, R. (1964). Acoustic confusion in immediate memory. British Journal of Psychology, 55, 75–84.
Conrad, R., & Hull, A.J. (1964). Information, acoustic confusion and memory span. British Journal of Psychology, 55, 429–432.
Della Sala, S., Gray, C., Baddeley, A.D., Allamano, N., & Wilson, L. (1999). Pattern span: A tool for unwelding visuo-spatial memory. Neuropsychologia, 37, 1189–1199.
Gathercole, S.E., & Baddeley, A.D. (1990). Phonological memory deficits in language disordered children: Is there a causal connection? Journal of Memory and Language, 29, 336–360.
Gupta, P., & MacWhinney, B. (1995). Is the articulatory loop articulatory or auditory? Re-examining the effects of concurrent articulation on immediate and serial recall. Journal of Memory and Language, 34, 63–88.
Hall, J.W., Wilson, K.P., Humphreys, M.S., Tinzmann, M.B., & Bowyer, P.M. (1983). Phonemic similarity effects in good versus poor readers. Memory and Cognition, 11, 520–527.
Hanley, J.R., Young, A.W., & Pearson, N.A. (1991). Impairment of the visuo-spatial sketchpad. Quarterly Journal of Experimental Psychology, 43A, 101–125.
Hellbruck, J., & Kilcher, H. (1993). Effects on mental tasks induced by noise recorded and presented via artificial head system. In M. Vallet (Ed.), Noise and Man (pp. 315–322). Arcueil, France: Institut National de Recherche sur les Transports et Leur Sécurité.
Henson, R.A. (1998). Short term memory for serial order. Cognitive Psychology, 36, 73–137.
Hulme, C., Maughan, S., & Brown, G.D.A. (1991). Memory for words and nonwords: Evidence for a long term memory contribution to short-term memory tasks. Journal of Memory and Language, 30, 685–701.
Johnston, R.S., Rugg, M.D., & Scott, T. (1987). Phonological similarity effect, memory span and developmental reading disorders: The nature of the relationship. British Journal of Psychology, 78, 205–211.
Jones, D.M. (1993). Objects, streams and threads of auditory attention. In A.D. Baddeley & L. Weiskrantz (Eds.), Attention: Selection, awareness and control (pp. 87–104). Oxford: Clarendon Press.
Jones, D., Farrand, P., Stuart, G., & Morris, N. (1995). Functional equivalence of verbal and spatial information in serial short-term memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 1008–1018.
Jones, D.M., & Macken, W.J. (1993). Irrelevant tones produce an irrelevant speech effect: Implications for phonological coding in working memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 19, 369–381.
Jones, D.M., & Macken, W.J. (1995a). Auditory babble and cognitive efficiency: Role and number of voices and their location. Journal of Experimental Psychology: Applied, 1, 216–226.
Jones, D.M., & Macken, W.J. (1995b). Phonological similarity in the irrelevant speech effect. Within- or between-stream similarity? Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 103–115.
Jones, D., Madden, C., & Miles, C. (1992). Privileged access by irrelevant speech to short term memory: The role of changing state. The Quarterly Journal of Experimental Psychology, 44A, 645–669.
Kilcher, H., & Hellbruck, J. (1993). The irrelevant speech effect: Is binaural processing relevant or irrelevant? In M. Vallet (Ed.), Noise and Man (pp. 323–326). Arcueil, France: Institut National de Recherche sur les Transports et Leur Sécurité.
Le Compte, D.C., & Shaibe, D.M. (1997). On the irrelevance of phonological similarity to the irrelevant speech effect. Quarterly Journal of Experimental Psychology, 50A, 100–118.
Macken, W.J., & Jones, D.M. (1995). Functional characteristics of the inner voice and the inner ear: Single or double agency? Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 436–448.
Mann, V.A., Liberman, I.Y., & Shankweiler, D. (1980).
Children’s memory for sentences and word strings in relation to reading ability. Memory & Cognition, 8, 329–335.
Murray, D.J. (1968). Articulation and acoustic confusability in short-term memory. Journal of Experimental Psychology, 78, 679–684.
Nairne, J.S. (1990). A feature model of immediate memory. Memory & Cognition, 18, 251–269.
Neath, I. (in press). Modelling the effects of irrelevant speech on memory. Psychonomic Bulletin & Review.
Neath, I., & Nairne, J.S. (1995). Word length effects in immediate memory: Overwriting trace decay theory. Psychonomic Bulletin & Review, 2, 429–441.
Page, M., & Norris, D. (1998a). Modeling immediate serial recall with a localist implementation of the primacy model. In J. Grainger & A.M. Jacobs (Eds.), Localist connectionist approaches to human cognition (pp. 227–255). Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Page, M., & Norris, D. (1998b). The primacy model: A new model of immediate serial recall. Psychological Review, 105, 761–781.
Salame, P. (1990). Effects of music, speech-like noise, and irrelevant speech on immediate memory. In B. Bergland & T. Lindvall (Eds.), Noise as a public health problem (pp. 411–423). Stockholm: Swedish Council for Building Research.
Salame, P., & Baddeley, A.D. (1982). Disruption of short-term memory by unattended speech: Implications for the structure of working memory. Journal of Verbal Learning and Verbal Behavior, 21, 150–164.
Salame, P., & Baddeley, A.D. (1986). Phonological factors in STM: Similarity and the unattended speech effect. Bulletin of the Psychonomic Society, 24, 263–265.
Salame, P., & Baddeley, A.D. (1987). Noise, unattended speech and short-term memory. Ergonomics, 30, 1185–1194.
Salame, P., & Baddeley, A.D. (1989). Effects of background music on phonological short-term memory. Quarterly Journal of Experimental Psychology, 41A, 107–122.
Smith, E., Jonides, J., & Koeppe, R.A. (1996). Dissociating verbal and spatial working memory using PET. Cerebral Cortex, 6, 11–20.
Surprenant, A.M., Neath, I., & Le Compte, D.C. (1999). Irrelevant speech, phonological similarity, and presentation modality. Memory, 7, 405–420.
Wang, P.P., & Bellugi, U. (1994). Evidence from two genetic syndromes for a dissociation between verbal and visual-spatial short-term memory. Journal of Clinical and Experimental Neuropsychology, 16, 317–322.