Physiology & Behavior 82 (2004) 697 – 701

Do perfume additives termed human pheromones warrant being termed pheromones?

Anders Winman*

Department of Psychology, Uppsala University, Box 1225, SE-751 42, Uppsala, Sweden

Received 18 September 2003; received in revised form 4 June 2004; accepted 4 June 2004

Abstract

Two studies of the effects of perfume additives, termed human pheromones by the authors, have conveyed the message that these substances can promote an increase in human sociosexual behaviour [Physiol. Behav. 75 (2002) 367; Arch. Sex. Behav. 27 (1998) 1]. The present paper presents an extended analysis of these data. It is shown that in neither study is there a statistically significant increase in any of the sociosexual behaviours for the experimental groups. In the control groups of both studies, there are, however, moderate but statistically significant decreases in the corresponding behaviours. Most notably, there is no support in the data for the claim that the substances increase the attractiveness of the wearers to the other sex. It is concluded that more research using matched homogeneous groups of participants is needed.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Human pheromones; Sexual behaviour; Perfume additives

1. Introduction

In a recent issue of this journal [1], the effect of a putative female pheromone on sociosexual behaviour in young women was investigated. (The term ‘putative pheromones’ is used because there is no real evidence that the compounds studied are natural human products and the compounds have not been identified.) In a double-blind, placebo-controlled study, some participants had the synthesized putative pheromone added to their perfume. The main result of the study was that a significantly greater proportion of participants in the experimental group increased over baseline in the frequency of the dependent variables labelled sexual intercourse, sleeping next to a partner, formal dates and petting/affection/kissing. The authors concluded that the compound acted as a sex attractant by increasing the attractiveness of women to men. In the following, a critique and a reanalysis of the presentation and interpretation of the data of Ref. [1] are offered.

In any study involving the comparison of a control group and an experimental group, it is important to obtain homogeneous groups of participants. As noted in Ref. [1], the control group and the experimental group differed on a number of potentially important attributes already at the initial baseline period. For instance, the experimental group reported more male approaches than the control group (P < .05) and a lower degree of sleeping next to a romantic partner (P < .10), and was younger (P < .10) and shorter (P < .05) than the control group. More conspicuously, the prior distribution of participants’ relational status (‘not dating’, ‘dating’, and ‘dating steadily’) was statistically significantly different (χ² = 7.01, P < .05; n = 36, df = 2), with more participants in the ‘dating’ category in the experimental group and more ‘dating steadily’ participants in the control group.¹ Although the authors tried to control for the influence of some of these variables (i.e., height and age), one may still wonder whether the observed results depended on any of these initial differences.

* Tel.: +46-18-471-2162; fax: +46-18-471-2202. E-mail address: [email protected] (A. Winman).
0031-9384/$ – see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.physbeh.2004.06.006

1 For unknown reasons, the data for the ‘dating’ and ‘dating steadily’ categories are collapsed in the statistical analyses of Ref. [1] and it is concluded that similar proportions in both groups are dating as are not dating. It is quite straightforward to conceive, however, that the distinction between ‘dating’ and ‘dating steadily’ is of importance in a study of sexual behavior. The latter group may, for instance, be less prone to actively seek new acquaintances of the opposite sex.


Table 1
Average weekly number of behaviours during baseline and treatment for the experimental and control groups, respectively (from Ref. [1])

Pheromone group (women)
         Baseline  Treatment  t(df = 18)  P       no+  no−  χ²    P
PAKᵃ     1.8       2.2        1.42        N.S.    11   4    2.4   N.S.
ROMᵃ     0.97      1.32       1.24        N.S.    11   5    1.56  N.S.
INTᵃ     0.63      0.96       1.43        N.S.    9    5    0.64  N.S.
FDATᵃ    1.03      1.06       0.12        N.S.    10   7    0.24  N.S.
IFDAT    1.24      1.25       0.06        N.S.    9    8    0.06  N.S.
MALE     2.44      2.14       1.56        N.S.    7    11   0.5   N.S.

Control group (women)
         Baseline  Treatment  t(df = 16)  P       no+  no−  χ²    P
PAKᵃ     2.35      1.78       2.2         < .05   4    9    1.23  N.S.
ROMᵃ     2.3       1.7        2.11        < .10   3    8    1.45  N.S.
INTᵃ     1.26      0.76       2.09        < .10   1    6    2.29  N.S.
FDATᵃ    0.68      0.35       1.75        < .10   3    6    0.44  N.S.
IFDAT    0.85      0.81       0.17        N.S.    8    6    0.07  N.S.
MALE     1.41      1.71       1.33        N.S.    8    7    0     N.S.

Dependent t tests and chi-square goodness of fit measures (with Yates’ correction) for the proportion of participants increasing (no+)/decreasing (no−) over baseline. PAK = petting/affection/kissing, ROM = sleeping next to a romantic partner, INT = sexual intercourse, FDAT = formal dates, IFDAT = informal dates, MALE = male approaches.
ᵃ Variable for which a significant result was reported in Ref. [1].
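The chi-square column of Table 1 can be recomputed from the no+ and no− counts alone. The following is a minimal sketch, assuming the usual form of Yates’ correction (each absolute deviation from the expected frequency is reduced by 0.5, but not below zero); the function name is illustrative and not taken from Ref. [1]. Under this assumption the sketch reproduces the tabled values except for IFDAT in the pheromone group, where the tabled 0.06 appears to correspond to the uncorrected statistic.

```python
def yates_chi_square(n_increase, n_decrease):
    """Chi-square goodness of fit against equal expected frequencies,
    with Yates' continuity correction (df = 1)."""
    expected = (n_increase + n_decrease) / 2
    # Yates' correction: shrink |observed - expected| by 0.5, never below zero.
    deviation = max(abs(n_increase - expected) - 0.5, 0.0)
    # Both cells deviate from the expected value by the same amount.
    return 2 * deviation ** 2 / expected


# (no+, no-) pairs copied from Table 1.
pheromone = {"PAK": (11, 4), "ROM": (11, 5), "INT": (9, 5),
             "FDAT": (10, 7), "IFDAT": (9, 8), "MALE": (7, 11)}
control = {"PAK": (4, 9), "ROM": (3, 8), "INT": (1, 6),
           "FDAT": (3, 6), "IFDAT": (8, 6), "MALE": (8, 7)}

CRITICAL_05_DF1 = 3.84  # chi-square critical value for P = .05, df = 1

for label, group in (("pheromone", pheromone), ("control", control)):
    for variable, (up, down) in group.items():
        chi2 = yates_chi_square(up, down)
        verdict = "significant" if chi2 > CRITICAL_05_DF1 else "N.S."
        print(f"{label:9s} {variable:5s} chi2 = {chi2:.2f}  {verdict}")
```

None of the twelve values approaches the critical value of 3.84, which is the point made by the N.S. entries in the last column.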

Although the significant baseline differences are not elegant, and warrant future research, they do not necessarily negate the findings of the study. These worries could, to a high degree, be mitigated by within-groups comparisons showing, for example, increases over baseline in the sexual behaviour of the participants in the experimental group. However, no summary statistics are reported separately for the baseline and experimental periods of the experimental and control groups, and the analyses are confined to chi-square tests of the proportion of participants in each group that show an increase in a particular behaviour. In the present pretest–posttest control-group design, with an a priori hypothesis that the experimental group will increase in a behaviour, it is necessary (a) to establish that a statistically significant increase in the behaviour has occurred as predicted and (b) to eliminate a placebo explanation by showing that the control group does not show a corresponding increase in behaviour. This implies that (c) it can be demonstrated that the experimental and control groups differ from each other in the posttest. In Ref. [1], none of these tests is performed. Because descriptive statistics are absent altogether from the paper, we do not know whether the experimental group engages in more sexual behaviour with the additive present in the perfume. It is possible that a larger proportion of participants increasing in a behaviour in one group is observed because the behaviour in the other group, for some unknown reason, decreases. For example, because there is a higher proportion of ‘dating steadily’ participants in the control group than in the experimental group, a decrease in behaviour could occur as a result of participants in this category having high expectations of their partners’ behaviour that turn into frustration and lack of sexual interest after a couple of weeks of observing no apparent effects.

It is argued in Ref. [1] that (because of the “considerable variability in subjects’ behaviour at baseline”) what is important in the statistical analysis is whether or not a particular participant increased over her baseline, that the average weekly behaviour of a group is an irrelevant measure, and that using ANOVA or Mann–Whitney-type statistics would impose “an error in logic”, although it is unclear what this error in logic would consist of. As a consequence, the study presents only nonparametric statistical tests comparing the proportions of participants in the two groups who increase in a particular behaviour. However, there is no reason to suspect that any assumption underlying parametric tests is violated.² The logic of a within-subjects parametric test, such as a dependent t test, does take individual variation into consideration; that is the point of such a test. The present analysis will show within-groups t tests and group averages. In addition, nonparametric within-subjects tests are provided.

The rationale of the paper is thus to extend the analyses in Ref. [1] with within-group comparisons. The question is whether the participants in the experimental group actually increase their sociosexual behaviour over baseline to a statistically significant degree. Table 1 shows averages and dependent t tests for the dependent measures of sociosexual behaviour involving a partner for the pheromone and control groups, respectively (data from Ref. [1]). As can be seen, there is no statistically significant difference for any measure in the experimental group. For the control group, however, there are statistically significant or marginally significant (P < .10) effects for four out of six variables.
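The remark that a dependent t test already takes individual variation into account can be illustrated with a short sketch. The participant values below are hypothetical and are not data from Ref. [1]; the function name is likewise illustrative. The statistic is computed only from each participant’s own change score, so large differences between participants at baseline do not enter the test.

```python
import math
import statistics


def dependent_t(baseline, treatment):
    """Paired (dependent) t test; returns (t, df) for treatment minus baseline."""
    if len(baseline) != len(treatment):
        raise ValueError("each participant needs a baseline and a treatment value")
    # One change score per participant: her treatment value minus her own baseline.
    diffs = [after - before for before, after in zip(baseline, treatment)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the change scores
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical weekly averages for five participants (NOT data from Ref. [1]).
baseline = [0.0, 1.5, 4.0, 0.5, 2.0]
treatment = [0.5, 1.0, 4.5, 1.0, 2.5]

t_value, df = dependent_t(baseline, treatment)
print(f"t({df}) = {t_value:.2f}")

# Making the participants far more variable at baseline leaves t unchanged,
# because the same offset is added to both of a participant's values.
offsets = [0, 10, 20, 30, 40]
shifted_t, _ = dependent_t([b + o for b, o in zip(baseline, offsets)],
                           [a + o for a, o in zip(treatment, offsets)])
print(f"t with inflated baseline variability = {shifted_t:.2f}")
```

The second call is the design point: between-participant variability at baseline, however considerable, is removed by differencing before the test statistic is formed.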
The four control-group variables with significant or marginally significant effects coincide with those for which a statistically significant effect was reported in Ref. [1]. As is seen, all these differences are due to a decrease in sociosexual behaviour.

It is argued [1] that it is important to rely on a measure of how many participants in the experimental group increase over their baseline behaviour. Taking this logic to a within-groups comparison, we can compare the number of participants who show an increase in sociosexual behaviour with the number who show a decrease. An effect of the substance on sociosexual behaviour should reveal itself in a larger proportion increasing than decreasing in the behaviour.³ A random model, on the other hand, would postulate that all changes are due to chance fluctuations and that the difference in proportions may well have arisen from sampling error. Columns 5–8 of Table 1 show the number of participants for whom there is a behavioural increase vs. decrease in each group, followed by chi-square goodness of fit measures (with Yates’ correction) with equal proportions as the expected frequencies. As is evident, there are no significant within-groups chi-squares in the study.

² In fact, this seems to be implicitly assumed in Ref. [1] because independent t tests are used to compare the groups at baseline for the sociosexual behaviours.
³ It is possible, however, that, for example, a response bias makes it less likely for participants to report events, something that would obscure a true increase in behaviour in the experimental group. Fortunately, in Ref. [1], both accuracy and backfilling of self-reports are reported to be adequate and comparable between the groups.

The proportion of participants increasing in three or more behaviours was presented in Ref. [1], supposedly supporting the robustness of the effect. However, a chi-square test comparing this number (14 participants) with the number of participants who decrease in three or more behaviours (8 participants) is not statistically significant [χ²(df = 1) = 1.14]. Again, the reported significant result is driven by a decrease in sociosexual behaviour in the control group, where more participants (9) decrease in three or more behaviours than increase to a similar extent (4), although this difference also fails to reach statistical significance [χ²(df = 1) = 1.23, N.S.]. Similarly, the average number of positive behaviour changes is not significantly different from the average number of negative changes, either in the experimental group (mean positive changes = 3.0, mean negative changes = 2.1), t(18) = 1.58, N.S., or in the control group (mean positive changes = 1.59, mean negative changes = 2.47), t(18) = 1.09, N.S. Fig. 1 shows averages (and 95% confidence intervals) of the composite measure of the total number of behaviour changes involving a partner for the placebo and experimental groups, respectively. It can be seen in the figure that there is a decreasing trend in the control group and a slight increase in the experimental group, but that both of these changes are very moderate in relation to the confidence intervals.

In Ref. [1], it is frequently argued that the data show not only that there is an effect of the tested substance, but also that this effect is due to an increase in the sexual attractiveness of

Fig. 1. Averages and 95% confidence intervals of the number of sociosexual behaviours at baseline and experimental period for the control (□) and experimental (■) groups, respectively. Data from Ref. [1].


Table 2
Average weekly number of behaviours during baseline and treatment for the experimental and control groups, respectively (from Ref. [2])

Pheromone group (men)
         Baseline  Treatment  t(df = 16)  P       no+  no−  χ²    P
PAK      1.94      1.87       0.27        N.S.    8    6    0.07  N.S.
ROM      2.59      2.45       0.52        N.S.    6    5    0.00  N.S.
INT      0.67      0.81       1.2         N.S.    8    3    1.45  N.S.
FDAT     0.62      0.59       0.19        N.S.    7    5    0.08  N.S.
IFDAT    0.59      0.32       0.97        N.S.    6    4    0.10  N.S.

Control group (men)
         Baseline  Treatment  t(df = 20)  P       no+  no−  χ²    P
PAK      1.55      1.27       1.64        N.S.    4    11   2.4   N.S.
ROM      1.52      1.37       1.5         N.S.    3    5    0.13  N.S.
INT      0.69      0.47       1.5         N.S.    3    8    1.45  N.S.
FDAT     0.78      0.64       1.09        N.S.    7    7    0     N.S.
IFDAT    0.35      0.27       0.70        N.S.    4    4    0     N.S.

Dependent t tests and chi-square goodness of fit measures (with Yates’ correction) for the proportion of participants increasing (no+)/decreasing (no−) over baseline. PAK = petting/affection/kissing, ROM = sleeping next to a romantic partner, INT = sexual intercourse, FDAT = formal dates, IFDAT = informal dates.

women to men rather than to an increased motivation in females. This claim is unfounded. For the only measure with an obvious potential of unambiguously capturing such an effect (i.e., number of male approaches), the number of participants noting an increase over baseline is lower than the number experiencing a decrease. Thus, anyone in the experimental group who is keen to be approached by men is paradoxically better off without the putative pheromone added to her perfume than with it.

Posttreatment between-group comparisons reveal only one significant difference: the number of formal dates is significantly higher in the experimental group than in the control group, t(34) = 2.69, P < .05. This difference is due to a reduction of the behaviour in the control group, where it is almost halved, whereas it remains virtually unchanged (baseline = 1.02, experimental = 1.06) in the experimental group.

To summarize, in the data of Ref. [1], there is no evidence of an increase in sociosexual behaviour in the group given the putative pheromone for any dependent measure. The only statistically significant within-groups differences are found in the control group, where some sociosexual behaviours overall decrease moderately, perhaps because of regression effects. The reason for the apparent decrease in sociosexual behaviour in the control group is hard to guess. One possibility is that there is a general tendency for behaviour to decrease, for instance, as a long-term result of participants’ negative experiences of being observed, and that this tendency is counteracted by a true effect of the pheromone in the experimental group. Another possibility is that some of the initial differences between the experimental and control groups are responsible for the result, which would thus merely be caused by sampling error.


Interestingly, a study [2] tested the effect of putative male pheromones (a substance commercially marketed by the first author of the paper at the Athena Institute under the label Athena Pheromone 10X) on the sociosexual behaviour of men (see also Ref. [3]). The reported results showed that significantly more experimental than placebo users increased above baseline in ‘sexual intercourse’ and ‘sleeping with a romantic partner’. Because this study largely relies on the same methods and statistical analyses as Ref. [1], and all raw data are presented, a comparison of the analyses is of interest. Table 2 shows the results of Ref. [2] analyzed in the same way as the previous study. There are no statistically significant differences in either group; the t values are somewhat higher in the control group, which appears to be associated with a slight decrease in behaviour. Again, the number in the pheromone group that increased in three or more behaviours (7) is not statistically significantly different from the number that decreased in three or more behaviours (3) [χ²(df = 1) = 0.9, N.S.]. The corresponding numbers in the control group (3 increasing vs. 7 decreasing) likewise do not differ to a statistically significant degree [χ²(df = 1) = 0.9, N.S.].⁴

Because the data for all 8 weeks of the study are presented in Ref. [3], the material was divided into subcategories of pairs of study weeks to search for sequence effects. A decrease in the behaviour of the control group due to a negative long-term effect of being tested, or to a seasonal change in behaviour, would be gradual and would thus accumulate over time (because participants are tested during baseline as well as during the experimental period, and seasonal changes in behaviour do not occur from one week to the next). If the effects were due to chance factors, such as sampling error when distributing participants over the conditions, one would expect no such gradual build-up. Fig. 2 shows the average number of sociosexual behaviours at Study Weeks 1–2 (baseline), 3–4, and 5–6 separately. As is evident, for the control group, the baseline average is slightly higher than the other averages, but there is no decreasing trend over the time of the study. For the experimental group, the behaviour is very stable over the entire study, resembling a textbook illustration of a null effect. In fact, the average number of behaviours is not higher at any of the experimental weeks than at baseline. Repeated-measures ANOVA with study week as the repeated factor shows no statistical significance [F(3,48) = 0.4, N.S.].

⁴ In this context, it should be pointed out that according to the scheme of coding behavioural increases in Ref. [2], one participant (S48 in the control group) is erroneously coded in Table 2 as not showing an increase in the behaviour ‘sleeping next to a romantic partner’. With the behaviour sequence 01100112, the participant should be recorded as showing an increase over baseline. When this coding error is corrected, the test reported to be statistically significant for this variable ceases to be so at the conventional .05 α level. The contingency is not statistically significant even with a one-tailed Fisher exact test. The error also invalidates the claim in Ref. [4] that “an ‘infinitely’ higher proportion of experimental than placebo users showed an increase in 4 or more behaviours because the placebo group had no (0) men who met this criterion” (p. 632). S48 would thus meet this criterion if coded correctly.
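The chi-square values quoted for the ‘three or more behaviours’ comparisons in both studies can be checked from the counts given in the text. The sketch below assumes Yates’ correction with equal expected frequencies, as in the tables, and obtains the P value for one degree of freedom from the identity P(χ² > x) = erfc(√(x/2)). The function names are illustrative and do not come from Refs. [1,2].

```python
import math


def yates_chi_square(n_increase, n_decrease):
    """Chi-square goodness of fit (equal expected frequencies, Yates' correction)."""
    expected = (n_increase + n_decrease) / 2
    deviation = max(abs(n_increase - expected) - 0.5, 0.0)
    return 2 * deviation ** 2 / expected


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    # A chi-square(1) variable is a squared standard normal, hence the erfc form.
    return math.erfc(math.sqrt(chi2 / 2))


# (increase, decrease) in three or more behaviours, as quoted in the text.
comparisons = {
    "Ref. [1] experimental": (14, 8),
    "Ref. [1] control": (4, 9),
    "Ref. [2] experimental": (7, 3),
    "Ref. [2] control": (3, 7),
}

for label, (up, down) in comparisons.items():
    chi2 = yates_chi_square(up, down)
    print(f"{label:22s} chi2(1) = {chi2:.2f}, P = {p_value_df1(chi2):.2f}")
```

Under these assumptions the four statistics come out at 1.14, 1.23, 0.9 and 0.9, matching the values in the text, and none of the corresponding P values falls below .05.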

Fig. 2. Averages and 95% confidence intervals of the number of sociosexual behaviours during 8 study weeks. The experimental manipulation is introduced at the third week, after 2 weeks of baseline measuring, for the control (□) and experimental (■) groups, respectively. Data from Ref. [2].

Similarly, the number of participants for whom the average increases from baseline to the experimental period is lower than the number for whom it decreases. Although the decrease in the control group appears modest in size, the corresponding ANOVA shows that this decrease actually reaches statistical significance [F(3,60) = 2.8, P < .05]. A post hoc least significant difference (LSD) test shows that the significant differences are between baseline and Study Weeks 3–4 and between baseline and Study Weeks 7–8. Finally, posttreatment between-group comparisons reveal no significant differences between the groups on any of the dependent measures. Thus, 6 weeks of exposure to the substance apparently does not make any kind of sexual behaviour more frequent in the experimental group than in the control group. The men in the experimental group did not perceive positive results, something which is said to reflect the fact that they “...did not always accurately perceive the romance in their lives” (p. 10). On the contrary, as the present analysis shows, the men were quite accurate in failing to perceive a positive change, since none took place.
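For readers unfamiliar with the repeated-measures ANOVA used here, the following is a minimal sketch of the one-way within-subjects F test. The scores are hypothetical and are not the data of Ref. [2]; the function name is illustrative. Note that the degrees of freedom reported above, F(3,48) and F(3,60), are what this computation yields for four study periods with 17 and 21 participants, respectively, which agrees with the df of the t tests in Table 2.

```python
def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per participant, one column per study period.
    Returns (F, df_periods, df_error).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of repeated measurements
    grand_mean = sum(map(sum, scores)) / (n * k)

    period_means = [sum(row[j] for row in scores) / n for j in range(k)]
    person_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_periods = n * sum((m - grand_mean) ** 2 for m in period_means)
    # Stable differences between participants are removed from the error term.
    ss_persons = k * sum((m - grand_mean) ** 2 for m in person_means)
    ss_error = ss_total - ss_periods - ss_persons

    df_periods = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_periods / df_periods) / (ss_error / df_error)
    return f_value, df_periods, df_error


# Hypothetical behaviour counts: 4 participants x 4 two-week periods
# (NOT data from Ref. [2]).
scores = [
    [3, 2, 2, 1],
    [5, 4, 5, 4],
    [1, 1, 0, 0],
    [4, 3, 3, 2],
]

f_value, df_periods, df_error = repeated_measures_anova(scores)
print(f"F({df_periods},{df_error}) = {f_value:.2f}")
```

As with the dependent t test, the participants’ overall levels are partialled out, so the test addresses change over study periods within the same individuals.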

2. Conclusion

The present paper has presented an extended analysis of the results of Refs. [1,2] with within-subject tests. Three main findings are of interest:

(1) In none of the 11 parametric within-subjects tests of the two studies presented here is there a statistically significant increase in a behaviour in the experimental groups. The same is true for the same number of nonparametric tests. All statistically significant within-groups changes found are modest behavioural decreases in the control groups. This fact has been obscured in previous studies by a focus on particular statistical comparisons, and it is directly contrary to the stronger, erroneous claim made in the discussion of Ref. [1], where it is stated that the putative pheromone “caused a statistically significant and distinct increase over average weekly baseline in four sociosexual behaviours” (p. 374). Again, no statistically significant increases over baseline can be shown.

(2) The claim in Ref. [4] that the data are statistically highly robust is too strong. Even when counting cumulative increases in several behaviours, we fail to reject a simple chance null hypothesis with a within-subjects test in either study. More participants would be needed to increase power.

(3) Finally, the claim in both studies that the action of pheromones is through an increase in attractiveness to the opposite sex depends on unconvincing assumptions, is at best premature, and receives no support in the data. As shown, the only measure with the potential of directly tapping this, the number of male approaches in Ref. [1], actually seems to suggest the opposite, with male approaches in the experimental group being more common during baseline than during the experimental period.

The putative pheromones used in the two studies may well still be found to have effects, but this remains to be unambiguously shown. A suggestion for future studies is the absolute necessity of using groups of participants that are matched on relevant factors (such as relational status) to increase homogeneity and statistical power.

References

[1] McCoy NL, Pitino L. Pheromonal influences on sociosexual behaviour in young women. Physiol Behav 2002;75:367–75.
[2] Cutler WB, McCoy NL, Friedmann E. Pheromonal influences on sociosexual behavior in men. Arch Sex Behav 1998;27:1–13.
[3] Wysocki CJ, Preti G. Pheromonal influences. Arch Sex Behav 1998;27:627–9.
[4] Cutler WB, McCoy NL, Friedmann E. Pheromonal influences on sociosexual behavior: response to Wysocki and Preti. Arch Sex Behav 1998;27:629–34.