Media coverage of

Star Wars: The Empirics Strike Back

PSE Working Paper No. 2012-29, 2012
American Economic Journal: Applied Economics, Vol. 8(1), pp. 1-32, 2016

Abel Brodeur, Mathias Lé, Marc Sangnier, Yanos Zylberberg

1. June 28th, 2012 - Aid Thoughts
2. June 28th, 2012 - Marginal Revolution
3. June 29th, 2012 - Lorenzo Burlon's Blog
4. June 30th, 2012 - A (Budding) Sociologist's Commonplace Book
5. March 22nd, 2013 - IZA Newsroom
6. March 29th, 2013 - Development Impact
7. April 1st, 2013 - Chris Blattman's Blog
8. April 11th, 2013 - Economic Logic
9. April 17th, 2013 - Econometrics Beat: Dave Giles' Blog
10. January 24th, 2014 - Running Randomized Evaluations
11. October 15th, 2014 - Statistical Modeling, Causal Inference, and Social Science
12. February 24th, 2015 - Blog.alphaarchitect.com
13. April 23rd, 2015 - Deutschlandradio Kultur
14. December 10th, 2015 - ResearchGate Blog
15. December 23rd, 2015 - Cherokee Gothic
16. January 21st, 2016 - The Economist / Free Exchange Blog
17. January 22nd, 2016 - GemNews (taken over from the Free Exchange Blog)
18. March 1st, 2016 - Videnskab.dk
19. April 10th, 2016 - Tarjomaan.com (taken over from the Free Exchange Blog)
20. June 20th, 2016 - Ekonomistas.se
21. December 16th, 2016 - Leamer-Rosenthal Prize

My God, it's full of stars! - Aid Thoughts
http://aidthoughts.org/?p=3429

Posted on June 28, 2012 by Matt

There's a new paper out which examines reported levels of statistical significance in (mainly) empirical economics papers from the top three journals:

Journals favor rejections of the null hypothesis. This selection upon results may distort the behavior of researchers. Using 50,000 tests published between 2005 and 2011 in the AER, JPE and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above .25, a valley between .25 and .10 and a bump slightly under .05. Missing tests are those which would have been accepted but close to being rejected (p-values between .25 and .10). We show that this pattern corresponds to a shift in the distribution of p-values: between 10% and 20% of marginally rejected tests are misallocated. Our interpretation is that researchers might be tempted to inflate the value of their tests by choosing the specification that provides the highest statistics. Note that inflation is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.

Basically, if results were "unbiased", a graph of the distribution of the observed results (or in this case, observed p-values of significance) should be relatively smooth and monotonic. Here's what the distribution looks like (taken from the paper):

[Figure from the paper: the distribution of p-values]

Do you see that second little hump? That's just below the p = 0.05 threshold, the magic and totally-arbitrary rule of thumb for whether a statistical result is worthwhile or not (although in my experience p = 0.10 is becoming the new norm). This suggests an abnormal grouping just below the threshold. Now if this were only a result of systematic selection bias, with academic journals only accepting results which were significant above this threshold, we'd expect to see abnormal grouping to the right of the threshold. However, this doesn't explain why the distribution is bimodal: results which are nearly significant are less frequent than those that are much further away. This suggests something more nefarious than publisher bias: that researchers with results that are nearly significant are doing things to nudge their results into the just-significant category.

I think someone is assuming we should be scared and outraged by all this - but I don't think we should. Here's why: these results suggest that researchers care very deeply about getting under that p = 0.05 threshold. They do this because we seem to attach some value to the presence of "stars" (the typical way of highlighting significant results in econ papers). But our weighting of results shouldn't be binary - it should be continuous. We should give results which are just barely insignificant about the same weight as those which are just barely significant. So even if the rest of the academic establishment has decided to be irrational, resulting in a shift in the average result from p = 0.052 to p = 0.048, we shouldn't be bothered by these small shifts, because the change in our interpretation of those results should be very minor.
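To make the selection-versus-nudging distinction concrete, here is a minimal simulation in Python (a sketch, not code from the paper; the mixture shares and the nudging probability are invented for illustration). Selection alone thins out insignificant results while leaving a smooth shape; nudging nearly significant results carves a valley above 0.05 and piles up a second hump just below it.

    # Sketch: journal-side selection vs. researcher-side "nudging" of p-values.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 100_000

    # Half the studies test a true null, half a modest real effect.
    effect = np.where(rng.random(n) < 0.5, 0.0, 3.0)
    z = rng.normal(effect, 1.0)                # stylized test statistics
    p = 2 * (1 - stats.norm.cdf(np.abs(z)))    # two-sided p-values

    # Selection only: significant results always published, others sometimes.
    selected = p[(p < 0.05) | (rng.random(n) < 0.3)]

    # Inflation: some "nearly significant" results (0.05 <= p < 0.25) get
    # re-specified until they land just under the threshold.
    inflated = p.copy()
    nudge = (p >= 0.05) & (p < 0.25) & (rng.random(n) < 0.15)
    inflated[nudge] = rng.uniform(0.01, 0.05, nudge.sum())

    bins = np.arange(0, 0.30, 0.025)
    for label, sample in [("selection only", selected), ("with inflation", inflated)]:
        counts, _ = np.histogram(sample, bins=bins)
        # Selection gives a drop at 0.05 but a smooth decline elsewhere;
        # inflation adds extra mass just below 0.05 and a valley just above it.
        print(f"{label:15s}", counts)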


Hat tip to Marginal Revolution for the paper link.

This entry was posted in Research, Worst practices.


"Star Wars: The Empirics Strike Back" - Marginal Revolution
http://marginalrevolution.com/marginalrevolution/2012/06/star-wars-th...

by Tyler Cowen on June 28, 2012 at 6:47 am in Data Source, Science

That is a new paper by Abel Brodeur, Mathias Lé, Marc Sangnier, and Yanos Zylberberg:

Abstract: Journals favor rejections of the null hypothesis. This selection upon results may distort the behavior of researchers. Using 50,000 tests published between 2005 and 2011 in the AER, JPE and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above .25, a valley between .25 and .10 and a bump slightly under .05. Missing tests are those which would have been accepted but close to being rejected (p-values between .25 and .10). We show that this pattern corresponds to a shift in the distribution of p-values: between 10% and 20% of marginally rejected tests are misallocated. Our interpretation is that researchers might be tempted to inflate the value of their tests by choosing the specification that provides the highest statistics. Note that inflation is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.

For the pointer I thank Michelle Dawson.

Addendum: Here is related commentary from Mark Thoma.

9 comments

So you got stars? That don't impress me much. - Lorenzo Burlon's Blog
https://lorenzoburlon.wordpress.com/2012/06/29/so-you-got-stars-that...

June 29, 2012

Some empirical motivation, when not a nice regression with all the shiny stars, is always welcome, independently of how superfluous it is for the point of the paper. Moreover, authors, editors, and referees, not to mention students and supervisors, have a general tendency to appreciate the rejection of the null hypothesis more than the rejection of the thesis of the paper. This is perfectly normal, but at some point (which we have apparently passed) it starts showing up in publishing statistics. From a new paper by Abel Brodeur, Mathias Lé, Marc Sangnier, and Yanos Zylberberg:

Journals favor rejections of the null hypothesis. This selection upon results may distort the behavior of researchers. Using 50,000 tests published between 2005 and 2011 in the AER, JPE and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above .25, a valley between .25 and .10 and a bump slightly under .05. Missing tests are those which would have been accepted but close to being rejected (p-values between .25 and .10). We show that this pattern corresponds to a shift in the distribution of p-values: between 10% and 20% of marginally rejected tests are misallocated. Our interpretation is that researchers might be tempted to inflate the value of their tests by choosing the specification that provides the highest statistics. Note that inflation is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.

The boldface is mine. When you see stars, remember that they most probably hide plenty of equally interesting null hypotheses that will blush unseen like flowers in a country churchyard. Stop staring at the stars!

HT to Tyler Cowen and to Mark Thoma for a 2006 post on similar patterns in political science.

What do P-Values and Earnings Management Have in Common? - A (Budding) Sociologist's Commonplace Book
http://asociologist.com/2012/06/30/what-do-p-values-and-earnings-forecasts-have-in-common/

Posted by Dan Hirschman on June 30, 2012

Marginal Revolution just linked to a paper with one of my favorite titles of all time: Star Wars: The Empirics Strike Back. The paper reports a big meta-analysis of all of the significance tests done in three top econ journals - AER, JPE, and QJE - and shows how the distribution of their significance tests shows a dip between .10 and .25, and a slight bump at just under .05, thus exhibiting a two-humped "camel shape." The authors argue that this distribution suggests that researchers play with specifications to coax findings near significance across the threshold. The paper's title is actually a substantive finding as well: "Inflation [of significance] is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models."

I like this finding because it's much narrower and suggests a much more plausible mechanism than McCloskey and Ziliak's famous "Standard Error of Regressions." (For a good critique of M&Z, especially their coding scheme, see Hoover and Siegler.) Rather than simply asserting that economists don't understand significance and are part of a "cult", Brodeur et al. show a small but predictable amount of massaging to push results towards the far-too-important thresholds of .10 and .05. So, they agree in some sense with M&Z that economists are putting too much emphasis on these thresholds, but without an excessive claim about cult-like behavior.

Economists, in fact, look like corporate earnings managers. I'm not super up on the earnings management literature, but various authors in that field argue that corporations have strong incentives to report positive rather than negative earnings, and to meet analyst expectations. The distribution of earnings shows just that: fewer companies reporting very slightly negative earnings than you would expect, and fewer that just barely miss analyst expectations than just barely exceed them (see, e.g., Lee 2007, Payne and Robb 2000, Degeorge et al. 1999 - if anyone has a better cite for these findings, please leave a comment!). Like the economists* engaged in the Star Wars, businesses have incentives to coax earnings towards analysts' expectations.

What's the larger lesson? I think these examples are both cases of a kind of expanded Goodhart's Law, or a parallel case to Espeland and Sauder's "reactivity" of rankings. Another variant that perhaps gets closest is Campbell's Law, first articulated in the context of high-stakes testing: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." It's not clear just how "corrupt" economics findings or corporate earnings statements are, but Campbell's law and its close proxies remind us of the need to look for both subtle and overt forms of distortion whenever we turn a particular measure into a powerful threshold.

* Note that I am picking on economics here only because the article studied econ journals. I would bet that a similar finding could be obtained in sociology journals, and probably other social science fields with a heavy statistical bent.

Star Wars: The Empirics Strike Back - IZA Newsroom
http://newsroom.iza.org/en/star-wars-the-empirics-strike-back/

Posted on March 22, 2013 by IZA Press

Most academic disciplines use a statistical threshold to decide whether a hypothesis is likely to be true: if the test statistic is below this threshold, the finding is too uncertain to be suggested to be true. An unintended consequence of having thresholds is that researchers know the conventional statistical thresholds and may see them as a stumbling block for their ideas to be considered, since positive findings are more likely to be published.

Abel Brodeur, Mathias Lé, Marc Sangnier, and Yanos Zylberberg collected the value of tests published in three of the most prestigious economic journals over the period 2005-2011. The distribution of tests shows a hole just before the threshold, that is, in the region where results are too uncertain, and a surplus after it.

This finding suggests that researchers may be tempted to present tests with higher statistics to raise their chances of being published. For example, imagine that there are three types of results: green lights are findings which are very likely to be true; red lights are findings where effects are too uncertain to be considered; amber lights are in-between. The paper argues that researchers would make amber lights green, rather than intervening in the initially red and green cases. According to the authors' calculations, 10 to 20 percent of tests published are misallocated.

The following graph shows the distribution of test scores, where a z-statistic of 1.96, which corresponds to a p-value of 0.05, is the conventional threshold used in economics. Tests with z-statistics below it are usually regarded as too unconvincing. The picture shows the hole in the distribution just before the threshold and the excess mass above it, which hints at a systematic misallocation.


[Figure: Test score distribution]
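The threshold arithmetic in the paragraph above can be verified in two lines (a quick scipy check, not code from the study):

    from scipy import stats

    # Two-sided p-value implied by a z-statistic of 1.96: about 0.05.
    print(2 * (1 - stats.norm.cdf(1.96)))   # 0.04999...

    # And the inverse: the z-statistic needed for two-sided p = 0.05.
    print(stats.norm.ppf(1 - 0.05 / 2))     # 1.95996...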

The study also identifies several papers' and authors' characteristics that seem to be related to misallocation, such as being a young researcher in a tenure-track job. There is no indication of misallocation for randomized control trials. Surprisingly, data and code availability do not seem to be associated with less misallocation.

Results presented in this paper may have interesting implications for the academic community. It is unclear whether these biases should be larger or smaller in other journals and disciplines, but the results raise questions about the importance given to values of tests and the consequences for a discipline of ignoring this pattern. Suggestions have already been made in order to reduce this trend. Journals have been launched with the ambition of giving a place where authors may publish negative findings. There is also more support for researchers to submit their methodology before doing an experiment. Read the abstract or the full paper.

This entry was posted in Research and tagged economics, significance, statistics.


Weekly links March 29, 2013: job opportunities, CCTs, fishy p-values, and more... - Impact Evaluations
Published on Impact Evaluations (http://blogs.worldbank.org/impactevaluations)

Submitted by David McKenzie on Fri, 03/29/2013

· Work for me this summer: I'm looking for someone with good Stata skills who can help work with data coming in from a couple of randomized experiments, as well as to help develop and design some new work on improving measurement of business profits in developing countries. The latter would include the use of some innovative experiments with RFID technology, which I don't know much about, so the summer intern would spend some time trying to set this up. The position would be DC-based, but there would be the possibility of a few weeks of fieldwork depending on the interest of the person, and how quickly the RFID work can get up and running. This project is something that could potentially lead to a co-authored paper, depending on how the intern performs. Ideally you should be a PhD student in a top program, or otherwise an exceptional undergraduate. Email me [1] your CV and a cover letter describing your qualifications/interests if you are interested - I will only reply if I want to follow up with you.

· Work for Markus: Job opening with the Gender Innovation Lab [2] at the World Bank: to work on Private Sector Development (PSD), Agriculture and Rural Development (ARD), and gender in Africa, namely on a set of rigorous impact evaluation studies, including working with various partners on the design of innovative interventions to address gender inequality, as well as on the rigorous impact evaluations themselves, including design, implementation, and data analysis.

· The long-term impacts of a CCT in Nicaragua [3] - J-PAL summarizes work by Tania Barham, Karen Macours and John Maluccio: "By 2010, seven years after the early treatment group stopped receiving the transfers, boys in the early treatment group still had nearly half a year more schooling than those in the late treatment group. The increase in years of schooling was accompanied by gains in learning... However, no significant impact was found on cognition, as measured by the Raven test, consistent with cognitive development taking place mostly during early childhood."

· The demand for evidence? [4] Chris Blattman summarizes an experiment which tested microfinance organizations' willingness to learn about studies that are more positive or more negative in their findings for microfinance - organizations are twice as likely to want the information if told it is positive.

· Paper with a cool title but a serious implication: Star Wars: The Empirics Strike Back [5] - "Using 50,000 tests published between 2005 and 2011 in the AER, JPE, and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above 0.25, a valley between 0.25 and 0.10 and a bump slightly below 0.05. The missing tests (with p-values between 0.25 and 0.10) can be retrieved just after the 0.05 threshold and represent 10% to 20% of marginally rejected tests. Our interpretation is that researchers might be tempted to inflate the value of those almost-rejected tests by choosing a "significant" specification." The figure here shows this:

[Figure: distribution of p-values]

The good news for those of us doing experiments is that this seems to be much less of a problem for experiments.

Other interesting findings are that papers published by tenured and older researchers are less prone to this issue; inflation seems larger in articles that thank research assistants; but it does not vary with data and code availability on journals' websites (h/t @JustinSandefur [6]).


The other kind of Star Wars: The quest for ** and *** - Chris Blattman
http://chrisblattman.com/2013/04/01/the-other-kind-of-star-wars-the-qu...

Economic Logic: Test statistics and the publication game
http://economiclogic.blogspot.fr/2013/04/test-statistics-and-publicatio...

Economic Logic - There is Economics in everything

Thursday, April 11, 2013

Test statistics and the publication game

It is well known that journals do not like replications or confirmations of hypotheses. They are looking for the empirical results that contradict popular wisdom, and this must be influencing the way researchers look for test results. To increase your chances of success, you want to only mention highly significant results and ignore the so-so ones.

Abel Brodeur, Mathias Lé, Marc Sangnier and Yanos Zylberberg look at the distribution of p-values in articles published in the top three economics journals. I am not quite sure what the distribution of p-values would be if the publication process were unbiased, but it would probably look like a Poisson distribution and it would be monotonic on each side of the mode. What the authors find does not look at all like this. There is a distinct lack of test results that just miss the 5% or 10% significance thresholds, and distinctly more that just pass them, making the distribution bimodal. Interestingly, this problem is less present when stars are not used to highlight significance or when the authors are tenured.

These results indicate that there is more than a selection bias. This is an inflation bias by the researcher when he only presents the most significant results, which were obtained by finding the specification that allows him to pass the magic significance thresholds. I do not think this is ethical, but the publishing game makes it unavoidable, so the profession is apparently fine with it. I guess we have to tolerate this and take it into account when reading papers, much like we know there is grade inflation when looking at transcripts, or similar inflation when reading recommendation letters.

PS: This paper is a strong candidate for the best paper title of the year. Bravo!

PS2: What is really unethical is claiming results are significant when they are not. The case of Ulrich Lichtenthaler comes to mind, who added "significance stars" to his results when they were not warranted. The fact that he still managed to publish widely is an indictment of the quality of research in business journals, too.
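As a footnote to the author's guess about the unbiased benchmark above: under a true null, p-values are uniform on [0, 1], and mixing in real effects yields a monotonically decreasing, never bimodal, distribution. A quick simulation (a sketch; the 50/50 mixture is arbitrary) confirms this:

    # What an "unbiased" p-value distribution looks like: never bimodal.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 200_000

    z = np.concatenate([rng.normal(0, 1, n // 2),    # true nulls
                        rng.normal(2, 1, n // 2)])   # modest real effects
    p = 2 * (1 - stats.norm.cdf(np.abs(z)))

    counts, _ = np.histogram(p, bins=10, range=(0, 1))
    print(counts)  # counts fall off monotonically from 0 to 1; no second hump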

Posted by Economic Logician at 8:00 AM. Labels: Economics profession, ethics, research

Econometrics Beat: Dave Giles' Blog: Star Wars
http://davegiles.blogspot.fr/2013/04/star-wars.html

Wednesday, April 17, 2013

Star Wars

Today, Ryan MacDonald, a UVic Economics grad who works with Statistics Canada, sent me an interesting paper by Abel Brodeur et al.: "Star Wars: The Empirics Strike Back". Who can resist a title like that!

The "stars" that are being referred to in the title are those single, double (triple!) asterisks that authors just love to put against the parameter estimates in their tables of results, to signal statistical significance at the 10%, 5% (1%!) levels. A table without stars is like champagne without bubbles!

Basically, the paper is about selection bias in the refereeing and publishing of empirical papers in economics. It's a topic that has received attention previously, but in this paper the authors come up with some compelling evidence. In particular, they provide a new way of measuring "star inflation".

Here's the abstract of their paper:

"Journals favor rejection of the null hypothesis. This selection upon tests may distort the behavior of researchers. Using 50,000 tests published between 2005 and 2011 in the AER, JPE, and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above 0.25, a valley between 0.25 and 0.10 and a bump slightly below 0.05. The missing tests (with p-values between 0.25 and 0.10) can be retrieved just after the 0.05 threshold and represent 10% to 20% of marginally rejected tests. Our interpretation is that researchers might be tempted to inflate the value of those almost-rejected tests by choosing a "significant" specification. We propose a method to measure inflation and decompose it along articles' and authors' characteristics."

I have to share a couple of quotes that the authors include in their paper:

If the stars were mine I'd give them all to you,
I'd pluck them down right from the sky,
And leave it only blue.
("If The Stars Were Mine" by Melody Gardot)

and,

He who is fixed to a star does not change his mind.
(Leonardo da Vinci)

It's an interesting paper.

© 2013, David E. Giles
Posted by Dave Giles at 1:49 PM. Labels: Hypothesis testing, p-values

Are RCTs less manipulable than other studies? - Running Randomized Evaluations
http://runningres.com/blog/2014/1/24/are-rcts-less-manipulable-than-o...

The Fault in Our Stars: It's even worse than they say - Statistical Modeling, Causal Inference, and Social Science
http://andrewgelman.com/2014/10/15/fault-stars-even-worse-say/

Posted by Andrew on 15 October 2014, 11:51 am

In our recent discussion of publication bias, a commenter linked to a recent paper, "Star Wars: The Empirics Strike Back," by Abel Brodeur, Mathias Lé, Marc Sangnier, and Yanos Zylberberg, who point to the notorious overrepresentation in scientific publications of p-values that are just below 0.05 (that is, just barely statistically significant at the conventional level) and the corresponding underrepresentation of p-values that are just above the 0.05 cutoff. Brodeur et al. correctly (in my view) attribute this pattern not just to selection (the much-talked-about "file drawer") but also to data-contingent analyses (what Simmons, Nelson, and Simonsohn call "p-hacking" and what Loken and I call "the garden of forking paths"). They write:

We have identified a misallocation in the distribution of the test statistics in some of the most respected academic journals in economics. Our analysis suggests that the pattern of this misallocation is consistent with what we dubbed an inflation bias: researchers might be tempted to inflate the value of those almost-rejected tests by choosing a "significant" specification. We have also quantified this inflation bias: among the tests that are marginally significant, 10% to 20% are misreported.

They continue with "These figures are likely to be lower bounds of the true misallocation as we use very conservative collecting and estimating processes" - but I would go much further. One way to put it is that there are (at least) three selection processes going on here:

1. ("the file drawer") Significant results (traditionally presented in a table with asterisks or "stars," hence the photo above) are more likely to get published.
2. ("inflation") Near-significant results get jiggled a bit until they fall into the box.
3. ("the garden of forking paths") The direction of an analysis is continually adjusted in light of the data.

Brodeur et al. point out that item 1 doesn't tell the whole story, and they come up with an analysis (featuring a "lemma" and a "corollary"!) explaining things based on item 2. But I think item 3 is important too. The point is that the analysis is a moving target. Or, to put it another way, there's a one-to-many mapping from scientific theories to statistical analyses. So I'm wary of any general model explaining scientific publication based on a fixed set of findings that are then selected or altered. In many research projects, there is either no baseline analysis or else the final analysis is so far away from the starting point that the concept of a baseline is not so relevant. Although maybe things are different in certain branches of economics, in that people are arguing over an agreed-upon set of research questions.

P.S. I only wish I'd known about these people when I was still in Paris; we could've met and talked.

Filed under Economics, Miscellaneous Statistics, Zombies
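As an aside on item 3 above: the garden of forking paths is easy to simulate (a small illustration, not Gelman's code; the eight "forks" stand in for arbitrary analysis choices). Even with pure noise, keeping the best of several defensible specifications inflates the false-positive rate far above the nominal 5%.

    # Forking-paths sketch: pure-noise outcome, several analysis variants.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    def min_p(n_obs=100, n_forks=8):
        """One 'study': noise outcome, keep the best of several regressions."""
        y = rng.normal(size=n_obs)
        # Each fork regresses y on a different (irrelevant) regressor, standing
        # in for choices like which covariate to adjust for or which subsample.
        return min(stats.linregress(rng.normal(size=n_obs), y).pvalue
                   for _ in range(n_forks))

    share = np.mean([min_p() < 0.05 for _ in range(2000)])
    print(share)  # roughly 1 - 0.95**8 = 0.34, versus 0.05 for one fixed test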

Top 5 Geeky, Yet Funny, Economic Paper Titles - blog.alphaarchitect.com
http://blog.alphaarchitect.com/2015/02/24/top-5-geeky-yet-funny-economic-paper-titles/

Wesley R. Gray, Ph.D.

As many are aware, economists aren't the funniest group in the crowd. Here are some sample jokes from the funniest economist out there - Yoram Bauman. Here is a sample economist joke: When Yoram told his dad that he wanted to use his Ph.D. in economics as the basis for a comedy career, his dad was unsure.

He didn’t think there would be enough demand.

har har har… And in response to his Dad’s skepticism:

I told him not to worry. I’m a supply-side economist. I just stand up and let the jokes trickle down.

N'yuk, N'yuk, N'yuk... I know, pretty bad... But just because economists can't tell jokes doesn't mean they can't come up with some funny titles for their esoteric academic articles submitted to professional journals.

Our Top 5 Funny Titles of All-Time:

Number 5: Star Wars: The Empirics Strike Back

Journals favor rejections of the null hypothesis. This selection upon results may distort the behavior of researchers. Using 50,000 tests published between 2005 and 2011 in the AER, JPE and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above .25, a valley between .25 and .10 and a bump slightly under .05. Missing tests are those which would have been accepted but close to being rejected (p-values between .25 and .10). We show that this pattern corresponds to a shift in the distribution of p-values: between 10% and 20% of marginally rejected tests are misallocated. Our interpretation is that researchers might be tempted to inflate the value of their tests by choosing the specification that provides the highest statistics. Note that inflation is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.

Number 4: An Option Value Problem from Seinfeld

In an episode of the sitcom Seinfeld (season 7, episode 9, original air date December 7, 1995), Elaine Benes uses a contraceptive sponge that gets taken off the market. She scours pharmacies in the neighborhood to stock a large supply, but it is finite. So she must "reevaluate her whole screening process." Every time she dates a new man, which happens very frequently, she has to consider a new issue: Is he "spongeworthy"? The purpose of this article is to quantify this concept of spongeworthiness.

Number 3: Macroeconomic Policy and the Optimal Destruction of Vampires

Although human beings have endured the recurring ravages of vampires for centuries, scarcely any attempts have been made to analyze the macroeconomic implications of this problem and to devise socially optimal policy responses. Despite the increasing incidence of vampire epidemics in recent years (in Transylvania, Hollywood, and elsewhere), vampirism remains a thoroughly neglected topic in the theory of macroeconomic policy. The "vampires" considered in this paper are not the blood-sucking bats (e.g., Desmodus rotundus or Diphylla ecaudata) to be found in the forests of tropical America, but the blood-sucking ghosts of dead Homo sapiens. The bats are comparatively innocuous; aside from taking their occasional blood sample from missionaries asleep in the jungle, they have had no measurable influence on human welfare. The blood-sucking ghosts, on the other hand, have periodically provided grave threats to human populations; their most conspicuous macroeconomic impact arises from their detrimental effect on the labor force.

Number 2: Size Matters, If You Control Your Junk

The size premium has been challenged along many fronts: it has a weak historical record, varies significantly over time, in particular weakening after its discovery in the early 1980s, is concentrated among microcap stocks, predominantly resides in January, is not present for measures of size that do not rely on market prices, is weak internationally, and is subsumed by proxies for illiquidity. We find, however, that these challenges are dismantled when controlling for the quality, or the inverse "junk", of a firm. A significant size premium emerges, which is stable through time, robust to the specification, more consistent across seasons and markets, not concentrated in microcaps, robust to non-price based measures of size, and not captured by an illiquidity premium. Controlling for quality/junk also explains interactions between size and other return characteristics such as value and momentum.

And the Number 1 Hit: An-arrgh-chy: The Law and Economics of Pirate Organization

This article investigates the internal governance institutions of violent criminal enterprise by examining the law, economics, and organization of pirates. To effectively organize their banditry, pirates required mechanisms to prevent internal predation, minimize crew conflict, and maximize piratical profit. Pirates devised two institutions for this purpose. First, I analyze the system of piratical checks and balances crews used to constrain captain predation. Second, I examine how pirates used democratic constitutions to minimize conflict and create piratical law and order. Pirate governance created sufficient order and cooperation to make pirates one of the most sophisticated and successful criminal organizations in history.

Who says academics can’t have a sense of humor? So what if their jokes stink? They still know how to have a little fun.

Failure in science: learning from the mistakes of others - Deutschlandradio Kultur
Zeitfragen / Archive | Broadcast of April 23, 2015. By Pia Rauschenberger

In research, many experiments go wrong and are then never mentioned again. This is why two students founded a journal for failed research: "JUnQ" (Journal of Unsolved Questions).

Science is the last Gaulish village when it comes to how failure is judged. What has long been good form in other fields is still frowned upon here, even though there would be much to gain from analyzing failures.

Fail, try again, fail again, fail better. Samuel Beckett's praise of failure is enjoying something of a boom these days. We all need to learn to fail better, we are told. In Silicon Valley, failure is practically part of good form: company founders are only taken seriously there if they already have a proper bankruptcy behind them, and have learned from it. Only one Gaulish village stubbornly resists this pretty narrative of failure: science. There, failing is still not en vogue.

Rene John is a social scientist who has written a book about failure: "When a scientist says he has failed, it is of no interest. Nobody cares. So he does not need to say it. And if he does not say it, he practically does not exist. So: publish or perish. (...) Popper does say that only falsification is a proper result, and that it is actually the best result, because it allows you to rule out options. (...) But in practice that helps no one."

Marc Sangnier examined around 50,000 tests in economics publications, some in high-ranking journals. Ten to twenty percent of the studies are classified as significant, that is, scientifically relevant, and published, although they should actually lie below the artificial threshold for significance, had the researchers not tried to polish up their results by every available method. According to Sangnier, it is above all young researchers and those on fixed-term contracts who tend to fish for significant results, whatever the cost: "The whole reward system in science, promotion or a pay rise, depends on publications. So there is a very strong incentive to publish. So you will use everything that helps you publish, especially if you are precariously employed."

Setting failed results aside? It just isn't done

"It is not explicitly the case that only significant research gets published. In practice it is, of course, because when someone carries out a study and finds nothing, that is naturally ambiguous; you do not know what it is due to. It could be that the underlying hypothesis was wrong. But it could also be that someone simply did sloppy research." Non-results are more ambiguous than clear results, says social psychologist Klaus Fiedler. A good reason to set the failed results aside; after all, we are not interested in drugs that do not work either. On the other hand, the quality of research seems to suffer when only significant results receive attention.

Andreas Neidlinger is a chemist and co-editor of the Journal of Unsolved Questions, JUnQ for short, a platform for failed research: "Our aim is to ensure that negative results are generally no longer regarded as bad, but simply as part of the scientific gain in knowledge. Every researcher doing scientific work will find that, without doubt... I do not want to exaggerate, but I believe you actually spend a large part of your time failing before you arrive at new insight or at a positive result."

Not yet mainstream

JUnQ publishes only those studies that come to no result despite the right method and a correct experimental setup. Mainstream, however, JUnQ is not yet. Andreas Neidlinger understands the lack of time, and the shame of presenting one's own failed research. But the service this could do science, he says, can hardly be overestimated: "We learn from the mistakes of our predecessors. If our predecessors had not published their results, we could not build on them today. Of course, for the researcher who only published the failed result this is only moderately gratifying, because his work contributed to the positive result, but in the end he only did the groundwork."

"The interesting thing is the perspective you take: are you on the ship, or are you the spectator? There are always spectators. Failure does not take place if there is no one watching. Then you can say the ship is wrecked, and flotsam washes up on the beach, and that is a gain for the spectators. (...) You have driftwood, you have planks. Perhaps you can build a raft out of them, or a new ship. So you can carry on; the spectators, at least, if not necessarily those who were on the ship."

That failure and success are often a matter of perspective is shown by Fleming's discovery of penicillin: through sloppiness, mold formed on his laboratory bench. He did not throw the experiment away, but stayed with it and arrived at his groundbreaking discovery. Giving negative results the time they need is perhaps an ideal in times of third-party funding and precarious employment. If researchers were nevertheless enabled to do so, they would not be the only ones to profit.

star wars - Cherokee Gothic (development.growth.macro)
http://cherokeegothic.com/tag/star-wars/

December 23, 2015, by Kevin

May the Force be with you! I have a forthcoming paper (with Dan Hicks and Weici Yuan) that has a running head of "Peacocks in Porsches". I once tried to publish a paper called "How Dead is the Solow Model" (the abstract was two words: "Stone dead"). So you can imagine how impressed I was to see a piece in the new AEJ: Applied called "Star Wars: The Empirics Strike Back". An ungated version of the paper is available here (http://www.econstor.eu/bitstream/10419/71700/1/739716212.pdf).

Star Wars in this context refers to the barbaric practice of putting "stars" beside coefficients in regression tables that are significant. The lower the p-value, the greater the number of stars. The authors investigate the distribution of p-values across tens of thousands of coefficients in published economics articles and find, "The distribution of p-values exhibits a camel shape with abundant p-values above 0.25, a valley between 0.25 and 0.10 and a bump slightly below 0.05. The missing tests (with p-values between 0.25 and 0.10) can be retrieved just after the 0.05 threshold."

In other words, if your p-value is worse than .25, most researchers will not try to "rescue" their test. But if you get close to the holy grail of at least marginal significance, it appears that many researchers will not report that test and will work to find a specification that pushes the test into "significance". It's all about them stars.

Posted in Uncategorized | Tagged battle of the network stars, econometrics fail, star wars, valley of...

Fudging hell - Free exchange (The Economist)

Are results in top journals to be trusted?
Jan 21st 2016, 17:21 by S.K. | LONDON

PUBLICATION bias in academic journals is nothing new. A finding (http://www.economist.com/blogs/freeexchange/2009/11/no_news_is_good_news) of no correlation between sporting events and either violent crime or property crime may be analytically top class, but you couldn't be blamed, frankly, for not giving a damn. But if journal editors are more interested in surprising or dramatic results, there is a danger that the final selection of published papers offers a distorted vision of reality. This should skew the distribution of published results towards more 'significant' findings.

But a paper (https://www.aeaweb.org/articles.php?doi=10.1257/app.20150044) just published in the American Economic Journal finds evidence of a different sort of bias, closer to the source. Called "Star Wars, the empirics strike back", it analyses 50,000 tests published between 2005 and 2011 in three top American journals. It finds that the distribution of results (as measured by z-score, a measure of how far away a result is from the expected mean) has a funny double-humped shape (see chart). The dip between the humps represents "missing" results, which just happen to be in a range just outside the standard cut-off point for statistical significance (where significance is normally denoted with stars, though the name may also be something to do with a film recently released - file under 'economists trying to be funny'). Their results suggest that among the results that are only just significant, 10-20% have been fudged.

One explanation is that if a result shows up as significant at the 5% significance level (the industry standard) then researchers crack open the champagne and move on to making economics jokes (http://www.economist.com/news/finance-and-economics/21678274-supplyand-demand-walk-bar-graph-minute). But if the result is tantalisingly close to a positive result then perhaps the researchers will fiddle a bit with their method... and celebrate their nice publisher-friendly result. Yanos Zylberberg, one of the paper's authors, explains that in economics it is difficult to conduct controlled experiments, which ultimately gives a lot of freedom to researchers to tweak their methods. Sometimes researchers are tweaking because they want to find the best way of estimating an effect, but sometimes it's in the search for a significant effect. The distinction might be hazy, even in their own minds.

The paper does look at the results split into subgroups, and there seem to be some factors that are associated with a less humpy distribution (which could suggest less fudging). Although the overall pattern holds across all three prestigious journals the paper considers (the American Economic Review, the Quarterly Journal of Economics and the Journal of Political Economy), papers by older researchers and ones describing randomised control trials have less marked humps - though they are still there.

This is worrying for those trying to interpret and communicate the latest research, as it is impossible to tell if there has been foul play in any individual study. But more fundamentally it is worrying for the profession and for policymakers making decisions based on economic evidence; fiddling and running multiple, slightly different tests on the same data rapidly sucks meaning from the reported size and accuracy of the final results.

Various solutions have been proposed. One is to publish 'pre-analysis plans', where researchers say how they will do their analysis before they actually do it. Another is to encourage more replication. A new NBER working paper (http://www.nber.org/papers/w21842.pdf) by Marcel Fafchamps and Julien Labonne suggests another, related, method. The idea is that researchers send their data to a third party, who randomly splits the data sample in half. The researchers do their analysis based on the first dataset, finalise their method, and submit for publication. If and when the paper is accepted, the same analysis is carried out on the second sample, and the unadulterated results published. If the initial result only showed up because of manipulation, then the chances of the same result in the second sample are relatively low. To avoid the embarrassment of a non-result, researchers should be stricter with themselves when it comes to tweaking their results.

When sample sizes are small, this fix is difficult, as halving the sample saps power from tests. But in a world of big data, it could work. The bigger barrier might be getting career-conscious researchers to sign up.
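The split-sample protocol described above is simple enough to sketch (an illustration of the idea only, not Fafchamps and Labonne's implementation; the data and the outlier-rule "tweaks" are hypothetical):

    # Split-sample sketch: explore on one half, confirm once on the other.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x, y = rng.normal(size=(2, 400))          # no true relationship

    idx = rng.permutation(400)                # the third party's random split
    explore, confirm = idx[:200], idx[200:]

    # Researcher "tweaks" on the exploration half: tries outlier rules and
    # keeps the one with the best-looking p-value.
    best_p, best_cut = 1.0, np.inf
    for cut in [np.inf, 2.5, 2.0, 1.5]:
        keep = np.abs(y[explore]) < cut
        pv = stats.linregress(x[explore][keep], y[explore][keep]).pvalue
        if pv < best_p:
            best_p, best_cut = pv, cut

    # The registered specification is then run once on the confirmation half.
    keep = np.abs(y[confirm]) < best_cut
    p_confirm = stats.linregress(x[confirm][keep], y[confirm][keep]).pvalue
    print(best_p, p_confirm)  # a fudged exploration result rarely reappears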

Are formula in tip journals to be trusted? - GemNews (taken over from the Free Exchange Blog)
http://gemnews.tv/en
Posted by admin, January 22, 2016
Source: http://www.economist.com/blogs/freeexchange/2016/01/fudging-hell?fsrc=rss

Økonomer masserer data for at opnå markante resultater Af: Mads Moltsen 1. marts 2016 kl. 14:18

Ifølge nyt studie har økonomer en tendens til at skrue på deres forskningsmetoder for at opnå mere markante resultater. Det er ikke snyd, men der er brug for mere åbenhed, mener dansk professor. Der er grund til at hæve øjenbrynene over arbejdsmetoderne i det økonomiske forskningsmiljø, hvis man skal tro et nyt studie, som netop er udgivet i American Economic Journal: Applied Economics. Ifølge studiets forfattere ’masserer’ forskere data for at gøre deres statistiske tests mere udgivelsesegnede. Det vil sige, at hvis resultatet af en test ikke er signifikant nok, er der blandt økonomer en tendens til at justere metoden for at gøre resultatet mere signifikant og dermed mere interessant for tidsskrifterne. Forfatterne bag studiet anklager deres kollegaer for at pumpe signifikansen af deres tests ved at vælge signifikante testspecifikationer. »Det hænger sammen med, at økonomer er under et kæmpe pres for at blive udgivet. Det er ekstremt vigtigt for karrieren at blive udgivet i de respekterede tidsskrifter, men konkurrencen er benhård, og derfor står forskere i det dilemma, at de skal vælge, om de vil justere metoden for at få mere signifikante resultater,« siger en af forfatterne bag studiet, ph.d. og lektor Yanos Zylberberg fra University of Bristol. Han påpeger, at blot én enkelt udgivelse i et af de prestigefyldte tidsskrifter kan give adgang til et permanent forskerjob på et amerikansk top 100-universitet.

Informations-asymmetri formindsker gennemsigtigheden Der er ifølge Yanos Zylberberg noget særligt ved det økonomiske forskningsfelt, som besværliggør kontrollerede eksperimenter, hvilket gør det nemmere for forskere at ’massere’ data. »Problemet er, at vi har nogle statistiske metoder, som giver en stor grad af frihed. Når man kører en test på sine data, er der en masse muligheder for at ændre lidt på specifikationerne, så resultatet bliver signifikant. For eksempel kan man udelade observationer, der virker ekstreme, selvom de kan være sande observationer. Desuden er der meget informations-asymmetri, fordi forskeren har adgang til data, som tidsskriftsreviewerne ikke har, hvilket formindsker gennemsigtigheden,« siger Yanos Zylberberg.

Studiet rejser spørgsmålet om, hvor meget vægt man kan lægge på resultater, udgivet i selv de mest respekterede tidsskrifter. Skal politikere, medier og meningsdannere være mere skeptiske over for tidsskriftsudgivet forskning, som ellers ofte bliver brugt til at legitimere politiske holdninger og ændringer af samfundets indretning? »Der er et klart problem med, at offentligheden opfatter hver enkelt resultat som udtryk for en entydig sandhed. For forskere forholder det sig helt anderledes; for os er sandheden en akkumulering af de mange resultater, der siger noget om et bestemt emne. Når der så samtidig udgives ting, hvor resultatet er pumpet op, så understreger det vigtigheden af, at medier og politikere holder igen med at fremhæve enkelte resultater som endegyldige sandheder,« mener Yanos Zylberberg.

Unaturligt lavt antal udgivelser med insignifikans Studiet undersøger 50.000 tests udgivet i tre af de mest respekterede økonomiske tidsskrifter: American Economic Review, Quarterly Journal of Economics og Journal of Political Economy fra 2005 til 2011. Når man ser på grafen over fordelingen af de publicerede tests, kan man se en lille dal mellem to pukler. Dalen repræsenterer, hvad forfatterne kalder ’manglende’ resultater, hvilket betyder, at der er et unaturligt lavt antal publicerede tests med resultater i det interval. »Det er bemærkelsesværdigt, fordi ’dalen’ befinder sig lige før grænsen for, hvornår et resultat har statistisk signifikans. Det indikerer, at hvis forskere får et resultat, der er lige på grænsen til at være signifikant, så ændrer de lidt på metoden, så resultatet bliver signifikant og dermed udgivelsesegnet,« forklarer Yanos Zylberberg. I statistik arbejder man med udgangspunkt i en såkaldt nulhypotese, som fastholdes, indtil en alternativ hypotese kan accepteres. Nulhypotesen er, at der ingen forbindelse er mellem to fænomener.

A hypothetical example

Let us illustrate with a hypothetical example. A researcher wants to examine whether a 10 percent increase in the price of a liter of milk will affect sales. Here the null hypothesis is that the price increase will not affect sales. Before the study, a significance level is fixed, that is, a threshold for when the null hypothesis can be rejected. The result must be significant enough that it cannot be dismissed as pure chance; in economics, the threshold is often set at five percent. In the price-increase example, the researcher might ask 1,500 randomly selected citizens how a 10 percent price increase would affect their milk purchases, and perhaps also ask about their current shopping habits, demographic characteristics and so on. Once the data are collected, he can run a series of statistical tests. To reject the null hypothesis, and thereby show that a price increase affects sales, the test must be significant at the five percent level or below: the probability of seeing such a result by chance alone, if the price increase had no effect, must be five percent or less. Suppose his first test produces an insignificant result that is nonetheless very close to being significant. He is, in other words, close to being able to show with statistical significance that a 10 percent price increase affects milk sales. To achieve significance, he can choose to tweak the method a little. In his next test, perhaps, he disregards the extreme observations in the dataset, even though those observations may be genuine.
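A minimal sketch of this example in Python, with invented data (the effect size, noise level and trimming rule are all assumptions, not taken from the article), shows how the "same" test can return two different p-values once extreme observations are discarded:

```python
# Hypothetical sketch of the article's milk example (all numbers are
# invented): a one-sample t-test of whether stated milk purchases
# change under a 10% price increase, before and after dropping
# "extreme" observations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Self-reported change in weekly litres bought (negative = buy less):
# a small assumed effect buried in noise, plus a few extreme answers.
responses = rng.normal(loc=-0.05, scale=1.0, size=1500)
responses[:5] += rng.normal(0.0, 8.0, size=5)  # a handful of outliers

# Test 1: the null hypothesis "mean change = 0" on the full sample.
_, p_full = stats.ttest_1samp(responses, 0.0)
print(f"full sample:      p = {p_full:.3f}")

# Test 2: the same test after discarding observations more than two
# standard deviations from the mean, the kind of defensible-looking
# tweak that can nudge a borderline p-value under 0.05.
keep = np.abs(responses - responses.mean()) < 2 * responses.std()
_, p_trimmed = stats.ttest_1samp(responses[keep], 0.0)
print(f"outliers dropped: p = {p_trimmed:.3f}")
```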

10-20 percent of the results are inflated

The foregoing is a simplified example, but it illustrates in broad strokes how easily the specification can be tweaked to achieve significance. According to Yanos Zylberberg, the valley of "missing" results just before the significance threshold indicates that some researchers whose tests are on the verge of significance do exactly that: they adjust the method until they obtain a significant result, and then submit it for publication. According to the study, 10-20 percent of the marginally significant results, those that only just cleared the magic threshold, have been inflated. "The significance level is treated as a gauge of how interesting a study is. A significant result stands a better chance of publication than one that is merely on the verge of significance, so researchers have an incentive to try to produce significant results. And it turns out that a fair number of them do," explains Yanos Zylberberg.

Older researchers "massage" less

The study also recorded information about the researchers behind the tests it examines, and a clear pattern emerges: older researchers and researchers with tenure are less inclined to "massage" data than younger ones. For this group, the valley just before the five percent significance threshold is less pronounced. Why? "It is speculation, but one reason may be that when a journal reviewer assesses a study by a seasoned, respected researcher, he is more likely to approve it even if the result is insignificant, simply because the researcher is trusted to be skilled and capable of good research. It is harder for untested young researchers to squeeze through the needle's eye, so significant results matter more for their chances," says Yanos Zylberberg, who stresses that the root of the problem lies in two camps: "The journals have a selection bias, in that they focus more on significance than on the quality of the method. And because of the fierce competition and the importance of publishing, there is a pronounced secretiveness among researchers. Raw data are rarely shared with the public, which makes transparency worse."

Not particularly surprising

So is this cheating, or are researchers acting in good faith when they seek significant results by repeatedly changing the method? Videnskab.dk asked Professor Svend Hylleberg of the Department of Economics at Aarhus University: "I am sure there is something to what the authors find here, but it is not particularly surprising, and in my view it is not cheating." According to Svend Hylleberg, there is a more innocent explanation for the "missing" results:

"There is, and always will be, selection, both on the journal's side and on the author's. Almost the only results that get published are ones that are in some way new. That can be a rejection of an earlier theory, or the opposite. Studies whose results fall somewhere in between are not submitted to the journals, and when they are, they are usually rejected," says Svend Hylleberg, who nevertheless agrees with part of the authors' critique: "There is a need for more openness, including access to data. That goes for economics as well as for other sciences."

Researchers should share their data

Svend Hylleberg thereby touches on the question of whether economists' scientific practice should change. The study's authors have several recommendations: "First, the journals should put less weight on significance. They should be less eager for controversial results. Fortunately, things are moving in that direction; several journals, for example, have done away with their star system," says Yanos Zylberberg. The star system is the journals' way of grading the significance of results: estimates are typically awarded one to three stars depending on how significant they are. The system, which several journals have now abandoned, created an unhealthy favoritism toward significant results, according to Yanos Zylberberg. "Second, researchers should start sharing their raw data with colleagues and with the public, so that everyone can check the results. As mentioned, that is difficult because of the competition among researchers, and we have no answer for how to make it happen, but it is the only way to root out the problem," says Yanos Zylberberg.

URL: http://videnskab.dk/kultur-samfund/okonomer-masserer-data-opna-markante-resultater

© Copyright Videnskab.dk

Fact-checking economic analysis

Do the prestigious economics journals pass the fact-check? Research into the accuracy of the statistics in three prestigious economics journals has given policymakers cause for concern.
The Economist, 22 Farvardin 1395 (10 April 2016), 14:17

Translator: Najmeh Ramezani

An article recently published in the American Economic Journal examines the accuracy of the research results published in three prestigious economics journals. Titled "Star Wars: The Empirics Strike Back", it analyzes fifty thousand tests run between 2005 and 2011, and closes by describing ways to prevent the manipulation of research results and methods for detecting such tampering.

The Economist: Publication bias in economics journals is nothing new. A finding that sporting events are unrelated to criminal behavior or financial crime may be analytically very valuable, but honestly, no one would blame you for ignoring it. Yet if journal editors prefer exciting, astonishing results, we run the risk that the final selection of articles for publication paints a distorted picture of reality: the distribution of published work is bound to tilt toward "more significant" findings. An article recently published in the American Economic Journal [1], however, has found signs of another kind of bias, one closer to the source. Titled "Star Wars: The Empirics Strike Back" [2], the article analyzes fifty thousand tests whose results were published between 2005 and 2011 in three top American journals. The upshot: the distribution of results (measured by z-scores [3]) has a strange two-humped shape (see chart). The dip between the humps marks the "missing" results, which fall just outside the standard cut-off point [4] for statistical significance [5], where significance is usually flagged with stars. The authors' results indicate that ten to twenty percent of the marginally significant results have been tampered with.

One explanation: if a result is significant at the five percent level, the industry standard, researchers can relax and get back to making economics jokes. But if the result is tantalizingly close to a positive finding, they may fiddle a little with their method and coax out a publisher-pleasing result. Yanos Zylberberg, one of the article's authors, explains that controlled experiments are hard to run in economics, and this gives researchers a great deal of freedom to make small changes to their methods.

Sometimes researchers change their method because they are looking for the best way to measure an effect; sometimes the change is aimed at producing a significant effect. The line between the two may be blurry, even in their own minds. The article also examines the results broken down into subgroups, and some factors appear to be associated with a smoother distribution (which can indicate less manipulation). The general pattern holds in all three of the top journals the article covers, namely the American Economic Review [6], the Quarterly Journal of Economics [7] and the Journal of Political Economy [8]. But in articles by older researchers, and in articles describing randomized controlled trials, the humps, though still visible, are less pronounced. This is somewhat worrying for anyone trying to interpret and comment on the latest research, since one cannot tell which studies have been tampered with and which have not. The deeper worry, though, concerns the profession itself and the policymakers who base decisions on economic evidence: tampering, that is, running many slightly different tests on the same data, quickly drains the reported size and precision of the final results of their meaning.

Various solutions have been proposed. One is the publication of "pre-analysis plans", in which researchers state, before touching the data, how they intend to carry out the analysis. Another is to encourage more replication. In a new, unfinished working paper at the National Bureau of Economic Research [9], Marcel Fafchamps and Julien Labonne propose a related method: researchers send their data to a third party, which splits the sample in two. The researchers carry out their analysis on the first half, finalize their method, and submit it for publication. Once the article is accepted, the same analysis is run on the second half and the full results are published. If the initial finding was produced only by tampering, the chance of obtaining the same results in the second sample is rather low. And because results may shift, researchers would have to become more rigorous to avoid the embarrassment of a finding that fails to hold up. When samples are small, the approach is difficult, since halving the sample reduces the power of the tests; but in a world of big data it is practical. The bigger obstacle may be persuading conscientious researchers to take part.

Footnotes:
[1] American Economic Journal
[2] Star Wars: The Empirics Strike Back
[3] The z-score measures how far a result lies from the expected mean.
[4] cut-off point
[5] statistical significance
[6] American Economic Review
[7] Quarterly Journal of Economics
[8] Journal of Political Economy
[9] NBER, the National Bureau of Economic Research

Item code: 7900
Item URL: http://tarjomaan.com/vdch.vnqt23nwvftd2.html
Tarjomaan: http://tarjomaan.com
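The Fafchamps and Labonne proposal is easy to picture in code. Below is a minimal sketch under stated assumptions, not their implementation: a one-sample t-test stands in for the researcher's analysis, and the function name and 50/50 split rule are invented for illustration.

```python
# Hypothetical sketch (not Fafchamps and Labonne's code) of the
# split-sample idea: a third party halves the data, the specification
# is frozen on the first half, then re-run unchanged on the holdout.
import numpy as np
from scipy import stats

def third_party_split(data, seed=42):
    """Shuffle and halve the sample; only the first half is released
    to the researchers before acceptance."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = np.random.default_rng(7)
data = rng.normal(0.0, 1.0, size=10_000)  # a true null: no effect

exploration, holdout = third_party_split(data)

# Researchers tune their specification on the exploration half...
_, p_exploration = stats.ttest_1samp(exploration, 0.0)
# ...and after acceptance the frozen test runs once on the holdout.
_, p_holdout = stats.ttest_1samp(holdout, 0.0)

print(f"exploration half: p = {p_exploration:.3f}")
print(f"holdout half:     p = {p_holdout:.3f}")
# A significant finding manufactured by specification search on the
# first half is unlikely to reappear in the untouched second half.
```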




IZA Newsroom

Simply Labor

The Empirics Strike Back Again: Abel Brodeur wins Leamer-Rosenthal Prize for his dedication to open science Posted on December 16, 2016 by IZA Press

In 2013, IZA Research Affiliate Abel Brodeur (University of Ottawa) and his co-authors published an IZA discussion paper titled "Star Wars: The Empirics Strike Back" (see also our IZA Newsroom article). Analyzing 50,000 statistical tests published in top economics journals, the authors concluded that researchers might be tempted to inflate the value of almost-rejected tests by choosing a "significant" specification. The reason is that journals favor rejections of the null hypothesis, which means that positive findings increase the chances of publication.

The paper was published in the American Economic Journal: Applied Economics in January 2016 and received coverage in various media outlets, including The Economist. Abel Brodeur has now been awarded one of ten 2016 Leamer-Rosenthal Prizes for Open Social Science by the Berkeley Initiative for Transparency in the Social Sciences (BITSS), a network of researchers and institutions committed to strengthening scientific integrity in economics and related disciplines by identifying and disseminating useful tools and strategies for improving transparency, including the use of study registries, pre-analysis plans, data sharing, and replications. Brodeur receives the prize, which includes $10,000, for his work on the above-mentioned paper and "for his clear dedication to and advocacy of open science and reproducibility," according to BITSS. Promoting open science is also among IZA's core objectives: we have repeatedly stressed the value of replications and launched initiatives such as the open-access IZA Journals and the IDSC data repository.


