Brainstorming IV - Hannover, 26-27 September 2008

Biostatistics - Yves Desdevises

Computer lab

First, download and install the programs on your computer. For XLStat, you will need to download the trial version (or purchase it); the other programs are free.

Session I

Statdisk

Statdisk is the companion program of the book Biostatistics by Triola & Triola. It is primarily intended for teaching, but you can still do many things with it. You can use your own data or the preloaded datasets from the Datasets menu. Data are explored via the Data menu, and analyses are performed via the Analysis menu.

Datasets: Poplar Tree Weights, Parent/Child Heights

1. Descriptive statistics
In the Data menu, try the various descriptive statistics available. Ask yourself which kind of statistical test the data are suitable for. In particular, check the distribution of the data.

2. Statistical tests: t-test for independent samples, and F-test
Using the Poplar Tree Weights dataset, test whether untreated poplars and poplars treated with irrigation differ in mean weight. Do not forget to compare the variances first. These tests are available via Analysis -> Hypothesis Testing.

3. Statistical tests: t-test for paired samples (= matched pairs)
Using the Parent/Child Heights dataset, assess whether male children and their fathers have the same mean height (you need to delete the data corresponding to the F gender). What about male children and mothers?

4. Statistical tests: one-way analysis of variance
With the Poplar dataset, check whether the weights of poplars in the four treatment categories all have the same mean.

Play with the other datasets.

XLStat

XLStat works as an extension of Microsoft Excel and adds some menus to Excel, so if you are familiar with Excel, XLStat is straightforward to use. Data can be selected by dragging the mouse.

Dataset: Poplars.xls


1. Descriptive statistics, normality assessment
Open XLStat and answer “Yes” to activate macros: Excel opens with a new menu on the right and a set of toolbar icons giving access to the statistical functions. To describe data and assess normality, use the following commands (menu commands can also be accessed via the icons):
Describing data -> Descriptive statistics
Describing data -> Histograms
Describing data -> Normality test, then select the test you want.
You can draw a quantile-quantile plot by selecting Normal Q-Q plots in the Charts tab.

2. Variance and mean comparisons
These tests can be found in Parametric tests.
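The Session I comparisons (compare the variances with an F-test, then run the appropriate t-test; use a paired t-test for matched pairs) can also be run in R, the third program of this lab. The sketch below is a minimal illustration under stated assumptions: it uses simulated numbers in place of the Statdisk datasets, all object names are hypothetical, and the 0.05 threshold used to choose between the equal-variance and the Welch t-test is an assumption made for illustration.

```r
# Simulated stand-ins for the Poplar Tree Weights and Parent/Child Heights
# datasets: the values are invented, not the real data.
set.seed(1)
untreated <- rnorm(10, mean = 0.5, sd = 0.2)   # hypothetical weights, no treatment
irrigated <- rnorm(10, mean = 0.9, sd = 0.2)   # hypothetical weights, irrigation

# Step 1: compare the variances first (F-test, H0: equal variances)
f_res <- var.test(untreated, irrigated)

# Step 2: pick the t-test variant from the F-test result
# (assumed 0.05 threshold; var.equal = FALSE gives the Welch t-test)
equal_var <- f_res$p.value > 0.05
t_res <- t.test(untreated, irrigated, var.equal = equal_var)
print(f_res)
print(t_res)

# Matched pairs: each father is paired with his own son (hypothetical heights, cm)
father <- rnorm(12, mean = 176, sd = 6)
son    <- father + rnorm(12, mean = 1, sd = 3)
paired_res <- t.test(son, father, paired = TRUE)

# A paired t-test is a one-sample t-test on the within-pair differences
diff_res <- t.test(son - father, mu = 0)
print(paired_res)
```

The last two tests return the same statistic and p-value, which is why matched pairs call for the paired test: the pairing is captured by working on the differences.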

Session II

XLStat

1. ANOVA and test for homogeneity of variances
Dataset: Poplars.xls
First, you need to assess whether the variances are equal. This is done via the Levene and/or Bartlett tests found in Parametric tests -> k-sample comparison of variances. Samples can be in separate columns or in a single column along with a factor column. To perform an ANOVA, the data have to be placed in a single column, with the factor in another column. Choose Modeling data and ANOVA. Test whether mean poplar weight is affected by the treatment.

2. Nonparametric alternative to ANOVA: Kruskal-Wallis test
To perform a Kruskal-Wallis test, the data also have to be placed in a single column, with the factor in another column. Choose Nonparametric tests and Comparison of k samples. Test whether poplar weight is affected by the treatment (note that the Kruskal-Wallis test works on ranks, not directly on means).

3. Correlation and regression
Dataset: Bears.xls


• Correlation can be accessed via Correlation/Association tests -> Correlation tests; then choose the coefficient you want to compute: Pearson for linear correlation or Spearman for rank correlation. Test whether Mass and Height are related.
• To perform a linear regression, choose Modeling data and Linear regression. You need to define (i.e. think carefully beforehand about) the dependent and independent variables. Test whether Mass and Height are related and find the equation of their linear relationship.
You can do various other tests using these data.

4. Fisher’s Exact test
Dataset: Helmet.xls
Fisher’s Exact test can be accessed via Correlation/Association tests, as an option in Tests on contingency tables. Test whether not wearing a helmet while riding a bicycle increases the risk of facial injuries.

R

R is driven by a command language, which means that you have to learn some syntax and how to tell R what you want. This requires some learning, but it is worth the trouble because R can do lots of things, including permutation tests that you cannot do with the two other programs. R has plenty of instruction manuals, and we will only see a tiny fraction of its functions and possibilities. In particular, there are other equivalent commands and ways to do the computations shown below.

In R, it is convenient to keep your commands in a separate text file, so that you can easily use, reuse, and adapt them by copy and paste. Likewise, you can copy and paste the commands from this document. You also have access to all previously entered commands via a sliding panel on the side of the main window (opened via an icon above the main window).

Dataset: Robins.txt

1. Playing with data: importing data into R, descriptive statistics, assessing normality (tests and graphs)
• First you need to define your working directory for R: choose a folder on your computer via the Misc. menu and Change Working Directory (the function setwd() does the same from the command line). This allows you to refer to files directly without specifying their path.
• You need your data in text format. Check the Robins.txt file in a text editor. Note that various formats can be imported: with or without a header, etc. In this file, the column names do not start with a tabulation (and there are 2 names for 3 columns), which implies that the first column contains row names and has no header, so the argument header=T (for TRUE) is not needed in the command below. To import the Robins.txt file, type:
    robin=read.table(file.choose())
This command opens a dialog box for you to choose your file, even if you have not defined a working directory. Here, data from the file are placed in an object called “robin” (or any name you like) in R memory. You can now work with these data via this object (note that the original Robins.txt file is neither modified nor used anymore). Type robin to see your imported data.
• Now create two subsets of data (i.e. two objects) that you will use in subsequent analyses: the wing lengths and the tail lengths, respectively in the first and second columns of robin. Here is one way to do that:
    wing=robin[,1]
    tail=robin[,2]
You can check the mean, variance, and standard deviation with the following commands (e.g. for wing):
    mean(wing)
    var(wing)
    sd(wing)
Have a look at the data distribution via a histogram:
    hist(wing)
Look at the scatterplot depicting the relationship between the two variables:
    plot(wing,tail)
You can assess the normality of these two variables, first via a normal quantile plot, checking whether the points fall along a straight line:
    qqnorm(wing)
    qqline(wing) #to draw the line, once you have made the plot
You can also perform a normality test, such as the Shapiro-Wilk test, whose null hypothesis is normality (what consequences does this have for small samples?):
    shapiro.test(wing)
Compare the variances of wing and tail via an F-test:
    var.test(wing,tail)
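The import rule described above (a header line with one name fewer than the data columns makes the first column row names) and the F-test can both be checked on a tiny table typed directly into R. This is a sketch under stated assumptions: the miniature table and its values are invented and only mimic the layout described for Robins.txt, and the object names ending in _demo are hypothetical, chosen so they do not overwrite robin, wing, or tail.

```r
# Invented miniature table mimicking the layout described for Robins.txt:
# the header line has 2 names, each data line has 3 fields, so read.table()
# detects a header by itself and uses the first column as row names.
txt <- "wing tail
r1 82 58
r2 80 56
r3 85 60
r4 79 55
r5 83 59"
robin_demo <- read.table(text = txt)   # no header=T needed for this layout
print(robin_demo)

wing_demo <- robin_demo[, 1]   # first data column: wing length
tail_demo <- robin_demo[, 2]   # second data column: tail length

# var.test() reports F = var(x) / var(y), tested against a ratio of 1
f_demo <- var.test(wing_demo, tail_demo)
print(f_demo$statistic)
print(var(wing_demo) / var(tail_demo))
```

Typing the same table with a tabulation (or a dummy name) before wing would give 3 names for 3 columns, and header=T would then be required.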


To compare wing variances between areas via a Fligner-Killeen test:
    area=robin[,3]
    fligner.test(wing,area)

2. Permutational Student t-test
To use a function not already included in R, like the ones you will use here, you have to “source” it via the File menu (Source file), or by typing source(file.choose()). For example, to source the function StudentsPerm, used to test the difference between two means by permutation, select the StudentsPerm.R file from that menu. If you open function files in a text editor, you should find some indications on how to use them. For example here, to test whether wing length differs from tail length (using the objects wing and tail you have created), type:
    StudentsPerm(wing,tail)

3. Parametric and permutational ANOVA
Dataset: Bears.txt
Import the data in Bears.txt into an object called bear:
    bear=read.table(file.choose(),header=T)
header=T is required here because automatic header detection only works when the first column has no name, and here it has one (Animal).
• Parametric ANOVA
ANOVA requires the factors to be declared as such via the function as.factor. For a one-way ANOVA, we will create and use the object age as a factor with 7 levels, and the object height as the response variable. Test the claim that bears’ mean height is not the same for all ages. ANOVA is a linear model and can be implemented with the function lm (or the function aov, see below); here the results are stored in an object called anova_bear:
    age=as.factor(bear[,2])
    height=bear[,4]
    anova_bear=lm(height~age)
To obtain the ANOVA table, use the function anova:
    anova(anova_bear)
This analysis could also have been written more simply as:
    anova(lm(height~age))
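The parametric ANOVA above compares the observed F statistic with the F distribution; a permutational ANOVA instead compares it with F values recomputed after randomly shuffling the response among the groups. The sketch below illustrates that principle only. It assumes simple random permutation of the raw response values (a standard approach for a one-way design); perm_anova_1way is a hypothetical helper, not the course’s anova.1way.R or StudentsPerm.R, whose internals are not shown in this handout, and the data are simulated rather than taken from Bears.txt.

```r
# Hypothetical helper illustrating a one-way permutational ANOVA:
# the observed F is compared with F statistics obtained after shuffling
# the response values among the groups (the null hypothesis of no group effect).
perm_anova_1way <- function(y, g, nperm = 999) {
  g <- as.factor(g)
  # F statistic of the one-way ANOVA for a given response vector
  f_stat <- function(resp) anova(lm(resp ~ g))[1, "F value"]
  f_obs  <- f_stat(y)
  f_perm <- replicate(nperm, f_stat(sample(y)))
  # the observed arrangement counts as one of the nperm + 1 possible ones
  p <- (sum(f_perm >= f_obs) + 1) / (nperm + 1)
  list(F = f_obs, p.perm = p, nperm = nperm)
}

# Simulated example (invented values, not Bears.txt): 3 groups of 8
set.seed(42)
grp <- rep(c("a", "b", "c"), each = 8)
y   <- rnorm(24, mean = c(10, 10, 12)[as.integer(factor(grp))], sd = 1)
res <- perm_anova_1way(y, grp, nperm = 999)
print(res)
print(anova(lm(y ~ factor(grp))))   # parametric F-test, for comparison
```

With only two groups, F equals the square of the equal-variance Student t statistic, so the same permutation scheme gives a two-sided permutation test of the difference between two means, the question addressed by StudentsPerm.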


For a two-way ANOVA, to assess the effects of age, gender, and their interaction on bears’ height:
    gender=as.factor(bear[,6])
    res_aov=aov(height~age*gender)
    summary(res_aov)
The one-way ANOVA could have been performed the same way (but with only one factor) via the function aov.
• Permutational ANOVA (caution: this script works for balanced designs only, i.e. the same number of values per cell)
Use the permutational ANOVA to investigate the same questions.
One-way ANOVA: source the anova.1way.R function. To perform the same analysis as before with a permutation test using 999 permutations, type:
    anova.1way(height~age, nperm=999)
For a two-way ANOVA:
    anova.2way(height~age*gender, nperm=999)

4. Nonparametric alternative to two-way ANOVA: Scheirer-Ray-Hare test
Use the same data as for the ANOVA, and make sure you have objects for the variable and for each factor. For example here:
    height=bear[,4]
    age=as.factor(bear[,2])
    gender=as.factor(bear[,6])
Source the SRH.R function, then type:
    SRH(height,age,gender)

5. Linear regression
To compute a linear regression, you have to define the independent and dependent variables; the dependent variable is written first in the syntax below. For example, to assess whether mass depends on height in bears, type:
    mass=bear[,3]
    lm(mass~height)
The same thing could have been done via:


    lm(Mass.kg~Height.cm, data=bear)
or
    lm(bear[,3]~bear[,4])
(these two forms do not require the mass and height objects to have been created first).
To draw a scatterplot with a regression line, use the PlotReg function. Source the PlotReg function, and type:
    reg(mass,height)
You may also store the result of the regression in an object, for example one called resreg:
    resreg=lm(mass~height)
• It is then easy to test the coefficients of this regression with standard parametric tests (remember their requirements), via the function summary:
    summary(resreg)
(Of course, you could also store the output of summary in an object, and so on.)
• Permutational testing
Source the function corPerm, used to test a Pearson coefficient by permutation (for correlation or linear regression), by selecting the corPerm.R file from the File -> Source menu. For example here, to test the correlation between mass and height using 999 permutations, type:
    corPerm1(mass,height,999)
Compare the results with what you obtain with a classical parametric correlation test (which should of course lead to the same result as the regression test obtained above via summary):
    cor.test(mass,height)
Use your new knowledge of R to perform some analyses on your own data.
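The permutation principle for a Pearson coefficient, and the equivalence between cor.test and the slope test reported by summary, can be illustrated with the sketch below. It assumes that shuffling one variable is an acceptable way to generate the null distribution of r (no association); perm_cor is a hypothetical helper, not the course’s corPerm.R, whose internals are not shown here, and the data are simulated rather than taken from Bears.txt.

```r
# Hypothetical helper illustrating a permutation test of a Pearson r:
# shuffling y breaks any association with x while keeping both
# marginal distributions unchanged.
perm_cor <- function(x, y, nperm = 999) {
  r_obs  <- cor(x, y)
  r_perm <- replicate(nperm, cor(x, sample(y)))
  # two-sided p-value; the observed r counts as one permutation
  p <- (sum(abs(r_perm) >= abs(r_obs)) + 1) / (nperm + 1)
  list(r = r_obs, p.perm = p, nperm = nperm)
}

# Simulated example (invented values, not Bears.txt)
set.seed(7)
height_sim <- runif(30, 100, 200)                        # hypothetical heights
mass_sim   <- 2 * height_sim - 150 + rnorm(30, sd = 20)  # hypothetical masses

res_perm <- perm_cor(mass_sim, height_sim, nperm = 999)
res_par  <- cor.test(mass_sim, height_sim)   # classical parametric test of r
resreg_sim <- lm(mass_sim ~ height_sim)      # regression of mass on height
print(res_perm)
print(res_par)
print(summary(resreg_sim)$coefficients)      # slope t equals the cor.test t
```

In simple linear regression, the t statistic for the slope equals the t statistic of the Pearson correlation test, which is why cor.test and summary lead to the same conclusion.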
