… a statistician drowned crossing a river that was only three feet deep... on average.
— Nate Silver, The Signal and The Noise

Statistics: a branch of mathematics used by researchers to organize, summarize, and interpret data. (Hockenbury, 17) Any of the characteristics of a sample, as opposed to one of the population from which it is drawn. (Oxford) Works consisting of presentations of numerical data on particular subjects. (MeSH) The science and art of collecting, summarizing, and analyzing data that are subject to random variation. The term is also applied to the data themselves and to the summarization of the data. (MeSH)

Concerned with the collection and interpretation of quantitative "data" and the use of "probability" theory to estimate population parameters. (NCIt) 'Data science' is something that people are starting to talk about now. But it's just really some combination of having some familiarity with mathematical concepts, and being comfortable with computer science and programming, and definitely being comfortable in learning statistics. Because what we really need to do is make sense of very complex data now. (Pessoa, BSP106)

Causation: the act of causing some effect. (Coon, 33) The relating of causes to the effects they produce. Causes are termed necessary when they must always precede an effect and sufficient when they initiate or produce an effect. Any of several factors may be "associated" with the potential disease causation or outcome, including 'predisposing' factors, 'enabling' factors, 'precipitating' factors, 'reinforcing' factors, and 'risk' factors. (MeSH) An important characteristic of scientific psychology, the belief that events do not just happen but are caused by something else. Most psychologists believe in the principle of multiple causes for events, rather than just one single cause. (Cardwell, 43) An event A can be said to cause another event B if (i) the onset of A precedes the onset of B and (ii) preventing A eliminates B. This definition must be suitably extended if either A or another event C can cause B. Given the highly interwoven, redundant and adaptive networks in molecular biology, cell biology, and neurobiology, moving from "correlation" to causation is not easy. (Koch, 331)

Correlation: the relationship between two variables. (Hockenbury, A-9) A statistical term for the tendency of two data sets or variables to vary in a similar way in a certain set of circumstances. Often mistaken for causation. (Collin, 340) The existence of a consistent, systematic relationship between two events, measures, or variables. (Coon, 33) The degree to which two or more quantities or events are linearly associated, a statistical relation between two or more variables such that systematic changes in the value of one variable are accompanied by systematic changes in the others. (NCIt)

Coefficient of Correlation: a mathematical representation of the degree of relatedness of two sets of measurements. (Cardwell, 61) The measure of the closeness of a relationship. (Ferber, 303) A numerical indication of the magnitude and direction of the relationship between two variables. (Hockenbury, A-9) A statistical index ranging from -1.00 to +1.00 that indicates the direction and degree of correlation. (Coon, 33) When no relationship at all exists between the “dependent variable” and the “independent variables,” the correlation coefficient is zero. (In this case) the independent variables are useless for estimating the value of the dependent variable. (Ferber, 303)

Correlational Study: a non-experimental study designed to measure the degree of relationship (if any) between two or more events, measures, or variables. (Coon, 33) A research strategy that allows the precise calculation of how strongly related two factors are to each other. (Hockenbury, 23)

Negative Correlation: a statistical relationship in which increases in one measure are matched by decreases in the other. (Coon, 33) A finding that two factors vary systematically in opposite directions, one increasing in size as the other decreases. (Hockenbury, A-9)

Positive Correlation: a statistical relationship in which increases in one measure are matched by increases in the other. (Coon, 33) A finding that two factors vary systematically in the same direction, increasing or decreasing in size together. (Hockenbury, A-9)
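
As a rough illustration of the coefficient of correlation, here is a minimal sketch assuming the Pearson product-moment coefficient is the one meant (the sources above do not name a formula). It is computed from each value's deviation from its variable's mean. The data are made up: a result near +1 indicates a positive correlation, near -1 a negative correlation, and near 0 no linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]                        # illustrative values
print(pearson_r(xs, [2, 4, 6, 8, 10]))      # 1.0 (positive correlation)
print(pearson_r(xs, [10, 8, 6, 4, 2]))      # -1.0 (negative correlation)
print(pearson_r(xs, [3, 1, 4, 1, 3]))       # approximately 0 (no linear relationship)
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.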

Frequency Distribution: a summary of how often various scores occur in a sample of scores. (Hockenbury, A-2) A collection of ‘classes’ defined by ‘class limits.’ The classes cover the entire range of the data values, do not overlap, and are all the same size. (Barnes, 119) Also referred to as a ‘frequency table.’
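
Here is a minimal sketch of building a frequency table with equal-width, non-overlapping classes. It assumes integer scores and class limits chosen by hand. The scores and the parameters `lower`, `width`, and `num_classes` are illustrative, not taken from the sources.

```python
from collections import Counter

def frequency_table(data, lower, width, num_classes):
    """Count values in equal-width, non-overlapping classes [lo, lo + width)."""
    counts = Counter()
    for value in data:
        index = int((value - lower) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the class limits")
        counts[index] += 1
    table = []
    for i in range(num_classes):
        lo = lower + i * width
        table.append((lo, lo + width, counts[i]))
    return table

scores = [52, 67, 71, 74, 78, 81, 85, 88, 90, 95]   # illustrative integer scores
for lo, hi, freq in frequency_table(scores, lower=50, width=10, num_classes=5):
    print(f"{lo}-{hi - 1}: {freq}")   # label assumes integer data, e.g. "70-79: 3"
```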

Logarithm: the power to which a fixed number or base must be raised in order to produce any given number. (Oxford) For a specified base ‘b,’ a function such that its argument results when ‘b’ is raised to the power given by this function's value. (NCIt)
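
Here is a short sketch of the definition: for base b, the logarithm of x is the power y such that b ** y == x. The helper `log_base` is a hypothetical name. It uses the change-of-base identity log_b(x) = ln(x) / ln(b), so its results are subject to floating-point rounding.

```python
import math

def log_base(x, b):
    """Return y such that b ** y == x (up to floating-point rounding)."""
    if x <= 0 or b <= 0 or b == 1:
        raise ValueError("requires x > 0, b > 0 and b != 1")
    return math.log(x) / math.log(b)

y = log_base(81, 3)
print(y)                               # approximately 4
print(3 ** y)                          # approximately 81: raising the base to y recovers the argument
print(math.log10(1000), math.log2(8))  # built-in base-10 and base-2 logarithms: 3.0 3.0
```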

Mean: the sum of all observations divided by the number of observations in a treatment group. (Norman Labs, 18) A statistics term. The average value in a set of measurements. The mean is the sum of a set of numbers divided by how many numbers are in the set. (NCIt) Weighted sum of all possible values (of a variable). (Barnes, 94) Usually the most representative measure of central tendency. (Hockenbury, A-5)
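
Here is a minimal sketch of the mean. It also shows one reading of Barnes's "weighted sum of all possible values", in which each value is weighted by its probability. That reading is an assumption about the intended sense; for a fair die it gives 7/2. The function names are illustrative.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean: sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

def expected_value(outcomes):
    """Probability-weighted sum of possible values, given (value, probability) pairs."""
    return sum(value * p for value, p in outcomes)

print(mean([4, 8, 6, 5, 7]))   # 6.0
die = [(face, Fraction(1, 6)) for face in range(1, 7)]   # fair die (assumed)
print(expected_value(die))     # 7/2
```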

Median: a statistics term. The middle value in a set of measurements. (NCIt) The score that divides a frequency distribution exactly in half, so that the same number of scores lie on each side of it. (Hockenbury, A-5) The middle value of a set of data. 50% of values lie above the median and 50% lie below it. (Norman Labs, 18)
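
Here is a sketch of the median. It assumes the common convention of averaging the two middle values when the number of observations is even. Python's `statistics.median` follows the same convention.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))       # 3
print(median([7, 1, 3, 10]))   # 5.0
```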

Mode: the most frequently occurring score in a distribution. (Hockenbury, A-5) The most frequently observed data value. There may be 'no' mode or 'multiple' modes. (Norman Labs, 18) The value which occurs most often in a set of values. If no value is repeated, there is no mode. If more than one value occurs with the same greatest frequency, each value is a mode. (NCIt) A (group of data points) may have one, two, or more modes. (Barnes, 98)
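
Here is a sketch of the mode that follows the NCIt conventions quoted above. Every value tied for the greatest frequency is a mode, and if no value repeats there is no mode. Python's `statistics.multimode` differs on the last point: it returns every value when none repeats.

```python
from collections import Counter

def modes(values):
    """Every value tied for the highest frequency; [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # NCIt convention: no repeated value means no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5]))      # [3]
print(modes([1, 1, 4, 4, 6]))   # [1, 4]  (two modes)
print(modes([1, 2, 3]))         # []      (no mode)
```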

Probability: a measure of the expectation of the occurrence of a particular event. (NCIt) The likelihood of the chance occurrence of specific events. May include the mathematical study of probability theory. (GHR) The study of chance processes or the relative frequency characterizing a chance process. (MeSH) An event that is certain to occur has a probability equal to 1 and an event that cannot occur has a probability equal to 0. The probability of ‘heads’ occurring on a single toss of a fair coin is 0.5. Each face of a fair die has a probability 1/6 of occurring. (Barnes, 10)
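
The coin and die figures can be written as exact fractions, and the relative-frequency view can be illustrated by simulation. This is a sketch; the seed and the number of tosses are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact probabilities from the definition above
p_heads = Fraction(1, 2)
p_each_face = Fraction(1, 6)
print(p_heads, p_each_face, 6 * p_each_face)   # 1/2 1/6 1 (some face is certain to occur)

# Relative-frequency view: share of heads in many simulated fair tosses
rng = random.Random(42)      # fixed seed so the run is repeatable
tosses = 100_000
heads = sum(rng.random() < 0.5 for _ in range(tosses))
print(heads / tosses)        # close to 0.5
```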

Range: a measure of variability; the highest score in a distribution minus the lowest score. (Hockenbury, A-7) The difference between the lowest and highest numerical values; the limits or scale of variation. (NCIt) The sample range for a data set is the difference between the maximum and minimum values of the data set. (Barnes, 127) The range or frequency distribution of a measurement in a population (of organisms, organs or things) that has not been selected for the presence of disease or abnormality. (MeSH)
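
Here is a one-function sketch of the sample range. It is named `sample_range` to avoid shadowing Python's built-in `range`, and the data are illustrative.

```python
def sample_range(values):
    """Maximum minus minimum of the data set."""
    if not values:
        raise ValueError("range of empty data")
    return max(values) - min(values)

print(sample_range([12, 7, 19, 3, 15]))   # 16
```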

Standard Deviation: a measure of variability. (Hockenbury, A-5) The standard deviation is the square root of the “variance” (in a data set). (Barnes, 65) A mathematical term which gives a measure of how spread out a set of results is. In describing any set of results it is useful to know two things: where the middle is, and how spread out the individual measurements are. The spread of results can be given simply as the difference between the largest reading and the smallest one. Unfortunately this can be misleading. The standard deviation, however, gives us a measure in terms of the distances of all the results from the mean. (Indge, 255) In a normal distribution approximately 68% (about 2/3) of the data points fall within +/- one standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean. (Norman Labs, 21)
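
Here is a sketch contrasting the population standard deviation (divisor n) with the sample standard deviation (divisor n - 1), using Python's `statistics` module. It also computes the share of a normal distribution lying within k standard deviations of the mean, which equals erf(k / √2). The data set is illustrative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative values, mean 5
print(statistics.pstdev(data))          # population SD (divisor n): 2.0
print(statistics.stdev(data))           # sample SD (divisor n - 1): about 2.14

# Share of a normal distribution within k SDs of the mean is erf(k / sqrt(2))
for k in (1, 2, 3):
    share = math.erf(k / math.sqrt(2))
    print(f"within {k} SD: {share:.1%}")   # 68.3%, 95.4%, 99.7%
```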

Standard Error Bars: an error bar is used to indicate one standard error of the mean (the standard deviation divided by the square root of the sample size) above and below the mean. If the standard error bars of two independent groups overlap, you can be confident that the difference between the two means is not statistically significant (P > 0.05). (Norman Labs, 21-22) The ‘standard error’ of a rate is a measure of the sampling variability of the rate. (NCIt)
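
Here is a sketch of where standard error bars come from. It assumes bars drawn one standard error of the mean, s / √n, on either side of each group mean. The two groups and their values are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

control = [4.1, 3.8, 4.4, 4.0, 3.9]   # hypothetical measurements
treated = [4.9, 5.2, 4.7, 5.0, 5.3]
for name, group in (("control", control), ("treated", treated)):
    m, se = statistics.mean(group), standard_error(group)
    print(f"{name}: mean {m:.2f}, SE bar from {m - se:.2f} to {m + se:.2f}")
```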

Statistical Significance: a mathematical indication that research results are not very likely to have occurred by chance. (Hockenbury, 17) Describes a mathematical measure of difference between groups. The difference is said to be ‘statistically significant’ if it is greater than what might be expected to happen by chance alone. (NCIt) Statistical analysis can determine if the control and experimental data are significantly different. (Brooker, 16) Ronald Fisher, an English statistician and biologist, developed the terminology of the statistical significance test and much of the methodology behind it. He … sought to develop a set of statistical methods that he hoped would free us from any possible contamination from “bias.” (Silver, 251-252)
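
One way to make "greater than what might be expected by chance alone" concrete is a permutation test. This is a swapped-in illustration, not the specific procedure described by the sources. It reshuffles the group labels and counts how often a difference in means at least as large as the observed one appears. The data, the seed, and the 0.05 threshold are assumptions.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference between two group means.

    Randomly reassigns the pooled values to groups of the original sizes and
    counts how often chance alone yields a difference at least as large as
    the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        # Small tolerance so floating-point rounding does not hide exact ties
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero
    return (extreme + 1) / (n_perm + 1)

control = [4.1, 3.8, 4.4, 4.0, 3.9]   # hypothetical measurements
treated = [4.9, 5.2, 4.7, 5.0, 5.3]
p = permutation_p_value(control, treated)
print(f"p is about {p:.4f}:", "significant" if p < 0.05 else "not significant", "at the 0.05 level")
```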

Variable(s): a factor that can vary, or change, in ways that can be observed, measured, and verified. (Hockenbury, G-15) Marked by diversity or difference; liable to or capable of change. (NCIt)

Dependent Variable: variable that responds to manipulation of the "independent variables." (Norman, 5/26/09) The variable being estimated. Theoretically it is supposed to be dependent on the values of the independent variables. (Ferber, 303)

Independent Variable: factor being manipulated. (Norman, 5/26/09) Some aspect of the experimental situation that is directly manipulated by the experimenter in order to see if it causes a change in some dependent variable. (Cardwell, 128)

Variance: a measure of variability or dispersion. The variance will never have a value less than zero. (Barnes, 98) A measure of the variability in a sample or population. It is calculated as the 'mean squared deviation' (MSD) of the individual values from their common mean. In calculating the MSD, the divisor ‘n’ is commonly used for a population variance and the divisor ‘n-1’ for a sample variance. (NCIt)
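
Here is a sketch of the two divisors mentioned above, checked against Python's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n - 1). The data set is illustrative.

```python
import statistics

def variance(values, sample=True):
    """Mean squared deviation from the mean, with divisor n - 1 (sample) or n (population)."""
    n = len(values)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError("not enough data points")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]                              # illustrative values
print(variance(data, sample=False), statistics.pvariance(data))   # population: 4.0
print(variance(data), statistics.variance(data))                  # sample: 32/7, about 4.57
```

Both results are never negative, because they are sums of squares divided by a positive count.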

Analysis of Variance: a statistical technique that isolates and assesses the contributions of categorical independent variables to variation in the mean of a continuous dependent variable. (MeSH)
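
Here is a minimal sketch of a one-way analysis of variance. It computes the F statistic as the between-group mean square divided by the within-group mean square. The three groups are hypothetical levels of one categorical independent variable. In practice a library routine such as `scipy.stats.f_oneway` would also report a p-value.

```python
def one_way_anova_f(*groups):
    """F statistic for one-way ANOVA: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical scores under three levels of one categorical factor
f = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f)   # approximately 19
```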

Regression Analysis: procedures for finding the mathematical function which best describes the relationship between a dependent variable and one or more independent variables. (MeSH)
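
Here is a minimal sketch of the simplest case: ordinary least squares for one independent variable, which fits a straight line y = intercept + slope * x. Least squares is assumed here as the criterion for "best", though it is only one common choice. The data are illustrative. Python 3.10 and later also provide `statistics.linear_regression`, which returns the slope and intercept.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(intercept, slope)   # 1.0 2.0
```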