Brief introduction to market research statistics
To enable readers to use the results of this survey and get the most out of the
information, the following chapter offers a short introduction to the
statistical methods used. This will help eliminate potential misunderstandings
and misinterpretations of the data and provide support for accurately
understanding the statistical tests.
Cross Tabulations and Significance Tests
Most market research information is based on the comparison of one variable
(the dependent variable) with regard to other variables (the independent
variables). To obtain such a cross tabulation, at least two columns of data are
taken and a table is created showing the frequency of occurrence of all pairs
of values in these columns. The percentages of Variable A within the subgroups
of Variable B
and percentages of Variable B within the subgroups of Variable A can then be
compared. However, in most cases only one of these two percentages is of
particular interest, namely the percentages of the dependent variable within
the subgroups of the independent variable.
Example: Q2 - Generally speaking, do you
consider the long-term preservation of digital documents to be an intrinsic
task of libraries? (Dependent Variable) * Type of Library (Independent
Variable)
The percentage of particular interest is the one within the Type of Library
variable, which answers the question: How many of the respondents within each
group (national libraries, university libraries and general research
libraries) regard the long-term preservation of digital documents as an
intrinsic task of libraries? According to the cross tabulation, the group of
national libraries (96.6%) regards the long-term preservation of digital
documents as an intrinsic task of libraries more often than the other two
groups.
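The way such a cross tabulation is built can be sketched in a few lines of
Python. This is only an illustration: the response pairs are hypothetical and
are not data from the survey, and the function names are chosen for this
example. The sketch counts every pair of values and then reports the
percentages of the dependent variable within each subgroup of the independent
variable.

```python
from collections import Counter

# Hypothetical (type of library, answer) pairs; NOT data from the survey.
responses = [
    ("national", "yes"), ("national", "yes"),
    ("national", "yes"), ("national", "no"),
    ("university", "yes"), ("university", "yes"),
    ("university", "no"), ("university", "no"),
    ("general research", "yes"), ("general research", "no"),
    ("general research", "no"), ("general research", "no"),
]


def cross_tabulate(pairs):
    """Count how often each (independent, dependent) pair of values occurs."""
    return Counter(pairs)


def percentages_within_groups(pairs):
    """Percentage of each dependent value within each independent subgroup."""
    counts = cross_tabulate(pairs)
    group_totals = Counter(group for group, _ in pairs)
    return {
        (group, answer): 100.0 * n / group_totals[group]
        for (group, answer), n in counts.items()
    }


table = percentages_within_groups(responses)
for (group, answer), pct in sorted(table.items()):
    print(f"{group:17s} {answer:3s} {pct:5.1f}%")
```

Dividing by the subgroup totals of the independent variable, rather than by the
totals of the dependent variable, gives the one set of percentages that is
usually of interest.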
Significance tests then check whether the differing values measured within
different groups could have arisen by chance. If chance is an unlikely
explanation, the results are transferable to the population. It is common to
apply a significance level of 0.05 as a standard in market research
statistics, which means accepting a probability of error of no more than 5%
when a difference is declared significant. In practice, a significance value
less than or equal to 0.05 indicates that the differences measured can be
regarded as significant and can be transferred to the population, i.e. all
the national libraries, university libraries and general research libraries
of the EU and EFTA states.
Chi-Square Test
The Chi-Square Test is used to compare two or more groups on a variable
measured on a nominal scale (e.g. type of library, country etc.). The null
hypothesis (0-hypothesis) tested by the Chi-Square Test is that no differences
exist between the groups with respect to the relative frequency with which
group members fall into the various categories of the variable of interest.
Pearson Chi-Square reports an asymptotic significance and can be applied to
tables of any size, whereas Fisher’s exact test is typically used only for
2x2 tables. While the Chi-Square measures may indicate that there is a
relationship between variables, they do not indicate the strength or
direction of the relationship.
General conditions for obtaining valid measurements with Chi-Square Tests
include a minimum expected count of ≥ 1 and an expected count of less than 5
in no more than 20% of the cells.
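The following Python sketch shows how the Pearson Chi-Square statistic and the
expected-count conditions could be checked by hand. It is a minimal
illustration under stated assumptions: the observed counts are hypothetical
and are not survey data, the function names are chosen for this example, and
the closed-form significance value used here holds only for tables with
2 degrees of freedom (such as a 3x2 table). A statistics package would
normally be used for the general case.

```python
import math


def expected_counts(observed):
    """Expected cell counts if there were no difference between the groups."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


def expected_count_conditions_met(observed):
    """Check: every expected count >= 1 and at most 20% of cells below 5."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical counts (rows: library types, columns: yes / no); NOT survey data.
observed = [[28, 2], [45, 15], [30, 20]]

statistic, dof = pearson_chi_square(observed)
# Assumption: with exactly 2 degrees of freedom the asymptotic significance
# has the closed form exp(-statistic / 2); other tables need a chi-square
# distribution function from a statistics library.
p_value = math.exp(-statistic / 2) if dof == 2 else None

print(f"chi-square = {statistic:.3f}, df = {dof}, significance = {p_value}")
print("expected-count conditions met:", expected_count_conditions_met(observed))
```

Keeping the condition check separate from the test itself mirrors the way the
results are read: a significance value below 0.05 is only generalised if the
expected counts also satisfy the conditions.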
Example: Q2 - Generally speaking, do you
consider the long-term preservation of digital documents to be an intrinsic
task of libraries? (Dependent Variable) * Type of Library (Independent
Variable)
Chi-Square Test
This Chi-Square Test displays a significance value of less than 0.05, namely
0.008, so the differences in the cross tabulation could be regarded as
significant. However, it does not meet the condition that no more than 20% of
the cells have an expected count of less than 5. Therefore, no
generalisations should be made based on the differences in the cross
tabulation.
Mann-Whitney U Test and Kruskal Wallis Test
The Mann-Whitney U Test and the Kruskal Wallis Test are useful for comparing
two or
more groups on a variable which is measured at ordinal level (e.g. the
importance assigned to a subject on the scale 1=very important to 4=not
important at all). Mann-Whitney U Tests are used if the independent variable
comprises two groups. Kruskal Wallis Tests are used if the independent variable
comprises three or more groups.
The null hypothesis (0-hypothesis) tested by these two tests is that there is
no difference between the groups in terms of location, focusing on the median
as a measure of central tendency. The median is the middle observation when
the observed values are ordered from lowest to highest (half the values in
the sample are smaller and half are larger). The tests focus on differences
in central location and assume that any differences in the distribution of
the populations are due only to differences in location.
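Both tests work on ranks rather than on the raw values, which can be sketched
in Python as follows. This is an illustrative sketch, not the procedure used
to produce the survey results: the ratings are hypothetical, the function
names are chosen for this example, and the significance value relies on the
asymptotic chi-square approximation, written here in its closed form for
exactly three groups (2 degrees of freedom).

```python
import math
from collections import Counter
from statistics import median


def average_ranks(values):
    """Map each distinct value to its rank (1 = lowest), averaged over ties."""
    counts = Counter(values)
    ranks, next_rank = {}, 1
    for value in sorted(counts):
        tie_size = counts[value]
        ranks[value] = next_rank + (tie_size - 1) / 2  # mean of the tied ranks
        next_rank += tie_size
    return ranks


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of U_a and U_b) for two groups."""
    ranks = average_ranks(group_a + group_b)
    rank_sum_a = sum(ranks[v] for v in group_a)
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction (three or more groups).

    Assumes the pooled values are not all identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    return h / correction


# Hypothetical ratings (1 = very important ... 4 = not important at all).
national = [1, 1, 1, 2, 1, 1]
university = [1, 2, 2, 1, 2, 3]
general_research = [2, 3, 2, 4, 3, 2]

print("medians:", median(national), median(university), median(general_research))
print("U (national vs university):", mann_whitney_u(national, university))

h = kruskal_wallis_h([national, university, general_research])
# Assumption: three groups give 2 degrees of freedom, for which the
# asymptotic significance is exp(-h / 2); it is rough for small samples.
print("H:", h, "asymptotic significance:", math.exp(-h / 2))
```

Because only the order of the values enters the calculation, the tests suit
ordinal scales where the distance between categories is not defined.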
Example: Q1 - Generally speaking, how important
do you consider the long-term preservation of digital documents? * Type of
Library
Kruskal Wallis Test
The significance value in this test is less than 0.05, namely 0.024. The
differences in the cross tabulation can therefore be regarded as significant
and generalised as applying to the whole population.