Brief introduction to market research statistics

 

 

To enable readers to use the results of this survey and get the most out of the information, the following chapter offers a short introduction to the statistical methods used. This will help eliminate potential misunderstandings and misinterpretations of the data and provide support for accurately understanding the statistical tests.


Cross Tabulations and Significance Tests

 

Most market research information is based on comparing one variable (the dependent variable) across the values of other variables (the independent variables). To obtain such a cross tabulation, at least two columns of data are taken and a table is created showing the frequency of occurrence of all pairs of values in these columns. The percentages of Variable A within the subgroups of Variable B and the percentages of Variable B within the subgroups of Variable A can then be compared. In most cases, however, only one of these two sets of percentages is of particular interest, namely the percentages of the dependent variable within the subgroups of the independent variable.

 

 

Example:

 

Q2 - Generally speaking, do you consider the long-term preservation of digital documents to be an intrinsic task of libraries? (Dependent Variable) * Type of Library (Independent Variable)

 

 

         General Research Library   National Library   University Library   Total
Yes      79.2%                      96.6%              92.4%                90.8%
         38                         28                 220                  286
No       20.8%                      3.4%               7.6%                 9.2%
         10                         1                  18                   29
Total    100.0%                     100.0%             100.0%               100.0%
         N=48                       N=29               N=238                N=315

 

The percentage of particular interest is the one within the Type of Library variable, which answers the question: How many of the respondents within each group (national libraries, university libraries and general research libraries) regard the long-term preservation of digital documents as an intrinsic task of libraries? According to the cross tabulation, national libraries (96.6%) regard the long-term preservation of digital documents as an intrinsic task of libraries more often than the other two groups (92.4% of university libraries and 79.2% of general research libraries).
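The percentages in the example can be recomputed from the raw counts. The following Python sketch is only an illustration: the data structure and the function name column_percentages are assumptions made for this example, not part of the survey tooling. It divides each count by the total of its column, i.e. by the size of the subgroup of the independent variable.

```python
# Observed counts from the Q2 example: rows are answers, columns are library types.
counts = {
    "Yes": {"General Research Library": 38, "National Library": 28, "University Library": 220},
    "No": {"General Research Library": 10, "National Library": 1, "University Library": 18},
}


def column_percentages(table):
    """Return the share of each answer within each column (independent variable)."""
    columns = list(next(iter(table.values())))
    # Column totals are the subgroup sizes (the N values in the table).
    totals = {col: sum(row[col] for row in table.values()) for col in columns}
    return {
        answer: {col: 100.0 * row[col] / totals[col] for col in columns}
        for answer, row in table.items()
    }


percentages = column_percentages(counts)
for answer, row in percentages.items():
    print(answer, {col: round(value, 1) for col, value in row.items()})
```

Rounded to one decimal place, the result matches the percentages shown in the cross tabulation, for example 96.6% "Yes" among national libraries.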

 

 

 

Significance tests then check whether differences measured between groups are likely to be accidental (i.e. due to chance in the sample) or not. If they are not accidental, the results can be transferred to the population. A significance level of 0.05 is commonly applied as a standard in market research statistics, which means accepting a probability of error of at most 5%. In practice, a significance value less than or equal to 0.05 indicates that the differences measured can be regarded as significant and can be transferred to the population, i.e. all the national libraries, university libraries and general research libraries of the EU and EFTA states.

 


Chi-Square Test

 

The Chi-Square Test is used to compare two or more groups with respect to variables measured on a nominal scale (e.g. type of library, country etc.). The null hypothesis tested by the Chi-Square Test is that no differences exist between the groups with respect to the relative frequency with which group members fall into the various categories of the variable of interest. Pearson Chi-Square reports the asymptotic significance and can be applied to cross tabulations of any size, whereas Fisher’s exact test is usually applied only to 2x2 tables. While the Chi-Square measures may indicate that there is a relationship between variables, they do not indicate the strength or direction of the relationship.

 

General conditions for obtaining valid results with Chi-Square Tests are a minimum expected count of at least 1 and an expected count of less than 5 in no more than 20% of the cells.
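As a rough illustration of these conditions, the following Python sketch computes the expected counts for the Q2 cross tabulation (Yes/No by type of library) and checks both rules. The expected count of a cell is its row total multiplied by its column total, divided by the number of valid cases. The function names are hypothetical and chosen only for this example.

```python
# Observed counts for Q2; columns: General Research, National, University Library.
observed = [
    [38, 28, 220],  # Yes
    [10, 1, 18],    # No
]


def expected_counts(table):
    """Expected count per cell under the null hypothesis of no difference."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_conditions(table):
    """Return the minimum expected count, the share of cells below 5, and the verdict."""
    cells = [value for row in expected_counts(table) for value in row]
    minimum = min(cells)
    share_below_5 = sum(value < 5 for value in cells) / len(cells)
    return minimum, share_below_5, (minimum >= 1 and share_below_5 <= 0.20)


minimum, share_below_5, valid = check_conditions(observed)
print(f"minimum expected count: {minimum:.2f}")
print(f"cells with expected count < 5: {share_below_5:.1%}")
print("conditions met:", valid)
```

For these counts the minimum expected count is 2.67 and two of the six cells (33.3%) fall below 5, so the second condition is not met.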

 

 

Example:

 

Q2 - Generally speaking, do you consider the long-term preservation of digital documents to be an intrinsic task of libraries? (Dependent Variable) * Type of Library (Independent Variable)

 

 

         General Research Library   National Library   University Library   Total
Yes      79.2%                      96.6%              92.4%                90.8%
         38                         28                 220                  286
No       20.8%                      3.4%               7.6%                 9.2%
         10                         1                  18                   29
Total    100.0%                     100.0%             100.0%               100.0%
         N=48                       N=29               N=238                N=315

 

Chi-Square Test

 

 

                     Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square   9.682(a)   2    .008
N of Valid Cases     315

a) 2 cells (33.3%) have expected count less than 5. The minimum expected count is 2.67.

 

This Chi-Square Test displays a significance value of 0.008, which is less than 0.05, so the differences in the cross tabulation could be regarded as significant. However, 33.3% of the cells have an expected count of less than 5, so the condition that no more than 20% of the cells may have an expected count of less than 5 is not met. Therefore, no generalisations should be made based on the differences in the cross tabulation.
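The values in the test table can be retraced by hand. The Python sketch below is an illustration under stated assumptions, not the software used for the survey: it sums (observed - expected)² / expected over all cells to obtain the Pearson Chi-Square value. For the significance it relies on the fact that, with exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2). Other degrees of freedom would require a statistics library. The function name is hypothetical.

```python
import math

# Observed counts for Q2; columns: General Research, National, University Library.
observed = [
    [38, 28, 220],  # Yes
    [10, 1, 18],    # No
]


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = pearson_chi_square(observed)
if df != 2:
    raise ValueError("the closed-form p-value below is valid for df = 2 only")
p_value = math.exp(-statistic / 2)  # upper-tail probability for df = 2
print(round(statistic, 3), df, round(p_value, 3))
```

For the counts above this gives a value of 9.682 with 2 degrees of freedom and a significance of .008, in line with the table.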

 

 

 


Mann-Whitney U Test and Kruskal Wallis Test

 

The Mann-Whitney U Test and the Kruskal Wallis Test are useful for comparing two or more groups on a variable which is measured at the ordinal level (e.g. the importance assigned to a subject on a scale from 1=very important to 4=not important at all). Mann-Whitney U Tests are used if the independent variable comprises two groups. Kruskal Wallis Tests are used if the independent variable comprises three or more groups.

 

The null hypothesis tested by these two tests is that there is no difference between the groups in terms of location, with the median as the measure of central tendency. The median is the middle observation when the observed values are ordered from lowest to highest (half the values in the sample are smaller and half are larger). The tests focus on differences in central location and assume that any differences between the distributions of the populations are due only to differences in location.

 

 

 

Example:

 

Q1 - Generally speaking, how important do you consider the long-term preservation of digital documents? * Type of Library

 

 

                          General Research Library   National Library   University Library   Total
Very important (1)        67.3%                      93.5%              75.0%                75.6%
                          33                         29                 186                  248
Rather important (2)      28.6%                      6.5%               23.8%                22.9%
                          14                         2                  59                   75
Rather unimportant (3)    2.0%                       0.0%               0.8%                 0.9%
                          1                          0                  2                    3
Totally unimportant (4)   2.0%                       0.0%               0.4%                 0.6%
                          1                          0                  1                    2
Total (Mean)              100.0%                     100.0%             100.0%               100.0%
                          N=49                       N=31               N=248                N=328
                          (1.39)                     (1.06)             (1.27)               (1.27)

 

Kruskal Wallis Test

 

 

 

Chi-Square    7.494
df            2
Asymp. Sig.   .024

 

The significance value in this test is less than 0.05, namely 0.024. The differences in the cross tabulation can therefore be regarded as significant and generalised as applying to the whole population.
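To show how such a value comes about, the following Python sketch computes a tie-corrected Kruskal Wallis statistic directly from the answer counts in the Q1 table. It is a simplified illustration with a hypothetical function name, not the procedure used to produce the table: all answers in the same category are treated as tied and receive the mean of the ranks they occupy, and the significance uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math

# Answer counts per category (1 = very important ... 4 = totally unimportant).
groups = {
    "General Research Library": [33, 14, 1, 1],
    "National Library": [29, 2, 0, 0],
    "University Library": [186, 59, 2, 1],
}


def kruskal_wallis_from_counts(groups):
    """Tie-corrected Kruskal Wallis H for ordinal answers given as category counts."""
    n_categories = len(next(iter(groups.values())))
    category_totals = [
        sum(counts[k] for counts in groups.values()) for k in range(n_categories)
    ]
    n = sum(category_totals)

    # Answers in one category are tied and share the mean of the ranks they occupy.
    mean_ranks = []
    first_rank = 1
    for total in category_totals:
        mean_ranks.append(first_rank + (total - 1) / 2)
        first_rank += total

    # Uncorrected H from the rank sums of the groups.
    h = 0.0
    for counts in groups.values():
        group_size = sum(counts)
        rank_sum = sum(c * r for c, r in zip(counts, mean_ranks))
        h += rank_sum ** 2 / group_size
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)

    # Correction for ties: each category of size t contributes t^3 - t.
    tie_correction = 1 - sum(t ** 3 - t for t in category_totals) / (n ** 3 - n)
    return h / tie_correction, len(groups) - 1


h, df = kruskal_wallis_from_counts(groups)
if df != 2:
    raise ValueError("the closed-form p-value below is valid for df = 2 only")
p_value = math.exp(-h / 2)  # upper-tail probability for df = 2
print(round(h, 2), df, round(p_value, 3))
```

With the counts as printed in the table, the statistic comes out at about 7.49, close to the reported 7.494, and the significance rounds to the same .024.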