Data analysis and Geostatistics

2022

Exercises

Lab 1: Data distribution and descriptors

1_1 - Data distribution and histograms

1_2 - Cumulative frequency plots

1_3 - Normal vs. lognormal data distribution

normal probability graphing paper - linear scale pdf

normal probability graphing paper - log scale pdf

Lab 2: Error propagation and correlation between variables

2_1 - Error propagation

2_2 - Rounding and significant digits

2_3 - Count statistical error and repeat analyses

2_4 - Correlation coefficients

2_5 - The closure effect

Lab 3: Significance of the correlation coefficient and confidence limits

3_1 - Significance of correlation coefficients 1

3_2 - Significance of correlation coefficients 2 data

3_3 - Confidence limits on the mean

Lab 4: Student t-test and ANOVA

4_1 - Comparing two sets of data for equality

4_2 - ANOVA on reference materials

Overview of the dumps of a Ni-PGE mine in Botswana.

Copyright: Vincent van Hinsberg & Simon Vriend

Last updated: February 2022

Labs - applying statistical tools to geological data

The best way to learn (geo)statistical tools, to interpret their results, and to identify how and where they can aid in understanding data is to work with real-world geo-data. We will do this in two ways in the lab component of this course. First, a set of exercises will familiarize you with calculating statistical properties in spreadsheet programs and the PAST statistics software, and will highlight certain statistical methods, approaches and properties. Second, you will work towards progressively understanding a large dataset of litho-geochemical samples from BC using the various statistical tools and approaches discussed in the lectures. The dataset contains geological information, element concentrations and field observations, which will have to be explored in combination.

The datasets are original, unmodified data as provided by a variety of laboratories and should therefore be thoroughly checked before you start your analysis and interpretation. All datasets contain a wealth of statistically interesting features and it is impossible to discover them all. That is not the point of the lab, and I will not grade your report on whether or not you found everything and tried every technique. The purpose is to dissect and understand the dataset so that you are able to interpret the data in a geological and geochemical context, and your reports will be graded on the level of insight into these data. There are many ways to dissect a dataset, and generally a variety of statistical techniques will lead you to the same conclusion. So feel free to attack this dataset in whatever way you like, but the following statistical tools should at least be included:

• data description (e.g. mean, median, mode, IQR)

• scatter diagrams, box-and-whiskers plots, histograms

• tests of distribution, cumulative frequency diagrams

• correlation tests and correlation matrices

• t-tests, F-tests or their rank-equivalents

• analysis of variance

• cluster and/or discriminant function analysis

• principal component and/or factor analysis

• (multiple) regression analysis

• spatial analysis of the data, maps, semivariograms
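
As an illustration of the first item in this list, the descriptors can be computed in a few lines. This is only a minimal sketch in Python, which is an assumption on my part: the labs themselves use spreadsheet programs and PAST. The concentrations below are made-up values, not taken from the course dataset, and `describe` is a hypothetical helper name.

```python
import statistics

# Hypothetical Ni concentrations in ppm; NOT taken from the course dataset.
ni_ppm = [12.0, 15.0, 15.0, 18.0, 21.0, 24.0, 30.0, 95.0]


def describe(values):
    """Return basic descriptors (n, mean, median, mode, IQR) for a list of numbers."""
    # Quartiles with the "inclusive" method, which treats the data as the
    # full population; other quartile conventions give slightly different values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "iqr": q3 - q1,
    }


summary = describe(ni_ppm)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the single high value pulls the mean well above the median, which is the kind of feature worth checking before any further analysis. Note that quartile conventions differ between programs, so an IQR computed elsewhere may not match exactly.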

You are strongly encouraged to work on the exercises and project together and are free to work on these during the labs or at any time convenient to you. The TA is available during lab times to discuss the exercises and to answer any questions regarding the project. A final report on the data analysis project is to be handed in at the end of the course and counts 40% towards the final grade. This final report should be approximately 10 pages in length, excluding tables and figures (which are best placed in appendices). One report is to be submitted per group of 3. The exercises are not marked, but you are strongly recommended to check your results with the TA.