Data analysis and Geostatistics
2022
Exercises
Lab 1: Data distribution and descriptors
1_1 - Data distribution and histograms
1_2 - Cumulative frequency plots
1_3 - Normal vs. lognormal data distribution
Normal probability graphing paper - linear scale (PDF)
Normal probability graphing paper - log scale (PDF)
Lab 2: Error propagation and correlation between variables
2_1 - Error propagation
2_2 - Rounding and significant digits
2_3 - Count statistical error and repeat analyses
2_4 - Correlation coefficients
2_5 - The closure effect
Lab 3: Significance of the correlation coefficient and confidence limits
3_1 - Significance of correlation coefficients 1
3_2 - Significance of correlation coefficients 2 data
3_3 - Confidence limits on the mean
Lab 4: Student t-test and ANOVA
4_1 - Comparing two sets of data for equality
4_2 - ANOVA on reference materials
Overview of the dumps of a Ni-PGE mine in Botswana.
Copyright: Vincent van Hinsberg & Simon Vriend
Last updated: February 2022
Labs - applying statistical tools to geological data
The best approach to learning (geo)statistical tools, interpreting their results, and identifying how and where these tools can aid in understanding data, is to work with real-world geo-data. We will do this in two ways in the lab component of this course. First, there will be a set of exercises to help you become familiar with calculating statistical properties in spreadsheet programs and the PAST statistics software, and to highlight certain statistical methods, approaches and properties. Second, you will work towards progressively understanding a large dataset of litho-geochemical samples from BC, using the various statistical tools and approaches that have been discussed in the lectures. The dataset contains geological information, element concentrations and field observations, which will have to be explored in combination.
The datasets are original, unmodified data as provided by a variety of laboratories and should therefore be checked thoroughly before you start your analysis and interpretation. All datasets contain a wealth of statistically interesting features and it is impossible to discover them all. That is not the point of the lab, and I will not grade your report on whether or not you found everything and tried every technique. The purpose is to dissect and understand the dataset so that you are able to interpret the data in a geological and geochemical context, and your reports will be graded on the level of insight into these data. There are many ways to dissect a dataset, and there are generally a variety of statistical techniques that will lead you to the same conclusion. So feel free to attack this dataset in whatever way you like, but the following statistical tools should at least be included:
• data description (e.g. mean, median, mode, IQR)
• scatter diagrams, box-and-whiskers plots, histograms
• tests of distribution, cumulative frequency diagrams
• correlation tests and correlation matrices
• t-tests, F-tests or their rank-equivalents
• analysis of variance
• cluster and/or discriminant function analysis
• principal component and/or factor analysis
• (multiple) regression analysis
• spatial analysis of the data, maps, semivariograms
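As an illustration of the first item in the list above, the sketch below computes a few basic descriptors. The labs themselves use spreadsheet programs and PAST; Python is used here only as an assumption for illustration, the function name `describe` is hypothetical, and the values are made up rather than taken from any course dataset.

```python
import statistics


def describe(values):
    """Return basic descriptors (n, mean, median, quartiles, IQR) for a list of numbers."""
    # method="inclusive" treats the data as the full population when
    # interpolating quartiles; other conventions give slightly different values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Made-up concentrations (ppm), for illustration only; not course data.
example = [12.0, 15.0, 11.0, 14.0, 90.0, 13.0, 16.0]
summary = describe(example)
print(summary)
```

In this made-up example the single high value pulls the mean well above the median, while the median and IQR are hardly affected. This is one reason to report several descriptors side by side and to check a dataset for outliers before interpreting it.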
You are strongly encouraged to work on the exercises and project together and are free to work on these during the labs or at any time convenient to you. The TA is available during lab times to discuss the exercises and to answer any questions regarding the project. A final report on the data analysis project is to be handed in at the end of the course and counts for 40% of the final grade. This final report should be approximately 10 pages in length, excluding tables and figures (which are best placed in appendices). One report is to be submitted per group of 3. The exercises are not marked, but you are strongly recommended to check your results with the TA.