Data analysis and Geostatistics

2022

 

Exercises

Lab 1:    Data distribution and descriptors


        1_1 - Data distribution and histograms

        1_2 - Cumulative frequency plots

        1_3 - Normal vs. lognormal data distribution

        1_4 - Multi-modality (data)


        normal probability graphing paper - linear scale     pdf

        normal probability graphing paper - log scale         pdf


Lab 2:    Error propagation and correlation between variables


        2_1 - Error propagation

        2_2 - Rounding and significant digits

        2_3 - Count statistical error and repeat analyses

        2_4 - Correlation coefficients

        2_5 - The closure effect


Lab 3:    Significance of the correlation coefficient and confidence limits


         3_1 - Significance of correlation coefficients 1

         3_2 - Significance of correlation coefficients 2       data

         3_3 - Confidence limits on the mean


Lab 4:    Student t-test and ANOVA


         4_1 - Comparing two sets of data for equality

         4_2 - ANOVA on reference materials

Overview of the dumps of a Ni-PGE mine in Botswana.


Copyright:     Vincent van Hinsberg & Simon Vriend


Last updated:     February 2022

Labs - applying statistical tools to geological data

The best way to learn (geo)statistical tools, to interpret their results, and to identify how and where these tools can aid in understanding data is to work with real-world geo-data. We will do this in two ways in the lab component of this course. First, a set of exercises will familiarize you with calculating statistical properties in spreadsheet programs and the PAST statistics software, and will highlight certain statistical methods, approaches and properties. Second, you will work towards progressively understanding a large dataset of litho-geochemical samples from BC, using the various statistical tools and approaches that have been discussed in the lectures. The dataset contains geological information, element concentrations and field observations, which will have to be explored in combination.


The datasets are original, unmodified data as provided by a variety of laboratories and should therefore be thoroughly checked before you start your analysis and interpretation. All datasets contain a wealth of statistically interesting features and it is impossible to discover them all. That is not the point of the lab, and I will not grade your report on whether or not you found everything and tried every technique. The purpose is to dissect and understand the dataset so that you are able to interpret the data in a geological and geochemical context, and your reports will be graded on the level of insight into these data. There are many ways to dissect a dataset, and generally a variety of statistical techniques will lead you to the same conclusion. So feel free to attack this dataset in whatever way you like, but the following statistical tools should at least be included:


    • data description (e.g. mean, median, mode, IQR)

    • scatter diagrams, box-and-whiskers plots, histograms

    • tests of distribution, cumulative frequency diagrams

    • correlation tests and correlation matrices

    • t-tests, F-tests or their rank-equivalents

    • analysis of variance

    • cluster and/or discriminant function analysis

    • principal component and/or factor analysis

    • (multiple) regression analysis

    • spatial analysis of the data, maps, semivariograms
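
As a minimal illustration of the first item in this list, the sketch below computes a few descriptors in Python. This assumes you want to cross-check results obtained in a spreadsheet or in PAST; Python is not required for the labs. The concentrations are invented values, not taken from the project dataset, and `describe` is a hypothetical helper name.

```python
import statistics


def describe(values):
    """Return basic descriptors for a list of numbers."""
    # Quartiles by linear interpolation that includes the end points;
    # other software may use a different quartile convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical Ni concentrations in ppm (invented for illustration only).
ni_ppm = [12.0, 15.0, 11.0, 14.0, 95.0, 13.0, 16.0]

summary = describe(ni_ppm)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the single high value pulls the mean well above the median, which is one reason to inspect the raw data before interpreting summary statistics.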


You are strongly encouraged to work on the exercises and project together and are free to work on these during the labs or at any time convenient to you. The TA is available during lab times to discuss the exercises and to answer any questions regarding the project. A final report on the data analysis project is to be handed in at the end of the course and counts 40% towards the final grade. This final report should be approximately 10 pages in length, excluding tables and figures (these are best placed in appendices). One report is to be submitted per group of 3. The exercises are not marked, but you are strongly recommended to check your results with the TA.