Statistics in Archaeology

Archaeologists don’t just dig in the dirt. Statistics are often used to find patterns in the data and reveal trends that weren’t visible from excavation alone. Join Chelsi Slotten as she discusses some basic statistical methods often used by archaeologists and how to recognize poorly done statistics.

Transcript:

Welcome to the Arch 365, a podcast a day, every day, for 2017.  My name is Chelsi Slotten, and on this episode I’m going to be talking about statistics in archaeology.

You see a lot of statistics in archaeology papers, everything from basic statistics like mean, median, and mode to super advanced stuff like Bayesian statistics.  This podcast is going to talk about the basics of statistics.  The first rule of statistics in archaeology should be: if you want to do statistics on your work, involve a statistician from the beginning.  Statisticians need certain types of data to do their work.  If you don’t gather it, they can’t analyze it.  So unless you’re a statistics whiz, talk to someone who is before you start your project.  Your future self will thank you.

Types of Data

To make the statistics conversation easier, there are some basic terms and ideas you should know.  First, there are three different types of data, and you can do different analyses based on the type of data you have.  The first type of data is nominal data.  Nominal data is categorical in nature and does not contain any sort of numerical value or ranking.  Examples of nominal data include yes/no and multiple-choice type data such as biological sex, religion, eye color, type of pottery, etc.  Next up we have ordinal data, which is data that has a rank order, such as highest to lowest, but no meaningful numerical distance between the ranks.  This type of data usually includes scales such as work satisfaction: satisfied is better than neutral, but we don’t necessarily know by how much.  I don’t see a lot of this type of data in archaeology.  The third type of data is numerical data.  Numerical data is really good for statistics.  In case the name didn’t give it away, it is measured on numeric scales that have both an order and a specific difference between the values.  It includes data such as age, income, height, temperature, length, width, etc.  When in doubt, try to collect numerical-level data, as it is the data that you can do the most with in statistics.  Your statistician will thank you.
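As a rough illustration of why the type of data matters, here is a minimal Python sketch of what you can sensibly do with each type.  The pottery types, wear levels, and rim diameters are invented for the example and don’t come from any real site.

```python
from collections import Counter

# Hypothetical records, invented purely for illustration.
pottery_type = ["coarse", "fine", "coarse", "glazed", "coarse"]    # nominal
wear_level = ["light", "heavy", "moderate", "light", "moderate"]   # ordinal
rim_diameter_cm = [12.5, 14.0, 11.0, 15.5, 13.0]                   # numerical

# Nominal: categories have no order, so the sensible operation is counting.
type_counts = Counter(pottery_type)

# Ordinal: categories can be ranked, but the gaps between ranks are unknown.
WEAR_ORDER = ["light", "moderate", "heavy"]  # assumed ranking for this example
wear_sorted = sorted(wear_level, key=WEAR_ORDER.index)

# Numerical: differences between values are meaningful, so arithmetic works.
diameter_range = max(rim_diameter_cm) - min(rim_diameter_cm)

print(type_counts)
print(wear_sorted)
print(diameter_range)
```

Notice that subtracting "light" from "heavy" wouldn’t mean anything, while subtracting one rim diameter from another does.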

Data Analysis

Once you have your data, what can it tell you?  Well, it depends on the type of analysis you do.  You can do univariate analysis, which is the analysis of a single variable, or multivariate analysis, where you analyze multiple variables in relation to one another.  A common place to start in statistics is with the mean, median, and mode.  The mean is the average of your data, the median is the midpoint of your data, and the mode is the most commonly occurring value.  This information can help you figure out what type of distribution represents your univariate data: normal or skewed.  In a normal distribution your graph is a symmetrical bell curve and your mean equals your median.  If your graph is asymmetrical, your data is skewed and your mean generally does not equal your median.
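To make the mean, median, and mode concrete, here is a small sketch using Python’s built-in statistics module.  The two lists of measurements are made up for the example, and comparing the mean to the median is only a quick first check for skew, not a formal test of whether data is normal.

```python
import statistics

# Hypothetical measurements in cm, invented for illustration.
symmetric = [40, 42, 44, 44, 46, 48]
skewed = [40, 41, 41, 42, 43, 60]   # one large value drags the mean upward


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


for name, values in [("symmetric", symmetric), ("skewed", skewed)]:
    mean, median, mode = summarize(values)
    print(f"{name}: mean={mean}, median={median}, mode={mode}")
```

In the symmetric list the mean and median agree.  In the skewed list the single large value pulls the mean above the median, which is the pattern you’d expect from data with a long tail on the high side.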

Your sample size here is extremely important.  If you have a small sample size, a slight outlier in your data can make your sample seem more skewed than it really is, while a large sample will be less affected by outliers.  This is a result of the law of large numbers: as a sample grows, its average tends to settle closer to the true average of the population it came from, so more data is generally better and likely to be more representative.  If there are any statisticians listening, our data sets are almost never as large as you would like.  We’ve got preservation issues, but we do the best we can.
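Here is a quick sketch of that sample size effect, again in Python.  The numbers are invented: the same outlier is added to a sample of 5 values and to a sample of 500 values, and we look at how far the mean moves in each case.

```python
import statistics


def mean_shift(sample, outlier):
    """How far one extra value moves the mean of a sample."""
    return statistics.mean(sample + [outlier]) - statistics.mean(sample)


# Hypothetical values centered on 50, invented for illustration.
base = [48, 49, 50, 51, 52]
small_sample = base          # 5 values
large_sample = base * 100    # 500 values with the same spread
outlier = 80

small_shift = mean_shift(small_sample, outlier)
large_shift = mean_shift(large_sample, outlier)

print(f"shift in small sample: {small_shift}")
print(f"shift in large sample: {large_shift:.3f}")
```

The outlier moves the mean of the small sample by several units, while the mean of the large sample barely changes.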

Moving on to multivariate analysis, in this case bivariate analysis.  Bivariate analysis looks at the relationship between two variables.  If you’ve ever heard the phrase correlation is not causation, that applies here.  This type of analysis can tell you if two variables are related, but not whether a change in one caused a change in the other.  The most common bivariate tests that you see in archaeology are chi-squared tests and t-tests.  Chi-squared tests are done with nominal data, and t-tests should only be done with numerical (interval-level) data, for example to compare the average of a measurement between two groups.  If someone is trying to do a t-test with nominal data, you should ignore their conclusions because that’s not a valid test.
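To show what these two tests actually compute, here is a minimal sketch that works out the test statistics by hand with only the Python standard library.  The counts and measurements are invented, the t statistic uses the Welch (unequal variance) form as one common choice, and the sketch stops at the statistic itself.  In real work a statistics library such as SciPy would also report the p-value.

```python
import math
import statistics


def chi_squared(table):
    """Chi-squared statistic for a table of observed counts (nominal data)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def welch_t(a, b):
    """Welch's t statistic comparing the means of two numerical samples."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical counts: rows are two burial groups, columns are
# grave goods present / absent.
burials = [[30, 10], [20, 40]]
chi2 = chi_squared(burials)

# Hypothetical statures in cm for two groups.
group_a = [160, 162, 164, 166, 168]
group_b = [170, 172, 174, 176, 178]
t_stat = welch_t(group_a, group_b)

print(f"chi-squared = {chi2:.2f}")
print(f"t = {t_stat:.2f}")
```

For a 2 by 2 table like this one, a chi-squared statistic above about 3.84 is significant at the 0.05 level, so these invented counts would suggest the two variables are related.  Notice that the chi-squared function only ever sees counts of categories, while the t statistic needs means and variances, which is why it makes no sense on nominal data.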

There are some other things you may want to look out for when interpreting statistical analysis.  Beware when people use stats to prove the theory they want to prove.  Statistics can be manipulated to make something seem true, so if people are using an odd statistical test, greatly reducing the size of their sample for no logical reason, or not presenting all their data, be wary.  Another big no-no in statistics is removing outliers from your data because the results are better without outliers.  Unless the author gives an EXTREMELY compelling reason, such as contaminated samples, it’s likely the statistics are being manipulated.  Statistics are very useful for eliminating working hypotheses, and providing the reader with the hypotheses you have discounted is a great way to show that the researcher has done a thorough analysis and followed the scientific process.  It’s also important to note that absence of evidence is not evidence of absence.  Treat any paper that states something wasn’t found so it couldn’t exist with extreme caution.  This is especially true in archaeology, where preservation bias can erase a lot of our data.  That doesn’t mean it never existed.

So remember to be skeptical of what you’re reading and make sure the statistics are being accurately used before you accept a conclusion based on statistics.  If you’re interested in learning more about how to detect faulty statistics, check out Carl Sagan’s essay “The Fine Art of Baloney Detection,” which appears in his book The Demon-Haunted World.  Thanks for listening.
