Trends, noise, and errors OH MY!

How do we interpret initial findings in meaningful ways?

An Example

When using this image to count the number of tubes present, we could imagine various scenarios that illustrate different types of errors:

Let’s say my data are 7 blue tubes, 3 purple tubes, 2 green tubes, 1 yellow tube, and 3 clear tubes.

Is this accurate? Is this meaningful? Well, here are some ways that I might reflect on these data…

  • There are actually at least two shades of blue tubes. Does that color variation matter? Should these tubes all be counted in one bin or more than one bin? Are the long periwinkle tubes more accurately binned with the purple tubes?
  • There are at least two shapes of tubes: those that are long and pointy just at the end and those that taper along the lower half of the tube. Is this a meaningful difference? Is this a more important category to study than color? After all, this parameter changes the volume each tube can hold while color might just be aesthetic.
  • There are more than 3 clear tubes, but they are hard to see and thus hard to count. How hard should we work to count them? Are there other methods we could use? Could we change the exposure of the picture? Change the lighting? Could we touch them, and so use a different method/metric for detecting them?
  • I am confident there are two green tubes because this color is bold, clearly different from the others, and there are no hints of more green elsewhere in the pile. There is a lot less clarity for the purple, pink, and maybe even orange hues, and further study could be needed if these colors or color differences are important.
  • Only one yellow tube? Is it actually yellow or did the photo just get edited in a weird way to make a clear tube look yellow?
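
To make the binning questions above concrete, here is a minimal Python sketch of how the same tube counts change under different category choices. The counts come from the example; the `rebin` helper and the two alternative schemes (merging blue with purple, and treating the yellow tube as a mislabeled clear tube) are hypothetical illustrations, not conclusions about the actual image.

```python
from collections import Counter

# Counts as reported in the example above.
observed = Counter({"blue": 7, "purple": 3, "green": 2, "yellow": 1, "clear": 3})


def rebin(counts, mapping):
    """Merge categories by renaming each color via `mapping`.

    Colors not listed in `mapping` keep their original name.
    """
    merged = Counter()
    for color, n in counts.items():
        merged[mapping.get(color, color)] += n
    return merged


# Hypothetical coarser scheme: blue and purple share one bin.
coarse = rebin(observed, {"blue": "blue-purple", "purple": "blue-purple"})

# Hypothetical scheme: the yellow tube is really a clear tube with a color cast.
yellow_as_clear = rebin(observed, {"yellow": "clear"})

total = sum(observed.values())

print(dict(coarse))
print(dict(yellow_as_clear))
print(total)
```

The total number of tubes counted stays the same under every scheme; only the story the categories tell changes. Since there are more clear tubes than were counted, that total is best read as a lower bound.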

While this example is very silly, I hope it highlights just how complicated data can be even when the dataset seems simple. All data are messy, especially preliminary and amateur data, so it’s very important that we help students dissect the meaning in their data and use what they learn to guide their further study on the project.
