Data science class investigating N.C. shark attacks

Thompson and graduate students review data
Wednesday, July 29, 2015

After weeks of research, graduate students in the UNC Charlotte course “Knowledge Discovery in Databases” have analyzed diverse data sets related to sharks and found that certain patterns emerge.

There appears to be a correlation between the frequency of shark attacks and the phases of the moon, with attacks more prevalent around the time of the full moon. Other factors like tourism, crab population and sea temperature also appear in decision trees and rules that were part of the research process.
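As a rough illustration of how such a lunar relationship could be checked, each attack date can be labeled with an approximate moon phase and the share falling near a full moon counted. The sketch below is not the students' method. The reference new moon, the mean cycle length, the three-day window and all function names are assumptions chosen for illustration, and the mean-phase approximation can be off by several hours.

```python
from datetime import datetime, timezone

# Assumed constants for a mean-phase approximation (not taken from the class project).
SYNODIC_MONTH_DAYS = 29.530588853  # mean length of one lunar cycle, in days
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_phase_fraction(when):
    """Approximate position in the lunar cycle: 0.0 = new moon, 0.5 = full moon."""
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days % SYNODIC_MONTH_DAYS) / SYNODIC_MONTH_DAYS


def is_near_full_moon(when, window_days=3.0):
    """True if `when` falls within `window_days` of the approximate full moon."""
    days_from_full = abs(lunar_phase_fraction(when) - 0.5) * SYNODIC_MONTH_DAYS
    return days_from_full <= window_days


def share_near_full_moon(attack_times, window_days=3.0):
    """Fraction of the given timezone-aware timestamps that fall near a full moon."""
    if not attack_times:
        return 0.0
    hits = sum(is_near_full_moon(t, window_days) for t in attack_times)
    return hits / len(attack_times)
```

If attacks were unrelated to the moon, a three-day window on either side of the full moon would be expected to capture roughly 20 percent of them (6 of about 29.5 days). A share well above that would be a pattern worth testing more formally.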

Adjunct faculty member Pamela Thompson is the instructor for “Knowledge Discovery in Databases,” which is part of UNC Charlotte’s Data Science Initiative. She cautioned that the study was for a class project and not vetted, peer-reviewed research; she hopes that the project will lead to a formal and more thorough research study with collaboration among many.

“Student decisions on features to use, preprocessing and cleansing the data and even the choice of mining algorithms can affect the outcomes in either a positive or negative way,” stated Thompson. She added that swimmers should always be careful in the water, particularly with the increase in the number of attacks in North Carolina this year.

“Knowledge Discovery in Databases” deals with data mining, a core element of “Big Data.” While data mining employs statistical methods, it goes beyond the classical approaches of statistical research to uncover hidden patterns, relationships and knowledge that may not be readily apparent, said Thompson. Attempting to understand shark attacks off the North Carolina coast was a real-world issue that enabled students to apply the stages of the data mining process from beginning to end.

According to Thompson, the students undertook the challenge and sought out diverse data sets, many of which might at first glance seem unrelated, out of which they were able to glean what data scientists refer to as “actionable insights.”

They located the Global Shark Attack File, a worldwide database that provided information on thousands of attacks. This data set listed the date, time, species and location for specific events. The students also located information from sources such as the N.C. Division of Marine Fisheries, which provided data on the yield of crab fishing for the years up to 2014. They also collected data from many different sources on weather and water conditions, such as visibility, color, temperature, wave height and salinity.

The students also had to gain domain knowledge on the habits of sharks – particularly with respect to how different species of sharks behave – because “they had to learn to look at things from the viewpoint of a shark,” said Thompson. The students performed a basic literature review and spoke with a shark researcher, and they interviewed a North Carolina crab fisherman who had more than 40 years of experience with life along the state’s coast.

In addition to their discovery of patterns related to sharks and the moon, the students’ research also noted that other animals, such as crabs, displayed behavior that tended to fluctuate with lunar cycles. Additional study confirmed that blue crabs moved greater distances in conjunction with the moon’s phases. Anecdotal observations supported this research: the experienced crab fisherman noted that the crabs tended to molt at certain times of the month. Research indicated that sharks like to eat blue crabs as well as other predators that consume crabs.

“The study succeeded on several levels: it energized the students while familiarizing them with the science of data mining, and it identified correlations worthy of future investigation,” noted Thompson, a full-time faculty member in the Ketner School of Business at Catawba College. She earned a Ph.D. in computer science from UNC Charlotte.