Big Data Future Begins Today!

Big Data Future begins today with a keynote address from Joel Gurin, Founder of and former chair of the Obama White House Task Force on Smart Disclosure. The event begins at 7:00 PM in the Saxbe Auditorium, Drinko Hall, 55 West 12th Avenue, Columbus, Ohio.

Panels will begin tomorrow at 8:30 AM. Because the event has become so popular, please plan to arrive early to secure a seat.

There will be a first-come, first-served standby line for people who wish to attend but were not able to register on time. You are also welcome to watch the conference via a webcast on our website. Follow the “Live Streaming/Archive” tab to reach the webstream. The entire event will be webcast, and access to the webcast will be completely free.

The Big Data Team at Ohio State is looking forward to this exciting week!

Connecting the dots–a look into merging data sets at Ohio State

By Megan Weyrauch

Ohio State is building a system that merges multiple student and staff data sets and systems, so that analysis of the integrated data can give the university a better understanding of the student experience.

Institutional Research and Planning Director Julie Carpenter-Hubin said that Ohio State presently lacks a unified system to tie its various data sets together.

“The HR data resides in HR and the student data resides in enrollment services,” she said. “You can get to these various data sets but there is not yet a single place where you can go and easily pull data from all of those data sets.” 

Dr. Lance Kennedy-Phillips, executive director of the Center for the Study of Student Life, said that committees are currently examining how the university can connect all of its data.

“We’re a large institution with various data warehouses in various areas of campus,” Kennedy-Phillips said. “We have the infrastructure. We have the data. We just need to figure out how to connect all of the various pieces.”

James Brenza, chief data officer for the Office of the Chief Information Officer, said that while a small portion of Ohio State’s data is already integrated, he is working on building up the infrastructure needed to integrate all of it.

That infrastructure, Brenza said, has both a hardware side and a software side.

“The hardware side of it, the data warehouse itself became outdated and needs to be replaced, so we are in the process of replacing all of the hardware right now,” Brenza said. “The software aspect of it, then, is how do you pull in all these disparate data sources and get them integrated. So that’s the second piece and the longer ongoing piece.”

Brenza said enough hardware was purchased to cover his teams for the next five years.

“The software side will be an ongoing stream of projects to keep going after piece, after piece, after piece,” he said.

Brenza said that each data set must be examined individually and justified with a business case that demonstrates the return on investment of integrating it.

“Every piece you want to integrate, you have to demonstrate the business case,” Brenza said. “What’s the value of each piece of data? What’s the value of integrating it with other pieces of data? If you were to analyze it, what difference could you make?”

Brenza said that he will have to work with data stewards around the university, including Kennedy-Phillips, to begin integrating the data sets.

Carpenter-Hubin said she sees many possibilities for the integrated data, but only once the infrastructure is complete.

“I think we see a lot of possibilities but until we have that infrastructure firmly in place it’s going to be hard to do much of anything,” she said. “It’s going to take a little time to connect all of the silos.”

Kennedy-Phillips said he thinks connecting all of the data would lead to a better understanding of the student experience, which could help the university provide services that help students succeed.

“I think higher education has an opportunity to really leverage a lot of data that we have to provide better services for our students,” he said. “That is a priority of the institution from my perspective.”


Big Data and Transportation at Ohio State

By Matthew McGreevy

Smartphones and big data help GIS scholars analyze the flow of transportation and movement on campus

COLUMBUS, Ohio – Geographic Information Science scholars are using the pings that smartphones send to cellular towers to gather data for identifying and charting traffic and pedestrian patterns in real time across the Ohio State University campus.

Harvey Miller, the Bob and Mary Reusche chair in GIS in the OSU Department of Geography, said the data allows researchers to ask different questions about transportation. For example, data can show what types of people use the campus bus service at a given time, or what time of day is most efficient for a delivery truck to make its deliveries at the university.

“I can now think of cities as a collection of individuals moving, not just a big amorphous blob with waves of people moving through it,” said Miller. “I want to know how we can build transportation systems in cities such that we can create sustainable development and livable communities.”

Using GPS or smartphones to relay information about a population is not a new idea, and this type of data gathering is becoming commonplace as Smart Cities are developed.

In Smart Cities, sensors embedded in the city’s infrastructure send an array of information about a population’s activity to a large interconnected network. This network stores the data for officials, who can interpret the information and devise more environmentally friendly and practical solutions to urban problems.

Such problems on the Ohio State campus include a high influx of delivery trucks jamming limited roadways, especially in this age of online commerce and rapid delivery, said Morton O’Kelly, director of Ohio State’s Center for Urban and Regional Analysis. “Using technology for helping to move packages or freight, I’d say is very important,” said O’Kelly. “It helps to get packages there in a very efficient way.”

O’Kelly pointed to the Campus Area Bus Service’s recent adoption of GPS software as another facet of the smart data revolution. “We have become very used to things like that working without realizing that underneath is an efficient computation underpinning things that allow us to have a smooth operating transportation system,” said O’Kelly.

Miller said these are just preliminary steps on Ohio State’s path toward becoming a Smart City, and that the path is far from certain. “The technology is there; the data is there,” Miller said. “The hard part is getting it all together and getting a commitment from the university.”

“It’s all there; it just takes the right catalyst.”

Columbus School Scandal and Big Data

Public Records Prove Essential in a Columbus City Schools Scandal

By Aubrey Sinclair 

Data tampering in Columbus City Schools changed the face of the district when Columbus Dispatch education writers Jennifer Smith Richards and Bill Bush revealed the scandal in June 2012.

Smith Richards and Bush often dig through public records to add data to their stories, but this time was unlike any other. 

They discovered that 2.8 million absences had been deleted by principals. In some cases, up to 500 grades had been changed per day. Students were also withdrawn and attendance records were erased. 

Doug Caruso, the editor who supervises Smith Richards and Bush, said that thousands of records were changed the day after principals were invited to meetings at data centers.

Caruso said it was pretty apparent that principals had been told to make the changes.

“They couldn’t have thought that was OK,” said Smith Richards.  “It’s like beyond what you could probably defend.” 

The state auditor and the FBI are now involved in an ongoing investigation, but before the reporters requested public records in 2012, the scandal had been only a suspicion.

Smith Richards learned that the superintendent at the time, Gene Harris, was asking principals to stop altering some of the data. She became curious and started looking into how much data was being altered and why.

“So my first thought was, we were on to something big and we needed to expand on it and that we needed to get more information about it,” said Caruso.

This particular data set took months to obtain, although public records requests take about six weeks on average. The requested data contained 200 spreadsheets and data files, said Smith Richards.

“And so we just have to know that whenever something becomes available, we have a limited amount of time if we want to break news out of it,” said Smith Richards.

Other news outlets typically request the same public records, making it even more important to get the story out first, said Smith Richards.

The first story published was about the millions of absences that had been deleted during the course of several years.

Smith Richards explained that without the data from the public records, the story written would have been “a story of opinions.”

“It adds a scientific answer, frankly, to your journalism.  It takes it from abstract to concrete,” said Smith Richards.  “I think that’s really important.”

This school year, significantly fewer absences have been deleted.

“I mean think about it, if you get caught doing something you weren’t supposed to be doing, whether you had bad intent or not, and you know that somebody is watching you, I think you’d probably change your way,” said Smith Richards.

Columbus Dispatch reporters still use the same data sets they used to write the first story. So far, 130 stories on data tampering have been written from those original data sets.