Big Data Future Begins Today!

Big Data Future begins today with a keynote address from Joel Gurin, Founder of and former chair of the Obama White House Task Force on Smart Disclosure. The event begins at 7:00 PM in the Saxbe Auditorium, Drinko Hall, 55 West 12th Avenue, Columbus, Ohio.

Panels will begin tomorrow at 8:30. Because the event has become so popular, please plan to arrive early to ensure your seat.

There will be a first-come, first-served standby line for people who wish to attend but were not able to register in time. You are also welcome to watch the conference via a webcast that can be found on our website. Follow the “Live Streaming/Archive” tab to get to the webstream. The entire event will be webcast, and access to that webcast will be completely free.

The Big Data Team at Ohio State is looking forward to this exciting week!

Connecting the dots–a look into merging data sets at Ohio State

By Megan Weyrauch

Ohio State is in the process of building a system to merge multiple student and staff data sets and systems to better understand the student experience through the analysis of the integrated data. 

Institutional Research and Planning Director Julie Carpenter-Hubin said that Ohio State presently lacks a unified system to tie its various data sets together.

“The HR data resides in HR and the student data resides in enrollment services,” she said. “You can get to these various data sets but there is not yet a single place where you can go and easily pull data from all of those data sets.” 

Executive Director of the Center for the Study of Student Life Dr. Lance Kennedy-Phillips said that there are committees right now that are looking at how the university can connect all of its data.  

“We’re a large institution with various data warehouses in various areas of campus,” Kennedy-Phillips said. “We have the infrastructure. We have the data. We just need to figure out how to connect all of the various pieces.”

James Brenza, chief data officer for the Office of the Chief Information Officer, said that while a small portion of the data at Ohio State is already integrated, he is working on building up the infrastructure necessary to integrate all of Ohio State’s data. 

Brenza said that the infrastructure needed to merge the data has both a hardware side and a software side.

“The hardware side of it, the data warehouse itself became outdated and needs to be replaced, so we are in the process of replacing all of the hardware right now,” Brenza said. “The software aspect of it, then, is how do you pull in all these disparate data sources and get them integrated. So that’s the second piece and the longer ongoing piece.”

Brenza said enough hardware was purchased to cover his teams for the next five years.

“The software side will be an ongoing stream of projects to keep going after piece, after piece, after piece,” he said.

To integrate the different data sets, Brenza said, each piece must be examined individually and supported by a business case that demonstrates the return on investment of integrating that data.

“Every piece you want to integrate, you have to demonstrate the business case,” Brenza said. “What’s the value of each piece of data? What’s the value of integrating it with other pieces of data? If you were to analyze it, what difference could you make?”

Brenza said that he will have to work with data stewards around the university, including Kennedy-Phillips, to begin integrating the data sets.

Carpenter-Hubin said that she sees a lot of possibilities for the future once the data is integrated.

“I think we see a lot of possibilities but until we have that infrastructure firmly in place it’s going to be hard to do much of anything,” she said. “It’s going to take a little time to connect all of the silos.”

Kennedy-Phillips said that he thinks connecting all of the data would result in a better understanding of the student experience, which could lead to the university providing services to help students succeed. 

“I think higher education has an opportunity to really leverage a lot of data that we have to provide better services for our students,” he said. “That is a priority of the institution from my perspective.”


Big Data and Transportation at Ohio State

By Matthew McGreevy

Smart Phones and big data assist GIS scholars in analyzing the flow of transportation and movement on campus

COLUMBUS, Ohio – Geographic Information Science scholars are using the pinging of smart phones off cellular towers to gather data related to identifying and charting traffic and pedestrian patterns in real time across the Ohio State University campus.

Harvey Miller, the Bob and Mary Reusche chair in GIS in the OSU Department of Geography, said the data allows researchers to ask different questions about transportation. For example, data can show what type of people use the campus bus service at a given time or what time is most efficient for a delivery truck to do its business at the university.

“I can now think of cities as a collection of individuals moving, not just a big amorphous blob with waves of people moving through it,” said Miller. “I want to know how we can build transportation systems in cities such that we can create sustainable development and livable communities.”

Using GPS or smart phones to relay information about a population is not a unique idea, and this type of data gathering is becoming commonplace with the creation of Smart Cities.

In Smart Cities, sensors embedded in the city’s infrastructure send an array of information about a population’s activity to a large inter-connected network. This network stores the data for officials who can interpret the information and devise more environmentally-friendly and practical solutions to urban problems.

Such problems on the Ohio State campus include a high influx of delivery trucks jamming limited roadways, especially in this age of online commerce and rapid delivery, said Morton O’Kelly, director of Ohio State’s Center for Urban and Regional Analysis. “Using technology for helping to move packages or freight, I’d say is very important,” said O’Kelly. “It helps to get packages there in a very efficient way.”

O’Kelly acknowledged the Campus Area Bus Service’s recent implementation of GPS software as another facet of the smart data revolution. “We have become very used to things like that working without realizing that underneath is an efficient computation underpinning things that allow us to have a smooth operating transportation system,” said O’Kelly.

Miller said these are just preliminary steps in Ohio State’s path to becoming a Smart City, but that path is far from certain. “The technology is there; the data is there,” Miller said. “The hard part is getting it all together and getting a commitment from the university.”

“It’s all there; it just takes the right catalyst.”

Columbus School Scandal and Big Data

Public Records Prove Essential in a Columbus City Schools Scandal

By Aubrey Sinclair 

Data tampering in Columbus City Schools changed the face of the district when Columbus Dispatch education writers Jennifer Smith Richards and Bill Bush revealed the scandal in June 2012.

Smith Richards and Bush often dig through public records to add data to their stories, but this time was unlike any other. 

They discovered that 2.8 million absences had been deleted by principals. In some cases, up to 500 grades had been changed per day. Students were also withdrawn and attendance records were erased. 

Doug Caruso, editor and supervisor of Smith Richards and Bush, said the day after principals were invited to meetings at data centers, thousands of records were changed. 

It was pretty apparent that principals were told to do this, said Caruso. 

“They couldn’t have thought that was OK,” said Smith Richards.  “It’s like beyond what you could probably defend.” 

The state auditor and FBI are now involved in an ongoing investigation, but until the reporters requested public records in 2012, the scandal had only been an idea.

Smith Richards found out that the superintendent at the time, Gene Harris, was asking principals to stop altering some of the data. She became curious and started looking into how much data was being altered and why. 

“So my first thought was, we were on to something big and we needed to expand on it and that we needed to get more information about it,” said Caruso.

This specific set of data took months to get, but on average, public records take about six weeks. The requested data contained 200 spreadsheets and data files, said Smith Richards.

“And so we just have to know that whenever something becomes available, we have a limited amount of time if we want to break news out of it,” said Smith Richards.

Other news outlets typically request the same public records, making it even more important to get the story out first, said Smith Richards.

The first story published was about the millions of absences that had been deleted during the course of several years.

Smith Richards explained that without the data from the public records, the story written would have been “a story of opinions.”

“It adds a scientific answer, frankly, to your journalism.  It takes it from abstract to concrete,” said Smith Richards.  “I think that’s really important.”

This school year, significantly fewer absences have been deleted.

“I mean think about it, if you get caught doing something you weren’t supposed to be doing, whether you had bad intent or not, and you know that somebody is watching you, I think you’d probably change your way,” said Smith Richards.

To this day, Columbus Dispatch reporters still use the same data sets they used to write the first story. So far, 130 stories on data tampering have been written using the original data sets.

Fisher College of Business, IBM and Smart Cities

By Joel Thomas

The use of big data, and the Fisher College of Business’s recent partnership with IBM, has Columbus on the way to becoming a smart city.

The recently built IBM Data Analytics Center in Columbus partnered with the Fisher College of Business to create a curriculum for undergraduates in analytics, according to Scott Cook, an IBM spokesperson.

“There is a significant knowledge gap. There are a lot more jobs in analytics than there are skilled professionals,” said Cook. “We are helping to create a curriculum that closes that gap.”

IBM selected Columbus for an analytical center because of its central location. The city has the second highest number of college graduates in a 200-mile radius, and IBM has had a presence in Columbus for 85 years, said Cook.

Businesses are moving toward using big data in order to keep up with the competition. The competitive edge will come from students who are educated enough in analytics to help them use data more efficiently and make better decisions.

The Fisher College of Business is also interested in using big data for its own purposes with the help of IBM, said Ralph Greco, the Director of Business Analytics at Fisher.

“If a department has 95 percent of its students in jobs six months after graduation, then the curriculum is fine,” said Greco. “If, on the other hand, they are placing 85 percent of their students in jobs and 12 percent of them work in a doctor’s office or somewhere else, then they have a problem. They have to look at why their students aren’t getting jobs in their fields and possibly reevaluate the curriculum.”

Colleges at Ohio State can use big data to look at students applying for admittance and determine which of them are most likely to find a job after graduation.

“By studying data on the students who graduate and immediately find a job, colleges can create a model of an ‘ideal student’ that they can use during their admittance process,” said Greco.

Students should not only be able to use big data to think of the right question and then find the answer, but also to know what to do once they have that answer. Education on analytics is the first step toward creating a smarter city, said Greco.

The implementation of big data plays a vital role in the process a city takes towards becoming a smart city.

“A smart city is one that uses data as an infrastructure to make other infrastructure more effective,” said Harvey Miller, the Reusche Chair in Geographical Information Science at Ohio State.

The use of smart phones throughout cities means people are constantly sending out data. GPS services on phones allow big data to be collected at an individual level in real time. Miller says this type of data collection is unprecedented.

“Throughout the last century or so we’ve only been able to work on more aggregate levels, such as counting the number of cars that roll down a street or doing a travel survey once every five to 10 years,” said Miller. “Cities can use this data to create better transportation systems that are better tailored for supporting people’s activities.”

Big data is collected in a smart city in areas such as transportation, energy and water use, crime rates and economic activity.

Miller does not think the idea of a smart city is on anyone’s radar screen at Ohio State, but he said he believes the potential exists.

“The elements are here,” said Miller. “The IBM data center is here, there’s a very progressive city government and the recent push to make Columbus big on biking will encourage people to move closer to downtown.”


OSU Wexner Medical Center and Big Data

By Karlie Frank


The Ohio State University Wexner Medical Center will soon launch a study aimed at improving patient adherence to its cardiac rehabilitation program by harnessing the power of big data and social networking. 

The study, conceived by Lise Worthen-Chaudhari, Research Assistant Professor of Physical Medicine, and Dr. Martha Gulati, Director of Preventive Cardiology and Women’s Cardiovascular Health, will use data from past cardiac patient histories to construct a new rehabilitation program. The program will incorporate text messaging to encourage patients to go to sessions.

Farsite, a data analysis company in Columbus, will help the medical center perform statistical analysis on the historical data to see which factors are most telling in adherence or non-adherence to rehab. 

Michael Gold is the founder of Farsite and a leader on the project. “You can look at those historical patients and their compliance or noncompliance with the cardiac rehab, and you can identify what factors are important,” he said. “We use that to help craft the study where we’re actually enrolling [new] patients.”

Though the project is still in the beginning stages of data collection, two factors that have popped up the most are health insurance and type of patient employment.

“Folks that don’t have insurance are less likely to go [to cardiac rehab], and people that are self-employed are less likely to go, and you can draw any number of conclusions,” Gold said. “Are those [self-employed] people busier? Are those people less likely to have health insurance?”

Worthen-Chaudhari has done much research on social intervention in the health care space and envisioned using it in cardiac rehab. She, Gulati and Gold deemed text messaging the most appropriate method of social intervention.

“We tried to think about how we can take the core principles that bring meaning and power to social media, and do it in such a way that’s more accessible to the patient population that’s most likely to suffer from a heart attack, which are older people,” said Gold.

Family and friends of the patient will write out text messages to the patient ahead of time, encouraging them to go to their sessions. Farsite will bank these texts and send them to the patient throughout the program. 

Worthen-Chaudhari is passionate about the role of family and friends in the rehabilitation process.

“The idea that we might be able to help not only cardiac heart failure patients but help their loved ones help the patient as well—I love that,” she said. 

Worthen-Chaudhari foresees big data continuing to transform the health care industry.

“It’s pretty cool, the power [of big data],” she said. “Anything we can do to improve patient health, we have to do.”

Big Data Future Conference 2014

By Marcus Andrews

The Ohio State University Moritz College of Law will host a free multidisciplinary conference from March 19-21, 2014, with the goal of raising awareness about big data’s potential impact on economic, social and political life.

Ohio State law professor Peter Shane, the lead organizer of Big Data Future, explained that helping people understand big data and connect it to their lives is an important part of the conference.

“Any one of these topics could sustain its own full day symposium or its own full semester course,” Shane said. “But the idea of the conference is really to give people, to give non-experts, a kind of road map of this terrain—a sense of what the field encompasses.”

Big Data Future will consist of eight panels and a keynote speaker discussing a range of big data topics. Each contributor will also write a paper, and presentations will be video-recorded and archived online.

Joel Gurin, the founder of, will deliver the keynote address. Panel discussion topics will range from the governance of big data, to big data’s impact on health, education and welfare.

Panelists will come from a variety of fields in the university, government and private sectors, with some coming from high-profile companies such as IBM, Twitter and Microsoft.

Shane stressed the importance of the diversity of expertise brought by the speakers.

“My basic defense of interdisciplinary approaches is always that problems rarely show up in single discipline frames,” Shane said. “Whether it’s poverty, or climate change, or economic growth, all of these things require problem solvers from a variety of approaches.”

Caroline Wagner, a professor at Ohio State’s John Glenn School of Public Affairs, explained that big data can be applied to current issues.

“For example,” Wagner explained, “Obamacare, and healthcare and all of these things are going to eventually become simpler to use as we apply big data to these kinds of public problems.”

Wagner further described the advantage of the multidisciplinary approach. A conference of this kind brings together governments that hold data, companies with computing power and universities that want to study the data. These agencies and companies provide a service to the community with the resulting knowledge of how to use data.

“Central Ohio, under the leadership of The Ohio State University, is in a better position than almost any place in the nation to take the lead on big data,” Wagner said. “This conference will help us to kind of focus ourselves in on some of the key questions.”

Big Data Future grew from an annual single-day symposium held by I/S: A Journal of Law and Policy for the Information Society, a law student group that Shane advises. Big data turned out to be such an important topic that it led to an expansion into a three-day format for 2014.