University of Minnesota researchers say they’re producing more data in a day than computers could hold less than a decade ago, and they’ll soon have extra support analyzing it.
In hopes of accelerating research, the University is establishing a new institute to help analyze the large amounts of data coming from faculty labs and research centers. The University of Minnesota Informatics Institute will specialize in big data analysis, which officials said is needed more than ever as labs generate more data than researchers can analyze.
“The research landscape has changed quite a bit in the last several years,” said institute director Claudia Neuhauser. “Big data is becoming a major driver of our research, and informatics is essentially the tool that helps people to analyze big data.”
The 2-month-old institute is in the “discovery phase,” Neuhauser said, so she hasn’t nailed down its framework yet and is asking each college and department what they need. Under the Office of the Vice President for Research, the Informatics Institute will build on existing resources, offering extra analysis services and partnerships, she said.
The institute will bridge disciplines, Neuhauser said, offering varied skill sets to help build a big data network across the University’s research areas.
“It’s across all fields,” she said. “So whether it’s agriculture, environment, social sciences, humanities, health area, biology — it’s really across the University and across the entire system.”
Supply and demand
Some University colleges and departments have their own informatics programs, but Neuhauser said the demand for data analysis is growing and outstripping resources.
Facilities like the Genomics Center and the University Imaging Centers each serve hundreds of researchers who use their equipment to produce data, she said.
The University provides data analysis services through programs such as the Research Informatics Support Systems and the Social Media and Business Analytics Collaborative. But those programs could use help, said Jorge Viñals, director of the Minnesota Supercomputing Institute (MSI).
While the supercomputing institute provides researchers help with both analysis and storage, he said, it doesn’t help with training. Viñals said that’s a big difference between his institute and the Informatics Institute, which will provide training fellowships and other programs to help faculty and staff learn any necessary data analysis skills.
Genomics Center director Kenny Beckman said any analysts would need to be trained in specialized software and know how to read the specific type of data they’re handling.
The center houses several DNA sequencing machines that he said produce more than a billion pairs of DNA sequences in a week, data that isn’t intuitive to read.
“It’s quite intimidating to a lot of users, and it takes quite a bit of sophistication,” he said. “It’s not the kind of data that people are used to kind of eyeballing. You could very easily make mistakes.”
Beckman said finding skilled analysts is the biggest struggle for most researchers today, and he hopes the Informatics Institute will help fill that need.
“The bottom line is there’s just not enough help available for the demand,” he said.
Still seeking storage solutions
While the Informatics Institute will help analyze big data, researchers will still need a place to store it.
University Imaging Centers director Mark Sanders said his two facilities are producing five times as much data as they did two years ago.
Serving three colleges and 70 departments, the centers easily produce 100,000 images a month, Sanders said.
The Imaging Centers partner with the supercomputing institute to store their data, paying $261 per terabyte, per year. A single terabyte would fill about 213 full standard DVDs.
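The DVD comparison can be checked with quick arithmetic. A rough sketch, assuming a standard 4.7 GB single-layer DVD and decimal (1,000 GB) terabytes; the 50-terabyte cost figure is purely illustrative of MSI’s quoted rate, not a fee any lab in this story actually paid:

```python
# Sanity-check the storage figures quoted above.
TB_IN_GB = 1000          # 1 terabyte = 1,000 gigabytes (decimal units)
DVD_CAPACITY_GB = 4.7    # assumed single-layer DVD capacity

dvds_per_terabyte = TB_IN_GB / DVD_CAPACITY_GB
print(round(dvds_per_terabyte))  # → 213

# Hypothetical: storing 50 terabytes at MSI's quoted annual rate
COST_PER_TB_PER_YEAR = 261  # dollars
print(50 * COST_PER_TB_PER_YEAR)  # → 13050 dollars per year
```

At that rate, a lab holding tens of terabytes faces an annual bill in the five figures, which helps explain why some labs buy their own hardware instead.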
For some researchers and facilities, that price may be too steep.
Brian Bagley, who manages the X-ray Computed Tomography lab, said his lab had to pay for additional internal storage after running out of space in its first six months of operation.
His lab, which he said houses one of the most powerful X-ray machines in the country, started with 12 terabytes of storage. But after six months, he said, it had to upgrade to 50 terabytes.
He said he wishes MSI would charge a one-time fee for storage space rather than an annual fee. When the lab runs out of space again, he said, it will just buy more of its own storage.
The Informatics Institute has no plans yet to assist with storage, Neuhauser said, only analysis.
But Bagley said he hopes they can work out alternative solutions because data storage is still a huge issue for his lab. He said he has spoken with Neuhauser and that they’re working on finding viable, long-term data storage solutions for labs like his.
“For the moment, we’re just dealing with it on our own,” he said.