Austin-based Umbel spinoff Pilosa is pushing its big-data indexing at this week’s OSCON

Posted May 9th, 2017

Contributed: Higinio (H.O.) Maycotte is CEO of Pilosa, an open-source startup that was incubated at Austin-based Umbel. Pilosa seeks to separate indexes from databases, vastly speeding up queries on big data.

The law of “Why We Can’t Have Nice Things” states that the bigger things get, the slower they typically are. Bloated summer movie sequels. Dinosaurs. A two-hour sermon. 

The same is true of so-called “Big Data,” enormous sets of information increasingly used for everything from curing diseases to making better Amazon shopping recommendations for you.

Higinio (H.O.) Maycotte, the founder and former CEO of Austin-based data-management company Umbel, believes it’s a solvable dilemma. He’s now debuting an open-source startup called Pilosa to tackle the slowdown that sets in as big data sets grow.

“Databases in the last five years have advanced tremendously,” Maycotte said. “The reality is that data is getting stored at a much faster pace than we’re able to access the info we’re storing. The bad news is the information is getting harder to retrieve.”

About three years ago, a team at Umbel led by Maycotte began working on solving a problem for its clients: the larger a data set grew, the longer it took to access the information in it.

The solution, it turned out, was to separate the indexing of a database from the database itself, creating a binary representation of the data.

“We sit on top of data and turn it into ones and zeroes,” Maycotte said, comparing it to a highly compressible, easy-to-distribute version of a library’s card catalog. “The card catalog really becomes its own building and its own destination.”
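To make the card-catalog comparison concrete, here is a minimal sketch of a bitmap-style index kept apart from the records it describes. It is an illustration only, not Pilosa’s code or API: the class, its method names and the sample attributes are hypothetical, and Python integers stand in for the compressed bitmaps a production system would use.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy index stored separately from the data it describes (hypothetical)."""

    def __init__(self):
        # One integer per attribute; bit i is 1 when record i has the attribute.
        self.bitmaps = defaultdict(int)

    def set_bit(self, attribute, record_id):
        """Mark record_id as having the given attribute."""
        self.bitmaps[attribute] |= 1 << record_id

    def intersect(self, *attributes):
        """Return the sorted ids of records that have every listed attribute."""
        if not attributes:
            return []
        result = self.bitmaps.get(attributes[0], 0)
        for attribute in attributes[1:]:
            # A bitwise AND keeps only the records present in both bitmaps.
            result &= self.bitmaps.get(attribute, 0)
        return [i for i in range(result.bit_length()) if (result >> i) & 1]


# Made-up sample data: record ids 0 to 3 and two illustrative attributes.
index = BitmapIndex()
for record_id in (0, 2, 3):
    index.set_bit("lives_in_austin", record_id)
for record_id in (1, 2, 3):
    index.set_bit("attends_oscon", record_id)

matches = index.intersect("lives_in_austin", "attends_oscon")
```

In this sketch the query never touches the underlying records. It combines rows of ones and zeroes, and only the matching record ids would then be looked up in whatever database holds the full data.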

To build on the idea and develop it further, Pilosa’s 14-person team is betting on making the project open source and creating a large community around the concept. To that end, it’s debuting Pilosa at a series of events at this week’s O’Reilly Open Source Convention in Austin. The startup’s lead engineer, Matt Jaffee, will speak on a panel at 11 a.m. Wednesday titled “The Index as a First-Class Citizen,” and Pilosa has a booth at the event.

Contributed: This slide from Pilosa explains how the open-source project seeks to decouple indexes from data storage, optimizing speeds for accessing big data.

“We thought if we solved those problems for ourselves, it could solve problems for many others,” Maycotte said.

Troy Lanier, former vice president of innovation at Umbel and now on the engineering team for Pilosa, said it’s a good time and place to launch an open-source startup. 

“In Austin, there’s lots of open-source projects gaining traction,” Lanier said. “It’s an exciting time for engineers.”

Maycotte said he hopes that going open source is a way for a new enterprise IT company to scale quickly. “By going open source, we maximize its potential impact,” he said.

Pilosa released its code on the repository site GitHub, and so far it has remained on the site’s list of trending open-source projects.

Maycotte said he hopes speeding up data access will benefit medical industries, transportation, smart cities, energy and network security.

“The next wave of scientific breakthroughs will come from research projects that work with data sets of a terabyte or more,” he said. “We know how to store that data, but nobody has focused on accelerating access to that data. That changes today.”