Finding the Value in Scientific Pictures: Image Informatics and Scimagix, Inc.
This article originally appeared in the July 2000 issue of The HMS Beagle, an online publication by Elsevier that was closed in July 2004. The text is reprinted here by permission.
Abstract: While researchers have plenty of systems at their disposal for managing alphanumeric data, the visual descriptions of experimental outcomes—scientific image data—must still be handled manually. Scimagix is using technology borrowed from machine perception and multimedia research to make images searchable, mineable, and, ultimately, accessible.
Clichéd linguistic references aside, what is a picture really worth? An immunoblot or microscopy image might not garner much at eBay, but if that image helps to reveal the mechanism behind a particular pathology, its value could be tied to millions of U.S. dollars in worldwide new drug sales.
Conversely, the image will have no value at all if researchers never glean any information from it. The preponderance of image data produced by modern labs (as much as 70 percent of the experimental data generated) has revealed a dearth of tools for managing this data effectively. In many labs, image files are stuffed in a directory, inaccessible to anyone other than the researcher who owns that computer. And given the volumes of data generated by today’s high-throughput techniques, all those immunohistograms, in situ hybridization results, and electrophoresis patterns can be difficult for researchers to compare manually.
“They call it gel gazing,” explains Robert Dunkle, president and CEO of Scimagix, Inc., a start-up software firm tucked along the water in Redwood Shores, California. The “they” he’s referring to are proteomics researchers at Parke-Davis, one of the sites that has partnered with Scimagix to create software applications for managing scientific images. To identify patterns in protein expression, Parke-Davis researchers and others like them hunker down and tediously look at and compare gel after gel after gel. It’s a decidedly low-tech and laborious end-point in an otherwise high-tech process.
“The last decade has seen significant advances in laboratory automation, and, along the way, tools for databasing and mining alphanumeric data have matured,” says Dunkle, who has spent over 15 years managing and marketing informatics products for chemistry and biology. But compared to these areas, image handling and analysis have lagged. Hence Scimagix, which aims to make image content as searchable as text and numbers through a unique technology developed initially for broadcasting.
Multimedia Meets the Western Blot
The idea propelling Scimagix is not merely to extract alphanumeric data from images, but to turn images themselves into queries that can be used to find other, similar images. The concept may sound far-fetched, but, in fact, the ability to perform low-level image searching and analysis has wended its way from intelligence and military applications to consumer products for organizing things like photo albums and stamp collections. Today, extensions of the technology are also being used to manage broadcasting and industrial video archives.
Virage, Inc. (San Mateo, California) is one of the companies that pioneered image search technology for broadcasting. Its particular innovation is “visual information retrieval” (VIR), developed by Dr. Ramesh Jain, a computer scientist and engineer specializing in multimedia information systems, image databases, machine vision, and intelligent systems. Scimagix was born when Virage saw the opportunity to apply VIR to a new sector: pharmaceutical research and chemical discovery.
“The initial problems Ramesh and his group were trying to solve were along the lines of ‘Find the pictures that look like this picture,’” says Paul Lego, Virage’s CEO and a member of Scimagix’s board of directors. VIR accomplishes this by characterizing four “primitives” that combine to create an image: color, shape, size, and texture. The characterization produces a 70-dimensional vector that describes each image mathematically. To search for images that are “like” each other, users tune how heavily each primitive counts in the comparison. An emphasis on color, for instance, would help you tell an apple from an orange; an emphasis on shape would enable you to distinguish both of these objects from a Rubik’s cube.
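The weighting idea can be sketched in a few lines of Python. This is an illustration only, not Virage's VIR algorithm: the feature values, the short per-primitive descriptors, and the function names `weighted_distance` and `rank_similar` are all invented here, and a plain weighted Euclidean distance stands in for whatever similarity measure VIR actually uses.

```python
import math

# The four primitives named in the article. Real VIR vectors have about 70
# dimensions; the tiny descriptors below are made up for illustration.
PRIMITIVES = ("color", "shape", "size", "texture")

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature dictionaries.

    Primitives missing from `weights` get weight 0 and are ignored.
    """
    total = 0.0
    for name in PRIMITIVES:
        squared = sum((x - y) ** 2 for x, y in zip(a[name], b[name]))
        total += weights.get(name, 0.0) * squared
    return math.sqrt(total)

def rank_similar(query, library, weights):
    """Return image names ordered from most to least similar to the query."""
    return sorted(
        library,
        key=lambda name: weighted_distance(query, library[name], weights),
    )

# Hypothetical descriptors: an apple, an orange, and a cube with red faces.
library = {
    "apple":  {"color": [0.9, 0.1], "shape": [0.8], "size": [0.5], "texture": [0.2]},
    "orange": {"color": [0.9, 0.6], "shape": [0.9], "size": [0.5], "texture": [0.6]},
    "cube":   {"color": [1.0, 0.0], "shape": [0.1], "size": [0.5], "texture": [0.2]},
}
# Query: something red and round.
query = {"color": [1.0, 0.0], "shape": [0.9], "size": [0.5], "texture": [0.2]}

by_color = rank_similar(query, library, {"color": 1.0})
by_shape = rank_similar(query, library, {"shape": 1.0})
print("emphasis on color:", by_color)
print("emphasis on shape:", by_shape)
```

With these invented numbers, emphasizing color pushes the orange to the bottom of the list, while emphasizing shape pushes the cube there, which mirrors the apple, orange, and Rubik's cube example in the text.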
The ability to select “regions of interest” within images makes VIR more efficient than other image analysis techniques developed to date, and it is what has enabled Scimagix to apply this technology to pharmaceutical R&D successfully. Rather than characterizing and searching entire images using VIR, researchers can instead define a particular region of a 2D gel that contains interesting features, such as protein “constellations” known to confer drug toxicity. The defined region can then be used to query a database of gel patterns to find other instances of those features.
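A region-of-interest query of this kind might look roughly like the following Python sketch. It is not Scimagix's method: simple sliding-window template matching on raw pixel intensities is swapped in for VIR's feature-based comparison, the gels are toy 4x4 grids, and the names `crop`, `best_match`, and `query_database` are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a rectangular region of interest out of a 2D intensity grid."""
    return [row[left:left + width] for row in image[top:top + height]]

def difference(a, b):
    """Sum of squared intensity differences between two equal-sized patches."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def best_match(image, patch):
    """Slide the patch over the image; return (score, top, left) of the closest window.

    Returns None when the image is smaller than the patch.
    """
    h, w = len(patch), len(patch[0])
    best = None
    for top in range(len(image) - h + 1):
        for left in range(len(image[0]) - w + 1):
            score = difference(crop(image, top, left, h, w), patch)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best

def query_database(database, patch):
    """Rank gels by how closely any of their windows matches the patch."""
    hits = []
    for name, image in database.items():
        match = best_match(image, patch)
        if match is not None:
            hits.append((name,) + match)  # (name, score, top, left)
    return sorted(hits, key=lambda hit: hit[1])

# Toy "gels": a two-spot diagonal pattern stands in for a protein constellation.
gel_a = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
gel_b = [[0, 0, 0, 0], [0, 0, 0, 0], [9, 0, 0, 0], [0, 9, 0, 0]]  # same pattern, shifted
gel_c = [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # one faint spot only

region = crop(gel_a, 1, 1, 2, 2)  # the researcher's region of interest
hits = query_database({"gel_b": gel_b, "gel_c": gel_c}, region)
for name, score, top, left in hits:
    print(name, "score", score, "at row", top, "col", left)
```

The point of the sketch is the workflow rather than the matching rule: the query is a piece of an image, and each database entry is scored by its best-matching window, so a gel that contains the pattern anywhere ranks ahead of one that does not.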
Dunkle acknowledges that VIR requires algorithmic tuning. “If the question is one of black versus white, native VIR does fine,” he says. “It’s the shades of gray, such as telling the difference between healthy tissue and first-stage pathology, that need tweaking.” But Dunkle is also adamant that the technology isn’t intended to replace visual searching by humans. Rather, it’s intended to support human searches so that those same humans can make better decisions.
“We know that patterns of expression exist across different drug classes,” Dunkle points out. “But even with our uncanny perceptual abilities, humans just can’t take in the hundreds of images necessary to resolve those patterns. VIR offers a way to pull together related, or even just possibly related, image data and view that data in tandem with other results, which gives researchers a better chance of being able to detect trends and decide what story the data is telling.”
The “Image Informatics” Space
Like HTS and combinatorial chemistry, which were initially criticized for being imprecise and risky, image informatics (Scimagix’s term for the space created by its products and services) needs to prove its value before it will become an accepted part of life science R&D. When a reporter asked the listserv of the Laboratory Robotics Interest Group to comment on the technology, most respondents assumed that “image informatics” referred to data aggregator technology or to software for one-to-one image comparison. Searching the image content itself? When listserv members grasped the concept, most doubted that it was possible.
“We’ve really had to educate researchers, asking them straight out, ‘Do you realize that you actually can search on an image and find information relevant to your research?’” explains Suzanne Mattingly, Ph.D., a molecular biologist and vice president of marketing at Scimagix. Part of the problem is cultural, according to Mattingly. Because of their complexity, images have yet to become part of the informatics mix that today includes alphanumeric and chemical structure data. In fact, images are often used after the fact, rounded up and pasted into reports to back up decisions, rather than working in conjunction with other data to guide decisions.
“There was no image management strategy,” responds Michail Esterman, an information consultant at the scientific imaging center at Lilly Research Laboratories (Indianapolis, IN), to a question about how his company has managed image data in the past. “The problem has been that images are scattered over many directories, either local or remote. It often takes less time to redo the experiment than to find an image” from an experiment done more than a few weeks back.
Before Scimagix can market its VIR-based technology for searching databases of images (a 2D gel analysis and mining module is slated for release in the fall), it has had to develop capabilities for getting images into databases in the first place. In March, the company released the Scientific Image Management System (SIMS™), an Oracle®/Web-based package for organizing, storing, retrieving, and mining images (see Figure 2). Both Parke-Davis and Eli Lilly & Co. have publicly announced their licensing of SIMS.
It’s been a focused effort, according to Dunkle. In the nine months since his official appointment as CEO (announced in October 1999 simultaneously with the company’s completion of its first round of venture capital financing), Dunkle has seen Scimagix quadruple in size. The bulk of the new hires are in development, and the team has adopted a round-the-clock development cycle. “Some of our developers work until two or three in the morning, and then our head of engineering [Bryan Van Vliet, a founder of the company] walks in around five or six and gets the day going,” Dunkle says. “By the time the morning staff arrives, Bryan has integrated all of the prior day’s work into a new build.”
It’s a common sentiment among start-ups, but Mattingly, an executive who states strongly, “I don’t do ‘also rans,’” insists that the staff is driven to succeed because they are committed to defining the next step in scientific data management. Most of the employees, particularly in development, are scientists turned computer scientists who have worked with the data types currently handled by other scientific software vendors, which are scrambling for position as object-oriented programming and the preeminence of Oracle render their current product strongholds obsolete.
“While chemical structures, for instance, are really only relevant to chemists, images can be relevant to everyone if they can only be made widely and readily available,” notes Mattingly, whose history includes a VP position at Oxford Molecular. She relates a story from one of the pharmaceutical companies that she visited recently. A member of one project team happened to see an image generated by a completely different team and, at a glance, realized that it was relevant to his own area. His team was soon pursuing it as a new drug candidate.
Dunkle concurs. “Discoveries like this one don’t have to be accidents,” he says. Making images part of the informatics mix not only helps those responsible for image management, but also opens up further opportunities for discovery as the data contained in images is shared and integrated with other experimental data. “Success in scientific research often rests on data exchanges,” says Dunkle. “It’s just a matter of getting the right images in front of people so that they can see things they couldn’t before. This is image informatics.”
Deborah J. Ausman, a science writer specializing in scientific software and drug discovery research, recently moved from St. Louis to Houston. She is planning her escape.