The Olive vision is based on three central ideas that were developed at CMU and IBM Research.

  • The first idea is a technique for efficiently delivering the state of a VM image over the Internet. This makes it practical for a VM library to provide only archival services, while users of the library supply the infrastructure (e.g. server, laptop, or cloud) on which the VM images are executed. It also lets users interact with the executing VM with higher fidelity and performance in their local environment, even though the VM image data comes from a central library.
  • The second idea is a technique for indexing and searching VM image contents, based on locality-sensitive hashing and search-by-example algorithms that rely on signature matching rather than human-specified metadata to identify what software exists in what images. This makes it practical to efficiently build and update a searchable catalog of software components contained within VM images contributed by diverse sources.
  • The third idea is a technique for incrementally composing a VM image, so that one user of a VM image library can extend the image published by another user, instead of creating a full VM image from scratch. In addition to making it simpler for users to contribute new content to the library, this capability allows provenance tracking for legal and security purposes, and version control for recovery and rollback.
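
The third idea, incremental composition, can be illustrated with a small copy-on-write sketch. This is not Olive's actual image format: the `ImageLayer` class, its fields, and the tiny `BLOCK_SIZE` are hypothetical names chosen for illustration. Each layer stores only the blocks its author changed and points at the layer it extends, so reads fall through to the parent, the parent chain doubles as a provenance record, and rollback amounts to going back to the parent layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # hypothetical, unrealistically small block size for the example


@dataclass
class ImageLayer:
    """One contribution to a VM image: only the blocks this author changed."""
    name: str
    author: str
    parent: Optional["ImageLayer"] = None
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def read_block(self, index: int) -> bytes:
        # Walk from the newest layer towards the base until some layer
        # holds the requested block.
        layer: Optional[ImageLayer] = self
        while layer is not None:
            if index in layer.blocks:
                return layer.blocks[index]
            layer = layer.parent
        # A block that no layer ever wrote reads as zeros.
        return b"\x00" * BLOCK_SIZE

    def provenance(self) -> List[str]:
        # The parent chain records who contributed what, newest first.
        chain = []
        layer: Optional[ImageLayer] = self
        while layer is not None:
            chain.append(f"{layer.name} (by {layer.author})")
            layer = layer.parent
        return chain
```

For example, a layer `ImageLayer("game", "bob", parent=base, blocks={2: b"game"})` adds a single block on top of `base` without copying or modifying anything `base` already contains.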

While the vision of a library for executable content has been a long-standing one, it is the integration of these three capabilities that makes this vision both practical and useful. In a workshop held at IBM in 2011, several key scientists, researchers, and faculty from different universities and institutions working on digital archiving gathered to view a demonstration of these capabilities. The plan to build Olive - a public-domain library for executable content in the form of VM images - grew out of that meeting.

IBM Research - CMU Collaboration

Around 2006, IBM Research recognized that as virtualization technologies became ubiquitous and large-scale virtualized datacenters and clouds emerged, VM images would become valuable objects. Vas Bala and his team at IBM Research began exploring technologies for offline introspection and maintenance of VM images.

Meanwhile, a CMU team led by Professor Mahadev Satyanarayanan was developing techniques for streaming VM images over the Internet using a system called Internet Suspend/Resume®.

The IBM Research team thought that combining CMU's image streaming technology with IBM's image introspection technology had the potential to change the way people create, share, deploy and maintain software environments, and transform the way virtualized/cloud data centers of the future are designed. The IBM Research and CMU teams started a formal collaboration in 2009 to jointly explore new ways to work with VM images.

Another CMU project on Satya's team, OpenDiamond®, was exploring techniques for scalable search of large picture collections. Vas and Satya soon realized that by treating VM images much like JPEG pictures, signature-based matching algorithms could be used to efficiently search for arbitrary patterns (e.g. representing the installed state of a PacMan game, a specific OS configuration, or even the presence of malware) over a large corpus of VM images. The VMFind algorithm was the first outcome of the IBM Research - CMU collaboration. ISR and VMFind would eventually become key parts of the technology foundation necessary for Olive.
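
The text does not spell out how VMFind works, so the following is only a generic sketch of signature-based search-by-example, using MinHash (a common locality-sensitive hashing scheme) over fixed-size chunks. The function names, `CHUNK`, and `NUM_HASHES` are assumptions made for illustration. An image is reduced to a short signature, and two signatures agree in roughly the fraction of positions that matches the fraction of chunks the images share, so no human-written metadata is needed to find images that contain a given pattern.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK = 8        # hypothetical chunk size in bytes
NUM_HASHES = 64  # signature length; more hashes give a finer similarity estimate


def chunks(data: bytes) -> set:
    # Split into fixed-size, aligned chunks. A real system would likely use
    # content-defined chunking so that shifted data still matches.
    return {data[i:i + CHUNK] for i in range(0, max(len(data), 1), CHUNK)}


def minhash(data: bytes) -> List[bytes]:
    # For each salted hash function, keep the smallest digest over all chunks.
    parts = chunks(data)
    signature = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(2, "big")
        signature.append(min(hashlib.sha1(salt + c).digest() for c in parts))
    return signature


def similarity(a: List[bytes], b: List[bytes]) -> float:
    # Fraction of agreeing positions estimates the Jaccard similarity
    # of the two chunk sets.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def search(example: bytes, corpus: Dict[str, bytes]) -> List[Tuple[float, str]]:
    # Rank every image in the corpus by estimated similarity to the example.
    query = minhash(example)
    scored = [(similarity(query, minhash(image)), name)
              for name, image in corpus.items()]
    return sorted(scored, reverse=True)
```

In a real catalog the signatures would be computed once when an image is contributed and stored in an index, rather than recomputed for every query as `search` does here.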

In 2010, Vas and Satya discussed the idea of a VM image library as a way to showcase the value of these technologies. Dr. Gloriana St. Clair joined these discussions at that point, and together with Erika Linke, helped refine the concept and value of the VM image library as a long-term digital archive for executable content.

The name "Olive" was coined during one of those discussions as an acronym for "Open Library of Images for Virtualized Execution". An early prototype of the system was built for demonstration in 2011 by engineers from CMU and IBM Research. A VM image repository hosted at CMU served as the Olive library, and ISR technology was used to stream an image from that library over the Internet onto end-user laptops. In addition to streaming services, the library also provided basic image indexing and image composition capabilities.
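
The streaming arrangement in the prototype, with the library holding the image and the laptop executing it, can be pictured with a demand-fetch sketch. This is not the ISR implementation or its protocol: the `StreamedImage` class and the `fetch_chunk` callback are hypothetical stand-ins. The idea shown is that the client pulls a chunk from the library only when the running VM first touches it and serves later reads from a local cache.

```python
from typing import Callable, Dict


class StreamedImage:
    """Client-side view of a remote VM image, fetched chunk by chunk on demand."""

    def __init__(self, fetch_chunk: Callable[[int], bytes],
                 chunk_size: int, size: int) -> None:
        self._fetch_chunk = fetch_chunk  # hypothetical call into the library
        self.chunk_size = chunk_size
        self.size = size
        self._cache: Dict[int, bytes] = {}
        self.fetch_count = 0             # number of chunks pulled so far

    def _chunk(self, index: int) -> bytes:
        # Go to the library only on the first access to a chunk.
        if index not in self._cache:
            self._cache[index] = self._fetch_chunk(index)
            self.fetch_count += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        # Assemble the requested byte range from the chunks that cover it.
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, within = divmod(pos, self.chunk_size)
            take = min(self.chunk_size - within, end - pos)
            out += self._chunk(index)[within:within + take]
            pos += take
        return bytes(out)
```

Because only the touched chunks cross the network, a VM can start running long before the whole image has been transferred, and parts of the image that are never used are never downloaded.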

In 2011, the Olive vision was presented to a group of invited attendees at a workshop hosted at IBM Research in New York. Satya demonstrated how old systems such as a Windows 3.1 computer and an early Mac could be run inside a VM, and how these VM images could be stored in a central library and streamed over the Internet. Other presentations detailed how a user could search for images that contain particular software, and how one user could extend an existing image in the library to create a new one. The demonstration and presentations generated considerable interest, and the Olive vision was for the first time embraced by a community of scientists outside of CMU and IBM Research.


* Internet Suspend/Resume is a registered trademark of Carnegie Mellon University.