Benjamin Gilbert —
Virtualization From Below —
June 11, 2014, 12:40 AM
After a recent
yum upgrade on our RHEL 6 Workstation systems, we noticed that the mouse pointer was no longer visible in certain Olive virtual machines. The pointer still worked (hovering, clicking, and dragging all functioned properly) but, with no direct visual feedback, it was difficult to use the mouse. This effect only occurred on VMs with paravirtualized display (QXL) support -- so, Scientific Linux, but not Windows 3.1.
After some tinkering, we discovered that the problem went away if we forced VMNetX to check for software updates at startup, or if we touched one of VMNetX's Python source files. We didn't have to modify the file to fix the problem -- just update its timestamp so Python would regenerate the bytecode.
So, what happened?
tl;dr: SpiceChannel overrides the GObject
connect(...) method for something completely different, so we used
channel.connect_object('open-fd', ..., channel). This triggered a long-standing refcounting bug in PyGObject, so when the Python GC happened to run during SPICE channel setup, our
open-fd handler ended up receiving a half-destroyed PyGObject.
How we use SPICE
VMNetX uses the SPICE thin-client protocol in both its thick- and thin-client modes. (Thick-client mode just means that the VM executes on the same computer as the user interface.) A SPICE session actually consists of several TCP connections: a "main channel", plus individual channels for display data, audio playback, keyboard/mouse input, cursor updates, and so on.
VMNetX uses the spice-gtk widget as its SPICE viewer. By default, spice-gtk will initiate TCP connections for the channels within a session; the programmer just supplies the remote hostname, port number, and password. Like the venerable VNC protocol, SPICE assumes that every remote thin-client session will listen on a different host/port pair, so there is no mechanism within the protocol for selecting a particular remote session.
Thus, if we used only SPICE's built-in infrastructure, the VMNetX thin-client server would have to open a separate listening port for every QEMU process running a VM; this causes issues with port number reuse and firewall configuration. To avoid these problems, we handle connection setup ourselves: VMNetX connects to a single well-known TCP port and sends a "token" which specifies a particular VM running on the server. If the server accepts the token, we launch the normal SPICE protocol over the TCP connection, with the thin-client server forwarding TCP traffic to the appropriate backend QEMU process.
This additional handshake means that we cannot use spice-gtk's automatic connection setup. Instead, we ask spice-gtk to signal us when it wants to establish a TCP connection. We initiate the connection ourselves and pass its file descriptor back to spice-gtk. While this additional step is only necessary for thin-client mode, it is also executed in thick-client mode.
spice-gtk is written in C and uses the GObject framework to expose an object-oriented API to application code. We use it from Python via the PyGObject binding library. In order to make a GObject accessible from Python, PyGObject creates a Python proxy object which exposes the object's methods and properties as Python bound methods and properties. Thus, in Python, a SpiceChannel is actually two objects which reference each other:
The Python object holds a strong reference (A) to the GObject, and the GObject holds a weak reference (B) to the Python object. The weak reference allows a holder of the GObject to recover the Python object, if it still exists, but does not prevent the Python object from being destroyed.
Object destruction in GObject and Python
Both GObject and Python use reference counting to determine when they can free objects. Since refcounting by itself cannot detect reference cycles, Python also has a special-purpose garbage collector that only handles reference cycles. "Container" objects that can participate in cycles, such as lists and dicts, must be specially registered with the garbage collector from C code. To find unreferenced objects, the GC traverses outward from each container, tracking the inbound references it finds to every other container. At the end of this process, any container whose refcount is larger than the computed number of inbound references must be referenced from outside and so is not garbage. These objects are removed from the set and the process repeats. Any remaining objects are garbage, so the GC forcibly clears their references. This should cause their refcounts to drop to zero and the objects to be freed.
As the PyGObject library is designed, PyGObjects will often have shorter lifetimes than their underlying GObjects. For example, consider building a GTK+ window. Each UI widget is created, configured, and bound to another widget. At that point, the Python object for the widget will often fall out of scope, and what remains is a tree of UI widgets (GObjects) linked together from C code. The PyGObject library handles this common case via ordinary refcounting, without involving the Python GC: when a PyGObject drops out of scope, reference A is dropped and weak reference B is cleared. If the PyGObject is later needed (for example, to pass to a callback function), a new PyGObject is created for the existing GObject.
SPICE session setup
Back to our problem. During SPICE session setup, the sequence of events looked like this:
signal: channel-new <spice_client_gtk.MainChannel object at 0x28cc230 (SpiceMainChannel at 0x29b4000)>
signal: open-fd <spice_client_gtk.MainChannel object at 0x28cc230 (SpiceMainChannel at 0x29b4000)>
callback: set_fd <spice_client_gtk.MainChannel object at 0x28cc230 (SpiceMainChannel at 0x29b4000)>
signal: channel-new <spice_client_gtk.PlaybackChannel object at 0x28cc190 (SpicePlaybackChannel at 0x29bf340)>
signal: channel-new <spice_client_gtk.RecordChannel object at 0x28cc280 (SpiceRecordChannel at 0x29c0a90)>
signal: channel-new <spice_client_gtk.CursorChannel object at 0x28cc2d0 (SpiceCursorChannel at 0x29c1ed0)>
signal: channel-new <spice_client_gtk.DisplayChannel object at 0x28cc320 (SpiceDisplayChannel at 0x29c3710)>
*** GC ***
signal: open-fd <spice_client_gtk.DisplayChannel object at 0x28cc320 (SpiceDisplayChannel at 0x29c3710)>
signal: open-fd <spice_client_gtk.CursorChannel object at 0x28cc2d0 (uninitialized at 0x0)>
signal: channel-new <spice_client_gtk.InputsChannel object at 0x28cc5a0 (SpiceInputsChannel at 0x2a5a010)>
signal: open-fd <spice_client_gtk.InputsChannel object at 0x28cc5a0 (SpiceInputsChannel at 0x2a5a010)>
callback: set_fd <spice_client_gtk.DisplayChannel object at 0x28cc320 (SpiceDisplayChannel at 0x29c3710)>
callback: set_fd <spice_client_gtk.InputsChannel object at 0x28cc5a0 (SpiceInputsChannel at 0x2a5a010)>
Whenever a channel object is created, VMNetX receives a callback via the SpiceSession
channel-new signal. The
channel-new handler registers callbacks against multiple GObject signals, including
open-fd. Except for the DisplayChannel, VMNetX doesn't retain a reference to the channel objects: we only need them in the signal handlers, and signal handlers always receive a reference to the object that emitted the signal.
Notice this line:
signal: open-fd <spice_client_gtk.CursorChannel object at 0x28cc2d0 (uninitialized at 0x0)>
Garbage collection is running while the cursor channel is being set up, and is somehow destroying reference A. That would explain why a
yum upgrade started the problem: the Python GC runs after a certain number of allocations, and if one of the depended-on libraries slightly changed its allocation pattern, GC could now be running at a different time. That would also explain why update checking or bytecode compilation fixed the problem: those are both large tasks that would have significantly changed GC timing.
Digging into signal registration
Normally registration of callbacks would be done with the GObject
However, when SpiceChannel subclasses GObject, it overrides this method to perform a completely different function: initiating a TCP connection. (In C, there is no conflict because methods are just functions, and their names include the class name. In Python, of course, the subclass method overrides the superclass one.) Luckily, there's another method,
connect_object(), which is not overridden.
connect_object() is simply
connect(), plus a mechanism for replacing the object passed to the callback function (normally
channel) with a different one. So we can equivalently register the callback this way:
channel.connect_object('open-fd', self._request_fd, channel)
The registration is stored in the GObject as a closure. The closure, of course, maintains a reference (C) to the Python object that will be passed to the callback function.
Aha! A reference cycle!
PyGObjects are registered as containers, so they are visible to the Python GC. The GC doesn't know about the underlying GObject -- it's not a Python container -- but it does know about the references to Python objects stored inside the GObject's closures, because the PyGObject tells the GC about them during traversal. (If it didn't, reference cycles involving closures would be uncollectable.) So from the GC's perspective, the landscape looks like this:
Thus, the GC decides the PyGObject is unreferenced and clears its outbound references:
If the GObject was a Python object, its outbound references would be cleared as well. But it isn't, and the GC doesn't know about it, so the PyGObject remains accessible and is not destroyed. Thus, when the
open-fd handler is called, it receives a half-destroyed channel PyGObject which is no longer linked to the GObject. When it tries to use this PyGObject, it receives a
TypeError, which it then swallows as a workaround for a race condition in spice-gtk:
def _request_fd(self, chan, _with_tls):
# Channel is invalid because the session was closed while the
# event was sitting in the queue.
The result? VMNetX silently continues running, but the cursor channel is never connected. The session mostly works, but the cursor shape and position data never get to spice-gtk.
What can we do about this?
One option would be for VMNetX to keep a list of channel PyGObjects for connected channels. Since these PyGObjects would be reachable from outside the GObject/PyGObject pair, they would not be GC'd. We could register to receive a signal when the channel was torn down, and then remove the channel from the list.
A second option, oddly, is to add a line like this to the
This creates (on demand) the
__dict__ dictionary that stores Python attributes on the object. (Equivalently, we could assign a value to some channel attribute, e.g.
channel.dummy = None.) This tells the PyGObject library that it is not safe to tear down and recreate the channel PyGObject, because it has important state (the Python attributes) that must be preserved. In this case, PyGObject will convert reference B to a GObject toggle ref, which is maintained as a strong reference whenever the GObject refcount is greater than 1, and a weak reference otherwise.
That works, but involves too much magic.
(Why doesn't PyGObject always use toggle refs? A design concern was to permit GObjects to be referenced simultaneously from Python and another high-level language. Toggle refs don't work if there are multiple language bindings with proxy objects, because the GObject refcount will never go to 1 and so the bindings' proxy objects will never be destroyed. The compromise was to use toggle refs only when it's clearly unsafe to tear down the proxy object.)
A third option (also known as the "right" option) is to stop using
channel.connect_object() and call the superclass method directly:
GObject.connect(channel, 'open-fd', self._request_fd)
In this case, unlike with
connect_object(), the closure does not include a reference to the PyGObject.
Instead, the PyGObject is accessed via weak reference B whenever it is needed by a callback function. If the PyGObject is dead, a new one is created. That's exactly what we want.
The VMNetX bug was to believe that
channel.connect_object(..., channel) were equivalent. Their functionality is equivalent, but their refcounting behavior is not. Of course there's a bug in the PyGObject library as well, since it shouldn't be possible for a PyGObject to get into a half-destroyed state.
It turns out that the PyGObject refcounting system has been through several iterations over the years, as various refcounting issues were discovered and fixed. This particular issue was first discussed and reported in 2008, but due to an error in the provided test case, the bug was erroneously believed to be fixed. Included in the bug report is a patch to fix the issue by allowing the GC to traverse the GObject's closures only when its refcount is 1 -- that is, when there are no references other than the one held by the PyGObject. We have updated the report to indicate that the bug still exists in the current version of PyGObject.
We Python programmers like to think the language releases us from having to think about low-level memory management -- not so much! The low-level issues are still there in the background, and can pop up to trap the unwary. In this case, we naturally assumed that the bug was either in our code or in spice-gtk, and only gradually realized that the problem was a refcounting bug deep within PyGObject. After all, if there had been such a bug, surely it would have been fixed by now! Meanwhile, the unusual
connect_object() argument list was so self-evidently correct, and so obviously unrelated to the problem, that it wasn't seriously considered as a cause.
In our experience, technical problems of this subtlety are not uncommon in archival virtualization. When virtualizing an older operating system or application, the first attempt will sometimes "just work" -- but if it doesn't, tracking down the problem can require careful investigation of the archived application, supporting libraries, guest and host operating systems, emulated peripherals, virtual machine monitor, and our own infrastructure!
The fix for this bug is included in VMNetX 0.4.4.
Appendix: Debugging the problem
We tracked down the problem by:
- Adding a bunch of
print statements to VMNetX code,
- Using gdb breakpoints to intercept the creation of important objects and obtain their addresses,
- Setting gdb watchpoints on the objects' refcounts and inspecting the stack when they changed, and
- Digging through the spice-gtk and PyGObject sources.
In hindsight, enabling GC debugging would have made this a lot easier:
gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_OBJECTS)
but we didn't do that until fairly late in the process.