By Thomas Leonard - 2014-12-11
When tracking down bugs or performance problems, or just trying to understand how something works, it is often useful to look at a trace of the execution of your unikernel.
As a motivating example, we'll track down a (real, but now fixed) bug in MirageOS's TCP stack.
Here's a small program that demonstrates the problem:
let target_ip = Ipaddr.V4.of_string_exn "10.0.0.1"
module Main (S: V1_LWT.STACKV4) = struct
let buffer = Io_page.get 1 |> Io_page.to_cstruct
let start s =
let t = S.tcpv4 s in
match_lwt S.TCPV4.create_connection t (target_ip, 7001) with
| `Error _err -> failwith "Connection to port 7001 failed"
| `Ok flow ->
let payload = Cstruct.sub buffer 0 1 in
Cstruct.set_char payload 0 '!';
match_lwt S.TCPV4.write flow payload with
| `Error _ | `Eof -> assert false
| `Ok () ->
S.TCPV4.close flow
end
This unikernel opens a TCP connection to 10.0.0.1, sends a single "!" character, and then closes the connection. Most of the time it works, generating output similar to the following:
ARP: sending gratuitous from 10.0.0.2
Manager: configuration done
ARP: transmitting probe -> 10.0.0.1
ARP: updating 10.0.0.1 -> fe:ff:ff:ff:ff:ff
main returned 0
Occasionally, however, it hangs after getting the ARP response and doesn't send the data:
ARP: sending gratuitous from 10.0.0.2
Manager: configuration done
ARP: transmitting probe -> 10.0.0.1
ARP: updating 10.0.0.1 -> fe:ff:ff:ff:ff:ff
To enable tracing, pass the optional tracing
argument to the register
function in config.ml
.
For example:
open Mirage
let main = main "Unikernel.Main" (stackv4 @-> job)
let stack console = direct_stackv4_with_default_ipv4 console tap0
let tracing = mprof_trace ~size:1000000 ()
let () =
register "example" ~tracing [
main $ stack default_console;
]
The size argument gives the size in bytes of the ring buffer to use.
When you run mirage configure
, you will probably be prompted to install a version of Lwt with tracing enabled; just run the opam pin
command provided.
This will automatically trigger a recompile of all the MirageOS libraries with tracing enabled.
To view the trace data, use mirage-trace-viewer. If you don't want to compile from source, there are pre-compiled binaries for Linux-x86_64 and Linux-armv7l (e.g. for the Cubietruck). You can get them with 0install:
sudo apt-get install zeroinstall-injector
0install add mirage-trace-viewer http://talex5.github.io/mirage-trace-viewer/mtv.xml
If you compiled your MirageOS program as a Unix process, the trace data will appear in a file called trace.ctf
(you can view the trace while the process is still running).
To view the trace using the GTK viewer, use:
mirage-trace-viewer trace.ctf
If you don't have GTK, you can use mirage-trace-viewer --html=htdocs ...
to create an htdocs
directory with a JavaScript viewer.
The files can also be read by other CTF readers, such as babeltrace, with the appropriate metadata file.
To get the trace data from a Xen unikernel, run mirage-trace-viewer -d NAME
as root.
I use a wrapper script for this (~/bin/sudo-mtv
):
#!/bin/sh
exec 0launch -w sudo mirage-trace-viewer "$@"
To dump the trace buffer from the Xen domain example
to the file trace.ctf
:
sudo-mtv -d example -w trace.ctf
To fetch trace data from a remote Xen host (e.g. a Cubietruck board) and view it on your laptop:
ssh mirage@cubietruck sudo-mtv -d example -w - | mirage-trace-viewer -
Tip: If your Xen guest crashes before you can read the data, put these lines in your .xl
file:
on_crash = 'preserve'
on_poweroff = 'preserve'
Here's the trace for a successful run (where the message was transmitted):
Use your mouse's scroll wheel (or the buttons at the bottom) to zoom.
Time runs left to right:
gntref
counter (the red line) increases rapidly during this time.
gntref
metric then goes up again as the network code shares its buffer with dom0.
tcp-to-ip
counter goes up here (to 1), showing where the single-byte packet is passed from the TCP buffer to the IP layer for transmission.
Horizontal black lines are Lwt threads. White regions indicate when the thread was running. Vertical black lines indicate threads creating new threads or merging with existing ones. Arrows show interactions between threads:
Lwt.wakeup
).
Libraries can annotate threads with labels, which makes reading the diagrams much easier. If a thread doesn't have a label, its unique thread ID is displayed instead.
For more information about reading the visualisation, see the blog post Visualising an Asynchronous Monad.
To find the problem, we can compare a good trace and a bad trace:
Above: a trace from a successful run. Below: a trace from a failed run.
Normally, when a thread doesn't do anything it is drawn only as a tiny stub in the display.
To make the problem easier to spot, I modified the test program to call MProf.Trace.should_resolve
on the program's main thread, which adds a hint that this thread is important.
The viewer sees that it didn't resolve, and so draws it in red and extending to the far right of the display.
Looking at the good trace, three important threads are created:
TCP connect
- it will highlight in yellow as you type.
Soon after creating these threads, the unikernel received an event on port-4, which is the network event channel.
The Netif
driver determined this was an ack that the ARP request had been sent, and resolved the ring.write
thread, which then triggered the page of memory containing the request to be unshared.
This then triggered the next step in the process, which was to start waiting for the "ARP response" condition to fire, creating an "ARP response" "task" thread.
Another event then arrived on port-4, which was the ARP response notification.
This resolved the "ARP response" thread, which allowed us to start the TCP connection (sending the SYN packet).
Once the remote end had ack'd the SYN, the TCP connection was ready (the TCP connect
thread ends) and we sent the data, increasing the tcp-to-ip
counter.
In the bad case:
MProf.Trace.should_resolve
) appears as a long red line with should-resolve thread never resolved
at the end.
try
thread waiting on Wait for ARP response
(tip: to avoid losing track of where you are, you can double-click a thread to highlight it).
Following the ARP response task thread backwards, we can see that the order of the events was different. In this case, the first event from the network is not the confirmation that the ARP request has been sent, but the notification of the ARP response. You can see that the network code now first notifies the "ARP response" condition (but nothing is waiting for it, so this is ignored). Then, confirmation of the request being sent arrives, ending the "ring.write" thread. This triggers us to start waiting for a notification from the "ARP response" condition, but this will never arrive because the event has already happened.
Indeed, the bug is in this code in arpv4.ml
:
let cond = MProf.Trace.named_condition "ARP response" in
(* printf "ARP query: %s -> [probe]\\n%!" (Ipaddr.V4.to_string ip); *)
Hashtbl.add t.cache ip (Incomplete cond);
(* First request, so send a query packet *)
output_probe t ip >>= fun () ->
Lwt_condition.wait cond
One solution here would be to call Lwt_condition.wait
before waiting for the result of output_probe
.
However, a better solution (the one actually adopted) replaces the use of a condition with a simple wait thread.
A wait thread can only be resolved once, so it doesn't matter whether you check it before or after it gets its value.
You might notice that many of the threads are unlabelled, as we haven't fully instrumented all the MirageOS libraries.
To add extra annotations, clone the appropriate library with Git, add some extra reporting, and use opam pin
to build against your version.
If you think your annotations would be generally useful, please send a pull request.
In general, any code that uses Lwt.wait
, Lwt.task
, Lwt_condition.create
, or Lwt_mvar.create
should be changed to use the corresponding labelled version in mirage-profile (e.g. MProf.Trace.named_wait
, etc).
When Lwt is compiled without tracing support, these labels will be optimised out and have no runtime cost.