and developer guides
Attendees: Anil Madhavapeddy (chair), Hannes Mehnert, David Kaloper, Thomas Leonard, Jeremy Yallop, Magnus Skjegstad, David Sheets, Justin Cormack, Mindy Preston, Richard Mortier, Thomas Gazagnaire.
There were various issues around duplicate acks and TCP retransmission which were exposed due to the TLS stack integration. Many of these were just regressions or lurking issues due to Lwt mvars, and are now fixed.
To stop them from coming back, we now have tests that run per PR within
Travis. This uses vnetif to create virtual interfaces that directly short
circuit the need for a real
tuntap device, and so happily work great inside a
container. Coverage is patchy at the moment but is steadily improving (see
for coverage instructions).
ThomasG/L have put in debug logging so we now have full trace viewer capability. When ThomasL looked at it, every connection ends with an exception being thrown, that noone had noticed before! (The stack resolved a thread in RST processing and then looped again). That issue is now fixed, but everyone is encouraged to use the browser profiler and find other lurking issues.
Hannes has a TCP/IP test harness and will generate traces based on Peter Sewells Netsem. This has not been used much since 2005, but is being modernised for testing against Mirage TCP/IP.
Magnus and Mindy have written some iperf tests using the virtual vnetif interface. Travis is timing out all the time due to very variable performance within their infrastructure, so its hard to figure out how much to test. Anil suggested that we run them from cron against a repo like mirage/is-mirage-broken.
The next feature that we are aiming for is to get IPv6 working with the stack. Nicolas Ojeda Bar has implemented everything needed, but the only thing blocking it is the configuration interface (which should be the easiest bit). Hannes, Justin and many others are keen on this...
#### Mirage 2.5 release
This was a very complex release due to the growing number of libraries that we have in the project. It all went well this time, but Anil suggested recapping what went right and wrong in the release this time. A poll around the team revealed:
The blog posts were very close to the actual libraries releases, and it was hard to predict when something would work without a beta cycle. Made schedulig the posts challenging.
We do not explicitly document API breakage for end users as we go along, and so it had to be pieced together from the changelog. This is getting more painful for users as we grow in size and have more production infrastruture.
Do not release on a Friday afternoon and then go to the pub (or in the case of ThomasG, a wedding)
The library changes are still happening in big chunks. This is partly due to the fact that some of the core Channel APIs were revved. We are getting better at testing reverse dependencies, but this still needs some infrastructural help from OPAM to do bulk builds after a large set of releases.
The use of a mirage-dev remote generally
worked successfully. Unfortunately, remotes in OPAM are global and not per-switch,
and Anil pointed out that it would be nice to have some switches that were pristine
upstream. This is possible when using just
opam pin, but not with a remote.
ThomasG: Mirage is a set of libraries that work together and a frontend tool that glues
them together. Its fine to release libraries as a batch since we have OPAM, but what we
didnt manage well is evolving the API of the Mirage DSL itself which glues it all together
(Anil: this is referring to the
Mort: the Mirage DSL eis an implicit collation of a bunch of library versions and it is
hard to track since its not captured in OPAM.
ThomasG: we can fix this by adding conflicts in the OPAM metadata.
ThomasL: a number of Mirage packages have gone upstream into the OPAM package repository and their unit tests fails. More testing on OPAM import is needed to prevent dependent packages from breaking their unit tests due to an import of a dependency. DavidS: we only test the package version we are importing and so we only test for one solution. Further changes will break upstream. We dont do reverse dependencies for OPAM dependency tests. Anil: The OPAM maintainers (several of whom are on this call) are aware of the issue and are working on improving testing reverse dependencies on new package import.
Mort/DavidS: the Mirage libraries should use the ocaml-travis-ci-skeleton so that they take advantage of the improvements in reverse depenedency testing. Mort: we need to figure out how to get around the Travis 50 minute limit.
Irmin/Xen is at the pull request stage, and is green! ThomasG will do some more testing, but is confident that we can merge it in now that the Xen/TLS changes are all in. It is still memory-only, so we will need to put together a block device store for persistence (perhaps using baardskeerder or LevelDB.
Log files: David Sheets and Jeremy Yallop have a design for a logging library that they are planning to write up soon. This needs to be coalesced with the recent logging work in the TCP/IP stack. One of the backends could be Irmin/Xen, but optional of course. Hannes notes that the Bitcoin pinata uses a different logging system with a dom0 proxy, which works fine if you control the Xen host (not the case with EC2).
Live stats: are now working on the mirage-www website thanks to Dave Scott and are very fancy! Anil encourages everyone to not put functionality directly into the mirage-www repo, but to create a small library with it instead.