Category Archives: NSP

QOS and IP Accounting with BGP under linux

At NSP we’ve got a fibre connection into the building and a 10Mbit feed from our ISP, and over that we’re allowed 10Mbit of national and 3Mbit PIR of international traffic. Note that this adds up to more than 10Mbit in total! This can cause annoying problems, like someone doing a lot of national or APE traffic at 10Mbit and crowding out real international traffic. For a long time I’ve wanted to separate this out, but have not had the time to look into it.

This week I finally organised a BGP feed from my ISP, and had a look at what my options were. I’d seen the Route-based QOS mini-HOWTO a while back, and it looked like it would work ok, but had a few problems. There’s currently no way to apply tc or iptables rules selectively based on a routing decision, or even on a route table. You can match on a route realm, however. The mini-HOWTO suggests copying your BGP routes into a separate table and into a realm at the same time, and then using the realm matching code in tc and iptables.

A quick aside: route realms are best described as a collection of routes. The decision as to which realm a route is placed is made by the local administrator, and each realm can contain routes from a mix of origins. Realms are used to allow administrators to perform bulk operations on large groups of routes in an easy manner. From the iproute command reference:

The main application of realms is the TC route classifier [7], where they are used to help assign packets to traffic classes, to account, police and schedule them according to this classification.

After a bit of digging, I found a link to a patch for quagga to provide route realms support. It’s even still maintained! After a bit of battling with autotools[1], and a bit of battling with linux capabilities[2], I had it up and running.

The route realms patch page covered off the BGP configuration I needed, and now I have a set of iptables counters for national, international and total traffic (for completeness). The only bit it doesn’t cover off is graphing, but we already have a set of perl scripts which pull information from interface totals or iptables FWMARK counters, so I modified that to pull from these counters as well, and set up RRD graphs. I was previously graphing interface totals out the external nic, and it’s interesting to note that the iptables “total” traffic, while adding up to the sum of national and international, does not correspond to the interface totals.
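The counters described above can be sketched roughly as follows. This is a minimal sketch, not the rules from the patch page: the realm IDs (10 and 20) and the chain name ACCT are assumptions made up for illustration, and it only shows the basic `--realm` form of the iptables realm match. Depending on whether you want to count by source or destination realm you may need the value/mask form described in the iptables documentation. The script prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch only: realm IDs and the chain name are hypothetical.
# Prints the commands by default; set DRY_RUN=0 to run them (as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_realm_accounting() {
    national=10        # hypothetical realm ID for national routes
    international=20   # hypothetical realm ID for international routes

    # A dedicated chain keeps all the counters in one place.
    run iptables -N ACCT
    run iptables -A FORWARD -j ACCT

    # Rules without a target (-j) only count packets and bytes.
    run iptables -A ACCT -m realm --realm "$national"
    run iptables -A ACCT -m realm --realm "$international"
    run iptables -A ACCT   # total traffic, for completeness
}

setup_realm_accounting
```

The byte and packet counters can then be read with `iptables -L ACCT -v -n -x` and fed to whatever graphing scripts you already have.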

It’s worth pointing out that, as seen in the iproute command reference, the rtacct tool will grab realm counts for you without needing iptables, so if you just want something to graph things quickly, rtacct might do the job:

#kernel
Realm      BytesTo    PktsTo     BytesFrom  PktsFrom
           BPSTo      PPSTo      BPSFrom    PPSFrom
unknown    5949K      57188      15839K     61776
           0          0          0          0
national   15839K     61776      5949K      57188
           0          0          0          0

rtacct has a naive limit of 256 realms however, whereas the actual implementation supports a 16 bit number, so if you have a large number of realms, or you autoclassify your inbound BGP into realms based on the AS number, you will have to use iptables only.
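If you do want to graph from rtacct, its output needs a little massaging first. The following is a rough sketch of a parser that turns the output shown above into one line per realm with the cumulative byte counters. The function name is made up, and it assumes the K and M suffixes mean multiples of 1024, which is worth checking against your version of iproute.

```shell
#!/bin/sh
# Sketch: reduce rtacct output to "realm bytes_to bytes_from" lines.
# Assumption: the K and M suffixes are multiples of 1024.
parse_rtacct() {
    awk '
        function expand(v,    n) {
            n = v + 0                       # numeric prefix, e.g. 5949 from "5949K"
            if (v ~ /K$/) return n * 1024
            if (v ~ /M$/) return n * 1024 * 1024
            return n
        }
        # Skip the "#kernel" line and the two header lines.
        /^#/ || $1 == "Realm" || $1 == "BPSTo" { next }
        # Counter lines have five fields; the rate lines below them have four.
        NF == 5 { printf "%s %.0f %.0f\n", $1, expand($2), expand($4) }
    '
}

# Usage: rtacct | parse_rtacct
```

Each output line can be passed more or less directly to an RRD update.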

I’m currently only accounting for traffic using this mechanism, but I can also do QOS on it – tc will match directly on realm tags, and any iptables based match systems you may have can be adapted to match on a realm as well.
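As a rough idea of what the QOS side could look like, here is a sketch using HTB and the tc route classifier. None of this is a tested configuration: the device name, realm ID, class IDs and the 7Mbit guaranteed rate for national traffic are assumptions for illustration, and only the 10Mbit and 3Mbit figures come from the limits mentioned earlier. It prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch only: device, realm ID, class IDs and the 7mbit rate are hypothetical.
# Prints the commands by default; set DRY_RUN=0 to run them (as root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_realm_qos() {
    dev="${1:-eth0}"
    international=20   # hypothetical realm ID for international routes

    # HTB root: a 10Mbit parent with one class per traffic type.
    # Anything not matched by a filter falls into class 1:10.
    run tc qdisc add dev "$dev" root handle 1: htb default 10
    run tc class add dev "$dev" parent 1: classid 1:1 htb rate 10mbit
    run tc class add dev "$dev" parent 1:1 classid 1:10 htb rate 7mbit ceil 10mbit
    run tc class add dev "$dev" parent 1:1 classid 1:20 htb rate 3mbit ceil 3mbit

    # Route classifier: packets routed to the international realm go to 1:20.
    run tc filter add dev "$dev" parent 1: protocol ip prio 100 route to "$international" classid 1:20
}

setup_realm_qos eth0
```

Capping the international class at 3Mbit while letting national traffic borrow up to the full link is one possible split; the right numbers depend on your contract.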

[1] The realms patch touched configure.ac, which then required the autotools chain to rebuild everything, but it needed a very particular combination of autoconf and automake. Because it took me an hour or so to get this right, I’ll record it here:

patch -p1 < ../quagga-0.99.5-realms.diff
aclocal-1.7
autoheader
autoconf
autoconf2.50
libtoolize -c
automake-1.7 --gnu --add-missing --copy
./configure --enable-realms --enable-user=quagga --enable-group=quagga --enable-vty-group=quaggavty --enable-vtysh --localstatedir=/var/run/quagga --enable-configfile-mask=0640 --enable-logfile-mask=0640

autoheader and autoconf above are version 2.13. I have no idea why I had to run autoconf2.13 then autoconf2.50, but it seems that this actually worked.

[2] I initially tried building against quagga-0.98.6, because the quaggarealms patch site implied this was the “stable” version, but it seems that quagga drops privileges too soon. This works out fine if you have “capabilities” support in your kernel, which mine didn’t. They’ve changed this behaviour in 0.99.5, and incidentally this is the version in debian etch.

Exporting Tape Autoloaders via iSCSI

A while ago I posted about exporting a tape drive via iSCSI to enable windows VMs to backup to a SCSI tape drive under Citrix Xenserver. I spent a couple of hours googling for whether or not you could do the same thing with a tape autoloader, and didn’t find a lot of useful information.

So, I just dived in and tried it, and it turns out exactly the same process works fine for exporting a tape autoloader via iSCSI as well, as long as you are slightly careful about your configuration file.

First of all, find your HCIL numbers with lsscsi:

[4:0:0:0]    tape     HP    Ultrium 4-SCSI    U24W    /dev/st0
[4:0:0:1]    mediumx  HP    1x8 G2 AUTOLDR    1.70    -

So, we’ve got an HP Ultrium 4 tape drive on 4:0:0:0, and a 1x8 G2 Autoloader on 4:0:0:1. Let’s configure IETd:

Target iqn.2007-04.com.example:changer0
Lun 0 H=4,C=0,I=0,L=0,Type=rawio
Type 1
InitialR2T No
ImmediateData Yes
xMaxRecvDataSegmentLength 262144

Lun 1 H=4,C=0,I=0,L=1,Type=rawio
Type 1

A couple of points to note:

  • I’ve named it changer0, you don’t have to
  • You do have to make sure both the tape drive device(s) (in this case, 4:0:0:0) and the changer device (4:0:0:1) are exported as different LUNs under the same target
  • The other options (InitialR2T, ImmediateData etc) may or may not work for you, consult the IETd documentation for what you actually need and want.

Once you’ve restarted the iscsi target, you can load up an initiator and connect to it, and you should see both devices being exported under the one target. If you accidentally use a different target for the changer and the tape drive, you’ll find that your backup software can probably see the changer device, but will tell you there are no available drives.
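The mapping from lsscsi output to Lun lines is mechanical, so it can be sketched as a small script. This is a rough sketch with a made-up function name: it only emits the Target line and one rawio Lun line per tape or changer device, in the same H/C/I/L form as the configuration above, and leaves out all the other options.

```shell
#!/bin/sh
# Sketch: build the Target and Lun lines of an IETd stanza from lsscsi output.
# Only tape and medium changer ("mediumx") devices are exported, all under
# the single target name given as the first argument.
lsscsi_to_ietd() {
    echo "Target $1"
    awk '
        $2 == "tape" || $2 == "mediumx" {
            hcil = $1
            gsub(/\[|\]/, "", hcil)         # "[4:0:0:1]" becomes "4:0:0:1"
            split(hcil, p, ":")
            printf "    Lun %d H=%s,C=%s,I=%s,L=%s,Type=rawio\n", lun++, p[1], p[2], p[3], p[4]
        }
    '
}

# Usage: lsscsi | lsscsi_to_ietd iqn.2007-04.com.example:changer0
```

Because every matching device lands under the one target, the drive and the changer cannot end up split across targets by accident.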

Looking up .local DNS names under OSX

My workplace uses a .local DNS suffix for all internal DNS, which of course causes problems when you’re running a system which uses any form of mdns – such as OSX or Ubuntu (or probably any modern Linux distro, I know SuSE had this problem about 6 years ago). The .local lookups fail, because mdns takes over. (Thanks John and Phil for reminding me of this). This shows up as resolution via host or dig working fine, as they make calls direct to your nameservers, but commands like ping failing, as it uses the NSS to do the lookup.

A quick bit of googling, and I found this gem on Apple’s website, and also this one on www.multicastdns.org. Apple’s suggested fix didn’t seem to work, but I suspect a reboot is required. I’ve applied the second one, and rebooted, and one of them is definitely working.

As an aside, this started with me wishing that it was possible to do per-domain resolver configuration. I initially gave up and set up dnsmasq, which forwards requests for specific domains to specific servers, but then hit the mdns issue. This method looks very much like a per-domain resolver configuration however – it’s saying to use my local DNS server for .local lookups. I haven’t tested it, but it looks like it should support setting an arbitrary resolver for an arbitrary domain.
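For reference, OSX does have a per-domain resolver mechanism: files under /etc/resolver named after the domain, documented in the resolver(5) man page. Whether that is exactly what the linked pages describe is an assumption here. A minimal sketch of creating such a file, with a made-up function name and a placeholder nameserver address:

```shell
#!/bin/sh
# Sketch: write a per-domain resolver file as described in resolver(5) on OSX.
# The directory argument exists only so this can be tried somewhere harmless;
# it defaults to /etc/resolver, which needs root to write to.
write_resolver_file() {
    domain="$1"
    nameserver="$2"
    dir="${3:-/etc/resolver}"
    mkdir -p "$dir" || return 1
    printf 'nameserver %s\n' "$nameserver" > "$dir/$domain"
}

# Usage (192.0.2.53 is a placeholder for your internal DNS server):
#   write_resolver_file local 192.0.2.53
```

The same function with a different domain name would cover the arbitrary resolver for an arbitrary domain case.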

OSS Network Imaging / Install services

I’m very interested in the topic of network deployments of operating systems, specifically the various Microsoft OSs, as I can already install linux via PXEboot. There’s two main groups of software in this field – unattended or scripted installs, and imaged installs.

A while ago I found a tool called Unattended, which is a network based unattended installation tool for Windows. If it works, it looks very promising. It’s basically a DOS boot disk which mounts a network share and executes the windows installer. Simplicity. The basic install seems to require you to enter a number of responses to questions (such as administrator password, timezone and Microsoft product key), but the documentation explains how to customise the script to meet your business needs, including examples. Once the OS install is done, Unattended can be configured to install third party packages, as long as the packages (eg, MSI bundles) also support some level of unattended installation procedure.

Today I discovered Free Online Ghost, or FOG. FOG is a network based computer imaging tool, designed to both read images from, and write images to hosts on your network. I’ve used tools like partimage in the past for exactly this purpose – creating a golden image of a lab machine and then reimaging the entire lab every couple of months to keep everything clean. FOG seems to be more polished than partimage does, as it claims to support things like creating AD accounts for the machine and so on.

The Unattended documentation includes a concise explanation of why the approach adopted by FOG, partimage, and commercial tools like Acronis and Ghost is bad, however I think this is really a case of using the right tool for the job. I can see a system like FOG being used with great success in a lab environment, or for periodic backup of individual host OSes to near-line storage, providing bare-metal restore functionality without requiring major investment in tape backup expansion. And Unattended makes a lot more sense for initial deployments, especially for my workplace, as we use such a wide range of hardware that an imaged install would be fairly problematic.

There are other commercial systems for doing these deployments of course – IBM Director, HP ICE, Citrix Provisioning Server are just a few of them, but these systems invariably make more sense for in-house deployment control.

HP AiO iSCSI and Citrix Xenserver

A couple of our clients have HP AiO1200 iSCSI systems. These are nice enough units, especially for entry level iSCSI SANs. They’re in a slightly modified HP DL320s chassis, and run Windows 2003 Storage Server, as well as some custom built HP management tools.

I’ve never had an easy run when dealing with their iSCSI target and the open-iscsi stack used by Citrix Xenserver. The first problem I had was that the management tools don’t support multiple initiators connecting to the same target LUN. They don’t actually stop you, but it seems that you need to hold your tongue just right to allow multiple initiators connecting to the one target. If you don’t hold it just right, the admin tools will let you do it, but it won’t actually work.[1]

The second problem I had is that Xenserver just refuses to connect, saying “Your Target is probably misconfigured”. There’s not really a lot of configuration you can do with an iSCSI target, so I’m perplexed here. Digging deeper:

# iscsiadm -m discovery -t st -p 1.2.3.4
1.2.3.4:3260,1 iqn.1991-05.com.microsoft:storage-iqn.xen-osdata-target

It seems that iscsiadm can see everything fine! I tried adding the target via the cli:

# xe sr-create host-uuid=f3b260ab-f8b9-4b52-980d-7b7e93ab8dcf content-type=user name-label=AIO_OSDATA shared=true type=lvmoiscsi device-config-targetIQN=iqn.1991-05.com.microsoft:storage-iqn.xen-osdata-target device-config-target=1.2.3.4

Error code: SR_BACKEND_FAILURE_107
Error parameters: , The SCSIid parameter is missing or incorrect, \
< ?xml version="1.0" ?>
iscsi-target
LUN
vendor
HP
/vendor
LUNid
0
/LUNid
size
42949672960
/size
SCSIid
360003ff646c289389ea2e31c1d419930
/SCSIid
/LUN
/iscsi-target

Unlike the error messages you get in the GUI, that one is quite helpful[2]. It tells us we’re missing a SCSIid parameter, and then lists a SCSIid parameter to try:

xe sr-create host-uuid=f3b260ab-f8b9-4b52-980d-7b7e93ab8dcf content-type=user name-label=AIO_OSDATA shared=true type=lvmoiscsi device-config-targetIQN=iqn.1991-05.com.microsoft:storage-iqn.xen-osdata-target device-config-target=1.2.3.4 device-config-LUNid=1 device-config-SCSIid=360003ff646c289389ea2e31c1d419930

And our iSCSI target was happily added. I’m not entirely sure the LUNid parameter is required; this post suggests it isn’t. I found a couple of other posts on the forums which suggest that using the CLI for these tasks should be your first attempt.
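Pulling the SCSIid out of that error by eye is fine for one target, but it can also be scripted. The following is a sketch with a made-up function name, assuming each tag and each value sits on its own line as in the sanitised output above; it accepts the tag with or without its angle brackets.

```shell
#!/bin/sh
# Sketch: print the first SCSIid value found in the sr-create error output.
# Assumption: the tag and its value are on separate lines.
extract_scsi_id() {
    awk '
        # Opening tag, either "SCSIid" (sanitised) or "<SCSIid>" (raw XML).
        /^[ \t]*<?SCSIid>?[ \t]*$/ { grab = 1; next }
        # The line after the tag is the value; trim whitespace and stop.
        grab { gsub(/^[ \t]+|[ \t]+$/, ""); print; exit }
    '
}

# Usage: capture the error from the first attempt and reuse the ID:
#   scsi_id="$(xe sr-create ... 2>&1 | extract_scsi_id)"
```

The result can then be passed as device-config-SCSIid on the second sr-create attempt.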

[1] Where “work” means “actually let you connect more than one initiator at a time”
[2] Although, being full of less than and greater than signs, it doesn’t want to display nicely in wordpress. So it’s a bit sanitised.

Linux iSCSI stacks and multiple initiators per target LUN

I’ve used a few hardware based iSCSI stacks for Xenserver shared storage backends, but never had spare hardware to run up software based stacks. This is rather backwards from the usual way people would test things I guess, but it’s how it worked out for us.

However, we’re now getting some new hardware for internal use – a couple of frontend servers and a storage server apparently, and we’re going to use a software based iSCSI stack on it. We’ve had a look at some of the commercial offerings – SANmelody, Open-e etc, but I’d much rather not spend money where it’s not needed. This iSCSI backend is going to have one or two LUNs, shared to a static number of hosts, and that’s it.

I’d steered away from the various open-source iSCSI target stacks, because it wasn’t clear whether they supported multiple initiators accessing a single LUN concurrently. This surprised me somewhat – it seemed like it should just work, however we kept getting caught by people asking about the “MaxConnections” parameter for IETd, which sounds like it means “Maximum number of initiators to this LUN”, and has a rather depressing note beside it stating that the only valid parameter is “1” at this stage.

This didn’t sit right with me though – surely there are lots of people using fully opensource iSCSI systems. All the talk about iSCSI being a cheap (free!) SAN alternative can’t just be referring to people consolidating disks but still allocating single-use LUNs. I’ve found lots of references to people even talking about using software iSCSI targets with Xen as a shared storage backend.

And, of course, it’s not right[1]. The IETd “MaxConnections” parameter refers to the number of connections a single initiator can make with respect to a single target, which boils down to whether multipath IO is supported via the iSCSI stack or not. And it’s not, as far as IETd is concerned. This post to iscsi.iscsi-target.devel clears things up quite nicely, but it took me a damned long time to find. So, hopefully, this will help someone else answer this question.

1) multiple ini access different targets on same iet box at same time.
no data concurrency issue. the performance totally depends on your HW.
of course, IET can be improved to support large # of ini better

2) multiple ini access same targets on same iet box at same time. has
data concurrency issue here, so need a clsuter file system or similar
system at client side to coordinate.

3) one ini access different targets on one iet box. it will create
multiple sessions and no data concurrency issue here. performance issue
depends on HW.

all these are MS&OC/S (Multiple Sessions& One Connection per Session)

4) one ini access same target on one iet box.

it might try to use multiple connection in one session (MC/S, Multiple
Connection per Session), but iet doesnot support it and in parameter
negotiation, iet stick to MaxConn=1.

it might try to create multiple sessions with same target (still one
connection per session), which is allowed. usually this is controlled by
client software, for example, linux multi-path.

I read “multiple ini[tiator] access same targets on same iet box at same time” to mean exactly the problem I’m looking at, and the only caveat is the filesystem issue, which Xenserver deals with. And it clarifies the point about MaxConnections too.
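One way to see the sessions-versus-connections distinction on a running target is to count sessions per target. The sketch below assumes IETd exposes its sessions in /proc/net/iet/session as a "tid:N name:IQN" line per target followed by indented "sid:" lines per session and "cid:" lines per connection; treat that layout, and the function name, as assumptions to check against your own box.

```shell
#!/bin/sh
# Sketch: count iSCSI sessions per target from IETd's session listing.
# Assumed layout: "tid:N name:IQN", then "sid:..." per session,
# then "cid:..." per connection within that session.
sessions_per_target() {
    awk '
        $1 ~ /^tid:/ { target = substr($2, 6); next }   # drop the "name:" prefix
        $1 ~ /^sid:/ { count[target]++ }
        END { for (t in count) print t, count[t] }
    ' "${1:-/proc/net/iet/session}"
}

# Usage: sessions_per_target
```

Several initiators sharing one target should show up as a count above one, with each session still carrying a single connection.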

[1] That said, I haven’t tested this properly yet. I ran up IETd on a test box and connected OSX and linux to it concurrently, but while I could format the disk via linux I couldn’t mount it for some reason. OSX saw it fine. I’m not sure whether this was just some transient weirdness on my test boxes.

UPDATE: Matt Purvis emailed me to confirm that it does all work as expected. Thankfully. I hope other people find this post useful – if only because it means I’m not the only one that spent hours trying to find a definitive answer on this topic.

Week of Xenserver Bugs

Over the last week I’ve been required to fix four different bugs relating to Xenserver. Not all were major bugs, not all were even Xenserver’s fault.

DVD drive missing

The first bug, and actually one that first showed itself several months ago, is that the option to attach the server’s DVD drive to a VM was not present. This originally happened because the DVD drive in the HP C3000 Blade chassis died, and was replaced. Even after it was replaced, however, it wouldn’t show up in Xencenter. There are forum notes around on recreating the VBD and so on, but in this case that wasn’t even required – after reattaching the DVD drive via the Bladecenter ILO to the individual blades and confirming that the correct CD device appeared in dmesg output, I ran the command xe-toolstack-restart. This command, as you might guess, restarts the xenserver toolstack. The DVD drive now shows up in Xencenter. I’d actually logged a bug report with Citrix for this a while back, and so credit is due to the Citrix engineer that called me back on this issue and suggested trying xe-toolstack-restart before doing anything else.

Xencenter not connecting

The same day as fixing the above bug, I had another customer call me saying they couldn’t connect via XenCenter to their Xenserver Enterprise host. I’d had a similar issue several months ago when someone changed the networking configuration on the host, and the fix then was, as above, to run the xe-toolstack-restart command. All fixed! Well, in this case, the symptoms were fixed, we still don’t know what caused the underlying problem.

VMs not starting, ISO SR failing after upgrade

This one came through on the same day as well. One of our customers had run an upgrade from 4.0.1 to 4.1.0 on their own internal evaluation system of Xenserver Enterprise, which actually had a couple of production hosts on it. They’d run the upgrade and the ISO storage repository failed to reconnect, and a couple of VMs that had previously had ISO images mounted out of the SR failed to boot. Sadly, xe-toolstack-restart didn’t solve anything for me here.

There is a lot of functionality exposed via the CLI however, so I was able to force detach the ISO images from the VMs in question. They were in a suspended state however, so I had to manually force reset them. Once I had these fixed I looked at what caused the ISO SR to die.

One of the things that a lot of people misunderstand about Xenserver is that it is effectively an appliance. It runs CentOS as the dom0 (privileged domain), but that doesn’t mean you should consider it to be a useful CentOS server. The upgrade process for a Xenserver system is to duplicate the primary partition into a backup partition (copy /dev/sda1 into /dev/sda2, for example). Once this is done, it basically performs a full install of the new version of Xenserver into /dev/sda1, and migrates the settings it knows about – all the Xenserver state, your networking configuration (in theory anyway), and so on. Things it misses include any custom software you might have installed (iSCSI initiators for tape access, monitoring tools, any custom scripts) – these all get “deleted”. They’re still actually in the backup partition, just not in the active one.

The upshot of this is that when you connect your ISO SR to a CIFS share and use a hostname to refer to the server rather than an IP address, don’t “make it work” by adding an entry to /etc/hosts. If you want to use hostnames, make sure they work via DNS, and make sure your DNS is set up right on your Xenserver host.
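A quick pre-upgrade check along these lines is to see whether the server name you used for the SR appears in /etc/hosts at all. This is a rough sketch with a made-up function name and a placeholder hostname:

```shell
#!/bin/sh
# Sketch: succeed if a hostname appears as a name or alias in a hosts file.
# Anything found this way will not survive the upgrade described above.
in_hosts_file() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }     # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$file"
}

# Usage ("fileserver" is a placeholder for your CIFS server's name):
#   if in_hosts_file fileserver; then
#       echo "fileserver is in /etc/hosts; move it into DNS before upgrading"
#   fi
```

If the name is found there, sort out the DNS record first and remove the hosts entry, so the ISO SR still reconnects after the upgrade.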

I think there’s a lot Xenserver could have done to have prevented this bug from happening, so hopefully they’ll add some smarts to auto-detach VDIs from ISO SRs if the SR doesn’t connect properly. I’m not sure there’s a nice way to auto-migrate all the user’s settings (eg, do an in-place upgrade rather than an overwrite upgrade) – there’s too much scope for stuff to change.

Upgrade loses network settings on Xenserver

And now my final bugs, and the most annoying. We have a customer with a Xen Enterprise 3.2 host, with a Win2k3 terminal server and a Win2k3 SBS server on it, running their core business infrastructure. We’d scheduled an outage for the upgrade from 3.2 to 4.0.1 to 4.1.0, and it all looked good, except…

Xenserver network settings failed to migrate. Not sure why this happened; it definitely doesn’t seem to always happen. The xe pif-reconfigure-ip command is used in Xenserver 4.1.0 to reconfigure the IP stack on the host however, followed by a xe-toolstack-restart. My favourite command!
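For the record, the sequence looks roughly like this. It is a sketch rather than a transcript of what was run on the day: the function name is made up, the addresses in the usage line are placeholders, and it prints the commands by default instead of running them. The PIF uuid comes from xe pif-list.

```shell
#!/bin/sh
# Sketch: set a static management IP on a Xenserver host, then restart the
# toolstack. Prints the commands by default; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reconfigure_mgmt_ip() {
    pif_uuid="$1"; ip="$2"; netmask="$3"; gateway="$4"; dns="$5"
    run xe pif-reconfigure-ip uuid="$pif_uuid" mode=static \
        IP="$ip" netmask="$netmask" gateway="$gateway" DNS="$dns"
    run xe-toolstack-restart
}

# Usage (find the uuid with "xe pif-list"; the addresses are placeholders):
#   reconfigure_mgmt_ip <pif-uuid> 192.0.2.10 255.255.255.0 192.0.2.1 192.0.2.53
```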

Xentools won’t install in 4.1.0 system upgraded from 3.2.0

This one took up basically my entire day yesterday. After the upgrade from 3.2.0 through 4.0.1 and into 4.1.0, the VMs booted, but were running the old version of Xentools. The technician doing the upgrade attempted to install the new Xentools, however on both servers it got as far as uninstalling the 3.2.0 Xentools, and then failed completely to install the 4.1.0 version. We spent a lot of time going back and forth uninstalling and attempting to reinstall the drivers, before eventually completely uninstalling them and leaving the systems running without xentools for the afternoon. I then spent most of my evening on the phone to Citrix support in Australia, both looking at the site in question over a very laggy Gotoassist connection. We finally went through another complete uninstall of xentools, including removing all the hidden device drivers (see here for details), and then installed an internal release of Xentools for 4.0.1, which at least resolved the issue.

The bug appears to be within the Xentools, but it could also be within windows itself, or that’s what I understood from the Citrix engineer I was talking to. We are apparently the second documented occurrence of this bug, and Citrix is working on a final resolution. The Citrix engineer in question had managed to replicate the bug on one of his test systems, which is reassuring to me – they can prove they fix it, at least for some permutation of the problem.

Summary

It feels like I’m painting a bad picture of Xenserver here, and maybe I am. You can take what you like from what I’ve written, I guess :). I’m not sure that any company could push through as many major changes as quickly as Xensource/Citrix have and not end up with some showstopper bugs, but I think some of the smaller ones should have been avoidable. Others, like the xentools bug I mentioned last, only seem to affect older systems being upgraded, and even then it doesn’t always happen to them, and I don’t really think you can test for that sort of edge case very easily, especially if you don’t know it happens. I’ll post an update when Citrix resolve this last bug, so if anyone is reading this and is put off upgrading their XE 3.2 system, check back for an update!

Thoughts on OSX

I recently got a new laptop for work – the base model Macbook Pro. Slightly unfortunately, I got it about three weeks before the refresh, but I don’t really care about the fairly minor changes. The slight CPU speed bump isn’t really worth worrying about, although the new penryn based chip might have been worth it, and the disk and VRAM bumps aren’t anything I care about. The multitouch trackpad sounds cool, but I’m not sure how much use it would have been anyway.

I’ve spent the last few weeks getting used to OSX and its quirks, and figured I should write up my thoughts on it. I’ve been using linux for about 11 years now, and it’s been my primary OS on the desktop/laptop for at least the last seven. So, I’m pretty used to how you’d do things under linux, and while people keep making claims like “OSX is just FreeBSD under the hood anyway”, that’s not really much help to me. FreeBSD and linux are different under the hood; and OSX is different above the hood – Aqua is not X.

The little things

OSX may be FreeBSD-like under the hood, but that doesn’t help a long-time linux user very much. There’s so many little differences, none of which are massive, but which take a little while to get used to. For example: you can’t mix option and non-option command line arguments: ‘chmod g+w foo -R’ is not the same as ‘chmod -R g+w foo’; /sbin/route doesn’t exist at all – you can use netstat instead of course. None of these are major, they’re just little things to get used to.

Installing Applications

OSX is still different to FreeBSD – there’s no ports system there. So, OSX doesn’t ship with wget, just curl, but I’m used to using wget. I can install a ports system, and use that to install wget, which is actually fine… but then I try to do the same for subversion, and spend half a day compiling libraries, before giving up on that and doing a quick search for ‘subversion dmg’ on google. I like being able to use apt-get (or even yum, although I like yum much less) to install arbitrary software quickly and easily. I’m sure that using ports is much less tedious on a system which is built using ports and already has a much wider range of libraries and build-related packages installed, but it just feels clunky on OSX. My slow DSL at home isn’t helping either.

That’s only one aspect of installing applications however. Using .app bundles is in many ways a better way of managing applications than the standard approach of installing them into a common path. Want to install an app? Drag it to your Applications folder. Want to remove it? Drag it to the trash. Or use the cli if you really care. The best approximation under a traditional linux/unix system would be to install the entire application into its own tree under /usr/local or /opt, and there are systems like Zero Install under linux which aim to do something similar. This framework isn’t new to OSX of course, it’s been round for years.

It’s a slightly more user-centric way of doing things however, and I’m not sure how well it’ll work out in a shared environment. At worst it’ll probably mean that everyone ends up with their own versions of apps stored under their home directory, which tends to happen anyway in shared environments.

The menu bar

I’m not even sure what this is really called. Under OSX (and most previous versions of MacOS, I think), the application menu bar has been detached from the application window itself. The app menu resides at the top of the screen, always, no matter where your application window happens to be at the time. I kind of like this idea, but it seems like it’ll fall down in multi-head systems, as the menu bar is tied to one display only, whereas you may want your application on the other display. I’m not running multihead at the moment, so this doesn’t bother me too much.

Exposé

This is a fantastic innovation, and is a much quicker way of navigating through a pile of open windows. Exposé basically shrinks all open windows so that they all fit on the screen at once – you then select one, and they all resize, with the selected one at the front. If you haven’t seen it before, check this video. There is work on doing something similar with Compiz or Beryl under linux / X, but the last I looked it was nowhere near as polished as this.

Spaces

Spaces is a new feature in Leopard that brings virtual desktops to OSX. My laptop came with Leopard, so the first thing I did was set up spaces and assign keyboard shortcuts. I really can’t work without virtual desktops, so much so that when I installed Tiger onto a separate boot disk for some development work that required it, I immediately looked for a third-party addon that provided virtual desktops to Tiger – there’s a few of them round, I ended up using virtue. Spaces and Exposé integration is also very cool, and is a feature I find myself using a lot. (if you don’t know what virtual desktops are, google it ;). On its own, Spaces just levels up the playing field between X and Aqua in my terms – but Spaces and Exposé together take it to a whole new level.

The Dock

The Dock definitely isn’t a new concept – it’s something that’s been round on various OSes in various forms for a long time. The OSX Dock is definitely easy to use, but I’m not sure if it’s better or worse than anything else. It’s just different, perhaps. I quite like the Documents and Downloads stacks that are new in Leopard – if you haven’t seen them, they’re blow-up windows of the contents of the respective folders, which makes it easy to access them. Of course, if you have a bajillion files in your Documents folder, I’m not sure it’ll be much use to you.

Finder

As far as file managers go, Finder is pretty good. I tend to not use file managers very much, but if I was forced to only use a file manager for filesystem interaction, I think you could do a lot worse than using Finder.

Spotlight

Spotlight is the OSX “search” tool. Other than an irritating tendency for spotlight to bog down your system when it insists on scanning a newly inserted external harddisk, I quite like it. It’s easy to find binaries (eg, what’s the OSX graphical tool for partitioning disks? Start typing disk into spotlight…), as well as searching through your documents and emails. There’s an OSS equivalent called Beagle, which is part of the GNOME project I believe, however I never cared enough to make it work properly, although I did care enough to get rid of it on at least one occasion, where the Beagle cache files consumed about 6 GB of my 10 GB /home partition.

Terminal

I spend a lot of time using terminals – most of my work is done on remote servers via ssh. I went through nearly every terminal program available on linux, and nearly always ended up sticking with xterm or rxvt. More recently I adopted KDE as my linux desktop environment, and just stuck with Konsole. I found Terminal.app to be pretty good however, and with the addition of Visor, which drops a system-wide Terminal window much like the in-game Quake console menu, I’d have to say I’m happy. Other than the point below:

Copy and Paste, and Selection buffers

After moving to OSX I discovered how much I make use of the selection buffer in X Windows. If you don’t know what I mean by this – under X Windows, if you select a block of text with your mouse, it’s immediately available to paste (typically) via a middle-mouse click. No need to hit ctrl-c, or to find the edit menu. Just select then paste. Under OSX, this only works inside Terminal.app, and even then it only works within the same Terminal window. I’m just going to have to put up with not being able to do this, because there isn’t really any way around it. From an efficiency point of view, it probably doesn’t save that much time compared to right-click copy or tapping Apple-c on the keyboard, but I still notice it, even after several weeks of getting used to it.

Different shortcut keys

OSX has the Apple (command) key, as well as ctrl and option (alt). Some of the shortcuts you’re used to still use ctrl (eg, in Terminal, ctrl-c to cancel a program, ctrl-d to send an EOF), and most of them now use the Apple key (Apple-w to close a tab, not ctrl-w, etc). This has been fairly easy to get used to, however I normally use a Microsoft Ergonomic 4000 keyboard when at my desk, which obviously has a different key layout to the macbook pro keyboard. To make matters worse, when I’m at the office in Auckland, I use a different keyboard again. Not OSX’s fault, of course.

Remote Desktop

I’m forced to use Remote Desktop to a windows terminal server for work – we have an Exchange server, and I need to use outlook. We also use a CRM/ERP webapp which requires Internet Explorer. IE for Mac doesn’t cut it at all, and I haven’t got a copy of Office 2008 for Mac yet (and I’ve heard bad things about Entourage too). There’s no Apple RDP client, but MS have a version that is fairly buggy and annoying to use – nowhere near as nice as mstsc under windows or rdesktop under linux. I found a GPL app called CoRD which I quite like. It doesn’t support as many features as the MS client yet, but it’s doing pretty well.

Safari

I’m not sure what it is, but I just can’t bring myself to like Safari. Thankfully the new Firefox 3 Betas are showing serious performance and integration improvements under OSX, so I can just use those instead. Maybe I’ll give Safari another shot later.

Bootcamp

Bootcamp is the OSX bundled tool for dualbooting into windows. Once you’ve set up a bootcamp windows install, you can also use Parallels or VMWare Fusion to boot windows as a VM. I set up windows a couple of weeks ago, but haven’t really used it yet, and the only reason I installed it was because I wanted to play Neverwinter Nights 2, which has no OSX version. I still haven’t played it. The tool itself is quite nice – it resizes your partition on the fly, no reboot needed.

Booting from external disks

OSX has always (I think) let you boot from external disks, which has made it easy to do system upgrades. Pre-intel, you had to boot from a firewire disk enclosure, now you can boot from either firewire or USB2. This isn’t something that really occurred to me to do, until I discovered I needed Tiger to do some development work, but didn’t want to install it over Leopard. You can’t virtualise desktop versions of OSX yet, and for some reason I couldn’t install Tiger into a bootcamp partition, so I cleared off an external drive and used that. Works fine, although it feels slower than off the local disk.

Virtualisation

As I mentioned earlier, you can’t virtualise the desktop versions of OSX, even under OSX. You will be able to virtualise the server version of OSX, assuming you’re properly licensed. I also mentioned Parallels and VMWare Fusion, the two leading desktop virtualisation suites for OSX. While both of these are fine for what they do, one of the things I was hoping to do with a new laptop was to have a serious look at KVM and other non-Xen virtualisation options now available in linux. And, well, I need linux to do that. Not OSX’s fault at all.

Finally…

Overall, I think OSX, and specifically Leopard (v 10.5), is a great platform. Aside from the slight inconsistencies between common features (eg, command line argument placement), it was very easy for me to adapt my entire workflow to OSX. I’ve been forced to use Windows in the past for various reasons, and found it much harder to adapt – it lacked all the good things I was used to (virtual desktops, copy/paste selection, etc).

A coworker mentioned a quote he’d heard, which I haven’t managed to track down. It went something like “OSX is for unix admins that don’t have time to care.” I can mostly agree with that sentiment – if you care about a pretty workstation. From that point of view, OSX definitely just works, and I’ve got very few problems with it. I’ve hit issues elsewhere, some of them outlined above. Most of them I’ll just have to overcome with time, and as long as I stay with OSX as my primary OS, I’m sure that’ll be fine. It’s always possible the missing features will be added in the future (Spaces was added in Leopard), but for things like UI-wide selection/paste, I’m not sure it’ll ever happen.

It’s definitely a very polished operating system, and I’d much rather run it than Windows. I’m still at the point where the KDE environment I had been using for the last 18 months or so is so familiar to me that it still feels easier to use in a lot of ways, but I’m putting that aside for now.

Xenserver Enterprise + Fibrechannel storage = live migration

I finally got round to booking time at the IBM demo center to have a fiddle with Xen Enterprise on shared storage. They’ve got a good range of entry-level kit at the demo center, but the important bits (for me anyway) were a Bladecenter H chassis with some HS21 blades fitted with Fibrechannel HBAs, and a DS3400 Fibrechannel storage array.

The IBM DS3000 series of arrays looks really promising. There are three current variants – SAS, iSCSI and Fibrechannel attached – along with an EXP3000 expansion shelf. All four systems will scale to 48 SAS drives (14.4 TB) over 4 shelves, and an IBM firmware announcement I read online the other day strongly suggests they will support SATA drives in the very near future (36 TB using 750 GB disks). And the DS3000 series are cheap. Performance is pretty good, but you can only stack the controller with 1GB of cache – which really highlights the “entry level” bit of these SANs. The SAS attached option is compelling as well – you get SAN functionality, very close to FC levels of performance (3 Gbps peak HBA throughput for SAS compared to 4 Gbps for FC), at a fraction of the cost. The DS3200 allows 6 single-connect or 3 dual-connect hosts, and as soon as IBM get round to releasing a SAS switch, that limit will disappear.
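The capacity figures quoted here are plain multiplication, and a few lines of Python make them easy to re-check. This is only a sketch: the function name is made up for illustration, and the 300 GB SAS drive size is an assumption inferred from 14.4 TB spread over 48 drives, not something IBM's documentation is being quoted on.

```python
DRIVES = 48  # maximum drive count over 4 shelves, as stated in the text


def total_tb(drive_gb, drives=DRIVES):
    """Raw capacity in decimal terabytes (1 TB = 1000 GB), before any RAID overhead."""
    return drive_gb * drives / 1000


if __name__ == "__main__":
    # Assumption: 300 GB per SAS drive, inferred from 14.4 TB / 48 drives.
    print("SAS :", total_tb(300), "TB")
    # 750 GB SATA disks, as mentioned in the text.
    print("SATA:", total_tb(750), "TB")
```

These are raw totals; usable space would be lower once RAID levels and hotspares are taken into account.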

I’d used the DS3400 management software in the past, so setting up a LUN within the existing array took about 5 minutes, including creating the host to WWN mappings; and I had two Xenserver Enterprise installs already set up on a matched pair of blades from a previous attempt at this with the DS3300.

Xenserver Enterprise supports shared storage, but as of version 4.0.1, it only supports NFS or iSCSI shared storage officially. I’d had problems getting the openiscsi software iSCSI stack that Xenserver ships with to communicate successfully with the DS3300 however, and ran out of time. On the other hand, FC shared storage is just not supported at all yet. There’s a forum article explaining Xensource’s position on the support, which also links to a knowledge base article describing how shared storage works in Xenserver. The article was pulled with the release of 4.0.1:

“We took it down because, with the level of testing and integration we could do by initial release of 4.0, we couldn’t be any further along than a partial beta. There are business reasons we couldn’t ship a product as released while describing some of the features as “beta”, and it is hypocritical for us to officially describe a way of using the product yet describe that as “unsupported.” For that reason, until we are ready to release supported shared Fibre Channel SRs, we’re not going to put the KB article back up.”

The article describes the overall setup you have to have in place to have FC shared storage working – namely array, LUN and zoning management on the SAN and HBA, and locating the device node that maps to the corresponding LUN. And then there are two commands to run, on one node of your Xenserver pool, and your shared storage system is up and running.

At this point it took about another 5 minutes to create a Debian Etch domU, and verify that live migration between physical hosts was indeed working.

I set up a slightly more complicated scenario, which I’m planning on using with some other software to demonstrate a HA / DR environment, but which also enabled me to do one of those classic cheesy live migration “but it still works” tests – I connected a Wyse thinclient through to a terminal server domU, migrated the TS between physical hosts, and demonstrated that the terminal session stayed up. A ping left running during this time experienced a brief bump of about 100 ms during the final cutover, but that’s possibly attributable to forwarding-path updates in the bridge devices on the Xenserver hosts. Either way it works well. Moving a win2k3 terminal server took about 15 seconds on the hardware I was working with.

Overall, I’m quietly encouraged by this functionality and its ease of setup. I’m perhaps a bit underwhelmed by it too – it ended up being such a nonevent to get it working, that I’m annoyed I didn’t get round to setting up the demonstration months ago.

Benchmarks of an Intel SRSC16 RAID controller

One of our clients gave us an Intel server with an Intel SRSC16 SATA RAID controller and a 500 GB RAID1 (plus hotspare) set up on it, on which to install a XenServer Express 4.0.1 system. While building the system up for him, I noticed abysmal write performance. It was taking around 29 minutes to install a guest from a template, a process which basically involves creating and formatting a filesystem and unpacking a .tar.bz2 file into it. Inspection of the system revealed that the controller lacked a battery backup unit (BBU), and thus the writeback cache was disabled. The firmware on the controller had also disabled the on-disk cache, and the controller listed disk access speed at 1.5 Gbps, which I’m presuming means it was operating in SATA-1 mode, so no NCQ either. The controller has 64MB of cache.

I persuaded the customer to buy the BBU for the system, and then ran some quick bonnie++ benchmarks, which I know aren’t the best benchmark in the world, but show a good indication of relative performance gains. Results are as follows:

Note: I didn’t do the tests right either – not specifying a number of blocks of files to stat results in those tests completing too soon for bonnie to come up with an answer. So, the output below only really shows throughput tests, as the sequential create/random create tests all completed too soon. Changing the disk cache settings requires a reboot into the BIOS mode configuration tool, so I’ve avoided doing this too many times. Changing the controller cache settings can be done on the fly.

RAID controller writeback cache disabled, disk cache disabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.loca 512M  1947   4  2242   0  1113   0 10952  18 36654   0 169.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ ++

RAID controller writeback cache enabled, disk cache disabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 512M  7938  19  9195   1  4401   0 28823  50 41961   0 227.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

RAID controller writeback cache disabled, disk cache enabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.loca 512M 19861  47 17094   1  9870   0 28484  47 41167   0 243.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.loca  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

RAID controller writeback cache enabled, disk cache enabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 512M 38633  95 40436   4 15547   0 32045  54 42946   0 261.4   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Enabling only the controller write-back cache (64MB in this case) roughly quadrupled the write throughput in all cases. Enabling only the disk cache provided roughly 8 to 10 times the performance on its own. And enabling both together increased write throughput by a factor of between 14 and 20, depending on the test. I suspect the tests weren’t large enough to actually tax the cache systems on the disk or controller however, as I was running them in a Xen domU with only 256 MB of ram, and actually just wanted some quick results.
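The ratios can be checked directly against the bonnie++ output. The Python sketch below is only a sanity check: the numbers are copied from the Sequential Output columns (Per Chr, Block and Rewrite, in K/sec) of the four runs above, while the dictionary keys and the function name are labels invented for this example.

```python
# Sequential output throughput in K/sec, copied from the four bonnie++ runs.
# Column order: per-character write, block write, rewrite.
RESULTS = {
    "no cache": (1947, 2242, 1113),
    "controller cache only": (7938, 9195, 4401),
    "disk cache only": (19861, 17094, 9870),
    "both caches": (38633, 40436, 15547),
}


def speedups(results, baseline="no cache"):
    """Return each configuration's throughput as a multiple of the baseline run."""
    base = results[baseline]
    return {
        name: tuple(round(value / ref, 1) for value, ref in zip(values, base))
        for name, values in results.items()
    }


if __name__ == "__main__":
    for name, (per_chr, block, rewrite) in speedups(RESULTS).items():
        print(f"{name:22s} per-chr x{per_chr:<5} block x{block:<5} rewrite x{rewrite}")
```

The spread between columns is worth noting: the gain from a given cache setting is not uniform, and rewrite benefits least from having both caches enabled.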

I know they aren’t really representative of anything, but here’s a test that is semi-representative: Installing another copy of the Xen domU via a template took 2 minutes 55 seconds with disk cache enabled, and 2 minutes 30 seconds with disk cache and controller cache enabled (I didn’t test this with just controller cache enabled as that would have required a reboot and manual intervention, and I wasn’t onsite at that point). Prior to enabling the disk cache and controller cache, this was taking nearly 30 minutes.
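To put the install times on the same footing as the throughput ratios, here is a small Python sketch. It assumes the uncached baseline is the “around 29 minutes” figure mentioned earlier, which was an approximate observation rather than a timed run; the helper names are made up for this example.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds


def speedup(before, after):
    """How many times faster `after` is than `before`, to one decimal place."""
    return round(before / after, 1)


# Template install times taken from the text.
BASELINE = to_seconds(29)        # both caches disabled (approximate)
DISK_ONLY = to_seconds(2, 55)    # disk cache enabled
BOTH = to_seconds(2, 30)         # disk cache and controller cache enabled

if __name__ == "__main__":
    print("disk cache only:", speedup(BASELINE, DISK_ONLY), "times faster")
    print("both caches    :", speedup(BASELINE, BOTH), "times faster")
```

This real-world workload improves by a smaller factor than the best bonnie++ figures, which fits the caveat that the synthetic tests were probably too small to tax the caches.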

While the above shows that a combination of the controller write-back cache and the disk cache gives the best improvement, merely enabling the disk cache on its own had the biggest single effect. Of course, the disk cache isn’t backed up by a battery, so there’s the risk of losing the data that is in the disk cache at the time. The Intel documentation for the controller implied that this is limited to the sector that is being written at the point of power failure.

When I get some free time and a SCSI or SAS server, I’ll run some similar benchmarks on that.