Categories
General NSP WLUG

Benchmarks of an Intel SRSC16 RAID controller

One of our clients gave us an Intel server with an Intel SRSC16 SATA RAID controller and a 500 GB RAID1 set (plus hot spare) to install a XenServer Express 4.0.1 system on. While building the system up for him, I noticed abysmal write performance. It was taking around 29 minutes to install a guest from a template, a process which basically involves creating and formatting a filesystem and unpacking a .tar.bz2 file into it. Inspection of the system revealed that the controller lacked a battery backup unit (BBU), and thus the write-back cache was disabled. The firmware on the controller disabled the on-disk cache as well, and the controller listed the disk access speed as 1.5 Gbps, which I presume means it was operating in SATA-1 mode, so no NCQ either. The controller has 64 MB of cache.

I persuaded the customer to buy the BBU for the system, and then ran some quick bonnie++ benchmarks. I know bonnie++ isn’t the best benchmark in the world, but it gives a good indication of relative performance gains. Results are as follows:

Note: I didn’t run the tests quite right either – not specifying a number of files for the create tests (bonnie++’s -n option) means those tests complete too quickly for bonnie++ to produce a result. So, the output below only really shows the throughput tests, as the sequential create/random create tests all completed too soon. Changing the disk cache settings requires a reboot into the BIOS configuration tool, so I avoided doing that too many times. Changing the controller cache settings can be done on the fly.
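
For reference, the runs were along these lines (reconstructed from memory, so treat the exact flags as an assumption; -s is the test file size in MB and -n sets the file counts the note above refers to):

[code]
# Throughput-only run, as used above -- without -n, the create tests
# finish too quickly for bonnie++ to report numbers:
bonnie++ -d /mnt/test -s 512 -u root

# A run that would also exercise the create tests (-n is in units of
# 1024 files, so this creates 128*1024 files):
bonnie++ -d /mnt/test -s 512 -n 128 -u root
[/code]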

[code]

RAID controller writeback cache disabled, disk cache disabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.loca 512M  1947   4  2242   0  1113   0 10952  18 36654   0 169.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

RAID controller writeback cache enabled, disk cache disabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 512M  7938  19  9195   1  4401   0 28823  50 41961   0 227.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

RAID controller writeback cache disabled, disk cache enabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.loca 512M 19861  47 17094   1  9870   0 28484  47 41167   0 243.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.loca  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

RAID controller writeback cache enabled, disk cache enabled:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 512M 38633  95 40436   4 15547   0 32045  54 42946   0 261.4   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.locald  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
[/code]

Enabling only the controller write-back cache (64 MB in this case) roughly quadrupled the write throughput in all cases. Enabling only the disk cache provided nearly 8 times the performance on its own, and enabling both together increased write throughput by about a factor of 20. I suspect the tests weren’t large enough to actually tax the caches on the disk or controller, however, as I was running them in a Xen domU with only 256 MB of RAM, and really just wanted some quick results.

I know the numbers above aren’t really representative of anything, so here’s a test that is semi-representative: installing another copy of the Xen domU from a template took 2 minutes 55 seconds with the disk cache enabled, and 2 minutes 30 seconds with both the disk cache and controller cache enabled (I didn’t test with just the controller cache enabled, as that would have required a reboot and manual intervention, and I wasn’t onsite at that point). Prior to enabling the disk cache and controller cache, this was taking nearly 30 minutes.

While the above shows that the combination of controller write-back cache and disk cache gives the best improvement, merely enabling the disk cache on its own had the biggest single effect. Of course, the disk cache isn’t backed by a battery, so there’s the risk of losing whatever data is in the disk cache at the time. The Intel documentation for the controller implied that this is limited to the sector being written at the point of power failure.

When I get some free time and a SCSI or SAS server, I’ll put together some similar benchmarks for that.

Categories
jabber MetaNET WLUG

Upgrading jabber.meta.net.nz

I’ve finally gotten sick of the jabber.meta.net.nz services having a negative impact on the rest of the server due to the transports chewing up lots of RAM, and so have put together another server and will start migrating things over. The new server is a dual 1 GHz Coppermine P3, with 2 GB of RAM and a 36 GB SCSI RAID-1 array.

This seems like a great opportunity to tidy things up again. The server stack has been a bit messy since we upgraded to jabberd2, partly because of the server codebase, and at least partly because of Gentoo. So, the new plan is to move jabber.meta.net.nz to ejabberd sometime soon. Along with the move will come a new web-based chat system (JWChat), and newer versions of the transports where available.

ejabberd seems to be a pretty robust system. jabber.org moved their system to it several years ago, and jabber.org.au made the switch more recently. I’ve seen some pretty impressive benchmarks of message latency, CPU load and RAM usage under high protocol load, and it beats jabberd 1.x and jabberd2 hands down. Not that jabber.meta.net.nz really notices, or compares with jabber.org or jabber.org.au in terms of volume. We’ve currently got 112 clients connected, and someone noticed 203 clients connected a couple of weeks ago, but I have no idea what the high-water mark for client connections is, much less anything about message throughput statistics.

I aim to have the following done in the next few weeks:

  • Install and test base system – ejabberd, transports
  • Install and test jwchat
  • Script the migration of registration information and rosters from the jabberd2 pgsql database to something ejabberd can read.
  • Update the website with better information
  • Set up the transports to prevent OOM issues (ulimit/regular restarts; see the sketch after this list)
  • Test migration of registry/rosters
  • Pick a day and cut over.
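
For the OOM mitigation, the current thinking is something along these lines. This is only a sketch: the transport name, paths and limits are placeholders rather than settled choices.

[code]
# In the transport's init script: cap its address space so a leak
# kills the transport rather than OOMing the whole box.
ulimit -v 262144   # 256 MB virtual memory cap
exec /usr/bin/pyicq-t -c /etc/jabber/pyicq-t.conf

# Plus a nightly restart from cron to clear slow leaks:
# 0 4 * * * root /etc/init.d/pyicq-t restart
[/code]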

The trickiest bit will be the migration of roster info from pgsql to ejabberd’s format, but I’ve already done at least three quarters of that, as I considered doing this migration back in March.
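
The first step is just getting the data out of PostgreSQL, something like the following. The table and column names assume jabberd2’s stock schema, so verify them against the actual database first:

[code]
# Dump roster entries to CSV for later conversion into ejabberd's format.
psql -d jabberd2 -A -F',' -t \
  -c 'SELECT "collection-owner", jid, name, "to", "from" FROM "roster-items"' \
  > rosters.csv
[/code]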

Categories
linux NSP WLUG Xen

iSCSI for SCSI device passthrough under Xen Enterprise

I recently had to add a SCSI tape drive to a Xen Enterprise server, and needed to use BackupExec under one of the Windows domUs as the backup software. Luckily, Greig did this a few months ago using the iSCSI Enterprise Target, and put his notes up on the WLUG Wiki here.

I hit one problem, however – when using NTBackup to test the system, it would write about 20 MB to tape, then fail. Greig pointed out he’d only ever used BackupExec, which was the software that was going to be used in the end anyway, so I installed that and it worked fine.

Going one step further, it’s possible to use the same technique to push USB mass storage devices over iSCSI to domUs. As Xen Enterprise doesn’t yet have a nice way of passing USB mass storage devices through to domUs, this is a very good interim solution. The relevant ietd.conf entries look like this:

[code]

# Pass the tape drive through as raw SCSI (host 4, channel 0, id 0, lun 0)
Target iqn.2007-04.com.example:tape0
Lun 0 H=4,C=0,I=0,L=0,Type=rawio
Type 1
InitialR2T No
ImmediateData Yes
MaxRecvDataSegmentLength 262144

# Export the USB disk as an ordinary block device
Target iqn.2007-04.com.example:usb0
Lun 1 Path=/dev/sdb,Type=fileio
[/code]

As the config refers to the SCSI device node, which will change if you unplug and re-insert a USB block device, it makes a lot of sense to use udev to map your USB mass storage device to a stable /dev entry.
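
A rule along these lines should do it. The serial number and symlink name are made up for illustration; pull the real ID_SERIAL value out of udevinfo -q env -n /dev/sdb:

[code]
# /etc/udev/rules.d/90-usb-backup.rules
# Match the disk by serial so it always appears as /dev/usbbackup,
# then point the Lun line in ietd.conf at /dev/usbbackup instead.
SUBSYSTEM=="block", ENV{ID_SERIAL}=="My_USB_Disk_1234", SYMLINK+="usbbackup"
[/code]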

I don’t have a feeling for how robust this is yet.

Categories
linux NSP Tool of the Week WLUG

NUT: Network UPS Tools

I was tweaking the UPS rules at a client’s site when I noticed that the base NUT configuration we use didn’t really do a hell of a lot. The example config files had some hints as to what was needed, but unless I missed something fundamental, they didn’t give the full picture.

After a bit of searching, my laptop battery ran out, so I couldn’t carry on working onsite. I did get far enough to make some notes, but I have since lost the site I was referring to, so I can’t give proper attribution. It looked something like this one though, and was also dedicated to setting up NUT on a Mac, so I figure that will do.

I’ve since returned to this issue, and after fighting with serial and USB cables, have finally completed and tested it all. My configuration is on the WLUG wiki at the NutNotes page.
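
For reference, the shape of the finished setup is roughly this. It’s a minimal sketch with made-up names and passwords, the driver line depends entirely on your UPS hardware and NUT version, and the real details are on the NutNotes page:

[code]
# /etc/nut/ups.conf -- define the UPS and which driver talks to it
[myups]
    driver = usbhid-ups
    port = auto

# /etc/nut/upsd.users -- an account for upsmon to log in with
[monuser]
    password = secret
    upsmon master

# /etc/nut/upsmon.conf -- watch the UPS and shut down on low battery
MONITOR myups@localhost 1 monuser secret master
[/code]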

Categories
Tool of the Week WLUG

“Useful” command line tools

A coworker was doing some work on a server we’re building up, and wanted to kill a bunch of processes. The killall binary wasn’t installed for some reason (a default Etch install, probably just a missing package), but he found a killall5 binary instead.

For those of you who don’t know, killall5 is the SysV version of killall. It’s quite a bit more literal about its functionality than the killall most of us are used to – it will, without taking any command line arguments, asking if you are sure, or giving any indication of what is about to happen, send a kill signal to all processes.
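
The contrast is stark (apache2 below is just an example process name):

[code]
# GNU killall: kills processes by name -- here, every "apache2"
killall apache2

# SysV killall5: takes no process name at all; it signals every
# process on the system (bar kernel threads and its own session)
killall5 -9
[/code]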

I was in the middle of saying “Don’t run that” when my coworker did. Oops. Good thing we were still building the server and it was on the build desk next to him.

This got me thinking, though. Are there any other command line tools as dangerous as killall5? That is, tools that will do something terminal to your system without prompting for confirmation or giving any warning?

Categories
linux Tool of the Week WLUG

Miro – Internet TV

Miro, formerly known as Democracy TV, made its first public release a few days ago. It’s available at http://www.getmiro.com/. Miro is like a blog aggregator for video sources such as YouTube and Google Video, as well as publisher content such as various news and science TV channels and The Onion.

Installing it was trivial under Ubuntu, although it conflicts with the Blackdown JRE. You can install the Sun JRE instead to get around this.
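
The workaround was something like the following. The package names are from memory (the Blackdown JRE shipped as j2re1.4 back then, if I recall correctly) and will vary by release, so check what your repositories actually call them:

[code]
sudo apt-get remove j2re1.4           # the Blackdown JRE that conflicts
sudo apt-get install sun-java5-jre    # Sun's JRE instead
sudo apt-get install miro
[/code]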

Categories
linux WLUG

make-kpkg and recent 2.6 kernels

I keep hitting a bug in make-kpkg when building recent kernels, and I always forget the fix. So, I’m blogging it here for ease of reference.

[code]
make-kpkg uses version.h to get UTS_RELEASE. UTS_RELEASE has
moved to utsrelease.h.

Right after you get the error, modify
debian/ruleset/misc/version_vars.mk

-UTS_RELEASE_VERSION=$(shell if [ -f include/linux/version.h ]; then \
-    grep 'define UTS_RELEASE' include/linux/version.h | \
+UTS_RELEASE_VERSION=$(shell if [ -f include/linux/utsrelease.h ]; then \
+    grep 'define UTS_RELEASE' include/linux/utsrelease.h | \

And rerun your make-kpkg. The above is not a valid patch, you’ll have
to hand change it.

Joel

[/code]

The original post is at http://lkml.org/lkml/2006/7/16/109.
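
Since the quoted diff isn’t a valid patch, the hand edit boils down to swapping the header name in that one file. An untested sed one-liner that should do the same job (check the result afterwards, as a global replace may touch more than the two lines shown in the diff):

[code]
sed -i 's,include/linux/version\.h,include/linux/utsrelease.h,g' \
    debian/ruleset/misc/version_vars.mk
[/code]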

Categories
NSP WLUG

Feisty Fawn and Software RAID

It turns out there’s a race condition in Feisty Fawn which can cause software RAID sets not to be set up on boot. This is especially problematic if you have your root partition on software RAID.

Bug #75681 discusses this in some detail, although there are several suggestions on how to fix it.

I first hit this bug on a local machine, and then had to do the same upgrade on a machine in a different country. Needless to say, I wanted to get it right. I’m archiving my notes here as I’m sure I’ll need them eventually. This race condition has probably already been fixed in Feisty by now, but it’s not worth risking on a remote machine.

First of all, there is some new array management involved in setting up software RAID under Feisty, so you need to make sure you read the documentation for the mdadm package. Until you do, every initramfs generation will emit a warning:

[code]
update-initramfs: Generating /boot/initrd.img-2.6.20-14-generic
cp: cannot stat `/etc/udev/rules.d/85-brltty.rules': No such file or directory
W: mdadm: unchecked configuration file: /etc/mdadm/mdadm.conf
W: mdadm: please read /usr/share/doc/mdadm/README.upgrading-2.5.3.gz .
W: mdadm: no arrays defined in configuration file.
W: mdadm: falling back to emergency procedure in initramfs.
[/code]

Following those instructions, you are told to check the configuration in /etc/mdadm/mdadm.conf and compare it with the output of /usr/share/mdadm/mkconf. Once you’ve done that, you can remove /var/lib/mdadm/CONF-UNCHECKED and re-run update-initramfs -u -k all to regenerate your initramfs images.
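
In practice that check is just a diff; the commands below simply restate the steps from the README:

[code]
# Compare the live config with what mdadm would generate today;
# pay attention to any ARRAY lines that differ.
diff /etc/mdadm/mdadm.conf <(/usr/share/mdadm/mkconf)

# Once happy, clear the flag and rebuild the initramfs images:
rm /var/lib/mdadm/CONF-UNCHECKED
update-initramfs -u -k all
[/code]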

The particular race condition I mentioned above occurs because udev hasn’t had time to settle before mdadm tries to assemble the array, which means mdadm can’t find the devices and fails. The fix suggested in the bug report is to insert a udevsettle call into the initramfs at an appropriate point, and then recreate the initramfs images:

[code]
# echo "/sbin/udevsettle --timeout=10" >> /usr/share/initramfs-tools/scripts/init-premount/udev
# update-initramfs -u -k all
[/code]

This works, at least as of today. I don’t know if the bug is actually still a problem or not – I didn’t want to risk it.

Categories
General NSP WLUG

Debian Etch and apt-proxy issues

Debian Etch (4.0) was released on Monday, and I have to say I wasn’t at all prepared. I’ve got about 70 machines that will probably need to be upgraded to Etch at some point in the near future. I could leave some of them running sarge, but I’ll definitely have to upgrade most of these servers.

We use an apt-proxy internally to improve apt performance. It works well, aside from a couple of bugs that cause it to lock up every now and then. While running some upgrades on out-of-the-way servers today, I discovered that the version of apt in sarge really doesn’t play nicely with an etch repository served by an apt-proxy running on an etch server. Ubuntu clients seem fine, and updating a sarge server against sarge repositories via the same apt-proxy is OK too.

Once the client has been upgraded to etch, the etch apt-proxy works fine. So it looks like a key issue: the version of apt in sarge doesn’t have the archive signing support, and has no way of checking whether the keys are intact – but it still seems to care, and will time out and eventually fail.

It turns out that installing a copy of apt from sarge backports solves this. You’ll also need the gnupg package, but the one from sarge is OK:
[code]
wget http://backports.org/debian/pool/main/a/apt/apt_0.6.46.4-0.1~bpo.1_i386.deb
wget http://backports.org/debian/pool/main/d/debian-archive-keyring/debian-archive-keyring_2006.11.22~bpo.2_all.deb
dpkg -i *.deb
apt-get update
[/code]

Categories
advocacy General NSP WLUG

Puppet – a system configuration tool

I saw a couple of blog posts about puppet recently. I’ve been meaning to investigate cfengine for a while now, and puppet was a new angle on the same problem. From the intro:
Puppet is a system configuration tool. It has a library for managing the system, a language for specifying the configuration you want, and a set of clients and servers for communicating the configuration and other information.

The library is entirely responsible for all action, and the language is entirely responsible for expressing configuration choices. Everything is developed so that the language operations can take place centrally on a single server (or bank of servers), and all library operations will take place on each individual client. Thus, there is a clear demarcation between language operations and library operations, as this document will mention.

It’s still very new and under active development, and seems to have been designed with fixing some of the hassles of cfengine in mind. It is written in Ruby and has a reasonably powerful config language, and you can use embedded Ruby templates to dynamically build up content to deploy. I have no particular attachment to Ruby – in fact, this is the first time I’ve used the language. Configuration is stored in a manifest on the puppetmaster server, and is based on the notions of classes and nodes. A node can inherit from multiple classes, or can merely include a specific class if certain criteria are met. Subclasses can override specific details of a parent class.

It makes use of a library called facter (also from Reductive Labs) to pull ‘facts’ from the client hosts, and these can be used in the manifests to control configuration. For example, it will work out which Linux distribution you are running and store this in a variable, which you can then use to decide which classes to apply. It is fairly easy to extend facter to support additional facts – I added support for working out the Debian and Ubuntu release number and codename, e.g. 3.1 and sarge, or 6.10 and edgy.

There is a dependency system in place, so you can specify a rule to ensure that a service is running, which in turn depends on the package being installed. If you use puppet to manage the config file for the service, you can make the service subscribe to that file, so that if a change to the file is pushed out via puppet, it will restart the service for you as well.
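
To give a flavour of the manifest language, here’s a rough sketch of that package/config file/service pattern. The class, paths and names are invented for illustration, not taken from my actual manifests:

[code]
class ssh {
    package { "openssh-server": ensure => installed }

    # Config file served from a "files" mount on the puppetmaster
    file { "/etc/ssh/sshd_config":
        source  => "puppet://puppet.example.com/files/sshd_config",
        require => Package["openssh-server"]
    }

    # Keep sshd running, restarting it whenever the config changes
    service { "ssh":
        ensure    => running,
        subscribe => File["/etc/ssh/sshd_config"]
    }
}

node "web1.example.com" {
    include ssh
}
[/code]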

Installing packages is handled well, with the option of seeding debconf where appropriate. Puppet understands several package management systems, including apt, rpm and yum.
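
For example, the apt backend can take a debconf answers file via the responsefile parameter (assuming your puppet version supports it); the package and preseed path here are placeholders:

[code]
# Hypothetical debconf seeding -- responsefile points at a file of
# pre-recorded debconf answers for the package's install-time questions.
package { "postfix":
    ensure       => installed,
    responsefile => "/var/lib/puppet/preseed/postfix.preseed"
}
[/code]
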
I’m by no means an expert with cfengine, but puppet feels a lot nicer to use. After my initial testing, I see no reason not to deploy it at work. I’ll try a test deployment on some systems, and if that works out I’ll push it out everywhere.