<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DotBlag.Com &#187; lsi</title>
	<atom:link href="http://www.dotblag.com/tag/lsi/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dotblag.com</link>
	<description>Technical Trials And Errors</description>
	<lastBuildDate>Fri, 14 Oct 2011 23:07:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>ZFS In Action</title>
		<link>http://www.dotblag.com/2008/10/02/zfs-in-action/</link>
		<comments>http://www.dotblag.com/2008/10/02/zfs-in-action/#comments</comments>
		<pubDate>Thu, 02 Oct 2008 23:05:07 +0000</pubDate>
		<dc:creator>SysOp</dc:creator>
				<category><![CDATA[.Fail]]></category>
		<category><![CDATA[ext2]]></category>
		<category><![CDATA[ext3]]></category>
		<category><![CDATA[lsi]]></category>
		<category><![CDATA[reiserfs]]></category>
		<category><![CDATA[solaris]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://www.dotblag.com/?p=63</guid>
		<description><![CDATA[Well we&#8217;ve now had a few months of full production ZFS usage. We&#8217;ve had our first drive failure which exposed the oddities of drive failures under ZFS. It does work REALLY hard to cover them up, so much so that it never really quite gave up on the dead drive until I ran zpool offline [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve now had a few months of full production <a href="http://www.sun.com/software/solaris/zfs.jsp">ZFS</a> usage, including our first drive failure, which exposed some oddities in how <a href="http://www.sun.com/software/solaris/zfs.jsp">ZFS</a> handles failing drives.  It works REALLY hard to cover for them, so much so that it never quite gave up on the dead drive until I ran <code>zpool offline</code> on it.  That said, there was NO effect on users at all as far as I can tell.  Despite the drive producing errors and generally not responding, the only commands that suffered were zpool commands that accessed the affected drive directly; overall performance and function of ZFS didn&#8217;t degrade.  Once I told ZFS to offline the offending drive, the <a href="http://docs.sun.com/app/docs/doc/819-2240/zpool-1m?a=view">zpool commands</a> stopped hanging.  They never hung indefinitely, just for a timeout period.  Later we replaced the physical drive, which involved figuring out how to make the new drive visible, since we&#8217;re using a <a href="http://lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_84016e/index.html">RAID card</a> but aren&#8217;t doing any RAID with it, and then running <code>zpool replace</code>.  Side rant: <a href="http://lsilogic.com">LSI</a>&#8217;s MegaCli is&#8230; sometimes hard to understand.  Forget the convention that [] means optional.  When it says [E0:S0] it means the brackets literally, except that E0 and S0 stand for numbers, so enclosure 252 slot 1 in, say, the -pdInfo command is expressed as <code>MegaCli -pdInfo -PhysDrv[252:1] -a0</code>.  In <a href="http://www.gnu.org/software/bash/">BASH</a> and BASH-like shells you need to actually type <code>MegaCli -pdInfo -PhysDrv\[252:1\] -a0</code> so that you don&#8217;t run afoul of the fact that [ <a href="http://www.gnu.org/software/bash/manual/html_node/Filename-Expansion.html#Filename-Expansion">introduces a filename expansion in bash</a>.  Why can&#8217;t they just follow the <a href="http://developer.apple.com/documentation/Darwin/Reference/ManPages/man5/manpages.5.html">normal</a> <a href="http://docs.sun.com/app/docs/doc/819-2252/man-5?a=view">man</a> <a href="http://www.kernel.org/doc/man-pages/online/pages/man7/man-pages.7.html">page</a> <a href="http://www.phpman.info/index.php/man/man/7">conventions</a> on SYNOPSIS sections and plain old good practice with regard to command line arguments?</p>
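<p>For scripted use, one way around the bracket escaping is to skip the shell altogether.  The following is a minimal Python sketch, not anything from LSI: the helper names <code>physdrv_arg</code> and <code>pdinfo_argv</code> are made up for illustration, and it assumes a <code>MegaCli</code> binary on the PATH if you uncomment the last line.  It uses <code>fnmatch</code> to show why the unquoted form is risky and <code>shlex.join</code> to print a safely quoted command line.</p>

```python
import fnmatch
import shlex


def physdrv_arg(enclosure, slot):
    """Build the literal -PhysDrv[E:S] argument from numeric IDs (hypothetical helper)."""
    return f"-PhysDrv[{int(enclosure)}:{int(slot)}]"


def pdinfo_argv(enclosure, slot, adapter=0):
    """Argument list for the -pdInfo call quoted in the post (hypothetical helper)."""
    return ["MegaCli", "-pdInfo", physdrv_arg(enclosure, slot), f"-a{int(adapter)}"]


argv = pdinfo_argv(252, 1)

# Unquoted, a shell reads [252:1] as a character class: any one of 2, 5, ':' or 1.
# fnmatch applies the same kind of matching, so a stray file named -PhysDrv2
# in the current directory would match and replace the argument.
pattern = argv[2]
stray_file_matches = fnmatch.fnmatchcase("-PhysDrv2", pattern)
literal_matches = fnmatch.fnmatchcase(pattern, pattern)

# shlex.join quotes only the arguments that need it.
safe_command_line = shlex.join(argv)

if __name__ == "__main__":
    print(safe_command_line)
    # With an argument list no shell is involved, so no escaping is needed:
    # subprocess.run(argv, check=True)  # needs "import subprocess" and MegaCli on PATH
```

<p>Printing <code>safe_command_line</code> gives a single-quoted form of the argument, which does the same job as the backslash-escaped command.</p>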
<p>The quirkiest bit is that a simple <a href="http://docs.sun.com/app/docs/doc/819-2240/zpool-1m?a=view">zpool status</a> would indicate everything was A/OK and healthy.  You really do have to have &#8220;something&#8221; monitor the READ/WRITE/CKSUM columns, because ZFS only goes into a DEGRADED state if the device becomes completely absent, and even then, thanks to its fault tolerance, that might not stop the filesystem.  Depending on the filesystem settings, it&#8217;ll either try to keep working from cache, block, return I/O errors, or panic the system.  This is actually a Good Thing.  Say you&#8217;re moving a fibre channel loop or an iSCSI cable: ZFS will handle that gracefully even if ALL the drives disappear.  Folks, don&#8217;t try that with <a href="http://e2fsprogs.sourceforge.net/ext2.html">EXT2/3</a> or ReiserFS.  Trust me, just Don&#8217;t Ask How I Know.</p>
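<p>As a rough idea of what that &#8220;something&#8221; could look like, here is a small Python sketch that scans <code>zpool status</code> text for rows with nonzero error counters.  The sample output, pool name and device names are invented for illustration, and the parser assumes the usual five-column NAME/STATE/READ/WRITE/CKSUM layout with a blank line ending the table.  Treat it as a starting point, not a tested monitoring tool.</p>

```python
# Illustrative sample only: pool and device names are made up.
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE      12     0     3

errors: No known data errors
"""


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a nonzero counter."""
    flagged = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True   # header row: device rows follow
            continue
        if not fields:
            in_table = False  # assumption: a blank line ends the device table
            continue
        if not in_table:
            continue
        counters = fields[2:5]
        if len(counters) != 3:
            continue          # rows without all three counters are skipped
        # Counters stay as strings, so anything other than a plain "0" is flagged.
        if any(count != "0" for count in counters):
            flagged.append((fields[0], *counters))
    return flagged


if __name__ == "__main__":
    for name, read, write, cksum in devices_with_errors(SAMPLE_STATUS):
        print(f"{name}: READ={read} WRITE={write} CKSUM={cksum}")
```

<p>In real use the text would come from the live command, for example via <code>subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout</code>, run from cron or whatever monitoring system you already have.</p>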
]]></content:encoded>
			<wfw:commentRss>http://www.dotblag.com/2008/10/02/zfs-in-action/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Goodbye Areca, Hello LSI</title>
		<link>http://www.dotblag.com/2008/07/25/goodbye-areca-hello-lsi/</link>
		<comments>http://www.dotblag.com/2008/07/25/goodbye-areca-hello-lsi/#comments</comments>
		<pubDate>Fri, 25 Jul 2008 17:26:31 +0000</pubDate>
		<dc:creator>SysOp</dc:creator>
				<category><![CDATA[.Fail]]></category>
		<category><![CDATA[.Hardware]]></category>
		<category><![CDATA[areca]]></category>
		<category><![CDATA[lsi]]></category>
		<category><![CDATA[solaris]]></category>
		<category><![CDATA[zfs]]></category>

		<guid isPermaLink="false">http://www.dotblag.com/?p=49</guid>
		<description><![CDATA[While it&#8217;s not clear exactly who-is-causing what, what is clear is the areca driver tries to de-reference a NULL pointer, this is either because the adapter screws up, or the driver screws up somewhere. The result is a Solaris kernel fault, pointing at the arcmsr driver, and apparently an adapter lockup.  It&#8217;s not 100% clear [...]]]></description>
			<content:encoded><![CDATA[<p>While it&#8217;s not clear exactly who is causing what, what is clear is that the Areca driver tries to dereference a NULL pointer, either because the adapter screws up or because the driver screws up somewhere.  The result is a Solaris kernel fault pointing at the arcmsr driver, and apparently an adapter lockup.  It&#8217;s not 100% clear what causes this condition: it could be the driver not handling some buffer appropriately, or it could be the card sending an error that the driver doesn&#8217;t handle.  It&#8217;s pretty likely, though, that the issue is completely inside the arcmsr driver and <a href="http://areca.com.tw/">Areca</a> hardware.  One thing we did discover, which means we HAVE to replace the Areca hardware, is that in JBOD mode (which is how we use it, since we&#8217;re using <a href="http://solaris.com/zfs">Solaris&#8217; ZFS superset</a> of RAID functionality) any disk failure seizes the whole card up until the failure clears, or maybe until some apparently long timer expires.  SATA and SAS have ethernet-like link failure detection: you know within milliseconds when the cable is pulled.  The Arecas in JBOD mode seem unable to handle hot-swap of any type, or even failures of any type.  When we tried to get Areca to address it, all we received were vague &#8220;you must have a failing drive&#8221; answers, which for a RAID card is a bad answer.  Even in JBOD mode the controller should signal/propagate an error.  Solaris would handle this condition.</p>
<p>Then there&#8217;s the boot selection.  All logical drives appear in the boot selection.  Either the list fills up or the Arecas only show drives on the first controller.  That&#8217;s a problem if you want to be able to boot an alternate drive on a second controller.</p>
<p>So, Saturday, I get to back up all the user data.  Blow the whole damn thing away.  And start over.  *sigh*</p>
<p>I know I posted about this before, but we were avoiding the whole rebuild thing; turns out I&#8217;m going to have to do that anyway.  Argh.  The biggest reason is that the <a href="http://lsi.com">LSI</a> cards use an on-disk metadata format (apparently).  They&#8217;re kinda quiet about it all, so I&#8217;m not sure how big it is or where on the disk it lives.  I&#8217;m betting it&#8217;s the last N megs of the drive: 8, 16, 20-something.  Most people won&#8217;t notice, but if you&#8217;re migrating, it becomes noticeable.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dotblag.com/2008/07/25/goodbye-areca-hello-lsi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


