2009-09-04

Soekris net4526 JP3 (GPIO) pinout

This thing gets indexed by Google, so I'll post this here for the next person who wonders. (I've also updated some random wiki page I found when googling, and should probably mail the soekris-tech list while I'm at it, but probably won't do that tonight.)

Anyway. The Soekris net4526 single-board computer has on it an 8-pin header connector labelled JP3, which is known to carry five GPIO lines. However, Soekris Engineering hasn't released a manual for the net4526, only for the related net4501, and the only related traffic I found on their mailing list was someone who had figured out some of the pins and was asking for help with the others. (Which is odd, because I thought I'd seen a complete pinout posted there at one point.)

So, since I have a project that might need this feature, I went in and tested it myself. The pinout, arranged here the way the pins sit on the board (with the JP3 label upright):

1: +3.3V     2: +5V
3: GPIO 7    4: GPIO 8
5: GPIO 11   6: GPIO 21
7: GPIO 22   8: GND


I saw some talk about GPIO numbers in octal or something, so I will add: those are in decimal and zero-indexed, as accepted and reported by NetBSD's gpioctl(8) command. Also, I have independently confirmed the pin assignments noted in the mailing list message I mentioned.

Know further that the pins' input mode has an internal pullup.
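
For completeness, reading one of those pins from a C program looks roughly like the sketch below. This assumes the OpenBSD-derived gpio(4) ioctl interface as I remember it (GPIOPINREAD and struct gpio_pin_op); if the details here have drifted, the gpioctl(8) source is the thing to crib from.

/*
 * Sketch: read GPIO 7 (JP3 pin 3) through /dev/gpio0.  The ioctl name,
 * struct, and field names are from my memory of the gpio(4) interface,
 * so double-check against sys/gpio.h before trusting this.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/gpio.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    struct gpio_pin_op op;
    int fd;

    fd = open("/dev/gpio0", O_RDWR);
    if (fd == -1)
        err(1, "open /dev/gpio0");

    memset(&op, 0, sizeof(op));
    op.gp_pin = 7;        /* decimal, zero-indexed, as above */
    if (ioctl(fd, GPIOPINREAD, &op) == -1)
        err(1, "GPIOPINREAD");

    printf("GPIO 7 reads %s\n",
        op.gp_value == GPIO_PIN_HIGH ? "high" : "low");
    close(fd);
    return 0;
}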

The connector I ginned up for the header is also… interesting, and may deserve a post of its own.

2009-07-14

The truth is also a three-edged sword.

But these are a different three from the canonical ones for understanding. Of course I'm talking about the RAIDframe stuff, because that's all I seem to get around to posting here. Anyway, there's what we believed at boot, what we know now, and what we want to be believed on the next boot. For the code I've written, they're fields of the same struct. For the existing raid(4) code, the information can be a bit more… scattered. (Making things slightly more fun: all the metadata for a RAID set is replicated on each component, so there's the question of what to do if there are non-fatal differences.) My SoC mentor has noted that things could use some reorganizing there, and part of me would like that too, but a much larger part of me says It's Working Code, Leave It Alone.
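
For concreteness, the shape I mean is roughly this; the names are illustrative only, not copied from the actual patch:

/*
 * Illustrative only: the three "edges" as bitmaps over the parity map's
 * regions.  Names and layout are made up for this post.
 */
#include <stdint.h>

#define PM_REGIONS 4096               /* example region count */
#define PM_WORDS   (PM_REGIONS / 32)

struct parity_map {
    uint32_t at_boot[PM_WORDS];   /* what we believed at boot (read from disk) */
    uint32_t current[PM_WORDS];   /* what we know now */
    uint32_t next_boot[PM_WORDS]; /* what we want believed on the next boot */
};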

This notion applies to fault-tolerant persistent data systems in general, really; there, as with RAIDframe, the first item is relevant only until some kind of roll-forward is done to clean up after a failure. In RAIDframe's case, this is raidctl -P, and it's a little more prominent because it runs in parallel with, to borrow a term, the mutator; contrast with wapbl(4)'s roll-forward, which is done automatically on mount, appears to be quite quick, and I assume blocks use of the filesystem until it's done.

(In the other half of my life I'm looking at the literature on persistence, and it's almost odd how these two things are converging here.)

2009-06-12

It Is Written

What I'll call a first draft of the RAIDframe parity map stuff is written, and compiles and links, and if run will actually do something. That something will, in practice, probably involve bugs.

Now to get my QEMU setup into something more resembling a useful state, because the time I spend on that will almost certainly be paid back by not waiting for my test box to reboot, and I've been meaning to deal with that anyway. Once this gets to the point of serious benchmarking I'll need to use actual hardware for the most part, of course.

The RAIDframe codebase, incidentally, is… not unelaborate.

2009-06-11

The RAID Project: Things Not To Do

  1. Let writes to the RAID hit the disk before the corresponding parity map bit is set on disk.

  2. Let writes to the RAID hit the disk after the corresponding parity map bit is cleared on disk. (That is, updates which just mark regions clean again still need a barrier; see the sketch after this list.)

  3. Have one write see that its region needs to be marked unclean and start doing that; then, before that update actually gets committed to the disk, have another write to the same region see that it's allegedly already marked and just do the write, which happens to hit the disk before the parity map update; and then the power goes out at that exact moment.

    This may not even be possible — I think I'll only ever be starting writes from one particular thread, given the RAIDframe architecture, though I'm not sure of that yet — and even if it is it sounds stunningly unlikely. Which is to say that if I get this wrong I may never find out; so don't do that.

    Point also being that it's important to keep invariants in mind when dealing with shared-state concurrency, including those invariants that involve the state of secondary storage and the potential behavior of loosely specified hardware as well as the program's data structures proper.
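
Since the first two items are really one rule about ordering in each direction, here's the toy model I keep in my head. None of these names are real RAIDframe code; the "disk" is a pair of arrays and pm_flush() stands in for a write plus a cache flush.

/*
 * Toy model of rules 1 and 2: the dirty bit must be stable on disk before
 * any data write to its region is issued, and must not be cleared until
 * every such write is stable on disk.  Hypothetical names throughout.
 */
#include <stdbool.h>
#include <stdio.h>

#define NREGIONS 8

static bool pm_mem[NREGIONS];           /* parity map as known in memory */
static bool pm_disk[NREGIONS];          /* parity map as stored on "disk" */
static int  writes_in_flight[NREGIONS];

/* Barrier: push the in-memory map to "disk" and wait for it to stick. */
static void
pm_flush(void)
{
    for (int i = 0; i < NREGIONS; i++)
        pm_disk[i] = pm_mem[i];
    printf("parity map flushed\n");
}

/* Rule 1: the dirty bit hits the disk before the data write is issued. */
static void
raid_write(int region)
{
    if (!pm_mem[region]) {
        pm_mem[region] = true;
        pm_flush();                     /* barrier before the data write */
    }
    writes_in_flight[region]++;
    printf("data write issued to region %d\n", region);
    writes_in_flight[region]--;         /* pretend it completed and was flushed */
}

/* Rule 2: all data writes are stable before the bit is cleared on disk. */
static void
raid_clean(int region)
{
    while (writes_in_flight[region] > 0)
        ;                               /* wait it out (and flush the disk cache) */
    pm_mem[region] = false;
    pm_flush();                         /* barrier here too */
    printf("region %d marked clean\n", region);
}

int
main(void)
{
    raid_write(3);
    raid_clean(3);
    return 0;
}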

Completely unrelatedly, I've just learned that the posting interface here rejects ill-formed HTML.

2009-06-01

Parity

I didn't go into too much detail about my Google Summer of Code project last time. It is: improving RAIDframe parity handling. And now I'm going to be excessively verbose about it.

Specifically: the thing about RAID levels that provide redundancy (i.e., not RAID 0) is that there's some kind of invariant over what's on the disk: both halves of a mirror are the same, or each parity block is the XOR of its corresponding data blocks, &c. And the thing about software RAID is that, if the power goes out (or the system crashes) while you're in the middle of writing stuff to each of the disks, some of those writes might happen while others don't. Then, when the lights come back on, the invariant may no longer hold for any stripe that was being written.

This is of particular concern for RAID 5, because if the parity is still wrong when (not if) a disk fails and one of the data blocks needs to be reconstructed by XORing the parity with the remaining data, you will get complete garbage instead of the data you lost. This is bad.
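
To make the arithmetic concrete, here's a toy example of the invariant and of what goes wrong when it doesn't hold; there's nothing RAIDframe-specific about it.

/*
 * RAID 5 in one byte per disk: parity = XOR of the data blocks, and a lost
 * data block = XOR of the parity with the surviving data.  If the parity
 * on disk is stale, the "reconstructed" block is garbage.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint8_t d0 = 0x5a, d1 = 0x3c, d2 = 0xf0;
    uint8_t parity = d0 ^ d1 ^ d2;          /* the invariant */

    /* Disk 1 dies; rebuild its block from the parity and the survivors. */
    uint8_t rebuilt = parity ^ d0 ^ d2;
    printf("d1 = %x, rebuilt = %x\n", d1, rebuilt);

    /* Now pretend d2 was rewritten but the parity update never made it. */
    d2 = 0x0f;
    uint8_t garbage = parity ^ d0 ^ d2;     /* stale parity: not your data */
    printf("with stale parity, rebuilt = %x\n", garbage);
    return 0;
}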

One solution, and the one currently used in NetBSD, is to set a flag on each disk making up the RAID when it's configured, and clear it when it's unconfigured. If that flag is already set when the set is brought up, then there might have been an unclean shutdown requiring the parity to be recomputed.

That is, requiring the entire array to be read from beginning to end. Which, as magnetic disk drives pack more and more tracks onto their platters, inevitably takes longer and longer. As it is, each unclean shutdown requires many hours of parity rewriting, during which the disk I/O load interferes with whatever the system's actual job is. This is also kind of bad.

It is said that the Solaris Volume Manager (which I briefly administered an instance of, but didn't have to care how it worked in this much detail) divides the RAID into some number of regions and records for each one whether its parity might be out of sync. This seems like a simple enough idea.

Except it's kind of not. Ideally, you'd like as many of these regions to be marked clean as possible, to cut down on the parity rewriting time. On the other hand, because you'll have to do disk seeks (and probably disk cache flushes, too, and hope the firmware isn't too broken) to set or clear a region's dirty bit, and it's absolutely essential that that bit-setting hit the disk before any writes to the region are done, you also want to hold off on marking clean those regions that you think might be getting written to sometime soon.

So, if you're getting truly random I/O, then you're kind of stuck. But, if what's on top of the RAID is some halfway reasonable filesystem that's been painstakingly designed to exhibit reasonable locality of reference, then recent write activity should (at the region level) be a decent predictor of the future. I hope.
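
In code-shaped terms, what I have in mind is something like the sketch below; it's hypothetical, and the actual policy (how many regions, how long to wait before cleaning one) is exactly the tuning question the preceding paragraphs are waving at.

/*
 * Hypothetical sketch of the region-map policy, not the implementation:
 * writes dirty their region right away, and a periodic pass only marks a
 * region clean after it has sat idle for a few ticks, on the theory that
 * recent writes predict future ones.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NREGIONS       64
#define REGION_BLOCKS  (1 << 16)        /* blocks per region, say */
#define COOLDOWN_TICKS 4                /* idle ticks before cleaning */

static bool dirty[NREGIONS];
static int  idle_ticks[NREGIONS];

static int
region_of(uint64_t blkno)
{
    return (int)((blkno / REGION_BLOCKS) % NREGIONS);
}

/* Called on every write to the RAID (the bit has to hit disk first; see above). */
static void
note_write(uint64_t blkno)
{
    int r = region_of(blkno);

    dirty[r] = true;
    idle_ticks[r] = 0;
}

/* Called periodically, say every few seconds. */
static void
cleaning_pass(void)
{
    for (int r = 0; r < NREGIONS; r++) {
        if (!dirty[r])
            continue;
        if (++idle_ticks[r] >= COOLDOWN_TICKS) {
            /* Parity in this region is known good; clear its bit on disk. */
            dirty[r] = false;
            printf("region %d marked clean\n", r);
        }
    }
}

int
main(void)
{
    note_write(12345);
    for (int tick = 0; tick < 6; tick++)
        cleaning_pass();
    return 0;
}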

And then there's the part of the project where I get all this integrated into the kernel, which is beyond the scope of this post.

2009-05-21

The blog is alive!

This summer, I'm participating in Google's Summer of Code, attached to the NetBSD project and working on fixing certain misfeatures of the software RAID driver — more details on which later.

Various other SoCers have declared their intention of keeping a weblog on their work; and, since I'm not entirely unfamiliar with the concept, I thought I might do that as well. And this blog, being created for me to do Serious Technical Blogging (and then left to gather dust when I stopped being bothered with writing for it), and hosted by Google no less, seems like the best place.

Well. I briefly considered “blogging” by hand-writing an RSS file and giving out that URL — why, some web browsers even format RSS nicely when you browse to it — but then came to my senses. I also considered making a separate blog (as BlogSpot has a nice interface/ontology for that), but didn't really see the point. And, hey, tags; they're what Web 2.0 is all about, except when it's not, or something.