Please note - this is part of my 'How To' build a multi-node Windows 98se based diskless DC farm ("SETI Wall") for running 'SETI Classic'. Below relates to moving from a Windows 98 based 'node' system to a Windows XP based one. It is incomplete since I never managed to get Windows XP to successfully boot across a network into a RAM disk. Since my initial efforts (prior to about 2010) detailed below, MS has done a complete "U turn" on 'net booting' and now actively 'supports' it as part of their Server 'virtualisation' efforts. One day I may return to my 'SETI Wall', however if you want to 'net boot' Windows 7, don't start here :-)
What changes are needed to support BOINC ?
The major hurdle that has to be overcome is switching the Nodes from NetBEUI operation to TCP/IP. This was first achieved using the (single CPU supporting) Windows 98se.
Since Win98se is unable to support HT, let alone dual (or quad etc) core CPU's, to use modern CPUs, we have to move away from Win98. Note, however, that it is (was) possible to obtain win98 drivers for some of the CUDA capable Graphics cards. Since SETI running on a CUDA card is unable to also run on the CPU, in theory, single CPU compute nodes using CUDA processing can continue to use Win98.
Why must we use TCP/IP ?
TCP/IP is forced upon us because of the way SETI BOINC 'tracks' Work Units.
In SETI Classic, wu's were handed out to "anyone" and results could be returned by "anyone". This permitted one PC (in my case, the 'server') to fetch multiple wu's and then 'farm them out' to the individual Compute Nodes, which then did the actual processing. For SETI Classic, Berkeley didn't really care which actual machine processed an individual wu, just so long as they got back the results. Since wu's were sent out multiple times (at least 3 times) they actually expected the same result to be reported back from multiple different users (i.e. different PC's). All they really cared about was that the results 'agreed' with one another.
However, each individual that reported a completed result got a 'credit' for processing the wu. Unfortunately this allowed some over-competitive glory seeking groups to take advantage of the 'scoring' system when they realised that it was possible to 'part complete' a wu on one machine and then copy the partial result to multiple other machines. Although an individual user (with multiple PC's) could only claim a credit for completing a specific wu once, it was possible for users to act together as a group and 'defraud' the scoring system by claiming credit for completing their friends' wu's.
Each individual in the group would run a single unit to '99% complete'. They would then exchange copies of their 99% completed unit with each other. Each member of the group would then finish processing the 99% complete units and report the results. In this way, each and every one of them would get a '100%' credit for wu's that they had only processed for the final 1%. Since Groups competed with each other to "process the most wu's", cheating groups (or sub-groups) adopting this 'fraud' were able to out-perform other groups.
Whilst Berkeley admins made some effort to remove credits from the worst offenders (when a wu that had been sent out only 3 times had dozens of results returned), they were very slow to act. Since the 'group rankings' were being eagerly followed every day, a 'monthly clean up' did little to 'punish' the offenders. This brought the entire 'credit' scoring system into disrepute and caused many otherwise enthusiastic supporters of SETI to curtail (or even drop completely) their efforts since there was no longer any point in 'competing' with all the cheats.
To prevent 'cheating', BOINC now identifies each individual computer that fetches a wu - and computers are now only permitted to return results from their 'own' wu's. Whilst this prevents cheating (by sharing wu's) it also prevents a central 'server' fetching wu's on behalf of other computers (such as my DC farm Compute Nodes).
How does each Node obtain its own wu ?
To allow identification, the 'internal' Seti-wall network protocol has to support TCP/IP .. this is necessary so that a Node can connect to the Internet 'direct' and thus fetch its own wu (or, more exactly, so that the BOINC software running on a compute Node can connect to Berkeley using HTTP protocols and 'identify itself'). Whilst this connection can be routed via a Proxy Server, that routing also requires TCP/IP. Most Windows operating systems offer a basic Proxy service known as ICS or 'Internet Connection Sharing'.
Of course TCP/IP support on the Node is only necessary after Windows launches. So the DOS floppy boot approach can still be used and all DOS mode file transfers can be done using NetBEUI, as at present. Thus, in effect, ALL that is required to support SETI BOINC with your existing Nodes is to modify your compressed Windows 'images' (the ones loaded by DOS into the RAM drive) to support TCP/IP - and set up your 'server' to act as an Internet Proxy. During actual processing, the part completed wu's will still be kept on the Server (mapped network drive, using NetBIOS), since the Nodes have no actual hard disk and thus no other way of preserving interim results across a power outage. This is not a problem for BOINC, so long as each Node processes only its own wu.
What does this mean for Node security ?
Once Nodes are allowed to reach the Internet, plainly the Internet can reach the Nodes .. and thus Node security, which was previously irrelevant (there being no way for an Internet based threat to reach a Node connected internally only via NetBEUI), becomes a concern.
Fortunately, we can place a Proxy Server between the Nodes and the Internet and this can run a Firewall and anti-virus software to protect the Nodes. This avoids the need for each Node to run its own Firewall etc.
To prevent any possibility of bypassing the Proxy and obtaining direct access to the Nodes (from the Internet via the Router) all we need to do is place the Nodes on a totally different sub-net using a separate physical NIC at the Proxy Server. This means using one NIC to link the Server to the Internet (via the Router) and a second NIC on the Server to link to the Nodes on a LAN subnet. The Router itself will (of course) have its own built in Firewall, but even if the Router is compromised, since there is no physical connection to the Nodes, any 'hacker' would then have to get through the Server firewall and 'take over' the proxy Server before they could reach the Nodes. Since the only function of a Node will be to contact the BOINC/SETI site using web communications, the Proxy NIC can be set up to pass only that traffic and block all other.
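As a rough illustration of the two NIC layout described above, the Python sketch below checks that the Node subnet and the Router side subnet do not overlap, and models a 'web traffic only' rule at the Proxy. The address ranges (192.168.1.0/24 and 10.0.0.0/24) and the port list are assumptions picked for the example, not values from my own set-up, and the function names are hypothetical.

```python
import ipaddress

# Assumed example addressing (hypothetical, pick your own ranges)
ROUTER_SIDE = ipaddress.ip_network("192.168.1.0/24")  # Server NIC 1 <-> Router
NODE_SIDE = ipaddress.ip_network("10.0.0.0/24")       # Server NIC 2 <-> Nodes

# Assumed rule: Nodes may only make outbound web requests
ALLOWED_PORTS = {80, 443}


def subnets_are_separate(a, b):
    """True when the two networks share no addresses."""
    return not a.overlaps(b)


def proxy_should_pass(src_ip, dst_port):
    """Pass traffic only if it comes from a Node and targets a web port."""
    from_node = ipaddress.ip_address(src_ip) in NODE_SIDE
    return from_node and dst_port in ALLOWED_PORTS


if __name__ == "__main__":
    print(subnets_are_separate(ROUTER_SIDE, NODE_SIDE))  # True
    print(proxy_should_pass("10.0.0.17", 80))            # True
    print(proxy_should_pass("10.0.0.17", 139))           # False (NetBIOS port)
```

This is only a model of the rule, not a firewall. The real filtering would be configured in whatever Proxy / Firewall software runs on the Server.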
What about Server Licence issues ?
If you are using Windows XP Pro as your 'proxy' server, your Microsoft XP Pro licence limits the number of 'simultaneous' Proxy ICS Clients to 10 (I believe that 'simultaneous' actually means 'during the last 15 minutes' .. i.e. not more than 10 different computers may 'connect' during the same 15 minute 'time slot').
However, HTTP is a 'stateless' protocol (a client is only 'connected' for the short time it takes to complete its requests), so it is hard to see how any such limit can be imposed in practice.
However the 10 user 'share map' limit IS imposed. Since each individual Node only needs to 'map' whilst saving intermediate results (and can immediately 'un-map' again), and only needs to be 'connected' to ICS once a day or so (whilst reporting its final results & fetching its next wu), the chances of running into the '10 simultaneous connection limit' will be exceedingly low, even with 30 or 50 Nodes.
No doubt the reason why MS imposes such restrictions is to force small business users (who don't use a Domain, so don't need 'Active Directory' and thus have no need for an actual MS 'Server' level Operating System) into paying through the nose for a Server OS thus allowing the imposition of 'Client Licences'.
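To put a rough number on 'exceedingly low', here is a small Python sketch using a simple binomial model. The assumptions are mine and are only approximations: each Node connects once a day, in a 15 minute slot chosen at random and independently of the other Nodes (so a 1 in 96 chance of being 'connected' in any given slot). Real BOINC traffic will not be this tidy.

```python
from math import comb

SLOTS_PER_DAY = 24 * 4  # number of 15 minute slots in a day


def prob_more_than(limit, nodes, p_active):
    """Binomial probability that more than `limit` of `nodes` are active in one slot."""
    return sum(
        comb(nodes, k) * p_active**k * (1 - p_active) ** (nodes - k)
        for k in range(limit + 1, nodes + 1)
    )


if __name__ == "__main__":
    p = 1 / SLOTS_PER_DAY  # assumed: one connection per Node per day
    for nodes in (30, 50):
        print(nodes, "Nodes:", prob_more_than(10, nodes, p))
```

Under these assumptions the chance of more than 10 Nodes landing in the same slot comes out as a very small fraction, and it grows as Nodes are added or as each Node connects more often.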
What ICS Proxy do you recommend ?
To avoid running into MS 'simultaneous user' ICS licensing limits, I highly recommend that you avoid using MS ICS software at all.
I use the ICS Proxy alternative s/w instead = which has no 'client' limitations. Among other things, this has a lot more built in security options and will allow you to control 'permitted' vs 'non-permitted' traffic a lot better than the Microsoft offering.
How has changing CPU technology impacted the Compute Nodes ?
The rise of both Hyper-Threading and multi-core CPU's means that Windows 98se, which supports only a single CPU (and no Hyper-Threading, no multi-core), is no longer a 'good' choice for a Node OS.
At the time of writing (mid 2010), the 'best' choice is Windows XP Professional. This o/s supports 2 physical CPU chips with whatever combination of cores and Hyper-Threading they support ... in effect, this means just about any motherboard (available at a reasonable price) and any CPU is supported. A Windows XP Pro licence can be had for as little as £10, if purchased on eBay as part of a broken Hardware package 'for spares' (note, however, that an 'OEM' Licence must be run on the motherboard that it comes with, or one from the same manufacturer).
Windows 2000 also supports '2 CPU's', however because W2000 is 'unaware' of Hyper-Threading, when it 'sees' a HT capable CPU it 'counts' this as '2'. Needless to say, W2000 also counts a dual core CPU as '2'.
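The difference in how the three operating systems 'count' CPUs can be sketched in a few lines of Python. This is only a simplified model of the rules described above (XP Pro limits physical chips, W2000 limits whatever it 'sees', Win98se uses one CPU only). The function name and the OS labels are hypothetical.

```python
def usable_logical_cpus(os_name, chips, cores_per_chip=1, ht=False):
    """Estimate how many logical CPUs a Node OS will actually use."""
    threads = 2 if ht else 1
    logical = chips * cores_per_chip * threads
    if os_name == "win98se":
        return 1  # single CPU only, no HT, no multi-core
    if os_name == "w2000":
        return min(logical, 2)  # every core / HT thread counts towards the limit of 2
    if os_name == "xp_pro":
        return min(chips, 2) * cores_per_chip * threads  # limit is 2 physical chips
    raise ValueError("unknown OS: " + os_name)


if __name__ == "__main__":
    # One dual-core chip with Hyper-Threading = 4 logical CPUs
    for os_name in ("win98se", "w2000", "xp_pro"):
        print(os_name, usable_logical_cpus(os_name, chips=1, cores_per_chip=2, ht=True))
```

For a single dual-core HT chip this gives 1 for Win98se, 2 for W2000 and 4 for XP Pro, which is why XP Pro is the better fit for modern CPUs.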
What impact do Windows XP Licence restrictions have on Node 'build' ?
You still need a separate Licence for each Node, of course, however 'Full Retail' XP has to be 'installed' on the Hardware it will run on. This makes it impossible to install 'once' and 'copy' to the other nodes.
Fortunately, 'OEM' Licences will run on 'any' Hardware from that manufacturer. So, if you stick to (eg) DELL kit, you can save a lot of time by 'ghosting' your first install onto all the other Nodes (yes, there will be 'SID' issues which will need to be addressed, see later).
The 'best' choice for Nodes is the 'Small Form Factor' (SFF) motherboard. For non-multi-core CPU's and standard DIMM's, the Dell Optiplex GX260 (mainly non-HT) & GX270 (HT) are recommended. For DDR2 DIMM's, the GX280 is suggested, especially as the GX280 has a PCIe-16 socket, so will support modern CUDA capable Graphics cards (of which more later).
Is it worth using multi-core CPU's for SETI Nodes ?
Generally, no. Multi-core capable motherboards are more expensive (although the later Dell 'Dimension' 5150, or E520 (US name), again with DDR2 and PCIe-16, are just about affordable). However, at the time of writing (2011) DDR3, Quad-core CPU's and motherboards are significantly more expensive than CUDA capable graphics cards which will 'crunch' wu's up to 10x faster than CPU's. So I would suggest adding an NVidia 250 card to a GX280 motherboard instead of replacing the GX280 with a multi-core capable motherboard.
Can you run XP from a RAM disk ???
I never managed it, however, in theory, 'Yes'. Microsoft has reversed course from its previous attempts to prevent Windows running from a RAM disk and now actively supports this (presumably as part of its 'Virtualisation' efforts). However, as usual with all things MS, there is still a steep 'learning curve' that has to be followed before a Windows XP o/s 'image' can be loaded from a 'server' network share into a Node RAM Disk and actually be made to run :-(
Fortunately, although the RAM requirement of Windows XP (1Gb min) is massively higher than that of Win98se (64Mb min), the cost of that RAM has fallen to the point where it can still be afforded. Even so, swapping out the existing Windows 98se Nodes for more modern multi-core Hyper-Threaded alternatives on my 'Wall' and switching to XP has proved to be a slow and costly process. Getting WinXP to boot across the network has not yet been achieved (and may never be).
In order to get a Hyper-Threading (or dual CPU/dual-core) Node 'up and running' with XP Pro, you need to start with a small hard drive partition of less than 4Gb. You must also install with minimal RAM (512Mb) fitted, because whilst XP itself will occupy only about 2Gb of hard drive space, during install it will create a Virtual Memory file equal to your RAM size ! So the more RAM you install with, the larger the 'starting' disk image. Since most 'Ghost' type applications will not 'restore' to a smaller 'disk' than the original 'source' partition, this will limit what device you can 'clone' to.
**All SDHC and USB memory devices will be less than their stated capacity. So '4Gb' will likely be 3.9Gb (or even less). If you are going to 'clone' to such devices, your hard disk partition needs to be no more than 3.8Gb.
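The sizing arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the approximate figures quoted above (about 2Gb for XP, a Virtual Memory file equal to the RAM fitted, and a 3.8Gb ceiling for a '4Gb' device). The function names are hypothetical and real installs will vary.

```python
XP_INSTALL_GB = 2.0      # approx. disk space used by XP itself (figure quoted above)
SAFE_PARTITION_GB = 3.8  # largest partition that should still clone to a '4Gb' device


def image_size_gb(ram_mb):
    """XP footprint plus a Virtual Memory file equal to the RAM fitted at install."""
    return XP_INSTALL_GB + ram_mb / 1024


def fits_on_4gb_device(ram_mb, limit_gb=SAFE_PARTITION_GB):
    """True if the freshly installed image stays within the safe partition size."""
    return image_size_gb(ram_mb) <= limit_gb


if __name__ == "__main__":
    for ram_mb in (512, 1024, 2048):
        print(ram_mb, "Mb RAM ->", image_size_gb(ram_mb), "Gb, fits:", fits_on_4gb_device(ram_mb))
```

With 512Mb fitted the starting image is about 2.5Gb, which fits comfortably. With 2Gb fitted it is about 4Gb, which does not.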
I used the 3rd party application "XPLite" to reduce my XP 'install' down to the smallest possible 'footprint' (this prevents many unwanted DLL's etc. installing in the first place, it being almost impossible to determine which DLL's can be safely removed after install). I then reduced the 'image' size further by turning off 'virtual memory' (and deleting the swap file) and removing further 'option' files (such as those for foreign language support) etc. before squeezing the disk partition size down. This has allowed me to successfully run XP from a USB stick / CFcard / 'DOM' of 256Mb (of which a lot more later).
Summary - what's required to support Multi-core / Hyper-Threading CPU's
Moving to multi-core / hyper-threading is a much bigger step than moving to TCP/IP. The Windows 98se 'images' have to be replaced by Windows XP Pro 'images', and, as there is 'no way' to cross boot from DOS to XP this means the whole DOS boot-up sequence has to be replaced.
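Fortunately, all modern motherboards / NIC's support a 'remote booting' process known as BOOTP (these days usually as part of 'PXE', which uses BOOTP/DHCP to find the server and TFTP to fetch the boot image) .. and since this is actually intended for 'diskless workstations' it is ideal for our needs.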
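For the curious, the request a diskless Node broadcasts at power-on is a small fixed-layout packet. The Python sketch below packs a minimal BOOTREQUEST using the field layout from RFC 951 (300 bytes in total). It is only an illustration of the packet format, not part of my Node build. The MAC address and transaction id are made-up values and the function name is hypothetical.

```python
import struct

# RFC 951 layout: op, htype, hlen, hops, xid, secs, unused,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file, vend
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s64s"


def build_bootrequest(mac, xid):
    """Pack a minimal BOOTREQUEST for a client that knows only its own MAC address."""
    zero_ip = bytes(4)
    return struct.pack(
        BOOTP_FORMAT,
        1,        # op: 1 = BOOTREQUEST
        1,        # htype: 1 = Ethernet
        len(mac), # hlen: hardware address length (6 for Ethernet)
        0,        # hops
        xid,      # transaction id, echoed back in the reply
        0,        # secs since the client started booting
        0,        # unused in RFC 951
        zero_ip, zero_ip, zero_ip, zero_ip,  # client has no IP details yet
        mac,      # chaddr: struct pads this to 16 bytes with zeros
        b"",      # sname: no particular server requested
        b"",      # file: let the server choose the boot file
        b"",      # vend: vendor area left empty
    )


if __name__ == "__main__":
    packet = build_bootrequest(bytes.fromhex("001122334455"), xid=0x12345678)
    print(len(packet))  # 300
```

In RFC 951 the client sends this as a UDP broadcast to port 67 and listens for the reply on port 68. The reply tells it its IP address and the name of the boot file to fetch.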
The main drawback is that supporting a BOOTP process is regarded by Microsoft as a real "Server Level" function = rather than the NetBIOS 'drive mapping' (or 'peer to peer' file sharing) process we have been using previously.
So, if we wish to stay legal, we are now forced into buying a 'real' Server Operating System Licence. The 'cheapest' available (via eBay) comes as part of a broken PC package "for spares or repair". At the time of writing (mid 2008) a scrap** server package with a Windows NT 4 Server Licence was approx £15-20, and one with a Windows 2000 Server Licence approx £25-50 (eBay, June 2008).
**Note that most scrap machines will come with an 'OEM' Licence. This must be run on a motherboard from the original manufacturer (the 'OEM' nature of the licence is usually enforced by the motherboard BIOS, in that Windows will refuse to install or boot unless the 'correct' BIOS 'signature' is found). This means, for example, that there is no point buying a scrap Dell Server (with Dell NT4 OEM licence) unless you have a Dell motherboard to run it on. You should, of course, avoid Compaq kit altogether (their Hardware has never been very 'standard' and the cost of a replacement motherboard is prohibitive) - same applies to HP & especially the 'Proliant' range with the totally non-standard hard disk interfaces.
The next problem that has to be addressed is that the number of Node machines simultaneously connected to the Server is limited by the number of 'Client Licences'. In each case (NT 4, w2000), the basic Server licence comes with only 5 !
To avoid breaching the 5 client limit, once a Node has BOOTP'd itself, it must 'disconnect' from the Server. It only has to reconnect to the Server when it needs to report results and fetch a new wu. The BOINC software allows us to set the times when SETI is allowed to communicate with the Internet. By setting 'non-overlapping' times, you can ensure there will never be more than 5 (simultaneous) clients connecting to the Server.
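One way to work out such 'non-overlapping' times is sketched below in Python. The one hour window length and the function names are my own assumptions for the example. The output is simply a start / end time per Node, which would then have to be entered into each Node's BOINC settings by hand.

```python
from collections import Counter

CLIENT_LIMIT = 5  # 'Client Licences' supplied with the basic Server licence


def hhmm(minutes):
    """Format minutes since midnight as HH:MM."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def assign_windows(node_count, window_minutes=60, limit=CLIENT_LIMIT):
    """Give each Node a (start, end) window, with at most `limit` Nodes per window."""
    windows_needed = -(-node_count // limit)  # ceiling division
    if windows_needed * window_minutes > 24 * 60:
        raise ValueError("too many Nodes for the chosen window length")
    schedule = {}
    for node in range(node_count):
        start = (node // limit) * window_minutes
        schedule[node] = (hhmm(start), hhmm(start + window_minutes))
    return schedule


def max_simultaneous(schedule):
    """Largest number of Nodes sharing any one window."""
    return max(Counter(schedule.values()).values(), default=0)


if __name__ == "__main__":
    plan = assign_windows(30)
    print(plan[0], plan[29], max_simultaneous(plan))
```

For 30 Nodes this uses six one hour windows with five Nodes in each, so the Server never sees more than 5 clients at once provided the Nodes keep to their windows.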
Various 'flavours' of Open Source software also support BOOTP (on LINUX etc). It is even possible to get a LINUX Server to 'host' Windows XP Professional diskless node 'images'. However the steep learning curve involved in understanding LINUX was something that I have (so far) decided against ..
The next page (to be done) is a description of how to set up a BOOTP Node using a Windows NT/2000 Server, boot up the Node and then have it 'disconnect' so that another Node can make use of the 'Client Licence'.
(ends == BOOTP is still being worked intermittently, with no further (tested) results to report, as of Aug 2011)
2010. Some work has been done using 'low capacity' (less than 1Gb) CF cards in 'IDE hard disk emulation' mode. With modern motherboards, it is also possible to boot from USB sticks.