From bill at math.ucdavis.edu Tue May 1 01:21:36 2001 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:01:13 2009 Subject: NFS over Myrinet for 32 node beowulf Message-ID: <20010501012136.A21566@sphere.math.ucdavis.edu> I'm looking at building a 32 node beowulf, and currently favoring the Myrinet solution for MPI traffic. Our codes don't scale well over a 100 mbit switch but do scale well on other machines with faster interconnects (T3e's and Origin 2000's to name 2). Originally we planned on the 1.2 Gbit flavor of Myrinet but in the mean time the 2.0 Gbit myrinet flavor has become available (at similar prices). Anyone with practical experience with running NFS over Myrinet instead of running a gigabit to fileserver (or 2) and 100 mbit switched ports to the clients? Any idea on price/performance? In comparison to Myrinet 100 mbit per client is almost free, then again we plan to get Myrinet anyways. As a practical data point it's easier to buy pre-fab fileservers for Gigabit then it is for Myrinet, not that I can't build it myself from parts. Of course any Dolphinics data points are appreciated as well. Please follow up to the list by default, although I'll of course read and respond to any personal emails. -- Bill Broadley Mathematics/Institute of Theoretical Dyanmics University of California, Davis From bill at math.ucdavis.edu Tue May 1 01:32:19 2001 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:01:14 2009 Subject: Power/Airconditioning for clusters Message-ID: <20010501013219.B21566@sphere.math.ucdavis.edu> Anyone have any total power usage numbers for a beowulf? I can offer one data point: Dual P3-866 Serverworks LE motherboard 3 7200 RPM scsi disks on the onboard U160 channel in a software raid5. 512 MB ram (2 256 MB ecc dimms) 100 mbit (no fancy interconnect ) Dual/redundant 300 watt power supply 2u case 1.2 amps at idle 1.56 amps with 2 cpu loads 1.61 amps with 2 cpu loads + large tar (lots of file activity). As a related note on a Best Fortress 1425 at idle it lasted 1 hour and 24 minutes. Unfortunately the resolution of these numbers is suspicious since they were taken with a $50 radio shack DVM and ignores various consideration (i.e. power factor) and I forgot to measure the voltage (which was in the 108-128 ish range from what I can tell). I'm looking for actual/typical loads (useful for certain kinds of planning), not peak loads or anything related to whats printed on the powersupply (which I can look up myself). Anyone have numbers for the newer athlon nodes with/without DDR? How about a myrinet switch (of whatever number of slots/interfaces)? -- Bill Broadley Mathematics/Institute of Theoretical Dyanmics University of California, Davis From bill at math.ucdavis.edu Tue May 1 04:51:05 2001 From: bill at math.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:01:14 2009 Subject: fwd: Re: Stream variation Message-ID: <20010501045105.D21566@sphere.math.ucdavis.edu> I have a pthreads benchmark very similar to stream, it currently only implements one of the 4 benchmarks, but does so on a variety of arrays so you can see the effects of the various levels of cache. I use pthreads to insure that there is a very close sync between the multiple-threads, I see minimal variations from run to run. Just email me for source and I'll send it out, I don't want post it openly yet till I clean it up a bit, and add in the other 3 standard stream benchmarks (copy, sum, scale, and saxpy or similar). 
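A minimal sketch in C of the general shape of such a benchmark, for readers who want to see the idea. This is not Bill's code: the thread count, array sizes, repetition count, and the choice of a triad kernel are all illustrative assumptions. The point is only that a pthread barrier keeps the threads in lock-step around the timed region, and that sweeping the array size exposes the different cache levels.

/* Sketch of a barrier-synchronized, STREAM-style pthreads benchmark.
 * Illustrative only: thread count, sizes and kernel are assumptions. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define NTHREADS 2
#define REPS     100
#define NSIZES   4
static const size_t sizes[NSIZES] = { 1<<12, 1<<15, 1<<18, 1<<21 }; /* doubles per array */

static pthread_barrier_t barrier;
static double elapsed[NSIZES];
static volatile double sink;            /* keeps the compiler from discarding results */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

static void *worker(void *arg)
{
    long id = (long)arg;
    for (int s = 0; s < NSIZES; s++) {
        size_t n = sizes[s];
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

        pthread_barrier_wait(&barrier);     /* all threads start the timed region together */
        double t0 = now();
        for (int r = 0; r < REPS; r++)
            for (size_t i = 0; i < n; i++)
                a[i] = b[i] + 3.0 * c[i];   /* triad */
        pthread_barrier_wait(&barrier);     /* wait for the slowest thread */
        if (id == 0)
            elapsed[s] = now() - t0;

        double sum = 0.0;                   /* touch the output so the stores stay */
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        sink = sum;

        free(a); free(b); free(c);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    for (int s = 0; s < NSIZES; s++) {
        double bytes = 3.0 * sizes[s] * sizeof(double) * REPS * NTHREADS;
        printf("%8lu doubles/array: %8.1f MB/s aggregate\n",
               (unsigned long)sizes[s], bytes / elapsed[s] / 1e6);
    }
    pthread_barrier_destroy(&barrier);
    return 0;
}

Compile with something like gcc -O2 -pthread. Once an array no longer fits in L1 or L2 cache, the reported bandwidth drops toward the memory-bus figure, which is the cache effect Bill mentions.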
----- Forwarded message from Greg Lindahl ----- On Thu, Apr 26, 2001 at 09:37:15AM -0700, Patrick Geoffray wrote: > Running 2 instances of Stream at the same time on several Netfinity x330 > Dual PIII boxes, I see a variation of performance from one node to another > one, from 185 MB/s to 260 MB/s per run for TRIAD for example. Running 2 instances of stream? Well, that depends on the details. You see, stream takes the best time and uses that to give the answer. So From JParker at coinstar.com Tue May 1 08:24:57 2001 From: JParker at coinstar.com (JParker@coinstar.com) Date: Wed Nov 25 01:01:14 2009 Subject: PVM Problems Message-ID: G'Day ! > > > The symptoms are that when I run the pvm shell, it starts correctly on the > server, as verified by running administrative commands, but when I try to > add a node, it returns a "terminated" and puts me back at my bash shell > prompt. I can then rsh or ssh over to the node and verify the pvmd is > running on the node via the ps command. When I try to use the pvm shell > on that node, it displays a notice that the pvmd is already running and > hangs till I kill the process. Any clues or suggestions ? Well problem solved ... writing this so it is in the list archives. It turns out the the machine name was listed twice in /etc/hosts, first as the loopack and second under it's eth0 address. Someone smarter than me well need to explain this, but rsh and ssh picked the eth0 address, so I was able to connect to the remote machine, but the pvmd was using the loopback address. Thank you for all your suggestions. cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010501/9fb7c64c/attachment.html From patchkov at ucalgary.ca Tue May 1 08:30:22 2001 From: patchkov at ucalgary.ca (Serguei Patchkovskii) Date: Wed Nov 25 01:01:14 2009 Subject: Power/Airconditioning for clusters In-Reply-To: <20010501013219.B21566@sphere.math.ucdavis.edu> Message-ID: On Tue, 1 May 2001, Bill Broadley wrote: > Anyone have any total power usage numbers for a beowulf? [...] > I'm looking for actual/typical loads (useful for certain kinds > of planning), not peak loads or anything related to whats printed > on the powersupply (which I can look up myself). For whatever it's worth, 106 alphas (EV56-500 in DPW/CPW 500au) with about 256Mbytes and a 4Gb hard drive (7200RPM) draw about 12 kilowatts at our typical load (100% CPU, little to none disk activity). /Serge.P --- Home page: http://www.cobalt.chem.ucalgary.ca/ps/ From rgb at phy.duke.edu Tue May 1 08:50:59 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:14 2009 Subject: PVM Problems In-Reply-To: Message-ID: On Tue, 1 May 2001 JParker@coinstar.com wrote: > G'Day ! > > > > > > The symptoms are that when I run the pvm shell, it starts correctly on > the > > server, as verified by running administrative commands, but when I try > to > > add a node, it returns a "terminated" and puts me back at my bash shell > > prompt. I can then rsh or ssh over to the node and verify the pvmd is > > running on the node via the ps command. When I try to use the pvm shell > > > on that node, it displays a notice that the pvmd is already running and > > hangs till I kill the process. Any clues or suggestions ? > > Well problem solved ... writing this so it is in the list archives. 
> > It turns out the the machine name was listed twice in /etc/hosts, first as > the loopack and second under it's eth0 address. Someone smarter than me > well need to explain this, but rsh and ssh picked the eth0 address, so I > was able to connect to the remote machine, but the pvmd was using the > loopback address. I cannot explain it, but would that be: *** If you see "Master Host IP Address is Loopback!" or get the PvmIPLoopback error when adding hosts, this means that the networking on your Master PVM host (the one you initially started PVM on) is not set up to support multiple, remote hosts in your virtual machine. By default, especially on many Linux systems, the host name alias is appended to the 127.0.0.1 loopback / localhost IP address in the /etc/hosts file. This is very useful for running in stand-alone mode without networking, however this alias must be removed for interaction with remote hosts in PVM. See the Linux Networking HOWTO for information on automatically handling this scenario (via ifup-local and ifdown-local scripts). from /usr/share/pvm3/Readme under Troubleshooting? ;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From math at velocet.ca Tue May 1 17:31:42 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs Message-ID: <20010501203141.D60454@velocet.ca> I've run a few select G98 jobs (as provided by my brother, who's the researcher) through a number of different configurations of CPUs and mainboards, ranging from a duron 750 on a regular old board with PC133 ram, to the same duron on a DDR board w/256Mb DDR RAM (talk about a waste! :) right up to a 1.3Ghz Tbird I able to borrow and put on the DDR board. I havent had a chance to run any of these on a P4 with SSE3 (is that the term?) optimized and recompiled gaussian and ATLAS. (If anyone is interested in contributing such stats I'm very interested.) My stats here were generated with ATLAS compiled for each change in config. The freeBSD stats were with ATLAS and G98 compiled on Linux and run on FreeBSD in Linux emulation mode (I had alot of trouble getting ATLAS *AND* G98 to compile in FreeBSD, so I just gave up). There are some caveats and other environmental factors discussed on the page as well. I am not trying to start a jihad here against high speed Thunderbird and Dual Thunderbird proponents; this is just what i've found for *MY* jobs (which themselves are a very small subset of what most people would use G98 for) in *my* particular environment. They probably have very little to do with most people's cluster designs. http://trooper.velocet.net/~math/giocomms/benchmarks/ /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From rauch at inf.ethz.ch Wed May 2 01:32:04 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed Nov 25 01:01:14 2009 Subject: fwd: Re: Stream variation In-Reply-To: <20010501045105.D21566@sphere.math.ucdavis.edu> Message-ID: On Tue, 1 May 2001, Bill Broadley wrote: > I have a pthreads benchmark very similar to stream, it currently > only implements one of the 4 benchmarks, but does so on a variety of > arrays so you can see the effects of the various levels of cache. 
If you are interested in memory bandwidth with different access patterns and working set sizes, you should check out my colleagues work: ECT memperf - Extended Copy Transfer Characterization http://www.cs.inf.ethz.ch/CoPs/ECT/ On the web-page you will find explanations, papers, source code and example benchmarks. Regards, Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From goenzoy at gmx.net Wed May 2 02:22:22 2001 From: goenzoy at gmx.net (Gottfried Zojer) Date: Wed Nov 25 01:01:14 2009 Subject: Astrobiology/DNA-Computing Message-ID: <18888.988795342@www19.gmx.net> Maybe a little bit OT but did something know any reference about the use of linux-clusters for DNA-Computing particularly for Astrobiology Thanks in advance Gottfried -- GMX - Die Kommunikationsplattform im Internet. http://www.gmx.net From bogdan.costescu at iwr.uni-heidelberg.de Wed May 2 03:55:22 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:01:14 2009 Subject: network block device as replacement for NFS In-Reply-To: <20010430171208.H11296@getafix.EraGen.com> Message-ID: On Mon, 30 Apr 2001, Chris Black wrote: > We are trying to find the best way to share a small directory tree of > files over the network from our head node to the compute nodes. There should > be under a meg of data in this tree and most of it will be scripts and > makefiles and such. If these are things that don't change very often (like source code while you're coding), NFS should be good enough. And for < 1Mb, they probably get cached, so you don't stress the network too much. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From bogdan.costescu at iwr.uni-heidelberg.de Wed May 2 03:59:07 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:01:14 2009 Subject: NFS over Myrinet for 32 node beowulf In-Reply-To: <20010501012136.A21566@sphere.math.ucdavis.edu> Message-ID: On Tue, 1 May 2001, Bill Broadley wrote: > In comparison to Myrinet 100 mbit per client is almost free, then again > we plan to get Myrinet anyways. But that means sharing of Myrinet between MPI/PVM and NFS. If one of them is of low volume, things might work out well, otherwise it might be better getting an extra FE NIC. > As a practical data point it's easier to buy pre-fab fileservers for > Gigabit then it is for Myrinet, not that I can't build it myself from > parts. AFAIK, the new Myrinet switches allow connecting some other types of media to them (like Gigabit Ethernet), given proper modules inserted. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From jlkaiser at fnal.gov Wed May 2 05:01:25 2001 From: jlkaiser at fnal.gov (Joe Kaiser - hegel) Date: Wed Nov 25 01:01:14 2009 Subject: Article submission request Message-ID: <3AEFF715.9B3E1A91@fnal.gov> Hi, My name is Joe Kaiser and I am the July special issue editor for ;login: The Magazine of Usenix and SAGE. 
;login: has a circulation of about 15,000 people, mostly UNIX systems administrators. The theme for the July issu is "clusters." I was wondering if there is anyone on this list who is running a home "cluster," one that is either a true beowulf cluster or even just a cluster of workstations, and who would like to write a brief, one to three page article about: The hardware configuration. How you set it up and any problems you encountered. What you are using it for. Why you did it in the first place. What future uses you will have for it. My goal is to get 5-7 of these and string them together to show how small clusters can be and still do some useful work. The deadline is May 22 to me and it will go in the July issue of ;login. If you would like to contribute or would like to discuss this further, please send me an email to jkaiser@speakeasy.org Thanks, Joe Kaiser - Systems Administrator Fermi Lab CD/OSS-SCS 630-840-6444 jlkaiser@fnal.gov From hahn at coffee.psychology.mcmaster.ca Wed May 2 09:53:17 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: <20010501203141.D60454@velocet.ca> Message-ID: > the same duron on a DDR board w/256Mb DDR RAM (talk about a waste! :) well, a duron has the same dram performance as a tbird at the same FSB. so in that sense, it's actually a better match for a lot of computational codes that sneer at cache. otoh, tbirds are dirt cheap, not much more than durons. > I havent had a chance to run any of these on a P4 with SSE3 (is that the sse2. gcc 3.0 snapshots apparently can generate sse2 code... > in contributing such stats I'm very interested.) My stats here were generated > with ATLAS compiled for each change in config. does ATLAS include prefetching? it's fairly astonishing how big a difference prefetching (and movntq) can make on duron/athlon code. for an extreme case (Arjan van de Ven's optimized page-copy and -zero): 600.044 MHz clear_page 'normal_clear_page' took 8429 cycles (278.0 MB/s) clear_page 'slow_zero_page' took 8451 cycles (277.3 MB/s) clear_page 'fast_clear_page' took 7341 cycles (319.3 MB/s) clear_page 'faster_clear_page' took 2576 cycles (909.7 MB/s) clear_page 'even_faster_clear' took 2573 cycles (910.8 MB/s) copy_page 'normal_copy_page' took 8237 cycles (284.5 MB/s) copy_page 'slow_copy_page' took 8238 cycles (284.5 MB/s) copy_page 'fast_copy_page' took 5798 cycles (404.2 MB/s) copy_page 'faster_copy' took 3046 cycles (769.3 MB/s) copy_page 'even_faster' took 3077 cycles (761.6 MB/s) that's on a duron/600 (100 ddr fsb), kt133, cas2 PC133. I haven't seen results from a ddr dram system yet. > There are some caveats and other environmental factors discussed on > the page as well. error bars would be nice. > I am not trying to start a jihad here against high speed Thunderbird > and Dual Thunderbird proponents; this is just what i've found for *MY* maybe I'm being dense, but how would these results be interpreted jihadically? in general, they show two things: high-end machines bear a price-premium that decreases their speed/cost merit, and that freebsd's page coloring sometimes has a measurable benefit. I'm dubious about further interpretation, though. for instance, you seem to show a significant benefit to tbird's larger cache (384 vs 192K), but surely you chose this workload to be bandwidth intensive, didn't you? if not, then the DDR comparison is rather specious... thanks for posting the numbers! regards, mark hahn. 
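For readers who have not seen software prefetch expressed in C, here is a hedged illustration of the technique Mark is pointing at. It is not Arjan van de Ven's page-copy code (that is hand-written assembly using MMX moves and movntq non-temporal stores); it only sketches the prefetch half of the idea using GCC's __builtin_prefetch, which exists only in newer GCC releases than were common at the time, and the 320-byte prefetch distance is a guess that would need per-CPU tuning.

/* Sketch of a software-prefetched copy loop, illustrating the technique
 * discussed above.  NOT the kernel routines being benchmarked; the
 * prefetch distance and per-cache-line stride are assumptions. */
#include <stddef.h>

#define PF_DIST 320            /* bytes to fetch ahead of the copy position */

void copy_plain(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

void copy_prefetch(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((i & 7) == 0)      /* one hint per 64-byte line of doubles */
            __builtin_prefetch((const char *)(src + i) + PF_DIST, 0, 0);
        /* prefetching a little past the end of the array is harmless;
         * the hint never faults */
        dst[i] = src[i];
    }
}

The large jumps in the "faster" numbers quoted above come mostly from the other half of the trick, the non-temporal (movntq) stores that keep the destination page from displacing useful data in the cache; that part needs inline assembly or SIMD intrinsics and is left out of this sketch.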
From math at velocet.ca Wed May 2 12:35:34 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: ; from hahn@coffee.psychology.mcmaster.ca on Wed, May 02, 2001 at 12:53:17PM -0400 References: <20010501203141.D60454@velocet.ca> Message-ID: <20010502153534.B27034@velocet.ca> On Wed, May 02, 2001 at 12:53:17PM -0400, Mark Hahn's all... > > the same duron on a DDR board w/256Mb DDR RAM (talk about a waste! :) > > well, a duron has the same dram performance as a tbird at the same FSB. > so in that sense, it's actually a better match for a lot of computational > codes that sneer at cache. > > otoh, tbirds are dirt cheap, not much more than durons. The thing is for the cost levels Im working with, the price diff between a Duron and a Thunderbird does matter. As does the cost of a videocard being needed to get a system booted or not. For gaussian98, for my jobs (specifically) the cache seems to really matter quite alot. Thats the only thing that I can guess that accounts for the very high speed of my K7-700, which performs, for most jobs, very close to a Tbird 900. I really wish I could find some old K7s that people are tossing out for cheap. Problem is the board ends up being huge - wont fit into a 1U case for sure obviously. > > I havent had a chance to run any of these on a P4 with SSE3 (is that the > > sse2. gcc 3.0 snapshots apparently can generate sse2 code... Wonder if g77 can generate sse2 code. > does ATLAS include prefetching? it's fairly astonishing how big a > difference prefetching (and movntq) can make on duron/athlon code. > for an extreme case (Arjan van de Ven's optimized page-copy and -zero): Im not too up on the internals of atlas. Others on the list probably are. > > There are some caveats and other environmental factors discussed on > > the page as well. > > error bars would be nice. Heh, my stats are so unprofessionally done, you flatter me by even asking. I dont know what the factors of error are at all - does /usr/bin/time have problems? Is my clock accurate? Like I said, I just ran the jobs 5 times each minimum (in one case 90 times) and took the median performance. > > I am not trying to start a jihad here against high speed Thunderbird > > and Dual Thunderbird proponents; this is just what i've found for *MY* > > maybe I'm being dense, but how would these results be interpreted > jihadically? in general, they show two things: high-end machines Because I've mentioned that "perhaps buying an assload of Durons instead of dual Athlon DDR boards would give more bang for the buck overall" caused some consternation. Most were regarding increased switch costs for having so many more nodes, or possibly the cost of havving a boot hardrive for each box (which would increase the costs quite heavily). My advantages that probably apply to few others: - I dont need any high speed paralellism for this cluster. all jobs run singly by themselves on one node. - quality isnt even a massive factor - if a board crashes, the job is rescheduled. Obviously there's an acceptable threshhold, and we're way beyond that - two boards running for 12 days didnt crash on either OS. I am not putting together a mini prototype to start before the shipment of the rest of the parts comes and I finalize the cabinet design: Im gonna run 10 boards within a few inches of eachother in a stack (with proper cooling) and see how they fare - Im mainly curious about RF interference and cooling problems. 
I am pretty sure they'll be fine (a friend ran 4 of these boards even closer than I plan for 4 weeks with no prbolems). - we are running diskless nodes over NFS and we dont read or write to it often. (256kbps/node average during calculations) So I dont need a big switch, I dont need super reliable BrandName equipment with a service contract and I dont even need high performance network cards. Im actually somewhat interested in the power usage stats for different Athlon systems. Having fewer but faster nodes may well not save all that much power. Someone have wattage stats handy? > bear a price-premium that decreases their speed/cost merit, and > that freebsd's page coloring sometimes has a measurable benefit. Wonder how many people are using FreeBSD on their clusters instead of Linux... > I'm dubious about further interpretation, though. for instance, > you seem to show a significant benefit to tbird's larger cache > (384 vs 192K), but surely you chose this workload to be bandwidth > intensive, didn't you? if not, then the DDR comparison is rather > specious... I didnt. I chose it to be related to what we need the cluster for. Im trying to justify my design because it may come into question. Which is partly why I really need to check out a P4s stats, but Im pretty sure the price/performance is going to be lower than we can afford. The only question Im really trying to head off is why we didnt use the fastest Tbirds available and DDR ram. I am not going to read the G98 code, its horrid spaghetti ;) and my fortran isnt that great. And there's A LOT of code. Its not worth my time. Its much faster to just run the jobs on different boards and see what the results are, than to predict them by reading the code. Actually I have about 3x as many numbers for non-Atlas jobs, but they're kind of useless. However, they do indicate the speedup provided by the Thunderbirds as I managed to get both a Tbird and Duron 750, 800 and 850. I can dig up those stats if people care, but then again these are stats for *my* particular jobs running on non-optimal (non ATLAS) gaussian. > thanks for posting the numbers! No problem, sorry it wasnt more professionally done ;) I also apologize for not running standard G98 tests (Im not aware what would constitute such, or if there's a preset package of benchmarks available). /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From cbhargava at asacomputers.com Wed May 2 13:11:11 2001 From: cbhargava at asacomputers.com (Chetan Bhargava) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core Message-ID: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> Hi, I have successfully installed Scyld Beowulf on a master and 12 nodes. I have added all the nodes in the cluster and they all show 'UP' status in beosetup. Whenever I run beostatus it dumps core! Also when a 'KDE with master' setup is installed from the CDROM it can't run beosetup from KDE panel. GNOME-Master installs ok and there is no problem executing beosetup from the panel. Any pointers will be appreciated. Thanks. 
From agrajag at linuxpower.org Wed May 2 13:09:46 2001 From: agrajag at linuxpower.org (Jag) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core In-Reply-To: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com>; from cbhargava@asacomputers.com on Wed, May 02, 2001 at 01:11:11PM -0700 References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> Message-ID: <20010502130946.D20583@kotako.analogself.com> On Wed, 02 May 2001, Chetan Bhargava wrote: > Hi, > > I have successfully installed Scyld Beowulf on a master and 12 nodes. I have > added all the nodes in the cluster and they all show 'UP' status in > beosetup. Whenever I run beostatus it dumps core! Please run: gdb beostatus /path/to/core where /path/to/core is the path to the core file dumped by beostatus. Once you get to the prompt that looks like "(gdb)", type 'bt' and hit enter. You can then type 'quit' to exit gdb. The results from the command will help us determine where the segfault is occuring. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010502/2876e92c/attachment.bin From fmuldoo at alpha2.eng.lsu.edu Thu May 3 13:53:45 2001 From: fmuldoo at alpha2.eng.lsu.edu (Frank Muldoon) Date: Wed Nov 25 01:01:14 2009 Subject: F90 Message-ID: <3AF1C559.2861F80C@me.lsu.edu> I have a Fortran 90 and MPI code. I need to have my program crash if I ever use a real, integer or an array of same without first giving a value to it. My code uses many dynamically allocated arrays. Does anyone know of a way to set the value of allocatable arrays to say a NaN and then have some way of trapping a NaN if it is ever used. Explicitly setting the array to NaN is not a good solution from an ease of code readibility standpoint. Besides one can forget to do this which defeats the purpose. I have found a way to do the above with statically allocated arrays on a SP2. Thanks, Frank -- Frank Muldoon Computational Fluid Dynamics Research Group Louisiana State University Baton Rouge, LA 70803 225-344-7676 (h) 225-578-5217 (w) From Dean.Carpenter at pharma.com Wed May 2 14:04:34 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Message-ID: <759FC8B57540D311B14E00902727A0C002EC48BA@a1mbx01.pharma.com> Hi All - Just got some eval equipment in today to play with, with the Gigabyte GA-6vxdr7 motherboards in them. The NICs show up as EtherExpressPro 10/100 nics, pretty normal. These are dual P3 boards with dual 933 cpus and 512meg memory. The stage 1 boot goes fine, it gets an IP and grabs the stage 2 kernel fine. It's during the boot and init of the dual cpus that it barfs ... It leaves this on screen : : : CPU map: 3 Booting processor 1 eip 2000 Setting warm reset code and vector 1. 2. 3. Asserting INIT. Deasserting INIT. Sending STARTUP #1. After apic_write. Before start apic_write. Startup point 1. And there it sits. There's some more above the CPU map: 3 there, I can provide that as well. I have to run right now, but tomorrow I'll try the non-SMP kernel, see if it will actually boot. Otherwise, any ideas ? 
-- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) From lindahl at conservativecomputer.com Wed May 2 14:25:48 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:14 2009 Subject: F90 In-Reply-To: <3AF1C559.2861F80C@me.lsu.edu>; from fmuldoo@alpha2.eng.lsu.edu on Thu, May 03, 2001 at 03:53:45PM -0500 References: <3AF1C559.2861F80C@me.lsu.edu> Message-ID: <20010502172548.A1976@wumpus> On Thu, May 03, 2001 at 03:53:45PM -0500, Frank Muldoon wrote: > I have a Fortran 90 and MPI code. I need to have my program crash if I > ever use a real, integer or an array of same without first giving a > value to it. My code uses many dynamically allocated arrays. Does You didn't say what compiler or architecture. I have code for the Compaq Alpha F90 compiler that intercepts ALLOCATE calls and can do things like initialize the data to NaNs. Similar things could probably be done to, say, the PGI F90 compiler for x86, but you'll have to figure out the interface of the library routine. You would then need to know how to set the compiler up to crash when it touches a NaN, but that's often not that hard. -- g From rgb at phy.duke.edu Wed May 2 15:16:51 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: <20010502153534.B27034@velocet.ca> Message-ID: On Wed, 2 May 2001, Velocet wrote: > > does ATLAS include prefetching? it's fairly astonishing how big a > > difference prefetching (and movntq) can make on duron/athlon code. > > for an extreme case (Arjan van de Ven's optimized page-copy and -zero): > > Im not too up on the internals of atlas. Others on the list probably > are. IIRC, somebody on the list (Josip Loncaric?) inserted prefetching into at least parts of ATLAS for use with athlons back when they were first released. It apparently made a quite significant difference in performance. I'm sure Google would turn up the discussion and the patches (if he or whoever isn't listening). > Because I've mentioned that "perhaps buying an assload of Durons instead > of dual Athlon DDR boards would give more bang for the buck overall" > caused some consternation. Most were regarding increased switch costs > for having so many more nodes, or possibly the cost of havving a boot > hardrive for each box (which would increase the costs quite heavily). No need for a jihad on this -- I don't thing anybody would really expect a dual to truly deliver the same performance (per CPU-memory channel) as a single, even with nominally doubled memory speed. So you're probably right. There are, I'm sure, plenty of people in the marginal area where it makes sense to go dual, just as I'm sure there are plenty of people in the marginal area where it makes sense to go single. It may not be very easy to tell which >>one<< is truly cost optimal, though, without benchmarking your particular code and doing a careful cost comparison including hidden costs (e.g. the fact that electricity costs and space costs may be 60% higher for lots of singles, each single requires a case, its own memory and copy of the OS and network card, and so forth). In many cases a dual is only 0.7-0.8 the cost of two singles, although the high cost of DDR makes that unlikely in this case. Still, it is under $1/MB, which isn't all THAT bad -- PC133 cost that much only months ago. Months from now DDR may cost little more than SDRAM in equivalent amounts. 
The limited benchmarking I've done of DDR-based systems suggests that DDR can very easily be worth it for folks doing lots of streaming vector operations that are memory bandwidth bound. No surprise there. For folks that run code with either a lousy rotten random irregular stride or memory access pattern, OR for folks that run e.g. ATLAS-style optimized code (that rearrange the problems so that the algorithm runs out of cache when possible and then fills the cache in single bursts -- there are some nice white papers on the ATLAS site at netlib.org if you want to see how it works) there are smaller advantages. If cache works, it hides memory access speeds. I actually don't know whether a random access pattern is better or worse on DDR (yet) -- it has higher bandwidth but I would guess the latency is no better or even a bit worse. I will be getting a DDR-equipped 1.33 GHz Tbird in about ten days, at which time I'll crunch through a bunch of benchmarks and post the results. I got it because I have an application that does SOME stuff that is likely to be CPU/memory bound, while other tasks are easily parallelized and less memory bandwidth sensitive (so my nodes are "regular" PC133). I think this will be a decent architecture for this particular task -- showing that even mixed memory architectures can be cost optimal. > My advantages that probably apply to few others: > > - I dont need any high speed paralellism for this cluster. all jobs run > singly by themselves on one node. > > - quality isnt even a massive factor - if a board crashes, the job is > rescheduled. Obviously there's an acceptable threshhold, and we're > way beyond that - two boards running for 12 days didnt crash on either > OS. I am not putting together a mini prototype to start before the shipment > of the rest of the parts comes and I finalize the cabinet design: Im gonna > run 10 boards within a few inches of eachother in a stack (with proper > cooling) and see how they fare - Im mainly curious about RF interference > and cooling problems. I am pretty sure they'll be fine (a friend ran > 4 of these boards even closer than I plan for 4 weeks with no prbolems). > > - we are running diskless nodes over NFS and we dont read or write to it > often. (256kbps/node average during calculations) > > So I dont need a big switch, I dont need super reliable BrandName equipment > with a service contract and I dont even need high performance network > cards. Sounds like you are dead right on all of this. Embarrassingly parallel jobs, few to no communications, purely CPU bound -- Durons (or whatever currently delivers the most raw flops for the least money) are likely to be perfect for you. And for many others, actually. For a long time I like Celerons (or even dual Celerons) for the same reasons, although at this point I've converted to AMD-based systems as their cost-benefit has overwhelmed Intel's whole product line for my code. > Im actually somewhat interested in the power usage stats for different > Athlon systems. Having fewer but faster nodes may well not save all > that much power. Someone have wattage stats handy? Not yet, but maybe soon. The fast Tbirds do require a big "certified" power supply, but I'm guessing they draw a lot less than they "require" except maybe in bursts. I'm betting they draw around 100-150W running, a number that recently got some support on the list. > > > bear a price-premium that decreases their speed/cost merit, and > > that freebsd's page coloring sometimes has a measurable benefit. 
> > Wonder how many people are using FreeBSD on their clusters instead of > Linux... Dunno. Based on what I've seen, heard, and talked to people about it is a small fraction of the total, but not a small number. The core effort has been linux-centric from the beginning, although of course a lot of cluster stuff runs fine under BSD (or any *nix). > > I'm dubious about further interpretation, though. for instance, > > you seem to show a significant benefit to tbird's larger cache > > (384 vs 192K), but surely you chose this workload to be bandwidth > > intensive, didn't you? if not, then the DDR comparison is rather > > specious... > > I didnt. I chose it to be related to what we need the cluster for. Im > trying to justify my design because it may come into question. Which is > partly why I really need to check out a P4s stats, but Im pretty sure > the price/performance is going to be lower than we can afford. The only > question Im really trying to head off is why we didnt use the fastest > Tbirds available and DDR ram. > > I am not going to read the G98 code, its horrid spaghetti ;) and my > fortran isnt that great. And there's A LOT of code. Its not worth my > time. Its much faster to just run the jobs on different boards and > see what the results are, than to predict them by reading the code. > > Actually I have about 3x as many numbers for non-Atlas jobs, but they're > kind of useless. However, they do indicate the speedup provided by > the Thunderbirds as I managed to get both a Tbird and Duron 750, 800 and > 850. I can dig up those stats if people care, but then again these are > stats for *my* particular jobs running on non-optimal (non ATLAS) > gaussian. > > > thanks for posting the numbers! > > No problem, sorry it wasnt more professionally done ;) I also apologize > for not running standard G98 tests (Im not aware what would constitute > such, or if there's a preset package of benchmarks available). Not a preset package, but I'm trying to start a collection of sorts: http://www.phy.duke.edu/brahma/dual_athlon/tests.html My primary recommendation would be to use lmbench. It gives you a very nice set of all sorts of microbenchmarks to profile overall system performance. Its packaging could be improved. Stream-like benchmarks (including cpu-rate) will give you some idea of float performance relative to memory access speed. Stream-2 should let you make a profile that relates float speed to the size of the memory segment being worked through (although I haven't tried it yet); cpu-rate definitely does. A mixed float/int benchmark that is CPU bound and has no particularly nice stride or memory access pattern can help you assess overall performance when the code isn't so nice -- this is what I use my MC code for, although it isn't really packaged for production. Hope this all helps or is interesting. I'm very interested in Athlon performance profiles as they seem to be the current most-CPU-for-the-least-money winners, and when one buys in bulk (as beowulf humans tend to do) this sort of optimization really matters. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From cbhargava at asacomputers.com Wed May 2 15:20:43 2001 From: cbhargava at asacomputers.com (Chetan Bhargava) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> <20010502130946.D20583@kotako.analogself.com> Message-ID: <011001c0d356$20ef3650$1f00a8c0@asacomputers.com> here is the output you needed: [root@beowulf /root]# gdb beostatus /root/core GNU gdb 19991004 Copyright 1998 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... Core was generated by `beostatus'. Program terminated with signal 11, Segmentation fault. Reading symbols from /usr/lib/libncurses.so.4...done. Reading symbols from /usr/lib/libgnomeui.so.32...done. Reading symbols from /usr/lib/libart_lgpl.so.2...done. Reading symbols from /usr/lib/libgdk_imlib.so.1...done. Reading symbols from /usr/X11R6/lib/libSM.so.6...done. Reading symbols from /usr/X11R6/lib/libICE.so.6...done. Reading symbols from /usr/lib/libgtk-1.2.so.0...done. Reading symbols from /usr/lib/libgdk-1.2.so.0...done. Reading symbols from /usr/lib/libgmodule-1.2.so.0...done. Reading symbols from /usr/X11R6/lib/libXext.so.6...done. Reading symbols from /usr/X11R6/lib/libX11.so.6...done. Reading symbols from /usr/lib/libgnome.so.32...done. ---Type to continue, or q to quit--- Reading symbols from /usr/lib/libgnomesupport.so.0...done. Reading symbols from /usr/lib/libesd.so.0...done. Reading symbols from /usr/lib/libaudiofile.so.0...done. Reading symbols from /lib/libm.so.6...done. Reading symbols from /lib/libdb.so.2...done. Reading symbols from /usr/lib/libglib-1.2.so.0...done. Reading symbols from /lib/libdl.so.2...done. Reading symbols from /usr/lib/libbproc.so.1...done. Reading symbols from /lib/libc.so.6...done. Reading symbols from /usr/lib/libz.so.1...done. Reading symbols from /usr/X11R6/lib/libXi.so.6...done. Reading symbols from /lib/ld-linux.so.2...done. Reading symbols from /lib/libnss_files.so.2...done. #0 chunk_free (ar_ptr=0x404fbe40, p=0x8052ab0) at malloc.c:3049 3049 malloc.c: No such file or directory. (gdb) bt #0 chunk_free (ar_ptr=0x404fbe40, p=0x8052ab0) at malloc.c:3049 #1 0x40466f9a in __libc_free (mem=0x8052ab8) at malloc.c:3023 #2 0x4039ae05 in poptFreeContext () from /usr/lib/libgnomesupport.so.0 #3 0x804b3d6 in initialize_output_gtk () #4 0x804b1c3 in main () #5 0x404259cb in __libc_start_main (main=0x804b1a8
, argc=1, argv=0xbffffb94, init=0x8049b80 <_init>, fini=0x804c54c <_fini>, rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbffffb8c) at ../sysdeps/generic/libc-start.c:92 (gdb) quit [root@beowulf /root]# Thanks for your help. ----- Original Message ----- From: "Jag" To: "Chetan Bhargava" Cc: Sent: Wednesday, May 02, 2001 1:09 PM Subject: Re: Scyld: Beostatus dumps core From agrajag at linuxpower.org Wed May 2 17:08:45 2001 From: agrajag at linuxpower.org (Jag) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core In-Reply-To: <011001c0d356$20ef3650$1f00a8c0@asacomputers.com>; from cbhargava@asacomputers.com on Wed, May 02, 2001 at 03:20:43PM -0700 References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> <20010502130946.D20583@kotako.analogself.com> <011001c0d356$20ef3650$1f00a8c0@asacomputers.com> Message-ID: <20010502170845.E20583@kotako.analogself.com> On Wed, 02 May 2001, Chetan Bhargava wrote: > #0 chunk_free (ar_ptr=0x404fbe40, p=0x8052ab0) at malloc.c:3049 > 3049 malloc.c: No such file or directory. > (gdb) bt > #0 chunk_free (ar_ptr=0x404fbe40, p=0x8052ab0) at malloc.c:3049 > #1 0x40466f9a in __libc_free (mem=0x8052ab8) at malloc.c:3023 > #2 0x4039ae05 in poptFreeContext () from /usr/lib/libgnomesupport.so.0 > #3 0x804b3d6 in initialize_output_gtk () > #4 0x804b1c3 in main () > #5 0x404259cb in __libc_start_main (main=0x804b1a8
, argc=1, > argv=0xbffffb94, init=0x8049b80 <_init>, fini=0x804c54c <_fini>, > rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbffffb8c) > at ../sysdeps/generic/libc-start.c:92 Looks like something weird is going on with the popt stuff in GNOME. What version of GNOME do you have installed? This command will tell you: rpm -qf /usr/lib/libgnomesupport.so.0 Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010502/e070bb36/attachment.bin From dvos12 at calvin.edu Wed May 2 19:05:08 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core In-Reply-To: <20010502170845.E20583@kotako.analogself.com> Message-ID: If I remember correctly, he was running under KDE when this happened. (I deleted the email, and the the archives are not very recent). Is the program a gnome-only program? David On Wed, 2 May 2001, Jag wrote: > On Wed, 02 May 2001, Chetan Bhargava wrote: > > > #0 chunk_free (ar_ptr=0x404fbe40, p=0x8052ab0) at malloc.c:3049 > > 3049 malloc.c: No such file or directory. > > (gdb) bt > > #0 chunk_free (ar_ptr=0x404fbe40, p=0x8052ab0) at malloc.c:3049 > > #1 0x40466f9a in __libc_free (mem=0x8052ab8) at malloc.c:3023 > > #2 0x4039ae05 in poptFreeContext () from /usr/lib/libgnomesupport.so.0 > > #3 0x804b3d6 in initialize_output_gtk () > > #4 0x804b1c3 in main () > > #5 0x404259cb in __libc_start_main (main=0x804b1a8
, argc=1, > > argv=0xbffffb94, init=0x8049b80 <_init>, fini=0x804c54c <_fini>, > > rtld_fini=0x4000ae60 <_dl_fini>, stack_end=0xbffffb8c) > > at ../sysdeps/generic/libc-start.c:92 > > Looks like something weird is going on with the popt stuff in GNOME. > What version of GNOME do you have installed? This command will tell you: > > rpm -qf /usr/lib/libgnomesupport.so.0 > > > Jag > From math at velocet.ca Wed May 2 20:36:47 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: ; from rgb@phy.duke.edu on Wed, May 02, 2001 at 06:16:51PM -0400 References: <20010502153534.B27034@velocet.ca> Message-ID: <20010502233647.O27034@velocet.ca> On Wed, May 02, 2001 at 06:16:51PM -0400, Robert G. Brown's all... > IIRC, somebody on the list (Josip Loncaric?) inserted prefetching into > at least parts of ATLAS for use with athlons back when they were first > released. It apparently made a quite significant difference in > performance. Oh, btw, I did include Athlon3DNow!2 stuff in my ATLAS just for a laugh. I then compared my results to the original non ATLAS g98 and everything matched up for all 4 jobs I chose. I dont think its actually used for G98 since its single precision stuff only IIRC. Also, note, we may start using MPQC in a bit - Graydon Hoare (ex Berlin project, now at Redhat) is hacking on it to speed it up a bunch if possible. I havent had a chance to benchmark that (it may demand TB1333's on DDR boards and Ill be sunk! :) In fact my stats indicated that a Duron 750 was slightly better for the money in most cases, but the 900 and 700 are so close, I went with the faster chip in the hopes that other situations that I cant benchmark now but will encounter later will favour the faster CPU (also the supply of D700s is waning). > in the marginal area where it makes sense to go single. It may not be > very easy to tell which >>one<< is truly cost optimal, though, without > benchmarking your particular code and doing a careful cost comparison > including hidden costs (e.g. the fact that electricity costs and space > costs may be 60% higher for lots of singles, each single requires a > case, its own memory and copy of the OS and network card, and so forth). > In many cases a dual is only 0.7-0.8 the cost of two singles, although > the high cost of DDR makes that unlikely in this case. Still, it is > under $1/MB, which isn't all THAT bad -- PC133 cost that much only > months ago. Months from now DDR may cost little more than SDRAM in > equivalent amounts. I started forming my anti SMP bias when I ran a bunch of old G94 jobs on a dual Celeron board. It *SUCKED* :) 100 Mhz ram was probably the bottleneck, and the resource locking in Linux 2.2.(early) was not as nice as it is now in 2.4 (so I hear). I have stats for it actually: two jobs """""""" (Mhz ratio = 1.00 for 550Mhz, times for 2 jobs to be run) # C P speed MHZ efficiency U CPU MHz/bus RAM total ratio ratio (mhz/speed ratios) ------------------------------------------------------------------------------ 2 C366A 550/100 128M 4293.6 1.00 2.00 0.50 1 P2 400 400/100 128M 6105.3 0.70 0.73 0.96 1 P3 450 450/100 256M 5675.0 0.76 0.83 0.92 1 C300A 450/100 64M 7253.5 0.59 0.83 0.71 For g94 I saw almost the exact same performance out of a C450 as a P3-450 (I guess the CPU's SSE extensions were not used by it). So seeing that my heavily overclocked 550MHz Celeron's were only some 25% faster overall for 2 CPUs vs 1 at 450, thats pretty bad. 
:) I think that this is the software fighting with the locking and the shared bus (Abit BP6 IIRC was the board, which probably didnt have the fastest architecture either. IIRC, dual celeron was a bad hack). (As well, the stats above are not that fair as I ran a single job instead of the 4 different types Im using now as my base set, not to mention its G94 (outdated) and non-ATLAS.) So its probably not optimal to even consider stats for SMP from a dual Celerons considering they're so not designed for it. :) Nonetheless, my benchmarking of various jobs showed me that memory bandwidth is really important for most gaussian jobs, and you'd really need a speedy memory bus to keep up with this. DDR will probably really help SMP for this kind of thing. > Sounds like you are dead right on all of this. Embarrassingly parallel > jobs, few to no communications, purely CPU bound -- Durons (or whatever > currently delivers the most raw flops for the least money) are likely to > be perfect for you. And for many others, actually. For a long time I > like Celerons (or even dual Celerons) for the same reasons, although at > this point I've converted to AMD-based systems as their cost-benefit has > overwhelmed Intel's whole product line for my code. Its interesting however, to note that the improvements per cycle especially for DDR boards. I actually removed this stat from my charts, but I did compare all the Athlons to a Celeron 450 G89-atlas setup. These values are MUCH better than the non atlas values - ie atlas really improves the efficiency of the jobs, especially for newer CPUs - I assume by making better use of large L1/L2 caches and filling the longer pipelines more optimally. efficiency/cycle, atlas g98: (each column is normalized to 1.00 for C450 w.r.t that column's data) job 1 job 2 job 3 job 4 | | | non atlas | non atlas | non atlas | non atlas ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~ C450 1.00 1.00 | 1.00 1.00 | 1.00 1.00 | 1.00 1.00 A700 1.05 | 0.88 1.05 | 1.26 1.31 | 1.34 1.39 D750 0.86 1.30 | 0.70 0.77 | 1.05 1.10 | 1.25 1.24 T900 0.89 1.31 | 0.81 0.86 | 1.05 1.08 | 1.23 1.20 T1200DDR 1.08 | 1.04 | 1.30 | 1.42 I think we see these patterns because without optimization for caches and pipeline, the disparate speeds between the CPU and RAM for the non DDR machines is very large (up to 4 or 5 CPU cycles/ram cycle). The cache helps a fair bit (the Tbird and Athlon fare much better than the duron), but when we get ATLAS involved, the improvements are quite noticeable over the baseline C450. > Not yet, but maybe soon. The fast Tbirds do require a big "certified" > power supply, but I'm guessing they draw a lot less than they "require" > except maybe in bursts. I'm betting they draw around 100-150W running, > a number that recently got some support on the list. Bursts of CPU usage? Arent all our clusters all hammering our CPUs as much as possible? And if ATLAS is really doing its job, arent we hammering all parts of the CPU as much as possible? :) > Not a preset package, but I'm trying to start a collection of sorts: > > http://www.phy.duke.edu/brahma/dual_athlon/tests.html Will check it out. > Hope this all helps or is interesting. I'm very interested in Athlon > performance profiles as they seem to be the current > most-CPU-for-the-least-money winners, and when one buys in bulk (as > beowulf humans tend to do) this sort of optimization really matters. What kind of deals can you get in bulk? From AMD themselves? 
do you need to be a big university and have a big press release event to get these deals from them? How many do you need to get a batch deal? How deep is the discount? /kc > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From g.roest at linvision.com Thu May 3 02:45:55 2001 From: g.roest at linvision.com (Gerben Roest) Date: Wed Nov 25 01:01:14 2009 Subject: Intel Fortran compiler Message-ID: Hi all, I recently tested the new Intel Fortran compiler 5.01 for Linux (free beta until sept 1). The results compared to the GNU g77 are quite impressive. I did a test job of a client of ours, which is some CFD code. It ran for about 2 minutes and used around 100 MB of memory. I did the test on a Celeron 466, with RH 7.1, kernel 2.4.3. g77 (gnu compiler) time: 2:45 minutes ifc (intel compiler) time: 1:37 minutes g77 options: -O2 -mpentiumpro -funroll-all-loops ifc options: -w90 -w95 (to suppress messages about use of non-standard Fortran). It default optimises, I tested some extra options for MMX or PII, but that did not result in faster code, as far as I could tell. Greetings, Gerben Roest. --- Linvision BV tel: 015-7502310 Elektronicaweg 16 d fax: 015-7502319 2628 XG Delft g.roest@linvision.com The Netherlands www.linvision.com From jcownie at etnus.com Thu May 3 04:17:04 2001 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:01:14 2009 Subject: [Commercial] TurboGenomics releases TurboBlast Message-ID: <14vH6W-3QS-00@etnus.com> I know nothing about this (I write debuggers not genomes), but folks seemed to be asking about running Blast in parallel, so this may be worth getting into the list archive. This from LWN (www.lwn.net/2001/0503/commerce.php3) TurboGenomics releases TurboBLAST. TurboGenomics has announced the release of TurboBLAST, a genetic sequencing tool. This product may be one of the first commercial packages for Beowulf clusters: "Initial benchmarks of TurboBLAST on a network of 11 commodity PCs running Linux reduced a month-long BLAST run to just two days. Greater speed-up of BLAST is achieved simply by adding more machines to the TurboBLAST system." The press release is at http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/04-23-2001/0001474961&EDATE= -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From renambot at cs.vu.nl Thu May 3 05:18:00 2001 From: renambot at cs.vu.nl (Renambot Luc) Date: Wed Nov 25 01:01:14 2009 Subject: (no subject) Message-ID: Hi, Is there anyone using the Thunder HEsl (S2567) Motherboard with an AGP graphic board (GeForce for example) ? I'm looking for some feedback on the AGP performance, as I read that there was some performance issues on that aspect. The global idea is to find a board with high PCI throughput for the network, and with good AGP performance (for a graphic cluster). thanks for any feedback or suggestion, cheers, Luc. -- Luc Renambot Mail: renambot@cs.vu.nl - Web : http://www.cs.vu.nl/~renambot/vr There's a crack in everything, that's how the light gets in. (L.C.) From rgb at phy.duke.edu Thu May 3 05:23:56 2001 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: <20010502233647.O27034@velocet.ca> Message-ID: On Wed, 2 May 2001, Velocet wrote: > > Not yet, but maybe soon. The fast Tbirds do require a big "certified" > > power supply, but I'm guessing they draw a lot less than they "require" > > except maybe in bursts. I'm betting they draw around 100-150W running, > > a number that recently got some support on the list. > > Bursts of CPU usage? Arent all our clusters all hammering our CPUs as much as > possible? And if ATLAS is really doing its job, arent we hammering all parts > of the CPU as much as possible? :) Not variations in the CPU/memory load, which as you note is nearly constant (and not really all THAT different between idle and loaded -- a lot of juice is expended just keeping it and the memory running in idle mode), variations in the peripheral load -- using the disk(s), the network and so forth. In your configuration (very stripped, if I recall) you don't think you'll see much variation. I'm going to see if I can get Duke to spring for some tools to measure power draw properly. There are all sorts of peak vs rms issues that were discussed on the list a year or so ago, and this matters for a project we have underway (we're getting a "beowulf room" built from scratch in the department and are dealing with the feeding and cooling of some as yet indetermined number of nodes, where I've been using 100 Watts/node average power consumption as a rough guestimator for node consumption). > What kind of deals can you get in bulk? From AMD themselves? do you > need to be a big university and have a big press release event to > get these deals from them? How many do you need to get a batch deal? > How deep is the discount? Ah, I don't shop that way. I love to save money and all, but: a) It's other people's money (mostly -- I do run my home 'wulf out of pocket, sort of). b) I have considerable personal experience of the penny wise, pound foolish variety. I'll cheerfully spend some of my OPM budgets on convenience, vendor relationships, and extranea that cost me a few nodes overall but may get more work done with less hassle. c) I factor in the cost of my own time at a pretty hefty rate for setting everything up and maintaining it. I therefore get things set up so they will be very low maintenance, as my time ends up being worth (trading off) quite a lot of hardware. To me, anyway. In my case the best solution to optimize these parameters seems to be do business with a reliable local vendor, missing the absolute best deals available anywhere by an easy 10% or so but getting the warm fuzzies of a place where I'm on a first name basis with the staff (who of course adore all the lovely money I spend there and are willing to earn it) that will fix things for me without invoking the daemon-gods of depot-repair hell (or playing the mail game -- they mail you a piece, the next day or so you try it, if it works you mail back the bad part, otherwise you mail back the good part and they mail you a piece, ad nauseam). I just carry my vendor the node or the part if I've definitely isolated it (ten minute drive) and the next day or next hour they hand back a working node, with a full shop and lots of immediately available parts to swap to apply to the repair. Only same-day on-site service contracts from e.g. Dell can beat it, but Dells cost another 10% or more per node and its harder to shop and microconfigure. 
After all, nodes will be with us always, and new budgets for buying them are arranged every few years. Whatever I get will be obsoleted in three months so anything I get will be a "mistake" from one point of view (if only I'd held out for the 1.9 GHz SuperCruncher with relativistic predictive dram -- delivers the memory before you ask for it -- THEN I'd have been Happy). Better to take the long view, buy in mid-sized chunks (giant purchases of 256-node 'wulfs can require real expertise to get right and cost you a LOT of money if you get them wrong and are often best arranged via a turnkey provider with some consultative experience). That way, if one mid-size chunk is less than perfectly optimal, so be it -- you learn from it and arrange the next mid-sized chunk to be better spent. You also get to buy that 1.9 GHz SuperCruncher with the next round of money spent. This last point is worth examining. The way Moore's Law works it is amusing but true that if you take a fixed three year budget of 3A and spend it all at once, you get (3A)*(3 years) = 9 work units done over three years. If you spend it A per year, you get (A)*(3 years) + 2*(A)*(2 years) + 4*(A)*(1 year) = 11 work units done over the same time (the numbers reflecting the approximate annual doubling in speed from ML). That is, you break even in work done between years 2 and 3 and thereafter accumulate work units at A+2A+4A = 7A per year vs 3A. Also note that you break even in the RATE at which work is done at the BEGINNING of the second year -- by spending your money incrementally (likely matching the ramp-up in work load, unless your users are "ready" to jump in and simply crank up to full speed immediately) you get almost as much work done in the third year alone as one would in three spending everything all at once. Ain't exponential growth wonderful? So my one piece of parting advice is to worry less about getting "the" absolute best (most cost effective) hardware as it exists right now -- your cost-benefit optimization calculation may not survive literally from week to week anyway. Last week the bleeding edge Tbird dropped by almost 25% of its price (so my concern about NIC prices in the cluster I'm getting turned out to be specious -- I'm getting upgraded systems at constant cost with the originally quoted 3c905's in place). P4 prices are plummeting. Clock speeds keep edging up within CPU families, and then there are the 64 bit children of those families waiting to be delivered to the world. Nothing you get now will appear to be a wise purchase six months from now, but if one DOESN'T buy in at some point one never gets started and Moore's law doesn't quit. [...a clever person perhaps have noting that if one did nothing but bask in the rays of the Caribbean sun for two years and mosey back north to buy 3A*4-speed systems at the start of the third year, one would get 12 work units done in the third year alone and thus beat out even my A per year purchase schedule and get a nice tan besides. Or worse, waiting one MORE year gets 3A*8 = 24 work units done in the fourth year alone. In the limit, NOBODY should EVER buy computers to do numerical calculations now, as the longer they wait the less time it will take to complete them once they start and if we all just waited long enough a single desktop unit would get more work done than all the beowulf units currently in existance put together... 
hmmm, something wrong with this logic, head hurts, must seek solution -- oh hell, might as well go get tickets to Jamaica...:-) Forgive my morning ramblings... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From josip at icase.edu Thu May 3 07:49:57 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs References: Message-ID: <3AF17015.BD539EF@icase.edu> "Robert G. Brown" wrote: > > IIRC, somebody on the list (Josip Loncaric?) inserted prefetching into > at least parts of ATLAS for use with athlons back when they were first > released. It apparently made a quite significant difference in > performance. It was not me (we have Pentiums). However, prefetching and SSE instructions should make a significant difference. For example, Portland Group suggests compiling LAPACK and BLAS with the following switches (using PGI compilers release 3.2-4 and an SSE-enabled Linux kernel, i.e. version 2.2.10 or later with the appropriate patches):
Pentium III: -fast -pc 64 -Mvect=sse -Mcache_align -Kieee
Athlon: -fast -pc 64 -Mvect=prefetch -Kieee
The only exceptions are slamch.f and dlamch.f, which must be compiled using '-O0'. Also, the main program should be compiled using the '-pc 64' switch (64-bit double precision format). PGI says that in some cases a 23% performance benefit can be obtained when prefetch instructions are used. This helps with both single- and double-precision codes. For single-precision codes only, the Pentium III SSE instructions can deliver about 33% benefit. Since SSE instructions operate only on single-precision data that is aligned on cache-line boundaries, enforcing this alignment with '-Mcache_align' produces an even better 61% gain over the original non-SSE code (says PGI). Finally, the PGI release 3.2-4 also supports Pentium 4 SSE2 instructions (-tp piv -Mvect=sse ...). Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From hahn at coffee.psychology.mcmaster.ca Thu May 3 09:38:04 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:14 2009 Subject: Intel Fortran compiler In-Reply-To: Message-ID: > Celeron 466, with RH 7.1, kernel 2.4.3. so g77 was one of RH's funky "2.96+" snapshots? > g77 (gnu compiler) time: 2:45 minutes > ifc (intel compiler) time: 1:37 minutes > > g77 options: -O2 -mpentiumpro -funroll-all-loops I find that -funroll-all-loops usually gives *worse* performance. in fact, -Os is the most commonly-recommended flag on the gcc dev list. -fomit-frame-pointer is an obvious other one. From cbhargava at asacomputers.com Thu May 3 11:06:53 2001 From: cbhargava at asacomputers.com (Chetan Bhargava) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> <20010502130946.D20583@kotako.analogself.com> <011001c0d356$20ef3650$1f00a8c0@asacomputers.com> <20010502170845.E20583@kotako.analogself.com> Message-ID: <003701c0d3fb$d6609600$1f00a8c0@asacomputers.com> Hi, I'm using KDe right now but GNOME didn't work either... here is some system information... Here are the installed packages.
gnome-libs-1.0.55-12 kdelibs-1.1.2-15 (rpm -qa |grep beo) ldconfig-1.9.5-16.beo.1 beoboot-1.0.6-1 beoboot-devel-1.0.6-1 beompi-1.0.7-1 beompi-devel-1.0.7-1 beosetup-1.21-1 beostatus-1.7-1 beowulf-doc-0.12-1 kernel-headers-2.2.16-21.beo kernel-2.2.16-21.beo kernel-pcmcia-cs-2.2.16-21.beo kernel-smp-2.2.16-21.beo kernel-source-2.2.16-21.beo kernel-utils-2.2.16-21.beo rdate-1.0-1.beo.1 (rpm -qa|grep kde) kdeadmin-1.1.2-6 kdebase-1.1.2-33 kdebase-lowcolor-icons-1.1.2-33 kdegraphics-1.1.2-3 kdelibs-1.1.2-15 kdelibs-devel-1.1.2-15 kdemultimedia-1.1.2-7 kdenetwork-1.1.2-13 kdesupport-1.1.2-12 kdesupport-devel-1.1.2-12 kdeutils-1.1.2-4 switchdesk-kde-2.1-1 I have installed beowulf from the Scyld Beowuld CDROM distributed by cheap bytes. Is there an newer version available? Thanks. :-) Chetan ----- Original Message ----- From: "Jag" To: "Chetan Bhargava" Cc: Sent: Wednesday, May 02, 2001 5:08 PM Subject: Re: Scyld: Beostatus dumps core From math at velocet.ca Thu May 3 11:15:50 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: ; from rgb@phy.duke.edu on Thu, May 03, 2001 at 08:23:56AM -0400 References: <20010502233647.O27034@velocet.ca> Message-ID: <20010503141550.F13946@velocet.ca> On Thu, May 03, 2001 at 08:23:56AM -0400, Robert G. Brown's all... > > Bursts of CPU usage? Arent all our clusters all hammering our CPUs as much as > > possible? And if ATLAS is really doing its job, arent we hammering all parts > > of the CPU as much as possible? :) > > Not variations in the CPU/memory load, which as you note is nearly > constant (and not really all THAT different between idle and loaded -- a > lot of juice is expended just keeping it and the memory running in idle > mode), variations in the peripheral load -- using the disk(s), the > network and so forth. In your configuration (very stripped, if I > recall) you don't think you'll see much variation. nope ;) > I'm going to see if I can get Duke to spring for some tools to measure > power draw properly. There are all sorts of peak vs rms issues that We have power measuring tools for our colocation customers. Im going to plug in a bunch of these boards and see what I get. > In my case the best solution to optimize these parameters seems to be do > business with a reliable local vendor, missing the absolute best deals > available anywhere by an easy 10% or so but getting the warm fuzzies of > a place where I'm on a first name basis with the staff (who of course Oh I agree. Our provider will come in and sit on the floor and fix things for 5-20 minutes if he can, or take it with him to his lab. If he's going to take more than a day he usually just gets us a drop in replacement part and RMAs the thing for himself to get a new board for some future customer. Works out great. Its not the cheapest, but 10% premium on a design which is 80% cheaper than most designs is ok by me if it saves a lot of time. In fact, I find that having all the nodes booting off a central NFS server makes management easier as well. With each one booting off the same mount point and set of directories, upgrading stuff is hyper simple, and only the unique portions of disk that I need (any scratch disk) needs to be seperate per machine. In fact, the raid server is actually being shared with another project already in operation, so we get to save effort there too. Yay, I dont have any work to do! :) > This last point is worth examining. 
The way Moore's Law works it is > amusing but true that if you take a fixed three year budget of 3A and > spend it all at once, you get (3A)*(3 years) = 9 work units done over > three years. If you spend it A per year, you get (A)*(3 years) + > 2*(A)*(2 years) + 4*(A)*(1 year) = 11 work units done over the same time > (the numbers reflecting the approximate annual doubling in speed from > ML). That is, you break even in work done between years 2 and 3 and > thereafter accumulate work units at A+2A+4A = 7A per year vs 3A. Also > note that you break even in the RATE at which work is done at the > BEGINNING of the second year -- by spending your money incrementally > (likely matching the ramp-up in work load, unless your users are "ready" > to jump in and simply crank up to full speed immediately) you get almost > as much work done in the third year alone as one would in three spending > everything all at once.
IIRC, Moore's law is at 18 months now. From everything2.com (because I knew I'd find it there, not because it's authoritative): The observation that the logic density of silicon integrated circuits has closely followed the curve (bits per square inch) = 2^(t - 1962) where t is time in years; that is, the amount of information storable on a given amount of silicon has roughly doubled every year since the technology was invented. This relation, first uttered in 1964 by semiconductor engineer Gordon Moore (who co-founded Intel four years later) held until the late 1970s, at which point the doubling period slowed to 18 months. The doubling period remained at that value through time of writing (late 1999). This doesn't talk about the speed of the chips. Assuming it applies, however, as you have: At 1.587/year, over 3 years, the schedule would be:
all at once             3A*3 = 9 A
in stages, once a year  3*A + 2*1.587*A + 1.587^2*A = 8.7 A
all in the last year    3*1.587^2*A = 7.55 A
So it's close, but slightly losing. There are a few caveats here of course.
- The people controlling the money may want to see a fair number of results early on, instead of waiting the full 3 years.
- Hopefully the techniques themselves, as well as the software, will be improving because of the previous research/results. The more early results you get, the better. Even at a 'compounded advancement' rate of 10% 'better technology' per year, this favours early implementation. I'm not just talking about you waiting around for others to improve their software - you yourself will use your own results to improve/guide your research. Humans thinking for years is a lot of valuable input and will change and hopefully improve the research. (A possible argument to this is that investing the money on the market at 10% would counteract this 10% favour to 'early implementation' ;)
- The other caveat is that Moore's law is a smooth curve that approximates the increase in speed/performance, but the actual advancements happen when Intel and AMD release stuff according to marketing schedules, etc. So if you catch the wave at the beginning or end of a cycle, you could ride the value of the cheaper components dropping in price suddenly and drastically in one shot, and jump ahead of the curve for a few months. You can also get screwed by the same effect, and it's hard to tell where you are in the cycle too.
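For anyone who wants to replay this arithmetic with a different doubling period, a small sketch; the function and the simple speed = 2^(years/doubling) model are illustrative assumptions, the same simplification used in the figures above:

    # Work-unit arithmetic for different purchase schedules over a 3-year horizon.
    def work_units(schedule, doubling, horizon=3.0):
        """schedule is a list of (purchase_year, budget_in_A) pairs."""
        total = 0.0
        for year, budget in schedule:
            speed = 2.0 ** (year / doubling)       # speed of hardware bought that year
            total += budget * speed * (horizon - year)
        return total

    # Doubling every 12 months:
    print(work_units([(0, 3)], doubling=1.0))                  # all 3A up front -> 9.0
    print(work_units([(0, 1), (1, 1), (2, 1)], doubling=1.0))  # A per year      -> 11.0

    # Doubling every 18 months (2**(1/1.5) ~ 1.587 per year):
    print(work_units([(0, 3)], doubling=1.5))                  # all up front    -> 9.0
    print(work_units([(0, 1), (1, 1), (2, 1)], doubling=1.5))  # A per year      -> ~8.7
    print(work_units([(2, 3)], doubling=1.5))                  # all in year 3   -> ~7.6

Changing the doubling argument is enough to see how sensitive the all-at-once versus staged comparison is to the assumed doubling period.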
> So my one piece of parting advice is to worry less about getting "the" > absolute best (most cost effective) hardware as it exists right now -- > your cost-benefit optimization calculation may not survive literally > from week to week anyway. Last week the bleeding edge Tbird dropped by > almost 25% of its price (so my concern about NIC prices in the cluster Yes I agree to some extent. The thing is a number of results have to be produced within a couple months. The speed and cost/performance of the cluster over the next three years is a declining concern, compared with getting a number of things done over the next 6 months. Also, if we always buy stuff that costs twice as much than the best price performance, then we never win by buying in stages/not worrying about it. I buy the best price performance now, and I buy it again in 6 months, which is some totally different architecture by that time. And 6 months after that, again. Eventually I end up with Tb 1.3Ghz in my cluster, but 6 months after everyone else. In the meantime, I've always had the best price performance. Also, for me the money is structured differnetly. This is a grant to the group and must be ALL allocated NOW. There's no options to waiting at all. So I might as well spend it all now, on the best possible performance for the money. > [...a clever person perhaps have noting that if one did nothing but bask > in the rays of the Caribbean sun for two years and mosey back north to > buy 3A*4-speed systems at the start of the third year, one would get 12 > work units done in the third year alone and thus beat out even my A per > year purchase schedule and get a nice tan besides. Or worse, waiting > one MORE year gets 3A*8 = 24 work units done in the fourth year alone. If moore's was 2x per year ya, but its 18 months ;) > In the limit, NOBODY should EVER buy computers to do numerical > calculations now, as the longer they wait the less time it will take to > complete them once they start and if we all just waited long enough a > single desktop unit would get more work done than all the beowulf units > currently in existance put together... hmmm, something wrong with this > logic, head hurts, must seek solution -- oh hell, might as well go get > tickets to Jamaica...:-) > > Forgive my morning ramblings... Heh, they're fun, keep us sane, remind us why we're doing our work. Luckily moore's law isnt so quick, now we know why we research things NOW instead of waiting 200 years for someone to figure it all out ;) /kc > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From agrajag at linuxpower.org Thu May 3 11:19:04 2001 From: agrajag at linuxpower.org (Jag) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core In-Reply-To: <003701c0d3fb$d6609600$1f00a8c0@asacomputers.com>; from cbhargava@asacomputers.com on Thu, May 03, 2001 at 11:06:53AM -0700 References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> <20010502130946.D20583@kotako.analogself.com> <011001c0d356$20ef3650$1f00a8c0@asacomputers.com> <20010502170845.E20583@kotako.analogself.com> <003701c0d3fb$d6609600$1f00a8c0@asacomputers.com> Message-ID: <20010503111904.F20583@kotako.analogself.com> On Thu, 03 May 2001, Chetan Bhargava wrote: > Hi, > > I'm using KDe right now but GNOME didn't work either... 
here is some system > information... > Here are the installed packages. > > (rpm -qa |grep beo) > ldconfig-1.9.5-16.beo.1 > beoboot-1.0.6-1 > beoboot-devel-1.0.6-1 > beompi-1.0.7-1 > beompi-devel-1.0.7-1 > beosetup-1.21-1 > beostatus-1.7-1 > beowulf-doc-0.12-1 > kernel-headers-2.2.16-21.beo > kernel-2.2.16-21.beo > kernel-pcmcia-cs-2.2.16-21.beo > kernel-smp-2.2.16-21.beo > kernel-source-2.2.16-21.beo > kernel-utils-2.2.16-21.beo > rdate-1.0-1.beo.1 > I have installed beowulf from the Scyld Beowuld CDROM distributed by cheap > bytes. Is there an newer version available? Judging from the above package versions, it looks like you are using the Scyld PREVIEW CD (27bz-6). Scyld has also made an actual release (27bz-7) that includes some newer software, including a 2.2.17 kernel and a newer version of beostatus, among other things. I recommend you try the newer release. The packages are available on ftp.scyld.com and I beleive (although I'm not certain) that cheapbytes now carries 27bz-7 instead of 27bz-6. > > Thanks. :-) No problem Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010503/71447eef/attachment.bin From cbhargava at asacomputers.com Thu May 3 11:21:17 2001 From: cbhargava at asacomputers.com (Chetan Bhargava) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> <20010502130946.D20583@kotako.analogself.com> <011001c0d356$20ef3650$1f00a8c0@asacomputers.com> <20010502170845.E20583@kotako.analogself.com> <003701c0d3fb$d6609600$1f00a8c0@asacomputers.com> <20010503111904.F20583@kotako.analogself.com> Message-ID: <007601c0d3fd$d8de0730$1f00a8c0@asacomputers.com> Is an ISO image availabe on the web for 27bz-7 ? Thanks. ----- Original Message ----- From: "Jag" To: "Chetan Bhargava" Cc: Sent: Thursday, May 03, 2001 11:19 AM Subject: Re: Scyld: Beostatus dumps core From rgb at phy.duke.edu Thu May 3 11:30:00 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:14 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: <3AF17015.BD539EF@icase.edu> Message-ID: On Thu, 3 May 2001, Josip Loncaric wrote: > "Robert G. Brown" wrote: > > > > IIRC, somebody on the list (Josip Loncaric?) inserted prefetching into > > at least parts of ATLAS for use with athlons back when they were first > > released. It apparently made a quite significant difference in > > performance. Sorry. I did a google search and pulled up e.g. http://www.beowulf.org/listarchives/beowulf/1999/11/0008.html It was Emil Briggs from NCSU who wrote BLAS 1 routines hand optimized in assembler for the Athlon. There is a web-drop for his efforts in this article: http://nemo.physics.ncsu.edu/~briggs/blas_src_v0.11_tar.gz which still seems to exist. I don't know if there are more recent versions or if this ever went anywhere further, but Emil would likely know at: Emil Briggs as of 1999. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From lowther at att.net Thu May 3 12:01:35 2001 From: lowther at att.net (lowther@att.net) Date: Wed Nov 25 01:01:14 2009 Subject: Scyld: Beostatus dumps core References: <00f501c0d344$0873af50$1f00a8c0@asacomputers.com> <20010502130946.D20583@kotako.analogself.com> <011001c0d356$20ef3650$1f00a8c0@asacomputers.com> <20010502170845.E20583@kotako.analogself.com> <003701c0d3fb$d6609600$1f00a8c0@asacomputers.com> <20010503111904.F20583@kotako.analogself.com> Message-ID: <3AF1AB0F.5469FFBA@att.net> Jag wrote: > > I beleive (although I'm not certain) that cheapbytes now carries 27bz-7 > instead of 27bz-6. > Linux Central, the source listed on the Scyld site, does indeed carry the latest release. -- Ken Lowther Youngstown, Ohio http://www.atmsite.org From bra369 at pp.molsci.csiro.au Thu May 3 21:48:09 2001 From: bra369 at pp.molsci.csiro.au (Kim Branson) Date: Wed Nov 25 01:01:14 2009 Subject: intel etherpro 100 problems In-Reply-To: <20010501203141.D60454@velocet.ca> Message-ID: Hi List, I've been having some problems with intel etherpro 100 cards on my cluster. the nodes are 1ghz athlons 64meg pc133 on a asus a7pro mb. ibm 10.2 gig drives after a reboot of the nodes, and sometimes during a run the card reports no rx buffers or resources. from the logs eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html May 3 19:47:50 node04 kernel: eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 Modified by Andrey V. Savochkin and others May 3 19:47:50 node04 kernel: eth0: Intel PCI EtherExpress Pro100 82557, 00:02:B3:0B:A7:4A, I/O at 0xa400, IRQ 11. May 3 19:48:12 node04 kernel: eth0: card reports no RX buffers. May 3 19:48:12 node04 kernel: eth0: card reports no resources. Has anyone else seen this problem or can explain to me what might be going on. after rebooting this node by hand it came up ok. This problem occoured on 7/64 nodes. kim ______________________________________________________________________ Mr Kim Branson Phd Student Structural Biology Walter and Eliza Hall Institute Royal Parade, Parkville, Melbourne, Victoria Ph 61 03 9662 7136 Email kbranson@wehi.edu.au Email kim.branson@hsn.csiro.au ______________________________________________________________________ From zolia at lydys.sc-uni.ktu.lt Thu May 3 23:13:14 2001 From: zolia at lydys.sc-uni.ktu.lt (zolia) Date: Wed Nov 25 01:01:14 2009 Subject: intel etherpro 100 problems In-Reply-To: Message-ID: Hi, Sometimes we have the same problems. Systems doesn't start reporting no tx rx buffers. This i spotted about the year before, but didn't find any solution. First i thought that eepro100 had some kind of conflicts with mainboards (we used via chipset), but later the same things happened with intel MB integraded eepro. Does any one knows what is the problem? ==================================================================== Antanas Masevicius Kaunas University of Technology Studentu 48a-101 Computer Center LT-3028 Kaunas LITNET NOC UNIX Systems Administrator Lithuania E-mail: zolia@sc.ktu.lt On Fri, 4 May 2001, Kim Branson wrote: > > Hi List, > > I've been having some problems with intel etherpro 100 cards on my > cluster. > > the nodes are 1ghz athlons 64meg pc133 on a asus a7pro mb. ibm 10.2 gig > drives > > after a reboot of the nodes, and sometimes during a run the card reports > no rx buffers or resources. 
> > from the logs > > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > May 3 19:47:50 node04 kernel: eepro100.c: $Revision: 1.20.2.10 $ > 2000/05/31 Modified by Andrey V. Savochkin and others > May 3 19:47:50 node04 kernel: eth0: Intel PCI EtherExpress Pro100 82557, > 00:02:B3:0B:A7:4A, I/O at 0xa400, IRQ 11. > > May 3 19:48:12 node04 kernel: eth0: card reports no RX buffers. > May 3 19:48:12 node04 kernel: eth0: card reports no resources. > > Has anyone else seen this problem or can explain to me what might be going > on. after rebooting this node by hand it came up ok. This problem occoured > on 7/64 nodes. > > kim > ______________________________________________________________________ > > Mr Kim Branson > Phd Student > Structural Biology > Walter and Eliza Hall Institute > Royal Parade, Parkville, Melbourne, Victoria > Ph 61 03 9662 7136 > Email kbranson@wehi.edu.au > Email kim.branson@hsn.csiro.au > ______________________________________________________________________ > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Thu May 3 23:26:49 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:14 2009 Subject: intel etherpro 100 problems In-Reply-To: Message-ID: hi i suspect that if you are using the onboard NICs on the various motherboards w/ intel i810 or i815 motherboards ,,, than you need to upgrade your eepro100.c driver for linux-2.2.16 thru *.19 linux-2.4.x seems to be little better... other simple test is to make your eepro100 be a module and see if the "no resources" problems go away using "modules" fixed all of our obvious eepro100 problems have fun alvin http://wwww.Linux-1U.net ... 500Gb 1u Raid5 ... On Fri, 4 May 2001, zolia wrote: > Hi, > > Sometimes we have the same problems. Systems doesn't start reporting no tx > rx buffers. This i spotted about the year before, but didn't find any > solution. First i thought that eepro100 had some kind of conflicts with > mainboards (we used via chipset), but later the same things happened with > intel MB integraded eepro. > > Does any one knows what is the problem? > > ==================================================================== > Antanas Masevicius Kaunas University of Technology > Studentu 48a-101 Computer Center > LT-3028 Kaunas LITNET NOC UNIX Systems Administrator > Lithuania E-mail: zolia@sc.ktu.lt > > On Fri, 4 May 2001, Kim Branson wrote: > > > > > Hi List, > > > > I've been having some problems with intel etherpro 100 cards on my > > cluster. > > > > the nodes are 1ghz athlons 64meg pc133 on a asus a7pro mb. ibm 10.2 gig > > drives > > > > after a reboot of the nodes, and sometimes during a run the card reports > > no rx buffers or resources. > > > > from the logs > > > > eepro100.c:v1.09j-t 9/29/99 Donald Becker > > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > > May 3 19:47:50 node04 kernel: eepro100.c: $Revision: 1.20.2.10 $ > > 2000/05/31 Modified by Andrey V. Savochkin and others > > May 3 19:47:50 node04 kernel: eth0: Intel PCI EtherExpress Pro100 82557, > > 00:02:B3:0B:A7:4A, I/O at 0xa400, IRQ 11. > > > > May 3 19:48:12 node04 kernel: eth0: card reports no RX buffers. > > May 3 19:48:12 node04 kernel: eth0: card reports no resources. 
> > > > Has anyone else seen this problem or can explain to me what might be going > > on. after rebooting this node by hand it came up ok. This problem occoured > > on 7/64 nodes. > > > > kim > > ______________________________________________________________________ > > > > Mr Kim Branson > > Phd Student > > Structural Biology > > Walter and Eliza Hall Institute > > Royal Parade, Parkville, Melbourne, Victoria > > Ph 61 03 9662 7136 > > Email kbranson@wehi.edu.au > > Email kim.branson@hsn.csiro.au > > ______________________________________________________________________ > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From Eugene.Leitl at lrz.uni-muenchen.de Fri May 4 02:09:50 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de) Date: Wed Nov 25 01:01:14 2009 Subject: [Fwd: CCL:[Fwd: Athlon SDR/DDR stats for *specific* gaussian98 jobs]] Message-ID: <3AF271DE.FAD8D83C@lrz.uni-muenchen.de> -------- Original Message -------- From: "M. Nicklaus" Subject: CCL:[Fwd: Athlon SDR/DDR stats for *specific* gaussian98 jobs] To: chemistry@ccl.net CC: mn1@helix.nih.gov For what it's worth: Here's a recent small series of benchmarks we've run on various Linux systems, plus a Cray and SGI Origin added for comparison. Program: Gaussian 98 Rev. A.7. All executables exactly identical for the Linux systems (copied between machines), compiled w/ PGI v. 3.2 (on Linux), G.98 standard compilation (no tuning/hacking). Jobs run: G.98 test jobs # 1, 28, 94, 155, 194, 296, 302, aggregate time, as reported in output, all runs single-CPU.

CPU      Speed    Chipset    Kernel    Distro      glibc      Memory       HD/Contr.       time
                                                              min. 256MB   (all: U-ATA/)   (sec)
                                                                           MHz, RPM
P 4      1.5GHz   Int. 850   2.4.2-2   RH 7.1      2.2.2-10   RDRAM-800    100, 7200        278
Athlon   1.33     VIA 686B   2.4.0     SuSE 7.1    2.2-7      DDR 266       66, 7200        299
P III    1 GHz    Int. 815   2.4.3     (RH 6.1)    2.1.2      SDRAM 133    100, 7200        497
P III    1 GHz    Int. 815   2.4.3     (RH 6.1)    2.1.2      SDRAM 133    100, 5400        497
P III    1 GHz    Int. 815   2.2.12    RH 6.1      2.1.2      SDRAM 133    100, 7200        646
P III    1 GHz    Int. 815   2.2.12    RH 6.1      2.1.2      SDRAM 133    100, 5400        646
P III    866MHz   S/Works    2.4.3     (RH 6.2)    2.1.3-15   SDRAM 133     33, 7200        453
P III    866MHz   S/Works    2.2.14    RH 6.2      2.1.3-15   SDRAM 133     33, 7200        526
SGI Origin 2000                                                                             921
Cray SV-1                                                                                  2400

"(...)" in the Distro column denotes newer kernel than out-of-the-box. Going from a 2.2 kernel to a 2.4 kernel seems to speed things up considerably, at least on these systems. Hard drive and IDE controller speeds appear to be near irrelevant for these jobs. Marc ------------------------------------------------------------------------ Marc C. Nicklaus, Ph.D.
NIH/NCI at Frederick E-mail: mn1@helix.nih.gov Bldg 376, Rm 207 Phone: (301) 846-5903 376 Boyles Street Fax: (301) 846-6033 FREDERICK, MD 21702 USA Head, Computer-Aided Drug Design MiniCore Facility Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute at Frederick, National Institutes of Health http://rex.nci.nih.gov/RESEARCH/basic/medchem/mcnbio.htm ------------------------------------------------------------------------ -= This is automatically added to each message by mailing script =- CHEMISTRY@ccl.net -- To Everybody | CHEMISTRY-REQUEST@ccl.net -- To Admins MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH@ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl@osc.edu From tru at pasteur.fr Fri May 4 03:04:08 2001 From: tru at pasteur.fr (Tru Huynh) Date: Wed Nov 25 01:01:15 2009 Subject: Intel Fortran compiler References: Message-ID: <3AF27E98.D4D18783@pasteur.fr> Hello, Just my 2 cents about the Intel beta compiler on a P4 RH7.1 I have just downloaded it and give it a try. But when the compiler exits with *Compiler Internal Error* : Please report to INTEL and that you report it, this is the message you have in return: "Intel Fortran compiler for Linux is currently supported on Red Hat 6.2 only." And from their FAQ Q. What Linux* distributions are supported? A. On IA-32 architecture, Red Hat* Linux 6.2 is supported. Turbolinux*, currently in Beta release, is supported on the Itanium(TM) processor architecture. We welcome feedback from users who are trying the compiler on other distributions. Q. Red Hat 6.2 doesn't install on my Pentium(R) 4 system. Is there any way I can work around this problem and use the compiler on a Pentium 4 system? A. There is a work around but it isn't very elegant. Install the Red Hat Linux 6.2 operating system on a Pentium III system and then install the hard disk to a Pentium 4 system. We are investigating support for additional distributions for future releases of our products that will avoid this issue. YMMV Tru -- Dr Tru Huynh | Bioinformatique Structurale mailto:tru@pasteur.fr | tel +33 1 45 68 87 37 Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France From rakhesh at cse.iitd.ernet.in Fri May 4 04:17:53 2001 From: rakhesh at cse.iitd.ernet.in (Rakhesh Sasidharan) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? Message-ID: Hi all, I was going through the beowulf.org site trying to find general information on beowulf's (and I did find lots), but there's something that I still am not clear about: Suppose I have 6-7 old machines lying around with me, is there any way I could use something like beowulf techniques to make them into a "super" computer ? I'm just a non-academic user, and the only use I think I can put this to could be to play games faster, or surfing, or plays mp3s etc ... Individually, the old machines (486s and above) wouldn't be efficient, so is it possible to combine them together and make things work ? I went through a couple of articles etc, and that gave me the impression contrary to what I am asking; but still I ask to make sure. :-) Or maybe it is possible to take the source code of normal apps, and beowulf-ify them ? Regards, __ Rakhesh From arnoldg at ncsa.uiuc.edu Fri May 4 05:47:59 2001 From: arnoldg at ncsa.uiuc.edu (Galen Arnold) Date: Wed Nov 25 01:01:15 2009 Subject: intel etherpro 100 problems In-Reply-To: Message-ID: Intel's latest e100 driver should fix it for you. 
Search for file "e100" on their download site or try this link: http://appsr.intel.com/scripts-df/File_Filter.asp?FileName=e100 -Galen On Fri, 4 May 2001, Kim Branson wrote: > > Hi List, > > I've been having some problems with intel etherpro 100 cards on my > cluster. > > the nodes are 1ghz athlons 64meg pc133 on a asus a7pro mb. ibm 10.2 gig > drives > > after a reboot of the nodes, and sometimes during a run the card reports > no rx buffers or resources. > > from the logs > > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > May 3 19:47:50 node04 kernel: eepro100.c: $Revision: 1.20.2.10 $ > 2000/05/31 Modified by Andrey V. Savochkin and others > May 3 19:47:50 node04 kernel: eth0: Intel PCI EtherExpress Pro100 82557, > 00:02:B3:0B:A7:4A, I/O at 0xa400, IRQ 11. > > May 3 19:48:12 node04 kernel: eth0: card reports no RX buffers. > May 3 19:48:12 node04 kernel: eth0: card reports no resources. > > Has anyone else seen this problem or can explain to me what might be going > on. after rebooting this node by hand it came up ok. This problem occoured > on 7/64 nodes. > > kim > ______________________________________________________________________ > > Mr Kim Branson > Phd Student > Structural Biology > Walter and Eliza Hall Institute > Royal Parade, Parkville, Melbourne, Victoria > Ph 61 03 9662 7136 > Email kbranson@wehi.edu.au > Email kim.branson@hsn.csiro.au > ______________________________________________________________________ > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- + Galen Arnold, system engineer--systems group arnoldg@ncsa.uiuc.edu National Center for Supercomputing Applications (217) 244-3473 152 Computer Applications Bldg., 605 E. Spfld. Ave., Champaign, IL 61820 From Dean.Carpenter at pharma.com Fri May 4 07:24:14 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:15 2009 Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Message-ID: <759FC8B57540D311B14E00902727A0C002EC48D1@a1mbx01.pharma.com> Wow - no comments at all on this. Anywhere - I've asked in a couple of places. Does anyone know of any SMP issues with 2.2.17 at all ? -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter@pharma.com] Sent: Wednesday, May 02, 2001 5:05 PM To: beowulf@beowulf.org Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Hi All - Just got some eval equipment in today to play with, with the Gigabyte GA-6vxdr7 motherboards in them. The NICs show up as EtherExpressPro 10/100 nics, pretty normal. These are dual P3 boards with dual 933 cpus and 512meg memory. The stage 1 boot goes fine, it gets an IP and grabs the stage 2 kernel fine. It's during the boot and init of the dual cpus that it barfs ... It leaves this on screen : : : CPU map: 3 Booting processor 1 eip 2000 Setting warm reset code and vector 1. 2. 3. Asserting INIT. Deasserting INIT. Sending STARTUP #1. After apic_write. Before start apic_write. Startup point 1. And there it sits. There's some more above the CPU map: 3 there, I can provide that as well. I have to run right now, but tomorrow I'll try the non-SMP kernel, see if it will actually boot. Otherwise, any ideas ? 
From tru at pasteur.fr Fri May 4 08:00:40 2001 From: tru at pasteur.fr (Tru Huynh) Date: Wed Nov 25 01:01:15 2009 Subject: Intel Fortran compiler (url) References: <3AF27E98.D4D18783@pasteur.fr> Message-ID: <3AF2C418.8D619233@pasteur.fr> For anyone interested, here is the url for registering to the beta testing. http://www.releasesoftware.com/_intelbetacenteronlinux/cgi-bin/pd.cgi?page=product_info Tru -- Dr Tru Huynh | Bioinformatique Structurale mailto:tru@pasteur.fr | tel +33 1 45 68 87 37 Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France From rgb at phy.duke.edu Fri May 4 08:13:54 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:15 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs In-Reply-To: <20010503141550.F13946@velocet.ca> Message-ID: On Thu, 3 May 2001, Velocet wrote: > IIRC, Moore's law was at 18months now. From everything2.com (because I knew > I'd find it there, not because its authoritative): > > The observation that the logic density of silicon integrated circuits has > closely followed the curve (bits per square inch) = 2^(t - 1962) where t is > time in years; that is, the amount of information storable on a given amount > of silicon has roughly doubled every year since the technology was invented. > This relation, first uttered in 1964 by semiconductor engineer Gordon Moore > (who co-founded Intel four years later) held until the late 1970s, at which > point the doubling period slowed to 18 months. The doubling period remained > at that value through time of writing (late 1999). > > This doesnt talk about the speed of the chips. Assuming it applies, > however, as you have: It does and it doesn't. Chip design over this period introduced the notions of cache, on-chip parallelism, RISC (allowing less logic on the CPU for greater effect), and much more. Nothing like direct anecdotes. My own personal measurements on the Intel architecture (with a very early precursor of cpu-rate:-) are: ~end of 1982, Original IBM PC, 8088 @ 4.77 MHz (8 bit), 10^4 flops (peak double precision), basica (I didn't have access to a real numerical compiler -- IBM's Fortran -- for a year or so and it still yielded order of 10^4 flops, which went up to 10^5 or so with an 8087). 2001, P3 @ 933 MHz ~2x10^8 flops (cpu-rate peak double precision). If we allow for 1.4 GHz in the P4 (which I haven't yet benched, but maybe this weekend or next week) and multiply by a bit for architectural improvements, we might reasonably call this a factor of 30,000 to 40,000 over around 18-19 years. Log base 2 of this is around 15, so Intel has been just off a doubling a year. However, the pattern has been very irregular; if the Itanium is released before year end at a decent clock and doubles rates at constant clock (yielding perhaps 1 GFLOP?) then we get log base 2 of 10^5 or more like 17. If we include Athlons as "Intel-like" CPUs we are already at about 16, although there are better/faster AMD's waiting in the wings as well likely to arrive before year's end. Of course other people with other benchmarks may get other numbers as well. So a speed doubling time of a year is perhaps optimistic, but only by weeks and even the weeks can depend on what year (sometimes what month of what year) you measure in. > So its close, but slightly losing. There are a few caveats here of course. > > - The people controling the money may want to see a fair number of results > early on, instead of waiting the full 3 years. All fair enough. 
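As a quick sanity check of that doubling-time estimate, here is a small sketch that turns the quoted flops figures into an implied doubling period; the 3.5e8 value is just an assumed midpoint of the "factor of 30,000 to 40,000" times the 1982 baseline, not a new measurement:

    from math import log2

    # Order-of-magnitude figures from the discussion above (assumptions, not data):
    flops_1982 = 1.0e4     # 8088 @ 4.77 MHz, peak double precision
    flops_2001 = 3.5e8     # midpoint of the quoted 30,000-40,000x factor
    years      = 18.5      # roughly end of 1982 to 2001

    factor    = flops_2001 / flops_1982
    doublings = log2(factor)
    print("speedup factor: %.0f" % factor)                                   # ~35000
    print("doublings: %.1f" % doublings)                                     # ~15.1
    print("implied doubling time: %.1f months" % (12 * years / doublings))   # ~14.7

That lands in the neighborhood of 14-15 months per doubling, i.e. just off a doubling a year.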
Still, all things being equal production will be optimized finding a suitable purchase schedule that properly tracks the ML curve, whatever it might be. This is a substantial advantage of the beowulf architecture. It is one of the FEW supercomputing architectures around with a smooth, consistent upgrade path at remarkably predictable cost. One of my big early mistakes in this game involved buying a "refrigerator" SGI 220S with two processors, thinking that in a few years we could upgrade to 8 processors at a reasonable cost. Never happened. One could buy single CPU systems that were as fast as all six upgrade CPUs put together would have been for what was STILL the very high cost of the upgrade when we saw the COTS light and just quit. When we finally sold the $75000 box (only five years old) for $3000, we could get a system that was faster on a single CPU basis than both processors put together and then some for just about the cost of its software maintenance agreement. Not to dis SGI -- they were filling a niche and COTS clusters were still an idea in the process of happening (inspired in part by the ubiquity of this general experience). However, Moore's Law is particularly cruel to big-iron style all at once purchases. If we'd spent that $75K at the rate of $15K/year over five years, we would have gotten MUCH more work done, as by the end of that period we were just getting to where clusters with $5K/nodes were really a decent proposition, with Sun workstations (usually) being the commodity nodes or COW components. BTW, a related and not irrelevant question. You have said that G98 is your dominant application -- are you doing e.g. Quantum Chemistry? There is a faculty person here (Weitao Yang) who is very interested in building a cluster to do quantum chemistry codes that use Gaussian 98 and FFT's and that sort of thing, and he's getting mediocre results with straight P3's on 100BT. I'm not familiar enough with the problem to know if his results are poor because they are IPC bound (and he should get a better network) or memory bound (get alphas) or whatever. But I'd like to. Any general list-wisdom for quantum chemistry applications? Is this an application likely to need a high end architecture (e.g. Myrinet and e.g. Alpha or P4) or would a tuned combination of something cheaper do as well? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From MAWorsham at intermedia.com Fri May 4 10:46:44 2001 From: MAWorsham at intermedia.com (Worsham, Michael A.) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? Message-ID: >Date: Fri, 4 May 2001 16:47:53 +0530 (IST) >From: Rakhesh Sasidharan >To: beowulf >Subject: Possbile uses of Beowulf ... ? >Hi all, >I was going through the beowulf.org site trying to find general >information on beowulf's (and I did find lots), but there's something that >I still am not clear about: > > Suppose I have 6-7 old machines lying around with me, is there any way I > could use something like beowulf techniques to make them into a "super" > computer ? I'm just a non-academic user, and the only use I think I can > put this to could be to play games faster, or surfing, or plays mp3s > etc ... Individually, the old machines (486s and above) wouldn't be > efficient, so is it possible to combine them together and make things > work ? 
> >I went through a couple of articles etc, and that gave me the impression >contrary to what I am asking; but still I ask to make sure. :-) Or maybe >it is possible to take the source code of normal apps, and beowulf-ify >them ? > >Regards, >__ >Rakhesh I am doing something like this already. I am currrently running Scyld Beowulf (www.scyld.com) on a small cluster I made at home for load balancing through an online game system and web server I am developing. I recommend taking a look at the Scyld site and ordering a copy of the Scyld Beowulf personal edition and installing it. The slave/node PC's need at least 64 mb of ram, a decent IDE or SCSI drive, and a PCI (not ISA) ethernet card for minimal installation and configuration to take place. Other than that, its a breeze to work with. The Scyld documentation is found online at the Scyld site, and the cost of the personal CD is about $3.00 USD which is rather cheap through Linux Central. Scyld Site: http://www.scyld.com Linux Central (Ordering): http://www.qksrv.net/click-734457-487846 then search for 'Scyld'. -- Michael From SSauerburger at condor.nrl.navy.mil Fri May 4 12:09:57 2001 From: SSauerburger at condor.nrl.navy.mil (Stephan Robert Sauerburger) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? Message-ID: <20010504150957.A29778@ha-web1.nrl.navy.mil> > The Scyld documentation is found online at the Scyld site, and the cost of > the personal CD is about $3.00 USD which is rather cheap through Linux > Central. Yeah, when I first saw the price at scyld's site, I was thinking "why bother?".. It seems like it's little more than enough to compensate for the price of the medium, and maybe a little for the S&H troubles.. Why not just offer the iso image of the disc at an FTP site of theirs to download ourselves and burn, and maybe save the trouble? ~Stephan~ From cbhargava at asacomputers.com Fri May 4 12:43:14 2001 From: cbhargava at asacomputers.com (Chetan Bhargava) Date: Wed Nov 25 01:01:15 2009 Subject: Scyld: BIGMEM: More than 1GB support Message-ID: <001f01c0d4d2$75d61940$1f00a8c0@asacomputers.com> Hi, My nodes have a gig of ram and master has 2 gig. Right now they are using 960M only. If I compile the kernel for BIGMEM support, do I need to take care of anything else or just recompile the kernel with BIGMEM support and install it. How would I regenerate node boot disks? Thanks. Chetan Bhargava From math at velocet.ca Fri May 4 13:40:30 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:15 2009 Subject: G98 standard benchmarks Message-ID: <20010504164030.J6317@velocet.ca> Someone mentioned they ran a few of the G98 test jobs as a standard benchmark. Is this an 'official' standard, or just something made up on the fly? Is there such an 'official standard benchmark', or should we make up one for comparing our results? :) (Should we just run all tests? I think there are over 500 of them though.) /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From siegert at sfu.ca Fri May 4 14:22:27 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:01:15 2009 Subject: Athlon SDR/DDR stats for *specific* gaussian98 jobs Message-ID: <20010504142227.B4058@stikine.ucs.sfu.ca> On Fri, May 04, 2001 at 11:13:54AM -0400, Robert G. Brown wrote: > BTW, a related and not irrelevant question. > > You have said that G98 is your dominant application -- are you doing > e.g. Quantum Chemistry? 
There is a faculty person here (Weitao Yang) > who is very interested in building a cluster to do quantum chemistry > codes that use Gaussian 98 and FFT's and that sort of thing, and he's > getting mediocre results with straight P3's on 100BT. I'm not familiar > enough with the problem to know if his results are poor because they are > IPC bound (and he should get a better network) or memory bound (get > alphas) or whatever. But I'd like to. Any general list-wisdom for > quantum chemistry applications? Is this an application likely to need > a high end architecture (e.g. Myrinet and e.g. Alpha or P4) or would a > tuned combination of something cheaper do as well? I cannot tell you anything about Quantum Chemistry (which theoretical physicist does? sounds like density functional theory - arrgh), but I do know quite a bit about parallel FFT's. Parallel FFT's don't work very well with 100baseT. Hence upgrading your processor speed (or even going to Alphas) will not help very much. Getting a better network is the way to go (channel bonding or Myrinet). Even switching from tulip cards to 905B's will help. Also, you can optimize the parallel FFT: the best algorithm actually depends on system size, network speed, MPI distribution, etc. I summed up my experience here: http://www.sfu.ca/acs/cluster/fft-performance.html Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From kragen at pobox.com Sat May 5 00:17:40 2001 From: kragen at pobox.com (kragen@pobox.com) Date: Wed Nov 25 01:01:15 2009 Subject: How can I compute the range of signed and unsigned types Message-ID: <200105050717.DAA17059@kirk.dnaco.net> James Cownie writes: > Jag wrote : - > > Those sizes are defined for the C language. In order words, no > > matter if you're on a 32-bit machine or a 64-bit machine, an int is > > always going to be 32-bit and thus have the same numeric range > > No, the C standard says nothing of the sort. > > All the C standard says is that > > 1) sizeof (char) == 1 > 2) sizeof (short) >= sizeof (char) > 3) sizeof (int) >= sizeof (short) > 4) sizeof (long) >= sizeof (int) > 5) sizeof (long long) >= sizeof (long). > > It also does not specify that the representation of an int is two's > complement, so even on machines with the same sizeof(int) the legal > ranges could differ. It also says sizeof(short) >= 2 and sizeof(long) >= 4, IIRC, and the old ANSI C standard didn't say anything about long long. I haven't read C9X. From becker at scyld.com Sat May 5 12:42:42 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:15 2009 Subject: intel etherpro 100 problems In-Reply-To: Message-ID: On Fri, 4 May 2001, Kim Branson wrote: > I've been having some problems with intel etherpro 100 cards on my > cluster. > > the nodes are 1ghz athlons 64meg pc133 on a asus a7pro mb. ibm 10.2 gig > drives > > after a reboot of the nodes, and sometimes during a run the card reports > no rx buffers or resources. This is caused by timing bugs in the 82559 chip. We didn't see the same problems with 82557 chips. > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > May 3 19:47:50 node04 kernel: eepro100.c: $Revision: 1.20.2.10 $ > 2000/05/31 Modified by Andrey V. 
Savochkin and others You should try a different driver. This is a modification of a very old driver version, as should be obvious by the long-obsolete URL. The current URL is http://www.scyld.com/network/eepro100.html Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From becker at scyld.com Sat May 5 12:56:28 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? In-Reply-To: <20010504150957.A29778@ha-web1.nrl.navy.mil> Message-ID: On Fri, 4 May 2001, Stephan Robert Sauerburger wrote: > > The Scyld documentation is found online at the Scyld site, and the cost of > > the personal CD is about $3.00 USD which is rather cheap through Linux > > Central. > Yeah, when I first saw the price at scyld's site, I was thinking "why > bother?".. It seems like it's little more than enough to compensate > for the price of the medium, and maybe a little for the S&H > troubles.. Why not just offer the iso image of the disc at an FTP site > of theirs to download ourselves and burn, and maybe save the trouble? It's less expensive to subsidize the cost of the CD than to provide the bandwidth to download the ISO image. While we like to think otherwise, bulk data is still transferred most effectively through the mail. The wholesale cost of bandwith is about $8-14/GB, with the typical retail cost of $20. That means a 660MB ISO image costs about $10-14, potentially at each side of the transfer depending on how your ISP handles "peerage". While we do provide FTP transfer for potential or current customers, almost no one that anonymously downloads the whole CD is likely to be a paying customer or an active developer. And the zero apparent cost of public FTP means that we end up with mirror sites that download an updated ISO without anyone ever having looking at the previous version. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From dvos12 at calvin.edu Sat May 5 18:48:47 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? In-Reply-To: Message-ID: On Sat, 5 May 2001, Donald Becker wrote: > While we do provide FTP transfer for potential or current customers, > almost no one that anonymously downloads the whole CD is likely to be a > paying customer or an active developer. And the zero apparent cost of > public FTP means that we end up with mirror sites that download an > updated ISO without anyone ever having looking at the previous version. I'm just curious what you mean by that last sentence. Do you mean "having looked at the changed from the previous version"? Not that it really matters. I'm just curious. David From dvos12 at calvin.edu Sat May 5 18:58:05 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:15 2009 Subject: Scyld: BIGMEM: More than 1GB support In-Reply-To: <001f01c0d4d2$75d61940$1f00a8c0@asacomputers.com> Message-ID: Recompile the kernel that comes with Scyld. Download the tarbal for bproc, then go ahead and recompile the kernel. Install bproc from the tarbal (There are bproc rpms, but I haven't gotten them to work). Then, rebuild the Phase 2 image. There is a menu option for that in beosetup, but I'm at home right now and can't look up the exact name. I don't think you have to rebuild the boot disks. 
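As an aside on the CD-versus-download economics a few messages up, a rough sketch; the per-GB figures are the ones quoted there, the $3 CD price is the Linux Central figure mentioned earlier in the thread, and the comparison ignores shipping and the cost of pressing the disc:

    # Rough cost comparison: downloading a full ISO vs. a subsidized mailed CD.
    iso_gb    = 0.66           # ~660 MB ISO image
    wholesale = (8.0, 14.0)    # quoted wholesale bandwidth cost range, $/GB
    retail    = 20.0           # quoted typical retail cost, $/GB
    cd_price  = 3.0            # quoted personal-edition CD price, $

    print("wholesale, one side: $%.2f - $%.2f" % (iso_gb * wholesale[0],
                                                  iso_gb * wholesale[1]))
    print("retail, one side:    $%.2f" % (iso_gb * retail))
    print("mailed CD:           $%.2f" % cd_price)

Even before counting the receiving side's bandwidth, the mailed CD comes out well ahead of a retail-priced download.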
All the nodes on our cluster have a Gig of RAM, and this method worked for me. David On Fri, 4 May 2001, Chetan Bhargava wrote: > Hi, > > My nodes have a gig of ram and master has 2 gig. Right now they are using > 960M only. If I compile the kernel for BIGMEM support, do I need to take > care of anything else or just recompile the kernel with BIGMEM support and > install it. > > How would I regenerate node boot disks? > > Thanks. > > Chetan Bhargava > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From rajkumar at csse.monash.edu.au Sun May 6 04:02:59 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Wed Nov 25 01:01:15 2009 Subject: CCGrid 2001: Final call for participation Message-ID: <3AF52F63.7CFC3E9E@csse.monash.edu.au> Dear Friends, Please find enclosed advance program and call for participation for the: CCGrid 2001: First ACM/IEEE International Symposium on Cluster Computing & the Grid http://www.ccgrid.org to be held in Brisbane, Australia (15-18 May 2001). The program consists of - 6 Keynote Speakers - 2 Invited Talks - 1 Panel session - 4 Industry/State-of-the-art talks - 82 technical papers - 7 workshops - 3 tutorials (open to all and FREE i.e., no extra fee) The conference also hosts poster and research exhibition sessions and the submissions for such poster papers is still open. We are expecting a large attendance. Please plan to participate and register early to take advantage of low registration fee. We are looking forward to welcome and see you in Brisbane! Thank you very much. Sincerely Yours, CCGrid 2001 Team http://www.ccgrid.org ------------------------------------------------------------------------------------- ######################################################################## # # # ### ### #### ##### ### #### #### ### ### ## # # # # # # # # # # # # # # # # # # # # # ## #### # # # # # # # # # # # # # # # # # # # # # # # # # # # # ### ### #### # # ### #### ##### ### ### ### # # # ######################################################################## First ACM/IEEE International Symposium on Cluster Computing & the Grid (CCGrid 2001) http://www.ccgrid.org | www.ccgrid2001.qut.edu.au 15-18 May 2001, Rydges Hotel, South Bank, Brisbane, Australia CALL FOR PARTICIPATION ---------------------- *** Early bird registration 31 March *** Keynotes ******** * The Anatomy of the Grid: Enabling Scalable Virtual Organizations Ian Foster, Argonne National Laboratory and the University of Chicago, USA * Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers Andrzej Goscinski, Deakin University, Australia * Programming High Performance Applications in Grid Environments Domenico Laforenza, CNUCE-Institute of the Italian National Research Council, Italy * Global Internet Content Delivery Bruce Maggs, Carnegie Mellon University and Akamai Technologies, Inc., USA. * Grid RPC meets Data Grid: Network Enabled Services for Data Farming on the Grid Satoshi Matsuoka, Tokyo Institute of Technology, Japan * The Promise of InfiniBand for Cluster Computing Greg Pfister, IBM Server Technology & Architecture, Austin, USA Invited Plenary Talks ********************* * The World Wide Computer: Prospects for Parallel and Distributed Computing on the Web Gul A. Agha, University of Illinois, Urbana-Champaign (UIUC), USA * Terraforming Cyberspace Jeffrey M. 
Bradshaw, University of West Florida, USA Industry Plenary Talks ********************** * High Performance Computing at Intel: The OSCAR software solution stack for cluster computing Tim Mattson, Intel Corporation, USA * MPI/FT: Architecture and Taxonomies for Fault-Tolerant, Massage-Passing Middleware for Performance-Portable Parallel Computing Tony Skjellum, MPI Software Technology, Inc., USA * Effective Internet Grid Computing for Industrial Users Ming Xu, Platform Corporation, Canada * Sun Grid Engine: Towards Creating a Compute Power Grid Wolfgang Gentzsch, Sun Microsystems, USA FREE Tutorials ************** * The Globus Toolkit for Grid Computing Ian Foster, Argonne National Laboratory, USA * An Introduction to OpenMP Tim Mattson, Intel Corporation, USA * Three Tools to Help with Cluster and Grid Computing: ATLAS, PAPI, and NetSolve Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory, USA Panel ***** * The Grid: Moving it to Prime Time Moderator: David Abramson, Monash University, Australia. Symposium Mainstream Sessions ***************************** (Features 45 papers selected out of 126 submissions by peer review) * Component and Agent Approaches * Distributed Shared Memory * Grid Computing * Input/Output and Databases * Message Passing and Communication * Performance Evaluation * Scheduling and Load balancing * Tools for Management, Monitoring and Debugging Workshops ********* (Features 37 peer-reviewed papers selected by workshop organisers) * Agent based Cluster and Grid Computing * Cluster Computing Education * Distributed Shared Memory on Clusters * Global Computing on Personal Devices * Internet QoS for Global Computing * Object & Component Technologies for Cluster Computing * Scheduling and Load Balancing on Clusters Important Dates *************** * Early bird registration 31 March (register online, check out web site) * Tutorials & workshops 15 May * Symposium main stream & workshops 16-18 May Call for Poster/Research Exhibits: ********************************** Those interested in exhibiting poster papers, please contact Poster Chair Hai Jin (hjin@hust.edu.cn) or browse conference website for details. Sponsors ******** * IEEE Computer Society (www.computer.org) * IEEE Task Force on Cluster Computing (www.ieeetfcc.org) * Association for Computing Machinery (ACM) and SIGARCH (www.acm.org) * IEEE Technical Committee on Parallel Processing (TCPP) * Queensland Uni. of Technology (QUT), Australia (www.qut.edu.au) * Platform Computing, Canada (www.platform.com) * Australian Partnership for Advanced Computing (APAC) (www.apac.edu.au) * Society for Industrial and Applied Mathematics (SIAM, USA) (www.siam.org) * MPI Software Technology Inc., USA (www.mpi-softtech.com) * International Business Machines (IBM) (www.ibm.com) * Akamai Technologies, Inc., USA (www.akamai.com) * Sun Microsystems, USA (www.sun.com) * Intel Corporation, USA (www.intel.com) Further Information ******************* Please browse the symposium web site: http://www.ccgrid.org | www.ccgrid2001.qut.edu.au For specific clarifications, please contact one of the following: Conference Chairs: R. Buyya (rajkumar@buyya.com) or G. Mohay (mohay@fit.qut.edu.au) PC Chair: Paul Roe (ccgrid2001@qut.edu.au) ------------------------------------------------------------------------------------ From tkimball at tampabay.rr.com Sun May 6 10:08:24 2001 From: tkimball at tampabay.rr.com (Tim K.) 
Date: Wed Nov 25 01:01:15 2009 Subject: Mental Ray rendering Message-ID: <000f01c0d64f$29895380$f2e4a118@tampabay.rr.com> Where can I find information about setting up a cluster to run Mental Ray? From joey at infinitevoid.com Mon May 7 11:59:59 2001 From: joey at infinitevoid.com (Joey Echeverria) Date: Wed Nov 25 01:01:15 2009 Subject: lam on netbsd/alpha References: <000f01c0d64f$29895380$f2e4a118@tampabay.rr.com> Message-ID: <3AF6F0AF.8030306@infinitevoid.com> Has anyone had experience running lam-mpi on netbsd/alpha? If so some links to documentation would help. We seem to be having weird problems. Also, does anyone know if it would be a problem if the master node had netbsd 1.4 and the rest had 1.3? -- Joseph G. Echeverria Carnegie Mellon University Department of Electrical and Computer Engineering From Dean.Carpenter at pharma.com Mon May 7 12:48:38 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:15 2009 Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Message-ID: <759FC8B57540D311B14E00902727A0C002EC48E4@a1mbx01.pharma.com> Hmmm. Just a point of note. The base install appears to also install the non-SMP 2.2.17-33 kernel. I built a stage 2 boot image using it like this beoboot -2 -n -k /boot/vmlinuz-2.2.17-33.beo -m /lib/modules/2.2.17-33.beo which seemed to work fine. When the node boots though, there are all kinds of module loading errors because it's still looking for modules in /lib/modules/2.2.17-33.beosmp. The master node is still running the SMP version. I know the docs say you should run the same kernel on the slaves as on the master - would that be an issue here ? These motherboards are based on the Via Apollo Pro chipset VT82C694X, VT82C686A. http://www.areasys.com/products/Motherboards/6vxdr7.htm David Vos - These are disked slaves, but they haven't gotten to the point of being able to partition the disks yet. Since some of the modules fail to load, a command like bpsh 0 df results in an error like df: BProc move failed. I'll do some tests once I get the 2.2.19 kernel compiled with the bproc patches. Scyld/Daniel - you don't have a pre-done rpm for a 2.2.19-xxSMP kernel package do you ? :) -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter@pharma.com] Sent: Wednesday, May 02, 2001 5:05 PM To: beowulf@beowulf.org Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Hi All - Just got some eval equipment in today to play with, with the Gigabyte GA-6vxdr7 motherboards in them. The NICs show up as EtherExpressPro 10/100 nics, pretty normal. These are dual P3 boards with dual 933 cpus and 512meg memory. The stage 1 boot goes fine, it gets an IP and grabs the stage 2 kernel fine. It's during the boot and init of the dual cpus that it barfs ... It leaves this on screen : : : CPU map: 3 Booting processor 1 eip 2000 Setting warm reset code and vector 1. 2. 3. Asserting INIT. Deasserting INIT. Sending STARTUP #1. After apic_write. Before start apic_write. Startup point 1. And there it sits. There's some more above the CPU map: 3 there, I can provide that as well. I have to run right now, but tomorrow I'll try the non-SMP kernel, see if it will actually boot. Otherwise, any ideas ? From rakhesh at cse.iitd.ernet.in Fri May 4 04:12:30 2001 From: rakhesh at cse.iitd.ernet.in (Rakhesh Sasidharan) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? 
Message-ID: Hi all, I was going through the beowulf.org site trying to find general information on beowulf's (and I did find lots), but there's something that I still am not clear about: Suppose I have 6-7 old machines lying around with me, is there any way I could use something like beowulf techniques to make them into a "super" computer ? I'm just a non-academic user, and the only use I think I can put this to could be to play games faster, or surfing, or plays mp3s etc ... Individually, the old machines (486s and above) wouldn't be efficient, so is it possible to combine them together and make things work ? I went through a couple of articles etc, and that gave me the impression contrary to what I am asking; but still I ask to make sure. :-) Or maybe it is possible to take the source code of normal apps, and beowulf-ify them ? Regards, __ Rakhesh From enigma at custard.org Fri May 4 05:34:47 2001 From: enigma at custard.org (Enigma) Date: Wed Nov 25 01:01:15 2009 Subject: Possbile uses of Beowulf ... ? In-Reply-To: Message-ID: The short answer is no: unless the app is designed to use a parallel architecture it won't work. If it is coded/modified to use PVM/MPI or some other system then it *could* work. My original plan was for a fast Quake 3 machine; then I did the research and found it wasn't possible, but I built my cluster anyway. I have used my old 486s, and in a mad evening I worked out I would need roughly 100 of these to even come close to the raw processing power of my T-bird (486DX-33s vs. a 1GHz T-bird), and then there is the network bottleneck to add on. Still, I built my cluster anyway as research for myself, but I only use it to teach myself parallel programming techniques; it does nothing useful, 486s just can't crunch that hard :) If you still want to go ahead it is a good learning experience (well, I think so, but then I'm a sad geek anyway :P). If you want more info or help with getting it running there are some really helpful people on the list and some good info around online (Red Hat comes with PVM and MPI, and Scyld is complete cluster software, but it won't install on small amounts of memory so I can't even try it). Hope this helps, Jez On Fri, 4 May 2001, Rakhesh Sasidharan wrote: > Date: Fri, 4 May 2001 16:47:53 +0530 (IST) > From: Rakhesh Sasidharan > To: beowulf > Subject: Possbile uses of Beowulf ... ? > > > Hi all, > > I was going through the beowulf.org site trying to find general > information on beowulf's (and I did find lots), but there's something that > I still am not clear about: > > Suppose I have 6-7 old machines lying around with me, is there any way I > could use something like beowulf techniques to make them into a "super" > computer ? I'm just a non-academic user, and the only use I think I can > put this to could be to play games faster, or surfing, or plays mp3s > etc ... Individually, the old machines (486s and above) wouldn't be > efficient, so is it possible to combine them together and make things > work ? > > I went through a couple of articles etc, and that gave me the impression > contrary to what I am asking; but still I ask to make sure. :-) Or maybe > it is possible to take the source code of normal apps, and beowulf-ify > them ?
> > Regards, > __ > Rakhesh > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > ________________________________________________________ PGP key is here -> http://www.computerbooth.com/pgp.html * If debugging is the process of removing bugs, then programming must be the process of putting them in. From jmdavis at hsc.vcu.edu Fri May 4 13:57:30 2001 From: jmdavis at hsc.vcu.edu (Mike Davis) Date: Wed Nov 25 01:01:15 2009 Subject: G98 standard benchmarks References: <20010504164030.J6317@velocet.ca> Message-ID: <3AF317BA.E1B87256@hsc.vcu.edu> I chose to benchmark my cluster with GAMESS and G98 as a means of determining realworld performance. I used the molecules used in the paper Commodity Cluster Computing for Computational Chemistry by the University of Adelaide. http://dhpc.adelaide.edu.au/reports/073/html/dhpc-073.html The results of these tests helped to sell the use of the cluster to users in our Chemistry, Physics, and Pharmocology departments. Mike Velocet wrote: > Someone mentioned they ran a few of the G98 test jobs as a standard > benchmark. Is this an 'official' standard, or just something made up > on the fly? Is there such an 'official standard benchmark', or should > we make up one for comparing our results? :) (Should we just run > all tests? I think there are over 500 of them though.) > > /kc > -- > Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Mike Davis University Computing Services-MCV Unix Systems Administrator Virginia Commonwealth University jmdavis@hsc.vcu.edu 804-828-9843 x142 (fax: 804-828-9807) -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010504/795195de/attachment.html From renambot at cs.vu.nl Thu May 3 05:14:04 2001 From: renambot at cs.vu.nl (Luc Renambot) Date: Wed Nov 25 01:01:15 2009 Subject: Thunder HEsl Message-ID: <3AF14B8C.67EA0F87@cs.vu.nl> Hi, Is there anyone using the Thunder HEsl (S2567) Motherboard with an AGP graphic board (GeForce for example) ? I'm looking for some feedback on the AGP performance, as I read that there was some performance issues on that aspect. The global idea is to find a board with high PCI throughput for the network, and with good AGP performance (for a graphic cluster). thanks for any feedback or suggestion, cheers, Luc.
-- Luc Renambot Mail: renambot@cs.vu.nl - Web : http://www.cs.vu.nl/~renambot/vr There's a crack in everything, that's how the light gets in. (L.C.) From i_rkhan at hotmial.com Tue May 1 20:08:11 2001 From: i_rkhan at hotmial.com (Irfan R Khan) Date: Wed Nov 25 01:01:15 2009 Subject: Fw: Help Needed to Install Beowulf Message-ID: ----- Original Message ----- From: Irfan at Hotmail To: Beowulf@beowulf.org Sent: Monday, April 30, 2001 8:53 AM Subject: Help Needed to Install Beowulf Hi Guys, I am a starter in the cluster business and need help. I have an Alpha up2k and an Lx164 plus some Intel machines; basically this is a trial, and if it succeeds then I would be installing it in a larger way. I have tried installing and doing all the stuff given below in RH 6.2/7.0 and SuSE 7.0: Installed the OS. Patched a file with the source code, i.e. cd /usr/src/linux and then patch -p1 > /path of beo patch. Activated the Beowulf part in compiling the kernel, i.e. make menuconfig, general settings, marked the beowulf option (3 lines), make dep clean boot. HERE the problem starts: it's not able to make an image, it says there is no beowulf directory ..........or something like that ???? Can anyone tell me how I can make a beowulf cluster, what software is required, and which site it is on? I will be grateful if I get this information. Thanks Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010502/3d148d84/attachment.html From jsmith at structbio.vanderbilt.edu Wed May 2 19:05:08 2001 From: jsmith at structbio.vanderbilt.edu (Jarrod A. Smith) Date: Wed Nov 25 01:01:15 2009 Subject: Scyld: Channel bonding? Message-ID: <3AF0BCD4.BBE98EAB@structbio.vanderbilt.edu> I'm trying to get a handle on supported hardware configs in order to make some purchasing decisions... Is channel bonding well-supported with Intel controllers in the Scyld package? In what instances can I expect channel bonding to improve real-world performance (and by approx how much)? NFS? Bootstrapping the slaves? TIA -- Jarrod A. Smith Research Asst. Professor, Biochemistry Asst. Director, Center for Structural Biology Computation and Molecular Graphics Vanderbilt University jsmith@structbio.vanderbilt.edu From jsmith at structbio.vanderbilt.edu Wed May 2 19:21:51 2001 From: jsmith at structbio.vanderbilt.edu (Jarrod A. Smith) Date: Wed Nov 25 01:01:16 2009 Subject: Scyld: Myrinet config? Message-ID: <3AF0C0BF.FAF7213@structbio.vanderbilt.edu> Please critique my system design before I waste money. :) I know I can make this work if I put it together with a full install of e.g. RH on each node, but I'm very interested in the Scyld software, too... I want 1 management node with three 10/100 NICs, and 16 slave (compute) nodes, each with two NICs and a Myrinet card. Two NICs from each of the 17 nodes will ideally be channel bonded and handle all the TCP/IP traffic. The third NIC in the mgmt node will connect the cluster to the outside world. The Myrinet on the slaves will be dedicated to MPI. Ideally, the mgmt node will not be a compute node, and will not have a Myrinet card in it. It will not run computations. Is it the Scyld beowulf software's "philosophy" to make use of the management node as a compute node, or can you set it up to be a dedicated login/gateway/compiler/management server? Does this all sound reasonable or have I missed an important point in the design? Thanks. -- Jarrod A. Smith Research Asst. Professor, Biochemistry Asst.
Director, Center for Structural Biology Computation and Molecular Graphics Vanderbilt University jsmith@structbio.vanderbilt.edu
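(On the channel-bonding piece of designs like this, the generic Linux recipe of the era is roughly the sketch below; it is not Scyld-specific, and the interface names and address are placeholders:

  modprobe bonding                                   # load the Linux bonding driver
  ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
  ifenslave bond0 eth0                               # bond0 adopts the MAC of the first slave NIC
  ifenslave bond0 eth1                               # further slaves are set to that same MAC

The shared MAC is why bonded slaves all answer with the same hardware address.)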
From Pascual.Asensi at uv.es Thu May 3 02:44:52 2001 From: Pascual.Asensi at uv.es (Pascual Asensi) Date: Wed Nov 25 01:01:16 2009 Subject: Problem with slave nodes Message-ID: <005401c0d3b5$b4442500$da229c93@barbol.uv.es> Hello. I'm installing a Beowulf cluster (Scyld distribution). The server works fine, but when I start the slave nodes...... boot: Connecting to 192.168.0.100:1555 neighbour table overflow. In the second Monte phase: /proc/sys/kernel/read_root_dev No such file or directory VFS:Cannot open root device 03:01 Kernel panic: VFS:Unable to mount root fs on 03:01 Can you help me?
From rgb at phy.duke.edu Mon May 7 14:20:53 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:16 2009 Subject: Possbile uses of Beowulf ... ? In-Reply-To: Message-ID: On Fri, 4 May 2001, Rakhesh Sasidharan wrote: > > Hi all, > > I was going through the beowulf.org site trying to find general > information on beowulf's (and I did find lots), but there's something that > I still am not clear about: > > Suppose I have 6-7 old machines lying around with me, is there any way I > could use something like beowulf techniques to make them into a "super" > computer ? I'm just a non-academic user, and the only use I think I can > put this to could be to play games faster, or surfing, or plays mp3s > etc ... Individually, the old machines (486s and above) wouldn't be > efficient, so is it possible to combine them together and make things > work ? > > I went through a couple of articles etc, and that gave me the impression > contrary to what I am asking; but still I ask to make sure. :-) Or maybe > it is possible to take the source code of normal apps, and beowulf-ify > them ? This is pretty much a FAQ. The answer is yes, although you may not really want to once you understand the cost-benefit of the setup. The problem with using 8 486's in a home or school beowulf for anything other than education (e.g. learning to parallel program) or play is that you'll pay almost as much for power and air conditioning them for a year as it would cost to buy a modern system several times faster than all of them put together. 8 486's are really pretty useless numerically, even in aggregate, by this point.
8 Pentiums wouldn't be much better, and even 8 Pentium Pros (at a presumed maximum of 200 MHz) would only barely beat breaking even on a single 1200 MHz Athlon (without the need to parallel program or face Amdahl's Law with the latter). Moore's Law is brutal to old hardware. It is perfectly reasonable to combine old systems like this for fun or for an educational project, and quite possible that you can find something useful or entertaining to do with your beowulf once you've done so. Just don't pretend to be doing it so that you can get any particular "real" work done as fast as possible, even with a yearly budget of only ~$500-700 to play with... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From brian at posthuman.com Mon May 7 15:36:50 2001 From: brian at posthuman.com (Brian Atkins) Date: Wed Nov 25 01:01:16 2009 Subject: 336 servers in one rack Message-ID: <3AF72382.6159FB70@posthuman.com> http://www.linuxgram.com/newsitem.phtml?sid=108&aid=12209 Using Transmeta CPUs: "Up against your standard 1U server or server appliance, the Texans have reportedly clocked the thing at eight times the density, five-10 times the power savings - California, pay attention, it's supposedly 15 watts versus 75 watts under load, seven watts versus 75 watts idle - six times lower operating costs and four times the number of web pages served per square foot" "The 324 reportedly consumes 80% less power and generates 80% less heat at peak performance than traditional web servers." the big downside is low CPU performance for the $$$$ -- Brian Atkins Director, Singularity Institute for Artificial Intelligence http://www.singinst.org/ From newt at scyld.com Mon May 7 16:29:08 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:16 2009 Subject: scyld scsi support In-Reply-To: <3AED905B.FF2A091A@home.com> Message-ID: Mitch, > just about to get started with scyld cluster (once i receive the cdrom > from linuxcentral). > recommended hardware lists IDE, but no SCSI. > > is SCSI supported for the boot disk? SCSI usually works just fine with the Scyld Beowulf software -- although the level of 'support' provided with the $2.00 CD is 'label side up'. I'm not sure what you mean by the 'boot disk'. If you mean the CD as used to install the master, then yes. If you mean the CD (or node floppy) as used to boot the nodes, then sort-of. The Scyld node boot process has a number of different phases that come into play here. The first kernel we boot doesn't know anything about SCSI -- cheer up -- it doesn't know anything about IDE either. All it knows how to do is grab a kernel over the network and jump to it (via 2-kernel monte). The second-phase kernel can (and by default does) support SCSI. You can also use tools like 'insmod' and 'modprobe' to plug new modules into node kernels after your nodes are up. Under scyld, 'insmod' and 'modprobe' take the additional argument '--node ' and use this as a target kernel to insert modules into. Regards, Dan Ridge Scyld Computing Corporation From newt at scyld.com Mon May 7 16:40:21 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:16 2009 Subject: Scyld: Myrinet config? In-Reply-To: <3AF0C0BF.FAF7213@structbio.vanderbilt.edu> Message-ID: Jarrod, On Wed, 2 May 2001, Jarrod A. Smith wrote: > Ideally, the mgmt node will not be a compute node, and will not have a > Myrinet card in it. It will not run computations. 
Is it the Scyld > beowulf software's "philosophy" to make use of the management node as a > compute node, or can you set it up to be a dedicated > login/gateway/compiler/management server? This can be done. When running Scyld, The Chiba City cluster at Argonne National Lab operates exactly this way. On that machine, you can even use the magic Scyld inline-mpirun system to invoke jobs from the master as transparently as if a copy of the program were running on the master. Having said that -- I really like to be able to place a rank from an MPI job on the master. This allows me to be able to access filesystems that might not be available on the nodes, contact the outside world if I choose, open the X display with less hassle, etc. You can certianly develop codes that involve a copy of the job on the master -- but configure that rank not to participate in the expensive parts of your code. Regards, Dan Ridge Scyld Computing Corporation From dvos12 at calvin.edu Mon May 7 17:45:42 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:16 2009 Subject: Problem with slave nodes In-Reply-To: <005401c0d3b5$b4442500$da229c93@barbol.uv.es> Message-ID: On Thu, 3 May 2001, Pascual Asensi wrote: > > > Hello. > > I´m instaling Beowulf cluster (scyld distribution). The server work > fine, but when I start slave nodes...... > > > boot: Connecting to 192.168.0.100:1555 neighbour table overflow. I have this message, and I don't know if it is an error. I don't know of any problems caused by it. There was some discussion on "neighbor table overflow" on this list awhile back, but I didn't figure out what it really means. > > In the second Monte fase > > /proc/sys/kernel/read_root_dev > No such file or directory > VFS:Cannot open root device 03:01 > Kernel panic: VFS:Unable to mount root fs on 03:01 This looks very similar to an error I had before. I had to recompile the kernel to include RAM Disk support. This fixed it for me. I didn't write the error down, but this looks like what I had. > > Can you help me? > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From mas at ucla.edu Mon May 7 17:59:35 2001 From: mas at ucla.edu (Michael Stein) Date: Wed Nov 25 01:01:16 2009 Subject: Problem with slave nodes In-Reply-To: ; from David Vos on Mon, May 07, 2001 at 08:45:42PM -0400 References: <005401c0d3b5$b4442500$da229c93@barbol.uv.es> Message-ID: <20010507175935.A17751@mas1.oac.ucla.edu> > > boot: Connecting to 192.168.0.100:1555 neighbour table overflow. > > I have this message, and I don't know if it is an error. I don't know of > any problems caused by it. There was some discussion on "neighbor table > overflow" on this list awhile back, but I didn't figure out what it really > means. loopback device isn't active! (ifconfig lo) Just ran into it on a plain RH 7.0 system. (has something to do with arp (neighbour) table) From newt at scyld.com Mon May 7 19:40:38 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:16 2009 Subject: Possbile uses of Beowulf ... ? 
In-Reply-To: Message-ID: On Fri, 4 May 2001, Enigma wrote: > if you want more info or help with getting it running there are some > really helpfull people on the list and some good info around online > (redhat comes with PVM and MPI and scyld is a complete cluster software, > but won't install on small amounts of memory so i can't even try it) How small are we talking here? I've run VMware nodes with as little as 12MB with the Scyld Beowulf software. Regards, Dan Ridge Scyld Computing Corporation From orlandorocha at digi.com.br Tue May 8 04:48:53 2001 From: orlandorocha at digi.com.br (Orlando Donato Rocha Filho) Date: Wed Nov 25 01:01:16 2009 Subject: Bond-I have problems! Message-ID: What's sequence to install the bond0 in my cluster? How do you do the MAC of bond0, MAC of eth1 and MAC eth2 have the same MAC? Orlando Rocha Prof. Sistemas Digitais - CEFET-MA/BR Administração de Redes com Windows NT Server e Linux. --------------------------------------------- Webmail Diginet - Internet é Diginet. http://www.digi.com.br/ From Eugene.Leitl at lrz.uni-muenchen.de Tue May 8 08:11:36 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:16 2009 Subject: Beowulf clusters in the industry (Europe/Germany) Message-ID: Are any of you aware of a Beowulf system deployed in the European/German industry? If yes, I'm interested in hearing your experiences in relation to operational costs (air conditioning, maintenance, administration, etc.) TIA, -- Eugene From ksfacinelli at yahoo.com Tue May 8 09:54:21 2001 From: ksfacinelli at yahoo.com (Kevin Facinelli) Date: Wed Nov 25 01:01:16 2009 Subject: Cooling experts out there, some help please Message-ID: <20010508165421.999.qmail@web13503.mail.yahoo.com> I need some help from the cooling experts out there. What I am interested in a chart that define maximum internal die temp / Die surface temp / Heat sink temp for intel processors. I realize there are whole bunch of variables but I would like to get a rough idea. The chart could also specify different heatsink material types CU,Al... At the least I would like to see some of the off the shelf heatsinks that have been through testing that compare these variables for Intel Processors. Thank you ahead of time for you input, Kevin ===== Kevin Facinelli www.colosource.com webmaster@colosource.com __________________________________________________ Do You Yahoo!? Yahoo! Auctions - buy the things you want at great prices http://auctions.yahoo.com/ From Dean.Carpenter at pharma.com Tue May 8 10:23:13 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:16 2009 Subject: Cooling experts out there, some help please Message-ID: <759FC8B57540D311B14E00902727A0C002EC48EA@a1mbx01.pharma.com> Couple of sites that are good to check with ... http://www.heatsink-guide.com http://www.overclockers.com You can get more from there. Lots of information out there. Don't know about max temps though - you mean max before the cpu fails ? That varies from cpu to cpu, even within the same lot. -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Kevin Facinelli [mailto:ksfacinelli@yahoo.com] Sent: Tuesday, May 08, 2001 12:54 PM To: beowulf@beowulf.org Subject: Cooling experts out there, some help please I need some help from the cooling experts out there. What I am interested in a chart that define maximum internal die temp / Die surface temp / Heat sink temp for intel processors. 
I realize there are whole bunch of variables but I would like to get a rough idea. The chart could also specify different heatsink material types CU,Al... At the least I would like to see some of the off the shelf heatsinks that have been through testing that compare these variables for Intel Processors. Thank you ahead of time for you input, From alvin at Mail.Linux-Consulting.com Tue May 8 10:45:18 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:16 2009 Subject: Cooling experts out there, some help please In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC48EA@a1mbx01.pharma.com> Message-ID: hi Kevin I know of some heatsink and fan vendors.... ( dont have any finite element apps to show cooling the intel i810 based chipsets does support i2c to be able to read the cpu temperature, voltages and fan speed ( motherboard health monitoring ) - we did quite a few experiments with it - - turned out that the smaller the heatsink, the - better the cooling was .. a small fan can only - cool so much metal of the heatsink the Electronics Cooling magazine is a good starting point of heatsink/fan info ... if you're not restricted to1U systems... the generic heatsink that comes with the intel CPU should work fine if you're trying to fit Intels P3-866Mhz or p4-1.4GHz machines into the 1U cases.... we have problems with cooling the P4 and all 1U vendors have to customize the chassis for the P4 and AMD motherboards thanx alvin http://www.Linux-1U.net/Parts .. heatsinks, fans, On Tue, 8 May 2001, Carpenter, Dean wrote: > Couple of sites that are good to check with ... > > http://www.heatsink-guide.com > http://www.overclockers.com > > You can get more from there. Lots of information out there. Don't know > about max temps though - you mean max before the cpu fails ? That varies > from cpu to cpu, even within the same lot. > > -- > Dean Carpenter > Principal Architect > Purdue Pharma > dean.carpenter@pharma.com > deano@areyes.com > 94TT :) > > > -----Original Message----- > From: Kevin Facinelli [mailto:ksfacinelli@yahoo.com] > Sent: Tuesday, May 08, 2001 12:54 PM > To: beowulf@beowulf.org > Subject: Cooling experts out there, some help please > > > I need some help from the cooling experts out there. > What I am interested in a chart that define maximum > internal die temp / Die surface temp / Heat sink temp > for intel processors. I realize there are whole bunch > of variables but I would like to get a rough idea. > The chart could also specify different heatsink > material types CU,Al... > > At the least I would like to see some of the off the > shelf heatsinks that have been through testing that > compare these variables for Intel Processors. > > Thank you ahead of time for you input, > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From joeyraheb at usa.net Tue May 8 10:54:57 2001 From: joeyraheb at usa.net (Joey Raheb) Date: Wed Nov 25 01:01:16 2009 Subject: linpack Message-ID: <20010508175457.6338.qmail@aw161.netaddress.usa.net> Hello everyone, I was wondering if anyone has had any expierence with the LINPack Benchmark. I am having a great deal of difficulty getting this benchmark to go on a single CPU Alpha Workstation. Can anyone tell me what I have to do to get it going. The package tells me that I need mpi, however, if I am testing only a one CPU system would this be necessary? 
How do I go about doing the 'tuning' that they speak of in their documentation. I downloaded their FAQ which helps a little, but I am new to this whole benchmark routine (at least this type of benchmarking). I would appreciate any information anyone can give me. Thanks, Joey ____________________________________________________________________ Get free email and a permanent address at http://www.amexmail.com/?A=1 From germ at home.com Tue May 8 10:59:36 2001 From: germ at home.com (germ@home.com) Date: Wed Nov 25 01:01:16 2009 Subject: Possbile uses of Beowulf ... ? References: Message-ID: <3AF83408.69602AB4@home.com> can you tell me more about VMware on the cluster. i hadn't thought of that use until i saw your posting. thx... --mitch Daniel Ridge wrote: > On Fri, 4 May 2001, Enigma wrote: > > > if you want more info or help with getting it running there are some > > really helpfull people on the list and some good info around online > > (redhat comes with PVM and MPI and scyld is a complete cluster software, > > but won't install on small amounts of memory so i can't even try it) > > How small are we talking here? I've run VMware nodes with as little as > 12MB with the Scyld Beowulf software. > > Regards, > Dan Ridge > Scyld Computing Corporation From tibbs at math.uh.edu Tue May 8 11:04:04 2001 From: tibbs at math.uh.edu (Jason L Tibbitts III) Date: Wed Nov 25 01:01:16 2009 Subject: Scyld: Myrinet config? In-Reply-To: Daniel Ridge's message of "Mon, 7 May 2001 19:40:21 -0400 (EDT)" References: Message-ID: >>>>> "DR" == Daniel Ridge writes: [Myrinet cluster with front end not doing computations and not connected to Myrinet] DR> This can be done. When running Scyld, The Chiba City cluster at Argonne DR> National Lab operates exactly this way. Ah, really? When I asked about this a while ago, I was informed that it would require nontrivial hacking. Unfortunately my cluster already exists that way and I have no further funding on that project to buy a Scyld support contract so I let it drop. If the situation has improved then I'll have to have another look at Scyld. I guess the important issue was dealing with the mapper. It has to go on one of the compute nodes, since the front end isn't on the Myrinet. - J< From Dean.Carpenter at pharma.com Tue May 8 11:11:45 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:16 2009 Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Message-ID: <759FC8B57540D311B14E00902727A0C002EC48EB@a1mbx01.pharma.com> OK. Progress, but not in the right direction :) Here's what I did, and I'll be detailed so hopefully someone will notice what I missed/typoed/screwedup ... Got 2.2.19 from kernel.org, grabbed the bproc-2.2.tar.bz2 from Scyld. Patched the kernel source - took a little tweaking, some things had changed. But it appears to have gone in OK. make menuconfig Turn all sorts of things, most unnecessary, but there to more or less match up what the 2.2.17 menuconfig said. make dep make -j 4 bzImage make -j 4 modules make modules_install mv arch/i386/boot/bzImage /boot/vmlinuz-2.2.19 Copied the /boot/initrd-2.2.17-33.beosmp.img to /tmp/initrd-2.2.19.img.gz , gunzipped it, mounted it on /mnt. Replaced the aic7xxx.o with the 2.2.19 version. That was the only module being loaded for the master node. 
mount -o loop initrd-2.2.19.img /mnt cp /lib/modules/2.2.19/scsi/aic7xxx.o /mnt/lib umount /mnt gzip -9 /tmp/initrd-2.2.19.img mv /tmp/initrd-2.2.19.img.gz /boot/initrd-2.2.19.img Added the 2.2.19 kernel and initrd to /etc/lilo.conf, and rebooted. bproc failures - not installed yet, but that was expected. Now running 2.2.19 on the master node. Built bproc stuff. That seemed to go OK as well. The INSTALL file didn't quite seem to match the actual though. make make install Modules loaded cleanly. Nice. Copied the modules to the right place. cp vmadump/vmadump.o /lib/modules/2.2.19/misc cp ksyscall/ksyscall.o /lib/modules/2.2.19/misc cp bproc/bproc.o /lib/modules/2.2.19/misc Rebooted to see that they load during the boot. Works fine. Nice. So now the master node is running 2.2.19 patched with bproc, and appears to be fine. Time to build a netboot stage 2 image. beoboot -d -2 -n -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1 Check the debug output. Looks good, it grabbed 2.2.19 kernel and the right modules. OK, boot one of the new eval nodes - everything seems to go OK, but only seems to. As the stage 2 kernel boots, the screen goes black for about 10 seconds, then it coldboots. Dang it. Redid the netboot image with noapic just in case ... beoboot -d -2 -n -c noapic -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1 No go. Same thing. Dang it :( My next step is to build a 2.2.19 kernel with only what's needed for the master and compute nodes. Although not completely homogenous, it will be pretty close. Another option is to try the latest Alan Cox 2.2.19 ... Hmmm. I think I'll grab that first - more chance of Via chipset fixes in there. These eval nodes came with Redhat 7.1 base install with 2.4.x kernel. That comes up fine in SMP mode, so that's another (albeit more painful) option. How hard is it to patch bproc etc into 2.4.x ? -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter@pharma.com] Sent: Monday, May 07, 2001 3:49 PM To: beowulf@beowulf.org Cc: 'David Vos' Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Hmmm. Just a point of note. The base install appears to also install the non-SMP 2.2.17-33 kernel. I built a stage 2 boot image using it like this beoboot -2 -n -k /boot/vmlinuz-2.2.17-33.beo -m /lib/modules/2.2.17-33.beo which seemed to work fine. When the node boots though, there are all kinds of module loading errors because it's still looking for modules in /lib/modules/2.2.17-33.beosmp. The master node is still running the SMP version. I know the docs say you should run the same kernel on the slaves as on the master - would that be an issue here ? These motherboards are based on the Via Apollo Pro chipset VT82C694X, VT82C686A. http://www.areasys.com/products/Motherboards/6vxdr7.htm David Vos - These are disked slaves, but they haven't gotten to the point of being able to partition the disks yet. Since some of the modules fail to load, a command like bpsh 0 df results in an error like df: BProc move failed. I'll do some tests once I get the 2.2.19 kernel compiled with the bproc patches. Scyld/Daniel - you don't have a pre-done rpm for a 2.2.19-xxSMP kernel package do you ? 
:) -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) From Dean.Carpenter at pharma.com Tue May 8 11:36:01 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:16 2009 Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Message-ID: <759FC8B57540D311B14E00902727A0C002EC48EC@a1mbx01.pharma.com> Huh - interesting. I just rebuilt a netboot image using the UP 2.2.17 from Scyld ... beoboot -2 -n -k /boot/vmlinuz-2.2.17-33.beo -m /lib/modules/2.2.17-33.beo/ Rebooted a compute node. It comes up in UP as expected, but no NFS. Checking the /var/log/beowulf/node.0 file, it was trying to load modules (sunrpc specifically) from /lib/modules/2.2.19/misc. Now the master node is running 2.2.19. But why would the compute node try to load 2.2.19 modules ? I thought the beoboot script build a boot.img file that contains the kernel and modules ... Have to scan through beoboot ... -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter@pharma.com] Sent: Tuesday, May 08, 2001 2:12 PM To: beowulf@beowulf.org Cc: 'David Vos' Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard OK. Progress, but not in the right direction :) Here's what I did, and I'll be detailed so hopefully someone will notice what I missed/typoed/screwedup ... Got 2.2.19 from kernel.org, grabbed the bproc-2.2.tar.bz2 from Scyld. Patched the kernel source - took a little tweaking, some things had changed. But it appears to have gone in OK. make menuconfig Turn all sorts of things, most unnecessary, but there to more or less match up what the 2.2.17 menuconfig said. make dep make -j 4 bzImage make -j 4 modules make modules_install mv arch/i386/boot/bzImage /boot/vmlinuz-2.2.19 Copied the /boot/initrd-2.2.17-33.beosmp.img to /tmp/initrd-2.2.19.img.gz , gunzipped it, mounted it on /mnt. Replaced the aic7xxx.o with the 2.2.19 version. That was the only module being loaded for the master node. mount -o loop initrd-2.2.19.img /mnt cp /lib/modules/2.2.19/scsi/aic7xxx.o /mnt/lib umount /mnt gzip -9 /tmp/initrd-2.2.19.img mv /tmp/initrd-2.2.19.img.gz /boot/initrd-2.2.19.img Added the 2.2.19 kernel and initrd to /etc/lilo.conf, and rebooted. bproc failures - not installed yet, but that was expected. Now running 2.2.19 on the master node. Built bproc stuff. That seemed to go OK as well. The INSTALL file didn't quite seem to match the actual though. make make install Modules loaded cleanly. Nice. Copied the modules to the right place. cp vmadump/vmadump.o /lib/modules/2.2.19/misc cp ksyscall/ksyscall.o /lib/modules/2.2.19/misc cp bproc/bproc.o /lib/modules/2.2.19/misc Rebooted to see that they load during the boot. Works fine. Nice. So now the master node is running 2.2.19 patched with bproc, and appears to be fine. Time to build a netboot stage 2 image. beoboot -d -2 -n -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1 Check the debug output. Looks good, it grabbed 2.2.19 kernel and the right modules. OK, boot one of the new eval nodes - everything seems to go OK, but only seems to. As the stage 2 kernel boots, the screen goes black for about 10 seconds, then it coldboots. Dang it. Redid the netboot image with noapic just in case ... beoboot -d -2 -n -c noapic -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1 No go. Same thing. 
Dang it :( My next step is to build a 2.2.19 kernel with only what's needed for the master and compute nodes. Although not completely homogenous, it will be pretty close. Another option is to try the latest Alan Cox 2.2.19 ... Hmmm. I think I'll grab that first - more chance of Via chipset fixes in there. These eval nodes came with Redhat 7.1 base install with 2.4.x kernel. That comes up fine in SMP mode, so that's another (albeit more painful) option. How hard is it to patch bproc etc into 2.4.x ? From Dean.Carpenter at pharma.com Tue May 8 13:30:38 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:16 2009 Subject: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Message-ID: <759FC8B57540D311B14E00902727A0C002EC48EE@a1mbx01.pharma.com> Heh. I'm baaaack. Got more weirdness. The cluster is working. That's the good news. How I got there is odd though ... Note in the bottom of this msg that booting from a normal node boot diskette would pull the 2.2.19 kernel from the master fine, but after the 2-kernel monte, it would black screen and cold boot. I created a stage 2 boot floppy with beoboot, using the *same* 2.2.19 kernel. beoboot -2 -f -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19/ THAT sucker boots those eval nodes fine. So, floppy boot of stage 2 works like a champ, while the 2-kernel monte boot cold boots it. Riddle me that one Batman. Oh, there is a slight problem, but it doesn't appear to be affecting anything (NFS works fine). The last lines in the node boot are ... portmap: RPC call returned error 5 portmap: RPC call returned error 5 lockd_up: makesock failed, error = -5 portmap: RPC call returned error 5 2nd weirdness. I also have a few Dell PowerEdge 2450 boxes here that have been in the test cluster since day one. They have all worked fine with the 2.2.17-33.beosmp kernel. They boot off the normal node floppy, monte works fine, and all is copasetic. Well, ever since moving the master to 2.2.19, those floppies won't boot *any* node. Not the new evals (cold boot) nor the 2450's (also keep rebooting). Now why would a stage 2 kernel change affect that I wonder ? Tomorrow I'll recreate the stage1 boot floppies, just in case. Also will build a tighter kernel, just including stuff we need for the various node types. Then I'm out for a week ... -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter@pharma.com] Sent: Tuesday, May 08, 2001 2:36 PM To: beowulf@beowulf.org Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard Huh - interesting. I just rebuilt a netboot image using the UP 2.2.17 from Scyld ... beoboot -2 -n -k /boot/vmlinuz-2.2.17-33.beo -m /lib/modules/2.2.17-33.beo/ Rebooted a compute node. It comes up in UP as expected, but no NFS. Checking the /var/log/beowulf/node.0 file, it was trying to load modules (sunrpc specifically) from /lib/modules/2.2.19/misc. Now the master node is running 2.2.19. But why would the compute node try to load 2.2.19 modules ? I thought the beoboot script build a boot.img file that contains the kernel and modules ... Have to scan through beoboot ... 
-- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Carpenter, Dean [mailto:Dean.Carpenter@pharma.com] Sent: Tuesday, May 08, 2001 2:12 PM To: beowulf@beowulf.org Cc: 'David Vos' Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard OK. Progress, but not in the right direction :) Here's what I did, and I'll be detailed so hopefully someone will notice what I missed/typoed/screwedup ... Got 2.2.19 from kernel.org, grabbed the bproc-2.2.tar.bz2 from Scyld. Patched the kernel source - took a little tweaking, some things had changed. But it appears to have gone in OK. make menuconfig Turn all sorts of things, most unnecessary, but there to more or less match up what the 2.2.17 menuconfig said. make dep make -j 4 bzImage make -j 4 modules make modules_install mv arch/i386/boot/bzImage /boot/vmlinuz-2.2.19 Copied the /boot/initrd-2.2.17-33.beosmp.img to /tmp/initrd-2.2.19.img.gz , gunzipped it, mounted it on /mnt. Replaced the aic7xxx.o with the 2.2.19 version. That was the only module being loaded for the master node. mount -o loop initrd-2.2.19.img /mnt cp /lib/modules/2.2.19/scsi/aic7xxx.o /mnt/lib umount /mnt gzip -9 /tmp/initrd-2.2.19.img mv /tmp/initrd-2.2.19.img.gz /boot/initrd-2.2.19.img Added the 2.2.19 kernel and initrd to /etc/lilo.conf, and rebooted. bproc failures - not installed yet, but that was expected. Now running 2.2.19 on the master node. Built bproc stuff. That seemed to go OK as well. The INSTALL file didn't quite seem to match the actual though. make make install Modules loaded cleanly. Nice. Copied the modules to the right place. cp vmadump/vmadump.o /lib/modules/2.2.19/misc cp ksyscall/ksyscall.o /lib/modules/2.2.19/misc cp bproc/bproc.o /lib/modules/2.2.19/misc Rebooted to see that they load during the boot. Works fine. Nice. So now the master node is running 2.2.19 patched with bproc, and appears to be fine. Time to build a netboot stage 2 image. beoboot -d -2 -n -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1 Check the debug output. Looks good, it grabbed 2.2.19 kernel and the right modules. OK, boot one of the new eval nodes - everything seems to go OK, but only seems to. As the stage 2 kernel boots, the screen goes black for about 10 seconds, then it coldboots. Dang it. Redid the netboot image with noapic just in case ... beoboot -d -2 -n -c noapic -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1 No go. Same thing. Dang it :( My next step is to build a 2.2.19 kernel with only what's needed for the master and compute nodes. Although not completely homogenous, it will be pretty close. Another option is to try the latest Alan Cox 2.2.19 ... Hmmm. I think I'll grab that first - more chance of Via chipset fixes in there. These eval nodes came with Redhat 7.1 base install with 2.4.x kernel. That comes up fine in SMP mode, so that's another (albeit more painful) option. How hard is it to patch bproc etc into 2.4.x ? 
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at coffee.psychology.mcmaster.ca Tue May 8 16:58:21 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:16 2009 Subject: Cooling experts out there, some help please In-Reply-To: <20010508165421.999.qmail@web13503.mail.yahoo.com> Message-ID: > What I am interested in a chart that define maximum > internal die temp / Die surface temp / Heat sink temp > for intel processors. well, if you look at a PIII datasheet, it indicates the max Tjunction is around 80C (lower for higher clocks, interestingly), fan intake are <= 45C. I'd guess that the Tjunction-offset (diff between Tj and the internal diode reading) is a conservative approximation of the deltaT between inner and outer surface. it's given as 4C. so at the working surface of the heatsink, you need to be at, say, 76C. HS's seem to be characterized by vendors in thermal resistance, deltaT per watt. so a random Thermaltake unit lists .64C/W, which given, say, 24W for a PIII/733, means -16C cooler ambient, or a max of 60C in the case. as another example, P4/1700 specs say 64W thermal design power max Tcase 76C, Tambient 30C or so. oddly, it also says 50.2W of 1.7V power (85W, not 76!) > The chart could also specify different heatsink > material types CU,Al... design makes at least as much difference as material. From irwanhadi at phxby.com Tue May 8 19:05:28 2001 From: irwanhadi at phxby.com (Irwan Hadi) Date: Wed Nov 25 01:01:16 2009 Subject: Running FDTD (Finite Difference Time Domain) with beowulf Message-ID: <20010508200528.A30574@phxby.com> I have a question, does anyone has ever run FDTD (Finite Difference Time Domain) program under Beowulf Cluster ? FDTD program itself is for electromagnetic computational calculation, and takes CPU power very intensively. I wonder if we need special license to run the FDTD under beowulf cluster if it is possible, and how much the difference of speed if we run it under like 5 P III 933 Mhz dual Processor server (256 to 512 M Ram), and like 5 AMD Athlon 1 Ghz single processor server (256 to 512 M Ram), compare to run it with a SUNW Ultra-1 Sparc Station 167 Mhz with 128 M bytes Ram ? Thanks From dvos12 at calvin.edu Tue May 8 20:02:35 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:16 2009 Subject: linpack In-Reply-To: <20010508175457.6338.qmail@aw161.netaddress.usa.net> Message-ID: On 8 May 2001, Joey Raheb wrote: > I was wondering if anyone has had any expierence with the LINPack Benchmark. > I am having a great deal of difficulty getting this benchmark to go on a > single CPU Alpha Workstation. Can anyone tell me what I have to do to get it > going. > > The package tells me that I need mpi, however, if I am testing only a one CPU > system would this be necessary? The program is written using the mpi libraries, so you can't even compile the program without them, much less run it. (as far as I know). > How do I go about doing the 'tuning' that they speak of in their > documentation. I downloaded their FAQ which helps a little, but I am new to > this whole benchmark routine (at least this type of benchmarking). I've been doing the guess-and-check method to try and figure out what top performance might be. > I would appreciate any information anyone can give me. 
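Back on the cooling thread: Mark Hahn's heatsink arithmetic above reduces to one line, max in-case ambient = working-surface limit minus (thermal resistance times dissipation), and it is worth re-running for whatever CPU you actually have. A throwaway sketch using only the PIII example numbers he quotes (76C surface target, 0.64 C/W, 24W); substitute your own datasheet values:

    #!/bin/sh
    # crude thermal headroom estimate: Tambient_max = Tcase_max - theta_sa * P
    TCASE_MAX=76     # C, heatsink working-surface limit (Tjunction ~80C minus the ~4C offset)
    THETA_SA=0.64    # C/W, heatsink thermal resistance, sink to ambient
    POWER=24         # W, CPU dissipation
    awk "BEGIN { printf \"max in-case ambient ~ %.1f C\n\", $TCASE_MAX - $THETA_SA * $POWER }"

That reproduces the "max of 60C in the case" figure; plugging in the P4/1700's 64W thermal design power with the same 0.64 C/W sink drops the allowable ambient to about 35C, in line with the roughly 30C Tambient its spec quotes.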
> > Thanks, > Joey > > ____________________________________________________________________ > Get free email and a permanent address at http://www.amexmail.com/?A=1 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Tue May 8 20:53:05 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:16 2009 Subject: Cooling experts out there, some help please In-Reply-To: Message-ID: hi mark heatsink compound makes a big difference too and airflow or lack there of makes the biggest difference in cpu temp.... I'd add i2c to one of the systems and start a cpu temperature tests...over several days/weeks... - an idle cpu will give you "ambient cpu temperature" it cpu temp is running hotter than spec... you'd better add more fans to cool the systems... every 10 degrees of heat in the cpu degrades the life of the cpu by around 1/2 or something like that... have fun alvin http://www.Linux-1U.net ... 500Gb Raid5 in 1U .. On Tue, 8 May 2001, Mark Hahn wrote: > > What I am interested in a chart that define maximum > > internal die temp / Die surface temp / Heat sink temp > > for intel processors. > > well, if you look at a PIII datasheet, it indicates the max > Tjunction is around 80C (lower for higher clocks, interestingly), > fan intake are <= 45C. I'd guess that the Tjunction-offset > (diff between Tj and the internal diode reading) is a > conservative approximation of the deltaT between inner and > outer surface. it's given as 4C. so at the working surface > of the heatsink, you need to be at, say, 76C. > > HS's seem to be characterized by vendors in thermal resistance, > deltaT per watt. so a random Thermaltake unit lists .64C/W, > which given, say, 24W for a PIII/733, means -16C cooler ambient, > or a max of 60C in the case. > > as another example, P4/1700 specs say 64W thermal design power > max Tcase 76C, Tambient 30C or so. oddly, it also says > 50.2W of 1.7V power (85W, not 76!) > > > The chart could also specify different heatsink > > material types CU,Al... > > design makes at least as much difference as material. > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From patrick at myri.com Tue May 8 23:55:57 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:16 2009 Subject: linpack References: <20010508175457.6338.qmail@aw161.netaddress.usa.net> Message-ID: <3AF8E9FD.C34CFB3@myri.com> Joey Raheb wrote: > I was wondering if anyone has had any expierence with the LINPack Benchmark. > I am having a great deal of difficulty getting this benchmark to go on a > single CPU Alpha Workstation. Can anyone tell me what I have to do to get it > going. The sequential Linpack is one of the easiest benchmark to run. Look at http://www.netlib.org/benchmark/1000d > The package tells me that I need mpi, however, if I am testing only a one CPU > system would this be necessary? You need MPI for the parallel Linpack benchmark, not for the sequential one. If you want to play the "male game" and get big numbers, you need to use the parallel Linpack. A very good implementation is HPL (High Performance Linpack) also available on netlib. 
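For anyone who does move on to the parallel HPL that Patrick mentions, nearly all of the "tuning" is three knobs in its HPL.dat input file, the problem size Ns, the block size NBs and the P x Q process grid, plus linking a good BLAS. A back-of-the-envelope sketch for picking Ns, assuming the usual rule of thumb that the matrix should fill roughly 80% of aggregate memory at 8 bytes per element (the node count and memory size below are made-up examples):

    #!/bin/sh
    # rough HPL problem size: N ~ sqrt(0.8 * total_memory_bytes / 8)
    NODES=32           # hypothetical cluster size
    MB_PER_NODE=512    # hypothetical memory per node
    awk "BEGIN {
        bytes = $NODES * $MB_PER_NODE * 1024 * 1024
        printf \"try Ns near %d, rounded to a multiple of NBs\n\", sqrt(0.8 * bytes / 8)
    }"

NBs in the low hundreds and a P x Q grid as close to square as the node count allows are the usual starting points; beyond that it really is guess-and-check, as David says.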
> How do I go about doing the 'tuning' that they speak of in their > documentation. I downloaded their FAQ which helps a little, but I am new to > this whole benchmark routine (at least this type of benchmarking). No tuning for serial Linpack > I would appreciate any information anyone can give me. Hope it helps -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-1950 | --------------------------------------------------------------- From sjohnsto at eso.org Wed May 9 01:23:47 2001 From: sjohnsto at eso.org (Stephen Johnston) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid on an ABit KT7A-Raid Mobo Message-ID: <3AF8FE93.1A0DDD07@eso.org> Hi All I am installing a node, please see below the query I sent to redhat-managers (actually we are using Suse 7.1 but that doesnt matter). Simply the machine wont boot from the ATA100 Highpoint Raid after install, even though its set in the BIOS of the mobo. The installation proc seems to go ok, but fails on the first boot. If I restart the install, abort it and boot manually from the device it works. Then if I look into /etc there is no lilo.conf! The behaviour is the same regardless of using the highpoint to mirror the 2 system disks or not to mirror but have flat. I would be very grateful if anyone could guide me here. TIA Stephen. -- Stephen Johnston (NGAST/Beowulf Project) Phone: +49 89 32006563 European Southern Observatory Fax : +49 89 32006380 Karl-Schwarzschild-Strasse 2 D-85748 Garching bei Muenchen http://www.eso.org -- -------------- next part -------------- An embedded message was scrubbed... From: Stephen Johnston Subject: Booting ATA100 Raid on an ABit KT7A-Raid Mobo Date: Mon, 07 May 2001 16:43:52 +0200 Size: 1470 Url: http://www.scyld.com/pipermail/beowulf/attachments/20010509/7013253e/attachment.mht From MAWorsham at intermedia.com Wed May 9 09:46:48 2001 From: MAWorsham at intermedia.com (Worsham, Michael A.) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid Message-ID: Perhaps you should look at having a primary drive (ie. hdc) as a boot device then running the RAID that way off of it. I mean you really don't want to ever boot up in RAID already since its more for your data storage and application, rather than running your kernel on it directly. Are you using a module for loading the RAID software so that the kernel can see it? -- Michael From lindahl at conservativecomputer.com Wed May 9 10:43:21 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid In-Reply-To: ; from MAWorsham@intermedia.com on Wed, May 09, 2001 at 12:46:48PM -0400 References: Message-ID: <20010509134321.A1855@wumpus> On Wed, May 09, 2001 at 12:46:48PM -0400, Worsham, Michael A. wrote: > Perhaps you should look at having a primary drive (ie. hdc) as a boot device > then running the RAID that way off of it. > I mean you really don't want to ever boot up in RAID already since its more > for your data storage and application, rather than running your kernel on it > directly. One use for RAID is for your system disk, which is a single point of failure for the entire cluster on a Beowulf2 system. It's not hard to get a small 2-disk RAID card (3ware or promise) for this purpose. 
-- g From lindahl at conservativecomputer.com Wed May 9 13:07:45 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: page coloring enthusiast needed Message-ID: <20010509160745.A2727@wumpus> I've been working on a page coloring patch for a while, and Jason Papadopoulos has finally beaten it into good enough shape that it's ready for the wide world to hack on it. I have a sourceforge project, and I'd like to find an enthusiast who'd like to take care of all the web/cvs/whatever details. I can supply a readme and a patch... -- g From alvin at Mail.Linux-Consulting.com Wed May 9 14:35:36 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid In-Reply-To: <20010509134321.A1855@wumpus> Message-ID: hi all my opinions... i think raid is good to protect against disk failures or bad partitions ... large data storage should be on raid5... system disk should NOT be on the same disk as the "large irreplaceable data disks"... system disk is already backed up on the cdrom from which you installed ... or can copy from other similar servers most motherboards come with two ide controllers so you can have dual system disks ( mirrored ) to boot off either disk if booting or system disk crash is an issue.... use root raid on two mirror'd drives - - i just built a raid1 ( mirror ) root-raid box... - works out of the box ( rh-7.1 ) - - unplugged /dev/hda and booted fine off of /dev/hdb - ( redhat-7.1 restriction for where to put the mirror ) - - but it does boot in degraded mode so you should recover - what you need and shutdown immediately and rebuild or - resync your new system disk back into the raid mirror - - i would NOT put the "system disk" on the same disks as its data... if one disk goes bad...you lose both system and data... - you wanna be able to recover your data.. - replace the system disk and you're back and running - getting IDE drives on the same bacle to perform like it was scsi-3 disks is alot harder to do... guess thats why scsi drives are more expensive ?? have fun raiding alvin http://www.Linux-1U.net - 500Gb 1U Raid5 ... http://www.Linux-Sec.net - firewalls - monitoring - etc On Wed, 9 May 2001, Greg Lindahl wrote: > On Wed, May 09, 2001 at 12:46:48PM -0400, Worsham, Michael A. wrote: > > > Perhaps you should look at having a primary drive (ie. hdc) as a boot device > > then running the RAID that way off of it. > > I mean you really don't want to ever boot up in RAID already since its more > > for your data storage and application, rather than running your kernel on it > > directly. > > One use for RAID is for your system disk, which is a single point of > failure for the entire cluster on a Beowulf2 system. It's not hard to > get a small 2-disk RAID card (3ware or promise) for this purpose. > > -- g > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From farul_g at yahoo.com Wed May 9 16:44:34 2001 From: farul_g at yahoo.com (farul ghazali) Date: Wed Nov 25 01:01:17 2009 Subject: Scyld and queueing/load balancing system Message-ID: <20010509234434.64327.qmail@web12304.mail.yahoo.com> I'm trying out Scyld and it seemed to install quickly and quite easily. Is it possible to install some sort of load balancing system for this setup eg. DQS or GNU queue? Or is there something already built in? 
The cluster I'm setting up will be running a few MPI aware apps but more little apps that would need to be balanced properly over the multiple nodes. TIA. __________________________________________________ Do You Yahoo!? Yahoo! Auctions - buy the things you want at great prices http://auctions.yahoo.com/ From bob at drzyzgula.org Wed May 9 19:49:14 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... Message-ID: <20010509224913.C25773@drzyzgula.org> All, At my office, we use a lot of Suns, mostly AXmps that we integrate in-house. With the release of Sun's Ocelot (AX2200) board, it was somewhat interesting to take a look at the SPEC ratings of several current CPUs. SPEC isn't the be-all and end-all of benchmarks, but we find that our real-world applications results track it, especially the SPECfp, pretty closely. I did this for our internal use but I thought that y'all might find it interesting as well. I'd welcome corrections or additions. Caveat Emptor: This is intended as sort of a sanity check to help think about how we are spending money. Clearly SPEC2000 and price are only two of the myriad things that one needs to take into consideration when purchasing hardware; this should not be taken as a buyer's guide. For entertainment purposes only, don't try this at home, YMMV, etc. The "est core $" is a guess of the dollar cost for a CPU, motherboard and 1GB of memory. Sorted in declining order of SPECfp2000: Processor MHz L2 KB SPi2K SPfp2K est core $ ------------------------- ---- ------ ----- ------ ---------- Alpha (21264) 833 8192 533 644 9,000 (UP2000+, est) PA-8700 750 N/A 603 581 14,000 (HP J6700, 2304KB L1) Pentium 4 1500 256 536 558 2,100 (D850GB, RDRAM) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 (GA7DX, DDR SDRAM) UltraSPARC III 750 8192 395 421 8,480 (Ocelot) AMD Athlon (Thunderbird) 1300 256 491 374 520 (A7V, PC133 SDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 (VC820, RDRAM) UltraSPARC II 480 8192 234 291 10,000 (AXdp) Sorted in declining order of SPECint2000: Processor MHz L2 KB SPi2K SPfp2K est core $ ------------------------- ---- ------ ----- ------ ---------- PA-8700 750 N/A 603 581 14,000 (HP J6700, 2304KB L1) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 (GA7DX, DDR SDRAM) Pentium 4 1500 256 536 558 2,100 (D850GB, RDRAM) Alpha (21264) 833 8192 533 644 9,000 (UP2000+, est) AMD Athlon (Thunderbird) 1300 256 491 374 520 (A7V, PC133 SDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 (VC820, RDRAM) UltraSPARC III 750 8192 395 421 8,480 (Ocelot) UltraSPARC II 480 8192 234 291 10,000 (AXdp) Sorted in order of increasing cost: Processor MHz L2 KB SPi2K SPfp2K est core $ ------------------------- ---- ------ ----- ------ ---------- AMD Athlon (Thunderbird) 1300 256 491 374 520 (A7V, PC133 SDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 (VC820, RDRAM) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 (GA7DX, DDR SDRAM) Pentium 4 1500 256 536 558 2,100 (D850GB, RDRAM) UltraSPARC III 750 8192 395 421 8,480 (Ocelot) Alpha (21264) 833 8192 533 644 9,000 (UP2000+, est) UltraSPARC II 480 8192 234 291 10,000 (AXdp) PA-8700 750 N/A 603 581 14,000 (HP J6700, 2304KB L1) Note that the Pentium and DDR Athlon prices are quite high, due to memory costs. The Pentium 4 is limited to RDRAM, and the Pentium III configurations for which SPEC ratings are avialble are also limited to RDRAM. Clearly you can do an PIII+SDRAM system, but there are no reported SPEC numbers for these. 
The DDR Athlon is probably artificially high in price. DDR motherboards don't support more than two or three DIMM slots, so to get the 1GB of memory one has to use two 512MB modules. Unfortunately, these are currenty way overpriced. RDRAM also suffers to some extent from this problem. The following little table tells the story: PC133 PC1600 DDR PC2100 DDR PC800 RDRAM ------ ---------- ---------- ----------- 128MB $ 25 $ 60 $ 50 $ 90 256MB $ 50 $108 $100 $210 512MB $110 N/A $800 $850 Clearly, there is something anomolous about 512MB DDR memory, and to a lesser extent 512MB RDRAM memory. Most of the PC parts prices are coming from Pricewatch, after scrolling down to get past the clear loss-leaders and picking a nice, round approximate number. I wouldn't suggest paying attention to more than about two significant digits... The prices for the Alpha, the AXdp and the PA-8700 are pretty much wild-ass guesses. We have a $6K price quote for the 480MB USII, and I'm guessing that the AXdp motherboard and the 1GB of memory will together cost around $4K (as I mentioned, we use AXmps, I used the AXdp here only because it was more comparable to the other boards in my analysis). Microway charges around $17K for a dual-processor, 833MHz 21264, 256MB system on the UP2000+ motherboard. From this I'm guessing that the motherboard, 1GB of memory and a single 833MHz processor should cost around $9K. I have no real good info on the PA-8700. HP doesn't have prices for them on their website. The J6000 workstation with a 552MHz PA-8600 processor costs around $13K. This new processor represents a big improvement at 750MHz. Thus, I'm guessing that it will cost an arm and a leg. Maybe two legs. I'm pretty sure that one cannot buy these parts except as part of a system, so this further reduces the value of it's inclusion in these tables. Most of the SPEC ratings come from spec.org, a few come from press releases. FWIW, --Bob From sjohnsto at eso.org Thu May 10 00:40:05 2001 From: sjohnsto at eso.org (Stephen Johnston) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid References: Message-ID: <3AFA45D5.E7A52CA3@eso.org> Hi Thanks to all of you who responded. Turns out the HighPoint 370 controller on the ABit KT7A-RAID board is badly supported under linux. I upgraded the bios of the board, which also upgraded the HighPoint bios to version 1.03b. Now when I install onto 2 disks which are h/w mirrored the machine actually tries to boot (gets further) giving a LIL- prompt and then hanging. There is some documentatin on the net saying this controller is only supported in single drive mode (whatever that means, I presume it means no RAID config). It is still not creating a /etc/lilo.conf however, and if I create it manually using yast (I am running Suse 7.1) the behaviour is the same. My next step is to install with the Highpoint setting with no RAID, simply as 2 ATA100 channels and do the mirroring of the system disks in s/w. I will let you know. Finally, fyi, the machine does have a 3ware in it as well, but we are purposefully not using raid on it as we only need 8ports and not raid as disks will be removed periodically, raid would break this. Regards, Stephen. PS If someone could tell me what the 'LIL-' error means normally that would help, I will also try to look it up. 
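On the software-mirroring fallback mentioned above: with the raidtools shipped by 2.2/2.4-era distributions this is a short /etc/raidtab plus mkraid. A minimal sketch, assuming RAID-1 support in the kernel and one system partition on each ATA100 channel; /dev/hde1 and /dev/hdg1 are placeholders for wherever the HPT370's channels show up on your board:

    # /etc/raidtab
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              4
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdg1
        raid-disk               1

    # then build the mirror and watch the initial resync
    mkraid /dev/md0
    mke2fs /dev/md0
    cat /proc/mdstat

Booting from the mirror itself needs a LILO recent enough to accept boot=/dev/md0, so until the LIL- business is sorted out it is safer to keep a small plain /boot partition on each drive (or a boot floppy) in reserve.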
-- Stephen Johnston (NGAST/Beowulf Project) Phone: +49 89 32006563 European Southern Observatory Fax : +49 89 32006380 Karl-Schwarzschild-Strasse 2 D-85748 Garching bei Muenchen http://www.eso.org -- From alvin at Mail.Linux-Consulting.com Thu May 10 01:37:59 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid In-Reply-To: <3AFA45D5.E7A52CA3@eso.org> Message-ID: hi stephen... this exact same thread or 95% similar thread is going on right now in the debian mailing list regarding hw raid, abit kt7a, htp-370 ... there is no point to using the expensive hpt-370 controller if its NOT being used as hardware raid... ( creates problems than its worth.. a promise ata100 is better/cheaper... ( less headaches ) have fun alvin http://www.Linux-1U.net ... 500Gb 1U raid5 ... if you got to LIL- - you got to the second(?) part of booting the loader but couldnt do the last part... see /usr/doc/lilo* for more info On Thu, 10 May 2001, Stephen Johnston wrote: > Hi > > Thanks to all of you who responded. > > Turns out the HighPoint 370 controller on the ABit KT7A-RAID board is badly > supported under linux. I upgraded the bios of the board, which also upgraded > the HighPoint bios to version 1.03b. Now when I install onto 2 disks which are > h/w mirrored the machine actually tries to boot (gets further) giving a > > LIL- > > prompt and then hanging. There is some documentatin on the net saying this > controller is only supported in single drive mode (whatever that means, I > presume it means no RAID config). > > It is still not creating a /etc/lilo.conf however, and if I create it manually > using yast (I am running Suse 7.1) the behaviour is the same. > > My next step is to install with the Highpoint setting with no RAID, simply as 2 > ATA100 channels and do the mirroring of the system disks in s/w. > > I will let you know. > > Finally, fyi, the machine does have a 3ware in it as well, but we are > purposefully not using raid on it as we only need 8ports and not raid as disks > will be removed periodically, raid would break this. > > Regards, > > Stephen. > > PS If someone could tell me what the 'LIL-' error means normally that would > help, I will also try to look it up. > > -- > Stephen Johnston (NGAST/Beowulf Project) Phone: +49 89 32006563 > European Southern Observatory Fax : +49 89 32006380 > Karl-Schwarzschild-Strasse 2 > D-85748 Garching bei Muenchen http://www.eso.org > -- > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From bob at drzyzgula.org Thu May 10 07:04:01 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <20010509224913.C25773@drzyzgula.org> References: <20010509224913.C25773@drzyzgula.org> Message-ID: <20010510100401.A27267@drzyzgula.org> Wes Bauske was kind enough to point out that I could do the Pentium 4 system a bit more cheaply. I'd missed the fact that Pentium 4 boards are available with 4 RIMM slots, so one can use 256MB RIMMs, which are available at about half the price per MB over the 512MB modules. Thus, it should be possible to do a 1.5GHz Pentium 4 system for a core (CPU+MB+Memory) cost of around $1300 or so. A 1.7GHz system would cost around $100 more than that. Updated tables below. Also, I added SPECfp2000/K$. 
Fascinating how the Pentium 4 comes out second in each one of these tables... Thanks, Wes. --Bob Sorted in declining order of SPECfp2000: Sfp/ Processor MHz L2 Si Sfp core $ K$ Notes ------------------------- ---- ----- --- --- ------ --- ---------- Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) Sorted in declining order of SPECint2000: Sfp/ Processor MHz L2 Si Sfp core $ K$ Notes ------------------------- ---- ----- --- --- ------ --- ---------- PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) Sorted in order of increasing cost: Sfp/ Processor MHz L2 Si Sfp core $ K$ Notes ------------------------- ---- ----- --- --- ------ --- ---------- AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) Sorted in declining order of SPECfp2000/K$ Sfp/ Processor MHz L2 Si Sfp core $ K$ Notes ------------------------- ---- ----- --- --- ------ --- ---------- AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) From brunobg at lsi.usp.br Thu May 10 07:14:17 2001 From: brunobg at lsi.usp.br (Bruno Barberi Gnecco) Date: Wed Nov 25 01:01:17 2009 Subject: Scyld and root directory Message-ID: How do I write in the / directory of a Scyld client? I need at least some symbolic links (such as usr->rootfs/usr). Something else: could anybody explain in detail how to run X in the clients? Thanks a lot, -- Bruno Barberi Gnecco http://www.geocities.com/RodeoDrive/1980/ Quoth the Raven, "Nevermore". - Poe From agrajag at linuxpower.org Thu May 10 06:58:47 2001 From: agrajag at linuxpower.org (Jag) Date: Wed Nov 25 01:01:17 2009 Subject: Scyld and root directory In-Reply-To: ; from brunobg@lsi.usp.br on Thu, May 10, 2001 at 11:14:17AM -0300 References: Message-ID: <20010510065847.C15130@kotako.analogself.com> On Thu, 10 May 2001, Bruno Barberi Gnecco wrote: > How do I write in the / directory of a Scyld client? I need at > least some symbolic links (such as usr->rootfs/usr). 
Why are you wanting this symlink? Once the bproc daemon on the slave chroot's to /rootfs, there really isn't any way to access the real / as all the jobs that get propegated over there use /rootfs as their /. If the node came up all the way, there's no way to access the real / for reading or writing, so I don't see what good this would do you. If you're trying to start up something before the chroot happens, I suggest you do it after. It'll save you the headache of trying to make both / and /rootfs a sane root to run your programs in. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010510/4481227c/attachment.bin From dvos12 at calvin.edu Thu May 10 07:53:01 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:17 2009 Subject: Scyld and root directory In-Reply-To: Message-ID: Your operations are with bpsh and bpcp. You could do something like bpsh -a ln -s /rootfs/usr /usr You will need to be logged in as root on the master. David On Thu, 10 May 2001, Bruno Barberi Gnecco wrote: > How do I write in the / directory of a Scyld client? I need at > least some symbolic links (such as usr->rootfs/usr). From hahn at coffee.psychology.mcmaster.ca Thu May 10 08:26:30 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:17 2009 Subject: Booting from ATA100 Raid In-Reply-To: Message-ID: > there is no point to using the expensive hpt-370 > controller if its NOT being used as hardware raid... > ( creates problems than its worth.. as far as I know, it doesn't have HW raid in any meaningful sense (that would be: does raid5 parity in hardware.) it's just another dual-channel udma controller that happens to have some bios support for doing trivial raid (0,1). there's no advantage over normal soft raid for these raid levels, since they don't eat CPU. I don't believe there's any hpt 370 support problem with current Linux (that means 2.4, of course; 2.2 hasn't seen IDE updates for some time now.) regards, mark hahn. From JParker at coinstar.com Thu May 10 09:02:15 2001 From: JParker at coinstar.com (JParker@coinstar.com) Date: Wed Nov 25 01:01:17 2009 Subject: VA - System Imager Message-ID: G'Day ! I am having problems with VA-SystemImager (ver 1.4.0). It seems that I can not get my remote machine to retrieve the kernel and reboot as per step 6 and 7 of the HOWTO. I do not have a floppy or cdrom attached to the machine, so I am using the rsync/updateclient -autoinstall method. The problem is that when I try to run "updateclient -autoinstall -server bhead -c eth0" it crashes because it can not find the modules Getopt::Long, etc. A quick search of the hard drive on the local node confirms it is not a part of the standard Perl-5.005 debian package, but it is located on my head server bhead. I believe the cause of my confusion may be the documentation. As I read step 6, where you prepare the boot media, the example on how to prepare a remote machine's local hd is exactly the same as the instructions for step 7, where the actually transfer of the image takes place. Do you need to "compile" the perl script on the headnode prior to transfering to the remote machine ? If so what is the command to do this ? BTW, all my remote nodes have a base Debian install with basic networking installed. Another question. During the step 5, I did not have systemImager write to the /etc/hosts file. 
All my nodes already have the correct network settings. Is this a problem ? cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010510/f893a1ca/attachment.html From josip at icase.edu Thu May 10 10:11:39 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... References: <20010509224913.C25773@drzyzgula.org> Message-ID: <3AFACBCB.96C4A3FB@icase.edu> We are also looking at the SPECfp2000 numbers like these: Bob Drzyzgula wrote: > > Processor MHz L2 KB SPi2K SPfp2K est core $ > ------------------------- ---- ------ ----- ------ ---------- > Pentium 4 1500 256 536 558 2,100 (D850GB, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 (GA7DX, DDR SDRAM) Looking at the details of the SPEC tests, it seems that the compiler plays a huge role. Both machines ran Windows 2000 SP1. The P4 result used Intel's Fortran 5.0 (ifl), while the Athlon result used ifl for some F77 code and Compaq Visual Fortran 6.5 for other F77 and F90 code. Athlon vs. Pentium 4 benchmarks range all over the place. SPECfp2000 has some CFD-like tests (swim, mgrid, applu, galgel). The 1.7 GHz Pentium 4 beats 1.33 GHz Athlon on F77 code, but loses on F90 code. Per GHz, the F77 P4 "win" factors are 1.25-1.59, while the F90 P4 "loss" factor is 1.38. P4 should shine on codes where data is accessed sequentially with unit stride. Jumping around can lead to very inefficient use of the memory bandwidth. Also, optimization switches can make a huge difference. We'd like to test our own code compiled with our own compilers. The overall 2:1 performance uncertainty is too large to ignore. For memory bandwidth limited code optimized for P4-friendly memory access patterns, P4 should beat Athlon by about 50%. This performance gain matches the fact that PC2100 (DDR SDRAM) memory is currently about 40% cheaper than PC3200 (RDRAM) memory. Unfortunately, we just do not know how many codes fit into that P4-friendly category yet... Sincerely, Josip P.S. This link has results of a CFD benchmark on a variety of machines. They report virtually identical P4 and Athlon performance on the "per GHz" basis: http://www.caselab.okstate.edu/research/benchmark.html -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From lindahl at conservativecomputer.com Thu May 10 10:49:49 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: Intel Fortran compiler (url) In-Reply-To: <3AF2C418.8D619233@pasteur.fr>; from tru@pasteur.fr on Fri, May 04, 2001 at 05:00:40PM +0200 References: <3AF27E98.D4D18783@pasteur.fr> <3AF2C418.8D619233@pasteur.fr> Message-ID: <20010510134949.B1595@wumpus> On Fri, May 04, 2001 at 05:00:40PM +0200, Tru Huynh wrote: > http://www.releasesoftware.com/_intelbetacenteronlinux/cgi-bin/pd.cgi?page=product_info I looked at this compiler, and it's worth nothing that it IS the same compiler version (Fortran 5.0) that Intel used for their SPEC2000 submissions, just under Linux instead of Win2000. Looks like all the P4 optimization and vectorization is there. 
I would *love* to see some comparisons of, say, this compiler and the latest PGI compiler against some real codes on a P4. -- g From lindahl at conservativecomputer.com Thu May 10 10:53:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <3AFACBCB.96C4A3FB@icase.edu>; from josip@icase.edu on Thu, May 10, 2001 at 01:11:39PM -0400 References: <20010509224913.C25773@drzyzgula.org> <3AFACBCB.96C4A3FB@icase.edu> Message-ID: <20010510135330.C1595@wumpus> On Thu, May 10, 2001 at 01:11:39PM -0400, Josip Loncaric wrote: > Per GHz, the F77 P4 "win" factors are 1.25-1.59, while the F90 P4 > "loss" factor is 1.38. What's the point of per GHz comparison? The P4 and the Athlon aren't available at the same GHz, and the wide memory system of the P4 is a large part of its advantage. > We'd like to test our own code compiled with our own compilers. The > overall 2:1 performance uncertainty is too large to ignore. Since both Intel and PGI make their compilers temporarily available for free -- beta-test until September for Intel, trial version from PGI -- you might as well test with the best compilers, instead of the ones you happen to own. -- g From DZhao1 at prius.jnj.com Thu May 10 10:52:51 2001 From: DZhao1 at prius.jnj.com (Zhao, David [PRI]) Date: Wed Nov 25 01:01:17 2009 Subject: PRO & CON of diskless and diskfull cluster Message-ID: <8FB723D60C5AD411969A00508B69860325E3E3@rarusljexs7.prius.jnj.com> Hi there, Is there a summary out there comparing the diskless and diskful Beowulf clusters? Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010510/8577802e/attachment.html From lindahl at conservativecomputer.com Thu May 10 11:28:48 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: Intel Fortran compiler (url) In-Reply-To: <20010510134949.B1595@wumpus>; from lindahl@conservativecomputer.com on Thu, May 10, 2001 at 01:49:49PM -0400 References: <3AF27E98.D4D18783@pasteur.fr> <3AF2C418.8D619233@pasteur.fr> <20010510134949.B1595@wumpus> Message-ID: <20010510142848.A1766@wumpus> On Thu, May 10, 2001 at 01:49:49PM -0400, Greg Lindahl wrote: > I looked at this compiler, and it's worth nothing that it IS the same "worth noting", not "worth nothing". Interesting slip of the tongue, there... -- g From rgb at phy.duke.edu Thu May 10 12:39:25 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:17 2009 Subject: Intel Fortran compiler (url) In-Reply-To: <20010510142848.A1766@wumpus> Message-ID: On Thu, 10 May 2001, Greg Lindahl wrote: > On Thu, May 10, 2001 at 01:49:49PM -0400, Greg Lindahl wrote: > > > I looked at this compiler, and it's worth nothing that it IS the same > > "worth noting", not "worth nothing". Interesting slip of the tongue, > there... I can't resist the double pun. A "digital" tongue indeed. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From josip at icase.edu Thu May 10 12:53:10 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... 
References: <20010509224913.C25773@drzyzgula.org> <3AFACBCB.96C4A3FB@icase.edu> <20010510135330.C1595@wumpus> Message-ID: <3AFAF1A6.1897E342@icase.edu> Greg Lindahl wrote: > > On Thu, May 10, 2001 at 01:11:39PM -0400, Josip Loncaric wrote: > > > Per GHz, the F77 P4 "win" factors are 1.25-1.59, while the F90 P4 > > "loss" factor is 1.38. > > What's the point of per GHz comparison? The P4 and the Athlon aren't > available at the same GHz, and the wide memory system of the P4 is a > large part of its advantage. The point is data compression: I can convey the first order approximation in one number, with minor (<5%) errors. Also, I can estimate the relative performance in a fairly consistent manner and conclude that on SPECfp2000 an X GHz P4 would be roughly equivalent to a 1.1*X GHz Athlon. You do a linear fit near some operating point, then use a linear model to get approximate numbers near that operating point. It works reasonably well. Performance does node scale exactly with GHz, but the individual SPECfp2000 benchmark numbers are not too far from what you'd expec on the GHz basis. Also, "per GHz" comparisons tell you something about a particular architecture. The Athlon FPU is expected to outperform the P3 FPU by 4:3 per GHz, but on SPECfp2000 P4 benchmarks, this effect is *not* obvious. Here are Intel's SPECfp2000 numbers (D850GB motherboard, PC800 RDRAM): P4 linear fit: SPfp2K ~ 374 * [P4 GHz] P4 GHz: SPfp2K: predicted by GHz: 1.3 511 486 (-4.8% error) 1.4 538 524 (-1.7% error) 1.5 558 561 (+0.6% error) 1.7 608 636 (+4.6% error) Here are the Athlon numbers (Gigabyte GA-7DX motherboard, PC 2100 DDR SDRAM): Athlon linear fit: SPfp2K ~ 340 * [Athlon GHz] Athlon GHz: SPfp2K: predicted by GHz: 1.2 417 408 (-2.0% error) 1.33 445 453 (+1.7% error) > > > We'd like to test our own code compiled with our own compilers. The > > overall 2:1 performance uncertainty is too large to ignore. > > Since both Intel and PGI make their compilers temporarily available > for free -- beta-test until September for Intel, trial version from > PGI -- you might as well test with the best compilers, instead of the > ones you happen to own. We own PGI compilers. Intel compiler availability is *not* the problem. Our problem is that we'd like to test hardware we do not yet have in order to determine if we should have it. This is a Catch 22 situation... Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From lindahl at conservativecomputer.com Thu May 10 13:26:54 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <3AFAF1A6.1897E342@icase.edu>; from josip@icase.edu on Thu, May 10, 2001 at 03:53:10PM -0400 References: <20010509224913.C25773@drzyzgula.org> <3AFACBCB.96C4A3FB@icase.edu> <20010510135330.C1595@wumpus> <3AFAF1A6.1897E342@icase.edu> Message-ID: <20010510162654.B1943@wumpus> On Thu, May 10, 2001 at 03:53:10PM -0400, Josip Loncaric wrote: > The point is data compression: I can convey the first order > approximation in one number, with minor (<5%) errors. You can? So when the Athlon comes out with a faster memory system, your answers break, or if the P4 gets a different chipset with a slower memory system, your answer breaks. 
It also leads you to this kind of mistake: > Also, "per GHz" comparisons tell you something > about a particular architecture. The Athlon FPU is expected to > outperform the P3 FPU by 4:3 per GHz, but on SPECfp2000 P4 benchmarks, > this effect is *not* obvious. ... because you just ignored the effect of memory bandwidth, which is growing as a factor in SPECfp over time. Not to mention the effect of the ability of a vectorizing compiler to generate SSE2 instructions on the particular code in SPEC2000fp, etc etc. SPEC2000fp is a fine benchmark, if you look at the absolute results. The minute you start computing derived numbers from that, you're making assumptions and generalizations. If you want to predict the probable SPEC2000fp of a new processor by looking at its clock and the clock of a very similar existing chip, that's probably fine. But comparing to a completely different chip? -- g From edwards at icantbelieveimdoingthis.com Thu May 10 14:37:42 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:17 2009 Subject: Athalon clusters Message-ID: <20010510153742.A13503@icantbelieveimdoingthis.com> I am building an athlon cluster and, I think I have met with initial, trivial success. I had reported that the install had hung at the very end of the installation. On rebooting, however, the system appears to be all there. In addition, I have successfully built the slave nodes so that they show up on the beostatus tool. Now, if one is running athlon cpu's with more than 64M of RAM, one normally has to tell the kernel how much memory there is using an append command in LILO. This worked on the master node. When I look at the beostatus lines for each slave node the memory indicates 52/62M making me very nervous. Here are the short questions. 1. What does the memory line actually erport in beostatus? Is it total memory on the node? 2. How do I propagate the kernel parameter mem=XXXM onto the slave nodes? I have already tried adding to the append line in the confg file on the slave node boot disk. This led to a kernel panic (not a pretty site.) Any insight would be apprecitated. Art Edwards From josip at icase.edu Thu May 10 14:41:30 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... References: <20010509224913.C25773@drzyzgula.org> <3AFACBCB.96C4A3FB@icase.edu> <20010510135330.C1595@wumpus> <3AFAF1A6.1897E342@icase.edu> <20010510162654.B1943@wumpus> Message-ID: <3AFB0B0A.EF75C2C@icase.edu> Greg Lindahl wrote: > > On Thu, May 10, 2001 at 03:53:10PM -0400, Josip Loncaric wrote: > > > The point is data compression: I can convey the first order > > approximation in one number, with minor (<5%) errors. > > You can? So when the Athlon comes out with a faster memory system, > your answers break, or if the P4 gets a different chipset with a > slower memory system, your answer breaks. Sure. But clearly we are talking about the *current* generation of high performance memory systems. There was no implication that this could be extended to future technologies. You are reading way more into this than was actually said or implied. Your point about ignoring memory bandwidth is puzzling. Memory bandwidth is very important to us, I am absolutely not ignoring it, I merely consider it a *constant* for the purposes of this discussion. That constant is defined by the current state of the PC market. This was explicitly clear from my message. 
The variables which *do* concern me in comparing P4/Athlon systems [NOTE: see the chipset/memory specs in my previous message] are: (1) Memory access patterns of our applications (2) Compiler optimization capabilities for different architectures (this includes SSE2 and prefetch instructions) In summary, it is bad policy to consider everything as a variable at the same time. It is best to hold most things fixed, then compare a few variables. Also, one should never confuse a local model with a global theory of everything. BTW, it is interesting to note when the nature of a local model changes. This can tell you when to switch your attention to a new limiting variable. Sincerely, Josip P.S. As we all know, frequency of improvements in memory and chipset technology is not what it should be. CPU speeds are changing much more rapidly than memory speeds. This is why building a balanced machine has been so difficult lately, and why it makes sense to consider the current state-of-the-art in memory technology a constant over our next procurement cycle. -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From Eugene.Leitl at lrz.uni-muenchen.de Fri May 11 01:39:47 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:17 2009 Subject: CCL:FYI: Current SPECfp landscape... (fwd) Message-ID: ---------- Forwarded message ---------- Date: Thu, 10 May 2001 15:35:38 +0100 From: jmmckel@attglobal.net To: Eugene Leitl Cc: chemistry@ccl.net Subject: Re: CCL:FYI: Current SPECfp landscape... (fwd) FYI: An Intel/P4/1.7GHZ with 256 MByte Rambus memory, high quality fan, tested is $799 [case and P4 power supply add $99] can be had from a quality source, WWW.JNCS.COM. I've been using them for years. John McKelvey Eugene Leitl wrote: > ---------- Forwarded message ---------- > Date: Thu, 10 May 2001 10:04:01 -0400 > From: Bob Drzyzgula > To: beowulf@beowulf.org > Subject: Re: FYI: Current SPECfp landscape... > > Wes Bauske was kind enough to point out that > I could do the Pentium 4 system a bit more > cheaply. I'd missed the fact that Pentium 4 > boards are available with 4 RIMM slots, so one > can use 256MB RIMMs, which are available at about > half the price per MB over the 512MB modules. > Thus, it should be possible to do a 1.5GHz Pentium > 4 system for a core (CPU+MB+Memory) cost of around > $1300 or so. A 1.7GHz system would cost around > $100 more than that. Updated tables below. Also, > I added SPECfp2000/K$. Fascinating how the Pentium 4 > comes out second in each one of these tables... > > Thanks, Wes. 
> > --Bob > > Sorted in declining order of SPECfp2000: > > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > Sorted in declining order of SPECint2000: > > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > Sorted in order of increasing cost: > > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > > Sorted in declining order of SPECfp2000/K$ > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -= This is automatically added to each message by mailing script =- > CHEMISTRY@ccl.net -- To Everybody | CHEMISTRY-REQUEST@ccl.net -- To Admins > MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH > CHEMISTRY-SEARCH@ccl.net -- archive search | Gopher: gopher.ccl.net 70 > Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl@osc.edu From Yann.Costes at cdc.u-cergy.fr Fri May 11 02:13:54 2001 From: Yann.Costes at cdc.u-cergy.fr (Yann COSTES) Date: Wed Nov 25 01:01:17 2009 Subject: checkpointing PBS jobs under linux Message-ID: <3AFBAD52.97E618AA@cdc.u-cergy.fr> Hello, I'd like to unable jobs checkpointing under my Linux beowulf cluster wich uses the batch system PBS (this is OpenPBS_V2.3.12) I have made some trials 
with 2 different checkpointing softwares : epckpt Beta under Linux kernel 2.4.2 (http://www.cs.rutgers.edu/~edpin/epckpt) and after with the software crak as a module for the Linux kernel 2.2.19 (http://www.cs.columbia.edu/~huaz/english/research/crak.htm) Even when I unable checkpointing on a PBS executing queue (with the command "set queue long checkpoint_min = 2" under qmgr), PBS doesn't seem to checkpoint any submitted job. Does anyone know if it's possible to checkpoint PBS batch jobs under linux and if so how we can do it ? Thanks a lot for your help. -- Yann Costes Service Informatique Recherche - Universit? de Cergy-Pontoise Rue d'Eragny - Neuville sur Oise - 95031 Cergy-Pontoise Cedex Tel. 01 34 25 69 56 - Fax. 01 34 25 70 04 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010511/54de0788/attachment.html From bob at drzyzgula.org Fri May 11 03:22:40 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:17 2009 Subject: CCL:FYI: Current SPECfp landscape... (fwd) In-Reply-To: References: Message-ID: <20010511062240.E25739@drzyzgula.org> Eugene, Thanks. This well confirms the price in my table; A 256MB RDRAM RIMM costs around $200. To add three more of these to the $800 256MB configuration at JNCS (all the pricing was based on 1GB configurations) would raise the total price to $1,400, which is what I had used for a 1.7GHz P4 system. --Bob On Fri, May 11, 2001 at 10:39:47AM +0200, Eugene Leitl wrote: > ---------- Forwarded message ---------- > Date: Thu, 10 May 2001 15:35:38 +0100 > From: jmmckel@attglobal.net > To: Eugene Leitl > Cc: chemistry@ccl.net > Subject: Re: CCL:FYI: Current SPECfp landscape... (fwd) > > FYI: > > An Intel/P4/1.7GHZ with 256 MByte Rambus memory, high quality fan, tested is $799 [case and P4 power supply add > $99] can be had from a quality source, WWW.JNCS.COM. I've been using them for years. > > John McKelvey > > Eugene Leitl wrote: > > > ---------- Forwarded message ---------- > > Date: Thu, 10 May 2001 10:04:01 -0400 > > From: Bob Drzyzgula > > To: beowulf@beowulf.org > > Subject: Re: FYI: Current SPECfp landscape... > > > > Wes Bauske was kind enough to point out that > > I could do the Pentium 4 system a bit more > > cheaply. I'd missed the fact that Pentium 4 > > boards are available with 4 RIMM slots, so one > > can use 256MB RIMMs, which are available at about > > half the price per MB over the 512MB modules. > > Thus, it should be possible to do a 1.5GHz Pentium > > 4 system for a core (CPU+MB+Memory) cost of around > > $1300 or so. A 1.7GHz system would cost around > > $100 more than that. Updated tables below. Also, > > I added SPECfp2000/K$. Fascinating how the Pentium 4 > > comes out second in each one of these tables... > > > > Thanks, Wes. 
> > > > --Bob > > > > Sorted in declining order of SPECfp2000: > > > > Sfp/ > > Processor MHz L2 Si Sfp core $ K$ Notes > > ------------------------- ---- ----- --- --- ------ --- ---------- > > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > > > Sorted in declining order of SPECint2000: > > > > Sfp/ > > Processor MHz L2 Si Sfp core $ K$ Notes > > ------------------------- ---- ----- --- --- ------ --- ---------- > > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > > > Sorted in order of increasing cost: > > > > Sfp/ > > Processor MHz L2 Si Sfp core $ K$ Notes > > ------------------------- ---- ----- --- --- ------ --- ---------- > > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > > > > Sorted in declining order of SPECfp2000/K$ > > Sfp/ > > Processor MHz L2 Si Sfp core $ K$ Notes > > ------------------------- ---- ----- --- --- ------ --- ---------- > > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -= This is automatically added to each message by mailing script =- > > CHEMISTRY@ccl.net -- To Everybody | CHEMISTRY-REQUEST@ccl.net -- To Admins > > MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH > > CHEMISTRY-SEARCH@ccl.net -- archive search | Gopher: gopher.ccl.net 70 > > Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl@osc.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From orlandorocha at digi.com.br Fri May 11 
05:52:53 2001 From: orlandorocha at digi.com.br (Orlando Donato Rocha Filho) Date: Wed Nov 25 01:01:17 2009 Subject: Channel bonding, I have some questions? Message-ID: Hi all, Frist off all, We have a cluster with a node master and a node slave; We configure a channel bond in the node master and in the node slave. We have two switches and two ethernets cards in each host (MASTER and SLAVE) The IPADDR (switch A): 192.168.0.254 The IPADDR (switch B): 192.168.0.253 The IPADDR (Node Master): 192.168.0.1 The IPADDR (Node Slave): 192.168.0.2 ---------- ---------- |switch A| |switch B| ---------- ---------- | | | | | | | | ________| |_______ | | | | | | | | | | ------------- ------------ | | |Node Master| |Node Slave| | | ------------- ------------ | | | | | | | |_________|__| |_________________________| OS: Red Hat 7.1 Kernel: 2.4.2-2 Look at the files following: (NODE SLAVE) ifcfg-bond0: device:bond0 userctl:no onboot:yes bootproto:no broadcast:192.168.0.255 network:192.168.0.0 netmask:255.255.255.0 ipaddr:192.168.0.2 ifcfg-eth0: device:eth0 master:bond0 slave:yes usrectl:no bootproto:yes ifcfg-eth1: device:eth1 master:bond0 slave:yes usrectl:no bootproto:yes When I do as follow: node A: #ifconfig (NODE SLAVE) bond0 Link encap:Ethernet HWaddr 00:80:AD:74:74:87 inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:15 errors:0 dropped:0 overruns:0 frame:0 TX packets:82 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 eth0 Link encap:Ethernet HWaddr 00:80:AD:74:74:87 inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:13 errors:0 dropped:0 overruns:0 frame:0 TX packets:41 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 Interrupt:11 Base address:0xe800 eth1 Link encap:Ethernet HWaddr 00:80:AD:74:74:87 inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:2 errors:0 dropped:0 overruns:0 frame:0 TX packets:41 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 Interrupt:15 Base address:0xec00 At (NODE MASTER) I did the same configuration, but the IPADDR is 192.168.0.1. When I do as follow: (NODE SLAVE) #ping 192.168.0.1 ping 192.168.0.1 (192.168.0.1) from 192.168.0.2 56(84) of bytes from 192.168.0.2:Destination Host Unreachable from 192.168.0.2:Destination Host Unreachable from 192.168.0.2:Destination Host Unreachable from 192.168.0.2:Destination Host Unreachable from 192.168.0.2:Destination Host Unreachable When I do as follow: (NODE MASTER) #ping 192.168.0.2 ping 192.168.0.2 (192.168.0.2) from 192.168.0.1 56(84) of bytes from 192.168.0.1:Destination Host Unreachable from 192.168.0.1:Destination Host Unreachable from 192.168.0.1:Destination Host Unreachable from 192.168.0.1:Destination Host Unreachable from 192.168.0.1:Destination Host Unreachable #ping 192.168.0.254 PING 192.168.0.254 (192.168.0.254) from 192.168.0.1 : 56(84) bytes of data. Warning: time of day goes back, taking countermeasures. 64 bytes from 192.168.0.254: icmp_seq=0 ttl=255 time=3.155 msec 64 bytes from 192.168.0.254: icmp_seq=3 ttl=255 time=1.724 msec 64 bytes from 192.168.0.254: icmp_seq=5 ttl=255 time=1.700 msec 64 bytes from 192.168.0.254: icmp_seq=7 ttl=255 time=1.740 msec 64 bytes from 192.168.0.254: icmp_seq=9 ttl=255 time=1.740 msec #ping 192.168.0.253 PING 192.168.0.253 (192.168.0.253) from 192.168.0.1 : 56(84) bytes of data. 
Warning: time of day goes back, taking countermeasures. 64 bytes from 192.168.0.253: icmp_seq=0 ttl=255 time=3.155 msec 64 bytes from 192.168.0.253: icmp_seq=1 ttl=255 time=1.724 msec 64 bytes from 192.168.0.253: icmp_seq=3 ttl=255 time=1.700 msec 64 bytes from 192.168.0.253: icmp_seq=5 ttl=255 time=1.740 msec 64 bytes from 192.168.0.253: icmp_seq=7 ttl=255 time=1.740 msec 64 bytes from 192.168.0.253: icmp_seq=9 ttl=255 time=1.740 msec When I do as follow: (NODE SLAVE) #route -n Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1 127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo 0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 bond0 Which was the kernell proper to work with the channel bonding? Does anybody have just the same problem? Does anybody know if there is some problems with kernel 2.4.2-2 and channel bonding her? If the configuration was right what can we do it? I thank who could help me. Thanks a lot, Orlando Rocha. Prof. Sistemas Digitais - CEFET-MA/BR Administração de Redes com Windows NT Server e Linux. --------------------------------------------- Webmail Diginet - Internet é Diginet. http://www.digi.com.br/ From cblack at eragen.com Fri May 11 07:35:38 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:17 2009 Subject: bandwidth usage monitoring tools Message-ID: <20010511103538.D2993@getafix.EraGen.com> I am looking for tools that will allow me to monitor the amount of bandwidth used on different links in our cluster. I am already using NetPIPE to measure available bandwidth, but now I want to measure bandwidth usage during different jobs and conditions. I was thinking about using MTRG, but this seems like it might be overkill with all the SNMP, graphing, etc. Are there any other bandwidth usage monitors that anyone could suggest? Thanks, Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010511/364627d1/attachment.bin From chrisa at ASPATECH.COM.BR Fri May 11 07:39:31 2001 From: chrisa at ASPATECH.COM.BR (Chris Richard Adams) Date: Wed Nov 25 01:01:17 2009 Subject: NEIGHBOR TABLE OVERFLOW!! Message-ID: We're using the latest 27BZ-7 release on CD with kernel 2.2-17.33. We have a Master and Slave node, both with a new 3C905C-TXM card. We are using the latest 3c59x.c v0.99Rb driver made by Donald on 8/8/2000. After the slave node boots, it shows: > Connection 192.168.1.1:1555 > Neighbour table Overflow > Neighbour table Overflow > Neighbour table Overflow > Short read: Got 0 bytes, expected 8 What might I want to test? Thanks, Chris From edwards at icantbelieveimdoingthis.com Fri May 11 06:49:46 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Wed Nov 25 01:01:17 2009 Subject: interesting Athlon/P4 discussion from FreeBSD-Q-l References: Message-ID: <3AFBEDFA.20503@icantbelieveimdoingthis.com> Mark Hahn wrote: >> Cant vouch for correctness, but seems to have some explanations/info that >> werent mentioned here. Feel free to rebut the content of course. > > > the P4 has an awesome combination of hardware prefetcher, > fast FSB, and dram that keeps up with it. for code that > needs bandwidth, this is very attractive. 
and it's dramatically > faster than anything else in the ia32 world: 1.6 GB/s versus > at most around .8 GB/s for even PC2100 DDR systems (at least > so far - I'm hopeful that DDR can manage around 1.2 GB/s when > tuned, and if the next-gen Athlon contains hardware prefetch.) > > but it's also true that most code, even a lot of computational code, > is not primarily dram-bandwidth-bound. the P4 is not exceptional > when running real code in-cache; this is why on most benchmarks > other than Stream, recent Athlons beat P4's quite handily. > > and that's why AMD is having such an awsome time in the market now, > and why Intel is cutting prices so dramatically on the P4. > > regards, mark hahn. > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > Is anyone doing anectdotal benchmarks with real applications? We are doing DFT calculations using a local basis code that is highly optimized on serial machines. I am working on pentium III's, athlon's, and alpha machines. I find that my 600 MHz athlon actually beats a 933 MHz Pentium III. Also, both of these PC platforms are competitive with the alpha chips. I'm much more interested in benchmarks on, say, Gaussian 98, GAMESS, and other codes. Any Athlon/P4 comparisons would be very interesting. Art Edwards From joeyraheb at usa.net Fri May 11 08:21:43 2001 From: joeyraheb at usa.net (Joey Raheb) Date: Wed Nov 25 01:01:17 2009 Subject: rdate or xntp Message-ID: <20010511152143.172.qmail@awcst401.netaddress.usa.net> Hello everyone, I was wondering about date updating on a cluster. Does anybody do it and how? I tried rdate and this did not work for some reason, it said that I could not connect to the host??? Also, I tried ntpdate and when for example I type: ntpdate ns1.uwo.ca, it outputs the difference between the clock, but it does not update my clock??? If anyone can explain to me how to use one of these programs I would appreciate the help. Thanks, Joey ____________________________________________________________________ Get free email and a permanent address at http://www.amexmail.com/?A=1 From JParker at coinstar.com Fri May 11 09:03:40 2001 From: JParker at coinstar.com (JParker@coinstar.com) Date: Wed Nov 25 01:01:17 2009 Subject: rdate or xntp Message-ID: G'Day ! I used to do administration on a e-commerce web server. We used plain old vanilla NTP, to keep all the servers date/time correct. If I remeber correctly my main sever was stratum 3. Any way the secret is in your /etc/ntp.conf file. This is very well documented on the ntp web site and man pages ( http://www.eecis.udel.edu/~mills/ntp/ ). I recommend that if you have a permanent connection to the internet use at least 3 ntp servers in your config file. If you are not connected constantly, there are programs (ntpupdate ?) to get a quick timecheck to update your clocks. My current set-up uses a driver pointed to my local servers hardware clock, not accurate, but all my nodes have a consistent time. All your nodes should point to your ntp server and not the internet. A quick serch on the web will show you where to find clients for NT, if necessary. Let me know if you have any specifc questions. cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! 
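For concreteness, a minimal /etc/ntp.conf along the lines described above -- the server names here are placeholders, substitute your own upstream (stratum 2 or 3) servers:

    # /etc/ntp.conf -- minimal sketch
    server ntp1.example.edu
    server ntp2.example.edu
    server ntp3.example.edu

    # optional: fall back to the local clock at a deliberately poor stratum,
    # so an isolated cluster at least stays consistent with itself
    server 127.127.1.0
    fudge  127.127.1.0 stratum 10

    driftfile /etc/ntp/drift

With a permanent connection the three real servers do the work; without one, the local-clock driver alone gives the "consistent but not necessarily accurate" setup described above.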
Joey Raheb Sent by: beowulf-admin@beowulf.org 05/11/01 08:21 AM To: beowulf@beowulf.org cc: Subject: rdate or xntp Hello everyone, I was wondering about date updating on a cluster. Does anybody do it and how? I tried rdate and this did not work for some reason, it said that I could not connect to the host??? Also, I tried ntpdate and when for example I type: ntpdate ns1.uwo.ca, it outputs the difference between the clock, but it does not update my clock??? If anyone can explain to me how to use one of these programs I would appreciate the help. Thanks, Joey ____________________________________________________________________ Get free email and a permanent address at http://www.amexmail.com/?A=1 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010511/3845f2e1/attachment.html From rgb at phy.duke.edu Fri May 11 09:27:32 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:17 2009 Subject: rdate or xntp In-Reply-To: <20010511152143.172.qmail@awcst401.netaddress.usa.net> Message-ID: On 11 May 2001, Joey Raheb wrote: > Hello everyone, > > I was wondering about date updating on a cluster. Does anybody do it and how? > I tried rdate and this did not work for some reason, it said that I could not > connect to the host??? Also, I tried ntpdate and when for example I type: > ntpdate ns1.uwo.ca, it outputs the difference between the clock, but it does > not update my clock??? If anyone can explain to me how to use one of these > programs I would appreciate the help. You don't mention what distro you use, but Red Hat ships with an (x)ntp RPM (e.g. xntp3 in RH 6.2, ntp in RH 7.1). They are preconfigured to manage your clock but you have to turn them on at the appropriate runlevel, e.g. 3-5, with chkconfig. You also need to direct them (in /etc/ntp.conf) to a suitable ntp server, which can be your master server node or whereever you like -- ntp is arranged in layers (or "strata") served by servers served in turn by the master clock sites on the net, which are themselves driven by superaccurate atomic clocks. You may need to ask your main networking guys or upstream providers who you should use as a stratum 2 or 3 server, direct your cluster server to that, and it as a stratum 3 or 4 server to your cluster nodes. When running, ntp will typically keep your hosts unbelievably accurately slaved to Truly Accurate Time -- the protocol actually measures and adjusts for things like the networking time delays between its queries and the responses. In principle one can acheive millisecond accuracy on all hosts. A tool like procstatd can let you monitor local clock settings on your cluster hosts to be sure that they are all being updated correctly -- ntp does have problems if a host is down a long time or its hardware clock is incorrectly set, as it barfs if the time to be adjusted is too long (it otherwise works very smoothly and incrementally, but it doesn't do hourlong adjustments or daylight savings time sized jumps). 
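A minimal command sketch of that setup on a Red Hat style node (package and service names differ between releases -- xntp3 on 6.2, ntp/ntpd on 7.1 -- and ntp1.example.edu is a placeholder for your upstream server):

    # rough-set the clock first; ntpd will not step a clock that is far off
    date -s "14 May 2001 12:00:00"
    ntpdate ntp1.example.edu            # one-shot sync against the upstream server

    # point /etc/ntp.conf at your server, then enable the daemon at runlevels 3-5
    chkconfig --level 345 xntpd on      # "ntpd" on RH 7.1
    /etc/rc.d/init.d/xntpd start

    # once it has settled, save the corrected time to the BIOS clock
    /sbin/clock -w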
Thus you do have to get all your hosts set APPROXIMATELY accurately to start with (using date -s), run ntp for a few hours to get them all truly synchronized with true time, use /sbin/clock -w to write the >>soft<< date/time to the BIOS clock (so that the clock doesn't get reset further than ntp can correct on a reboot -- this is the thing that typically screws up when times change in the spring and the fall) and then you should be able to forget time altogether except at the spring/fall changeover. You could probably automate even that. In an RPM-based cluster it is easy enough to either repackage the RPM's with your own /etc/ntp.conf or write a short post-install script that installs your /etc/ntp.conf, does all the messing around with date -s required (if any), and so forth. Hope this helps... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From lindahl at conservativecomputer.com Fri May 11 10:40:15 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: [PBS-USERS] checkpointing PBS jobs under linux In-Reply-To: <3AFBAD52.97E618AA@cdc.u-cergy.fr>; from Yann.Costes@cdc.u-cergy.fr on Fri, May 11, 2001 at 09:13:54AM +0000 References: <3AFBAD52.97E618AA@cdc.u-cergy.fr> Message-ID: <20010511134015.D2026@wumpus> On Fri, May 11, 2001 at 09:13:54AM +0000, Yann COSTES wrote: > Even when I unable checkpointing on a PBS executing queue (with the > command "set queue long checkpoint_min = 2" under qmgr), PBS doesn't > seem to checkpoint any submitted job. You have to tell PBS to compile in the checkpointing code. That's turned off for Linux by default. You'll also have to write a little code to tell the process to actually checkpoint. And that's actually non-trivial with the packages you mention; if you just checkpoint the process PBS knows about, that's the shell that's running your batch script... -- g From lindahl at conservativecomputer.com Fri May 11 10:45:13 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <3AFB0B0A.EF75C2C@icase.edu>; from josip@icase.edu on Thu, May 10, 2001 at 05:41:30PM -0400 References: <20010509224913.C25773@drzyzgula.org> <3AFACBCB.96C4A3FB@icase.edu> <20010510135330.C1595@wumpus> <3AFAF1A6.1897E342@icase.edu> <20010510162654.B1943@wumpus> <3AFB0B0A.EF75C2C@icase.edu> Message-ID: <20010511134513.E2026@wumpus> On Thu, May 10, 2001 at 05:41:30PM -0400, Josip Loncaric wrote: > Sure. But clearly we are talking about the *current* generation of high > performance memory systems. There was no implication that this could be > extended to future technologies. You are reading way more into this > than was actually said or implied. Ah. Well, I did read that you compared the performance/Ghz of AMD to Intel, and that is clearly a bad idea. > Your point about ignoring memory bandwidth is puzzling. Memory > bandwidth is very important to us, I am absolutely not ignoring it, When you were looking for the 4:3 ratio of FPU performance, you neglected the memory bandwidth. Since the memory bandwidth of the things you were comparing were vastly different (3:1), and you know that SPEC2000fp has an important memory bandwidth component, you wouldn't expect the 4:3 ratio of FPU speeds to be present as exactly 4:3. Or anything close to 4:3. 
So, again, the minute you crossed the line of comparing within a single chip line to comparing two different chip lines, you methodology became bad. That's my objection. -- greg From gmkurtzer at lbl.gov Fri May 11 10:57:26 2001 From: gmkurtzer at lbl.gov (Greg Kurtzer) Date: Wed Nov 25 01:01:17 2009 Subject: SCYLD-- mounting boot.img in loopback Message-ID: <20010511105726.F999@lbl.gov> Has anyone been able to sucessfully mount the /var/beowulf/boot.img using loopback? What kind of filesystem are they using? Greg -- Greg M. Kurtzer, LCSE Lawrence Berkeley National Laboratory 1 Cyclotron Road #90-1116, Berkeley, CA 94720 ------------------------------------------------- Office 510.495.2307 Pager 510.448.4540 Cell 510.703.1286 ================================================= LINUX,... The choice of a GNU generation! From rsimac at thermawave.com Fri May 11 11:56:40 2001 From: rsimac at thermawave.com (Rob Simac) Date: Wed Nov 25 01:01:17 2009 Subject: How many Gflops? Message-ID: <00b801c0da4c$1ff1e4d0$1a64010a@thermawave.com> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: Rob Simac.vcf Type: text/x-vcard Size: 390 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010511/9025019d/RobSimac.vcf From newt at scyld.com Fri May 11 12:08:43 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:17 2009 Subject: SCYLD-- mounting boot.img in loopback In-Reply-To: <20010511105726.F999@lbl.gov> Message-ID: > Has anyone been able to sucessfully mount the /var/beowulf/boot.img using > loopback? What kind of filesystem are they using? boot.img is not a filesystem image. It is a special format used by monte that contains a kernel, initrd, and command line arguments for the phase 2 kernel. If you want to generate custom boot images, I suggest you use: `beoboot -2 -i -o /tmp/foo` which will generate separate kernel,initrd images to /tmp/foo,/tmp/foo.initrd. You may then play with these to your hearts content. This is what, for instance, you would do if you wanted images for use with PXE. If you merely want to alter the contents of the boot.img without needing split images, I suggest that you investigate the options to the 'beoboot' tool provided with Scyld. This may provide you with what you seek. Regards, Dan Ridge Scyld Computing Corporation From Eugene.Leitl at lrz.uni-muenchen.de Fri May 11 14:15:09 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de) Date: Wed Nov 25 01:01:17 2009 Subject: [Fwd: CCL:HPC for Genomics & Proteomics 1st Call] Message-ID: <3AFC565D.A363EE14@lrz.uni-muenchen.de> -------- Original Message -------- From: "Wendy Hori" Subject: CCL:HPC for Genomics & Proteomics 1st Call To: chemistry@ccl.net FIRST ANNOUNCEMENT AND CALL FOR PAPERS Cambridge Healthtech Institute?s High Performance Computing Strategic Applications for Genomics & Proteomics September 20-21, 2001 Wyndham San Diego at Emerald Plaza Hotel San Diego, California The success of the drug discovery process is now directly related to a company?s computation capabilities. Results and research projects that were unimaginable a few years ago are accessible with today?s super computers. Industry sources expects the IT life science market to explode to more than $9 billion by 2003. Major IT companies have made significant investments and formed partnerships, with IBM and Compaq each committing $100 million. 
Data overload from automation and robust database technology, means that companies have an immense amount of data on both drug targets and lead compounds. The downside is that these companies are not equipped in infrastructure or organization to efficiently take full advantage of this gold mine. There is also the question of staff: Do we have the right people? How do we use our existing in-house bioinformatics team? Should we outsource or build our own? What?s a fair contract for services, profit-sharing and leasing? How many teraflops is enough? This meeting will seek to cover these issues for biotech, pharma , software and hardware engineers, application specialists and anyone who envisions the marriage of supercomputers and the life sciences. Other Topics to be Covered: Compute Farms Distributed Computing Clustering Beowulf Architecture Parallel Processing Integrating Applications specific to Life Sciences Data-compilation Automation Achieving Better Performance with Database Work through Shared Memory Systems Affordable Supercomputing: Supercomputers vs desktop compute farms Novel Integrated Supercomputing Solutions to Help Scientists concentrate on domain tasks, not on Computer Science. Case Studies Interoperability of Databases If you would like to submit a proposal to give a presentation or display a poster at this meeting please send us, by fax or email, a title and brief 3-5 sentence summary of a proposed topic on your recent work in the area of High Performance Computing . The deadline for submission is May 11, 2001. All proposals are subject to review by the Scientific Advisory Committee to ensure the overall quality of the conference program. For more information, please contact Wendy Hori at: Cambridge Healthtech Institute, 1037 Chestnut Street, Newton Upper Falls, MA 02464 Tel: 617 630-1382 * Fax: 617 630-1325 * Email: whori@healthtech.com For sponsorship and exhibit information, please contact Jim MacNeil. Tel: 617-630-1341, Email: jmacneil@healthtech.com To register on-line, please visit our web site: www.healthtech.com Wendy Hori Conference Director Cambridge Healthtech Institute 1037 Chestnut St. Newton Upper Falls, MA 02464 Phone: 617-630-1382 Fax: 617-630-1325 Email: whori@healthtech.com Check out our websites! www.healthtech.com www.beyondgenome.com www.httexpo.com From rpasupathy at hotmail.com Fri May 11 13:42:00 2001 From: rpasupathy at hotmail.com (Raghubhushan Pasupathy) Date: Wed Nov 25 01:01:17 2009 Subject: KVM Switch Message-ID: Folks, I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf Cluster. Can anyone give me some directions on this since I am completely lost. What specs, brand etc. do you suggest? Raghu _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com From josip at icase.edu Fri May 11 13:49:28 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... References: <20010509224913.C25773@drzyzgula.org> <3AFACBCB.96C4A3FB@icase.edu> <20010510135330.C1595@wumpus> <3AFAF1A6.1897E342@icase.edu> <20010510162654.B1943@wumpus> <3AFB0B0A.EF75C2C@icase.edu> <20010511134513.E2026@wumpus> Message-ID: <3AFC5058.EE58F75C@icase.edu> Greg Lindahl wrote: > > That's my objection. We can agree to disagree, and still learn in the process. My point is that P4/Athlon chips come labeled with GHz ratings which correlate with both price and performance. 
I'm not interested in a treatise, just a brief comparison between the two. What I tried to convey is that the +/- 50% variability among applications and compilers dwarfs the 10% "per GHz" P4/Athlon average difference. Is that so hard to accept? If you'd care to contribute by benchmarking applications and compilers, I remain interested. If you'd like to learn more about using surrogate models in design optimization, we should continue this discussion off line. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From hahn at coffee.psychology.mcmaster.ca Fri May 11 13:51:12 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:17 2009 Subject: How many Gflops? In-Reply-To: <00b801c0da4c$1ff1e4d0$1a64010a@thermawave.com> Message-ID: > I would like to find out if anyone knows how many Gflops at Athlon 1.3Ghz CPU can perform at peak. 3 fully pipelined FP units, not counting SIMD, so 4 GFlips. har har... From davidgrant at mediaone.net Fri May 11 13:54:08 2001 From: davidgrant at mediaone.net (David Grant) Date: Wed Nov 25 01:01:17 2009 Subject: KVM Switch References: Message-ID: <00a101c0da5c$86917560$954f1e42@ne.mediaone.net> I've had much sucess with Raritan Gear... http://www.raritan.com David A. Grant, V.P. Cluster Technologies GSH Intelligent Integrated Systems 95 Fairmount St. Fitchburg Ma 01420 Phone 603.898.9717 Fax 603.898.9719 Email: davidg@gshiis.com Web: www.gshiis.com "Providing High Performance Computing Solutions for Over a Decade" ============================== ----- Original Message ----- From: "Raghubhushan Pasupathy" To: Sent: Friday, May 11, 2001 4:42 PM Subject: KVM Switch > Folks, > > I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf > Cluster. Can anyone give me some directions on this since I am completely > lost. What specs, brand etc. do you suggest? > > Raghu > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Fri May 11 14:07:32 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:17 2009 Subject: How many Gflops? In-Reply-To: <00b801c0da4c$1ff1e4d0$1a64010a@thermawave.com> Message-ID: On Fri, 11 May 2001, Rob Simac wrote: > I would like to find out if anyone knows how many Gflops at Athlon > 1.3Ghz CPU can perform at peak. There is the perpetual question of "what's a gigaflop" that makes this question ambgiguous if not meaningless. However, I'll give you at least one answer and you can judge how meaningful it really is for you. In L2 I've measured about 270-275 peak MFLOPS (double precision) with cpu-rate (http://www.phy.duke.edu/brahma) (which averages the rate at which addition, subtraction, multiplication and division occur, where division is generally very slow and a rate limiting factor) on a 1.2 GHz Tbird Athlon. Extrapolating (as is pretty reasonable to do in this case, in cache) to a 1.33 GHz Tbird one might get 300 MFLOPS (or only 0.3 of a GFLOPS -- not even ONE GFLOPS). 
However, as one increases the size of the memory vectors one operates on (running out of main memory instead of cache) the rate drops off to about 115 MFLOPS where at least part of that good a performance (and it really is quite good, comparatively -- only Alphas benchmark faster out there) is due to the use of DDR, as in this regime floating point is limited by streaming large memory access speed (so stream MFLOPS becomes a viable measure of floating point speed if you prefer them to cpu-rate). This is not peak, though. The cpu-rate numbers aren't peak either. It is quite possible for aggressive optimization, different compiler choices, hand-coded assembler, and perhaps the use of e.g. prefetch to improve them, and then there are the manufacturer's quoted theoretical peak "maximum FLOPS" which I've never seen or even heard of anybody who has seen but which might exist. cpu-rate also always involves SOME sort of vector addressing -- it doesn't just multiply four static variables a gazillion times and evaluate the rate, so it arguably isn't even close to a register-to-register peak rate without any need to access memory at all. However, the cpu-rate numbers are based on straightforward compiled code and are at least MAYBE relevant to certain common operations in core loops. Then there are LINPACK MFLOPS and probably others. MFLOPS is really a pretty meaningless measure, especially given that "peak" MFLOPS will seriously increase if the operation(s) in question is just addition and/or multiplication (which are often heavily optimized in the chip design). As an example of another trap, I've learned the hard way that many vendors (Intel, for example) optimize division by integers that are a power of two so that it is done by a bit shift instead of a full floating point division algorithm -- a measure of "FLOPS" based on (floating point!) multiplication or division by numbers that happen to be integers can be skewed by more than a factor of 2 up. Are these "peak" FLOPS? Or just absurdly unlikely accidents in most real code? A more useful way of viewing and using measures like MFLOPS with all its many possible definitions is comparatively. The fact that an Athlon 1200 Tbird with DDR gets 270 or so peak double precision MFLOPS on cpu-rate is really pretty irrelevant unless your application EXACTLY resembles cpu-rate in its main core loop. However, the fact that it gets 270 peak while a 933 MHz PIII with ordinary PC133 gets only a bit more than 100 peak while an Athlon 800 MHz Tbird with PC133 gets perhaps 177 peak and a lowly 466 MHz Celron gets about 50 peak is possibly relevant. In both cases the peak scales nearly perfectly with CPU clock WITHIN families (Athlon vs P6-family) which gives us a certain amount of warm fuzziness -- the benchmark is insensitive to the (>>very<< different) main memory speeds, as it should be in this range (for vectors maybe 40-80K in length that fit easily in all the L2 caches). It also shows that for code of this type, the Athlon blows the pants off of the P6. HOWEVER, other code that I run shows the Athlon slightly underperforming equivalent clock compared to the P6 family. Then there are the very different and not particularly CPU clock-speed proportional results that hold when the vectors are much bigger than L2. Then there is the fact that cache sizes differ. Then there are latency dominated (instead of streaming vector memory dominated) results to consider. Your mileage can and almost certainly will vary. 
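To make that concrete, here is a minimal C sketch of an in-cache rate loop of this general sort; it is not cpu-rate itself, and the array size, repeat count and operation mix are arbitrary illustrative choices:

    /* minimal in-cache flop-rate sketch -- illustrative only, not cpu-rate */
    #include <stdio.h>
    #include <sys/time.h>

    #define N    8192       /* 8192 doubles = 64 KB, fits in a 256 KB L2 */
    #define REPS 10000

    int main(void)
    {
        static double a[N], b[N], c[N];
        struct timeval t0, t1;
        double secs, mflops;
        int i, r;

        for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        gettimeofday(&t0, NULL);
        for (r = 0; r < REPS; r++)
            for (i = 0; i < N; i++)
                c[i] = a[i] + 0.5 * b[i];   /* 2 flops per element */
        gettimeofday(&t1, NULL);

        secs   = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1.0e-6;
        mflops = 2.0 * N * (double) REPS / secs / 1.0e6;

        /* print c[0] as well so the compiler cannot discard the loop */
        printf("%.1f MFLOPS (add+mul, in cache), c[0] = %g\n", mflops, c[0]);
        return 0;
    }

Changing N walks the same curve described above: small N stays in L1/L2 and looks fast, large N falls out to main memory and the "MFLOPS" number drops accordingly.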
Aside from this sort of VERY crude rough comparison, the only really useful purpose for the FLOPS rating of a system (any of them!) is to put it into a grant proposal or bandy it around to impress the more ignorant and impressionable of your friends. Otherwise one should seek to prototype and benchmark your actual application, or hope that your code nearly exactly resembles lmbench, or LINPACK, or stream, or cpu-rate, or any of the various components of SPEC. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From alvin at Mail.Linux-Consulting.com Fri May 11 14:29:32 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:17 2009 Subject: bandwidth usage monitoring tools In-Reply-To: <20010511103538.D2993@getafix.EraGen.com> Message-ID: hi chris here a bigger list of options for ethernet monitoring http://www.linux-sec.net/eth.gwif.html have fun alvin http://www.Linux-1U.net On Fri, 11 May 2001, Chris Black wrote: > I am looking for tools that will allow me to monitor the amount of > bandwidth used on different links in our cluster. I am already using > NetPIPE to measure available bandwidth, but now I want to measure > bandwidth usage during different jobs and conditions. I was thinking > about using MTRG, but this seems like it might be overkill with all > the SNMP, graphing, etc. Are there any other bandwidth usage monitors > that anyone could suggest? > > Thanks, > Chris > From rgb at phy.duke.edu Fri May 11 14:35:29 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:17 2009 Subject: KVM Switch In-Reply-To: Message-ID: On Fri, 11 May 2001, Raghubhushan Pasupathy wrote: > Folks, > > I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf > Cluster. Can anyone give me some directions on this since I am completely > lost. What specs, brand etc. do you suggest? Why? KVM's tend to be very expensive (I know, I have a Raritan which is an excellent choice and even keyboard-switchable BUT which costs a whole lot -- good KVM's can cost $100 per port or even more). I also have a really cheapo four position mechanical KVM switch that works for keyboard and video but cannot switch PS2 mice. It degrades video quality a bit but is fine for my simple home beowulf, where I have two or three systems that do a bit of server stuff and hence need a console. Nowadays a cluster node can run anywhere from totally headless (Scyld, I believe, is happy enough with no head at all), headless but a serial port console (a VERY cheap option that is probably adequate for debugging a dying boot and which can be switched with a cheap serial switch or managed via a still not very expensive serial port server), headless but with a cheap video card that one plugs into a monitor one time (to set the bios and monitor the original install) and then never again, headed but no X (X plus a GUI is quite expensive in memory and moderately expensive in wasted CPU), and headed running X. I now have a $3000 KVM switch that is more useful for switching between servers (where one really does sometimes need access to a console) than between beowulf nodes, which one generally accesses over the net anyway. I personally generally go with cheap S3 cards (or any sort of onboard video if the motherboard happens to have it) and no X just to make it a bit faster to set up the systems and debug them if/when they break. 
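For the serial-console option mentioned above, the node-side setup is roughly this (device names and speeds are assumptions; adjust to taste):

    # /etc/lilo.conf (fragment): send kernel and boot messages to ttyS0
    append="console=ttyS0,9600n8"

    # /etc/inittab: offer a login on the serial line as well
    S0:2345:respawn:/sbin/agetty -L 9600 ttyS0 vt100

A cheap serial switch or a small terminal server on the other end then replaces the KVM for everything short of BIOS screens (unless the board's BIOS can redirect those to serial as well).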
The one hassle of running a system with no video card at all is that one often has to put one in long enough to set up the bios, in particular to tell the bios to run without a video card without complaining (which most BIOS's do these days if you ask nicely). Is the time saved worth the $30 the card costs per system? Don't know, but it's close... rgb > > Raghu > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From alvin at Mail.Linux-Consulting.com Fri May 11 14:35:16 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:17 2009 Subject: rdate or xntp In-Reply-To: <20010511152143.172.qmail@awcst401.netaddress.usa.net> Message-ID: hi joey if the time difference is to big... it will NOT update the clock ( too big -- more than a few seconds... or more than a few minutes ( depending on which program you are using have fun alvin what was yoru rdate or ntpupdate syntax ??? eg.. ( put into rc.local or equivalent to account for large time differences ) rdate -ps ntp.linux-consulting.com On 11 May 2001, Joey Raheb wrote: > Hello everyone, > > I was wondering about date updating on a cluster. Does anybody do it and how? > I tried rdate and this did not work for some reason, it said that I could not > connect to the host??? Also, I tried ntpdate and when for example I type: > ntpdate ns1.uwo.ca, it outputs the difference between the clock, but it does > not update my clock??? If anyone can explain to me how to use one of these > programs I would appreciate the help. > > Thanks, > Joey > > ____________________________________________________________________ > Get free email and a permanent address at http://www.amexmail.com/?A=1 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Fri May 11 14:49:27 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:17 2009 Subject: KVM Switch In-Reply-To: Message-ID: hi and yes... good kvm are expensive... we've found that cheap kvm sometimes does NOT work when you switch between servers ... - motherboard and mouse problem... - unplugging the mouse will sometimes/always hang the system if the wrong mouse is used on an incompatible motherboard ( motherboard was D815EEAAL -- its a good mb ) - in that case... you have to have a good kvm that maintains electrical connectivity to the mouse ports when switching between servers - we discarded all the belkin kvm due to the above issues and bought a different brand... ( i've forgotten which one as i dont work with that client any more ) have fun linuxing.. alvin On Fri, 11 May 2001, Robert G. Brown wrote: > On Fri, 11 May 2001, Raghubhushan Pasupathy wrote: > > > Folks, > > > > I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf > > Cluster. Can anyone give me some directions on this since I am completely > > lost. What specs, brand etc. 
do you suggest? > > Why? KVM's tend to be very expensive (I know, I have a Raritan which is > an excellent choice and even keyboard-switchable BUT which costs a whole > lot -- good KVM's can cost $100 per port or even more). I also have a > really cheapo four position mechanical KVM switch that works for > keyboard and video but cannot switch PS2 mice. It degrades video > quality a bit but is fine for my simple home beowulf, where I have two > or three systems that do a bit of server stuff and hence need a console. > > Nowadays a cluster node can run anywhere from totally headless (Scyld, I > believe, is happy enough with no head at all), headless but a serial > port console (a VERY cheap option that is probably adequate for > debugging a dying boot and which can be switched with a cheap serial > switch or managed via a still not very expensive serial port server), > headless but with a cheap video card that one plugs into a monitor one > time (to set the bios and monitor the original install) and then never > again, headed but no X (X plus a GUI is quite expensive in memory and > moderately expensive in wasted CPU), and headed running X. I now have a > $3000 KVM switch that is more useful for switching between servers > (where one really does sometimes need access to a console) than between > beowulf nodes, which one generally accesses over the net anyway. > > I personally generally go with cheap S3 cards (or any sort of onboard > video if the motherboard happens to have it) and no X just to make it a > bit faster to set up the systems and debug them if/when they break. The > one hassle of running a system with no video card at all is that one > often has to put one in long enough to set up the bios, in particular to > tell the bios to run without a video card without complaining (which > most BIOS's do these days if you ask nicely). Is the time saved worth > the $30 the card costs per system? Don't know, but it's close... > > rgb > > > > > Raghu > > _________________________________________________________________ > > Get your FREE download of MSN Explorer at http://explorer.msn.com > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From lindahl at conservativecomputer.com Fri May 11 15:58:41 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: FYI: Current SPECfp landscape... Message-ID: <20010511185841.B3052@wumpus> > We can agree to disagree, and still learn in the process. No: if you don't understand my objection, no one learns anything. > My point is > that P4/Athlon chips come labeled with GHz ratings which correlate with > both price and performance. The point is that the correlations are within the individual processor lines, not between them. But you didn't understand my objection last time, so you probably won't get it this time either. Sorry for wasting everyone's time. 
-- g From parkw at better.net Fri May 11 16:15:37 2001 From: parkw at better.net (William Park) Date: Wed Nov 25 01:01:17 2009 Subject: KVM Switch In-Reply-To: ; from rgb@phy.duke.edu on Fri, May 11, 2001 at 05:35:29PM -0400 References: Message-ID: <20010511191537.A902@better.net> On Fri, May 11, 2001 at 05:35:29PM -0400, Robert G. Brown wrote: > On Fri, 11 May 2001, Raghubhushan Pasupathy wrote: > > > Folks, > > > > I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf > > Cluster. Can anyone give me some directions on this since I am completely > > lost. What specs, brand etc. do you suggest? I remember seeing few ad in "Linux Journal". From my experience, I only needed K/V/M at the beginning; after setup, I just use ethernet. > > Why? KVM's tend to be very expensive (I know, I have a Raritan which is > an excellent choice and even keyboard-switchable BUT which costs a whole > lot -- good KVM's can cost $100 per port or even more). I also have a > really cheapo four position mechanical KVM switch that works for > keyboard and video but cannot switch PS2 mice. It degrades video Yes, I found this out the hard way. Abit VP6 hangs if you unplug/plug PS/2 mouse. --William Park, Open Geometry Consulting, Mississauga, Ontario, Canada. 8 CPU cluster, (Slackware) Linux, Python, LaTeX, vim, mutt From jakob at unthought.net Fri May 11 19:49:57 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:17 2009 Subject: bandwidth usage monitoring tools In-Reply-To: <20010511103538.D2993@getafix.EraGen.com>; from cblack@eragen.com on Fri, May 11, 2001 at 10:35:38AM -0400 References: <20010511103538.D2993@getafix.EraGen.com> Message-ID: <20010512044957.C30229@unthought.net> On Fri, May 11, 2001 at 10:35:38AM -0400, Chris Black wrote: > I am looking for tools that will allow me to monitor the amount of > bandwidth used on different links in our cluster. I am already using > NetPIPE to measure available bandwidth, but now I want to measure > bandwidth usage during different jobs and conditions. I was thinking > about using MTRG, but this seems like it might be overkill with all > the SNMP, graphing, etc. Are there any other bandwidth usage monitors > that anyone could suggest? Please forgive me for this blatant plug :) http://sysorb.com will cover this. It will monitor ethernet interface throughput on Linux. Stats are stored in a database and you can access the plots via. web interface. It's a monitoring system that was built to alert administrators of system problems - but it can be used as a utilization monitor just as well (just don't configure the alerts). (It will monitor memory usage, load, etc. etc. too) It's a commercial system, but it comes with a free (as in free beer) license that allows you to use it on five systems, no strings attached. And yes, I am affiliated with the company providing this product. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
: :.........................:............{Konkhra}...............: From Kian_Chang_Low at vdgc.com.sg Fri May 11 23:23:58 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low@vdgc.com.sg) Date: Wed Nov 25 01:01:17 2009 Subject: rdate or xntp Message-ID: Hi Joey, For rdate to work, the following statement in inetd.conf (of the host) must be uncommented, time stream tcp nowait root internal Killed the inted process and restart it. Connection from the slave nodes to the master should be allowed now. Regards, Kian Chang. Joey Raheb cc: Sent by: Subject: rdate or xntp beowulf-admin@b eowulf.org 05/11/2001 11:21 PM Hello everyone, I was wondering about date updating on a cluster. Does anybody do it and how? I tried rdate and this did not work for some reason, it said that I could not connect to the host??? Also, I tried ntpdate and when for example I type: ntpdate ns1.uwo.ca, it outputs the difference between the clock, but it does not update my clock??? If anyone can explain to me how to use one of these programs I would appreciate the help. Thanks, Joey ____________________________________________________________________ Get free email and a permanent address at http://www.amexmail.com/?A=1 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From kragen at pobox.com Fri May 11 23:48:51 2001 From: kragen at pobox.com (kragen@pobox.com) Date: Wed Nov 25 01:01:17 2009 Subject: What is the best C IDE on Linux? Message-ID: <200105120648.CAA05987@kirk.dnaco.net> Greg Lindahl writes: > On Fri, Apr 27, 2001 at 07:35:30AM -0400, Bob Drzyzgula wrote: > > > To a large > > extent, this is because the basic editor is so fast, memory-effecient and > > functional there is rarely the need, but it is also because vi integrates so > > well with the underlying command line environment. Need to reflow a > > paragraph to a max 65 columns? "!}fmt -65" will do it for you. Need to sort > > the contents of your buffer? "1G!}sort". Need to insert the system date in > > your file? ":r!date". The Unix command line is fantastically powerful; why > > would anyone want to re-implement this functionality within the editor > > itself? > > Speaking from the peanut gallery, I think we should impose a 1 beer > penalty on any poster whose example of great functionality is > trivially equaled by the other side. > > Bob, you owe me a beer. Meta-| cmd pipes the current selected region. Right, but it puts it in a new buffer in a new window. So vi's ! } f m t RET, seven keypresses, becomes C-@ M-} M-| f m t RET C-x C-x C-w C-x o C-@ M-< C-w C-x o C-y C-x 1, twenty-seven keypresses, assuming you only had one window open beforehand. (If you have more windows open, you'll probably have to throw some extra "C-x o" sequences in there.) This is actually enough of a pain that people don't use it --- not just for things like fmt, of which Emacs has an inferior elisp version, but also for things like sort and bc, where it's really useful. There are times when the Emacs behavior really is what you want, but usually I want the vi behavior. This is one of a number of examples of how Emacs is better at being an OS and vi is better at integrating into the existing OS. From lindahl at conservativecomputer.com Sat May 12 14:24:24 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:17 2009 Subject: What is the best C IDE on Linux? 
In-Reply-To: <200105120648.CAA05987@kirk.dnaco.net>; from kragen@pobox.com on Sat, May 12, 2001 at 02:48:51AM -0400 References: <200105120648.CAA05987@kirk.dnaco.net> Message-ID: <20010512172424.A1934@wumpus> On Sat, May 12, 2001 at 02:48:51AM -0400, kragen@pobox.com wrote: > Right, but it puts it in a new buffer in a new window. So vi's > ! } f m t RET, seven keypresses, becomes C-@ M-} M-| f m t RET C-x C-x > C-w C-x o C-@ M-< C-w C-x o C-y C-x 1, twenty-seven keypresses, You owe me a beer: you can make a macro to do that in many fewer keypresses. Now can we stop this off-topic thread? -- g From kragen at pobox.com Sat May 12 16:43:49 2001 From: kragen at pobox.com (kragen@pobox.com) Date: Wed Nov 25 01:01:17 2009 Subject: Trivial C question: iterating through chars Message-ID: <200105122343.TAA20026@kirk.dnaco.net> Niels Walet writes: > Because you can't add integers to chars.. Why did you bother to post an incorrect answer several times when other people had already posted correct ones? From bob at drzyzgula.org Sat May 12 21:17:01 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:17 2009 Subject: What is the best C IDE on Linux? In-Reply-To: <20010512172424.A1934@wumpus> References: <200105120648.CAA05987@kirk.dnaco.net> <20010512172424.A1934@wumpus> Message-ID: <20010513001701.A32196@drzyzgula.org> On Sat, May 12, 2001 at 05:24:24PM -0400, Greg Lindahl wrote: > On Sat, May 12, 2001 at 02:48:51AM -0400, kragen@pobox.com wrote: > > > Right, but it puts it in a new buffer in a new window. So vi's > > ! } f m t RET, seven keypresses, becomes C-@ M-} M-| f m t RET C-x C-x > > C-w C-x o C-@ M-< C-w C-x o C-y C-x 1, twenty-seven keypresses, > > You owe me a beer: you can make a macro to do that in many fewer > keypresses. Hmmm. Emacs is more powerful than I thought. It takes fewer keystrokes to make a macro to do a command than it does to do the command? :-) Seriously, by this logic, gcc is far more powerful even than emacs. Thus, I'm going to add a new rule to your game. Anytime someone claims that the programmability of one editor to do something is equivalent to the native ability of another editor to do the same thing, they'll also be charged a beer. You owe me a beer. Actually, I guess, since I owed you one, we can call it even. > Now can we stop this off-topic thread? Sure. This seems as good a time as any... --Bob From morris at sci.hkbu.edu.hk Sun May 13 18:09:58 2001 From: morris at sci.hkbu.edu.hk (morris@sci.hkbu.edu.hk) Date: Wed Nov 25 01:01:17 2009 Subject: Myrinet in PCI 32bit running 33MHz Message-ID: To all experience Myrinet user in beowulf cluster, In Myrinet site, it is recorded that Myrinet adapter plugged in 32-bit and 64-bit running 33MHz or 66MHz varies in DMA transfer rates, thus affecting the data-rate performance. How is the data-rate performance affect the overall performance? I just got 32-bit PCI running 33MHz. Is there any significant performance difference after I replaced the current motherboard that is equipped with 64-bit, 66MHz PCI bus? Any opinion is welcome. 
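A quick back-of-the-envelope comparison of the theoretical bus ceilings involved (real DMA rates are lower, but the relative picture holds):

    32-bit, 33 MHz PCI:  4 bytes x 33 M transfers/s  ~ 132 MB/s peak, shared
    64-bit, 66 MHz PCI:  8 bytes x 66 M transfers/s  ~ 528 MB/s peak
    Myrinet 2000 link:   2 Gbit/s                    ~ 250 MB/s each direction

So a 32-bit/33 MHz slot caps the NIC well below the link rate, while a 64-bit/66 MHz slot leaves it headroom; whether that matters depends on how close your MPI traffic gets to link speed in the first place.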
-- Morris Law Assistant Computer Officer Address : 224 Waterloo Road, KLN, Hong Kong Science Faculty Tel : (852) 23395909 Fax : (852) 23395862 Hong Kong Baptist University WWW : http://www.sci.hkbu.edu.hk/~morris Email : morris@hkbu.edu.hk or morrismmlaw@yahoo.com ICQ : 6380626 ========================================================================= From morris at sci.hkbu.edu.hk Sun May 13 21:57:06 2001 From: morris at sci.hkbu.edu.hk (morris@sci.hkbu.edu.hk) Date: Wed Nov 25 01:01:17 2009 Subject: Myrinet in PCI 32bit running 33MHz In-Reply-To: Message-ID: Dear Dr. Skjellum, Thanks for your quick answer to my posting about Myrinet. Nowaday, there are motherboards with both 32-bit, 33MHz and 64-bit, 66MHz PCI bus. I would like to ask if the same Myrinet 2000 was plugged on the 32-bit, 33MHz PCI bus much slower than it was plugged on the 64-bit, 66MHz PCI. How does it affect the Message Passing performance? Hope to seeing your opinion. Regards, Morris Law Assistant Computer Officer Address : 224 Waterloo Road, KLN, Hong Kong Science Faculty Tel : (852) 23395909 Fax : (852) 23395862 Hong Kong Baptist University WWW : http://www.sci.hkbu.edu.hk/~morris Email : morris@hkbu.edu.hk or morrismmlaw@yahoo.com ICQ : 6380626 ========================================================================= On Sun, 13 May 2001, Tony Skjellum wrote: > yes, you can get a lot more performance from the Myrinet 2000, > if the switches are also the new generation. > > > > Anthony Skjellum, PhD, President (tony@mpi-softtech.com) > MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759 > +1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com > "Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters." > > On Mon, 14 May 2001 morris@sci.hkbu.edu.hk wrote: > > > To all experience Myrinet user in beowulf cluster, > > > > In Myrinet site, it is recorded that Myrinet adapter plugged in 32-bit > > and 64-bit running 33MHz or 66MHz varies in DMA transfer rates, thus > > affecting the data-rate performance. > > > > How is the data-rate performance affect the overall performance? I > > just got 32-bit PCI running 33MHz. Is there any significant performance > > difference after I replaced the current motherboard that is equipped with > > 64-bit, 66MHz PCI bus? > > > > Any opinion is welcome. > > > > -- > > Morris Law > > Assistant Computer Officer Address : 224 Waterloo Road, KLN, Hong Kong > > Science Faculty Tel : (852) 23395909 Fax : (852) 23395862 > > Hong Kong Baptist University WWW : http://www.sci.hkbu.edu.hk/~morris > > Email : morris@hkbu.edu.hk or morrismmlaw@yahoo.com ICQ : 6380626 > > ========================================================================= > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > From sjohnsto at eso.org Mon May 14 05:45:21 2001 From: sjohnsto at eso.org (Stephen Johnston) Date: Wed Nov 25 01:01:17 2009 Subject: Kickstart Installation from CD-ROM Message-ID: <3AFFD361.D67F4B7F@eso.org> Hi I would like to kickstart my nodes from a cdrom rather than a floppy, also if possible not required a network bootp server (so perhaps use a known install ip then change it afterwards) Is this possible? I am more interested in the kickstart cd rather than no bootp S. 
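For what it's worth, a sketch of the pieces this usually involves on a Red Hat style installer (the paths and addresses here are made up, and you should check that your release's boot images accept ks=cdrom):

    boot: linux ks=cdrom:/ks.cfg

with a ks.cfg on the CD containing a static (no bootp/dhcp) network line such as

    network --bootproto static --ip 192.168.1.10 --netmask 255.255.255.0 \
            --gateway 192.168.1.254 --nameserver 192.168.1.1

and the node's real address changed by hand after the install, as suggested above.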
-- Stephen Johnston (NGAST/Beowulf Project) Phone: +49 89 32006563 European Southern Observatory Fax : +49 89 32006380 Karl-Schwarzschild-Strasse 2 D-85748 Garching bei Muenchen http://www.eso.org -- From modus-beowulf at pr.es.to Mon May 14 09:32:44 2001 From: modus-beowulf at pr.es.to (Patrick Michael Kane) Date: Wed Nov 25 01:01:17 2009 Subject: two kernel monte (or equivalent) for 2.4 Message-ID: <20010514093244.A30789@pr.es.to> Hi there: Is anyone aware of patches for two kernel monte, or an equivalent boot-linux-from-linux solution, that works with the 2.4 kernel? TIA, -- Patrick Michael Kane From bari at onelabs.com Mon May 14 10:21:25 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:17 2009 Subject: two kernel monte (or equivalent) for 2.4 References: <20010514093244.A30789@pr.es.to> Message-ID: <3B001415.7010402@onelabs.com> Patrick Michael Kane wrote: > Hi there: > > Is anyone aware of patches for two kernel monte, or an equivalent > boot-linux-from-linux solution, that works with the 2.4 kernel? > Take a look at http://www.linuxbios.org. There is a 2.4 version in the CVS to have Linux boot and then jump to another Linux kernel for x86 and Alpha. The source is easily modified for other processors. Bari Ari email: bari@onelabs.com O.N.E. Technologies 1505 Old Deerfield Road tel: 773-252-9607 Highland Park, IL 60035 fax: 773-252-9604 http://www.onelabs.com From modus-beowulf at pr.es.to Mon May 14 10:43:17 2001 From: modus-beowulf at pr.es.to (Patrick Michael Kane) Date: Wed Nov 25 01:01:17 2009 Subject: two kernel monte (or equivalent) for 2.4 In-Reply-To: <3B001415.7010402@onelabs.com>; from bari@onelabs.com on Mon, May 14, 2001 at 12:21:25PM -0500 References: <20010514093244.A30789@pr.es.to> <3B001415.7010402@onelabs.com> Message-ID: <20010514104317.A1907@pr.es.to> * Bari Ari (bari@onelabs.com) [010514 10:22]: > Patrick Michael Kane wrote: > > > Hi there: > > > > Is anyone aware of patches for two kernel monte, or an equivalent > > boot-linux-from-linux solution, that works with the 2.4 kernel? > > > Take a look at http://www.linuxbios.org. There is a 2.4 version in the > CVS to have Linux boot and then jump to another Linux kernel for x86 and > Alpha. The source is easily modified for other processors. This is LOBOS, right? I assume it still requires the kernel to be patched? Best, -- Patrick Michael Kane From jmlinley at ix.netcom.com Mon May 14 11:03:30 2001 From: jmlinley at ix.netcom.com (Jacques Minot) Date: Wed Nov 25 01:01:17 2009 Subject: Duron Benchmarks. Message-ID: <01051412033000.01037@schiller.poets.com> Does anyone have any benchmarks comparing Duron to Athlon wulfs? From cblack at eragen.com Mon May 14 13:38:11 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:17 2009 Subject: Linpack benchmarking Message-ID: <20010514163811.F13492@getafix.EraGen.com> I would like to run some linpack benchmarks on our cluster to see how we compare to the computers listed at http://www.top500.org/. I've browsed around looking for packages to run this benchmark on a beowulf, but from reading top500's docs, all I found was fortran source code for the test. I was wondering if anyone knows of a good package to run a similar test to what is run for the top500 machines. I know it consists of solving many systems of linear equations. I don't have MPI installed on our cluster but would be willing to install it for this benchmark if it is easy. Has anyone run such a benchmark on their beowulf? 
Our interconnect is just switched 100mbit, so a program that does not need any/much internode communication would be best. Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010514/471f7dd5/attachment.bin From alvin at Mail.Linux-Consulting.com Mon May 14 14:00:47 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:17 2009 Subject: two kernel monte (or equivalent) for 2.4 In-Reply-To: <20010514093244.A30789@pr.es.to> Message-ID: hi patrick i dont know exactly what monte does, but if i remember correctly... monte allows you to swap out linux-2.2.19 and install linux-2.4.3 kernel instead...while its running ??? ( tricky stuff if thats what it does... if not...plain old raid1 mirroring will also allow you to boot linux if one disk crash, to be able to boot off the other disk have fun alvin http://www.Linux-1U.net On Mon, 14 May 2001, Patrick Michael Kane wrote: > Hi there: > > Is anyone aware of patches for two kernel monte, or an equivalent > boot-linux-from-linux solution, that works with the 2.4 kernel? > From joeyraheb at usa.net Mon May 14 14:25:36 2001 From: joeyraheb at usa.net (Joey Raheb) Date: Wed Nov 25 01:01:18 2009 Subject: Linpack benchmarking References: <20010514163811.F13492@getafix.EraGen.com> Message-ID: <001b01c0dcbc$6afb7280$6982e440@copper> LINPack is available at http://www.netlib.org/benchmark/hpl This is the parallel version of linpack and requires MPI, I recently started running this bencmark on our clusters and it was trivial to install and get going. Joey ----- Original Message ----- From: "Chris Black" To: Sent: Monday, May 14, 2001 4:38 PM Subject: Linpack benchmarking From edwards at icantbelieveimdoingthis.com Mon May 14 16:10:54 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:18 2009 Subject: Scyld and root directory Message-ID: <20010514171054.A4166@icantbelieveimdoingthis.com> Jag: Thanks for the message. I have tried your suggestion but the append line seems to give the kernel problems I get a kernel panic about not being able to open an initial console. It doen't seem to matter where I put the append command. I have been successful putting the default boot image on to a node and rebooting. Only when the append line is included does the kernal puke. I have another general question. I have altered the standard partition table to have a large swap file local on the slave node. Also, I have a large disk area that has no label. Hwoever, when I issue a remote df -k I only see the nfs-mounted file system from the mater node and a ramdisk. Is this normal? Where are the other partitions? If I use bpsh 0 fdisk, I can read the partition table well enough. Art Edwards From agrajag at linuxpower.org Mon May 14 15:36:15 2001 From: agrajag at linuxpower.org (Jag) Date: Wed Nov 25 01:01:18 2009 Subject: Scyld and root directory In-Reply-To: <20010514171054.A4166@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Mon, May 14, 2001 at 05:10:54PM -0600 References: <20010514171054.A4166@icantbelieveimdoingthis.com> Message-ID: <20010514153615.F15130@kotako.analogself.com> On Mon, 14 May 2001, Art Edwards wrote: > Jag: > > Thanks for the message. 
I have tried your suggestion but the append line seems > to give the kernel problems I get a kernel panic about not being able to > open an initial console. It doen't seem to matter where I put the append > command. I have been successful putting the default boot image on to a node and > rebooting. Only when the append line is included does the kernal puke. Hrm.. I was wrong with what I said before (that's what I get for trying to respond from my parents' house). Check out the file /etc/beowulf/config There is a line that starts with 'kernelcommandline' After that is the options that are passed to the kernel. Try adding the mem= to the end of that line. > > I have another general question. I have altered the standard partition > table to have a large swap file local on the slave node. Also, I have a large > disk area that has no label. Hwoever, when I issue a remote df -k I only > see the nfs-mounted file system from the mater node and a ramdisk. Is this > normal? Where are the other partitions? If I use bpsh 0 fdisk, I can read > the partition table well enough. swap partitions never show up with 'df'. Use a command such as 'free' to see see how much swap space the node has. You should be able to use that number to see if your swap partition is being recognized. Also, make sure you edit /etc/beowulf/fstab to indicate where your swap partition is as well as what your other partition is and where you want it mounted. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010514/36a20b3c/attachment.bin From erik at hendriks.cx Mon May 14 19:48:21 2001 From: erik at hendriks.cx (Erik Arjan Hendriks) Date: Wed Nov 25 01:01:18 2009 Subject: two kernel monte (or equivalent) for 2.4 In-Reply-To: <20010514093244.A30789@pr.es.to>; from modus-beowulf@pr.es.to on Mon, May 14, 2001 at 09:32:44AM -0700 References: <20010514093244.A30789@pr.es.to> Message-ID: <20010514224821.A28052@hendriks.cx> On Mon, May 14, 2001 at 09:32:44AM -0700, Patrick Michael Kane wrote: > Hi there: > > Is anyone aware of patches for two kernel monte, or an equivalent > boot-linux-from-linux solution, that works with the 2.4 kernel? Here's the straight forward port of two kernel monte to 2.4. The stay in protected mode thing seems to have a problem but dropping all the way to real mode seems to work fine. - Erik Index: kmonte.c --- kmonte.c 2000/10/31 17:23:38 1.19 +++ kmonte.c 2001/05/15 02:44:37 @@ -1,7 +1,7 @@ /*------------------------------------------------------------ -*- C -*- * 2 Kernel Monte a.k.a. Linux loading Linux on x86 * - * Erik Arjan Hendriks + * Erik Arjan Hendriks * Copyright (C) 2000 Scyld Computing Corporation * * This program is free software; you can redistribute it and/or modify @@ -21,21 +21,15 @@ * $Id: kmonte.c,v 1.19 2000/10/31 17:23:38 hendriks Exp $ *--------------------------------------------------------------------*/ -/* Auto-configuration stuff for things living outside the linux kernel - * source tree. */ -/* Include files, designed to support most kernel versions 2.0.0 and later. */ #include #if defined(CONFIG_SMP) && ! defined(__SMP__) #define __SMP__ #endif -#if defined(CONFIG_MODVERSIONS) && defined(MODULE) && ! defined(MODVERSIONS) +#if defined(CONFIG_MODVERSIONS) && ! defined(MODVERSIONS) #define MODVERSIONS #endif - -#include #include -/* Older kernels do not include this automatically. 
*/ -#if LINUX_VERSION_CODE < 0x20300 && defined(MODVERSIONS) +#if defined(MODVERSIONS) #include #endif @@ -53,11 +47,12 @@ * seem too bad. Fooling with the APICs looks like it will be a major * pain unless the kernel exports a few more symbols. */ #ifdef __SMP__ -#error "2 Kernel Monte doesn't work with SMP!" +#warning "2 Kernel Monte cannot doesn't work with SMP!" #endif MODULE_AUTHOR("Erik Arjan Hendriks "); MODULE_DESCRIPTION("Two Kernel Monte: Loads new Linux kernels from Linux."); +EXPORT_NO_SYMBOLS; /*-------------------------------------------------------------------- * Monte memory management @@ -252,11 +247,11 @@ static int monte_restart(unsigned long entry_addr, unsigned long flags); int (*real_reboot)(int, int, int, void *); +static struct semaphore monte_sem; asmlinkage int sys_monte(int magic1, int magic2, int cmd, void *arg) { int err; struct monte_param_t param; struct monte_region_t *regions=0; - static struct semaphore monte_sem = MUTEX; MOD_INC_USE_COUNT; if (magic1 != MONTE_MAGIC_1 || magic2 != MONTE_MAGIC_2) { @@ -283,7 +278,11 @@ err = -EFAULT; goto out; } +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0) down(¤t->mm->mmap_sem); +#else + down_read(¤t->mm->mmap_sem); +#endif if ((err = m_setup_page_list(regions, param.nregions))) goto out1; if ((err = m_check_page_list())) goto out1; @@ -296,7 +295,11 @@ printk("monte: failure (errno = %d)\n", -err); out1: +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0) up(¤t->mm->mmap_sem); +#else + up_read(¤t->mm->mmap_sem); +#endif out: if (regions) kfree(regions); m_pg_list_free(); @@ -466,9 +469,10 @@ struct pci_dev *dev; u16 cmd; -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,3,1) +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0) for (dev=bus->devices; dev!=NULL; dev=dev->next) { #else + struct list_head *l; for (l=bus->devices.next; l != &bus->devices; l=l->next) { dev = pci_dev_b(l); #endif @@ -480,14 +484,15 @@ static void monte_pci_disable(void) { - struct pci_bus *bus; /* Turn off PCI bus masters to keep them from scribbling on our * memory later on. */ if (pcibios_present()) { -#if LINUX_VERSION_CODE < KERNEL_VERSION(2,3,1) +#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0) + struct pci_bus *bus; for (bus=&pci_root; bus != NULL; bus=bus->next) monte_pci_disable_bus(bus); #else + struct list_head *l; for (l=pci_root_buses.next; l != &pci_root_buses; l=l->next) monte_pci_disable_bus(pci_bus_b(l)); #endif @@ -497,7 +502,7 @@ static void restore_xt_pic(void) { - /* These following is taken from arch/i386/boot/setup.S + /* These comments are taken from linux/arch/i386/boot/setup.S * * I hope. Now we have to reprogram the interrupts :-( we put * them right after the intel-reserved hardware interrupts, at @@ -535,9 +540,9 @@ * off paging later needs to run out of an identity mapped page. * For simplicity we'll use page zero. This page is normally not * mapped at all. 
*/ - set_bit(PG_reserved, &(mem_map+MAP_NR(__va(0)))->flags); + set_bit(PG_reserved, &(mem_map[0].flags)); if (remap_page_range(0, 0, PAGE_SIZE, PAGE_KERNEL)) { - clear_bit(PG_reserved, &(mem_map+MAP_NR(__va(0)))->flags); + clear_bit(PG_reserved, &(mem_map[0].flags)); return -EAGAIN; } /*----- POINT OF NO RETURN IS HERE --------------------------------------*/ @@ -593,6 +598,7 @@ "monte: Erik Arjan Hendriks \n", PACKAGE_VERSION); + init_MUTEX(&monte_sem); real_reboot = sys_call_table[__NR_reboot]; sys_call_table[__NR_reboot] = sys_monte; return 0; From a_mulyadi at telkom.net Sun May 13 22:29:14 2001 From: a_mulyadi at telkom.net (mulyadi) Date: Wed Nov 25 01:01:18 2009 Subject: Benchmarking tool using PVM Message-ID: <002801c0dc36$f4284b40$6968053d@thunder> Hello all I've been searching through the web, and found that almost 99% benchmarking tool for cluster is using MPI. Maybe anyone knows the PVM version of it?? I am really apreciate if you can tell me some of them. I have to stick with PVM because i'm doing research about PVM+MOSIX combo Regards Andy Mulyadi From sjohnsto at eso.org Tue May 15 01:09:29 2001 From: sjohnsto at eso.org (Stephen Johnston) Date: Wed Nov 25 01:01:18 2009 Subject: Alternative to 'hwinfo' for scsi Message-ID: <3B00E439.CA1992B8@eso.org> Hi I have a 3ware 6800 card with eight ide drives on it, i wanted to use hwinfo to get some info about the disks, but this is for IDE drives and the card makes the drives appear like scsi to the o/s Is there a scsi alternative? TIA S. -- Stephen Johnston (NGAST/Beowulf Project) Phone: +49 89 32006563 European Southern Observatory Fax : +49 89 32006380 Karl-Schwarzschild-Strasse 2 D-85748 Garching bei Muenchen http://www.eso.org -- From zolia at lydys.sc-uni.ktu.lt Tue May 15 03:46:06 2001 From: zolia at lydys.sc-uni.ktu.lt (zolia) Date: Wed Nov 25 01:01:18 2009 Subject: traffic gathering Message-ID: Hello beo, i am writing an application which lets to start various parallel mpi/pvm programs. What could be the bet way to gather traffic statistics from these programs. I just need to compare generated traffic diference in between pvm and mpi versions of the save program. I could use some traffic monitor like ntop, passing him hosts and ports. And here is other question. What could be fastest way to get all ports, which are responsible for communication. Besides i could patch those pvm/mpi programs with statistics code, but it should be simple enough and not interfere (i mean, that it would be easy enough to patch :) with original code. What could be examples of such code? thanx ==================================================================== Antanas Masevicius Kaunas University of Technology Studentu 48a-101 Computer Center LT-3028 Kaunas LITNET NOC UNIX Systems Administrator Lithuania E-mail: zolia@sc.ktu.lt From brunobg at lsi.usp.br Tue May 15 06:34:37 2001 From: brunobg at lsi.usp.br (Bruno Barberi Gnecco) Date: Wed Nov 25 01:01:18 2009 Subject: Scyld and root directory Message-ID: Jag wrote: > > How do I write in the / directory of a Scyld client? I need at > > least some symbolic links (such as usr->rootfs/usr). > Why are you wanting this symlink? Once the bproc daemon on the slave > chroot's to /rootfs, there really isn't any way to access the real / as > all the jobs that get propegated over there use /rootfs as their /. > If the node came up all the way, there's no way to access the real / for > reading or writing, so I don't see what good this would do you. 
> If you're trying to start up something before the chroot happens, I > suggest you do it after. It'll save you the headache of trying to make > both / and /rootfs a sane root to run your programs in. It's because I keep having problems when I try to run mpi_mandel, for example. The client complains that vmadump couldn't open the library, and the master: [root@rv00]# mpi_mandel p0_1880: p4_error: net_create_slave: bproc_rfork: -1 p4_error: latest msg from perror: Broken pipe bm_list_1883: p4_error: interrupt SIGINT: 2 I straced and the problems seems to be with shmget() (sorry, I don't have to logs here, but I can send you later). I already tried everything that I knew, and nothing worked. The setup_libs script doesn't work, complaing first tar: usr/lib/libgmodule-1.2.so.0.0.6: Cannot open: No such file or directory then, about the libs in /lib tar: lib/libdl-2.1.3.so: Cannot open: File exist The problem really seems to be that the client nodes can't open the libraries in /usr/lib. /etc/beowulf/config has the line: libraries /lib /usr/lib Would you please help me? It's been a while that I'm stuck with this problem. Any ideas are greatly accepted. -- Bruno Barberi Gnecco http://www.geocities.com/RodeoDrive/1980/ Quoth the Raven, "Nevermore". - Poe From gran at scali.no Tue May 15 06:24:13 2001 From: gran at scali.no (=?iso-8859-1?Q?=D8ystein?= Gran Larsen) Date: Wed Nov 25 01:01:18 2009 Subject: Payload size in Linda messages Message-ID: <3B012DFD.C8F95F01@scali.no> Hi folks! At Scali we are testing a beta version of IP over SCI (called ScaIP). One possible application of ScaIP is to make Linda available for users of Scali clusters. We are not familiar with the full range of applications that use Linda, but one interesting application is the computational chemistry system Gaussian. Of course, we find that the performance of ScaIP depends on message size, but we do not know the range of payloads Gaussian (or other Linda applications) use. Do anybody have information to share with us here? Thanks in advance! -?ystein -- ?ystein Gran Larsen, Dr.Scient mailto:gran@scali.no Tel:+47 2262-8982 --------------------------------------------------------------------- MPI?SCI=HPC -- Scalable Linux Systems -- www.scali.com Subscribe to our mailing lists at http://www.scali.com/support From edwards at icantbelieveimdoingthis.com Tue May 15 09:15:48 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:18 2009 Subject: Scyld and root directory Message-ID: <20010515101548.A18212@icantbelieveimdoingthis.com> Jag: The advice about swap and the local space worked very well. I modified fstab and I, and beostat can now see the swap and root file space on the node. I also tried the kernelcommand line with less success. I'll keep poking around. This is the last obvious problem with my Scyld installation. Thanks for your help. Any other ideas are welcome. Art Edwards From rgb at phy.duke.edu Tue May 15 09:19:03 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:18 2009 Subject: Benchmarking tool using PVM In-Reply-To: <002801c0dc36$f4284b40$6968053d@thunder> Message-ID: On Mon, 14 May 2001, mulyadi wrote: > Hello all > > I've been searching through the web, and found that almost 99% benchmarking > tool for cluster is using MPI. Maybe anyone knows the PVM version of it?? I > am really apreciate if you can tell me some of them. 
I have to stick with > PVM because i'm doing research about PVM+MOSIX combo PVM's "examples" directory (typically /usr/share/pvm3/examples or /usr/local/pvm3/examples) contains a number of examples that are also simple benchmarks. There is a bandwidth tester and a latency tester, in addition to a few simulated work examples for different programming paradigms. Because they are universally available they are convenient to use for comparisons. Because they directly measure parametric IPC performance, they are also highly relevant. Of course, if you're doing research on this anyway it's a great time to consider turning these examples into a suite of more meaty benchmarks and contributing them back...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From a_mulyadi at telkom.net Mon May 14 09:38:54 2001 From: a_mulyadi at telkom.net (mulyadi) Date: Wed Nov 25 01:01:18 2009 Subject: Benchmarking tool using PVM References: Message-ID: <003a01c0dc98$b7a72c60$0c68053d@thunder> Hello Mr Brown > Of course, if you're doing research on this anyway it's a great time to > consider turning these examples into a suite of more meaty benchmarks > and contributing them back...;-) That's what i'm thinking now. I'll propose some FFT and Monte Carlo benchmark. The FFT one is based on Decimation In Frequency, and the Monte Carlo simulation for area estimation. Maybe you can help me to improve it?? Regards Andy Mulyadi From rgb at phy.duke.edu Tue May 15 10:18:34 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:18 2009 Subject: Benchmarking tool using PVM In-Reply-To: <003a01c0dc98$b7a72c60$0c68053d@thunder> Message-ID: On Mon, 14 May 2001, mulyadi wrote: > Hello Mr Brown > > > > > Of course, if you're doing research on this anyway it's a great time to > > consider turning these examples into a suite of more meaty benchmarks > > and contributing them back...;-) > > That's what i'm thinking now. I'll propose some FFT and Monte Carlo > benchmark. The FFT one is based on Decimation In Frequency, and the Monte > Carlo simulation for area estimation. Maybe you can help me to improve it?? Sure, I'm looking for benchmarks to add to my collection on brahma anyway. I'd be happy to (as I have time, unfortunately). rgb > > Regards > > Andy Mulyadi > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From Eugene.Leitl at lrz.uni-muenchen.de Wed May 16 06:26:21 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:18 2009 Subject: CCL:Gaussian 98 in PCcluster (fwd) Message-ID: ______________________________________________________________ ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Wed, 16 May 2001 11:54:19 +0300 (EEST) From: Maija Lahtela To: chemistry@ccl.net Subject: CCL:Gaussian 98 in PCcluster Dear All, We are going to build up a PC cluster wiht Red Hat linux in order to run Gaussian 98 A.9. Our propose is to start with 16 PCs and then enlarge it 100 PCs. We have have tested by running Gaussian with one PC but the results has not been very couraging while the code is slow. We have do not have Kinda yet. 
However, I have found articles about running Gaussian with mpi which sounds interesting for us. We would appreciate if you could give us hint where we could find gaussian mpi version or how you have build up your custer to run Gaussian jobs. Thanks in advance! I will summarize. Yours Sincerely, Maija Lahtela-Kakkonen ***************************************************** Maija Lahtela-Kakkonen, Application Scientist / Chemistry CSC-Scientific Computing Tekniikantie 15 a D, P.O.Box 405 FIN-02101 ESPOO FINLAND TEL 358-9-4572079 /050-3819506, FAX 358-9-4572302 E-MAIL mlahtela@csc.fi, Internet:www.csc.fi **************************************************** -= This is automatically added to each message by mailing script =- CHEMISTRY@ccl.net -- To Everybody | CHEMISTRY-REQUEST@ccl.net -- To Admins MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH@ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl@osc.edu From stormlaboratory at yahoo.com Wed May 16 09:27:48 2001 From: stormlaboratory at yahoo.com (Angel Dimitrov) Date: Wed Nov 25 01:01:18 2009 Subject: Beuwulf fo weather prediction Message-ID: <20010516162748.31058.qmail@web10906.mail.yahoo.com> Hello, Is there someone that use Beuwulf cluster for weather simulations? I have an idea to run the mesoscale numerical model MM5 on 2 computers (PC, Linux) but first I want to learn more details how exactly to do this.... Regards, Angel Dimitrov ===== ----------------------------------------------- Angel Dimitrov Storm Laboratory Sofia University, Bulgaria Physic faculty, Department of Meteorology tel. +359 052 475-919 http://www.angelfire.com/sc/stormlab/index.html __________________________________________________ Do You Yahoo!? Yahoo! Auctions - buy the things you want at great prices http://auctions.yahoo.com/ From power_liuxjie at 263.net Wed May 16 10:16:52 2001 From: power_liuxjie at 263.net (ÁõöνÜ) Date: Wed Nov 25 01:01:18 2009 Subject: Ask some question Message-ID: <3B02B604.02359@mta4> _____________________________________________ IP¿¨¡¢ÉÏÍø¿¨ÌøË®¼Û http://shopping.263.net/category08.htm NO.5ÏãË®µêÓ­ÏÄÈÈÂô http://shopping.263.net/perfume/ From jgl at unix.shell.com Wed May 16 10:48:41 2001 From: jgl at unix.shell.com (J. G. LaBounty) Date: Wed Nov 25 01:01:18 2009 Subject: kickstart question Message-ID: <200105161748.MAA02548@volta.shell.com> Is there some parameter to the kickstart "part" command that will tell it to check for bad blocks when building the filesystem? ie clearpart --linux part /boot --size 31 --ondisk hda part / --size 1500 --ondisk hda --RUNBADBLOCKonthis?? part swap --size 1000 --ondisk hda part /tmp --size 70 --ondisk hda part /work1 --size 40 --grow --ondisk hda part swap --size 1000 --ondisk hdc part swap --size 1000 --ondisk hdc part /work2 --size 40 --grow --ondisk hdc John From lindahl at conservativecomputer.com Wed May 16 11:23:59 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:18 2009 Subject: Beuwulf fo weather prediction In-Reply-To: <20010516162748.31058.qmail@web10906.mail.yahoo.com>; from stormlaboratory@yahoo.com on Wed, May 16, 2001 at 09:27:48AM -0700 References: <20010516162748.31058.qmail@web10906.mail.yahoo.com> Message-ID: <20010516142359.C5350@wumpus.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿnone> On Wed, May 16, 2001 at 09:27:48AM -0700, Angel Dimitrov wrote: > Is there someone that use Beuwulf cluster for weather simulations? 
> > I have an idea to run the mesoscale numerical model MM5 on 2 > computers (PC, Linux) but first I want to learn more details how > exactly to do this.... Sure, lots of people run the MPI version of MM5. For example, there's a guy at Utah who's going to use an Athlon cluster running MM5 to do regional predictions for the winter Olympics. And people run mm5 on the AlphaLinux cluster at the Forecast Systems Lab in Boulder. -- g From tibbs at math.uh.edu Wed May 16 11:44:15 2001 From: tibbs at math.uh.edu (Jason L Tibbitts III) Date: Wed Nov 25 01:01:18 2009 Subject: kickstart question In-Reply-To: "J. G. LaBounty"'s message of "Wed, 16 May 2001 12:48:41 -0500" References: <200105161748.MAA02548@volta.shell.com> Message-ID: >>>>> "JGL" == J G LaBounty writes: JGL> Is there some parameter to the kickstart "part" command that will tell JGL> it to check for bad blocks when building the filesystem? All I see in the 7.1 source is: for n in args: (str, arg) = n if str == '--size': size = int(arg) elif str == '--maxsize': maxSize = int(arg) elif str == '--grow': grow = 1 elif str == '--onpart' or str == '--usepart': onPart = arg elif str == '--ondisk': device = arg elif str == '--bytes-per-inode': fsopts = ['-i', arg] elif str == '--onprimary': partNum = int(arg) elif str == '--type': type = int(arg) elif str == "--active": active = 1 elif str == "--asprimary": primOnly = 1 elif str == "--noformat": format = 0 If you know a bit of Python, it doesn't look too terribly difficult to hack up an option for it. - J< From Eugene.Leitl at lrz.uni-muenchen.de Wed May 16 14:47:27 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de) Date: Wed Nov 25 01:01:18 2009 Subject: FY;) [Fwd: furbeowulf cluster] Message-ID: <3B02F56F.9420F3CF@lrz.uni-muenchen.de> -------- Original Message -------- From: david mankins Subject: furbeowulf cluster To: silent-tristero@world.std.com http://www.trygve.com/furbeowulf.html - david mankins (dm@bbn.com, dm@world.std.com) From edwards at icantbelieveimdoingthis.com Wed May 16 13:05:53 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:18 2009 Subject: Hung job Message-ID: <20010516140553.B24106@icantbelieveimdoingthis.com> I'm running a parallel job using MPI and, when I killed the process on the head node, it didn't die on the slave. Is there a command that kills al processes (or specified processes) on a slave node? Art Edwards -- Arthur H. Edwards 712 Valencia Dr. NE Abq. NM 87108 (505) 256-0834 From rgb at phy.duke.edu Wed May 16 13:57:12 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:18 2009 Subject: FY;) [Fwd: furbeowulf cluster] In-Reply-To: <3B02F56F.9420F3CF@lrz.uni-muenchen.de> Message-ID: On Wed, 16 May 2001 Eugene.Leitl@lrz.uni-muenchen.de wrote: > http://www.trygve.com/furbeowulf.html God. I could actually feel the brain cells dying while carefully examining this site (possibly due to the beer I'm sucking down, possibly not). I can only conclude that it is part of some hienous plot to sap the life-forces of geeks everywhere. Thank heavens the Tamagachi didn't come with a networking interface -- if it fell into the hands of these fiends the result could end civilization as we know it. Can you just imagine the conversations? "Oops. A node just died." "What, it broke? Call the dealer." "No, it died. I got behind in feeding all the nodes today and it starved to death. Gotta pen? I have to press the reset button and start feeding the next one or I'll have a mass extinction event..." 
rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From sgaudet at angstrommicro.com Wed May 16 13:07:37 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:18 2009 Subject: kickstart question In-Reply-To: <200105161748.MAA02548@volta.shell.com> References: <200105161748.MAA02548@volta.shell.com> Message-ID: <990043657.3b02de092c5f4@localhost> Quoting "J. G. LaBounty" : > > Is there some parameter to the kickstart "part" command that > will tell it to check for bad blocks when building the filesystem? > > ie > clearpart --linux > part /boot --size 31 --ondisk hda > part / --size 1500 --ondisk hda --RUNBADBLOCKonthis?? > part swap --size 1000 --ondisk hda > part /tmp --size 70 --ondisk hda > part /work1 --size 40 --grow --ondisk hda > part swap --size 1000 --ondisk hdc > part swap --size 1000 --ondisk hdc > part /work2 --size 40 --grow --ondisk hdc Hello John, There is not currently an option to the 'part' kickstart command to enable checking for bad blocks. It does not look terribly hard to modify Anaconda, which is RedHat's installer, to accept such a parameter. The alternative is to check for bad blocks at the end of the install, in the post section of the kickstart file. It would be something like this: %post umount /dev/hda1 && e2fsck -c /dev/hda1 umount /dev/hda2 && e2fsck -c /dev/hda2 ... do that for all your partitions You would be able to see the output from that command one of the virtual terminals, I think the 4th one. Cheers, Stephen Gaudet Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From rgb at phy.duke.edu Wed May 16 14:22:46 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:18 2009 Subject: Hung job In-Reply-To: <20010516140553.B24106@icantbelieveimdoingthis.com> Message-ID: On Wed, 16 May 2001, Art Edwards wrote: > I'm running a parallel job using MPI and, when I killed the process on > the head node, it didn't die on the slave. Is there a command that kills > al processes (or specified processes) on a slave node? Are you running LAM-MPI by any chance? If so, read the docs. LAM is a bit tricky to start up and shut down. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From newt at scyld.com Wed May 16 15:17:26 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:18 2009 Subject: Hung job In-Reply-To: <20010516140553.B24106@icantbelieveimdoingthis.com> Message-ID: Art, On Wed, 16 May 2001, Art Edwards wrote: > I'm running a parallel job using MPI and, when I killed the process on the head node, it didn't die on the slave. Is there a command that kills al processes (or specified processes) on a slave node? Killing a process on a slave node is exactly like killing a process on the master. Use kill with the process ID from the master. Nothing special is required. 
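For example, on a Scyld/bproc system the slave-side processes appear in the master's process table, so a plain ps and kill from the master is enough; a sketch, with a made-up job name and PID:

# on the master node (job name and PID are made up)
ps aux | grep my_mpi_job     # slave-side ranks show up here too
kill 12345                   # send TERM first
kill -9 12345                # only if it ignores TERM

# or run a command directly on a node, as with "bpsh 0 fdisk" earlier
bpsh 0 ps aux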
Regards, Dan Ridge Scyld Computing Corporation From newt at scyld.com Wed May 16 17:38:06 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:18 2009 Subject: Hung job In-Reply-To: Message-ID: List readers, Replying to one's own posts is an early sign of senility, Nonetheless I wrote: > > I'm running a parallel job using MPI and, when I killed the process on the head node, it didn't die on the slave. Is there a command that kills al processes (or specified processes) on a slave node? > > Killing a process on a slave node is exactly like killing a process on the > master. Use kill with the process ID from the master. Nothing special is > required. But I was out of my mind. This advice only applies on Scyld Beowulf systems. Sorry if I caused any confusion on this point. Regards, Dan Ridge Scyld Computing Corporation From brian at posthuman.com Wed May 16 21:02:20 2001 From: brian at posthuman.com (Brian Atkins) Date: Wed Nov 25 01:01:18 2009 Subject: More on P4 thermal throttling Message-ID: <3B034D4C.CF62A354@posthuman.com> The throttling may not be solvable, even with mega-cooling methods due to internal hot spot(s): http://www.inqst.com/articles/athlon4/0516main.htm Throttling will vary chip to chip because of thermal diode inconsistencies. Intel must have developed a huge case of big-company-arrogance in order to make all the bad decisions they've made over the last few years. -- Brian Atkins Director, Singularity Institute for Artificial Intelligence http://www.singinst.org/ From brian at posthuman.com Wed May 16 21:19:28 2001 From: brian at posthuman.com (Brian Atkins) Date: Wed Nov 25 01:01:18 2009 Subject: AMD issues beowulf press release Message-ID: <3B035150.B675469F@posthuman.com> They are really starting to push the whole beowulf aspect: http://biz.yahoo.com/bw/010517/2683.html "SUNNYVALE, Calif.--(BUSINESS WIRE)--May 17, 2001-- AMD today announced that five more academic institutions have each installed new supercomputers using the award-winning AMD Athlon(TM) processor..." -- Brian Atkins Director, Singularity Institute for Artificial Intelligence http://www.singinst.org/ From jcownie at etnus.com Thu May 17 02:45:49 2001 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:01:18 2009 Subject: American Megatrends Inc. Introduces PC Diagnostic Solution for Linux Message-ID: <150KLu-4EY-00@etnus.com> Might be of interest to the folks who were looking for disk diagnostics a while ago (though it's only in BETA at the moment). http://www.ami.com/ami/showpress.cfm?PrID=77 (I haven't tried it, and it's not clear whether it will cost $$$ or not !) -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From ole at scali.no Thu May 17 10:46:15 2001 From: ole at scali.no (Ole W. Saastad) Date: Wed Nov 25 01:01:18 2009 Subject: ATLAS vs. Intel Math Kernel Library Message-ID: How does the performance of ATLAS generated libraries compare to the Intel Math Kernel Library? Intel claim P4 support, but so does ATLAS. I would also like too see FFT in the ATLAS project. Fast FFT are always in demand! Is it worth the extra investment for the Intel package ? Ole W Saastad Scali AS. From dvos12 at calvin.edu Thu May 17 11:00:53 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:18 2009 Subject: AMD issues beowulf press release In-Reply-To: <3B035150.B675469F@posthuman.com> Message-ID: If you have any decently sized athlon cluster, AMD is willing to get pretty involved. 
Several AMD guys came out to our location and we were the smallest installation in that list (Calvin College, 18 nodes). David On Thu, 17 May 2001, Brian Atkins wrote: > They are really starting to push the whole beowulf aspect: > > http://biz.yahoo.com/bw/010517/2683.html > > "SUNNYVALE, Calif.--(BUSINESS WIRE)--May 17, 2001-- AMD today announced > that five more academic institutions have each installed new supercomputers > using the award-winning AMD Athlon(TM) processor..." > -- > Brian Atkins > Director, Singularity Institute for Artificial Intelligence > http://www.singinst.org/ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From jtao at artsci.wustl.edu Thu May 17 12:18:23 2001 From: jtao at artsci.wustl.edu (Jian Tao) Date: Wed Nov 25 01:01:18 2009 Subject: Several Questions Message-ID: <200105171819.f4HIJD922809@ascc.artsci.wustl.edu> 1. How could I make the monitor, which is pluged into a node, show me what is going on in the node ? 2. When I tried to use "beofdisk -q" on the server to get some infomation, nothing appears. I waited for a few minutes before I used ^c to stop the process. Should there be any prompt appear? Should I wait for a longer time? 3. When I use "beostatus", there comes out an error message, "segmentation dumped". Are there anything I could do to correct this error? Thank you very much ! Yours, Jian From siegert at sfu.ca Thu May 17 11:28:09 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:01:18 2009 Subject: ATLAS vs. Intel Math Kernel Library In-Reply-To: ; from ole@scali.no on Thu, May 17, 2001 at 07:46:15PM +0200 References: Message-ID: <20010517112809.B19626@stikine.ucs.sfu.ca> On Thu, May 17, 2001 at 07:46:15PM +0200, Ole W. Saastad wrote: > I would also like too see FFT in the ATLAS project. > Fast FFT are always in demand! Wouldn't that be an unnecessary duplication of efforts? ATLAS is for linear algebra. FFTs are done very well by FFTW (which even has MPI routines). Just check http://www.fftw.org Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From jakob at unthought.net Thu May 17 12:20:06 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:18 2009 Subject: Remote boot w. PXE etc. Message-ID: <20010517212006.A24293@unthought.net> Hello people ! I've been trying to get some nodes booting over PXE. I want the machines to load the kernel over the network and only use local disk for /tmp and swap. (I may even do that over the network as well) I tried the Intel PXE server from RedHat 7.[01], and had *some* success. Meaning, I can get the basic configuration to work, but I cannot configure the PXE server so that each node (identified by MAC or IP) gets it's own configuration. This is a requirement for my setup. I simply cannot find documentation. I looked at bpbatch as recommended in the remote-boot HOWTO, but got annoyed because they have a license that I have a hard time agreeing with, and there's no RPM etc. etc. 
Lazyness, pride and stubbornness is a great combo ;) Does anyone have docs for the Intel PXE server that's included with RedHat, or did you have success booting using some other PXE package, or something entirely different ? Thanks, -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From mail at thomas-boehme.de Thu May 17 12:43:55 2001 From: mail at thomas-boehme.de (Thomas R Boehme) Date: Wed Nov 25 01:01:18 2009 Subject: Remote boot w. PXE etc. Message-ID: Hi, I would recommend using syslinux for the PXE remote boot process It's a lot easier and works reliable for 32 nodes in our cluster. It also gives you the possibility to configure every node independently. You can find it at: ftp://ftp.kernel.org/pub/linux/utils/boot/syslinux/ Just make sure you follow the pxelinux.doc file in the archive. The most important part is getting the right tftp-server (we use tftp-hpa, see the docs). Hope that helps, Thommy -----Original Message----- From: Jakob ?stergaard [mailto:jakob@unthought.net] Sent: Thursday, May 17, 2001 2:20 PM To: Beowulf Mailing List Subject: Remote boot w. PXE etc. Hello people ! I've been trying to get some nodes booting over PXE. I want the machines to load the kernel over the network and only use local disk for /tmp and swap. (I may even do that over the network as well) I tried the Intel PXE server from RedHat 7.[01], and had *some* success. Meaning, I can get the basic configuration to work, but I cannot configure the PXE server so that each node (identified by MAC or IP) gets it's own configuration. This is a requirement for my setup. I simply cannot find documentation. I looked at bpbatch as recommended in the remote-boot HOWTO, but got annoyed because they have a license that I have a hard time agreeing with, and there's no RPM etc. etc. Lazyness, pride and stubbornness is a great combo ;) Does anyone have docs for the Intel PXE server that's included with RedHat, or did you have success booting using some other PXE package, or something entirely different ? Thanks, -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From sgaudet at angstrommicro.com Thu May 17 11:52:51 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:18 2009 Subject: Several Questions In-Reply-To: <200105171819.f4HIJD922809@ascc.artsci.wustl.edu> References: <200105171819.f4HIJD922809@ascc.artsci.wustl.edu> Message-ID: <990125571.3b041e03cd808@localhost> Hello Jain, > 1. How could I make the monitor, which is pluged into a node, > show me what is going on in the node ? This requires a bit more specification... it doesn't really make sense. do you want to know what is going on on OTHER nodes? 
if you have a monitor plugged into a node, you can figure out what's going on the same way you would in general - Use a 'top' menu to see what processes are running and what they are utilizing. a packet sniffer will show you network activity. check the unix 'ps' command for greater detail. to see what OTHER machines in your cluster are doing, there are several options: #1. log in to each machine and see (not a very good thing to do, but it DOES work). #2. use a program such as VNC to bring up the desktop to each machine on another machine. also not great. #3. use the Simple Network Management Protocol (SNMP) to have a workstation monitor all of your machines at once. #4. use premade cluster management software, such as the open source, VACM (http://www.sourceforge.net) that uses SNMP and other tools to see what's going on with other machines. #5. Beowulf has it's BProc stuff, too. > 2. When I tried to use "beofdisk -q" on the server > to get some infomation, nothing appears. I waited for a few minutes > before I used ^c to stop the process. Should there be any prompt appear? > Should I wait for a longer time? check your logs, because it SHOULD issue a prompt when it's done. watch the 'top' menu to see if something's happening. > 3. When I use "beostatus", there comes out an error message, "segmentation > dumped". this probably explains your problem with beofdisk as well. When a segmentation fault occurs something is damaged or corrupted. I'd say first reinstall the beowulf software and make sure it's properly configured. try recompiling it (this is only applicable for open source. i forget if scyld's stuff is COMPLETELY open. i think it is) for your system. It sounds like your binaries might not be intact. Hope this helps. Stephen Gaudet Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From jakob at unthought.net Thu May 17 12:57:51 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:18 2009 Subject: Remote boot w. PXE etc. In-Reply-To: ; from mail@thomas-boehme.de on Thu, May 17, 2001 at 03:43:55PM -0400 References: Message-ID: <20010517215751.B24293@unthought.net> On Thu, May 17, 2001 at 03:43:55PM -0400, Thomas R Boehme wrote: > Hi, > > I would recommend using syslinux for the PXE remote boot process > It's a lot easier and works reliable for 32 nodes in our cluster. > It also gives you the possibility to configure every node independently. > Cool ! Do you know if it supports booting from local disk as well ? Ideally I want a menu to appear on the machines where the user can select "Boot real OS from boot server" and "Boot experimental stuff from HDD". I was just looking into http://www.kano.org.uk/projects/pxe/ but it seems to suffer from the same problem as the Intel PXE daemon, that all clients must have the same config. However, this code should be in a state where it's fixable - the Intel code was horrible beyond imagination. You can find it at: > ftp://ftp.kernel.org/pub/linux/utils/boot/syslinux/ > > Just make sure you follow the pxelinux.doc file in the archive. The most > important part is getting the right tftp-server (we use tftp-hpa, see the > docs). I'll check out syslinux now. >> > Hope that helps, > Thommy > I'll let the list know what happens (or, worst case, you will watch it on CNN... ;) Thanks a lot ! -- ................................................................ 
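To make the per-node configuration concrete: pxelinux looks in pxelinux.cfg/ for a file named after the client's IP address written in upper-case hex, falls back through progressively shorter prefixes, and finally uses "default"; versions that support LOCALBOOT can also hand control back to the local disk. A sketch with made-up addresses, paths and kernel arguments:

/tftpboot/pxelinux.0
/tftpboot/vmlinuz.node                # kernel served over tftp
/tftpboot/pxelinux.cfg/default        # used by any node without its own file
/tftpboot/pxelinux.cfg/C0A80105       # per-node file: 192.168.1.5 -> C0.A8.01.05

# contents of pxelinux.cfg/C0A80105 (the append line is illustrative only)
prompt 1
timeout 100
default net
label net
  kernel vmlinuz.node
  append root=/dev/nfs nfsroot=192.168.1.1:/export/node05 ip=dhcp
label local
  localboot 0

At the boot: prompt a user can type "local" to boot the experimental system on the hard disk, or let the timeout fall through to the network image.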
: jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From timothy.g.mattson at intel.com Thu May 17 18:07:47 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Wed Nov 25 01:01:18 2009 Subject: ATLAS vs. Intel Math Kernel Library Message-ID: Ole, The MKL librarires are worth the price since they are free. You can download them at no cost from the intel.developer web site. As for performance comparisons, its mixed. Sometimes Atlas is faster, sometimes MKL is faster. One factor in MKL's favor is its size --- MKL is a lot more than just the BLAS. It includes FFT's, LAPACK, and a very fast vector transcendental library. -Tim Mattson Intel Corp (not part of the MKL team, but close to them) -----Original Message----- From: Ole W. Saastad [mailto:ole@scali.no] Sent: Thursday, May 17, 2001 10:46 AM To: beowulf@beowulf.org Subject: ATLAS vs. Intel Math Kernel Library How does the performance of ATLAS generated libraries compare to the Intel Math Kernel Library? Intel claim P4 support, but so does ATLAS. I would also like too see FFT in the ATLAS project. Fast FFT are always in demand! Is it worth the extra investment for the Intel package ? Ole W Saastad Scali AS. _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Eugene.Leitl at lrz.uni-muenchen.de Fri May 18 17:14:17 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de) Date: Wed Nov 25 01:01:18 2009 Subject: [Fwd: AMD-based Beowulf cluster] Message-ID: <3B05BAD9.EC5CFDBF@lrz.uni-muenchen.de> -------- Original Message -------- From: "Jerome Baudry" Subject: AMD-based Beowulf cluster To: Hello, I saw your post on the CCL: ----- you wrote ------ If you have any decently sized athlon cluster, AMD is willing to get pretty involved. Several AMD guys came out to our location and we were the smallest installation in that list (Calvin College, 18 nodes). David --------------------- We are planning for the design of our Beowulf cluster, and we are considering AMD procs. What kind of support did you get from AMD ? Technical, financial, development ? Where you happy with what they provided (support-wise I mean) Thanks very much in advance, Sincerely Jerome ***************************************** Jerome Baudry, Ph.D. Research Scientist, Computational Chemistry TransTech Pharma, Inc. 4170 Mendenhall Oaks Pwky, Suite 110 High Point, NC, 27265 http://www.ttpharma.com jbaudry@ttpharma.com tel: (336) 841-0300 #120 fax: (336) 841-0310 From Eugene.Leitl at lrz.uni-muenchen.de Sat May 19 01:56:26 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de) Date: Wed Nov 25 01:01:18 2009 Subject: [Fwd: RE: Etherboot and other things.] Message-ID: <3B06353A.C01EBA76@lrz.uni-muenchen.de> -------- Original Message -------- From: "Larry G. Linde" Subject: RE: Etherboot and other things. To: Since there has been a large amount of discussion about many things relating to: A: fitting linuxbios into a 256k flash B: booting via etherboot from same flash. C: console mode to select different boot images. D: failover boot if one fails E: other misc things. We have a version of many of the items above for the SiS630 chipset. 
It has been tested on the matsonic M7308e And the procomp bst1b motherboards. We have decided to post our latest efforts to the net a bit early. It is not really linuxbios but it uses large parts of the linuxbios code tree with heavy mods. It not done yet but it does do the following things: 1: Fits in 256k easily. It could fit in 128k with some trimming. 2: boots from any ide device using a std Linux kernel. 3: boot via Ethernet via a port of etherboot 4: has a built in command interp based on forth 5: supports full ext2 access to/from the disk devices for kernel image selection and other info. 6: supports read AND WRITE to/from the flash device while running. You can copy a new flash image via the Ethernet and then burn it into the flash without removing the part from the MB. (note. Be careful with this) 7: inits the vga display and kbd devices for a interactive console. 8: reads and writes to/from the nvram to store boot params etc. 9: supports debug params and diag info to/from the serial port. You could also set the serial port to be the console for the boot params. 10: boots a std Linux kernel from disk or Ethernet faster that the display can power up normally. 11: auto size and setup memory dims 12: init and setup the cpu and bus speed params. 13: deal with power up on power fail modes. 14: we pulled in most everything a normal bios does prior to booting Linux. So you can use a standard kernel image. There are a few things we have not finished yet. 15: we also fully setup the pci bus with mem/io and ints and pass the info to the kernel in the std bios table format. It should be easy to take parts of the code and re-wedge it back into linuxbios. Our goal is a bit different than the standard linuxbios effort. We do not want the entire kernel in flash. We want the ability to boot from a disk or network ie: replace the bios but work better than an a bios for Linux. The code will be located starting Monday 5/21/2001 at: ftp://opensource.talkware.net/pub/tiara there will be a .tgz with the full source/build directory as well as a .bin image that can be put into a 256k flash if you just want to play. There are many things we are still working on and it's a long way from being done but it might be of use For some of the things that have been discussed on the list. have fun. If you have any questions you can send them to: tiara@talkware.net or post to the list we have several people that read it. -The Talkware tiara group. From binabina at mindspring.com Sat May 19 05:15:04 2001 From: binabina at mindspring.com (Zubin) Date: Wed Nov 25 01:01:18 2009 Subject: beowulf software administration and c/fortran compilers Message-ID: <000901c0e05d$55c61880$a55b56d1@zubinsabine> Hello, I am about to build a beowulf(4 nodes). I want to purchase the software as a commercial grade, "cluster kit". Has anyone purchased the Portland Group's "Cluster Development Kit"? Would you recommend for administration, and application development. Easy to install and maintain with tools? If Portland's not the best "Kit", can someone recommend an alternative? I want to focus on the development of my algorithms, and not the maintenance/setup of the beowulf. From newt at scyld.com Sat May 19 15:38:22 2001 From: newt at scyld.com (Daniel Ridge) Date: Wed Nov 25 01:01:18 2009 Subject: beowulf software administration and c/fortran compilers In-Reply-To: <000901c0e05d$55c61880$a55b56d1@zubinsabine> Message-ID: On Sat, 19 May 2001, Zubin wrote: > Hello, I am about to build a beowulf(4 nodes). 
I want to purchase the > software as a commercial grade, "cluster kit". Has anyone purchased the > Portland Group's "Cluster Development Kit"? Would you recommend for > administration, and application development. Easy to install and maintain > with tools? Scyld's Beowulf distribution is a commercial grade product that you can use directly from a $2.00 CD. It dramatically simplifies the routine administration of a Beowulf cluster. Support is available from Scyld Computing. More information available from www.scyld.com Regards, Dan Ridge Scyld Computing Corporation From timothy.g.mattson at intel.com Sun May 20 15:23:48 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Wed Nov 25 01:01:18 2009 Subject: beowulf software administration and c/fortran compilers Message-ID: I haven't used the PGI kit, but I know the people at PGI and trust them to do a good job. The advantage of the PGI kit is it includes their excellent compilers. The other options I know of are scyld (I think their web site is www.scyld.com) and OSCAR (www.openclustergroup.org). All of these work and should give you what you need. --Tim -----Original Message----- From: Zubin [mailto:binabina@mindspring.com] Sent: Saturday, May 19, 2001 5:15 AM To: beowulf@beowulf.org Subject: beowulf software administration and c/fortran compilers Hello, I am about to build a beowulf(4 nodes). I want to purchase the software as a commercial grade, "cluster kit". Has anyone purchased the Portland Group's "Cluster Development Kit"? Would you recommend for administration, and application development. Easy to install and maintain with tools? If Portland's not the best "Kit", can someone recommend an alternative? I want to focus on the development of my algorithms, and not the maintenance/setup of the beowulf. _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jared_hodge at iat.utexas.edu Mon May 21 07:13:16 2001 From: jared_hodge at iat.utexas.edu (Jared Hodge) Date: Wed Nov 25 01:01:18 2009 Subject: Instrumenting a parallel code Message-ID: <3B09227C.3383B68C@iat.utexas.edu> We are working on instrumenting a parallel finite element code we have been working on. We want to analyze how the system is being utilized, and produce reliable, and quantifiable results so that in the future we'll know if clusters designed for this code should be designed to optimize network bandwidth, CPU speed, memory size, or other variables. Basically what we want to do is measure performance degradation as each of these decreases (and vise versa). This won't give us absolute numbers for all variables, since there will obviously be plateaus in performance, but it's a start. Here's what I've got in mind for each of these areas, please let me know if you have any suggestions. Memory This one is pretty easy as far as memory size. We can just launch another application that will allocate a specific amount of memory and hold it (with swap off of course). I'm not sure if adjusting and measuring memory latency is feasible or too great a concern. Network We're writing a series of wrapper functions for the MPI calls that we are using that will time their execution. This will give us a good indication of the blocking nature of communication in the program. CPU usage I'm really not sure how we can decrease this one easily other than changing the bus multiplier in hardware. 
A timeline of CPU usage would at least give us a start (like capturing top's output), but this would alter the performance too (invasive performance monitor). We could just use the network measurements and assume that whenever a node is not communicating or blocked for communication, it's "computing", but that is definitely an over simplification. Any useful comments or suggestions would be appreciated. Thanks. -- Jared Hodge Institute for Advanced Technology The University of Texas at Austin 3925 W. Braker Lane, Suite 400 Austin, Texas 78759 Phone: 512-232-4460 Fax: 512-471-9096 Email: Jared_Hodge@iat.utexas.edu From klobej at union.edu Mon May 21 07:36:13 2001 From: klobej at union.edu (Joshua T. Klobe) Date: Wed Nov 25 01:01:18 2009 Subject: MPI or PVM enabled jre? Message-ID: As a junior in college trying to devise a useful and interesting senior project, I was wondering why it seems that there is no java support for MPI or PVM enviroments? Why has it stopped with c+? Any thoughts are more than welcome. -Josh Klobe From mail at thomas-boehme.de Mon May 21 07:55:24 2001 From: mail at thomas-boehme.de (Thomas R Boehme) Date: Wed Nov 25 01:01:18 2009 Subject: Instrumenting a parallel code Message-ID: Hi, You might want to take a look at Paradyn. It's a software package for instrumenting parallel code and analyzing performance. You can get it at: http://www.cs.wisc.edu/paradyn/ I used it only on serial codes so far, but it looks quite powerful. cu, Thommy -----Original Message----- From: Jared Hodge [mailto:jared_hodge@iat.utexas.edu] Sent: Monday, May 21, 2001 9:13 AM To: beowulf@beowulf.org Subject: Instrumenting a parallel code We are working on instrumenting a parallel finite element code we have been working on. We want to analyze how the system is being utilized, and produce reliable, and quantifiable results so that in the future we'll know if clusters designed for this code should be designed to optimize network bandwidth, CPU speed, memory size, or other variables. Basically what we want to do is measure performance degradation as each of these decreases (and vise versa). This won't give us absolute numbers for all variables, since there will obviously be plateaus in performance, but it's a start. Here's what I've got in mind for each of these areas, please let me know if you have any suggestions. Memory This one is pretty easy as far as memory size. We can just launch another application that will allocate a specific amount of memory and hold it (with swap off of course). I'm not sure if adjusting and measuring memory latency is feasible or too great a concern. Network We're writing a series of wrapper functions for the MPI calls that we are using that will time their execution. This will give us a good indication of the blocking nature of communication in the program. CPU usage I'm really not sure how we can decrease this one easily other than changing the bus multiplier in hardware. A timeline of CPU usage would at least give us a start (like capturing top's output), but this would alter the performance too (invasive performance monitor). We could just use the network measurements and assume that whenever a node is not communicating or blocked for communication, it's "computing", but that is definitely an over simplification. Any useful comments or suggestions would be appreciated. Thanks. -- Jared Hodge Institute for Advanced Technology The University of Texas at Austin 3925 W. 
Braker Lane, Suite 400 Austin, Texas 78759 Phone: 512-232-4460 Fax: 512-471-9096 Email: Jared_Hodge@iat.utexas.edu _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Mon May 21 08:32:14 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:18 2009 Subject: Instrumenting a parallel code In-Reply-To: <3B09227C.3383B68C@iat.utexas.edu> Message-ID: On Mon, 21 May 2001, Jared Hodge wrote: > We are working on instrumenting a parallel finite element code we have > been working on. We want to analyze how the system is being utilized, > and produce reliable, and quantifiable results so that in the future > we'll know if clusters designed for this code should be designed to > optimize network bandwidth, CPU speed, memory size, or other variables. > Basically what we want to do is measure performance degradation as each > of these decreases (and vise versa). This won't give us absolute > numbers for all variables, since there will obviously be plateaus in > performance, but it's a start. Here's what I've got in mind for each of > these areas, please let me know if you have any suggestions. > > Memory > This one is pretty easy as far as memory size. We can just launch > another application that will allocate a specific amount of memory and > hold it (with swap off of course). I'm not sure if adjusting and > measuring memory latency is feasible or too great a concern. > > > Network > We're writing a series of wrapper functions for the MPI calls that we > are using that will time their execution. This will give us a good > indication of the blocking nature of communication in the program. > > CPU usage > I'm really not sure how we can decrease this one easily other than > changing the bus multiplier in hardware. A timeline of CPU usage would > at least give us a start (like capturing top's output), but this would > alter the performance too (invasive performance monitor). We could just > use the network measurements and assume that whenever a node is not > communicating or blocked for communication, it's "computing", but that > is definitely an over simplification. > > Any useful comments or suggestions would be appreciated. Thanks. Two comments/suggestions: a) Look over lmbench (www.bitmover.com) as a microbenchmark basis for your measurements. It has tools to explicitly measure just about anything in your list above and more besides. It is used by Linus and the kernel developers to test various kernel subsystems, so it gives you a common basis for discussion with kernel folks should the need arise. It might not do everything you need -- in many cases you will be more interested in stream or cpu-rate like measures of performance that combine the effects of cpu speed and memory speed for certain tasks -- but it does a lot. b) Remember the profiling commands (compile with -pg and use gprof with gcc, for example). In a lot of cases profiling a simple run will immediately tell you whether the code is likely to be memory or CPU bound or bound by trancendental (library) speed or bound by network speed. At the very least you can see where it spends its time on average and then add some timing code to those routines and core loops to determine what subsystem(s) are the rate limiting bottlenecks. I actually think that your project is in an area where real "beowulf research" needs to occur. 
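(As a concrete illustration of the MPI timing wrappers quoted above -- a sketch, not anyone's actual code -- the standard MPI profiling interface lets you intercept a call such as MPI_Send, time it with MPI_Wtime, and pass it through to PMPI_Send; the counters and the report in MPI_Finalize are assumptions about how you might collect the results.)

    /* Illustrative MPI profiling-interface wrapper: times every MPI_Send. */
    #include <stdio.h>
    #include <mpi.h>

    static double send_seconds = 0.0;   /* accumulated time inside MPI_Send */
    static long   send_calls   = 0;

    int MPI_Send(void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        send_seconds += MPI_Wtime() - t0;
        send_calls++;
        return rc;
    }

    int MPI_Finalize(void)
    {
        fprintf(stderr, "MPI_Send: %ld calls, %.3f s total\n",
                send_calls, send_seconds);
        return PMPI_Finalize();
    }

Linking such a wrapper ahead of the MPI library gives per-call counts and times without touching the application source.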
I have this vision of a suite of microbenchmarks built into a kernel module and accessed via /proc/microbench/whatever that provide any microbenchmark in the suite on demand (or perhaps more simply an ordinary microbenchmark generating program that runs at an appropriate runlevel at boot and stores all the results in e.g. /var/microbench/whatever files). In either or both cases I'd expect that both the latest measurement and a running average with full statistics over invokations of the microbenchmark program would be provided. Either way, the microbenchmark results for a given system would become a permanent part of the commonly available system profile that is available to ALL programs and programmers after the first bootup. System comparison and systems/beowulf/software engineering would all be immeasurably enhanced -- one could write programs that autotune to at least a first order approximation off of this microbenchmark data, and for many folks and applications this would be both a large improvement over flat untuned code and "enough". More complex/critical programs or libraries could refine the first order approximation via an ATLAS-like feedback process. Hope this helps, rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From RSchilling at affiliatedhealth.org Mon May 21 09:44:00 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed Nov 25 01:01:18 2009 Subject: MPI or PVM enabled jre? Message-ID: <51FCCCF0C130D211BE550008C724149E01165607@mail1.affiliatedhealth.org> Actually if you look at the PVM web site you will see a version of PVM written in Java. Not sure if there is a version of Java for MPI. Since the focus with PVM and MPI is on message passing, it's relatively to implement the same functionality in Java using simple sockets and datagrams. What I have not seen yet is a Java Virtual Machine that runs as a distributed application. Now that would be interesting. Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org Affiliated Health Services Information Systems 1971 Highway 20 Mount Vernon, WA USA > -----Original Message----- > From: Joshua T. Klobe [mailto:klobej@union.edu] > Sent: Monday, May 21, 2001 7:36 AM > To: beowulf@beowulf.org > Subject: MPI or PVM enabled jre? > > > As a junior in college trying to devise a useful and > interesting senior > project, I was wondering why it seems that there is no java > support for > MPI or PVM enviroments? Why has it stopped with c+? Any thoughts are > more than welcome. > -Josh Klobe > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > From germ at research.att.com Tue May 8 10:35:26 2001 From: germ at research.att.com (Mitch Germansky) Date: Wed Nov 25 01:01:18 2009 Subject: scyld scsi support References: Message-ID: <3AF82E5E.E708FC6D@research.att.com> daniel, thx much for getting back in touch. we wanted to use scsi for the master. we got around this by installing an IDE drive. what you explained below is how we dealt with the slave nodes. but it didn't appear that there was a way to install the master on scsi off of the scyld cdrom. tell me otherwise. thx for your help! 
--mitch Daniel Ridge wrote: > Mitch, > > > just about to get started with scyld cluster (once i receive the cdrom > > from linuxcentral). > > recommended hardware lists IDE, but no SCSI. > > > > is SCSI supported for the boot disk? > > SCSI usually works just fine with the Scyld Beowulf software -- although > the level of 'support' provided with the $2.00 CD is 'label side up'. > > I'm not sure what you mean by the 'boot disk'. If you mean the CD as > used to install the master, then yes. If you mean the CD (or node floppy) > as used to boot the nodes, then sort-of. > > The Scyld node boot process has a number of different phases that come > into play here. The first kernel we boot doesn't know anything about SCSI > -- cheer up -- it doesn't know anything about IDE either. All it knows how > to do is grab a kernel over the network and jump to it (via 2-kernel > monte). > > The second-phase kernel can (and by default does) support SCSI. You can > also use tools like 'insmod' and 'modprobe' to plug new modules into node > kernels after your nodes are up. Under scyld, 'insmod' and 'modprobe' take > the additional argument '--node ' and use this as a target kernel to > insert modules into. > > Regards, > Dan Ridge > Scyld Computing Corporation From tbecker at linuxnetworx.com Thu May 10 09:42:00 2001 From: tbecker at linuxnetworx.com (Ted Becker) Date: Wed Nov 25 01:01:18 2009 Subject: Fortran 90 Message-ID: Could anyone tell me what bufferin and bufferout is for F90? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010510/4defd49a/attachment.html From mprinkey at aeolusresearch.com Mon May 21 08:09:21 2001 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed Nov 25 01:01:18 2009 Subject: MPI or PVM enabled jre? References: Message-ID: <3B092FA1.F851DB65@aeolusresearch.com> I certainly wouldn't want to speak for the entire community, but I think that most of us are just now crawling out of the FORTRAN days. The next step is to C, and not even to C++. Experience has borne out the performance advantages of "low-tech" languages like FORTRAN and C for intense number crunching. The performance of object-oriented languages in general and Java in particular are suspect for the types of problems that typically require high-performance parallel hardware. Mike Prinkey Aeolus Research, Inc. "Joshua T. Klobe" wrote: > > As a junior in college trying to devise a useful and interesting senior > project, I was wondering why it seems that there is no java support for > MPI or PVM enviroments? Why has it stopped with c+? Any thoughts are > more than welcome. > -Josh Klobe > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From todd.burch at amd.com Thu May 17 12:04:03 2001 From: todd.burch at amd.com (todd.burch@amd.com) Date: Wed Nov 25 01:01:18 2009 Subject: AMD Supercomputer News Release Message-ID: <858788618A93D111B45900805F85267A02FEEC22@caexmta3.amd.com> Please forward all inquiries to me, as I can help refer them to an AMD field sales rep in their respective area. Regards, Todd R. 
Burch AMD Public Relations -----Original Message----- From: David Vos [mailto:dvos12@calvin.edu] Sent: Thursday, May 17, 2001 12:00 PM To: Burch, Todd Subject: Re: AMD Supercomputer News Release Someone posted a link to this onto beowulf@beowulf.org, and I made some mention about how AMD even got involved in our small cluster. I imediately have received a number of emails asking questions. It might be nice for you or someone at AMD to post a message to beowulf@beowulf.org telling people who to contact at your company about building Athlon clusters. One person who emailed me said something about 120+ node installations, so I think this might be worth your time. David On Thu, 17 May 2001 todd.burch@amd.com wrote: > > The following news release crossed Business Wire Thursday, May 17, 2001 at > 12:01 AM EDT. > -------------------------------------------------------- > > Contact: > Todd Burch > AMD Public Relations > (408) 749-4581 > todd.burch@amd.com > or > Scott Carroll > AMD Public Relations > (512) 602-8483 > scott.carroll@amd.com > > > > > AMD ATHLON(tm) PROCESSOR GAINING GLOBAL RECOGNITION AS SUPERCOMPUTING > SUPERSTAR > > > - NASA, National Science Foundation-funded university research programs in > the U.S., universities in Hong Kong and Japan employ AMD Athlon(tm) > processor-based supercomputers - > > SUNNYVALE, CA -MAY 17, 2001-AMD today announced that five more academic > institutions have each installed new supercomputers using the award-winning > AMD AthlonTM processor. Cited for their powerful performance, scalability, > and flexibility to expand in a cluster environment, a series of AMD Athlon > processor-based supercomputers have been employed for research programs at > the Hong Kong University of Science & Technology, the Tokyo Institute of > Technology, a National Aeronautics and Space Administration (NASA) funded > program at the University of California at Santa Cruz (UCSC), as well as > National Science Foundation (NSF) funded programs at Western Michigan > University and Calvin College. These wins demonstrate how the AMD Athlon > processor is continuing to expand its reputation as a powerful, innovative > and reliable solution for supercomputing platforms used for scientific > research. > > "This once again proves our AMD Athlon processor is a great choice for > cutting-edge computer platforms targeted for computation-intensive > applications created by academic researchers," said Ed Ellett, > vice-president of Workstation and Server Marketing for AMD. "As the need > for increased performance and bandwidth continues, we are committed to > developing more powerful processors to meet that challenge. We eagerly look > forward to supporting critical research projects with leading academic > institutions around the world." > > The Hong Kong University of Science & Technology, one of the most > prestigious higher education institutions in Hong Kong, has developed a > supercomputer featuring 80 AMD Athlon processors. > > "This AMD processor-based cluster provides a powerful tool for the > advancement of scientific research," says Associate Professor P W Leung of > HKUST's Physics Department. "We can perform realistic simulations, design > advanced composite materials through accurate modeling, and also tackle the > most challenging problems in modern material physics involving complex > materials where the electronic states are strongly correlated." 
> > The Tokyo Institute of Technology, one of the most prestigious higher > education institutions in Japan, has built the PRESTO III, a 78 AMD Athlon > processor-based cluster that will be employed at the Matsuoka Laboratory of > the Global Scientific Information and Computing Center & Department of > Mathematical and Computing Sciences. > > "The objective of the PRESTO series of Grid clusters project is to enable > cost-effective solutions to empower the computational Grid, investigate > effective software used for commodity clustering, and conduct simulation and > application studies on the Grid for various scientific applications such as > operations research, high energy physics, and neuroscience," said Professor > Satoshi Matsuoka of the Tokyo Institute of Technology. "We want to thank > the sponsors of Japan's national PRESTO program of the Japan Science and > Technology Corporation (JST), and AMD for its processor technology." > > The National Aeronautics and Space Administration (NASA) has helped fund a > 32 AMD Athlon processor-based cluster node supercomputer located at UCSC. > The UCSC supercomputer, developed and built by Racksaver, Inc. through the > assistance of Dolphin Interconnect, will be used to study collisional > processes in the solar system, and run simulations of planetary dynamos, > such as the one responsible for Earth's magnetic field. > > "The university's Earth Sciences, Astronomy and Physics departments now have > the ability to solve complex research problems 24 hours a day, seven days a > week on our own local research cluster," said Erik Asphaug, UCSC principal > investigator of the new 32-node research supercomputer. "Also, we can now > create, archive, and visualize our data locally, and this removes the data > bottlenecks and enhances our student's educational environment." > > Another 32-node supercomputer has been installed at the ParInt Research > Group at Western Michigan University under an NSF-funded grant. "Very early > on in our purchasing decision process we decided to go with AMD Athlon > processors, for their performance and pricing, and we have not been > disappointed," said Elise de Doncker, Professor in the Computer Science > department at Western Michigan University. "The cluster has been very > reliable and invaluable to our research efforts in parallel numerical > integration, and for class projects in various advanced computer science > courses." > "The Department of Computer Science at Calvin College is committed to > providing its students with hands-on experience using cutting-edge > technologies, including high performance computing," said Joel Adams, > Professor of Computer Science, Calvin College in Grand Rapids, Michigan, the > location of an 18 AMD Athlon processor-based cluster. "The cluster will > also greatly benefit our faculty researchers in their individual research > programs. We are grateful to the National Science Foundation, NFP > Enterprises, and AMD for their help in making this a successful project." > > This trend follows AMD Athlon processor-based supercomputers already > installed in the University of Delaware, the University of Kentucky, and the > University of Utah, and reflects a growing number of universities obtaining > and benefiting from the use of powerful supercomputer systems based on AMD > processor technology. 
Each of these systems employ the Beowulf Cluster > design architecture, which involves connecting each processor in parallel to > maximize speed and processing power while providing inter-communications > between the processors and compute nodes, and use a Linux-based operating > system. > > About AMD > > AMD is a global supplier of integrated circuits for the personal and > networked computer and communications markets with manufacturing facilities > in the United States, Europe, Japan, and Asia. AMD, a Fortune 500 and > Standard & Poor's 500 company, produces microprocessors, flash memory > devices, and support circuitry for communications and networking > applications. Founded in 1969 and based in Sunnyvale, California, AMD had > revenues of US$4.6 billion in 2000. (NYSE: AMD). > > Visit AMD on the Web > > For more news and product information, please visit our virtual pressroom at > . Additional press > releases are available at > > -30- > > AMD, the AMD logo, AMD Athlon, and combinations thereof, are trademarks of > Advanced Micro Devices, Inc. Other product names are for informational > purposes only and may be trademarks of their respective companies. > From dmarkh at cfl.rr.com Tue May 15 01:49:20 2001 From: dmarkh at cfl.rr.com (Mark Hounschell) Date: Wed Nov 25 01:01:18 2009 Subject: [SLE] Alternative to 'hwinfo' for scsi References: <3B00E439.CA1992B8@eso.org> Message-ID: <3B00ED90.BDBB1854@cfl.rr.com> Stephen Johnston wrote: > > Hi > > I have a 3ware 6800 card with eight ide drives on it, i wanted to use > hwinfo to get some info about the disks, but this is for IDE drives and > the card makes the drives appear like scsi to the o/s > > Is there a scsi alternative? > > TIA > > S. > "kcmshell scsi" from an xterm -- Mark Hounschell dmarkh@cfl.rr.com From brian at thefinleys.com Mon May 14 19:05:13 2001 From: brian at thefinleys.com (Brian Elliott Finley) Date: Wed Nov 25 01:01:18 2009 Subject: [SystemImager-discuss] VA - System Imager In-Reply-To: ; from JParker@coinstar.com on Thu, May 10, 2001 at 09:02:15AM -0700 References: Message-ID: <20010514210513.A1778@thefinleys.com> Thus spake JParker@coinstar.com (JParker@coinstar.com): > G'Day ! > > I am having problems with VA-SystemImager (ver 1.4.0). It seems that I > can not get my remote machine to retrieve the kernel and reboot as per > step 6 and 7 of the HOWTO. > > I do not have a floppy or cdrom attached to the machine, so I am using the > rsync/updateclient -autoinstall method. > > The problem is that when I try to run "updateclient -autoinstall -server > bhead -c eth0" it crashes because it can not find the modules > Getopt::Long, etc. A quick search of the hard drive on the local node > confirms it is not a part of the standard Perl-5.005 debian package, but > it is located on my head server bhead. But it is part of the standard Perl-5.005 debian package. [bfinley@dr-jeckyl:~] $ dpkg --search /usr/lib/perl5/5.005/Getopt/Long.pm perl-5.005: /usr/lib/perl5/5.005/Getopt/Long.pm Perhaps it was deleted somehow? Maybe try doing a: "$ apt-get install perl-5.005 --reinstall" > I believe the cause of my confusion may be the documentation. As I read > step 6, where you prepare the boot media, the example on how to prepare a > remote machine's local hd is exactly the same as the instructions for step > 7, where the actually transfer of the image takes place. > > Do you need to "compile" the perl script on the headnode prior to > transfering to the remote machine ? If so what is the command to do this > ? 
> > BTW, all my remote nodes have a base Debian install with basic networking > installed. > > Another question. During the step 5, I did not have systemImager write to > the /etc/hosts file. All my nodes already have the correct network > settings. Is this a problem ? You should at least copy this file to /tftpboot/systemimager on the imageserver as it is used in part of the autoinstall process. If you do these things, then you should be able to run: "updateclient -autoinstall -server bhead -c eth0" and everything should work fine. If it still doesn't work, mail the list again. Cheers, -Brian > > cheers, > Jim Parker > > Sailboat racing is not a matter of life and death .... It is far more > important than that !!! -- ------------------------------------------------------- Brian Elliott Finley VA Linux http://valinux.com/ http://thefinleys.com/ phone: 972.447.9563 http://systemimager.org/ phax: 801.912.6057 CSA, C2000, CNE, CLSE, MCP, and Certifiable Linux Nut ------------------------------------------------------- From mprinkey at aeolusresearch.com Thu May 17 07:41:29 2001 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed Nov 25 01:01:18 2009 Subject: 1U P4 Systems Message-ID: <3B03E319.A01EE9B1@aeolusresearch.com> Hello, I have seen a few clusters assembled with 1U AMD Athlon systems. Has anyone seen/built/melted down a 1U P4 system? Cooling is of course an issue, as it is with the AMDs. Another is the availability of low profile power supplies with the extra P4 power connector and sufficient wattage rating. Thanks, Mike Prinkey Aeolus Research, Inc. From josh at lnxi.com Wed May 16 13:16:03 2001 From: josh at lnxi.com (joshua harr) Date: Wed Nov 25 01:01:18 2009 Subject: two kernel monte Message-ID: <3B02E003.AD10520A@lnxi.com> Hi Patrick, Eric Biederman's kexec() patch for 2.4 will fit your bill I think. You'll also need his mkElfImage utility. -- Joshua Harr Linux NetworX From jeffrey.b.layton at lmco.com Mon May 14 05:18:12 2001 From: jeffrey.b.layton at lmco.com (Jeffrey B Layton) Date: Wed Nov 25 01:01:18 2009 Subject: Disk reliability (Was: Node cloning) References: <3AD1C348.638EBC4E@icase.edu> Message-ID: <3AFFCD04.BD5DBA78@lmco.com> Hello, I hate to dredge up this topic again, but ... . I've got a machine with an IBM drive that is giving me the following errors, kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC } as discussed in previous emails on the list. I followed the pointers that Josip gave and ran the IBM code on the drive. It said the drive was fine. However, I'm still getting the same error messages. Anybody care to suggest anything else to look at? Perhaps cabling or a new motherboard (it's an Abit board). TIA, Jeff Josip Loncaric wrote: > Thanks to several constructive responses, the following picture emerges: > > (1) Modern IDE drives can automatically remap a certain number of bad > blocks. While they are doing this correctly, the OS should not even see > a bad block. > > (2) However, the drive's capacity to do this is limited to 256 bad > blocks or so. If more bad blocks exist, then the OS will start to see > them. To recover from this without replacing the hard drive, one can > detect and map out the bad blocks using 'e2fsck -c ...' and 'mkswap -c > ...' commands. Obviously, the partition where this is being done should > not be in use (turn swap off first, unmount the file system or reboot > after doing "echo '-f -c' >/fsckoptions"). 
> > (3) In general, IDE cables should be at most 18" long with both ends > plugged in (no stubs), and preferably serving only one (master) drive. > > For IBM drives (IDE or SCSI), one can download and use the Drive Fitness > Test utility (see > http://www.storage.ibm.com/techsup/hddtech/welcome.htm). This program > can diagnose typical problems with hard drives. In many cases, bad > blocks can be 'healed' by erasing the drive using this utility (back up > your data first, and be prepared for the 'Erase Disk' to take an hour or > more). If that fails and your drive is under warranty, the drive ought > to be replaced. > > For older existing drives (in less critical applications, e.g. to boot > Beowulf client nodes where the same data is mirrored by other nodes) > mapping out bad blocks as needed is probably adequate. > > Finally, the existing Linux S.M.A.R.T. utilities apparently do not > handle every SMART drive correctly. Use with caution. > > Sincerely, > Josip > > -- > Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From g.gorman at ic.ac.uk Mon May 14 04:48:53 2001 From: g.gorman at ic.ac.uk (Gerard Gorman) Date: Wed Nov 25 01:01:18 2009 Subject: p4_error: net_recv read: References: Message-ID: <3AFFC625.221F7D8E@ic.ac.uk> Hi, I'm having this problem on our cluster while using MPICH1.2 (alphas running osf1 connected via a switch): rm_l_5_3390: p4_error: net_recv read: probable EOF on socket: 1 I have been searching the archives/net for an insight to the problem but all I have found is people reporting the same problem (under linux). The problem arises only when I run on some number of processors greater than 4. The processes are all *reading* the same file which had been NFS mounted across the nodes (I have included the usual perror checks). Has anyone experienced similar problems/know what I should be trying to fix? All help appreciated, g ---------------------------------------------------------- Gerard Gorman (PhD Student) Applied Modelling and Computation Group T. H. Huxley School Imperial College Prince Consort Road Tel. 00 44 (0)207 594 9323 London SW7 2BP Fax. 00 44 (0)207 594 9321 U.K. o~o A good slogan beats a good solution. -----------------------w-v-w------------------------------ From Jon.Tegner at wiglaf.se Mon May 14 11:45:22 2001 From: Jon.Tegner at wiglaf.se (Jon Tegner) Date: Wed Nov 25 01:01:18 2009 Subject: Problem with va-systemimager Message-ID: <3B0027C2.4D98454F@wiglaf.se> Are about to set up a cluster and figured systemimager would be a good way (have used kickstart previously - is there a consensus of which method is "best"?). However, when testing on a fresh system Partition Magic detects some kind of error: "Partition Magic has detected an error 116 on the partition starting at sector 17157420 at disk 1. The starting LBA value is 17157420 and the CHS value is 16450559. The LBA and the CHS values must be equal, Partition Magic has verified that the LBA value is correct." 
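(For reference, and not from the original report: the two numbers Partition Magic is comparing are the same partition start expressed two ways, once as an LBA sector number and once in cylinder/head/sector form. They are related through the drive geometry by the usual conversion sketched below; the heads and sectors-per-track values are whatever the BIOS reports, which is why the two fields can drift apart when a disk image moves between machines or geometries.)

    /* Conventional CHS -> LBA conversion (illustrative; the heads and
     * sectors_per_track values come from the BIOS/drive geometry). */
    unsigned long chs_to_lba(unsigned long c, unsigned long h, unsigned long s,
                             unsigned long heads, unsigned long sectors_per_track)
    {
        return (c * heads + h) * sectors_per_track + (s - 1);
    }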
Partition Magic can even fix this error, but I don't want to to load in Partition Magic on all nodes (would take too long time), so I was wondering if there is another easy way to fix this problem, or preventing it from occurring in the first place (nothing seems to be wrong when I make the image). Regards, /jon From jgowdy at home.com Fri May 11 08:34:14 2001 From: jgowdy at home.com (Jeremiah Gowdy) Date: Wed Nov 25 01:01:18 2009 Subject: interesting Athlon/P4 discussion from FreeBSD-Q-l In-Reply-To: <3AFBEDFA.20503@icantbelieveimdoingthis.com> Message-ID: <000701c0da2f$d5b634a0$03e2cbd8@sherline.net> > the P4 has an awesome combination of hardware prefetcher, > fast FSB, and dram that keeps up with it. for code that > needs bandwidth, this is very attractive. and it's dramatically > faster than anything else in the ia32 world: 1.6 GB/s versus > at most around .8 GB/s for even PC2100 DDR systems (at least > so far - I'm hopeful that DDR can manage around 1.2 GB/s when > tuned, and if the next-gen Athlon contains hardware prefetch.) According to InQuest's article: http://www.inqst.com/articles/p4bandwidth/p4bandwidthmain.htm The Culprit - Longer Burst Length Since the P4 is not getting any more work done than the P3 in this application, then its excess bandwidth demand is probably just extraneous, meaningless bus noise. If so, this is a poor marketing justification for higher bandwidth. The P4 uses a 128-byte sectored cache line. This means that most external burst accesses will be 128-bytes long, though some can be abbreviated to 64-bytes long (perhaps code fetches, some write backs or cache misses to the second sector). By the way, this type of long sectored cache design can negatively impact cache-hit rates. If 40% of external bus accesses are 64-bytes, then perhaps 40% of the cache lines are only using 64 of the 128-bytes available per line. This would mean that up to 20% of cache memory is empty (unused, invalid or unallocated). This would negatively impact P4 cache hit rates and thus, performance. And as for the faster DRAM, you are of course referring to RDRAM memory. I've yet to see a benchmark that shows RDRAM actually putting out the bandwidth it claims in real world applications (or most benchmarks). While the memory bandwidth of the P4 with RDRAM is, on paper, faster than anything else in the IA-32 world, in almost every benchmark I've ever seen in which the benchmarking program wasn't specially optimized for SSE2, the Athlon 1.3 GHz has kicked the absolute crap out of the P4 1.5 GHz. What good is all that memory bandwidth if the processor can't stand up to real world applications ? I could make a cpu/memory/chipset combo and say "if you use it this way, it's the fastest computer ever created", but the people are saying "But all of our applications don't do it that way !" Rambus proponents (mostly stockholders and people who's bought a P4 for an outrageous price) always make claims about how great it would be if only applications were optimized for it. "It's ahead of its time" they say, over time, applications will be optimized for it. Fine. Someday, when things actually run FASTER on your 400mhz bus and your 1.7ghz cpu, maybe the rest of us will consider buying one, IF it doesn't cost the price of a small car. But at this point, show me something, anything, that makes it worth it to spend twice as much on a P4 with RDRAM than an Athlon with DDR SDRAM. However, I don't think that day will ever actually come. 
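(An aside on the InQuest figures quoted earlier: the "up to 20%" number is just arithmetic. If roughly 40% of bus accesses are 64-byte fills into 128-byte sectored lines, those lines are half used, so about 0.40 x 0.5 = 20% of the cache capacity can end up unused.)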
Rambus is going down the toilet now that they're losing these lawsuits. Intel is disgusted with the whole Rambus affair. DDR SDRAM is FAR cheaper, and is sold by more than one vendor. The 64bit CPU jihad is coming soon, so the Athlon and the P4's days are numbered anyway. Sure, Intel has another P4 core on the books, and AMD *had* the Mustang on the books, but really those processors are meaningless to the high end market once the 64bit cpus come out. Optimizing current 32bit applications especially for the P4 and its RDRAM is nonsensical. All of the Rambus coulda/shoulda/woulda/if-only-this/if-only-that means nothing. And perhaps in the right place and the right time, RDRAM is a superior product. But the market doesn't always favor a superior product. Think of Rambus as Sony. Think of the DDR SDRAM vendors as the VHS vendors. Sony had A LOT more respect, a lot more money, a lot more everything, and yet they couldn't beat the cheaper VHS. It's the classic example of the proprietary expensive best quality model vs the standardized cheaper not as good quality model (but did I mention cheaper?), only in this case, it's not even proven that RDRAM IS the best. So if companies who have a demonstrably better product can't win that fight, how can Rambus and the P4, when they AREN'T demonstrably better ? I won't even begin to get into the Macintosh/Motorola vs Windows/Intel battle. Short and simple: The superior product doesn't always win, and the P4+RDRAM have yet to prove beyond a doubt that they are superior. Just think about it. _______________________________ Jeremiah Gowdy - IT Manager Sherline Products Inc 3235 Executive Ridge Vista CA 92083-8527 Sales: 1-800-541-0735 International: (760) 727-5857 Fax: (760) 727-7857 _______________________________ From mprinkey at aeolusresearch.com Fri May 11 10:48:14 2001 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed Nov 25 01:01:18 2009 Subject: interesting Athlon/P4 discussion from FreeBSD-Q-l References: <3AFBEDFA.20503@icantbelieveimdoingthis.com> Message-ID: <3AFC25DE.CB76ED21@aeolusresearch.com> I have been watching the Athlon/P4 issue from a safe distance for the past few months, but at the request of a client, I finally bought Athlon and P4 test systems. The target application is a commercial CFD code. When I did the benchmarking, I was astonished. The 1.5-GHz P4/RDRAM system simply ran rings around everything else, including a fairly new Compaq ES40. Numbers look something like this: ES40, 500 MHz 1.70 iterations per minute Athlon, 1.33 GHz/DDR 1.85 P3 933 MHz/Serverworks 1.10 P4 1.5 GHz/i850/RDRAM 3.00 This was genuinely unexpected. Needless to say, I have a new found respect for the P4 and RDRAM, at least for this application. And as has been pointed out, we are now under $2000 for a rackmount 1.7-GHz P4 node with 1 GB RDRAM. Mike Prinkey Aeolus Research, Inc. "Arthur H. Edwards,1,505-853-6042,505-256-0834" wrote: > > Mark Hahn wrote: > > >> Cant vouch for correctness, but seems to have some explanations/info that > >> werent mentioned here. Feel free to rebut the content of course. > > > > > > the P4 has an awesome combination of hardware prefetcher, > > fast FSB, and dram that keeps up with it. for code that > > needs bandwidth, this is very attractive. 
and it's dramatically > > faster than anything else in the ia32 world: 1.6 GB/s versus > > at most around .8 GB/s for even PC2100 DDR systems (at least > > so far - I'm hopeful that DDR can manage around 1.2 GB/s when > > tuned, and if the next-gen Athlon contains hardware prefetch.) > > > > but it's also true that most code, even a lot of computational code, > > is not primarily dram-bandwidth-bound. the P4 is not exceptional > > when running real code in-cache; this is why on most benchmarks > > other than Stream, recent Athlons beat P4's quite handily. > > > > and that's why AMD is having such an awsome time in the market now, > > and why Intel is cutting prices so dramatically on the P4. > > > > regards, mark hahn. > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > Is anyone doing anectdotal benchmarks with real applications? We are > doing DFT calculations using a local basis code that is highly optimized > on serial machines. I am working on pentium III's, athlon's, and alpha > machines. I find that my 600 MHz athlon actually beats a 933 MHz Pentium > III. Also, both of these PC platforms are competitive with the alpha chips. > > I'm much more interested in benchmarks on, say, Gaussian 98, GAMESS, and > other codes. Any Athlon/P4 comparisons would be very interesting. > > Art Edwards > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From fatica at ctr-sgi1.stanford.edu Fri May 11 14:46:58 2001 From: fatica at ctr-sgi1.stanford.edu (Massimiliano Fatica) Date: Wed Nov 25 01:01:18 2009 Subject: KVM Switch References: Message-ID: <3AFC5DD2.30B8F3C6@ctr.stanford.edu> You can get a Belkin Omniview Pro 16 ports (that can be cascaded up to 256 ports) for around $450-500. It is keyboard switchable, rack mountable and works with Linux. We are using 3 of them in our cluster and they work very well. There is also an 8 port model for $300. Massimiliano "Robert G. Brown" wrote: > > On Fri, 11 May 2001, Raghubhushan Pasupathy wrote: > > > Folks, > > > > I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf > > Cluster. Can anyone give me some directions on this since I am completely > > lost. What specs, brand etc. do you suggest? > > Why? KVM's tend to be very expensive (I know, I have a Raritan which is > an excellent choice and even keyboard-switchable BUT which costs a whole > lot -- good KVM's can cost $100 per port or even more). I also have a > really cheapo four position mechanical KVM switch that works for > keyboard and video but cannot switch PS2 mice. It degrades video > quality a bit but is fine for my simple home beowulf, where I have two > or three systems that do a bit of server stuff and hence need a console. 
> > Nowadays a cluster node can run anywhere from totally headless (Scyld, I > believe, is happy enough with no head at all), headless but a serial > port console (a VERY cheap option that is probably adequate for > debugging a dying boot and which can be switched with a cheap serial > switch or managed via a still not very expensive serial port server), > headless but with a cheap video card that one plugs into a monitor one > time (to set the bios and monitor the original install) and then never > again, headed but no X (X plus a GUI is quite expensive in memory and > moderately expensive in wasted CPU), and headed running X. I now have a > $3000 KVM switch that is more useful for switching between servers > (where one really does sometimes need access to a console) than between > beowulf nodes, which one generally accesses over the net anyway. > > I personally generally go with cheap S3 cards (or any sort of onboard > video if the motherboard happens to have it) and no X just to make it a > bit faster to set up the systems and debug them if/when they break. The > one hassle of running a system with no video card at all is that one > often has to put one in long enough to set up the bios, in particular to > tell the bios to run without a video card without complaining (which > most BIOS's do these days if you ask nicely). Is the time saved worth > the $30 the card costs per system? Don't know, but it's close... > > rgb > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Dr. Massimiliano Fatica Center for Turbulence Research Stanford University Ph: 650-723-9602 Fax: 650-723-9617 From gerry at cs.tamu.edu Sat May 12 05:57:15 2001 From: gerry at cs.tamu.edu (Gerry Creager N5JXS) Date: Wed Nov 25 01:01:18 2009 Subject: KVM Switch References: <20010511191537.A902@better.net> Message-ID: <3AFD332B.949D3222@cs.tamu.edu> We/ve been using the Belkin switch in our operations, both for my cluster and for a lot of other departmental stacks of PCs. They seem to be just about bulletproof, but are certainly not the cheapeest ones out there. gerry -- William Park wrote: > > On Fri, May 11, 2001 at 05:35:29PM -0400, Robert G. Brown wrote: > > On Fri, 11 May 2001, Raghubhushan Pasupathy wrote: > > > > > Folks, > > > > > > I am looking to buy a KVM switch for an 8-node(16 processor) Beowulf > > > Cluster. Can anyone give me some directions on this since I am completely > > > lost. What specs, brand etc. do you suggest? > > I remember seeing few ad in "Linux Journal". From my experience, I only > needed K/V/M at the beginning; after setup, I just use ethernet. > > > > > Why? KVM's tend to be very expensive (I know, I have a Raritan which is > > an excellent choice and even keyboard-switchable BUT which costs a whole > > lot -- good KVM's can cost $100 per port or even more). I also have a > > really cheapo four position mechanical KVM switch that works for > > keyboard and video but cannot switch PS2 mice. It degrades video > > Yes, I found this out the hard way. Abit VP6 hangs if you unplug/plug > PS/2 mouse. 
-- Gerry Creager -- gerry@cs.tamu.edu Network Engineering |Research focusing on Academy for Advanced Telecommunications |Satellite Geodesy and and Learning Technologies |Geodetic Control Texas A&M University 979.458.4020 (Phone) -- 979.847.8578 (Fax) From jdc at uwo.ca Wed May 9 18:14:40 2001 From: jdc at uwo.ca (Dan Christensen) Date: Wed Nov 25 01:01:18 2009 Subject: Cooling experts out there, some help please In-Reply-To: ('s message of "Tue, 8 May 2001 20:53:05 -0700 (PDT)") References: Message-ID: <87k83qattr.fsf@uwo.ca> writes: > I'd add i2c to one of the systems and start a cpu > temperature tests...over several days/weeks... > - an idle cpu will give you "ambient cpu temperature" I just tried lm-sensors/i2c on a dual processor PIII system and it reports that the cpu temps are +2C when idle and around 16C when loaded. Obviously these are wrong. The question is, are they usually wrong by a constant amount? So can I assume that my cpu's are at something like 32C when idle and 48C when loaded? I want to know this because I want to disconnect some of the eight fans that the retailer put into this box! Seems a bit excessive to me (and noisy) but I don't want to mess with it without knowing for sure whether they are needed. Anyone have any good recommendations for quiet fans? Or fans that only switch on when things get hot? Dan From ouyangl at umkc.edu Wed May 9 10:34:17 2001 From: ouyangl at umkc.edu (Ouyang, Lizhi) Date: Wed Nov 25 01:01:18 2009 Subject: Booting from ATA100 Raid on an ABit KT7A-Raid Mobo (Stephen J ohnston) Message-ID: <95A711A70065D111B58C00609451555C0AEA1BEF@UMKC-MAIL02> I was able to get the PROMISE FASTTRAK 100 to work for 2.2.x kernels although the binary modules from promise did not support kernels other than redhat 6.2/7.0. I am not aware of any HPT370 drivers. The ABIT-KT7A-RAID won't boot because the bios will only boot in the raid mode. You may want to try the linux driver from Adaptec 1200s raid card. They are in fact based on HPT370 chip set. But as you know, HPT370 are merely software raid that fools the BIOS to treat it as a SCSI disk. So adaptec's bios may not be the same. Take it easy, Lizhi ============================== Hi All H/W; ABit KT7A-Raid Mob 2 x IBM DTLA-307030 Deskstar disks on ATA100 Raid controller Escalade 6800 multi-port IDE board 4 x same disks on Escalade cdrom on slave ide0 escalade - no raid config disks seen during install as /dev/sda|b|c|d (ide disks seen as scsi in this card is ok) ATA100 on board raid - mirroring /dev/hde to /dev/hdg (ideally) Boot sequence in bios floppy-cdrom-ata100 Problem; OK, install goes fine. However the machine wont boot. If I change the ATA100-Raid to no raid-iness simply 2 drives and specifically tell the install to either put lilo on mbr of boot disk or on /dev/hde leave it as the default location (/dev/hde i guess) it still wont boot. Any ideas would be appreciated. Regards, Stephen. -- Stephen Johnston (NGAST/Beowulf Project) Phone: +49 89 32006563 European Southern Observatory Fax : +49 89 32006380 Karl-Schwarzschild-Strasse 2 D-85748 Garching bei Muenchen http://www.eso.org -- From mars at cic.ipn.mx Tue May 8 12:59:04 2001 From: mars at cic.ipn.mx (Marco A. Ramirez Salinas) Date: Wed Nov 25 01:01:18 2009 Subject: mpi-mandel Message-ID: <000801c0d7f9$59e51db0$b114cc94@cicmars> hi all i building a cluster of work station in my job, but we used mpipro. how can i run mpi-mandel? where i can download the source code of this software (mpi-pro)? 
thanks all -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010508/984fa65a/attachment.html From Todd_Henderson at readwo.com Wed May 9 07:35:42 2001 From: Todd_Henderson at readwo.com (Todd Henderson) Date: Wed Nov 25 01:01:18 2009 Subject: D-Link switches and network cards References: Message-ID: <3AF955BE.232DA36F@readwo.com> I'm working on building up a small, 4-6 node home Beowulf using Scyld. It seems that D-Link has some pretty cheap switches and NIC's. I'm thinking about the D-Link DSS-8 for a switch and the DFE-530TX for NICS. Anybody have any comments on these and their suppor tunder Scyld? Thanks, Todd From sean at thepark.org.uk Wed May 9 06:00:14 2001 From: sean at thepark.org.uk (Sean Sturley) Date: Wed Nov 25 01:01:18 2009 Subject: troubles with node startup Message-ID: <000501c0d887$fdd5b680$03000064@clothing.cv> I'm attempting to breath life into a mini beowulf cluster. And I have two problems which i can't seem to find any info on. Firstly, I have built my master server its up and running. So far so good. The login header shows as follows: Sclyd Beowulf release 27bz-7 ( based on Redhat 6.2) Kernel 2.2.17-33.beo on an i686 login: so thats what I've installed...... I have made the boot floppies from within X for the nodes. When i boot the nodes ( they have been added to the beosetup configured group) These machines are pentium 233's with 64 meg of ram with a built in lan card and an additional 10/100 netcard (not the best spec in the world I admit - but I have been given 25 of them FOC) on the second reboot (monte?) I get the following message on the slave nodes screens: VM: do_try_to_free_pages failed for kswapd VM: do_try_to_free_pages failed for init VM: do_try_to_free_pages failed for init VM: do_try_to_free_pages failed for tar This would continue until the end of time if allowed. I get this on any of the nodes........ On the master server the beo status monitor shows: cpu0 up(no) avail(no) cpu0(75/100%) mem(4/30) dsk(0/0) net(0/25000kBps) On searching the net I have found info stating that the VM: error (under normal Redhat Linux) can be rectified by upgrading to the latest kernel. Therefore is there a more recent version of the beowulf kernel or can I use a later one from kernel.org ? Secondly, Is there a way to force the order in which the slave nodes boot the network cards. On half of the machine the 10/100 card boots and process the RARP info and on the other half its the on-board 10meg lan card ( They are different motherboards in about 11 of them) Many thanxs Sean From ewporter at rcn.com Tue May 8 20:46:27 2001 From: ewporter at rcn.com (Ed Porter) Date: Wed Nov 25 01:01:18 2009 Subject: Request for pointers to books or articles on how to efficiently program Beowulf clusters Message-ID: <003001c0d83a$a0aeffe0$6401a8c0@micron> What are some good books and good articles on how to write parallel programs to run on Beowulf clusters (particularly relatively large clusters) for a beginner at parallel programming. My particular interest relates to computations on large semantic networks. It would probably be best to start out with some articles which give an overview, since I only know about parallel programming at a very high level of abstraction. 
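(To make the message-passing model behind most of these questions concrete, here is a minimal illustrative MPI program in C -- a sketch, not taken from any particular book -- in which one process sends an array of doubles to another; the list of specific questions follows below.)

    /* Minimal MPI example: rank 0 sends an array to rank 1.
     * Run with at least two processes, e.g. "mpirun -np 2 ./a.out". */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        int n = 4;
        double data[4] = {1.0, 2.0, 3.0, 4.0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Send(data, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Status status;
            MPI_Recv(data, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %g ... %g\n", data[0], data[n - 1]);
        }

        MPI_Finalize();
        return 0;
    }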
I am interested in understanding issues such as: -how threads are spawned and the costs in terms of time and computational resources of spawning a new instance of a thread, -how instances of the same thread communicate to each other or to instances of other threads (how do they know which machines other instance of the same thread may be on, and do they normally communicate through a message queue), -how does a process on one node read and write data into memory stored on another node, -what control mechanisms are used for allocating computational and memory resources between competing processes, -how threads are terminated, -is there a mechanism for suspending a thread or computation on a task until data relative to it has been loaded into memory (for example if data from a list of different places has to be examined, can a prefetch be done for the data associated with each different location, and then can a list be kept of which of those different locations have their data in cache ready to be computed upon, so that CPU can keep busy processing the data that has already been loaded while data for other locations is being loaded into cache, -how large collections of data on hard disk are mapped into memory, and how the system knows which portions of the data are resident on the memories of which machines, -how 32 bit processors are used to address information on machines having a lot more than 2^32 bytes or words of RAM, -how to design parallel programs to run efficiently on Beowulf clusters, -how memory allocation and de-allocation is done, -how memory consistency is maintained when multiple processes on different machines are working on the same data, and -so on. I would be thankful for some good pointers. Particularly to books and/or articles which are good for beginners to parallel programming. In particular I have not programmed much (the last time I wrote a program of any size was over 12 years ago) and I have never used UNIX. So it would be particularly helpful if you could find some articles which don?t require deep programming language or UNIX knowledge. Thanks for your help. Ed Porter -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010508/bf2b3ca1/attachment.html From kinghorn at pqs-chem.com Tue May 8 13:04:58 2001 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Wed Nov 25 01:01:18 2009 Subject: Cooling experts out there, some help please References: <20010508165421.999.qmail@web13503.mail.yahoo.com> Message-ID: <3AF8516A.43EDEB2E@pqs-chem.com> Yes, there would be lots of variables and errors ... ... I've "fixed" lots of HOT 1GHz PIII machines by removing the heatsink scrapping off the thermal pad, polishing the surface, applying an even coat of thermal grease and then carefully reattaching the heatsinks The results can be dramatic i.e. going from >65C to <30C as reported by bios temp monitoring ... -Don > I need some help from the cooling experts out there. > What I am interested in a chart that define maximum > internal die temp / Die surface temp / Heat sink temp > for intel processors. I realize there are whole bunch > of variables but I would like to get a rough idea. > The chart could also specify different heatsink > material types CU,Al... > > At the least I would like to see some of the off the > shelf heatsinks that have been through testing that > compare these variables for Intel Processors. 
> > Thank you ahead of time for you input, > > Kevin > > ===== > Kevin Facinelli > www.colosource.com > webmaster@colosource.com > > __________________________________________________ > Do You Yahoo!? > Yahoo! Auctions - buy the things you want at great prices > http://auctions.yahoo.com/ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jacobsgd21 at BrandonU.CA Tue May 8 10:49:04 2001 From: jacobsgd21 at BrandonU.CA (jacobs) Date: Wed Nov 25 01:01:18 2009 Subject: PGI and Scyld Message-ID: <3AF83190.D731BAD4@brandonu.ca> I'm currently involved in a summer project to port some large astrophysics code to a higher performance system from a DEC/VMS Alpha machine. One of the options that is being looked at strongly is using an Intel/Linux cluster. Previously, a small group of us had installed a small test cluster using the Scyld prepackaged software. We built a few (C) test programs, found some good approximations to Pi, etc... Now, however, we are moving on to the real stuff. Unfortunately, the f77 code is not readily compiled by g77, however the PGI suite works just fine. What are some of the trials and tribulations associated with running the PGI compilers over BPROC. Is it worth my time? Should I use clean installs of RedHat (or whatever) onto the slaves, then compile MPICH using the PGI suite, installing all the libraries on the slaves. Is compiling BeoMPI with the PGI tools a problem, or is it less trouble than I think. I have no problem homebrewing my own setup if that's in fact the easiest way to go. thanks, Geoff From bob at frb.gov Thu May 10 08:44:34 2001 From: bob at frb.gov (bob@frb.gov) Date: Wed Nov 25 01:01:18 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <20010510100401.A27267@drzyzgula.org> References: <20010509224913.C25773@drzyzgula.org> <20010510100401.A27267@drzyzgula.org> Message-ID: <20010510114434.A10957@melissa.rsma.frb.gov> It was pointed out to me that I could do the Pentium 4 system a bit more cheaply. I'd missed the fact that Pentium 4 boards are available with 4 RIMM slots, so one can use 256MB RIMMs, which are available at about half the price per MB over the 512MB modules. Thus, it should be possible to do a 1.5GHz Pentium 4 system for a core (CPU+MB+Memory) cost of around $1300 or so. A 1.7GHz system would cost around $100 more than that. Updated tables below. Also, I added SPECfp2000/K$. Fascinating how the Pentium 4 comes out second in each one of these tables... 
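(For reference, the Sfp/K$ column in the tables below appears to be SPECfp2000 divided by the core cost in thousands of dollars: 608 / 1.4 gives the 434 shown for the Pentium 4, and 374 / 0.52 gives the 719 for the PC133 Athlon.)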
--Bob

Sorted in declining order of SPECfp2000:

                                                    Sfp/
Processor                  MHz    L2  Si Sfp core $  K$ Notes
------------------------- ---- ----- --- --- ------ --- ----------
Alpha (21264)              833  8192 533 644  9,000  72 (UP2000+, est)
Pentium 4                 1700   256 586 608  1,400 434 (D850GB, RDRAM)
PA-8700                    750   N/A 603 581 14,000  42 (HP J6700, 2304KB L1)
AMD Athlon (Thunderbird)  1330   256 539 445  2,000 223 (GA7DX, DDR SDRAM)
UltraSPARC III             750  8192 395 421  8,500  50 (Ocelot)
AMD Athlon (Thunderbird)  1300   256 491 374    520 719 (A7V, PC133 SDRAM)
Pentium III (Coppermine)  1000   256 428 314  1,900 165 (VC820, RDRAM)
UltraSPARC II              480  8192 234 291 10,000  29 (AXdp)

Sorted in declining order of SPECint2000:

                                                    Sfp/
Processor                  MHz    L2  Si Sfp core $  K$ Notes
------------------------- ---- ----- --- --- ------ --- ----------
PA-8700                    750   N/A 603 581 14,000  42 (HP J6700, 2304KB L1)
Pentium 4                 1700   256 586 608  1,400 434 (D850GB, RDRAM)
AMD Athlon (Thunderbird)  1330   256 539 445  2,000 223 (GA7DX, DDR SDRAM)
Alpha (21264)              833  8192 533 644  9,000  72 (UP2000+, est)
AMD Athlon (Thunderbird)  1300   256 491 374    520 719 (A7V, PC133 SDRAM)
Pentium III (Coppermine)  1000   256 428 314  1,900 165 (VC820, RDRAM)
UltraSPARC III             750  8192 395 421  8,500  50 (Ocelot)
UltraSPARC II              480  8192 234 291 10,000  29 (AXdp)

Sorted in order of increasing cost:

                                                    Sfp/
Processor                  MHz    L2  Si Sfp core $  K$ Notes
------------------------- ---- ----- --- --- ------ --- ----------
AMD Athlon (Thunderbird)  1300   256 491 374    520 719 (A7V, PC133 SDRAM)
Pentium 4                 1700   256 586 608  1,400 434 (D850GB, RDRAM)
Pentium III (Coppermine)  1000   256 428 314  1,900 165 (VC820, RDRAM)
AMD Athlon (Thunderbird)  1330   256 539 445  2,000 223 (GA7DX, DDR SDRAM)
UltraSPARC III             750  8192 395 421  8,500  50 (Ocelot)
Alpha (21264)              833  8192 533 644  9,000  72 (UP2000+, est)
UltraSPARC II              480  8192 234 291 10,000  29 (AXdp)
PA-8700                    750   N/A 603 581 14,000  42 (HP J6700, 2304KB L1)

Sorted in declining order of SPECfp2000/K$:

                                                    Sfp/
Processor                  MHz    L2  Si Sfp core $  K$ Notes
------------------------- ---- ----- --- --- ------ --- ----------
AMD Athlon (Thunderbird)  1300   256 491 374    520 719 (A7V, PC133 SDRAM)
Pentium 4                 1700   256 586 608  1,400 434 (D850GB, RDRAM)
AMD Athlon (Thunderbird)  1330   256 539 445  2,000 223 (GA7DX, DDR SDRAM)
Pentium III (Coppermine)  1000   256 428 314  1,900 165 (VC820, RDRAM)
Alpha (21264)              833  8192 533 644  9,000  72 (UP2000+, est)
UltraSPARC III             750  8192 395 421  8,500  50 (Ocelot)
PA-8700                    750   N/A 603 581 14,000  42 (HP J6700, 2304KB L1)
UltraSPARC II              480  8192 234 291 10,000  29 (AXdp)

From kinghorn at pqs-chem.com Thu May 10 06:58:25 2001 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Wed Nov 25 01:01:19 2009 Subject: Booting from ATA100 Raid References: <3AFA45D5.E7A52CA3@eso.org> Message-ID: <3AFA9E81.69E1D604@pqs-chem.com> > PS If someone could tell me what the 'LIL-' error means normally that would help, I will also try to look it up.

L    BIOS reads first stage boot program from MBR
LI   Using BIOS disk geometry read second stage boot program in /boot/boot.b
LIL  third stage from /boot/boot.b ...
still using BIOS services LILO give boot prompt to load kernel -Don From lindahl at conservativecomputer.com Mon May 21 10:34:04 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:19 2009 Subject: Dual P4 STREAM results In-Reply-To: <3B03E319.A01EE9B1@aeolusresearch.com>; from mprinkey@aeolusresearch.com on Thu, May 17, 2001 at 10:41:29AM -0400 References: <3B03E319.A01EE9B1@aeolusresearch.com> Message-ID: <20010521133404.E1550@wumpus.dhcp.fnal.gov> Now that dual "P4 Xeons" have been announced, using the same chipset (860) as existing single-cpu P4's, everyone can learn that the STREAM result for the dual system is basically the same as the STREAM result for the single cpu, i.e. about 1.5 GB/s total. Dang. I also saw an announcement that Serverworks has a 4-cpu chipset coming out that will use 4 DDR SDRAM banks to give about twice the theoretical bandwidth as the 860, for twice the cpus. Hopefully that ratio holds true for STREAM, and then this would be the first non-Intel chipset with great bandwidth. Unfortunately it's a server chipset and will no doubt be expensive, but the disease is spreading.... oh, and no, you don't *need* rambus to get the great bandwidth. -- greg From bvds at bvds.geneva.edu Mon May 21 10:40:37 2001 From: bvds at bvds.geneva.edu (bvds@bvds.geneva.edu) Date: Wed Nov 25 01:01:19 2009 Subject: The Beowulf Archives Message-ID: <200105211740.NAA00957@bvds.geneva.edu> I have found the archives of this list at http://www.beowulf.org/pipermail/beowulf/ to be very useful. But it seems that they stopped a few months ago... Does anyone know about this? (I had no success contacting beowulf-admin@beowulf.org.) Brett van de Sande From joelja at darkwing.uoregon.edu Mon May 21 10:42:48 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B03E319.A01EE9B1@aeolusresearch.com> Message-ID: well. now that dual p4 have been announced... you can definently get a p4 power supply and reasonably low profile heatsinks in 3.5" (2u) joelja On Thu, 17 May 2001, Michael T. Prinkey wrote: > Hello, > > I have seen a few clusters assembled with 1U AMD Athlon systems. Has > anyone seen/built/melted down a 1U P4 system? Cooling is of course an > issue, as it is with the AMDs. Another is the availability of low > profile power supplies with the extra P4 power connector and sufficient > wattage rating. > > Thanks, > > Mike Prinkey > Aeolus Research, Inc. > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From jakob at unthought.net Mon May 21 10:44:11 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:19 2009 Subject: MPI or PVM enabled jre? 
In-Reply-To: <3B092FA1.F851DB65@aeolusresearch.com>; from mprinkey@aeolusresearch.com on Mon, May 21, 2001 at 11:09:21AM -0400 References: <3B092FA1.F851DB65@aeolusresearch.com> Message-ID: <20010521194411.B29415@unthought.net> On Mon, May 21, 2001 at 11:09:21AM -0400, Michael T. Prinkey wrote: > I certainly wouldn't want to speak for the entire community, but I think > that most of us are just now crawling out of the FORTRAN days. The next > step is to C, and not even to C++. Experience has borne out the > performance advantages of "low-tech" languages like FORTRAN and C for > intense number crunching. The performance of object-oriented languages > in general and Java in particular are suspect for the types of problems > that typically require high-performance parallel hardware. Object orientation has nothing to do with it. What so ever. Java has garbage collection, and unless you can switch that off somehow you will have unpredictable performance patterns of your code. That would be a show-stopper for many codes. Also, I don't know if the java compilers are as fast as good C/C++/Fortran compilers. Another concern may be that the Java language forces the compiler to do less-than-optimal operations on data - I don't know if this is a problem though. The language has a lot to say about the code that an optimal compiler can possibly produce. I think one of the reasons we don't see computational utility libraries (or whatever you prefer to call them) for Java is, that the language offers very little for the scientist. It forces everything into an object-oriented paradigm, while many (most?) computational codes don't fit in there at all. In my oppinion that's the biggest drawback of the language - it significantly limits the set of problems you want to solve with it, when you have a choice. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From timothy.g.mattson at intel.com Mon May 21 10:59:57 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems Message-ID: It is possible to get Pentium IV in a 1u package from Racksaver. http://www.racksaver.com/products/RS1118.asp Since I work for Intel, I had better be real explicit with the disclaimers on this one. I am not speaking for my employer or personally endorsing this product. I have not validated this product for cluster computing. I have, however, spoken to very enthusiastic customers of 1U Pentium IV server. They are using it to build some pretty hefty clusters. --Tim -----Original Message----- From: Michael T. Prinkey [mailto:mprinkey@aeolusresearch.com] Sent: Thursday, May 17, 2001 7:41 AM To: beowulf@beowulf.org Subject: 1U P4 Systems Hello, I have seen a few clusters assembled with 1U AMD Athlon systems. Has anyone seen/built/melted down a 1U P4 system? Cooling is of course an issue, as it is with the AMDs. Another is the availability of low profile power supplies with the extra P4 power connector and sufficient wattage rating. Thanks, Mike Prinkey Aeolus Research, Inc. 
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at conservativecomputer.com Mon May 21 11:07:01 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:19 2009 Subject: interesting Athlon/P4 discussion from FreeBSD-Q-l In-Reply-To: <000701c0da2f$d5b634a0$03e2cbd8@sherline.net>; from jgowdy@home.com on Fri, May 11, 2001 at 08:34:14AM -0700 References: <3AFBEDFA.20503@icantbelieveimdoingthis.com> <000701c0da2f$d5b634a0$03e2cbd8@sherline.net> Message-ID: <20010521140701.A1785@wumpus.dhcp.fnal.gov> On Fri, May 11, 2001 at 08:34:14AM -0700, Jeremiah Gowdy wrote: > Optimizing current 32bit applications > especially for the P4 and its RDRAM is nonsensical. It is? There are 2 issues: SSE2 instructions, and coping with wider cache lines. SSE2 instructions will be in AMD 64-bit cores, and "optimizing" for SSE2 isn't so hard: get a compiler that supports it well. A few people write assembler these days, but they generally know when they're wasting their time. Cache lines have been getting longer for decades, and will get longer in the future. So where's the wasted effort? -- g From bari at onelabs.com Mon May 21 11:06:05 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems References: <3B03E319.A01EE9B1@aeolusresearch.com> Message-ID: <3B09590D.1050402@onelabs.com> Michael T. Prinkey wrote: > Hello, > > I have seen a few clusters assembled with 1U AMD Athlon systems. Has > anyone seen/built/melted down a 1U P4 system? Cooling is of course an > issue, as it is with the AMDs. Another is the availability of low > profile power supplies with the extra P4 power connector and sufficient > wattage rating. > > Thanks, > > Mike Prinkey > Aeolus Research, Inc. How many would you need? We've looked at making multi P4/ 1U mainboards (4-8 CPUs per 1U) but we have seen very little interest in it as compared to IA-64, Athlon-4, ULV-PIII and Mips designs. 1 P4 per 1U would be simple to make. 8 - 16 ULV P-III or Athlon 4 with DDR per 1U look more attractive. Rambus is another unwelcome component of a current P4 design until the DDR chipsets are ready. Bari Ari email: bari@onelabs.com O.N.E. Technologies 1505 Old Deerfield Road tel: 773-252-9607 Highland Park, IL 60035 fax: 773-252-9604 http://www.onelabs.com From sgaudet at angstrommicro.com Mon May 21 11:27:18 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B03E319.A01EE9B1@aeolusresearch.com> References: <3B03E319.A01EE9B1@aeolusresearch.com> Message-ID: <990469638.3b095e06d1f9c@localhost> Hello Michael, > I have seen a few clusters assembled with 1U AMD Athlon systems. Has > anyone seen/built/melted down a 1U P4 system? Cooling is of course an > issue, as it is with the AMDs. Another is the availability of low > profile power supplies with the extra P4 power connector and sufficient > wattage rating. We have a dual AMD based motherboard inhouse and currently going through testing procedures for heat, power supply and case designs. Products based on the 1u dual AMD "should" be available early next month. However, there are issues related to the power supply and cooling that will require some time. In regards to the P4, there are a few case companies that have a single processor 2U version also due out in June. 
Again, issues with power supplies (availability) and adequate cooling. Based on the current P4 die won't see the P4 in anything smaller than a 2u. So some time next month should see both offerings. Cheers, Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From rgb at phy.duke.edu Mon May 21 11:29:26 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:19 2009 Subject: Request for pointers to books or articles on how to efficiently program Beowulf clusters In-Reply-To: <003001c0d83a$a0aeffe0$6401a8c0@micron> Message-ID: On Tue, 8 May 2001, Ed Porter wrote: > > What are some good books and good articles on how to write parallel programs to run on Beowulf clusters (particularly relatively large clusters) for a beginner at parallel programming. My particular interest relates to computations on large semantic networks. The horse's mouth (so to speak;-) is Sterlin, Salmon, Becker and Savarese, "How to build a beowulf cluster", MIT press. In the same MIT series are excellent primers on PVM and MPI. Between them they answer many if not all of your questions, although perhaps not completely (you have a lot of questions!). You can also look over some of the resources (talks, books, papers) on or linked to the brahma site: http://www.phy.duke.edu/brahma the most notable ones being the FAQ and HOWTO and the beowulf underground site (which has links to still more resources). Understand, though, that there are multiple answers to many of your questions below, and whether or not you WANT to get very technical, things like: > -how threads are spawned and the costs in terms of time and > computational resources of spawning a new instance of a thread, > > -how instances of the same thread communicate to each other or to > instances of other threads (how do they know which machines other > instance of the same thread may be on, and do they normally communicate > through a message queue), > > -how does a process on one node read and write data into memory stored > on another node, etc. are VERY deep questions with very technical answers indeed, and those answers vary depending on the kind of parallel programming library or mechanism (e.g. PVM, MPI, DIPC, raw sockets) that is chosen. This applies to pretty much all of the questions below as well. 
> -what control mechanisms are used for allocating computational and > memory resources between competing processes, > > -how threads are terminated, > > -is there a mechanism for suspending a thread or computation on a task > until data relative to it has been loaded into memory (for example if > data from a list of different places has to be examined, can a prefetch > be done for the data associated with each different location, and then > can a list be kept of which of those different locations have their data > in cache ready to be computed upon, so that CPU can keep busy processing > the data that has already been loaded while data for other locations is > being loaded into cache, > > -how large collections of data on hard disk are mapped into memory, > and how the system knows which portions of the data are resident on the > memories of which machines, > > -how 32 bit processors are used to address information on machines > having a lot more than 2^32 bytes or words of RAM, > > -how to design parallel programs to run efficiently on Beowulf > clusters, > > -how memory allocation and de-allocation is done, > > -how memory consistency is maintained when multiple processes on > different machines are working on the same data, and > > -so on. > So it would be particularly helpful if you could find some articles > which don't require deep programming language or UNIX knowledge. Yeah, doubt that I or anyone can help you there, although a lot of the conceptual stuff doesn't require a deep technical knowledge. However, things like memory allocation and de-allocation and how memory consistency is maintained require a pretty fair knowledge of cluster computing to even know what the various acronym's you'll encounter stand for. You might also want to look at Greg Pfister's "In Search of Clusters" book; it is a bit dated (what isn't that is older than 1 year at this point?:-) but still a great book. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From bari at onelabs.com Mon May 21 11:28:49 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:19 2009 Subject: MPI or PVM enabled jre? References: Message-ID: <3B095E61.1050600@onelabs.com> Joshua T. Klobe wrote: > As a junior in college trying to devise a useful and interesting senior > project, I was wondering why it seems that there is no java support for > MPI or PVM enviroments? Why has it stopped with c+? Any thoughts are > more than welcome. > -Josh Klobe > It certainly would be nice to see Java running on large/fast clusters. It's the only way to get Java apps to perform at an acceptable rate :-) Bari Ari From lindahl at conservativecomputer.com Mon May 21 11:35:17 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:19 2009 Subject: Request for pointers to books or articles on how to efficiently program Beowulf clusters In-Reply-To: <003001c0d83a$a0aeffe0$6401a8c0@micron>; from ewporter@rcn.com on Tue, May 08, 2001 at 08:46:27PM -0700 References: <003001c0d83a$a0aeffe0$6401a8c0@micron> Message-ID: <20010521143517.A1953@wumpus.dhcp.fnal.gov> On Tue, May 08, 2001 at 08:46:27PM -0700, Ed Porter wrote: > What are some good books and good articles on how to write parallel > programs to run on Beowulf clusters (particularly relatively large > clusters) for a beginner at parallel programming. My particular > interest relates to computations on large semantic networks. 
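For readers working through the PVM/MPI material in the thread above, the following is a minimal sketch in C (not taken from any of the posts or books referenced) of the message-passing model rgb describes: two processes, possibly on different nodes, exchange data only by explicit sends and receives, never through shared memory. The mpicc/mpirun command names are the usual MPICH conventions and may be spelled differently under other MPI installations (Scyld's beompi, LAM, etc.).

/* mpi_ping.c -- illustrative sketch of MPI point-to-point messaging.
 * Build with an MPI wrapper compiler, e.g.:  mpicc mpi_ping.c -o mpi_ping
 * Run with, e.g.:                            mpirun -np 2 ./mpi_ping
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime          */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?            */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in the job? */

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
    } else if (rank == 0) {
        value = 42;
        /* explicit message: destination rank 1, tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0 sent %d\n", value);
    } else if (rank == 1) {
        /* block until the matching message from rank 0 arrives */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();                         /* shut the runtime down cleanly  */
    return 0;
}

The same pattern scales to many ranks; for anything beyond a handful of messages the collective operations (MPI_Bcast, MPI_Reduce, etc.) are usually a better fit than hand-rolled loops of sends.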
From lindahl at conservativecomputer.com Mon May 21 11:43:02 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:19 2009 Subject: PGI and Scyld In-Reply-To: <3AF83190.D731BAD4@brandonu.ca>; from jacobsgd21@BrandonU.CA on Tue, May 08, 2001 at 12:49:04PM -0500 References: <3AF83190.D731BAD4@brandonu.ca> Message-ID: <20010521144302.A1993@wumpus.dhcp.fnal.gov> On Tue, May 08, 2001 at 12:49:04PM -0500, jacobs wrote: > Now, however, we are moving on to the real stuff. Unfortunately, the > f77 code is not readily compiled by g77, however the PGI suite works > just fine. g77 does compile *most* f77 extensions and weirdnesses, you may not be picking the right flags. But PGI is a good choice, just because it's probably quite a bit faster. And with a cluster, faster means fewer nodes which saves money. > What are some of the trials and tribulations associated with running the > PGI compilers over BPROC. Why not compile on the master? Surely compilation time is much less than run time? -- g From Dean.Carpenter at pharma.com Mon May 21 12:25:46 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:19 2009 Subject: MPI or PVM enabled jre? Message-ID: <759FC8B57540D311B14E00902727A0C002EC491D@a1mbx01.pharma.com> I can certainly see this as valuable in a prototyping stage. Getting the algorithms, concepts, what-have-you working quickly and easily. Then write it in a high performance language. Being able to slap something together really quickly to test MPI or PVM for your application would be nice. -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Michael T. Prinkey [mailto:mprinkey@aeolusresearch.com] Sent: Monday, May 21, 2001 11:09 AM To: beowulf@beowulf.org Subject: Re: MPI or PVM enabled jre? I certainly wouldn't want to speak for the entire community, but I think that most of us are just now crawling out of the FORTRAN days. The next step is to C, and not even to C++. Experience has borne out the performance advantages of "low-tech" languages like FORTRAN and C for intense number crunching. The performance of object-oriented languages in general and Java in particular are suspect for the types of problems that typically require high-performance parallel hardware. Mike Prinkey Aeolus Research, Inc. "Joshua T. Klobe" wrote: > > As a junior in college trying to devise a useful and interesting senior > project, I was wondering why it seems that there is no java support for > MPI or PVM enviroments? Why has it stopped with c+? Any thoughts are > more than welcome. > -Josh Klobe From agrajag at scyld.com Mon May 21 13:11:40 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:19 2009 Subject: troubles with node startup In-Reply-To: <000501c0d887$fdd5b680$03000064@clothing.cv>; from sean@thepark.org.uk on Wed, May 09, 2001 at 02:00:14PM +0100 References: <000501c0d887$fdd5b680$03000064@clothing.cv> Message-ID: <20010521161140.B11198@blueraja.scyld.com> On Wed, 09 May 2001, Sean Sturley wrote: > On searching the net I have found info stating that the VM: error (under > normal Redhat Linux) can be rectified by upgrading to the latest kernel. > Therefore is there a more recent version of the beowulf kernel or can I use > a later one from kernel.org ? We are currently working on a new release that includes a newer kernel (one new enough to fix your problem). 
If you are interested, we would be interested in talking to you about beta testing this new release. > > Secondly, > Is there a way to force the order in which the slave nodes boot the network > cards. On half of the machine the 10/100 card boots and process the RARP > info > and on the other half its the on-board 10meg lan card ( They are different > motherboards in about 11 of them) It should be using all of your network cards to send out the RARP requests. From alvin at Mail.Linux-Consulting.com Mon May 21 13:06:18 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: KVM Switch In-Reply-To: <3AFD332B.949D3222@cs.tamu.edu> Message-ID: hi ... it depends on the motherboard and the PS/2 mouse... some mouse wont hang the motherboard when the mouse is unplugged... from whatw we've seen... unplugging can also be switching between mb with KVMs c ya alvin http://www.Linux-1U.net On Sat, 12 May 2001, Gerry Creager N5JXS wrote: > We/ve been using the Belkin switch in our operations, both for my > cluster and for a lot of other departmental stacks of PCs. They seem to > be just about bulletproof, but are certainly not the cheapeest ones out > there. they seem to be less suspceptible > > > > Yes, I found this out the hard way. Abit VP6 hangs if you unplug/plug > > PS/2 mouse. > From alvin at Mail.Linux-Consulting.com Mon May 21 13:12:33 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: Cooling experts out there, some help please In-Reply-To: <3AF8516A.43EDEB2E@pqs-chem.com> Message-ID: hi ya.. yes... thermal compound/grease seem to work better than the thermal tape ... make sure too that there s about 0.5"-1" of clear air space for the heatsink fan... I'd also want to see if water-cooled copper reservoirs would be a better cooling method than huge heatsinks We have to get all that to fit into our 1U chassis other variable affecting cpu temperature... - load of the cpu...at the time of the measurements - copper will always be better than aluminnum - painting it balck seems to make some difference - whether the block of aluminnum/copper was cut or extruded to make the heatsink fins/towers also is very important... - airflow and ambient temperature also affects the cpu temp have fun alvin http://www.Linux-1U.net On Tue, 8 May 2001, Donald B. Kinghorn wrote: > Yes, there would be lots of variables and errors ... > ... I've "fixed" lots of HOT 1GHz PIII machines by removing the heatsink > scrapping off the thermal pad, polishing the surface, applying an even coat > of thermal grease and then carefully reattaching the heatsinks > The results can be dramatic i.e. going from >65C to <30C as reported > by bios temp monitoring ... > -Don > > > I need some help from the cooling experts out there. > > What I am interested in a chart that define maximum > > internal die temp / Die surface temp / Heat sink temp > > for intel processors. I realize there are whole bunch > > of variables but I would like to get a rough idea. > > The chart could also specify different heatsink > > material types CU,Al... > > > > At the least I would like to see some of the off the > > shelf heatsinks that have been through testing that > > compare these variables for Intel Processors. 
> > > > Thank you ahead of time for you input, > > > > Kevin > > > > ===== > > Kevin Facinelli > > www.colosource.com > > webmaster@colosource.com > > > > __________________________________________________ > > Do You Yahoo!? > > Yahoo! Auctions - buy the things you want at great prices > > http://auctions.yahoo.com/ > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Mon May 21 13:14:45 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: D-Link switches and network cards In-Reply-To: <3AF955BE.232DA36F@readwo.com> Message-ID: hi ya todd the ($15) dfe-530tx are bad nics...they tend to die/falter when there is heavy network traffic.... tulip based nic cards tend to be better ... netgear ($25) FA310/fa311 and intel eepro100 based nic cards ( $45) intel pila8460B or better... have fun alvin On Wed, 9 May 2001, Todd Henderson wrote: > I'm working on building up a small, 4-6 node home Beowulf using Scyld. It seems that D-Link has some pretty > cheap switches and NIC's. I'm thinking about the D-Link DSS-8 for a switch and the DFE-530TX for NICS. > > Anybody have any comments on these and their suppor tunder Scyld? > > Thanks, > Todd > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Mon May 21 13:17:03 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: Message-ID: hi all the largest 1U power supply is about 300Watt 1U power supply... and it is not interchangeable with the 150/180/220 watt 1U powersupplies heat dissipation of the cpu is a bigger issue have fun alvin http://www.Linux-1U.net On Mon, 21 May 2001, Joel Jaeggli wrote: > well. now that dual p4 have been announced... you can definently get a p4 > power supply and reasonably low profile heatsinks in 3.5" (2u) > > joelja > > On Thu, 17 May 2001, Michael T. Prinkey wrote: > > > Hello, > > > > I have seen a few clusters assembled with 1U AMD Athlon systems. Has > > anyone seen/built/melted down a 1U P4 system? Cooling is of course an > > issue, as it is with the AMDs. Another is the availability of low > > profile power supplies with the extra P4 power connector and sufficient > > wattage rating. > > > > Thanks, > > > > Mike Prinkey > > Aeolus Research, Inc. > > From alvin at Mail.Linux-Consulting.com Mon May 21 13:26:05 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B09590D.1050402@onelabs.com> Message-ID: hi bari the major problem, the only problem is cooling.... kinda hard to fit 2"-3" heatsinks into 1" of available space on/above the motherboard..... dont know what happens to the 3yr cpu warranty when you throw away the manufacturers heatsink in favor of a 0.25" heatsink/fan combo that keep the p3 cpu just as cool... 
if the cooling problem of the P4/athlon cpu is solved, than one can ship p4-based 1U systems... have fun alvin http://www.Linux-1U.net On Mon, 21 May 2001, Bari Ari wrote: > Michael T. Prinkey wrote: > > > Hello, > > > > I have seen a few clusters assembled with 1U AMD Athlon systems. Has > > anyone seen/built/melted down a 1U P4 system? Cooling is of course an > > issue, as it is with the AMDs. Another is the availability of low > > profile power supplies with the extra P4 power connector and sufficient > > wattage rating. > > > > Thanks, > > > > Mike Prinkey > > Aeolus Research, Inc. > > How many would you need? We've looked at making multi P4/ 1U mainboards > (4-8 CPUs per 1U) but we have seen very little interest in it as > compared to IA-64, Athlon-4, ULV-PIII and Mips designs. 1 P4 per 1U > would be simple to make. 8 - 16 ULV P-III or Athlon 4 with DDR per 1U > look more attractive. Rambus is another unwelcome component of a current > P4 design until the DDR chipsets are ready. > > Bari Ari email: bari@onelabs.com > > O.N.E. Technologies > 1505 Old Deerfield Road tel: 773-252-9607 > Highland Park, IL 60035 fax: 773-252-9604 > http://www.onelabs.com > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From RSchilling at affiliatedhealth.org Mon May 21 13:37:34 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed Nov 25 01:01:19 2009 Subject: FY;) [Fwd: furbeowulf cluster] Message-ID: <51FCCCF0C130D211BE550008C724149E0116560C@mail1.affiliatedhealth.org> > -----Original Message----- > From: Robert G. Brown [mailto:rgb@phy.duke.edu] > Subject: Re: FY;) [Fwd: furbeowulf cluster] [snip] > > > http://www.trygve.com/furbeowulf.html > [snip] > "No, it died. I got behind in feeding all the nodes today and it > starved to death. Gotta pen? I have to press the reset button and > start feeding the next one or I'll have a mass extinction event..." > I don't think it would complain about not being played enough with though - not with this crowd :^) You just know the Microsoft Barney doll Beowulf comes out next. Or perhaps the Sony walking dogs will walk around in packs and self configure themselves into a 'wulf cluster. Would that be the mobile Beowulf, then? -- Richard Schilling From josip at icase.edu Mon May 21 13:53:48 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems References: <3B03E319.A01EE9B1@aeolusresearch.com> <990469638.3b095e06d1f9c@localhost> Message-ID: <3B09805C.C125E7C@icase.edu> Steve Gaudet wrote: > > Based on the current P4 die won't see the P4 in anything smaller than a 2u. Except for the P4/1.4 1U RackSaver link posted by Tim... However, cooling a 1U case concerns me. The RackSaver RS-1100 claims to have 6 small case fans (> 40 cfm total), and a 250W ATX12V power supply. I understand the need to be space efficient, but I'd feel more confident about cooling with larger cases... Sincerely, Josip P.S. RackSaver uses P4/1.4 which can dissipate 51.8W. By contrast, the P4/1.7 can dissipate 64.0W in normal use, i.e. more than a fairly large soldering iron... BTW, this 64.0W is Intel's "thermal design point" for the P4/1.7 processor. Their "thermal design point" is based on 75% of the maximum power dissipation (which could reach 85.3W, lead to internal hot spots, etc.). 
Intel claims that most popular applications (including SPEC benchmarks) fall below this 75%. However, ATLAS and some other cache optimized codes can seriously stress this assumption... Excellent cooling is essential when doing very intense computations. -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From dwalker at erim-int.com Mon May 21 13:54:32 2001 From: dwalker at erim-int.com (David T. Walker) Date: Wed Nov 25 01:01:19 2009 Subject: PGI and Scyld In-Reply-To: <3AF83190.D731BAD4@brandonu.ca> References: <3AF83190.D731BAD4@brandonu.ca> Message-ID: We are currently running PGI Fortran on 16 node cluster under Scyld. As I recall, the way we did it was to install the Scyld 27BZ-7 distribution. We then installed the Portland compilers using Portland's installation procedure on the front-end node. We then used the BeoMPI rpm from Scyld to rebuild BeoMPI using the Portland compilers. The approach for building BeoMPI is to first patch the MPICH 1.2.0 distribution to get the BeoMPI mods, and then build it. This is accomplished with a Makefile located in the beompi-1.0.14 directory. We made the following changes to the Makefile to build BeoMPI using the PGI compilers (the first two lines that are commented out are from the original Makefile): ... mpich-1.2.0: .patch # cd mpich-1.2.0/ && BINDIR="" LIBDIR="" ./configure --with-device=ch_p4 --lib=-lbproc -rsh=/bin/true # cd mpich-1.2.0/ && BINDIR="" LIBDIR="" make # # For building with PGI compilers # cd mpich-1.2.0/ && BINDIR="" LIBDIR="" ./configure --with-device=ch_p4\ --lib=-lbproc \ -rsh=/bin/true \ -c++=pgCC \ -cc=pgcc \ -fc=pgf77 \ -cflags="-Msignextend -tp px -DUSE_U_INT_FOR_XDR -DHAVE_RPC_RPC_H=1" \ -opt=-fast \ -fflags="-tp px" \ -c++flags="-tp px" \ -f90flags="-tp px" \ -f90=pgf90 \ -prefix=/usr/local/mpich \ -comm=shared cd mpich-1.2.0/ && BINDIR="" LIBDIR="" sed -e 's@MPIR_HOME = .*$@MPIR_HOME = $$\\PGI/linux86@' Makefile > Makefile.pgi cd mpich-1.2.0/ && BINDIR="" LIBDIR="" make clean cd mpich-1.2.0/ && BINDIR="" LIBDIR="" make mpi # ... This assumes that $PGI is the path to the PGI compilers (set as recommended by Portland). It also builds for a generic Pentium (-tp px). Portland has on its web site instructions for building MPICH from scratch using the PGI compilers. The above modifications to the Scyld makefile were derived from those instructions. >I'm currently involved in a summer project to port some large >astrophysics code to a higher performance system from a DEC/VMS Alpha >machine. One of the options that is being looked at strongly is using >an Intel/Linux cluster. > >Previously, a small group of us had installed a small test cluster using >the Scyld prepackaged software. We built a few (C) test programs, found >some good approximations to Pi, etc... > >Now, however, we are moving on to the real stuff. Unfortunately, the >f77 code is not readily compiled by g77, however the PGI suite works >just fine. > >What are some of the trials and tribulations associated with running the >PGI compilers over BPROC. Is it worth my time? Should I use clean >installs of RedHat (or whatever) onto the slaves, then compile MPICH >using the PGI suite, installing all the libraries on the slaves. Is >compiling BeoMPI with the PGI tools a problem, or is it less trouble >than I think. 
> >I have no problem homebrewing my own setup if that's in fact the easiest >way to go. > >thanks, >Geoff > > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at conservativecomputer.com Mon May 21 14:11:16 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B09805C.C125E7C@icase.edu>; from josip@icase.edu on Mon, May 21, 2001 at 04:53:48PM -0400 References: <3B03E319.A01EE9B1@aeolusresearch.com> <990469638.3b095e06d1f9c@localhost> <3B09805C.C125E7C@icase.edu> Message-ID: <20010521171116.A2510@wumpus.dhcp.fnal.gov> On Mon, May 21, 2001 at 04:53:48PM -0400, Josip Loncaric wrote: > Except for the P4/1.4 1U RackSaver link posted by Tim... However, > cooling a 1U case concerns me. The RackSaver RS-1100 claims to have 6 > small case fans (> 40 cfm total), and a 250W ATX12V power supply. I > understand the need to be space efficient, but I'd feel more confident > about cooling with larger cases... There's an existing example for people wondering about that: 1U single cpu alphas. No, it's not normal, but it has been solved by those other guys, at a bit higher wattage. -- g [ I'm not counting the 1U API CS20 dual alpha until they've been in the field in enough numbers that we know the cooling works ;-) ] From toon at moene.indiv.nluug.nl Mon May 21 11:25:49 2001 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Wed Nov 25 01:01:19 2009 Subject: MPI or PVM enabled jre? References: <3B092FA1.F851DB65@aeolusresearch.com> Message-ID: <3B095DAD.45BA1365@moene.indiv.nluug.nl> "Michael T. Prinkey" wrote: > I certainly wouldn't want to speak for the entire community, but I think > that most of us are just now crawling out of the FORTRAN days. When people say this, often - when asked about particulars - they expound on the, uhhh, non-virtues of FORTRAN 66. For the benefit of this mailing list I'd like to emphasize that since those good ol' days, three more Fortran Standards have been produced (and are actively tracked in most compilers): Fortran 77, Fortran 90 and Fortran 95. :-) -- Toon Moene - mailto:toon@moene.indiv.nluug.nl - phoneto: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html Join GNU Fortran 95: http://g95.sourceforge.net/ (under construction) From agrajag at scyld.com Mon May 21 14:30:18 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:19 2009 Subject: The Beowulf Archives In-Reply-To: <200105211740.NAA00957@bvds.geneva.edu>; from bvds@bvds.geneva.edu on Mon, May 21, 2001 at 01:40:37PM -0400 References: <200105211740.NAA00957@bvds.geneva.edu> Message-ID: <20010521173018.C11198@blueraja.scyld.com> On Mon, 21 May 2001, bvds@bvds.geneva.edu wrote: > > I have found the archives of this list at > > http://www.beowulf.org/pipermail/beowulf/ > > to be very useful. But it seems that they stopped a few months > ago... The archives are now working again. A few months were lost, but all future messages should be archiving now. 
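Returning to the memory-bandwidth numbers quoted earlier in this digest (the dual-P4 STREAM results) and to the Fortran/C number-crunching discussion: the kind of kernel being measured is essentially the loop below. This is a rough illustrative sketch in C, not the official STREAM benchmark (John McCalpin's stream.c); the array size, repetition count, and timing are chosen only for illustration, so its output is not directly comparable to published STREAM figures.

/* triad.c -- sketch of a STREAM-style "triad" bandwidth kernel. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N      2000000       /* ~16 MB per array: large enough to defeat cache */
#define NTIMES 10

static double now(void)      /* wall-clock time in seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + 1.0e-6 * tv.tv_usec;
}

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    double scalar = 3.0, best = 1.0e30;
    int i, k;

    if (!a || !b || !c) { fprintf(stderr, "out of memory\n"); return 1; }

    for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    for (k = 0; k < NTIMES; k++) {
        double t = now();
        for (i = 0; i < N; i++)
            a[i] = b[i] + scalar * c[i];    /* the triad: 2 loads + 1 store */
        t = now() - t;
        if (t < best) best = t;             /* keep the fastest of NTIMES passes */
    }

    /* three arrays of N doubles move through memory on each pass */
    printf("triad: %.1f MB/s\n", 3.0 * N * sizeof(double) / best / 1.0e6);

    free(a); free(b); free(c);
    return 0;
}

A Fortran 77/90 version of the same loop compiles to essentially the same machine code, which is part of why the "low-tech" languages discussed in the MPI-or-PVM-for-Java thread hold up so well for this kind of work: the bottleneck is memory bandwidth, not language features.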
From Jim.Morton at aventis.com Mon May 21 13:00:23 2001 From: Jim.Morton at aventis.com (Jim.Morton@aventis.com) Date: Wed Nov 25 01:01:19 2009 Subject: scyld / SCSI Message-ID: <55396F6C3DA1D411988200508BD949941824EE@SCLSMXSUSR01> I am interested in installing Scyld Beowulf on a cluster including VALinux 501 computers with SCSI drives. It seems that the bootable kernel on the CD does not support the SCSI controller on ther 501 mobo, has anyone worked through the patches/hacks to make this work? It seems to me that I should be able to compile a kernel with the appropriate SCSI support and burn a new CD with the new kernel in place of the old one, but otherwise basically the same as the Scyld distribution CD. Comments / Suggestions welcome! Jim Morton From rgb at phy.duke.edu Mon May 21 15:19:52 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:19 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <20010510114434.A10957@melissa.rsma.frb.gov> Message-ID: On Thu, 10 May 2001 bob@frb.gov wrote: > It was pointed out to me that I could do the Pentium 4 > system a bit more cheaply. I'd missed the fact that Pentium > 4 boards are available with 4 RIMM slots, so one can use > 256MB RIMMs, which are available at about half the price > per MB over the 512MB modules. Thus, it should be possible It's worth noting that if one compares 768 MB core systems the relative position of the Tbird/DDR system significantly as its speed doesn't change (except for codes big enough to fill memory, of course) the 1.33 GHz Athlon with DDR moves up to 560-600, and the price of DDR seems to be rapidly falling if not quite in free fall. The killer is those 512 MB DDR DIMMS and the lack of Athlon/DDR motherboards with more than 3 slots... This isn't to complain about your ranking -- I think it is very useful and one does have to create some sort of standard in order to compare price performance. It is just to note that if one's application DOESN'T use a GB of core the price/performance rankings can signficantly change because of the nonlinearities in the cost space. > to do a 1.5GHz Pentium 4 system for a core (CPU+MB+Memory) > cost of around $1300 or so. A 1.7GHz system would cost > around $100 more than that. Updated tables below. Also, > I added SPECfp2000/K$. Fascinating how the Pentium 4 comes > out second in each one of these tables... 
> > --Bob > > Sorted in declining order of SPECfp2000: > > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > Sorted in declining order of SPECint2000: > > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > Sorted in order of increasing cost: > > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > > Sorted in declining order of SPECfp2000/K$ > Sfp/ > Processor MHz L2 Si Sfp core $ K$ Notes > ------------------------- ---- ----- --- --- ------ --- ---------- > AMD Athlon (Thunderbird) 1300 256 491 374 520 719 (A7V, PC133 SDRAM) > Pentium 4 1700 256 586 608 1,400 434 (D850GB, RDRAM) > AMD Athlon (Thunderbird) 1330 256 539 445 2,000 223 (GA7DX, DDR SDRAM) > Pentium III (Coppermine) 1000 256 428 314 1,900 165 (VC820, RDRAM) > Alpha (21264) 833 8192 533 644 9,000 72 (UP2000+, est) > UltraSPARC III 750 8192 395 421 8,500 50 (Ocelot) > PA-8700 750 N/A 603 581 14,000 42 (HP J6700, 2304KB L1) > UltraSPARC II 480 8192 234 291 10,000 29 (AXdp) > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From erayo at cs.bilkent.edu.tr Mon May 21 15:14:12 2001 From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa)) Date: Wed Nov 25 01:01:19 2009 Subject: MPI or PVM enabled jre? In-Reply-To: References: Message-ID: <01052201141200.07325@orion> On Monday 21 May 2001 17:36, Joshua T. Klobe wrote: > As a junior in college trying to devise a useful and interesting senior > project, I was wondering why it seems that there is no java support for > MPI or PVM enviroments? Why has it stopped with c+? Any thoughts are > more than welcome. Speed. 
Message passing libs are all about HPC. In a paper I read an MPI implementation for java was shown to be 10-20 times slower than C. Which means that you can find a Java MPI implementation but would not really be worth using. There are indeed people who do parallel programming with Java, but as you can guess that serves only pedagogic purposes (from the viewpoint of HPC), like Java itself (sorry couldn't resist) Thanks, From alvin at Mail.Linux-Consulting.com Mon May 21 15:23:26 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B09805C.C125E7C@icase.edu> Message-ID: hi josip.. you are right in your concerns... teh itty-bitty 40mm fans are barely enough to keep the 1U chassis cool....even for a P3....amd and p4 requires lots more cooling capacity... amd athlon/duran require 250W power supply ??... Intel P3 system can get away with 150W power supply... larger cases ( deeper ) does NOT provide any cooling improvements... you have to move the air around... and the problem is the atx connectors in the back prevents air flow.... front to back...or side to side is okay too all of this is in 1U cases... we have lateral/squirl cage fans that we are considering for our next generation P4 based 1U chassis ... thanx alvin http://www.Linux-1U.net On Mon, 21 May 2001, Josip Loncaric wrote: > Steve Gaudet wrote: > > > > Based on the current P4 die won't see the P4 in anything smaller than a 2u. > > Except for the P4/1.4 1U RackSaver link posted by Tim... However, > cooling a 1U case concerns me. The RackSaver RS-1100 claims to have 6 > small case fans (> 40 cfm total), and a 250W ATX12V power supply. I > understand the need to be space efficient, but I'd feel more confident > about cooling with larger cases... > > Sincerely, > Josip > > P.S. RackSaver uses P4/1.4 which can dissipate 51.8W. By contrast, the > P4/1.7 can dissipate 64.0W in normal use, i.e. more than a fairly large > soldering iron... BTW, this 64.0W is Intel's "thermal design point" for > the P4/1.7 processor. Their "thermal design point" is based on 75% of > the maximum power dissipation (which could reach 85.3W, lead to internal > hot spots, etc.). Intel claims that most popular applications > (including SPEC benchmarks) fall below this 75%. However, ATLAS and > some other cache optimized codes can seriously stress this > assumption... Excellent cooling is essential when doing very intense > computations. > > -- > Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Mon May 21 15:31:32 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: Cooling experts out there, some help please In-Reply-To: <87k83qattr.fsf@uwo.ca> Message-ID: hi dan our system that is constantly measuring temps.. it ranges around 38C ... loaded or unloaded... varies with day and night... though... ( they turn off air conditioning :-) have not dealt with dual cpu and i2c... for heating/cooling experiment... 
you;d basically have to be willing to burn up a cpu ... because if it gets too hot, the cpu will die in mysterious ways.. yoy can also blow cigarette smoke to see the airflow... and disconnect fans that does NOT affect airflow - you'd need a clear plastic top to see it have fun alvin On 9 May 2001, Dan Christensen wrote: > writes: > > > I'd add i2c to one of the systems and start a cpu > > temperature tests...over several days/weeks... > > - an idle cpu will give you "ambient cpu temperature" > > I just tried lm-sensors/i2c on a dual processor PIII system and it > reports that the cpu temps are +2C when idle and around 16C when > loaded. Obviously these are wrong. The question is, are they > usually wrong by a constant amount? So can I assume that my > cpu's are at something like 32C when idle and 48C when loaded? > > I want to know this because I want to disconnect some of the eight > fans that the retailer put into this box! Seems a bit excessive > to me (and noisy) but I don't want to mess with it without knowing > for sure whether they are needed. > > Anyone have any good recommendations for quiet fans? Or fans > that only switch on when things get hot? > > Dan > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From joelja at darkwing.uoregon.edu Mon May 21 16:26:19 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:19 2009 Subject: Cooling experts out there, some help please In-Reply-To: <87k83qattr.fsf@uwo.ca> Message-ID: http://www.netroedge.com/~lm78/cvs/browse.cgi/lm_sensors2/doc/FAQ > I just tried lm-sensors/i2c on a dual processor PIII system and it > reports that the cpu temps are +2C when idle and around 16C when > loaded. Obviously these are wrong. The question is, are they > usually wrong by a constant amount? So can I assume that my > cpu's are at something like 32C when idle and 48C when loaded? it's likely that the value is offset. I'd take a look at the value in the bios and figure out what the offset between that value and the idle value is. > I want to know this because I want to disconnect some of the eight > fans that the retailer put into this box! Seems a bit excessive > to me (and noisy) but I don't want to mess with it without knowing > for sure whether they are needed. I typically measure that sort of thing with a thermal probe and a multimeter. casue the mainboards sensors may not be in useful locations for extremely accurate measurements. boxs are typically designed with their worst case rated operating temperatures as the target for the volume of air they have to move... in racks in a datacenter, particularly in the event of hvac failure operating margins afforded but exhausting more air are desirable. > Anyone have any good recommendations for quiet fans? Or fans > that only switch on when things get hot? panaflo FBA-08A12L. 80mm x 25mm is a pretty quiet (21dba at 1900rpm) 80mm fan, but it's fairly low volume for a 80mm fan, 24cfm, compared to say a delta 4300rpm 80mm fan (pretty standard pc fan) which is 48.5dba (more than 2 orders of magnitude louder) at 68cfm... Thermally controlled fans are pretty common, but in pc's having the thermostat on the fan isn't enough, it needs to be on or near the components you're trying to cools. vendors do make i2c controllers that do this, but it's a lot of work as a retrofit. 
> Dan > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From alvin at Mail.Linux-Consulting.com Mon May 21 16:50:15 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: Cooling experts out there, some help please In-Reply-To: Message-ID: hi joel, dan we use Indek fans...cause its supposedly better in terms of reliability... we happen to use the noisier ones cause its cheaper... dont know exact db levels, as its a moot point,.... noisy fans that are more reliable are better than cheap fans that is very likely to die ?? measuring temperature with a thermal junction is very accurate reading for that junction/area... howerver when reading the "cpu temperature" over 0.25" die... the real die temperature will vary widely... the hot heatsink can distibute the heat evenly across the die...but.... insidea 1U case...you can only use 40mm tall fans though some manufacturers use a 60mm fan that is tilted at 45degrees... the blades in a 60mm fan can push more air... we replaced the 1.5" or 2" tall intel p3 heatsinks with a 0.25" (total height) heatsink/fan combo that works really well in our 1U case.... we've not attempted similar issues for AMD cpus, as there is no "metal clip" that works for the any heatsink/fan that will fit inside the 1U... that allows for 0.5" - 1" of air space above the fans we have 3 cpu side fans that blows air across the cpu heatsink in addition to the normal 0.25" heatsink/fan that sits ontop of the cpu/die - if one fan dies... we should still be okay... have fun alvin http://www.Linux-1U.net ... just gotta keep the air flowing..out... On Mon, 21 May 2001, Joel Jaeggli wrote: > http://www.netroedge.com/~lm78/cvs/browse.cgi/lm_sensors2/doc/FAQ > > > I just tried lm-sensors/i2c on a dual processor PIII system and it > > reports that the cpu temps are +2C when idle and around 16C when > > loaded. Obviously these are wrong. The question is, are they > > usually wrong by a constant amount? So can I assume that my > > cpu's are at something like 32C when idle and 48C when loaded? > > it's likely that the value is offset. I'd take a look at the value in the > bios and figure out what the offset between that value and the idle value > is. > > > I want to know this because I want to disconnect some of the eight > > fans that the retailer put into this box! Seems a bit excessive > > to me (and noisy) but I don't want to mess with it without knowing > > for sure whether they are needed. > > I typically measure that sort of thing with a thermal probe and a > multimeter. casue the mainboards sensors may not be in useful locations > for extremely accurate measurements. > > boxs are typically designed with their worst case rated operating > temperatures as the target for the volume of air they have to move... 
in > racks in a datacenter, particularly in the event of hvac failure operating > margins afforded but exhausting more air are desirable. > > > Anyone have any good recommendations for quiet fans? Or fans > > that only switch on when things get hot? > > panaflo FBA-08A12L. 80mm x 25mm is a pretty quiet (21dba at 1900rpm) 80mm > fan, but it's fairly low volume for a 80mm fan, 24cfm, compared to say a > delta 4300rpm 80mm fan (pretty standard pc fan) which is 48.5dba (more > than 2 orders of magnitude louder) at 68cfm... > > Thermally controlled fans are pretty common, but in pc's having the > thermostat on the fan isn't enough, it needs to be on or near the > components you're trying to cools. vendors do make i2c controllers that do > this, but it's a lot of work as a retrofit. > > > Dan > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > -------------------------------------------------------------------------- > Joel Jaeggli joelja@darkwing.uoregon.edu > Academic User Services consult@gladstone.uoregon.edu > PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E > -------------------------------------------------------------------------- > It is clear that the arm of criticism cannot replace the criticism of > arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of > the right, 1843. > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From becker at scyld.com Mon May 21 17:09:46 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:19 2009 Subject: scyld / SCSI In-Reply-To: <55396F6C3DA1D411988200508BD949941824EE@SCLSMXSUSR01> Message-ID: On Mon, 21 May 2001 Jim.Morton@aventis.com wrote: > I am interested in installing Scyld Beowulf on a cluster including VALinux > 501 computers with SCSI drives. > It seems that the bootable kernel on the CD does not support the SCSI > controller on ther 501 mobo, has > anyone worked through the patches/hacks to make this work? The slave node boot does not need to support SCSI (or even IDE) drives. It only exists to contact the master over the network and download the desired cluster kernel. Once the proper kernel is downloaded, the master node may optionally provide additional driver modules to support e.g. new filesystems or SCSI adapters. Thus only the master's needs a copy of the driver modules, even if you store the boot kernel on the slave node's disk or flash. > It seems to me that I should be able to compile a kernel with the > appropriate SCSI support and burn a new CD with the new kernel in > place of the old one, but otherwise basically the same as the Scyld > distribution CD. It's easier than that. You should be able to build the new SCSI driver without updating the kernel. You then have the master load the correct driver onto the slave nodes. If you need a new or updated driver, just add it to the module directory on the master. If it's a new module, the beosetup program lets you update the PCI to driver name table. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. 
Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From bob at drzyzgula.org Mon May 21 17:47:29 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:19 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: References: <20010510114434.A10957@melissa.rsma.frb.gov> Message-ID: <20010521204729.B21116@drzyzgula.org> On Mon, May 21, 2001 at 06:19:52PM -0400, Robert G. Brown wrote: > > This isn't to complain about your ranking -- I think it is very useful > and one does have to create some sort of standard in order to compare > price performance. It is just to note that if one's application DOESN'T > use a GB of core the price/performance rankings can signficantly change > because of the nonlinearities in the cost space. Robert, Good point. I had focused on 1GB configurations because that's what we are currently using at work, to support some huge, memory-hungry jobs. I took a stab at readjusting for 512MB configurations (768MB gets to be a hassle to do in highly interleaved systems like the UltraSPARC). This table is a bit shakier than the last ones because I don't have good data for the smaller memory configurations on the UltraSPARCs, and, as I pointed out previously, the Alpha and HP prices are WAGs anyway. Not that precise data on these is likely to change the rankings much, but does anyone have better pricing info for these that they can share? Anyway, this is what I came up with, sorted only by Sfp2K/K$: Core$ is for MB + CPU + 512MB memory Processor MHz L2 Si Sfp core $ Sfp/K$ Notes -------------------------- ---- ---- --- --- ------ ---- ------------------- AMD Athlon (Thunderbird) 1300 256 491 374 400 935 (A7V, PC133 SDRAM) AMD Athlon (Thunderbird) 1330 256 539 445 600 742 (GA7DX, DDR SDRAM) Pentium 4 1700 256 586 608 900 676 (D850GB, RDRAM) Pentium III (Coppermine) 1000 256 428 314 700 449 (VC820, RDRAM) Alpha (21264) 833 8192 533 644 8,500 76 (UP2000+, est) UltraSPARC III 750 8192 395 421 7,500 56 (Ocelot) PA-8700 750 603 581 13,500 43 (HP J6700, 2304KB L1) UltraSPARC II 480 8192 234 291 9,000 32 (AXdp) --Bob PS, BTW, FYI & FWIW, for those of you with a sense of deja-vu, the message Robert just responded to was in fact essentially a duplicate of one I sent before, only it came from my office address rather than my personal address. I had done a reply instead of a forward so that I could edit the content (I have mutt set to forward messages as mime attachments), but then I forgot to take the beowulf address off. I thought the message was dead because my work address isn't subscribed to the list. I sent mail to the admin address asking that it be deleted rather than approved, but I guess that it got approved and posted anyway. :-( From Bari at onelabs.com Mon May 21 18:23:32 2001 From: Bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems References: Message-ID: <3B09BF94.8050403@onelabs.com> alvin@Mail.Linux-Consulting.com wrote: > hi josip.. > you are right in your concerns... > > teh itty-bitty 40mm fans are barely enough to keep the > 1U chassis cool....even for a P3....amd and p4 requires > lots more cooling capacity... > > amd athlon/duran require 250W power supply ??... > Intel P3 system can get away with 150W power supply... > > larger cases ( deeper ) does NOT provide any cooling > improvements... you have to move the air around... and > the problem is the atx connectors in the back prevents > air flow.... front to back...or side to side is okay too > > all of this is in 1U cases... 
> > we have lateral/squirl cage fans that we are considering > for our next generation P4 based 1U chassis ... > For one P4 on an off the shelf ATX motherboard in a 1U why not just replace the top of the enclosure with an aluminum extrusion with a nice profile for convection cooling tied to the CPU case? There is easily enough area in a 19" rack foorprint to cool off the 86W/per cpu for a few P4s. > > On Mon, 21 May 2001, Josip Loncaric wrote: > > >> Steve Gaudet wrote: >> >>> Based on the current P4 die won't see the P4 in anything smaller than a 2u. >> >> Except for the P4/1.4 1U RackSaver link posted by Tim... However, >> cooling a 1U case concerns me. The RackSaver RS-1100 claims to have 6 >> small case fans (> 40 cfm total), and a 250W ATX12V power supply. I >> understand the need to be space efficient, but I'd feel more confident >> about cooling with larger cases.. We're designing very dense 8-16 CPUs per 1U clusters now. You can't do this with off the shelf ATX motherboards and low profile ATX power supplies and rely on a few weenie 1" fans for forced air cooling. There are a few small footprint ATX dual Socket 370 boards that will get you 4 CPUs per 1U but they aren't cooled well since they are designed for a typical desktop enclosure with high profile heatsink/fan combinations and not a low profile 1U enclosure. If you look at some of the latest dense servers like the Crusoe based systems with 24 CPUs per 3U announced recently, they use convection and forced air cooling since they have relatively low powered (heat & MFLOPS) CPUs. For dense clusters with high wattage/high performance CPUs like the P4 you need to move the heat from the surface of the hottest components like the CPUs, chipsets and power supply mosfets by conduction out of the enclosure and then rely on forced air and convection cooling for the remainder of the components. If you need to get things cooler (in case you're cluster is going into a very warm climate) you can also add forced AC since the cost of a small AC unit and some sheet metal is less than the cost of 1 CPU. Mainboard and power supply design for cramming multiple CPUs and chipsets in a 1U board is pretty cut and dry. IMHO you're better off building your own boards and power supplies than trying to kludge up dense systems with components designed for the desktop market. Does P4 with Rambus really make sense for clustering since SMP with Athlon4 or ULV PIII with DDR is lower priced and produce far less heat along with the high MFLOPS? SMP with IA-64 will also be available soon to add to the choices. Bari Ari email: bari@onelabs.com O.N.E. Technologies 1505 Old Deerfield Road tel: 773-252-9607 Highland Park, IL 60035 fax: 773-252-9604 http://www.onelabs.com From jvanworkum at cfl.rr.com Mon May 21 18:45:08 2001 From: jvanworkum at cfl.rr.com (John D. Van Workum) Date: Wed Nov 25 01:01:19 2009 Subject: Instrumenting a parallel code In-Reply-To: <3B09227C.3383B68C@iat.utexas.edu> Message-ID: <000401c0e260$d4ce46c0$a63a2141@cfl.rr.com> Take a look at this up-and-coming tool: http://www.aprobe.com/prod_aprobe.html John Van Workum Co-Founder Tsunamic Technologies Inc. From alvin at Mail.Linux-Consulting.com Mon May 21 18:39:42 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B09BF94.8050403@onelabs.com> Message-ID: hi bari... you're right that it;d be better to design ones own motherboard/powersupply/1u chassis.... 
problem is i think most people dont have the time or the enginnering staff to do so... and that itd be cheaper to buy off the shelf parts... ( or so goes the theory/assumption most 1U chassis designs and atx motherboards fail miserably on dual power supply issues and hot swapp issues... - basically not fixable... adding an extra extrusion to the 1U box would make it into an equivalent 2U server... and those generic 2U chassis are cheaper than starting with a 1U ... so there'd be no point to creating the extrusions/extensions for the 1U to solve the cpu heatsink problem.. the crusoe cpu is a low wattage device to begin with so having 24 of um in the space of 3u chassis is a good idea... the motherboard schematics etc is freely available to those that want to build a crusoe based systems when the proper fans and chassis designed is used.. i dont see any problems with cooling the intel cpu.... have fun alvin On Mon, 21 May 2001, Bari Ari wrote: > alvin@Mail.Linux-Consulting.com wrote: > > > hi josip.. > > you are right in your concerns... > > > > teh itty-bitty 40mm fans are barely enough to keep the > > 1U chassis cool....even for a P3....amd and p4 requires > > lots more cooling capacity... > > > > amd athlon/duran require 250W power supply ??... > > Intel P3 system can get away with 150W power supply... > > > > larger cases ( deeper ) does NOT provide any cooling > > improvements... you have to move the air around... and > > the problem is the atx connectors in the back prevents > > air flow.... front to back...or side to side is okay too > > > > all of this is in 1U cases... > > > > we have lateral/squirl cage fans that we are considering > > for our next generation P4 based 1U chassis ... > > > For one P4 on an off the shelf ATX motherboard in a 1U why not just > replace the top of the enclosure with an aluminum extrusion with a nice > profile for convection cooling tied to the CPU case? There is easily > enough area in a 19" rack foorprint to cool off the 86W/per cpu for a > few P4s. > > > > > On Mon, 21 May 2001, Josip Loncaric wrote: > > > > > >> Steve Gaudet wrote: > >> > >>> Based on the current P4 die won't see the P4 in anything smaller than a 2u. > >> > >> Except for the P4/1.4 1U RackSaver link posted by Tim... However, > >> cooling a 1U case concerns me. The RackSaver RS-1100 claims to have 6 > >> small case fans (> 40 cfm total), and a 250W ATX12V power supply. I > >> understand the need to be space efficient, but I'd feel more confident > >> about cooling with larger cases.. > > We're designing very dense 8-16 CPUs per 1U clusters now. You can't do > this with off the shelf ATX motherboards and low profile ATX power > supplies and rely on a few weenie 1" fans for forced air cooling. There > are a few small footprint ATX dual Socket 370 boards that will get you 4 > CPUs per 1U but they aren't cooled well since they are designed for a > typical desktop enclosure with high profile heatsink/fan combinations > and not a low profile 1U enclosure. > > If you look at some of the latest dense servers like the Crusoe based > systems with 24 CPUs per 3U announced recently, they use convection and > forced air cooling since they have relatively low powered (heat & > MFLOPS) CPUs. 
> > For dense clusters with high wattage/high performance CPUs like the P4 > you need to move the heat from the surface of the hottest components > like the CPUs, chipsets and power supply mosfets by conduction out of > the enclosure and then rely on forced air and convection cooling for the > remainder of the components. If you need to get things cooler (in case > you're cluster is going into a very warm climate) you can also add > forced AC since the cost of a small AC unit and some sheet metal is less > than the cost of 1 CPU. Mainboard and power supply design for cramming > multiple CPUs and chipsets in a 1U board is pretty cut and dry. IMHO > you're better off building your own boards and power supplies than > trying to kludge up dense systems with components designed for the > desktop market. > > Does P4 with Rambus really make sense for clustering since SMP with > Athlon4 or ULV PIII with DDR is lower priced and produce far less heat > along with the high MFLOPS? SMP with IA-64 will also be available soon > to add to the choices. > > Bari Ari email: bari@onelabs.com > > O.N.E. Technologies > 1505 Old Deerfield Road tel: 773-252-9607 > Highland Park, IL 60035 fax: 773-252-9604 > > http://www.onelabs.com > > From meisterj at acm.org Mon May 21 20:02:55 2001 From: meisterj at acm.org (JackM) Date: Wed Nov 25 01:01:19 2009 Subject: Disk reliability (Was: Node cloning) References: <3AD1C348.638EBC4E@icase.edu> <3AFFCD04.BD5DBA78@lmco.com> Message-ID: <3B09D6DF.F9AB9132@acm.org> You can try using hdparm to turn the DMA off. Of course, it does slow down data transfer rates considerably. Jeffrey B Layton wrote: > > Hello, > > I hate to dredge up this topic again, but ... . I've got a machine > with an IBM drive that is giving me the following errors, > > kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } > kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC } > > as discussed in previous emails on the list. I followed the pointers > that Josip gave and ran the IBM code on the drive. It said the drive > was fine. However, I'm still getting the same error messages. > Anybody care to suggest anything else to look at? Perhaps cabling > or a new motherboard (it's an Abit board). > > TIA, > > Jeff > > Josip Loncaric wrote: > > > Thanks to several constructive responses, the following picture emerges: > > > > (1) Modern IDE drives can automatically remap a certain number of bad > > blocks. While they are doing this correctly, the OS should not even see > > a bad block. > > > > (2) However, the drive's capacity to do this is limited to 256 bad > > blocks or so. If more bad blocks exist, then the OS will start to see > > them. To recover from this without replacing the hard drive, one can > > detect and map out the bad blocks using 'e2fsck -c ...' and 'mkswap -c > > ...' commands. Obviously, the partition where this is being done should > > not be in use (turn swap off first, unmount the file system or reboot > > after doing "echo '-f -c' >/fsckoptions"). > > > > (3) In general, IDE cables should be at most 18" long with both ends > > plugged in (no stubs), and preferably serving only one (master) drive. > > > > For IBM drives (IDE or SCSI), one can download and use the Drive Fitness > > Test utility (see > > http://www.storage.ibm.com/techsup/hddtech/welcome.htm). This program > > can diagnose typical problems with hard drives. 
In many cases, bad > > blocks can be 'healed' by erasing the drive using this utility (back up > > your data first, and be prepared for the 'Erase Disk' to take an hour or > > more). If that fails and your drive is under warranty, the drive ought > > to be replaced. > > > > For older existing drives (in less critical applications, e.g. to boot > > Beowulf client nodes where the same data is mirrored by other nodes) > > mapping out bad blocks as needed is probably adequate. > > > > Finally, the existing Linux S.M.A.R.T. utilities apparently do not > > handle every SMART drive correctly. Use with caution. > > > > Sincerely, > > Josip > > > > -- > > Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu > > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bob at drzyzgula.org Mon May 21 18:45:15 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:19 2009 Subject: Cooling experts out there, some help please In-Reply-To: <3AF8516A.43EDEB2E@pqs-chem.com> References: <20010508165421.999.qmail@web13503.mail.yahoo.com> <3AF8516A.43EDEB2E@pqs-chem.com> Message-ID: <20010521214515.C21116@drzyzgula.org> As an aside in this discussion, I'd like to recommend a book that can be quite enlightening -- and entertaining: "Hot Air Rises and Heat Sinks, Everything You Know About Cooling Electronics is Wrong" by Tony Kordyban, ASME Press, 1998, ISBN 0-7918-0074-1 The book is written in a humorous style (and in fact is LOL-funny in many places) but the author really does an excellent job of explaining heat transfer in electronic devices. I found that most of the online booksellers (Amazon, B&N, etc) didn't tend to keep this book in stock, but you can order it directly from ASME: http://www.asmeny.org/cgi-bin/WEB017C?351719+0001+AP+HT+800741 I have no affilation with the ASME or Mr. Kordyban. This book was recommended by Bob Pease of National Semiconductor at an Analog design seminar road show they were doing a few weeks ago, which is how I learned of it. --Bob From hahn at coffee.psychology.mcmaster.ca Mon May 21 18:48:53 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:19 2009 Subject: Disk reliability (Was: Node cloning) In-Reply-To: <3AFFCD04.BD5DBA78@lmco.com> Message-ID: > I hate to dredge up this topic again, but ... . I've got a machine > with an IBM drive that is giving me the following errors, it's not an error. > kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } > kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC } translation: "the udma checksum detected a corrupt transaction due to noise on the cable; it was automatically retried". this *only* happens due to corruption of the signal on the cable; normally, that's because someone has a bogus cable, or is overclocking. if it happens at relatively low rate, there is no performance cost. (for reference, all non-bogus IDE cables must be 18" or less, with both ends plugged in (no stub). 
for a mode > udma33, the 80-conductor cable must also be used, and yes, it's still at most 18".) > as discussed in previous emails on the list. I followed the pointers > that Josip gave and ran the IBM code on the drive. It said the drive the code in question probably just configured the drive to default to udma33 or something modest. this shouldn't ever be necessary, since the bios shouldn't misconfigure a too-high speed, and any modern Linux will not. (though you can choose your own mode using hdparm, if you wish.) From Bari at onelabs.com Mon May 21 19:03:59 2001 From: Bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems References: Message-ID: <3B09C90F.8090501@onelabs.com> alvin@Mail.Linux-Consulting.com wrote: > hi bari... > > you're right that it;d be better to design ones own > motherboard/powersupply/1u chassis.... > > problem is i think most people dont have the time or > the enginnering staff to do so... and that itd be > cheaper to buy off the shelf parts... > ( or so goes the theory/assumption That's why we offer the design services and will be offering these units to integrators soon. (/shameless plug> > > most 1U chassis designs and atx motherboards fail miserably on > dual power supply issues and hot swapp issues... > - basically not fixable... > > adding an extra extrusion to the 1U box would make it > into an equivalent 2U server... and those generic 2U chassis are cheaper > than starting with a 1U ... so there'd be no point to creating > the extrusions/extensions for the 1U to solve the > cpu heatsink problem.. > I wouldn't add the extrusion to the 1U enclosure, I'd replace the top half of the enclosure with it. Then you'd still have the 1U profile with a lot of surface area for convection cooling. A 1U with a nice extrusion for the top half would be less than $50 in materials and labor. This would work well unless you're building nodes with CD-ROMS, RAID and other high profile components, but how many nodes need this? maybe 1? Bari Ari From patrick at myri.com Mon May 21 19:28:18 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:19 2009 Subject: ATLAS vs. Intel Math Kernel Library References: Message-ID: <3B09CEC2.766FFEA3@myri.com> "Ole W. Saastad" wrote: > > How does the performance of ATLAS generated libraries compare > to the Intel Math Kernel Library? Hi Ole, On P4, ATLAS is pretty good. I let you judge from http://www.netlib.org/atlas/atlas-comm/msg00240.html When ATLAS beats MKL, the difference may be quite large :-) ATLAS is free, easy to install and you can redistribute it. Regards. -- Patrick Geoffray --------------------------------------------------------------- | Myricom Inc | University of Tennessee - CS Dept | | 325 N Santa Anita Ave. | Suite 203, 1122 Volunteer Blvd. | | Arcadia, CA 91006 | Knoxville, TN 37996-3450 | | (626) 821-5555 | Tel/Fax : (865) 974-1950 | --------------------------------------------------------------- From hunting at ix.netcom.com Mon May 21 19:30:15 2001 From: hunting at ix.netcom.com (Michael Huntingdon) Date: Wed Nov 25 01:01:19 2009 Subject: FYI: Current SPECfp landscape... In-Reply-To: <20010521204729.B21116@drzyzgula.org> References: <20010510114434.A10957@melissa.rsma.frb.gov> Message-ID: <3.0.3.32.20010521193015.0071e914@popd.ix.netcom.com> Bob If you can be more specific, I can come back with what ever pricing might be needed. cheers ~m At 08:47 PM 5/21/01 -0400, you wrote: >On Mon, May 21, 2001 at 06:19:52PM -0400, Robert G. 
Brown wrote: >> >> This isn't to complain about your ranking -- I think it is very useful >> and one does have to create some sort of standard in order to compare >> price performance. It is just to note that if one's application DOESN'T >> use a GB of core the price/performance rankings can signficantly change >> because of the nonlinearities in the cost space. > >Robert, > >Good point. I had focused on 1GB configurations because >that's what we are currently using at work, to support >some huge, memory-hungry jobs. I took a stab at readjusting >for 512MB configurations (768MB gets to be a hassle to do >in highly interleaved systems like the UltraSPARC). This >table is a bit shakier than the last ones because I don't >have good data for the smaller memory configurations >on the UltraSPARCs, and, as I pointed out previously, >the Alpha and HP prices are WAGs anyway. Not that precise >data on these is likely to change the rankings much, but >does anyone have better pricing info for these that they >can share? Anyway, this is what I came up with, sorted >only by Sfp2K/K$: > >Core$ is for MB + CPU + 512MB memory > >Processor MHz L2 Si Sfp core $ Sfp/K$ Notes >-------------------------- ---- ---- --- --- ------ ---- ------------------- >AMD Athlon (Thunderbird) 1300 256 491 374 400 935 (A7V, PC133 SDRAM) >AMD Athlon (Thunderbird) 1330 256 539 445 600 742 (GA7DX, DDR SDRAM) >Pentium 4 1700 256 586 608 900 676 (D850GB, RDRAM) >Pentium III (Coppermine) 1000 256 428 314 700 449 (VC820, RDRAM) >Alpha (21264) 833 8192 533 644 8,500 76 (UP2000+, est) >UltraSPARC III 750 8192 395 421 7,500 56 (Ocelot) >PA-8700 750 603 581 13,500 43 (HP J6700, 2304KB L1) >UltraSPARC II 480 8192 234 291 9,000 32 (AXdp) > >--Bob > >PS, BTW, FYI & FWIW, for those of you with a sense of >deja-vu, the message Robert just responded to was in fact >essentially a duplicate of one I sent before, only it came >from my office address rather than my personal address. >I had done a reply instead of a forward so that I could >edit the content (I have mutt set to forward messages >as mime attachments), but then I forgot to take the beowulf >address off. I thought the message was dead because my >work address isn't subscribed to the list. I sent mail >to the admin address asking that it be deleted rather than >approved, but I guess that it got approved and posted >anyway. :-( > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From Eugene.Leitl at lrz.uni-muenchen.de Tue May 22 04:27:52 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:19 2009 Subject: more AMD Beowulf press sightings Message-ID: Please engage your dekrauting unit of choice before viewing http://heise.de/newsticker/data/cp-20.05.01-010/ From MAHRF at de.ibm.com Tue May 22 07:25:24 2001 From: MAHRF at de.ibm.com (MAHRF@de.ibm.com) Date: Wed Nov 25 01:01:19 2009 Subject: Node power supply Message-ID: Hi everyone, I'm planning to build a home beowulf of nodes with an Athlon 900 in a mini tower. How many watts should the power supply have at least when the only other parts in the nodes are a NIC, a simple graphics card and maybe a small HDD. Maybe I'm going to upgrade the nodes to, say, 1,3 or 1,4GHz Athlons in the future when these are as cheap as the 900MHz models are. I'm asking because less watts means less heat and it's also cheaper. 
Thanks in advance, Ferdinand From sgaudet at angstrommicro.com Tue May 22 07:48:31 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:19 2009 Subject: 1U P4 Systems In-Reply-To: <3B09805C.C125E7C@icase.edu> References: <3B03E319.A01EE9B1@aeolusresearch.com> <990469638.3b095e06d1f9c@localhost> <3B09805C.C125E7C@icase.edu> Message-ID: <990542911.3b0a7c3fa124c@localhost> Hello Josip, > > Based on the current P4 die won't see the P4 in anything smaller than > a 2u. > > Except for the P4/1.4 1U RackSaver link posted by Tim... However, > cooling a 1U case concerns me. The RackSaver RS-1100 claims to have 6 > small case fans (> 40 cfm total), and a 250W ATX12V power supply. I > understand the need to be space efficient, but I'd feel more confident > about cooling with larger cases... After reading the article on throttling posted by Brian Atkins, we're more concerned about heat. ---------from Brian Atkins------------ The throttling may not be solvable, even with mega-cooling methods due to internal hot spot(s): http://www.inqst.com/articles/athlon4/0516main.htm Throttling will vary chip to chip because of thermal diode inconsistencies. Intel must have developed a huge case of big-company-arrogance in order to make all the bad decisions they've made over the last few years. -- Brian Atkins Director, Singularity Institute for Artificial Intelligence http://www.singinst.org/ ================================================================= As also mentioned before on this list, Intel ships a huge heat sink which would never fit into a 1u. Therefore, coming up with an alternative is a task that should not be rushed because there are warranty issues on both the motherboard and processor that we here at Angstrom want to maintain. Failures due to heat are easy to see; throttling-down issues can go undetected. This isn't suggesting RackSaver doesn't have a solid solution, I'm sure they do. However, from what I've seen they are the only ones...and there may be a valid reason for this. > P.S. RackSaver uses P4/1.4 which can dissipate 51.8W. By contrast, > the > P4/1.7 can dissipate 64.0W in normal use, i.e. more than a fairly large > soldering iron... BTW, this 64.0W is Intel's "thermal design point" > for > the P4/1.7 processor. Their "thermal design point" is based on 75% of > the maximum power dissipation (which could reach 85.3W, lead to > internal > hot spots, etc.). Intel claims that most popular applications > (including SPEC benchmarks) fall below this 75%. However, ATLAS and > some other cache optimized codes can seriously stress this > assumption... Excellent cooling is essential when doing very intense > computations. > > -- > Dr. Josip Loncaric, Research Fellow Regards, Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Lincoln St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From josip at icase.edu Tue May 22 08:10:06 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:20 2009 Subject: Disk reliability (Was: Node cloning) References: Message-ID: <3B0A814E.26558AA3@icase.edu> Mark Hahn wrote: > > > as discussed in previous emails on the list. I followed the pointers > > that Josip gave and ran the IBM code on the drive. It said the drive > > the code in question probably just configured the drive > to default to udma33 or something modest.
this shouldn't ever > be necessary, since the bios shouldn't misconfigure a too-high > speed, and any modern Linux will not. (though you can choose your > own mode using hdparm, if you wish.) IBM's Drive Fitness Test (DFT) actually does a lot. It accesses IBM hard drive microcode to enable diagnosis of hard drive operation, and when necessary, it can remap new bad blocks and zero the disk. For more detail, see the DFT white paper http://www.storage.ibm.com/hardsoft/diskdrdl/technolo/dft/dft.htm FYI, we applied this procedure to two IBM hard drives which had developed too many bad blocks (155 and 84, respectively) and we have not seen any bad blocks since then (for over a month). Since IBM's DFT program accesses special IBM hard drive DFT microcode to learn about low level performance details, I am not sure if it can do much for non-IBM drives. JackM wrote: > > You can try using hdparm to turn the DMA off. Of course, it does slow > down data transfer rates considerably. As Mark said, BadCRC only means that the transfer was retried. If a few BadCRC messages are the only problem, I would not turn off DMA. BTW, some early UltraDMA drives have known problems (e.g. http://www.seagate.com/support/kb/disc/bigbear.html) and if you have a drive like that, turning off DMA is advisable. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From joelja at darkwing.uoregon.edu Tue May 22 09:34:16 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:20 2009 Subject: Node power supply In-Reply-To: Message-ID: the rating for the powersupply is the peak load it can handle, not how much it will actually draw; a 300watt powersupply shouldn't draw any more than a 250 under the same load... that being said amd hasn't certified any powersupply rated under 300watts for the 1.3ghz athlons... http://www1.amd.com/athlon/npower/index/1,1712,,00.html joelja On Tue, 22 May 2001 MAHRF@de.ibm.com wrote: > > > Hi everyone, > > I'm planning to build a home beowulf of nodes with an Athlon 900 in a mini > tower. > How many watts should the power supply have at least when the only other > parts in the nodes are a NIC, a simple graphics card and maybe a small HDD. > Maybe I'm going to upgrade the nodes to, say, 1,3 or 1,4GHz Athlons in the > future when these are as cheap as the 900MHz models are. > I'm asking because less watts means less heat and it's also cheaper. > > Thanks in advance, > Ferdinand > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843.
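As a concrete sketch of the mode-selection advice in the Disk reliability thread above, one common approach is to step the drive down one UltraDMA mode with hdparm before resorting to disabling DMA entirely. The device name /dev/hda is only an example here, and the exact option values should be checked against the hdparm man page on your own system:

  # show the drive's identification data and current settings
  hdparm -i /dev/hda
  hdparm /dev/hda
  # keep DMA enabled but select UltraDMA mode 2 (udma33); -X takes 64+mode for UDMA
  hdparm -d1 -X66 /dev/hda
  # last resort: turn DMA off completely (large throughput penalty)
  hdparm -d0 /dev/hda
  # rough before/after throughput check
  hdparm -tT /dev/hda

If the BadCRC messages stop once the drive is held at udma33, suspect the cable first (over 18" long, or not the 80-conductor type), as Mark pointed out earlier in the thread.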
From lindahl at conservativecomputer.com Tue May 22 10:01:07 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: ; from alvin@Mail.Linux-Consulting.com on Mon, May 21, 2001 at 06:39:42PM -0700 References: <3B09BF94.8050403@onelabs.com> Message-ID: <20010522130107.C1296@wumpus.dhcp.fnal.gov> On Mon, May 21, 2001 at 06:39:42PM -0700, alvin@Mail.Linux-Consulting.com wrote: > most 1U chassis designs and atx motherboards fail miserably on > dual power supply issues and hot swapp issues... > - basically not fixable... This is the beowulf mailing list. How many beowulf owners have dual power supplies? How many are willing to pay extra for hotswap? The beowulf community is considerably different from your typical 1U server buyer. -- g From edwards at icantbelieveimdoingthis.com Tue May 22 09:34:30 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:20 2009 Subject: Use of slave node disc using Scyld Message-ID: <20010522103430.A21087@icantbelieveimdoingthis.com> I would lilke to use the local, slave-node disk space to write scratch files using Scyld. So far I have been unsuccessful. When I execute df -k on all nodes, they see the /home space on the head node and the root space on the slave. In the simplest MPI job, I cannot open a file on the slaves. Any insights? Art Edwards -- Arthur H. Edwards 712 Valencia Dr. NE Abq. NM 87108 (505) 256-0834 From rgb at phy.duke.edu Tue May 22 09:15:03 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:20 2009 Subject: Node power supply In-Reply-To: Message-ID: On Tue, 22 May 2001 MAHRF@de.ibm.com wrote: > > > Hi everyone, > > I'm planning to build a home beowulf of nodes with an Athlon 900 in a mini > tower. > How many watts should the power supply have at least when the only other > parts in the nodes are a NIC, a simple graphics card and maybe a small HDD. > Maybe I'm going to upgrade the nodes to, say, 1,3 or 1,4GHz Athlons in the > future when these are as cheap as the 900MHz models are. > I'm asking because less watts means less heat and it's also cheaper. Do you mean how many watts does it draw in this configuration (only way to know for sure is to measure)? Or how much current must it be able to deliver transiently (to switch on the motherboard)? Or what AMD expects and requires to certify correct and proper operation? I haven't looked it up myself (although it ought to be on the AMD website and/or available within the technical specs of the motherboard you are considering) but my vendor tells me that AMD requires a 250W "certified" power supply for at least the 1.33 GHz systems -- not just any old 250W power supply will do, and I couldn't get my favorite case as a consequence (no biggie, they had a certified case that appears just as good and actually a bit cheaper). As to what you could get to work -- the only way I can imagine to find out, especially since you'll be working outside the "spec" for the motherboard/CPU, is to try various power supplies and see what works. I don't >>think<< that you can hurt the chip by underpowering it, the most likely consequence is that it won't work at all or will work erratically. HOWEVER, you might think about the logic of what you are doing. It is by no means clear that putting in a smaller power supply is going to make the system run cooler or run cheaper, although the power supply itself might be cheaper. 
The current draw of the running system is determined by the voltage applied (which had better be fixed within a fairly narrow tolerance or the board won't work at all and might indeed break) and the operating load. The power is determined by the voltage times the power. The power consumed by the motherboard is more or less independent of the power supply used to provide it. You could have one the size of your desk that could provide ten kilowatts of power if asked nicely (with the appropriate voltages on the appropriate lines, of course) and if you plug in the motherboard it will trundle right along drawing its 80W or 130W or whatever its operational load and hardware configuration requires. The only real differences are that the 10000W power supply could probably drive 50-100 motherboards or so at once (instead of just the one you could drive with a normal one) and that it would make BIG sparks if you short it out before blasting the wires into copper vapor. SO, you won't save power on the operational side by using a small supply, you will just risk it not being able to draw enough peak current/power and running erratically or unreliably. What about the power drawn in the power supply itself (in idle mode)? Hard to say, as it depends greatly on the quality and design of the supply. If the transformers and components inside the supply were "perfect" there would be no draw at all beyond the idle current provided to the motherboard and peripherals and the power supply itself would not heat up at all. They're not perfect -- eddy currents are generated in the flux coupling, some flux escapes as 60 Hz radiation energy, the wires and components all get a bit warm even at the idle load. With a power supply that is too small, I'd expect that it would get much hotter at even the idle load, because one of the things that MAKES it a low-wattage design is a relatively small (physically) transformer with relatively thin primary and secondary wires. Those thin wires have a higher resistance and get hotter than the thicker wires of a bigger transformer at any load. I know that the power supply of my laptop is always hot, even when the laptop is idle or off and just trickle charging. As the transformer gets bigger, provided that the flux core lamination remains high quality I'd expect the transformer to run COOLER at any given load, not hotter. There is probably a sanity-check-point here -- the desk-sized transformer might well generate more heat than a simply "big" desktop power supply -- but overall I'd expect a big supply to run cooler than a small one at equivalent load. Put all this together and you might conclude that your system will run COOLER if you use a BIGGER power supply than you really need, and won't run at all if you use one that is too small. You'll pay more up front for the larger supply, but you will actually pay for a bit less electricity during its lifetime of operation (the heating of the transformer especially is utter waste heat that you pay for in the electricity and pay for again in the cooling bill and reduced life of the components sharing the enclosure). This might be why AMD insists on a certified 250W supply or better for their systems -- it might well be twice the power or more that the system actually draws in operation (except possibly during peaks when all the peripherals in a loaded system run at once) but a 250W supply runs cooler and heats the case less under load than a 200W supply that would nominally suffice. 
A 300W supply would probably heat the case even less, especially given that it will typically have an even larger cooling fan to get rid of the waste heat generated in the transformer under full load. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From agrajag at scyld.com Tue May 22 12:13:17 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:20 2009 Subject: Use of slave node disc using Scyld In-Reply-To: <20010522103430.A21087@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Tue, May 22, 2001 at 10:34:30AM -0600 References: <20010522103430.A21087@icantbelieveimdoingthis.com> Message-ID: <20010522151317.L10313@blueraja.scyld.com> On Tue, 22 May 2001, Art Edwards wrote: > I would lilke to use the local, slave-node disk space to write scratch > files using Scyld. So far I have been unsuccessful. When I execute > df -k on all nodes, they see the /home space on the head node and the > root space on the slave. In the simplest MPI job, I cannot open a file on the > slaves. Any insights? By default, a the slave nodes in a Scyld cluster use a ram disk as / and nfs mount /home from the master node. If you want to use harddrives that are on the slave nodes, then the Installation Guide (http://www.scyld.com/support/docs/beoinstall.html) contains details on partitioning those harddrives as well as setting up an fstab to use them. Note that when you do this, a minimal filesystem will be created on the slave nodes, but it may not contain most files you are expecting (such as /etc/passwd). Also, anything you try to write into /home will actually be written on /home of the master node. However, /tmp will still be there on the slave nodes and is a good place to store temproary files that only need to be accessed by that slave node. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010522/fca3d143/attachment.bin From siegert at sfu.ca Tue May 22 13:15:28 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <20010522130107.C1296@wumpus.dhcp.fnal.gov>; from lindahl@conservativecomputer.com on Tue, May 22, 2001 at 01:01:07PM -0400 References: <3B09BF94.8050403@onelabs.com> <20010522130107.C1296@wumpus.dhcp.fnal.gov> Message-ID: <20010522131528.A31226@stikine.ucs.sfu.ca> On Tue, May 22, 2001 at 01:01:07PM -0400, Greg Lindahl wrote: > On Mon, May 21, 2001 at 06:39:42PM -0700, alvin@Mail.Linux-Consulting.com wrote: > > > most 1U chassis designs and atx motherboards fail miserably on > > dual power supply issues and hot swapp issues... > > - basically not fixable... > > This is the beowulf mailing list. How many beowulf owners have dual > power supplies? How many are willing to pay extra for hotswap? > > The beowulf community is considerably different from your typical 1U > server buyer. You are so right. Unfortunately not many manufacturers/distributers have noticed this. I have been searching for a 1U or 2U case for a beowulf cluster on the web. This has been a totally frustrating experience: 1) I don't want to spend $200 US ($300 CDN) or more on a case. 2) Many web sites don't even specify which size of a motherboard they support (e.g., 12"x9.6" or 12"x13"). 
They usually just say supports ATX, Micro-ATX, and if you are lucky they say extended ATX. Unfortunately, different sites mean different sizes when they say "ATX supported". ... meaning that many of those web sites are almost content free ... Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From bari at onelabs.com Tue May 22 14:01:41 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems References: <3B09BF94.8050403@onelabs.com> <20010522130107.C1296@wumpus.dhcp.fnal.gov> <20010522131528.A31226@stikine.ucs.sfu.ca> Message-ID: <3B0AD3B5.1090905@onelabs.com> Martin Siegert wrote: > On Tue, May 22, 2001 at 01:01:07PM -0400, Greg Lindahl wrote: > >> On Mon, May 21, 2001 at 06:39:42PM -0700, alvin@Mail.Linux-Consulting.com wrote: >> >> >>> most 1U chassis designs and atx motherboards fail miserably on >>> dual power supply issues and hot swapp issues... >>> - basically not fixable... >> >> This is the beowulf mailing list. How many beowulf owners have dual >> power supplies? How many are willing to pay extra for hotswap? >> The beowulf community is considerably different from your typical 1U >> server buyer. > > > You are so right. Unfortunately not many manufacturers/distributers > have noticed this. I have been searching for a 1U or 2U case for a beowulf > cluster on the web. This has been a totally frustrating experience: > > 1) I don't want to spend $200 US ($300 CDN) or more on a case. > > 2) Many web sites don't even specify which size of a motherboard they > support (e.g., 12"x9.6" or 12"x13"). They usually just say supports ATX, > Micro-ATX, and if you are lucky they say extended ATX. Unfortunately, > different sites mean different sizes when they say "ATX supported". > ... meaning that many of those web sites are almost content free ... You can have sheet metal 1U enclosures with CNC mounting holes made to order with aluminum front panels for under $100 US in low volume(100 pcs.). If you are willing to drill your own mounting holes you can get off the shelf 1U enclosures for around $50 quan. 1. Bari From rgb at phy.duke.edu Tue May 22 14:17:05 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:20 2009 Subject: Node power supply In-Reply-To: Message-ID: Sorry, two brain farts: On Tue, 22 May 2001, Robert G. Brown wrote: > load. The power is determined by the voltage times the power. The The power is determined by the sum of the voltages times the CURRENTS on each line (on the DC side). > This might be why AMD insists on a certified 250W supply or better for I think Joel got the 1.33 GHz power requirements right at 300W. I was probably remembering an earlier conversation on one of their slower CPUs. Too much shopping, too little time... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Tue May 22 14:47:52 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <20010522131528.A31226@stikine.ucs.sfu.ca> Message-ID: On Tue, 22 May 2001, Martin Siegert wrote: > You are so right. 
Unfortunately not many manufacturers/distributers > have noticed this. I have been searching for a 1U or 2U case for a beowulf > cluster on the web. This has been a totally frustrating experience: > > 1) I don't want to spend $200 US ($300 CDN) or more on a case. Martin, I'd have to echo this frustration. Pretty much all the XU cases I've found are more than $200, some quite a bit more. Then you've got to buy the rack. Compared to $50-60 for a standard mid-tower case this is painful beyond measure when buying in volume, especially when the total node cost might only be $600-750 outside of the case. I regretfully opted for shelves of midtowers for my latest effort(s) -- one can achieve something like (within a factor of 2, surely) of the packing of 2U cases -- I can fairly easily get 16 nodes in a 1-1.5 m^2 floorspace footprint (depending on how you like to count the gaps between units) without going over two nodes high and could double that with tall shelves. It's not as pretty and it is a bit more work to put together neatly, but I still assembled a 16 node cluster in about four hours total work including assembling the heavy duty steel shelving. The shelving itself cost only $50 total (plus another $50 or so for cable ties and surge protectors and miscellany to make it look nice) saving me about $2500. That's serious money for six hours of work (allowing for the time to drive to Home Depot and buy the shelving:-) -- 3 nodes worth of money. I'll make the time back in the first DAY of fulltime operation. Case vendors take note: there is some insanity here. Perhaps a 1U case is tricky, but a 2U case is pretty nearly a desktop case with rack mounts on the side. In all cases they are basically sheetmetal boxes with a motherboard mount, a power supply, and a place to hang a disk and a place to poke a card or three out. No way that 4x is a reasonable multiplier for the retail cost -- it just holds down your volume in the beowulf and server market. I'd like to see 1U or 2U cases selling for a competitive $50-75 (where the margin allows for a bit of extra cooling for the 1U cases). But I don't think I'm going to anytime soon...:-( rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Tue May 22 14:52:36 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <3B0AD3B5.1090905@onelabs.com> Message-ID: On Tue, 22 May 2001, Bari Ari wrote: > You can have sheet metal 1U enclosures with CNC mounting holes made to > order with aluminum front panels for under $100 US in low volume(100 > pcs.). If you are willing to drill your own mounting holes you can get > off the shelf 1U enclosures for around $50 quan. 1. Having things made is painful, but I've got a drill. Several drills, actually. Besides, I could buy a heavy-duty drill press for what I'd save on any significant number of nodes. So, how do I go about making a 1U enclosure out of a OTS case? Do you have any specific cases that you care to recommend that fit (even approximately) the size spec? Did you do this yourself? Do you have some (or can you prepare some) GIFs or JPEGs of the process or finished product? Seriously, this would make a great chapter for the online beowulf engineering book I'm working on. Send me the stuff, I'll organize it and put it in a chapter... 
rgb > > Bari > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From cblack at eragen.com Tue May 22 15:46:55 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) Message-ID: <20010522184655.E3180@getafix.EraGen.com> We have been looking into motherboards that provide 64-bit 66MHz PCI slots and haven't had much luck. We are now evaluating the Intel Server Board STL2, but have had some problems with it in terms of getting the IDE to work correctly with UltraDMA support. Can anyone share any experience with these boards in terms of problems and what needs to be done to get them to work? It seems they are a bit finicky in respect to the type of memory they want as well. Has anyone found any motherboards that can reliably do 66MHz/64bit PCI bus transfers? We are trying to maximize the performance of our gigabit cards which do support 66MHz/64bit operation. We looked at the ASUS CUR-DLS but it appears that although it has 64bit pci slots, these run at 33MHz. Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010522/b8ae490b/attachment.bin From James.P.Lux at jpl.nasa.gov Tue May 22 16:16:19 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems Message-ID: <00b701c0e315$37838b40$61064f89@cerulean.jpl.nasa.gov> For the large number of odd shaped holes you'd need (power supply and fan cutouts, etc.), I suspect that the CNC sheet metal fabricator will be more cost effective. It's not just the drill press you need, you also need the punches, etc. About $10K would set you up with the necessary tooling. Of course, if you have access to the necessary machines, the actual dies for the punch are fairly inexpensive. There would be, of course, the labor cost, but for those on a low $$ budget, often they have free-ish labor. I've been this route a number of times, and what's fairly straightforward to do 1 or 2 times, gets real tedious and timeconsuming when you have to do it 100 times. It's one thing to spend a day marking and cutting one chassis. It's another to spend 4 months doing 100. -----Original Message----- From: Robert G. Brown To: Bari Ari Cc: beowulf@beowulf.org Date: Tuesday, May 22, 2001 2:53 PM Subject: Re: 1U P4 Systems >On Tue, 22 May 2001, Bari Ari wrote: > >> You can have sheet metal 1U enclosures with CNC mounting holes made to >> order with aluminum front panels for under $100 US in low volume(100 >> pcs.). If you are willing to drill your own mounting holes you can get >> off the shelf 1U enclosures for around $50 quan. 1. > >Having things made is painful, but I've got a drill. Several drills, >actually. Besides, I could buy a heavy-duty drill press for what I'd >save on any significant number of nodes. > >So, how do I go about making a 1U enclosure out of a OTS case? Do you >have any specific cases that you care to recommend that fit (even >approximately) the size spec? Did you do this yourself? 
Do you have >some (or can you prepare some) GIFs or JPEGs of the process or finished >product? > >Seriously, this would make a great chapter for the online beowulf >engineering book I'm working on. Send me the stuff, I'll organize it >and put it in a chapter... > > rgb > >> >> Bari >> >> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > >-- >Robert G. Brown http://www.phy.duke.edu/~rgb/ >Duke University Dept. of Physics, Box 90305 >Durham, N.C. 27708-0305 >Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Tue May 22 16:56:51 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <00b701c0e315$37838b40$61064f89@cerulean.jpl.nasa.gov> Message-ID: hi Bari/Jim i know that this is a beowulf mailing list...but ... and yes... i'd be curious to see how many clusters are using 1U vs 2u vs 4U...etc..etc.. For Making ones own chassis... it is NOT that straight forward... there are always costs and *oops* that add up... that sometimes its cheaper to buy someone elses stuff even if its not quite exactly what is needed... - we tend to build customized 1U chassis per vendor specs... not many people can cut and drill the holes as needed... the $50 1U chassis are flimsy paper .... not worthy of being called a 1u chassis... the typical $250 1U chassis are okay...but those still twist so if its in a tack and hold held by the front mounting ears... those motherboards will twist... our 1U chassis is 48 gauge steel.. it wont bend like theirs does.. and there you cannot drill it either.... its extremely difficult for the ordinary hobbyist and yes...one-z-two-z is fine... but thats a hobby as opposed to building a 1U beowulf clusters ?? > > > >> You can have sheet metal 1U enclosures with CNC mounting holes made to > >> order with aluminum front panels for under $100 US in low volume(100 > >> pcs.). If you are willing to drill your own mounting holes you can get > >> off the shelf 1U enclosures for around $50 quan. 1. > > > >Having things made is painful, but I've got a drill. Several drills, > >actually. Besides, I could buy a heavy-duty drill press for what I'd > >save on any significant number of nodes. make that extreme pain... > >So, how do I go about making a 1U enclosure out of a OTS case? Do you > >have any specific cases that you care to recommend that fit (even > >approximately) the size spec? Did you do this yourself? Do you have > >some (or can you prepare some) GIFs or JPEGs of the process or finished > >product? > > > >Seriously, this would make a great chapter for the online beowulf > >engineering book I'm working on. Send me the stuff, I'll organize it > >and put it in a chapter... if you would like an evaluation 1U chassis... 
please let me know have fun al;vin From lindahl at conservativecomputer.com Tue May 22 17:03:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: <20010522184655.E3180@getafix.EraGen.com>; from cblack@eragen.com on Tue, May 22, 2001 at 06:46:55PM -0400 References: <20010522184655.E3180@getafix.EraGen.com> Message-ID: <20010522200330.A1425@wumpus.dhcp.fnal.gov> On Tue, May 22, 2001 at 06:46:55PM -0400, Chris Black wrote: > We have been looking into motherboards that provide 64-bit 66MHz > PCI slots and haven't had much luck. Myricom highly recommends the Serverworks chipsets, and most of the recently-built large Myrinet clusters followed their advice. Benchmarks show it's a pretty damn good PCI implementation. > Has anyone found any motherboards that can reliably do 66MHz/64bit > PCI bus transfers? We are trying to maximize the performance of > our gigabit cards which do support 66MHz/64bit operation. But for gigE, surely you've got software bandwidth limits far more important than the PCI limit? -- g From keithu at parl.clemson.edu Tue May 22 18:08:02 2001 From: keithu at parl.clemson.edu (Keith Underwood) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: <20010522200330.A1425@wumpus.dhcp.fnal.gov> Message-ID: > But for gigE, surely you've got software bandwidth limits far more > important than the PCI limit? > > -- g Hmmm... I thought I had heard good things out of the 2.4 kernel. I know there are some people reporting good things with jumbo frames. 32/33 PCI is only going to get you 1Gb/s no matter what you do... You could get the theoretical 2 Gb/s out of 64/66... Keith --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu@parl.clemson.edu Clemson University From bari at onelabs.com Tue May 22 18:09:53 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems References: Message-ID: <3B0B0DE1.7090809@onelabs.com> alvin@Mail.Linux-Consulting.com wrote: > > hi Bari/Jim > > i know that this is a beowulf mailing list...but ... > > and yes... i'd be curious to see how many clusters are > using 1U vs 2u vs 4U...etc..etc.. > > For Making ones own chassis... it is NOT that straight forward... > there are always costs and *oops* that add up... that sometimes > its cheaper to buy someone elses stuff even if its not quite > exactly what is needed... > - we tend to build customized 1U chassis per vendor > specs... > > not many people can cut and drill the holes as needed... > > the $50 1U chassis are flimsy paper .... not worthy of being > called a 1u chassis... We use aluminum extrusions for the side, front and rear panels.... far from flimsy. If your going to build more than 10 or 20 enclosures at a time just about any CNC sheet metal shop can stamp out an enclosure for you for under $100. That's why the typical 1U enclosure sells for $250 at the dealer. 100% is a nice profit margin. > > the typical $250 1U chassis are okay...but those still twist > so if its in a tack and hold held by the front mounting ears... > those motherboards will twist... > > our 1U chassis is 48 gauge steel.. it wont bend like theirs does.. > and there you cannot drill it either.... its extremely difficult > for the ordinary hobbyist 48 gauge steel!? 30 gauge is only .0120"..... 48 gauge would be down around .003" -.005".... 
about the thickness of aluminum foil. 18 gauge .0478" is pretty typical for rack mount enclosure panels with a 10 gauge aluminum front panel for rack mounting. You can order extrusions for side panels from many sources like Wakefield. Bari From bob at drzyzgula.org Tue May 22 18:50:56 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: References: <20010522131528.A31226@stikine.ucs.sfu.ca> Message-ID: <20010522215056.A29259@drzyzgula.org> On Tue, May 22, 2001 at 05:47:52PM -0400, Robert G. Brown wrote: > > Martin, I'd have to echo this frustration. Pretty much all the XU cases > I've found are more than $200, some quite a bit more. Then you've got > to buy the rack. Compared to $50-60 for a standard mid-tower case this > is painful beyond measure when buying in volume, especially when the > total node cost might only be $600-750 outside of the case. Robert, First, I'll say that I believe that the #1 reason why rack-mount cases cost this much is that they sell at that price, in sufficient numbers so as to make a comfortable profit for the case manufacturers. If everyone would just stop buying them at that price, we might see some improvement. :-) Then again, this might just make them *more* expensive, depending. :-( This does beg the question of why low-cost competition does not yet exist. I'd guess that problem is partly in the sales volumes vs. engineering NRCs. But it may also partly be in the the mechanicals -- a 2U, rack-mount chassis needs to have a skin that is much more rigid than that required for a minitower -- a minitower gets a lot of its rigidity from the internal framing structure, which in turn can get its rigidity from being stamped and bent into weight-efficient but volume-inefficient support braces and trusses. This is much harder to do in a flat box like a 2U chassis. Consider two chassis, a 3.5"x24"x17" rackmount and a 17"x18"x8" minitower (both height x depth x width). The rackmount has a volume of 1428 i^3 and a surface area of 1103 i^2, while the minitower has a volume of 2448 i^3 and a surface area of 1172 i^3. The rackmount's cover and base, flat surfaces which need to be supported and kept flat, are 408 i^2, with a 29" diagonal, while the comparable surfaces in the minitower are 144 i^2 with a diagonal of 19". The rackmount has a broader expanse of metal to be kept flat at the same time that it has less internal space to use up doing it. It's probably also partly because of the power supply constraints -- a typical, hyper-mass-produced ATX power supply is too tall to fit in a 2U chassis, and a 300W power supply for a 1U chassis will be significantly harder to do; think about the big electrolytics and transformers which have to somehow be fit into a space no more than 1.25" tall (leaving 0.5" for sheet metal and air). I just took apart a dead 235W Sparkle supply and there's got to be a dozen components and subassemblies in there that are more than 1.25" tall; several are more than 2" tall. There's electrolytics, heat sinks, circuit subassemblies, transformers and a big old toroid which would all have to be re-engineered or at least re-mounted horizontally, sucking up precious board surface area, and that's just in a 235W supply. 
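For anyone who wants to double-check the chassis comparison a couple of paragraphs up, the figures fall out of a quick awk one-liner (awk itself is the only assumption; the dimensions are the ones quoted above):

  $ awk 'BEGIN { printf "%.0f cu in, %.0f sq in cover, %.1f in diagonal\n", 3.5*24*17, 24*17, sqrt(24^2+17^2) }'
  1428 cu in, 408 sq in cover, 29.4 in diagonal
  $ awk 'BEGIN { printf "%.0f cu in, %.0f sq in cover, %.1f in diagonal\n", 17*18*8, 18*8, sqrt(18^2+8^2) }'
  2448 cu in, 144 sq in cover, 19.7 in diagonal

The total skin areas work out the same way, 2*(3.5*24 + 3.5*17 + 24*17) = 1103 and 2*(17*18 + 17*8 + 18*8) = 1172 square inches, so the rackmount vs. minitower numbers above hold up (the minitower cover diagonal is nearer 20" than 19", but the point stands).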
> I regretfully opted for shelves of midtowers for my latest effort(s) -- > one can achieve something like (within a factor of 2, surely) of the > packing of 2U cases -- I can fairly easily get 16 nodes in a 1-1.5 m^2 > floorspace footprint (depending on how you like to count the gaps > between units) without going over two nodes high and could double that > with tall shelves. This does a good job of fitting the requirements of an ad-hoc or perhaps a lab environment, but in a production data center environment these can be decidedly sub-optimal. It is relatively difficult to get good cooling for hundreds of devices in a small area in open-frame racks like this. At my office we have about thirty-five 19" racks in four rows, bolted side-to-side with panels on the ends of each row, and doors on the front and backs of each rack. (This is just for my group -- there are well over a hundred other racks in this data center.) They sit on a raised floor with positive plenum pressure into the base of the rack, fed by recirculating chillers and assisted by fan trays in the tops and bottoms of the racks. Even in this environment we run into heat problems when the system density gets high. (FWIW, what we are short on is chiller capacity; some of our racks are gulpling for chilled air.) > It's not as pretty and it is a bit more work to put together neatly, but > I still assembled a 16 node cluster in about four hours total work > including assembling the heavy duty steel shelving. The shelving itself > cost only $50 total (plus another $50 or so for cable ties and surge > protectors and miscellany to make it look nice) saving me about $2500. > That's serious money for six hours of work (allowing for the time to > drive to Home Depot and buy the shelving:-) -- 3 nodes worth of money. > I'll make the time back in the first DAY of fulltime operation. FWIW, at home I use one of those NSF-certified wire racks, the kind where you can tie wrap everything to death. If you have access to a Costco Warehouse store, see if they have these in stock. At the one near me (Gaithersburg, MD), they carry a setup with two-section 6' poles, four 18"x48" shelves with centerline bracing, 5" back fences for each shelf (great for tie-wrapping cable trunks), locking 4" hard rubber casters and leveling glides to use if you don't want to use the casters. These take about ten minutes to put together if you've done it a few times and know the tricks. You can easily fit as many as twenty minitowers on one of these, and wheel it around when you're done. For this they want $77. Talk about the advantages of volume production... http://www.costco.com/frameset.asp?trg=product%2Easp&catid=114&subid=858&hierid=1090&prdid=10002618&log= > Case vendors take note: there is some insanity here. Perhaps a 1U case > is tricky, but a 2U case is pretty nearly a desktop case with rack > mounts on the side. In all cases they are basically sheetmetal boxes > with a motherboard mount, a power supply, and a place to hang a disk and > a place to poke a card or three out. No way that 4x is a reasonable > multiplier for the retail cost -- it just holds down your volume in the > beowulf and server market. Not so sure about this. See above. > I'd like to see 1U or 2U cases selling for a competitive $50-75 (where > the margin allows for a bit of extra cooling for the 1U cases). 
I think that the production volumes would have to go up dramatically for this to happen, and even then there will probably always be a significant price premium because most customers will be businesses, not individual consumers, and businesses will in fact pay more for something like this. What the market will bear and all that. > But I don't think I'm going to anytime soon...:-( Probably not. --Bob From math at velocet.ca Tue May 22 19:10:06 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: ; from rgb@phy.duke.edu on Tue, May 22, 2001 at 05:47:52PM -0400 References: <20010522131528.A31226@stikine.ucs.sfu.ca> Message-ID: <20010522221006.A17399@velocet.ca> On Tue, May 22, 2001 at 05:47:52PM -0400, Robert G. Brown's all... > > 1) I don't want to spend $200 US ($300 CDN) or more on a case. > > Martin, I'd have to echo this frustration. Pretty much all the XU cases > I've found are more than $200, some quite a bit more. Then you've got > to buy the rack. Compared to $50-60 for a standard mid-tower case this > is painful beyond measure when buying in volume, especially when the > total node cost might only be $600-750 outside of the case. Anyone build custom cabinets for their clusters? I am wondering if there are any pointers for it. We've hooked up with a sheet metal/aluminum guy and he is going to be able to house our entire cluster of 40-50 machines for somewhere around $2000. Thats around $40/box ($CDN) which is quite cheap, in the same range as the cheapest cases, but in a much smaller space. (It may well even be cheaper, we've budgeted $2000). Space is a concern for us, we need to keep things down to a very small footprint. Using commodity PC cases are just way too large, and 1U or 2U rackmounts are just way too expensive. Seems everyone is building clusters out of full PCs (hardrives, cases, power supplies) - just wondering if anyone has gone and removed any of these items... ie diskless clusters, custom cabinet, and even more rare, custom power. /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From alvin at Mail.Linux-Consulting.com Tue May 22 19:15:47 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <3B0B0DE1.7090809@onelabs.com> Message-ID: hi ya bari... sorry... maybe i have my head on backwards..probably... ( it is 0.048" think steel ) c ya alvin -- sheet metal shops around here make maybe 30% markup... not 100%.... > you for under $100. That's why the typical 1U enclosure sells for $250 > at the dealer. 100% is a nice profit margin. > > > > > the typical $250 1U chassis are okay...but those still twist > > so if its in a tack and hold held by the front mounting ears... > > those motherboards will twist... > > > > our 1U chassis is 48 gauge steel.. it wont bend like theirs does.. > > and there you cannot drill it either.... its extremely difficult > > for the ordinary hobbyist > > 48 gauge steel!? 30 gauge is only .0120"..... 48 gauge would be down > around .003" -.005".... about the thickness of aluminum foil. 18 gauge > .0478" is pretty typical for rack mount enclosure panels with a 10 gauge > aluminum front panel for rack mounting. You can order extrusions for > side panels from many sources like Wakefield. 
> > > Bari > > From math at velocet.ca Tue May 22 19:41:57 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <20010522215056.A29259@drzyzgula.org>; from bob@drzyzgula.org on Tue, May 22, 2001 at 09:50:56PM -0400 References: <20010522131528.A31226@stikine.ucs.sfu.ca> <20010522215056.A29259@drzyzgula.org> Message-ID: <20010522224157.D17399@velocet.ca> On Tue, May 22, 2001 at 09:50:56PM -0400, Bob Drzyzgula's all... > > I regretfully opted for shelves of midtowers for my latest effort(s) -- > > one can achieve something like (within a factor of 2, surely) of the > > packing of 2U cases -- I can fairly easily get 16 nodes in a 1-1.5 m^2 > > floorspace footprint (depending on how you like to count the gaps > > between units) without going over two nodes high and could double that > > with tall shelves. The standard floor tile is the decidedly non-metric 2'x2', which is 61x61cm, or .372m^2. So you're talking about some 2.7 to 4.0 floortiles total for 16 machines (ie 6 to 4 machines per tile). This is insanely low density for a real machine room. If anyone has any constraints on floor space, even 12 machines per tile (your 'double' figure) isn't much -- that's directly 12 machines per 19" rack (24" on the outside frame). In 44 to 48U (1.75" per U) standard racks, you can usually fit 11 or 12 standard huge klunky 4U cases. This matches your best figure. With 2U cases it's double; with 1U, it's quadruple (48 machines). The cost effectiveness of standard towercases cannot be denied if you are not operating in an environmentally-controlled, raised-floor server room. Those cost a fair bit to install and maintain. If this use or construction of space is factored into your cluster cost, it's going to add up. It's hard to deny that an old empty gymnasium full of towercases is the biggest savings, but if you are only being allotted a small amount of space in the machine room, you can't go that way. This is why I asked in the previous post about custom cabinets, abandoning the high cost of standard rack mount relay racks and cabinets (which are incredibly overpriced, as are the 1U and 2U cases, IMHO). > This does a good job of fitting the requirements > of an ad-hoc or perhaps a lab environment, but in a > production data center environment these can be decidedly > sub-optimal. It is relatively difficult to get good cooling exactly :) [ just putting numbers to your comment based on the original poster's figures] > system density gets high. (FWIW, what we are short on is > chiller capacity; some of our racks are gulpling for > chilled air.) What do you find the ratio of required chiller capacity to watts of power sucked down by the machines has to be? I've heard that it's hard to get better than 1.4:1 even with incredibly good design, and that closer to 1.75 or 2:1 is more typical. (Just think, that's 3 watts of power per 'watt-computing'. Man, this must just make Feynman roll in his grave at the disgustingly inefficient thermodynamics of the situation! What are we working at here, factors of 20 magnitudes of heat production per joule spent processing bits? :) > > It's not as pretty and it is a bit more work to put together neatly, but > > I still assembled a 16 node cluster in about four hours total work > > including assembling the heavy duty steel shelving. The shelving itself > > cost only $50 total (plus another $50 or so for cable ties and surge > > protectors and miscellany to make it look nice) saving me about $2500.
> > That's serious money for six hours of work (allowing for the time to > > drive to Home Depot and buy the shelving:-) -- 3 nodes worth of money. > > I'll make the time back in the first DAY of fulltime operation. 16 nodes at $60 odd canadian (guestimate) per machine is ~$1000. We're looking at a custom cabinet for around $2000 for 40 or 50 nodes. And sides it will look *WAY* cooler than a row of standard pc cases. I need to confirm the quote with the metalworker, could go up to $3000 in fact, but thats still great economy. > want to use the casters. These take about ten minutes to > put together if you've done it a few times and know the > tricks. You can easily fit as many as twenty minitowers on > one of these, and wheel it around when you're done. For > this they want $77. Talk about the advantages of volume > production... Gotta have everything in standard cases to start with though! /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From bob at drzyzgula.org Tue May 22 19:42:15 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: References: <3B0AD3B5.1090905@onelabs.com> Message-ID: <20010522224215.B29259@drzyzgula.org> On Tue, May 22, 2001 at 05:52:36PM -0400, Robert G. Brown wrote: > On Tue, 22 May 2001, Bari Ari wrote: > > > You can have sheet metal 1U enclosures with CNC mounting holes made to > > order with aluminum front panels for under $100 US in low volume(100 > > pcs.). If you are willing to drill your own mounting holes you can get > > off the shelf 1U enclosures for around $50 quan. 1. > > Having things made is painful, but I've got a drill. Several drills, > actually. Besides, I could buy a heavy-duty drill press for what I'd > save on any significant number of nodes. We've done the occasional modification to cases, like when we've gotten a motherboard with a mounting hole in a place that the case doesn't (Hi, Steve! :-) Sure, we just pull out the cordless Milwaukee, the spring-loaded markng punch, the tap, some tapping lubricant and deburring tools; five minutes and you're good to go. But you wind up with metal shavings and oil in your case and on your bench, which is sort of a hassle to clean up, and doing it exactly repeatably is difficult. If you want to use a drill press, you'll probably want at least a 18 to 20" swing; these are monsters and at a minimum will probably have to be bolted to the floor to be used safely. You'll also probably want an arbor press, with an assortment of punches and dies. Maybe it would be best to get a verical mill. And with some stepping motors and some microcontrollers, you could... I'm a big believer in DIY, but is this really your personal comparative advantage? Is drilling and tapping holes the best use of your time? Heck, I know how much *fun* it is, but... :-) > So, how do I go about making a 1U enclosure out of a OTS case? Do you > have any specific cases that you care to recommend that fit (even > approximately) the size spec? Did you do this yourself? Do you have > some (or can you prepare some) GIFs or JPEGs of the process or finished > product? You can probably get pretty much anything you need to do this from Allied Electronics, http://www.alliedelec.com or similar places. Also check with EEM -- http://eem.com/ and you'll probably find dozens if not hundreds of suppliers; see section 1400: Cabinets, Enclosures, Racks & Chassis. 
--Bob From bari at onelabs.com Tue May 22 20:06:39 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <20010522131528.A31226@stikine.ucs.sfu.ca> <20010522221006.A17399@velocet.ca> Message-ID: <3B0B293F.3040708@onelabs.com> Velocet wrote: > On Tue, May 22, 2001 at 05:47:52PM -0400, Robert G. Brown's all... > > >>> 1) I don't want to spend $200 US ($300 CDN) or more on a case. >> >> Martin, I'd have to echo this frustration. Pretty much all the XU cases >> I've found are more than $200, some quite a bit more. Then you've got >> to buy the rack. Compared to $50-60 for a standard mid-tower case this >> is painful beyond measure when buying in volume, especially when the >> total node cost might only be $600-750 outside of the case. > > > Anyone build custom cabinets for their clusters? I am wondering if > there are any pointers for it. Look at using aluminum extrusions for side rails vs. sheet metal. They provide lots of strength and ridgidity plus the sheet metal panels slide right into the channels. We've hooked up with a sheet metal/aluminum > guy and he is going to be able to house our entire cluster of 40-50 machines > for somewhere around $2000. Thats around $40/box ($CDN) which is quite > cheap, in the same range as the cheapest cases, but in a much smaller > space. (It may well even be cheaper, we've budgeted $2000). > > Space is a concern for us, we need to keep things down to a very small > footprint. Using commodity PC cases are just way too large, and > 1U or 2U rackmounts are just way too expensive. > > Seems everyone is building clusters out of full PCs (hardrives, cases, > power supplies) - just wondering if anyone has gone and removed any > of these items... ie diskless clusters, custom cabinet, and even more > rare, custom power. > > /kc We don't do any cluster designs with full PCs, standard motherboards and power supplies. I enjoy reading this mail list about what people are doing with commodity PC parts and shelving from Home Depot. We design clusters down to the component level: multi-CPU motherboards, internetworking, cooling, power supplies, cabinets and LinuxBIOS. There is no reason that a 16 node cluster should take up anymore space than a 2U and a 64 node box should fit under a desk. Single P-III and K7 nodes have been built as small as 3.5" x 5" x 1.25" with a 20GB 2.5" HD, 256MB SDRAM and 10/100 Ethernet ... SiS 635s make it easy. Bari From pottle at lunabase.org Tue May 22 21:56:03 2001 From: pottle at lunabase.org (Sam Pottle) Date: Wed Nov 25 01:01:20 2009 Subject: Kickstart/DHCP In-Reply-To: <200104241600.MAA30093@blueraja.scyld.com> Message-ID: <200105230456.f4N4u3505928@lunabase.org> I have a question about using (Redhat 7.0) Kickstart to do the automagical headless install on my compute nodes. The boxes have two NICs apiece (for eventual channelbonding purposes), and when a node kickstarts off the floppy, the first thing it does is to ask which device to install from (eth0/eth1), at which point the installation stops dead because I'm not there to answer. How can I get the installer not to ask this question? This happens before any DHCP request is made, so putting things in the kickstart file (which is located on the head node) won't help, as the installer hasn't seen it yet. 
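The one lead I've turned up so far is a "ksdevice=" boot argument that later Red Hat installers are said to accept for pre-selecting the kickstart NIC; I have not been able to confirm that the 7.0 installer honors it, so treat the following as an untested guess rather than a known fix:

  # untested: on the kickstart boot floppy, add ksdevice= next to the
  # existing ks= argument on the append line in syslinux.cfg so the
  # installer doesn't stop to ask eth0 vs. eth1, e.g.
  #
  #   append ksdevice=eth0 ks=<whatever ks= argument you use now> ...

If anyone has actually tried this on 7.0 I'd be glad to hear whether it works.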
Sam Pottle pottle@lunabase.org From wsb at paralleldata.com Tue May 22 22:38:17 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) Message-ID: <3B0B4CC9.60B08195@paralleldata.com> Bari Ari wrote: > > There is no reason that a 16 node cluster should take up anymore space > than a 2U and a 64 node box should fit under a desk. Single P-III and K7 > nodes have been built as small as 3.5" x 5" x 1.25" with a 20GB 2.5" HD, > 256MB SDRAM and 10/100 Ethernet ... SiS 635s make it easy. > That pretty much excludes your solution for a P4 cluster doesn't it? Wes From wstan at localhostnl.demon.nl Wed May 23 01:05:00 2001 From: wstan at localhostnl.demon.nl (William Staniewicz) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <20010522224157.D17399@velocet.ca>; from math@velocet.ca on Tue, May 22, 2001 at 10:41:57PM -0400 References: <20010522131528.A31226@stikine.ucs.sfu.ca> <20010522215056.A29259@drzyzgula.org> <20010522224157.D17399@velocet.ca> Message-ID: <20010523080500.B1033@localhostnl.demon.nl> I was doing some research on setting up a home Beowulf and came across this site. I kind of like the way it deals with the case/space issue. http://www.clustercompute.com/ -Bill Amsterdam On Tue, May 22, 2001 at 10:41:57PM -0400, Velocet wrote: > > 16 nodes at $60 odd canadian (guestimate) per machine is ~$1000. We're > looking at a custom cabinet for around $2000 for 40 or 50 nodes. And > sides it will look *WAY* cooler than a row of standard pc cases. > I need to confirm the quote with the metalworker, could go up to > $3000 in fact, but thats still great economy. > From alvin at Mail.Linux-Consulting.com Tue May 22 23:31:08 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: <20010522221006.A17399@velocet.ca> Message-ID: hi ya you can get "used" cabinets for $500... range... be sure the holes are pre-drilled vs needing the clips to mount your 1U into the cabinet be sure you have cooling fans for air circulation... either do cooling from side to side or down blowing up ... ( have good/clean bottom-side filters dont forget to add shiping/delivery charges tooo as these puppies weighs a ton... we see no reason why clusters cannot be in 1U chassis... instead of bulky mid-tower PCs... unless you need amd cpus than you might be in trouble with air flow in 1Us... 1U P3-800 is around $800-$900 in parts... typically 40 or 80 per cabinet... 120 1Us in cabinet ...is easily possible... 160 1Us is lots harder to do... Transmeta Crusoe servers is 24 servers in 3U ... ( if compute speed is not critical and ( that power or number of server is important ) have fun alvin http://www.Linux-1U.net/Racks -- see the cabinet section On Tue, 22 May 2001, Velocet wrote: > On Tue, May 22, 2001 at 05:47:52PM -0400, Robert G. Brown's all... > > > > 1) I don't want to spend $200 US ($300 CDN) or more on a case. > > > > Martin, I'd have to echo this frustration. Pretty much all the XU cases > > I've found are more than $200, some quite a bit more. Then you've got > > to buy the rack. Compared to $50-60 for a standard mid-tower case this > > is painful beyond measure when buying in volume, especially when the > > total node cost might only be $600-750 outside of the case. > > Anyone build custom cabinets for their clusters? I am wondering if > there are any pointers for it. 
We've hooked up with a sheet metal/aluminum > guy and he is going to be able to house our entire cluster of 40-50 machines > for somewhere around $2000. Thats around $40/box ($CDN) which is quite > cheap, in the same range as the cheapest cases, but in a much smaller > space. (It may well even be cheaper, we've budgeted $2000). > > Space is a concern for us, we need to keep things down to a very small > footprint. Using commodity PC cases are just way too large, and > 1U or 2U rackmounts are just way too expensive. > > Seems everyone is building clusters out of full PCs (hardrives, cases, > power supplies) - just wondering if anyone has gone and removed any > of these items... ie diskless clusters, custom cabinet, and even more > rare, custom power. > > /kc > -- > Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Tue May 22 23:42:24 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <20010523080500.B1033@localhostnl.demon.nl> Message-ID: hi william if you dont mind the "open frame design"... like at clusterCompute... - keep in mind.. one can get a flat sheet of metal and mount all the motherboard, disk, powersupply onto a piece of sheetmetal .... to create a custom very affordable/inexpensive 1U server rack - it has better airflow due to the huge fans in the cabinet.... than you can easily mount 4 independent server into one 1U cabinet shelf .... 160 1U servers in one cabinet - we can fit 3 independent server per shelf and, you can also do that the kingstar way...http://www.KingStarUSA.com ( they are 2U servers... p3 slot-1 CPU vs the flat cpu ( socket370 )... have fun alvin http://www.Linux-1U.net -- 500Gb 1U Raid5 ... On Wed, 23 May 2001, William Staniewicz wrote: > I was doing some research on setting up a home Beowulf > and came across this site. I kind of like the way > it deals with the case/space issue. > > http://www.clustercompute.com/ > > -Bill > Amsterdam From MAHRF at de.ibm.com Wed May 23 01:11:39 2001 From: MAHRF at de.ibm.com (MAHRF@de.ibm.com) Date: Wed Nov 25 01:01:20 2009 Subject: Node power supply Message-ID: Hey, thanks. It's just that it's difficult to find a cheap but also reliable casing with a good power supply of higher quality. I decided to play it safe and invest a bit more and buy a brand casing with 300W. Because usually you can't find the -real- specs of a power supply -- what performance it offers at which current and so on. I don't want to make a big deal of it, it's not for production but I don't want to save money in the wrong place. Ferdinand [Lotus Notes routing header: reply from Joel Jaeggli, 22.05.01 18:34, to Ferdinand Mahr/Germany/IBM, cc beowulf@beowulf.org, Subject: Re: Node power supply] the rating for the powersupply is the peak load it can handle, not how much it will actually draw; a 300watt powersupply shouldn't draw any more than a 250 under the same load...
that being said amd hasn't certifed any powersupply rated under 300watts for the 1.3ghz athlons... http://www1.amd.com/athlon/npower/index/1,1712,,00.html joelja From MAHRF at de.ibm.com Wed May 23 01:51:46 2001 From: MAHRF at de.ibm.com (MAHRF@de.ibm.com) Date: Wed Nov 25 01:01:20 2009 Subject: Node power supply Message-ID: Aaaah, stupid Notes, seems it produced an empty message once again... Hope it works now. What I tried to reply: Hey, thanks. It's just that it's difficult to find a cheap but also reliable casing with a good power supply of higher quality. I decided to play it safe and invest a bit more and buy a brand casing with 300W. Because usually you can't find the -real- specs of a power supply of what performance it offers at which current and so on. I don't want to make a big deal of it, it's not for production but I don't want to save money in the wrong place. Ferdinand From marini at pcmenelao.mi.infn.it Wed May 23 02:41:48 2001 From: marini at pcmenelao.mi.infn.it (Franz Marini) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: Message-ID: On Tue, 22 May 2001, Keith Underwood wrote: > there are some people reporting good things with jumbo frames. 32/33 PCI > is only going to get you 1Gb/s no matter what you do... You could get the > theoretical 2 Gb/s out of 64/66... Uhm, something's wrong here... You have twice the width and twice the clock so you should go up to 4 Gb/s. Btw, theoretical peak rate should be 5.24 Gb/s (given 66 M * 8 bytes). Franz --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. _,'| _.-''``-...___..--';) /_ \'. __..-' , ,--...--''' <\ .`--''' ` /' `-';' ; ; ; __...--'' ___...--_..' .;.' (,__....----''' (,..--'' email : marini@pcmenelao.mi.infn.it --------------------------------------------------------- From jpq at northwestern.edu Wed May 23 06:12:33 2001 From: jpq at northwestern.edu (John P. Quintana) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems Message-ID: > hi william > > if you dont mind the "open frame design"... > like at clusterCompute... > > - keep in mind.. one can get a flat sheet of > metal and mount all the motherboard, disk, powersupply > onto a piece of sheetmetal .... to create a custom > very affordable/inexpensive 1U server rack > > - it has better airflow due to the huge fans > in the cabinet.... > > > > > than you can easily mount 4 independent server into one 1U cabinet > shelf .... 160 1U servers in one cabinet > - we can fit 3 independent server per shelf > > and, you can also do that the kingstar way...http://www.KingStarUSA.com > ( they are 2U servers... p3 slot-1 CPU vs the flat cpu ( socket370 )... > >have fun >alvin > http://www.Linux-1U.net -- 500Gb 1U Raid5 ... I have been in touch with the guys at clustercompute.com and they are very happy with their system. In particular, I was worried about ground loops since they are using metal rods to hold everything together. We have been thinking about doing something similar. If you look at our current cluster http://www.dnd.aps.anl.gov/wulffnet/ you will see that we have had to put our cluster in a linear fashion across on a "big shelf". This is due to space restrictions and if we had to do it all over again, we wouldn't. The cabling issues get to be a real hassle since you are now dealing with a lot of 50 foot CAT 5, KVM switch cable etc... 
Our cluster is becoming rather popular to the point that it is hard for us to shut it off or to do development work since it is being used. So, I am planning on building another cluster (this time directly on our network so all Linux boxes can send jobs to it via PBS. Most of what we do is trivially parallyzable). This would be for quick (i.e. 10 minute or less) type jobs and for me to steal when I need to do something :). I think clustercompute really has a compact design for COTS hardware. No sheet metal between motherboards means that you can pack more boards together. The location of the powersupplies makes sense and what they really have is a 10 cpu module that can be replicated. In addition to being custom, the kingstarusa.com boards still has a sheet metal rack between motherboards which limits their density. In clustercomputes design they are limited by the headspace that they want above their cooling fan. Being inspired by clustercompute.com, and also a WWW site that I ran across a home based cluster a while ago (but can't find now) which showed the motherboards in a vertical configuration and held in place by plastic rails, we were going to try and put together a system like http://www.clustercompute.com but with rails holding the motherboards in a vertical position. 6" ATX power extenders are available so that we can disconnect the motherboard in place. Rather than gluing floppies to the motherboard, we already purchased Linksys cards from http://www.disklessworkstations.com with etherboot ROMS and we already use the Wake-On-Lan feature in our current cluster to turn the nodes on. (I also have a handfull of 4 MByte IDE Flash disks that could be used). So... if we are lucky, we won't have to build the box with the buttons and LEDS that clustercompute did (which was apparently a pain to build). I have been able to find ATX switches on the WWW for about $3.00 but there is a packaging issue. If I had to go this route, I might just buy a commercial Digital I/O board and place it in a "master node" to turn everything on and off and also do the monitoring. In some sense I agree with the arguments about overpriced racks. If we don't rackmount , we will probably use the Al extrusions and panels from http://www.8020.net. We have worked with their stuff in the past and while it isn't cheap, it does the job. We might need to use PCI riser cards to pack the ethernet cards closer, but we will try it without first if we decide to go this route. I haven't really seen this approach to much in building clusters so I was wondering if there were any good or bad comments people might have before we go buy a lot of stuff. Cheers, John -- John P.G. Quintana jpq@northwestern.edu Northwestern University Phone: 630-252-0221 DND-CAT FAX: 630-252-0226 Building 432/A008 http://www.dnd.aps.anl.gov 9700 South Cass Avenue Argonne, IL 60439 From rgb at phy.duke.edu Wed May 23 06:06:05 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems In-Reply-To: <00b701c0e315$37838b40$61064f89@cerulean.jpl.nasa.gov> Message-ID: > >On Tue, 22 May 2001, Bari Ari wrote: > > > >> You can have sheet metal 1U enclosures with CNC mounting holes made to > >> order with aluminum front panels for under $100 US in low volume(100 > >> pcs.). If you are willing to drill your own mounting holes you can get > >> off the shelf 1U enclosures for around $50 quan. 1. 
On Tue, 22 May 2001, Jim Lux wrote: > For the large number of odd shaped holes you'd need (power supply and fan > cutouts, etc.), I suspect that the CNC sheet metal fabricator will be more > cost effective. It's not just the drill press you need, you also need the > punches, etc. About $10K would set you up with the necessary tooling. Of > course, if you have access to the necessary machines, the actual dies for > the punch are fairly inexpensive. There would be, of course, the labor > cost, but for those on a low $$ budget, often they have free-ish labor. > > I've been this route a number of times, and what's fairly straightforward to > do 1 or 2 times, gets real tedious and timeconsuming when you have to do it > 100 times. It's one thing to spend a day marking and cutting one chassis. > It's another to spend 4 months doing 100. Ah. I may have misunderstood. I interpreted "off the shelf 1U enclosures" to mean equipped with a power supply and compartmentalized for installation, just like an off the shelf case or the $200 off the shelf 1U enclosure sold by case vendors. I also interpreted the "mounting holes" to be just the rack mounting holes, not all the holes into which things are to be fastened in the completely empty case. With that much hassle I might as well go with a filing cabinet design or buy off the shelf tower cases, take them apart, and remount the motherboard tray, drive cage, and power supply on shelving. My main whine is that one shouldn't have to do all this handiwork to build systems in racks when it is so much easier to do it in an assembly line. Case vendor margins on rackmount cases must be huge, because I cannot believe that there is THAT much economy of scale in the manufacturing process. I'll shut up now and just live with it. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From sp at scali.no Wed May 23 06:13:33 2001 From: sp at scali.no (Steffen Persvold) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) References: <20010522184655.E3180@getafix.EraGen.com> Message-ID: <3B0BB77D.44F248E6@scali.no> Chris Black wrote: > > We have been looking into motherboards that provide 64-bit 66MHz > PCI slots and haven't had much luck. We are now evaluating the > Intel Server Board STL2, but have had some problems with it in > terms of getting the IDE to work correctly with UltraDMA support. > Can anyone share any experience with these boards in terms of > problems and what needs to be done to get them to work? It seems > they are a bit finicky in respect to the type of memory they want > as well. > > Has anyone found any motherboards that can reliably do 66MHz/64bit > PCI bus transfers? We are trying to maximize the performance of > our gigabit cards which do support 66MHz/64bit operation. We looked > at the ASUS CUR-DLS but it appears that although it has 64bit pci > slots, these run at 33MHz. > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually quite nice. They both have onboard SCSI-3. The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the 370DLE is without SCSI. I haven't checked too much, but I believe all of these boards are cheaper than both the Intel and the ASUS boards. 
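One practical note on verifying the 66MHz half of the question once a board is up: the PCI status register carries a 66MHz-capable bit that pciutils will display, so you can at least see what the card and the bridge claim (as far as I know the 64-bit width isn't visible in configuration space, so that half still has to come from the board manual). A rough check, assuming lspci lives in the usual place:

  # dump verbose PCI info and find the gigabit card's entry
  /sbin/lspci -vv
  # a 66MHz-capable device shows something like
  #   Status: Cap+ 66MHz+ UDF- FastB2B- ...
  # 66MHz+ only says the device claims the capability; whether the bus
  # actually runs at 66MHz still depends on the slot and on every other
  # device sharing that PCI segment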
Regards, -- Steffen Persvold Systems Engineer Email : mailto:sp@scali.no Scali AS (http://www.scali.com) Tlf : (+47) 22 62 89 50 Olaf Helsets vei 6 Fax : (+47) 22 62 89 51 N-0621 Oslo, Norway From bari at onelabs.com Wed May 23 06:43:48 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> Message-ID: <3B0BBE94.9010807@onelabs.com> W Bauske wrote: > Bari Ari wrote: > >> There is no reason that a 16 node cluster should take up anymore space >> than a 2U and a 64 node box should fit under a desk. Single P-III and K7 >> nodes have been built as small as 3.5" x 5" x 1.25" with a 20GB 2.5" HD, >> 256MB SDRAM and 10/100 Ethernet ... SiS 635s make it easy. >> > > > That pretty much excludes your solution for a P4 cluster doesn't it? > > Wes > We haven't seen much interest in the P4 for clustering due to the amount of heat vs MFLOPS and being stuck with Rambus for the near future. To pack P4s into a real tight space I'd probably rely heavily on heat pipes tied to a compressor and heat exchanger..... same for the Alphas and UltraSparcs. Bari From sgaudet at angstrommicro.com Wed May 23 07:18:11 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: <20010522184655.E3180@getafix.EraGen.com> References: <20010522184655.E3180@getafix.EraGen.com> Message-ID: <990627491.3b0bc6a3db0ee@localhost> Hello Chris, > We have been looking into motherboards that provide 64-bit 66MHz > PCI slots and haven't had much luck. We are now evaluating the > Intel Server Board STL2, but have had some problems with it in > terms of getting the IDE to work correctly with UltraDMA support. > Can anyone share any experience with these boards in terms of > problems and what needs to be done to get them to work? It seems > they are a bit finicky in respect to the type of memory they want > as well. > > Has anyone found any motherboards that can reliably do 66MHz/64bit > PCI bus transfers? We are trying to maximize the performance of > our gigabit cards which do support 66MHz/64bit operation. We looked > at the ASUS CUR-DLS but it appears that although it has 64bit pci > slots, these run at 33MHz. Try Supermicro, we built a cluster using the 6010H with the 370DER motherboard and Myrinet's PCI64 with great success. http://www.supermicro.com/PRODUCT/SUPERServer/SUPER%20SERVER%206010%20Server.htm Cheers, Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From keithu at parl.clemson.edu Wed May 23 08:01:52 2001 From: keithu at parl.clemson.edu (Keith Underwood) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: Message-ID: Um, theoretical 2 Gb/s out of a Gigabit Ethernet card (full-duplex Gigabit Ethernet). On Wed, 23 May 2001, Franz Marini wrote: > On Tue, 22 May 2001, Keith Underwood wrote: > > > there are some people reporting good things with jumbo frames. 32/33 PCI > > is only going to get you 1Gb/s no matter what you do... You could get the > > theoretical 2 Gb/s out of 64/66... > > Uhm, something's wrong here... You have twice the width and twice the > clock so you should go up to 4 Gb/s. Btw, theoretical peak rate should be > 5.24 Gb/s (given 66 M * 8 bytes). 
> > Franz > > --------------------------------------------------------- > Franz Marini > Sys Admin and Software Analyst, > Dept. of Physics, University of Milan, Italy. > > _,'| _.-''``-...___..--';) > /_ \'. __..-' , ,--...--''' > <\ .`--''' ` /' > `-';' ; ; ; > __...--'' ___...--_..' .;.' > (,__....----''' (,..--'' > > email : marini@pcmenelao.mi.infn.it > --------------------------------------------------------- > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu@parl.clemson.edu Clemson University From bari at onelabs.com Wed May 23 09:40:28 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> Message-ID: <3B0BE7FC.1060604@onelabs.com> W Bauske wrote: > Bari Ari wrote: > >> There is no reason that a 16 node cluster should take up anymore space >> than a 2U and a 64 node box should fit under a desk. Single P-III and K7 >> nodes have been built as small as 3.5" x 5" x 1.25" with a 20GB 2.5" HD, >> 256MB SDRAM and 10/100 Ethernet ... SiS 635s make it easy. >> > > > That pretty much excludes your solution for a P4 cluster doesn't it? > > Wes > I did a quick compute of the P4 thermal design and it would be possible to put 8 P4s into a 1U if you could use the entire surface area of the top of the enclosure as a heat sink along with forced air cooling in the order of 400 - 800 LFM with a maximum ambient inlet air temp of 30 deg C. You could mount all the P4's on the top of the mainboard contacting the top of the CPU case via a low resistance thermal joint compound. The top cover would need low profile extruded fins (maybe .5" h spaced 3/8" apart to increase the cooling surface area) and then force air through a 1/2" gap (top of heatsink fins to bottom of the enclosure stacked on top with 1" actual air/fin space) between the stacks of enclosures . You would lose about 1/2 of the enclosures internal height so you'd have to use 2.5" hard drives and mount memory DIMMS at angles or perpendicular to the mainboard. So a 1U enclosure like this would only have a .75" internal height plus a .5" heatsink (as a top cover) and allowing .5" gap between enclosures. This would still allow for the standard 1.75" per 1U requirement and you'd have a neat hair dryer at the same time. Bari From rgb at phy.duke.edu Wed May 23 09:58:35 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:20 2009 Subject: Power supplies (addendum) Message-ID: Dear List, Mark Hahn and I have been continuing the PS discussion offline, as I was a decade or so off in assuming that PC power supplies have big transformers. The incoming voltage is converted to high frequency and stepped down with relatively small transformers before being cleaned up and regulated and delivered. My bad. 
A useful link on power supplies: http://www.howstuffworks.com/power-supply.htm One thing that came up in our discussion that I hadn't really thought about but that is obvious in retrospect is that a "300 Watt" ATX supply doesn't deliver 300 Watts to any particular component in a computer -- this is the (approximate) aggregate of the power delivered on the various voltages provided by the supply, with the bulk of the power delivered to the +5V and +12V lines. A typical 250 W supply might deliver about 120 W to each of these and a handful to the other lines. Thus the motherboard and electronics in a computer with even a 300 W power supply might only be able to draw 150 W total, leaving 140 or so to run the peripherals on the 12V line. In some sense this is way more than most beowulf node designs "need" for the peripheral supply -- a diskless design or design with only one disk might need only 30-50W to run the disk(s) and cooling fans (a useful table of typical power requirements is on the article linked above). However, the requirements of the motherboard are not so flexible. Memory, CPU, the motherboard itself, all of these eat energy to run and 150W could very easily be needed to support a full memory configuration. You therefore might want to look at the details of how any given supply distributes and delivers power -- this is probably what gets a power supply "certified" by a CPU vendor more than the absolute wattage. The site above reiterates my earlier post that suggested that it is a bad idea in general to operate a power supply at 100% of capacity. Power supplies do indeed get hot in operation and contain large heat sinks buffering electronics and they get unhappy if overheated. A large supply has more thermal capacity than a smaller one. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From wsb at paralleldata.com Wed May 23 10:32:15 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> Message-ID: <3B0BF41F.91B0ED6F@paralleldata.com> Bari Ari wrote: > > W Bauske wrote: I personally don't give a flip about 1U setups. I prefer 2U to allow breathing room for the components. However, your layout seems problematic. By that I mean physically mounting 8 2.5in drives, 8 1U power supplies or one exotic one with 8 connections, 8 LAN connections, and enough memory to keep the P4's occupied seems unlikely to fit into a "normal" 1U slot, say 17in by 28in. Are you assuming a different chassis length? Or, are you assuming something else about the layout I don't see? Using a single mainboard would, I think, require multiple power connections due to the P4's current draw. In case you haven't looked, a P4 uses 3 power connectors on it's M/B. These sorts of custom setups would probably preclude the use of GEnet and Myricom I would think. Wes > > > Bari Ari wrote: > > > >> There is no reason that a 16 node cluster should take up anymore space > >> than a 2U and a 64 node box should fit under a desk. Single P-III and K7 > >> nodes have been built as small as 3.5" x 5" x 1.25" with a 20GB 2.5" HD, > >> 256MB SDRAM and 10/100 Ethernet ... SiS 635s make it easy. > >> > > > > > > That pretty much excludes your solution for a P4 cluster doesn't it? 
> > > > Wes > > > > I did a quick compute of the P4 thermal design and it would be possible > to put 8 P4s into a 1U if you could use the entire surface area of the > top of the enclosure as a heat sink along with forced air cooling in the > order of 400 - 800 LFM with a maximum ambient inlet air temp of 30 deg > C. You could mount all the P4's on the top of the mainboard contacting > the top of the CPU case via a low resistance thermal joint compound. The > top cover would need low profile extruded fins (maybe .5" h spaced 3/8" > apart to increase the cooling surface area) and then force air through a > 1/2" gap (top of heatsink fins to bottom of the enclosure stacked on top > with 1" actual air/fin space) between the stacks of enclosures . You > would lose about 1/2 of the enclosures internal height so you'd have to > use 2.5" hard drives and mount memory DIMMS at angles or perpendicular > to the mainboard. So a 1U enclosure like this would only have a .75" > internal height plus a .5" heatsink (as a top cover) and allowing .5" > gap between enclosures. This would still allow for the standard 1.75" > per 1U requirement and you'd have a neat hair dryer at the same time. > > Bari From cblack at eragen.com Wed May 23 10:16:59 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: <3B0BB77D.44F248E6@scali.no>; from sp@scali.no on Wed, May 23, 2001 at 03:13:33PM +0200 References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> Message-ID: <20010523131659.B7870@getafix.EraGen.com> On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > Chris Black wrote: > > > > We have been looking into motherboards that provide 64-bit 66MHz > > PCI slots and haven't had much luck. We are now evaluating the [stuff deleted] > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually > quite nice. They both have onboard SCSI-3. > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the > 370DLE is without SCSI. > > I haven't checked too much, but I believe all of these boards are > cheaper than both the Intel and the ASUS boards. Have you or anyone used the onboard IDE on these motherboards? The person working with the Intel serverworks board seems to be having trouble getting IDE working in ultradma mode. Also, do any of these boards have onboard video/ethernet? Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010523/a80ad891/attachment.bin From Josh.Puckett at am1.ericsson.se Wed May 23 10:27:02 2001 From: Josh.Puckett at am1.ericsson.se (Josh Puckett (EUS)) Date: Wed Nov 25 01:01:20 2009 Subject: Swap file in scyld Message-ID: <1F630DA3C567D411AA4F00508B693BC6C9AF9E@eamrtnt702.rtp.ericsson.se> OK, this if my first post here, as you can probably tell. I am working on a beowulf cluster of some older machines we have in our lab. I have ran in to a small problem. We have 5 machines with 32mb of ram, and I wanted to use those as well as the others I have. However, using a default Scyld install you need something like 60mb of ram. No problem, I thought a swap file would fix the problem. I un-commented the swap line in the fstab, and left it as 40960 size. However the node still stops as its "copying libraries", exactly the same as before, when the 32mb of ram is full. 
1MB of the swap file is supposedly used, which seems strange, because one would think none of it has been used. Any ideas on how to get the swap file going? thank you Josh P -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010523/9ae6a847/attachment.html From lusk at mcs.anl.gov Wed May 23 10:32:33 2001 From: lusk at mcs.anl.gov (Rusty Lusk) Date: Wed Nov 25 01:01:20 2009 Subject: p4_error: net_recv read: In-Reply-To: Message from Gerard Gorman of "Mon, 14 May 2001 12:48:53 BST." <3AFFC625.221F7D8E@ic.ac.uk> Message-ID: <200105231732.MAA17504@mcs.anl.gov> | I'm having this problem on our cluster while using MPICH1.2 (alphas | running osf1 connected via a switch): | | rm_l_5_3390: p4_error: net_recv read: probable EOF on socket: 1 Please start by getting the current release, which is 1.2.1. If the problem persists please send your problem report to mpi-bugs@mcs.anl.gov. Regards, Rusty Lusk From bari at onelabs.com Wed May 23 10:40:52 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: Power supplies (addendum) References: Message-ID: <3B0BF624.205@onelabs.com> Robert G. Brown wrote: > Dear List, > > Mark Hahn and I have been continuing the PS discussion offline, as I was > a decade or so off in assuming that PC power supplies have big > transformers. The incoming voltage is converted to high frequency and > stepped down with relatively small transformers before being cleaned up > and regulated and delivered. My bad. A useful link on power supplies: > > http://www.howstuffworks.com/power-supply.htm > > One thing that came up in our discussion that I hadn't really thought > about but that is obvious in retrospect is that a "300 Watt" ATX supply > doesn't deliver 300 Watts to any particular component in a computer -- > this is the (approximate) aggregate of the power delivered on the > various voltages provided by the supply, with the bulk of the power > delivered to the +5V and +12V lines. A typical 250 W supply might > deliver about 120 W to each of these and a handful to the other lines. > > Thus the motherboard and electronics in a computer with even a 300 W > power supply might only be able to draw 150 W total, leaving 140 or so > to run the peripherals on the 12V line. > If you use only 2.5" hard drives you can then eliminate the +12VDC supply altogether. The -5 and -12 aren't handy for much in a cluster either unless you're building a cluster for audio or multimedia :-) and even then you can generate these onboard from the +5. This leaves you with just +3VDC and +5VDC as a requirement. > In some sense this is way more than most beowulf node designs "need" for > the peripheral supply -- a diskless design or design with only one disk > might need only 30-50W to run the disk(s) and cooling fans (a useful > table of typical power requirements is on the article linked above). > However, the requirements of the motherboard are not so flexible. > Memory, CPU, the motherboard itself, all of these eat energy to run and > 150W could very easily be needed to support a full memory configuration. > > You therefore might want to look at the details of how any given supply > distributes and delivers power -- this is probably what gets a power > supply "certified" by a CPU vendor more than the absolute wattage. The > site above reiterates my earlier post that suggested that it is a bad > idea in general to operate a power supply at 100% of capacity. 
Power > supplies do indeed get hot in operation and contain large heat sinks > buffering electronics and they get unhappy if overheated. A large > supply has more thermal capacity than a smaller one. > > rgb If you wish to use one large supply to power multiple boards the issues you run into are isolating the overcurrent fault protection, supply distribution and power management. You don't want one board causing a current overload to bring down the whole cluster. You also need to account for voltage drop and regulation between the supply and each node. Most cluster users probably won't care much about power down or sleep modes but if you do then you have to have isolated supply switching between nodes and 5VSB and 3VSB for stand-by. Bari From agrajag at scyld.com Wed May 23 10:55:17 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:20 2009 Subject: Swap file in scyld In-Reply-To: <1F630DA3C567D411AA4F00508B693BC6C9AF9E@eamrtnt702.rtp.ericsson.se>; from Josh.Puckett@am1.ericsson.se on Wed, May 23, 2001 at 12:27:02PM -0500 References: <1F630DA3C567D411AA4F00508B693BC6C9AF9E@eamrtnt702.rtp.ericsson.se> Message-ID: <20010523135517.A26265@blueraja.scyld.com> On Wed, 23 May 2001, Josh Puckett (EUS) wrote: > OK, this if my first post here, as you can probably tell. I am working on a beowulf cluster of some older machines we have in our lab. I have ran in to a small problem. We have 5 machines with 32mb of ram, and I wanted to use those as well as the others I have. However, using a default Scyld install you need something like 60mb of ram. No problem, I thought a swap file would fix the problem. I un-commented the swap line in the fstab, and left it as 40960 size. However the node still stops as its "copying libraries", exactly the same as before, when the 32mb of ram is full. 1MB of the swap file is supposedly used, which seems strange, because one would think none of it has been used. Any ideas on how to get the swap file going? What is happening is by default, the Scyld system uses a ramdisk as / on the slave nodes. This ramdisk is filling up. If you look at the install guide (http://www.scyld.com/support/docs/beoinstall.html) there are instructions on how to partition the harddrives on slave nodes. I'd suggest you do this, then modify the fstab for the slave nodes so that / is actually a harddrive partition and not a ramdisk. Then comment the line for the ramdisk. This will solve your problem. As far as swap, your swap is most likely working, however ramdisks never get swapped out, so using swap won't solve this particular problem. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010523/744ae2df/attachment.bin From sp at scali.no Wed May 23 10:47:57 2001 From: sp at scali.no (Steffen Persvold) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> <20010523131659.B7870@getafix.EraGen.com> Message-ID: <3B0BF7CD.7980EBA2@scali.no> Chris Black wrote: > > On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > > Chris Black wrote: > > > > > > We have been looking into motherboards that provide 64-bit 66MHz > > > PCI slots and haven't had much luck. 
We are now evaluating the > [stuff deleted] > > > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually > > quite nice. They both have onboard SCSI-3. > > > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the > > 370DLE is without SCSI. > > > > I haven't checked too much, but I believe all of these boards are > > cheaper than both the Intel and the ASUS boards. > > Have you or anyone used the onboard IDE on these motherboards? > The person working with the Intel serverworks board seems to be > having trouble getting IDE working in ultradma mode. Also, do > any of these boards have onboard video/ethernet? > I'm sorry but I haven't used IDE on these mobos for disks, only CDROM. As for onboard VGA/ethernet, the 370DER has an onboard ATI Rage XL 8MB card and dual Intel 82557 ethernet controllers. The 370DE6 has an AGP slot and a onboard 82557 ethernet controller. Regards, -- Steffen Persvold Systems Engineer Email : mailto:sp@scali.no Scali AS (http://www.scali.com) Tlf : (+47) 22 62 89 50 Olaf Helsets vei 6 Fax : (+47) 22 62 89 51 N-0621 Oslo, Norway From agrajag at scyld.com Wed May 23 11:02:53 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:20 2009 Subject: Kickstart/DHCP In-Reply-To: <200105230456.f4N4u3505928@lunabase.org>; from pottle@lunabase.org on Tue, May 22, 2001 at 09:56:03PM -0700 References: <200104241600.MAA30093@blueraja.scyld.com> <200105230456.f4N4u3505928@lunabase.org> Message-ID: <20010523140253.B26265@blueraja.scyld.com> On Tue, 22 May 2001, Sam Pottle wrote: > I have a question about using (Redhat 7.0) Kickstart to do the automagical > headless install on my compute nodes. The boxes have two NICs apiece (for > eventual channelbonding purposes), and when a node kickstarts off the floppy, > the first thing it does is to ask which device to install from (eth0/eth1), > at which point the installation stops dead because I'm not there to answer. > > How can I get the installer not to ask this question? This happens before > any DHCP request is made, so putting things in the kickstart file (which is > located on the head node) won't help, as the installer hasn't seen it yet. The reference manual for Red Hat Linux 7.0 doesn't show anything about this. However, the people on kickstart-list@redhat.com might have some other ideas. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010523/08ca0f7d/attachment.bin From wsb at paralleldata.com Wed May 23 11:39:51 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> <20010523131659.B7870@getafix.EraGen.com> <3B0BF7CD.7980EBA2@scali.no> Message-ID: <3B0C03F7.B762614A@paralleldata.com> Steffen Persvold wrote: > > Chris Black wrote: > > > > On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > > > Chris Black wrote: > > > > > > > > We have been looking into motherboards that provide 64-bit 66MHz > > > > PCI slots and haven't had much luck. We are now evaluating the > > [stuff deleted] > > > > > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually > > > quite nice. They both have onboard SCSI-3. > > > > > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the > > > 370DLE is without SCSI. 
> > > > > > I haven't checked too much, but I believe all of these boards are > > > cheaper than both the Intel and the ASUS boards. > > > > Have you or anyone used the onboard IDE on these motherboards? > > The person working with the Intel serverworks board seems to be > > having trouble getting IDE working in ultradma mode. Also, do > > any of these boards have onboard video/ethernet? > > > I'm sorry but I haven't used IDE on these mobos for disks, only CDROM. > As for onboard VGA/ethernet, the 370DER has an onboard ATI Rage XL 8MB > card and dual Intel 82557 ethernet controllers. The 370DE6 has an AGP > slot and a onboard 82557 ethernet controller. > I have a 370DLE that does:

[root@wsb50 /root]# hdparm -tT /dev/hda

/dev/hda:
 Timing buffer-cache reads:   128 MB in  0.70 seconds = 182.86 MB/sec
 Timing buffered disk reads:  64 MB in  3.50 seconds = 18.29 MB/sec

Wes From bari at onelabs.com Wed May 23 11:18:03 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> <3B0BF41F.91B0ED6F@paralleldata.com> Message-ID: <3B0BFEDB.3080805@onelabs.com> W Bauske wrote: > Bari Ari wrote: > >> W Bauske wrote: > > > I personally don't give a flip about 1U setups. I prefer 2U to allow > breathing room for the components. However, your layout seems > problematic. By that I mean physically mounting 8 2.5in drives, 8 > 1U power supplies or one exotic one with 8 connections, 8 LAN connections, > and enough memory to keep the P4's occupied seems unlikely to fit into > a "normal" 1U slot, say 17in by 28in. Are you assuming a different chassis > length? I only figured for 16" x 16" x 1.25" for the enclosure and another few inches in depth for fans and ducts to achieve the 800LFM since a P4 + chipset + memory + networking can easily be had in 40 sq. in. 16" x 16" gives you 512 sq. in. of total area for the two sides. I figured using a 48VDC bus running between boards and regulating the +3V and +5V along with the usual Core voltages onboard much the same as is done in telecom apps. Or, are you assuming something else about the layout I don't > see? Using a single mainboard would, I think, require multiple power > connections due to the P4's current draw. In case you haven't looked, > a P4 uses 3 power connectors on it's M/B. These sorts of custom setups > would probably preclude the use of GEnet and Myricom I would think. > Depending on the chipset used 10/100 Ethernet would be included. GB Ethernet, Myrinet and SCI only add a few square inches each to the board space required. Increasing the enclosure size to 17" x 28" would give you plenty of space (952 sq. in) for these as well as a multi G-bit switch, plus you'd have much more surface area for cooling so the required LFM across the enclosure would drop or you could operate in a higher ambient temp. environment. I really don't see P4's for dense clusters. ULV PIIIs and Athlon4 with SMP makes much more sense. IA-64 with SMP will probably come out ahead in MFLOPS per watt and $$. We're working with parts now that offer 160 MFLOPS per watt vs. 20 MFLOPS per watt on the P4. Fixed point processors are down to 1 watt per 1000 Mips.
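To put those ratios in cabinet terms: at a fixed power and cooling budget, MFLOPS per watt translates directly into how much peak you can pack into the box. Taking 3 kW as a round number for one cabinet (my number, purely for illustration) against the figures above:

  echo '3000 * 20' | bc     # 60000 MFLOPS, ~60 GFLOPS peak at 20 MFLOPS/watt
  echo '3000 * 160' | bc    # 480000 MFLOPS, ~480 GFLOPS peak at 160 MFLOPS/watt

Same cabinet, same power feed, an 8x difference in what fits inside.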
Bari From wsb at paralleldata.com Wed May 23 12:02:21 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> <3B0BF41F.91B0ED6F@paralleldata.com> <3B0BFEDB.3080805@onelabs.com> Message-ID: <3B0C093D.4CE6FCFE@paralleldata.com> Bari Ari wrote: > > I really don't see P4's for dense clusters. ULV PIIIs and Athlon4 with > SMP makes much more sense. IA-64 with SMP will probably come out ahead > in MFOPLS per watt and $$. We're working with parts now that offer 160 > MFLOPS per watt vs. 20 MFLOPS per watt on the P4. Fixed point processors > are down to 1 watt per 1000 Mips. > > Bari That's interesting. By ULV PIII's and Athlon4, I assume you mean the chips used for laptops. I don't have any specs handy but are you saying say a 933Mhz laptop chip running full out takes less current than a standard 933Mhz PIII also running full speed? I realize such a chip will use less power at partial load but how do they compare at 100% busy? Even so, a cluster will idle some of the time so they should save some power. In your example, I agree P4 vs laptop chips are not power effective but I'm comparing them to 21264 Alphas and POWER3+ and POWER4 based systems. I expect a POWER4 will be the most power effective system but one can't buy them yet so it's hard to say for sure. Maybe when Intel shrink the P4 they will get a bit better in the power usage area later this year. For my codes, P4's substantially outperform Athlons and PIII's and at around $750 each, they're hard to beat. (as usual YMMV) Wes From Josh.Puckett at am1.ericsson.se Wed May 23 11:52:25 2001 From: Josh.Puckett at am1.ericsson.se (Josh Puckett (EUS)) Date: Wed Nov 25 01:01:20 2009 Subject: Swap file in scyld Message-ID: <1F630DA3C567D411AA4F00508B693BC6C9AF9F@eamrtnt702.rtp.ericsson.se> Thank you very much. That worked, I just switched $RAMDISK to /dev/hd1/, voila. One more question to show my stupidity. I started on this project because the linux guru of the department told me that it would speed up compile of software we are doing all the time, after delving into this further I have come to realize that this probably isn't possible. Can it be done? thanks again JOSH > -----Original Message----- > From: Sean Dilda [SMTP:agrajag@scyld.com] > Sent: Wednesday, May 23, 2001 1:55 PM > To: Josh Puckett (EUS) > Cc: 'beowulf@beowulf.org' > Subject: Re: Swap file in scyld > > On Wed, 23 May 2001, Josh Puckett (EUS) wrote: > > > OK, this if my first post here, as you can probably tell. I am working on a beowulf cluster of some older machines we have in our lab. I have ran in to a small problem. We have 5 machines with 32mb of ram, and I wanted to use those as well as the others I have. However, using a default Scyld install you need something like 60mb of ram. No problem, I thought a swap file would fix the problem. I un-commented the swap line in the fstab, and left it as 40960 size. However the node still stops as its "copying libraries", exactly the same as before, when the 32mb of ram is full. 1MB of the swap file is supposedly used, which seems strange, because one would think none of it has been used. Any ideas on how to get the swap file going? > > What is happening is by default, the Scyld system uses a ramdisk as / on > the slave nodes. This ramdisk is filling up. 
If you look at the > install guide (http://www.scyld.com/support/docs/beoinstall.html) there > are instructions on how to partition the harddrives on slave nodes. I'd > suggest you do this, then modify the fstab for the slave nodes so that / > is actually a harddrive partition and not a ramdisk. Then comment the > line for the ramdisk. This will solve your problem. > > As far as swap, your swap is most likely working, however ramdisks never > get swapped out, so using swap won't solve this particular problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010523/8b13ae83/attachment.html From alvin at Mail.Linux-Consulting.com Wed May 23 12:01:58 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:20 2009 Subject: 1U P4 Systems - holes In-Reply-To: Message-ID: hi Robert the $50 1U case off the shelf... - you have to drill all holes for motehrboard, powersupply etc and the metal it uses is like paper... ( extremely thin the average off the shelf 1U case is about $250... - one fd, one hd, one cdrom...--or-- - 2 5.25" bays w/ floppy our chassis is a range... 2 drives up to 8 drives... All 1U chassis suffer from: - room only one pci card - atx connectors on the back of the motherboard has to be the Intel CA810EAL/D815EAL style... - NOT the dual NIC supermicro motherboards - except if you buy the 1U from supermicro themself For a list of all the 1U vendors... http://www.Linux-1U.net/1U_Others have fun alvin > Ah. I may have misunderstood. I interpreted "off the shelf 1U > enclosures" to mean equipped with a power supply and compartmentalized > for installation, just like an off the shelf case or the $200 off the > shelf 1U enclosure sold by case vendors. I also interpreted the > "mounting holes" to be just the rack mounting holes, not all the holes > into which things are to be fastened in the completely empty case. > > With that much hassle I might as well go with a filing cabinet design or > buy off the shelf tower cases, take them apart, and remount the > motherboard tray, drive cage, and power supply on shelving. > > My main whine is that one shouldn't have to do all this handiwork to > build systems in racks when it is so much easier to do it in an assembly > line. Case vendor margins on rackmount cases must be huge, because I > cannot believe that there is THAT much economy of scale in the > manufacturing process. > > I'll shut up now and just live with it. > From agrajag at scyld.com Wed May 23 12:22:52 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:20 2009 Subject: Swap file in scyld In-Reply-To: <1F630DA3C567D411AA4F00508B693BC6C9AF9F@eamrtnt702.rtp.ericsson.se>; from Josh.Puckett@am1.ericsson.se on Wed, May 23, 2001 at 01:52:25PM -0500 References: <1F630DA3C567D411AA4F00508B693BC6C9AF9F@eamrtnt702.rtp.ericsson.se> Message-ID: <20010523152252.A15929@blueraja.scyld.com> On Wed, 23 May 2001, Josh Puckett (EUS) wrote: > Thank you very much. That worked, I just switched $RAMDISK to /dev/hd1/, voila. > One more question to show my stupidity. I started on this project because the linux guru of the department told me that it would speed up compile of software we are doing all the time, after delving into this further I have come to realize that this probably isn't possible. Can it be done? > thanks again Unfortunately, 'Will a beowulf speed things up for me?' isn't a simple question and depends a lot on what you're doing. 
Even with compiling, there are some situations where it will help and others where it won't. If your compiles are such that you have a lot of files that can be compiled in parallel (as opposed to having to wait for others to be compiled before they can be compiled) then its possible that a beowulf cluster will help speed things up. Also, if your compile is such that your main slow down is the processor time to compile the files and not the I/O time to read the files, then there's even more of a chance that a beowulf cluster will help speed things up. The tricky part is getting make (or something like it) to use a cluster for compiling. I have heard of companies that use clusters for compiling like that, but unfortunately I know of no publiclly available verion of make (or something similar) that can be setup to use a cluster for compiling. If anyone knows of one, I'd love to hear about it. Jag -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010523/c4d7c6d1/attachment.bin From bari at onelabs.com Wed May 23 12:23:23 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> <3B0BF41F.91B0ED6F@paralleldata.com> <3B0BFEDB.3080805@onelabs.com> <3B0C093D.4CE6FCFE@paralleldata.com> Message-ID: <3B0C0E2B.8070805@onelabs.com> W Bauske wrote: > Bari Ari wrote: > >> I really don't see P4's for dense clusters. ULV PIIIs and Athlon4 with >> SMP makes much more sense. IA-64 with SMP will probably come out ahead >> in MFOPLS per watt and $$. We're working with parts now that offer 160 >> MFLOPS per watt vs. 20 MFLOPS per watt on the P4. Fixed point processors >> are down to 1 watt per 1000 Mips. >> >> Bari > > > That's interesting. By ULV PIII's and Athlon4, I assume you mean the chips > used for laptops. I don't have any specs handy but are you saying say a > 933Mhz laptop chip running full out takes less current than a standard 933Mhz > PIII also running full speed? I realize such a chip will use less power > at partial load but how do they compare at 100% busy? Even so, a cluster > will idle some of the time so they should save some power. > Some of the ULV PIIIs are down to 0.975V core voltage now. These take less current running flat out a top clock speed than the PIII counterparts. Athlon4s are much the same. > In your example, I agree P4 vs laptop chips are not power effective but > I'm comparing them to 21264 Alphas and POWER3+ and POWER4 based systems. > I expect a POWER4 will be the most power effective system but one can't > buy them yet so it's hard to say for sure. Maybe when Intel shrink the > P4 they will get a bit better in the power usage area later this year. > > For my codes, P4's substantially outperform Athlons and PIII's and at > around $750 each, they're hard to beat. (as usual YMMV) > > Wes > Mips CPUs are now down to around 6W per GFlop. What would be real nice is if Intel would put a few floating point pipelines back into the XScale. They would be only a few watts per GFLOP since the fixed point now is only 1W per 1000Mips. 
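Because the thread switches between "W per GFLOP" and "MFLOPS per watt", here is the conversion spelled out with bc. The 6 and 20 are simply the figures quoted above, restated, not new measurements:

$ echo '1000/6' | bc     # 6 W per GFLOP, expressed as MFLOPS per watt
166                      # roughly 167 MFLOPS/W, in line with the ~160 MFLOPS/W quoted earlier
$ echo '1000/20' | bc    # 20 MFLOPS per watt, expressed as W per GFLOP
50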
Bari From jsmith at structbio.vanderbilt.edu Wed May 23 12:36:54 2001 From: jsmith at structbio.vanderbilt.edu (Jarrod Smith) Date: Wed Nov 25 01:01:20 2009 Subject: Make in parallel (was 'Swap file in scyld') References: <1F630DA3C567D411AA4F00508B693BC6C9AF9F@eamrtnt702.rtp.ericsson.se> <20010523152252.A15929@blueraja.scyld.com> Message-ID: <3B0C1156.35A8EB1D@structbio.vanderbilt.edu> Sean Dilda wrote: > The tricky part is getting make (or something like it) to use a cluster > for compiling. I have heard of companies that use clusters for > compiling like that, but unfortunately I know of no publiclly available > verion of make (or something similar) that can be setup to use a cluster > for compiling. If anyone knows of one, I'd love to hear about it. > > Jag If you were in the regime where your compile bottleneck were the CPU time (and not I/O or object order-dependence) then a simple thread-based load balancing scheme like MOSIX and "make -jN" (where N is the number of processors in your cluster) might stand a reasonable chance of speeding things up... This is just in theory. I have not tried this except on a 2-way SMP box, where it does indeed seem to speed things up. -- Jarrod A. Smith Research Asst. Professor, Biochemistry Asst. Director, Center for Structural Biology Computation and Molecular Graphics Vanderbilt University jsmith@structbio.vanderbilt.edu From sp at scali.no Wed May 23 12:45:31 2001 From: sp at scali.no (Steffen Persvold) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> <20010523131659.B7870@getafix.EraGen.com> <3B0BF7CD.7980EBA2@scali.no> <3B0C03F7.B762614A@paralleldata.com> Message-ID: <3B0C135B.582E57F3@scali.no> W Bauske wrote: > > Steffen Persvold wrote: > > > > Chris Black wrote: > > > > > > On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > > > > Chris Black wrote: > > > > > > > > > > We have been looking into motherboards that provide 64-bit 66MHz > > > > > PCI slots and haven't had much luck. We are now evaluating the > > > [stuff deleted] > > > > > > > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually > > > > quite nice. They both have onboard SCSI-3. > > > > > > > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the > > > > 370DLE is without SCSI. > > > > > > > > I haven't checked too much, but I believe all of these boards are > > > > cheaper than both the Intel and the ASUS boards. > > > > > > Have you or anyone used the onboard IDE on these motherboards? > > > The person working with the Intel serverworks board seems to be > > > having trouble getting IDE working in ultradma mode. Also, do > > > any of these boards have onboard video/ethernet? > > > > > I'm sorry but I haven't used IDE on these mobos for disks, only CDROM. > > As for onboard VGA/ethernet, the 370DER has an onboard ATI Rage XL 8MB > > card and dual Intel 82557 ethernet controllers. The 370DE6 has an AGP > > slot and a onboard 82557 ethernet controller. > > > > I have a 370DLE that does: > > [root@wsb50 /root]# hdparm -tT /dev/hda > > /dev/hda: > Timing buffer-cache reads: 128 MB in 0.70 seconds =182.86 MB/sec > Timing buffered disk reads: 64 MB in 3.50 seconds = 18.29 MB/sec > Actually, I located a 6010H (370DER) machine in our lab with 2x800MHz PIII and an IDE disk. The box is running a 2.2.17 kernel _without_ the OSB4 IDE patch. 
Part of the dmesg output : PCI_IDE: unknown IDE controller on PCI bus 00 device 79, VID=1166, DID=0211 PCI_IDE: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio hda: IBM-DTLA-307030, ATA DISK drive hdc: MATSHITA CR-176, ATAPI CDROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 ide1 at 0x170-0x177,0x376 on irq 15 hda: IBM-DTLA-307030, 29314MB w/1916kB Cache, CHS=3737/255/63 # hdparm -Tt /dev/hda /dev/hda: Timing buffer-cache reads: 128 MB in 0.65 seconds =196.92 MB/sec Timing buffered disk reads: 64 MB in 2.58 seconds = 24.81 MB/sec On a Tyan S2510 (ServerWorks LE chipset) with 2x800MHz PIII same IDE disk, but running a 2.4.3 kernel (with the OSB4 IDE patch) : ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 ServerWorks OSB4: chipset revision 0 ServerWorks OSB4: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio hda: IBM-DTLA-307030, ATA DISK drive hdc: ATAPI 24X CDROM, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 ide1 at 0x170-0x177,0x376 on irq 15 hda: 60036480 sectors (30739 MB) w/1916KiB Cache, CHS=3737/255/63, UDMA(33) # /sbin/hdparm -tT /dev/hda /dev/hda: Timing buffer-cache reads: 128 MB in 0.76 seconds =168.42 MB/sec Timing buffered disk reads: 64 MB in 2.85 seconds = 22.46 MB/sec We have (AFAIK) not experienced any problems with them. Regards -- Steffen Persvold Systems Engineer Email : mailto:sp@scali.no Scali AS (http://www.scali.com) Tlf : (+47) 22 62 89 50 Olaf Helsets vei 6 Fax : (+47) 22 62 89 51 N-0621 Oslo, Norway From lckun at chollian.net Wed May 23 12:53:06 2001 From: lckun at chollian.net (lckun) Date: Wed Nov 25 01:01:20 2009 Subject: Help for DQS-Problem References: <200105221615.SAA02649@ob.informatik.uni-rostock.de> Message-ID: <3B0C1522.44060CCB@chollian.net> Hi all! I am not sure if I may ask this problem in this mailing list. Does anyone help me about this problem? I installed DQS on the solaris. After i sent dqs script, the jobs stay always in the queue and it can not be executed. /users/lckun -> qstat32 -f Queue Name Queue Type Quan Load State ---------- ---------- ---- ---- ----- Q_themse batch 1/1 0.00 er UP lckun UTESTJOB 52 0:4 r RUNNING 05/22/101 17:02:58 ----Pending Jobs --------------------------------------------------------------- lckun UTESTJOB 53 0:5 QUEUED 05/22/101 17:10:25 The error message of err_file is as follows; time=990628376 Illegitimate host >themse< tried to connect to dqs_execd on >themse< as qmaster ../SRC/dqs_sec.c 311 dqs_execd32 themse Thanks in advance for the help. Regards, tag From sgaudet at angstrommicro.com Wed May 23 12:53:14 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:20 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: <20010523131659.B7870@getafix.EraGen.com> References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> <20010523131659.B7870@getafix.EraGen.com> Message-ID: <990647594.3b0c152aa878a@localhost> Hello Chris Black, > On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > > Chris Black wrote: > > > > > > We have been looking into motherboards that provide 64-bit 66MHz > > > PCI slots and haven't had much luck. 
We are now evaluating the > [stuff deleted] > > > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are > actually > > quite nice. They both have onboard SCSI-3. > > > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and > the > > 370DLE is without SCSI. > > > > I haven't checked too much, but I believe all of these boards are > > cheaper than both the Intel and the ASUS boards. > Have you or anyone used the onboard IDE on these motherboards? > The person working with the Intel serverworks board seems to be > having trouble getting IDE working in ultradma mode. Also, do > any of these boards have onboard video/ethernet? From josip at icase.edu Wed May 23 12:55:02 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> Message-ID: <3B0C1596.BCAB1E03@icase.edu> Bari Ari wrote: > > I did a quick compute of the P4 thermal design and it would be possible > to put 8 P4s into a 1U if you could use the entire surface area of the > top of the enclosure as a heat sink [...] A neat idea. However, one would also have to justify its cost. How large would the market be? Deep pocket customers only? Then, the price will be high, and all but the most space constrained users will flee to cheaper alternatives. The same reasoning applies to the discussion concerning 1U vs. commodity cases. Technical computing users used to be kings of the computing jungle, but that was decades ago. A few of them still have deep pockets and the ability to buy exactly what they want. The rest are using mass market leverage to buy compute cycles at a discount. Unless some day mass market switches to high density packaging, this feature will continue to cost extra. I see the Beowulf concept of using commodity components as a way of establishing the base price of computing. I'd be willing to pay more, but only if this alleviates some critical constraint. Then, the price of non-commodity extras (Myrinet, high density packaging, etc.) can be justified. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From bari at onelabs.com Wed May 23 13:45:49 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:20 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> <3B0C1596.BCAB1E03@icase.edu> Message-ID: <3B0C217D.7080809@onelabs.com> Josip Loncaric wrote: > Bari Ari wrote: > >> I did a quick compute of the P4 thermal design and it would be possible >> to put 8 P4s into a 1U if you could use the entire surface area of the >> top of the enclosure as a heat sink [...] > > > A neat idea. However, one would also have to justify its cost. We don't envision very dense clusters being priced above the cost of off the shelf built clusters. The cost of building one enclosure is less than eight enclosures. > > How large would the market be? Very large if it kept in the same or lower price range as OTS. Deep pocket customers only? There is no reason why dense clusters will cost more than large clusters built using commodity parts. 
In fact just the opposite will be true with nodes built using commodity parts from the component level rather than from the board level. We are working on a scalable design now that will pack 1TFLOP into a single 19" rack for under $1,000,000 and 64 node clusters that will fit under a desk. Then, the > price will be high, and all but the most space constrained users will > flee to cheaper alternatives. The same reasoning applies to the > discussion concerning 1U vs. commodity cases. > High performance computing design used to be very costly and time consuming. I am still surprised to hear that some CO's still spend >$100K for a new mainboard design and take half a year to complete it. Chipsets have become so highly integrated now that mainboard designs can be completed in only a few short weeks for only $10-20K. If you're building 100 or more systems which is what clustering is all about. Gone are the days when custom nodes = high costs. > Technical computing users used to be kings of the computing jungle, but > that was decades ago. A few of them still have deep pockets and the > ability to buy exactly what they want. The rest are using mass market > leverage to buy compute cycles at a discount. Unless some day mass > market switches to high density packaging, this feature will continue to > cost extra. > > I see the Beowulf concept of using commodity components as a way of > establishing the base price of computing. I'd be willing to pay more, > but only if this alleviates some critical constraint. Then, the price > of non-commodity extras (Myrinet, high density packaging, etc.) can be > justified. > With the dropping costs of CPUs and chipsets what we are now seeing is that the cost of the Myrinet and SCI parts are several times the cost of the CPU and chipset. Maybe someone will make a low cost low latency high bandwith ASIC soon to fit the growing market. Bari From ksfacinelli at yahoo.com Wed May 23 13:58:25 2001 From: ksfacinelli at yahoo.com (Kevin Facinelli) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: <3B0C1596.BCAB1E03@icase.edu> Message-ID: <20010523205825.12303.qmail@web13502.mail.yahoo.com> If the entire top of the case was a heatsink many other variable come to mind....what happens when you stack or rack??? how thick will the lid be? How would you get air across the lid in a racked configuration. Let's think about this??? why does it have to be 1U??...think a little bit outside the box...if the unit could be any shape would it be thin and narrow creating a terrible aspect ratio for air movement...NO. Take a look at the ingenious approach of this product: http://www.crystalpc.com/products/computers/cs300.asp The cooling and space advantages are easy to see. Kevin --- Josip Loncaric wrote: > Bari Ari wrote: > > > > I did a quick compute of the P4 thermal design and > it would be possible > > to put 8 P4s into a 1U if you could use the entire > surface area of the > > top of the enclosure as a heat sink [...] > > A neat idea. However, one would also have to > justify its cost. > > How large would the market be? Deep pocket > customers only? Then, the > price will be high, and all but the most space > constrained users will > flee to cheaper alternatives. The same reasoning > applies to the > discussion concerning 1U vs. commodity cases. > > Technical computing users used to be kings of the > computing jungle, but > that was decades ago. 
A few of them still have deep > pockets and the > ability to buy exactly what they want. The rest are > using mass market > leverage to buy compute cycles at a discount. > Unless some day mass > market switches to high density packaging, this > feature will continue to > cost extra. > > I see the Beowulf concept of using commodity > components as a way of > establishing the base price of computing. I'd be > willing to pay more, > but only if this alleviates some critical > constraint. Then, the price > of non-commodity extras (Myrinet, high density > packaging, etc.) can be > justified. > > Sincerely, > Josip > > -- > Dr. Josip Loncaric, Research Fellow > mailto:josip@icase.edu > ICASE, Mail Stop 132C PGP key at > http://www.icase.edu./~josip/ > NASA Langley Research Center > mailto:j.loncaric@larc.nasa.gov > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 > Fax +1 757 864-6134 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== Kevin Facinelli www.colosource.com webmaster@colosource.com __________________________________________________ Do You Yahoo!? Yahoo! Auctions - buy the things you want at great prices http://auctions.yahoo.com/ From ksfacinelli at yahoo.com Wed May 23 13:59:05 2001 From: ksfacinelli at yahoo.com (Kevin Facinelli) Date: Wed Nov 25 01:01:21 2009 Subject: Fwd: Re: 1U P4 Systems - holes Message-ID: <20010523205905.27232.qmail@web13503.mail.yahoo.com> > Alvin... > > I just wanted to make you aware of a the new 1U > chassis made by Crystal Group Inc. First, it is > expensive, that said, it does have many nice > features. > > It has three blowers that are designed to perform > well > under high static pressure (4x the efficiency of > standard fans). This allows the case to cool very > efficiently. It also utilizes a CPU Card instead of > a > backplane allowing two full length expansion cards. > > The CPU card technology can range from simple PIII > to > highly integrated dual PIII with on board dual > eathernet, SCSI and VGA. > > take a look....it is designed very well: > > http://www.crystalpc.com/products/roservers.asp > > Kevin > > --- alvin@Mail.Linux-Consulting.com wrote: > > > > hi Robert > > > > the $50 1U case off the shelf... > > - you have to drill all holes for motehrboard, > > powersupply etc > > and the metal it uses is like paper... ( > extremely > > thin > > > > the average off the shelf 1U case is about $250... > > - one fd, one hd, one cdrom...--or-- > > - 2 5.25" bays w/ floppy > > > > our chassis is a range... 2 drives up to 8 > drives... > > > > All 1U chassis suffer from: > > - room only one pci card > > - atx connectors on the back of the motherboard > has > > to be > > the Intel CA810EAL/D815EAL style... > > - NOT the dual NIC supermicro motherboards > > - except if you buy the 1U from supermicro > > themself > > > > For a list of all the 1U vendors... > > http://www.Linux-1U.net/1U_Others > > > > have fun > > alvin > > > > > > > Ah. I may have misunderstood. I interpreted > "off > > the shelf 1U > > > enclosures" to mean equipped with a power supply > > and compartmentalized > > > for installation, just like an off the shelf > case > > or the $200 off the > > > shelf 1U enclosure sold by case vendors. I also > > interpreted the > > > "mounting holes" to be just the rack mounting > > holes, not all the holes > > > into which things are to be fastened in the > > completely empty case. 
> > > > > > With that much hassle I might as well go with a > > filing cabinet design or > > > buy off the shelf tower cases, take them apart, > > and remount the > > > motherboard tray, drive cage, and power supply > on > > shelving. > > > > > > My main whine is that one shouldn't have to do > all > > this handiwork to > > > build systems in racks when it is so much easier > > to do it in an assembly > > > line. Case vendor margins on rackmount cases > must > > be huge, because I > > > cannot believe that there is THAT much economy > of > > scale in the > > > manufacturing process. > > > > > > I'll shut up now and just live with it. > > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > ===== > Kevin Facinelli > www.colosource.com > webmaster@colosource.com > > __________________________________________________ > Do You Yahoo!? > Yahoo! Auctions - buy the things you want at great > prices > http://auctions.yahoo.com/ > ===== Kevin Facinelli www.colosource.com webmaster@colosource.com __________________________________________________ Do You Yahoo!? Yahoo! Auctions - buy the things you want at great prices http://auctions.yahoo.com/ From bari at onelabs.com Wed May 23 14:27:12 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <20010523205825.12303.qmail@web13502.mail.yahoo.com> Message-ID: <3B0C2B30.2020202@onelabs.com> Kevin Facinelli wrote: > If the entire top of the case was a heatsink many > other variable come to mind....what happens when you > stack or rack??? how thick will the lid be? How would > you get air across the lid in a racked configuration. > > Let's think about this??? why does it have to be > 1U??...think a little bit outside the box...if the > unit could be any shape would it be thin and narrow > creating a terrible aspect ratio for air > movement...NO. > > Take a look at the ingenious approach of this product: > > http://www.crystalpc.com/products/computers/cs300.asp > > The cooling and space advantages are easy to see. > > Take a look at the earlier post, it's all worked out. The Crystal approach equates to only .87U per server, "the CS300 lets you fit up to 52 servers in a standard 7 ft. rack/cabinet", better but not dense enough. We were targeting 0.125U per node or 384 nodes per 7 ft. rack. I'd probably never build a single board to fill a 1U with eight P4s on it, but the numbers do work out if you wanted to build some. Bari From James.P.Lux at jpl.nasa.gov Wed May 23 14:36:31 2001 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:01:21 2009 Subject: low cost high speed interface was: custom cluster cabinets (was Re: 1U P4 Systems) Message-ID: <006a01c0e3d0$70dda310$61064f89@cerulean.jpl.nasa.gov> >> >With the dropping costs of CPUs and chipsets what we are now seeing is >that the cost of the Myrinet and SCI parts are several times the cost of >the CPU and chipset. Maybe someone will make a low cost low latency high >bandwith ASIC soon to fit the growing market. > You might take a look at SpaceWire (aka IEEE 1355.2). Very low cost, high speed (150 Mbps today, 400 in next year) low latency serial communications with nonblocking wormhole routers (i.e. the router just looks at the header to figure out where to send it and doesn't buffer up the whole packet). 
The hardware deals with bandwidth sharing among multiple links in parallel, etc. Today, it's being pushed for space applications (hence the name), but University of Dundee has just finished a VHDL core for the interface and router, and is starting to test, and expects to be done in a year. Today, you can buy Intellectual property from a company called 4links (in England) to fit in the smallest Xilinx FPGA to implement it. The full up core and router are targeted to the Virtex parts, which are quite large, and expensive. Whether someone will market an inexpensive ASIC to support it is an open question. There are some large consumer electronics firms looking at it (since it can easily handle video rates, and is much cheaper and simpler than 1394 (FireWire)). Spacewire PCI cards are wretchedly expensive today ($10K for a three link card) but that's because they are using a $5K part designed for space applications, which are low volume and not price sensitive to the degree that consumer, mass market, is. If one were interested in implementing the interface on the mobo with a motherboard chipset, the cost would be quite low for the additional hardware needed. Full Disclosure: I am on the IEEE 1355.2 committee and have been using the interface in a lab application for a year now, so I am a bit biased. From alvin at Mail.Linux-Consulting.com Wed May 23 14:31:33 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: 64bit/66MHz PCI mobos - onbaord video/ethernet In-Reply-To: <20010523131659.B7870@getafix.EraGen.com> Message-ID: hi ya chris for motherboards with onboard NIC, onboard svga... http://www.Linux-1U.net/1U_Features/dual.txt have fun alvin On Wed, 23 May 2001, Chris Black wrote: > On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > > Chris Black wrote: > > > > > > We have been looking into motherboards that provide 64-bit 66MHz > > > PCI slots and haven't had much luck. We are now evaluating the > [stuff deleted] > > > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually > > quite nice. They both have onboard SCSI-3. > > > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the > > 370DLE is without SCSI. > > > > I haven't checked too much, but I believe all of these boards are > > cheaper than both the Intel and the ASUS boards. > > Have you or anyone used the onboard IDE on these motherboards? > The person working with the Intel serverworks board seems to be > having trouble getting IDE working in ultradma mode. Also, do > any of these boards have onboard video/ethernet? > > Chris > From bari at onelabs.com Wed May 23 14:36:23 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: low cost high speed interface was: custom cluster cabinets (was Re: 1U P4 Systems) References: <006a01c0e3d0$70dda310$61064f89@cerulean.jpl.nasa.gov> Message-ID: <3B0C2D57.9040400@onelabs.com> Jim Lux wrote: > You might take a look at SpaceWire (aka IEEE 1355.2). Very low cost, high > speed (150 Mbps today, 400 in next year) low latency serial communications > with nonblocking wormhole routers (i.e. the router just looks at the header > to figure out where to send it and doesn't buffer up the whole packet). > Today, it's being pushed for space applications (hence the name), but > University of Dundee has just finished a VHDL core for the interface and > router, and is starting to test, and expects to be done in a year. 
Today, > you can buy Intellectual property from a company called 4links (in England) > to fit in the smallest Xilinx FPGA to implement it. The full up core and > router are targeted to the Virtex parts, which are quite large, and > expensive. > If it could fit in a Xilinx Spartan-II that would be very nice. Bari From sgaudet at angstrommicro.com Wed May 23 14:41:18 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: <20010523205825.12303.qmail@web13502.mail.yahoo.com> References: <20010523205825.12303.qmail@web13502.mail.yahoo.com> Message-ID: <990654078.3b0c2e7e7a14b@localhost> Hello All, > If the entire top of the case was a heatsink many > other variable come to mind....what happens when you > stack or rack??? how thick will the lid be? How would > you get air across the lid in a racked configuration. > > Let's think about this??? why does it have to be > 1U??...think a little bit outside the box...if the > unit could be any shape would it be thin and narrow > creating a terrible aspect ratio for air > movement...NO. > > Take a look at the ingenious approach of this product: > > http://www.crystalpc.com/products/computers/cs300.asp > > The cooling and space advantages are easy to see. It's ok, but many points of failure. Look into Compaq PCI(cPCI). Intel bought a company called Ziatech and released their products under the name of Ketris. http://www.ziatech.com/ketris/main.htm They were to start shipping a 9U with 16 single board computers with 800Mhz low power processors and gigabit ethernet. However, they are updating the design and delayed shipping until 4th quarter of this year. The good news is this product will have dual processors when released in Q4. Whats even better is no switch needed for interconnect, much easier management tools, and multiple operating systems able to run on different blades, stripping of blades, replacement of defective blades, or adding new blades when the system is still online and running. Moreover, because its running on Intel's laptop processors it draws less power, easier to cool. Therefore, less cost of ownership. On the other spectrum look at Alpha's. We sold a 64 node dual 750/8Mhz cluster the customer needed a 5 ton air conditioner to cool it and the 618w power supplies draw quite a bit of electricity. We ran the Ketris box in inhouse, great technology, check it out. Cheers, Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From alvin at Mail.Linux-Consulting.com Wed May 23 14:43:35 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets -- cooling In-Reply-To: <3B0C2B30.2020202@onelabs.com> Message-ID: hi ya thank kevin for the url... nice looking chassis but... i dont know if theyve solved the cooling problem per se since we've nto measured/tested it... adding blowers/fans does NOT mean that its cooler if there is no air coming in and out of the chassis designing 8 servers inside a 1U shelf space is a challenge and must be accompanied by a similarly sized budget and if off-the-shelf 8 servers in 1U space is required... a cool crusoe based system might work if its not too compute intensive... 
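To put a rough number on "air coming in and out of the chassis": a common rule of thumb is CFM ~= 3.16 * watts / deltaT(F). Taking the ~600 W assumed earlier for eight P4 nodes in a 1U and allowing a 20 F rise through the box (both assumptions, not measurements):

$ echo '3.16 * 600 / 20' | bc    # assumed 600 W load, assumed 20 F rise
94                               # call it ~95 CFM that must actually pass through the enclosure

Pulling that much air through a 1.75" high opening is the real problem, which is the point being made here: the number of blowers matters less than whether that volume genuinely moves through the chassis.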
have fun alvin On Wed, 23 May 2001, Bari Ari wrote: > Kevin Facinelli wrote: > > > If the entire top of the case was a heatsink many > > other variable come to mind....what happens when you > > stack or rack??? how thick will the lid be? How would > > you get air across the lid in a racked configuration. > > > > Let's think about this??? why does it have to be > > 1U??...think a little bit outside the box...if the > > unit could be any shape would it be thin and narrow > > creating a terrible aspect ratio for air > > movement...NO. > > > > Take a look at the ingenious approach of this product: > > > > http://www.crystalpc.com/products/computers/cs300.asp > > > > The cooling and space advantages are easy to see. > > > > > Take a look at the earlier post, it's all worked out. > > The Crystal approach equates to only .87U per server, "the CS300 lets > you fit up to 52 servers in a standard 7 ft. rack/cabinet", better but > not dense enough. We were targeting 0.125U per node or 384 nodes per 7 > ft. rack. I'd probably never build a single board to fill a 1U with > eight P4s on it, but the numbers do work out if you wanted to build some. > > Bari > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From bari at onelabs.com Wed May 23 15:18:01 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets -- cooling References: Message-ID: <3B0C3719.9020402@onelabs.com> alvin@Mail.Linux-Consulting.com wrote: > designing 8 servers inside a 1U shelf space is a challenge > and must be accompanied by a similarly sized budget > True, the first 8-node unit would cost about $30-40K but if you're going to build over 30, then the cost would drop below an OTS solution. > and if off-the-shelf 8 servers in 1U space is required... > a cool crusoe based system might work if its not too compute > intensive... I was surprised to see the Crusoe blades (nodes) priced around $1200/node. Maybe the prices will fall after the AMD and Intel equivalents turn up later this year. Has anyone seen the benchmarks on the Crusoe? ...Mips, Flops, Spec ?? I haven't seen any actual results yet. Bari From haohe at me1.eng.wayne.edu Wed May 23 18:55:21 2001 From: haohe at me1.eng.wayne.edu (Hao He) Date: Wed Nov 25 01:01:21 2009 Subject: ifenslave error in Channel Bonding Message-ID: <200105240102.VAA12949@me1.eng.wayne.edu> Hi, all. I am trying to bond our cluster with 3C905 cards. Since my Linux distribution is SuSE 6.1 (2.2.5 kernel upgraded to 2.4.4), I have to run ifconfig and ifenslave at command line. Finally I got success in one try, I think, but failed in all others. I am confused. Here are the details. When I ran ifconfig bond0 192.168.1.1 up No error prompted. When I check ifconfig, I find that bond0 got IP 192.168.1.1 and HWADDR is 00:00:00:00:00:00. Seems it is OK. Then I ran ifenslave bond0 eth0 I got following error message: SIOCSIFHWADDR on bond0 failed: Device or resource busy. The master device bond0 is busy: it must be idle before running this command. What's wrong? Could you tell me how to correct this problem? Your advice will be highly appreciated. Thanks a lot! 
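One hedged guess at the culprit, untested against this particular setup: the ifenslave binary shipped with a 2.2-era distribution (SuSE 6.1 here) generally does not match the 2.4.x bonding driver, and the usual advice is to rebuild it from the running kernel's own source before retrying the documented bring-up order. The paths below assume the 2.4.4 tree sits in /usr/src/linux:

# rebuild ifenslave against the 2.4 kernel it will talk to
gcc -Wall -O -I/usr/src/linux/include \
    /usr/src/linux/Documentation/networking/ifenslave.c -o ifenslave

# typical bring-up order for the 2.4 bonding driver
modprobe bonding
ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
./ifenslave bond0 eth0 eth1

If the error persists, it may also be worth checking whether an init script has already attached a slave to bond0 before the commands above are run.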
8-) Best regards, Hao He From marini at pcmenelao.mi.infn.it Wed May 23 23:47:09 2001 From: marini at pcmenelao.mi.infn.it (Franz Marini) Date: Wed Nov 25 01:01:21 2009 Subject: 64bit/66MHz PCI - now: gigabit ethernet/bonding In-Reply-To: Message-ID: On Wed, 23 May 2001, Keith Underwood wrote: > Um, theoretical 2 Gb/s out of a Gigabit Ethernet card (full-duplex Gigabit > Ethernet). > Ok, I thought you were talking about PCI bandwidth. Talking about gigabit ethernet, yes you're right. Btw, is it possible to bond two or more gigabit cards ? And, if anyone ever tried, how is latency and, is the cpu able to drive two cards at full speed ? Franz --------------------------------------------------------- Franz Marini Sys Admin and Software Analyst, Dept. of Physics, University of Milan, Italy. email : marini@pcmenelao.mi.infn.it --------------------------------------------------------- From hamlet at cs.umu.se Thu May 24 03:44:44 2001 From: hamlet at cs.umu.se (Fredrik Augustsson) Date: Wed Nov 25 01:01:21 2009 Subject: OT question Message-ID: <20010524124444.I10884@peppar.cs.umu.se> Hi, I was thinking about what we would have done if we didnt have 'wulfs?! What other ways are there to get that extra cycle to do calculations? I know of a few nice ones, like Condor [1], distibuted.net [2] and United Devices [3]. Are there any other cool ways of getting that extra cycle? + Fredrik [1] http://www.cs.wisc.edu/condor/ [2] http://distributed.net/ [3] http://www.ud.com/ From brian at patriot.net Thu May 24 07:14:17 2001 From: brian at patriot.net (Brian C Merrell) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node Message-ID: I'm using Scyld Linux on a 24 node cluster. Everything installed fine, and I'm fairly happy with it so far; beowulf has come a long way since I last played with it about two years ago. But I have a few questions on how to set some things up. 1) How can I get access to a specific node? Some of the researchers I'm supporting want to use only part of the cluster at a time (so that other chunks are free for use, I suppose). How can they access a specific node and run programs from it? There doesn't appear to be any rlogind (or rsh, or telnet, or even a local /etc/passwd file, or ...) to accept connections. bpsh won't give me a shell, so that won't work. 2) How can I use the master node as an active compute node (node0)? Running bpslave 192.168.1.1 2223 (so from the master to the master) doesn't seem to work, although maybe I'm really misunderstanding how this is supposed to work? 3) While I'm at it, a few other things: is it possible to define hostnames for the nodes? Obviously the master is a (mostly) normal installation that I can add to the hosts file or use DNS, but what about the slave nodes? We'd like to have it so the master is l001 and the slaves are l002 through l024 (that first character is a lowercase 'L', BTW). -brian -- Brian C. Merrell P a t r i o t N e t Systems Staff brian@patriot.net http://www.patriot.net (703) 277-7737 PatriotNet ICBM address: 38.845 N, 77.3 W From agrajag at scyld.com Thu May 24 08:05:28 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: ; from brian@patriot.net on Thu, May 24, 2001 at 10:14:17AM -0400 References: Message-ID: <20010524110528.A26997@blueraja.scyld.com> On Thu, 24 May 2001, Brian C Merrell wrote: > I'm using Scyld Linux on a 24 node cluster. 
Everything installed fine, > and I'm fairly happy with it so far; beowulf has come a long way since I > last played with it about two years ago. But I have a few questions on > how to set some things up. > > 1) How can I get access to a specific node? Some of the researchers I'm > supporting want to use only part of the cluster at a time (so that other > chunks are free for use, I suppose). How can they access a specific node > and run programs from it? There doesn't appear to be any rlogind (or rsh, > or telnet, or even a local /etc/passwd file, or ...) to accept > connections. bpsh won't give me a shell, so that won't work. You do this through bpsh. The slave nodes are to run compute jobs, not interactive login sessions, so you really shouldn't need to run an interactive shell on them. If you need to do some admin work that requires you to run stuff on the slave nodes, you can always bpsh all the individual commands to the slave node. > > 2) How can I use the master node as an active compute node (node0)? > Running bpslave 192.168.1.1 2223 (so from the master to the master) > doesn't seem to work, although maybe I'm really misunderstanding how this > is supposed to work? What are you using to run your jobs? If you're using MPI, the rank == 0 job is always run on the master node (unless you give mpirun the -nolocal option) > > 3) While I'm at it, a few other things: is it possible to define hostnames > for the nodes? Obviously the master is a (mostly) normal installation > that I can add to the hosts file or use DNS, but what about the slave > nodes? We'd like to have it so the master is l001 and the slaves are l002 > through l024 (that first character is a lowercase 'L', BTW). This is handled through a program called beonss. It assigns the master node to be master or .-1, and all ther other nodes are . So node 0 is .0, node 1 is .1, etc. Can you change your programs to use this? If not, beonss is a fairly simple C program that shouldn't be too difficult to modify. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010524/9e5cca9e/attachment.bin From brian at patriot.net Thu May 24 08:28:45 2001 From: brian at patriot.net (Brian C Merrell) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <20010524110528.A26997@blueraja.scyld.com> Message-ID: On Thu, 24 May 2001, Sean Dilda wrote: > > You do this through bpsh. The slave nodes are to run compute jobs, not > interactive login sessions, so you really shouldn't need to run an > interactive shell on them. If you need to do some admin work that > requires you to run stuff on the slave nodes, you can always bpsh all > the individual commands to the slave node. > Hmmm. It's fairly important for them to be able to rlogin to each machine. They really want to be able to get to each box and run programs a certain way. Is it possible to set up a custom install on each machine that still has the beowulf modifications (and can join the cluster) yet is also an independent and full linux box? Would I simply have to run bpslave on a node to bring it into the cluster? Thanks for the other info, BTW. -brian -- Brian C. 
Merrell P a t r i o t N e t Systems Staff brian@patriot.net http://www.patriot.net (703) 277-7737 PatriotNet ICBM address: 38.845 N, 77.3 W From tibbs at math.uh.edu Thu May 24 08:46:16 2001 From: tibbs at math.uh.edu (Jason L Tibbitts III) Date: Wed Nov 25 01:01:21 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) In-Reply-To: "Chris Black"'s message of "Wed, 23 May 2001 13:16:59 -0400" References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> <20010523131659.B7870@getafix.EraGen.com> Message-ID: >>>>> "CB" == Chris Black writes: CB> Have you or anyone used the onboard IDE on these motherboards? I have, on a SuperMicro 370DLE. The IDE is OK, but not great. XYX:lserv00:~> s hdparm -tT /dev/hda /dev/hda: Timing buffer-cache reads: 128 MB in 0.74 seconds =172.97 MB/sec Timing buffered disk reads: 64 MB in 2.83 seconds = 22.61 MB/sec XYX:lserv00:~> cat /proc/ide/hda/model IBM-DTLA-307020 It's a mostly-meaningless benchmark, but I get 34MB/sec out of the same disk hanging off of the Promise controller built into the Asus A7V133. The OSB4 really doesn't do IDE very well. If you really need the disk bandwidth but don't want to shell out for SCSI, consider getting a cheap (<$40) PCI Promise IDE controller and hanging your disks there. - J< From timothy.g.mattson at intel.com Thu May 24 09:35:42 2001 From: timothy.g.mattson at intel.com (Mattson, Timothy G) Date: Wed Nov 25 01:01:21 2009 Subject: OT question Message-ID: If you want extra cycles, you might want to check out the Milan project (and Calypso). http://www.cs.nyu.edu/milan/milan/index.html They have come up with a programming model that is relatively easy to use, provides good statistical load balancing, and some degree of fault tolerance. While I haven't downloaded or used it, I find MOSIX very intriguing as well. www.mosic.org --Tim -----Original Message----- From: Fredrik Augustsson [mailto:hamlet@cs.umu.se] Sent: Thursday, May 24, 2001 3:45 AM To: beowulf@beowulf.org Subject: OT question Hi, I was thinking about what we would have done if we didnt have 'wulfs?! What other ways are there to get that extra cycle to do calculations? I know of a few nice ones, like Condor [1], distibuted.net [2] and United Devices [3]. Are there any other cool ways of getting that extra cycle? + Fredrik [1] http://www.cs.wisc.edu/condor/ [2] http://distributed.net/ [3] http://www.ud.com/ _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From josip at icase.edu Thu May 24 09:42:34 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: <3B0B4CC9.60B08195@paralleldata.com> <3B0BE7FC.1060604@onelabs.com> <3B0C1596.BCAB1E03@icase.edu> <3B0C217D.7080809@onelabs.com> Message-ID: <3B0D39FA.A2254070@icase.edu> Bari Ari wrote: > > We don't envision very dense clusters being priced above the cost of off > the shelf built clusters. The cost of building one enclosure is less > than eight enclosures. The cost of building an item is NOT the price charged for the item. The manufacturer's development cost must also be recovered. Dense packaging adds value not found in the mass market, so hardware vendors can price them higher even if the unit production cost is actually lower. Some of the difference goes to their profit, but a lion's share covers their development costs. 
I remain skeptical about your pricing projections. Until the mass market for PCs drastically changes, I doubt that we'll see high performance dense packages selling for the same price as equally capable commodity alternatives. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From agrajag at scyld.com Thu May 24 09:54:52 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: ; from brian@patriot.net on Thu, May 24, 2001 at 11:28:45AM -0400 References: <20010524110528.A26997@blueraja.scyld.com> Message-ID: <20010524125452.B26997@blueraja.scyld.com> On Thu, 24 May 2001, Brian C Merrell wrote: > On Thu, 24 May 2001, Sean Dilda wrote: > > > > > You do this through bpsh. The slave nodes are to run compute jobs, not > > interactive login sessions, so you really shouldn't need to run an > > interactive shell on them. If you need to do some admin work that > > requires you to run stuff on the slave nodes, you can always bpsh all > > the individual commands to the slave node. > > > > Hmmm. It's fairly important for them to be able to rlogin to each > machine. They really want to be able to get to each box and run programs > a certain way. Is it possible to set up a custom install on each machine > that still has the beowulf modifications (and can join the cluster) yet is > also an independent and full linux box? Would I simply have to run > bpslave on a node to bring it into the cluster? Is there any reason the program itself can't run itself in the special way they want? Anything you can do with rlogin or rsh can be done with bpsh, except for an interactive shell. However, this can be mimiced through bpsh. If you can give me some idea of what they are wanting to do, I might be able to help you find a way to do it without requiring an interactive shell. Scyld clusters are designed to run background jobs on all of the slave nodes, not to run login services for users on the slave nodes. It is possible to use BProc with a full install on every slave node however this reduces a lot of the easy administration features we've trying to put into our distro. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010524/ea8dc8b2/attachment.bin From brian at patriot.net Thu May 24 11:07:04 2001 From: brian at patriot.net (Brian C Merrell) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <20010524125452.B26997@blueraja.scyld.com> Message-ID: On Thu, 24 May 2001, Sean Dilda wrote: > Is there any reason the program itself can't run itself in the special > way they want? Anything you can do with rlogin or rsh can be done with > bpsh, except for an interactive shell. However, this can be mimiced > through bpsh. If you can give me some idea of what they are wanting to > do, I might be able to help you find a way to do it without requiring an > interactive shell. Scyld clusters are designed to run background jobs > on all of the slave nodes, not to run login services for users on the > slave nodes. > Hmmm. I guess this warrants some background info. 
The cluster is not a new cluster. It was previously built by someone else who is now gone. The cluster master node crashed, taking the system and most of their data with it. I am now trying to rebuild the cluster. The cluster previously used RH6.1 stock and followed more of a NOW model than a beowulf model, although all the hardware was dedicated to the cluster, not on people's desks. I'm now trying to use Scyld's distro to bring the cluster back up. I'm pretty happy with it, and managed to get the master node up with a SCSI software RAID array, and a few test nodes up with boot floppies. Seems fine to me. BUT.... There are three reasons that they want to be able to rlogin to the machines: 1) first, there are a number of people with independent projects who use the cluster. They are used to being able to simply login to the master, rlogin to a node, and start their projects on one or more nodes, so that they take up only a chunk of the cluster. 2) Also, at least one researcher was previously able to and wants to be able to continue to login to separate nodes and run slightly different (and sometimes non-parallelizable) programs on his data. 3) ALSO, they have code that they would rather not change. > It is possible to use BProc with a full install on every slave node > however this reduces a lot of the easy administration features we've > trying to put into our distro. > I just set this up, and realize what you mean. I had to statically define IP addresses, users, etc. At first it wasn't a pain, but I realized after the first two that doing all 24 would be. Even though it is now possible to rlogin to different nodes, it wasn't what I was hoping for. I imagine it will be particularly unpleasant when software upgrades need to be performed. :( I'm still hoping to find some happy medium, but I'm going to present these options to the group and see what they think. The problem is that they are mathematicians and physicists, not computer people. They really don't want to have to change, even though it seems to be the same. Also one thing I'm still trying to find a solution to: how can the nodes address each other? Previously they used a hosts file that had listings for L001-L024 (and they would like to keep it that way) I guess with the floppy method they don't have to, because the BProc software maps node numbers to IP addresses, -brian From agrajag at scyld.com Thu May 24 11:37:28 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: ; from brian@patriot.net on Thu, May 24, 2001 at 02:07:04PM -0400 References: <20010524125452.B26997@blueraja.scyld.com> Message-ID: <20010524143728.C26997@blueraja.scyld.com> On Thu, 24 May 2001, Brian C Merrell wrote: > On Thu, 24 May 2001, Sean Dilda wrote: > > > Is there any reason the program itself can't run itself in the special > > way they want? Anything you can do with rlogin or rsh can be done with > > bpsh, except for an interactive shell. However, this can be mimiced > > through bpsh. If you can give me some idea of what they are wanting to > > do, I might be able to help you find a way to do it without requiring an > > interactive shell. Scyld clusters are designed to run background jobs > > on all of the slave nodes, not to run login services for users on the > > slave nodes. > > > > Hmmm. I guess this warrants some background info. > > The cluster is not a new cluster. It was previously built by someone else > who is now gone. 
The cluster master node crashed, taking the system and > most of their data with it. I am now trying to rebuild the cluster. The > cluster previously used RH6.1 stock and followed more of a NOW model than > a beowulf model, although all the hardware was dedicated to the cluster, > not on people's desks. I'm now trying to use Scyld's distro to bring the > cluster back up. I'm pretty happy with it, and managed to get the master > node up with a SCSI software RAID array, and a few test nodes up with boot > floppies. Seems fine to me. BUT.... > > There are three reasons that they want to be able to rlogin to the > machines: 1) first, there are a number of people with independent > projects who use the cluster. They are used to being able to simply login > to the master, rlogin to a node, and start their projects on one or more > nodes, so that they take up only a chunk of the cluster. 2) Also, at > least one researcher was previously able to and wants to be able to > continue to login to separate nodes and run slightly different (and > sometimes non-parallelizable) programs on his data. 3) ALSO, they have > code that they would rather not change. Ok, I understand now. All of these things can be handled with bpsh. Do you think these people will be happy with doing something like 'rsh ' instead of rsh'ing in to get a shell and then run the command? If so, you could probablly get away with just symlinking /usr/bin/rsh to /usr/bin/bpsh > > > It is possible to use BProc with a full install on every slave node > > however this reduces a lot of the easy administration features we've > > trying to put into our distro. > > > > I just set this up, and realize what you mean. I had to statically define > IP addresses, users, etc. At first it wasn't a pain, but I realized after > the first two that doing all 24 would be. Even though it is now possible > to rlogin to different nodes, it wasn't what I was hoping for. I imagine > it will be particularly unpleasant when software upgrades need to be > performed. :( This is one of the advantages of our software. It is setup in such a way that you don't have to do so much work to keep the slave nodes up to date. > > I'm still hoping to find some happy medium, but I'm going to present these > options to the group and see what they think. The problem is that they > are mathematicians and physicists, not computer people. They really don't > want to have to change, even though it seems to be the same. > > Also one thing I'm still trying to find a solution to: how can the nodes > address each other? Previously they used a hosts file that had listings > for L001-L024 (and they would like to keep it that way) I guess with the > floppy method they don't have to, because the BProc software maps node > numbers to IP addresses, Perhaps you could write some sort of rsh replacement script that turns the L001-L024 names into the BProc node numbers, then call bpsh. Would that be a happy medium? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010524/26758715/attachment.bin From gscluster at hotmail.com Thu May 24 12:04:58 2001 From: gscluster at hotmail.com (Georgia Southern Beowulf Cluster Project) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. Message-ID: Hello Everyone, I'm assembling a new cluster and I want to know what people's experiences are with the new RedHat 7.1. 
I've assembled a couple with RH6.2, but I'm noticing a fair amount of differences with the 7.x line of RH. Especially, can someone point me in a direction to find good documentation for xinet.d. I can't seem to find much through google. Thank you, Wesley Wells _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com From alvin at Mail.Linux-Consulting.com Thu May 24 13:44:49 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets - cost of ownership... In-Reply-To: <3B0D39FA.A2254070@icase.edu> Message-ID: hi Josip... yes.. cost of bulding and owernship includes at least an almost endless list of stuff to keey the system up... - costs of parts ( off the shelf ) - costs of customized parts - paid to vendors - costs of internal designers - to specify what they want/getting - power it up and see how long to get it going smoothly/reliably - saving all work to another system in case the system loses its mind and starts erasing data ( if you have a bad sdram ... it cn erase data on disks ) - costs to admin the system instead of using the system for work - costs of the building and space and rack - costs of all the office admin and support staff... .. blah .. blah .. - all the hardware costs becomes a miniml fraction of the entire project costs -- off the shelf... if number of servers is important and doing some predefined tasks... Transmeta cursoe seems to be a very good alternative as it supports 24 servers in 3U space.. have fun alvin http://www.Linux-1U.net On Thu, 24 May 2001, Josip Loncaric wrote: > Bari Ari wrote: > > > > We don't envision very dense clusters being priced above the cost of off > > the shelf built clusters. The cost of building one enclosure is less > > than eight enclosures. > > The cost of building an item is NOT the price charged for the item. The > manufacturer's development cost must also be recovered. Dense packaging > adds value not found in the mass market, so hardware vendors can price > them higher even if the unit production cost is actually lower. Some of > the difference goes to their profit, but a lion's share covers their > development costs. > > I remain skeptical about your pricing projections. Until the mass > market for PCs drastically changes, I doubt that we'll see high > performance dense packages selling for the same price as equally capable > commodity alternatives. > > Sincerely, > Josip > > -- > Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From alvin at Mail.Linux-Consulting.com Thu May 24 13:51:24 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. In-Reply-To: Message-ID: hi GSBCP yes.... xinetd.d is a mess... turn off all of them thingies.... you have to explicitly say "disable = yes" in each file in /etc/xinetd.d/* to turn that service off - make a set of changes...tar it up and copy it to the other servers... 
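As a minimal sketch of that "edit once, then tar it up and copy it around" idea: the node names (node1 node2 node3) and the use of rsh/rcp below are only placeholders, so substitute whatever remote shell and host list the cluster actually uses, and note that it only rewrites files that already contain a "disable =" line (true of the stock Red Hat xinetd.d files):

#!/bin/sh
# turn off every xinetd-managed service on this box...
cd /etc/xinetd.d || exit 1
for f in *; do
    # flip any existing "disable = ..." line to "disable = yes"
    sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\).*/\1yes/' "$f" > "$f.new" && mv "$f.new" "$f"
done
/etc/rc.d/init.d/xinetd restart          # pick up the changes locally

# ...then push the whole directory to the other servers
tar czf /tmp/xinetd-conf.tar.gz -C /etc xinetd.d
for host in node1 node2 node3; do        # placeholder host names
    rcp /tmp/xinetd-conf.tar.gz $host:/tmp/
    rsh $host "tar xzf /tmp/xinetd-conf.tar.gz -C /etc && /etc/rc.d/init.d/xinetd restart"
done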
from www.linux-Sec.net/harden.gwif.html, ( middle of the page ) you can find xinetd tutorials/info: http://www.macsecurity.org/resources/xinetd/tutorial.shtml http://cwrulug.cwru.edu/archive/cwrulug/200011/0043.html http://www.synack.net/xinetd have fun alvin On Thu, 24 May 2001, Georgia Southern Beowulf Cluster Project wrote: > Hello Everyone, > > I'm assembling a new cluster and I want to know what people's experiences > are with the new RedHat 7.1. I've assembled a couple with RH6.2, but I'm > noticing a fair amount of differences with the 7.x line of RH. Especially, > can someone point me in a direction to find good documentation for xinet.d. > I can't seem to find much through google. > > Thank you, > > Wesley Wells > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From tibbs at math.uh.edu Thu May 24 14:27:32 2001 From: tibbs at math.uh.edu (Jason L Tibbitts III) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. In-Reply-To: 's message of "Thu, 24 May 2001 13:51:24 -0700 (PDT)" References: Message-ID: >>>>> "A" == writes: A> turn off all of them thingies.... you have to explicitly say "disable = A> yes" in each file in /etc/xinetd.d/* to turn that service off It's probably easier to use chkconfig: cd /etc/xinetd.d; for i in *; do chkconfig $i off; done chkconfig --list will show you all of your services (boot-time and xinetd-based). I'm very happy that Red Hat moved to xinetd because I can finally install and remove packages without worrying about things messing with /etc/inetd.conf. - J< From edwards at icantbelieveimdoingthis.com Thu May 24 21:08:18 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: ; from brian@patriot.net on Thu, May 24, 2001 at 02:07:04PM -0400 References: <20010524125452.B26997@blueraja.scyld.com> Message-ID: <20010524220818.A17790@icantbelieveimdoingthis.com> For what it's worth: I have been using a truly serial code on a very high performance cluster that supports MPI. For some political and some actual reasons, even though I am running separate serial jobs, I actually run them under MPI. I'm also starting to use a Scyld cluster and the analogies are heartening. MPI is part of Scyld so I would like to state some obvious things. 1. You can assign as many nodes as you want using MPI (Read you can assign one) 2. Installing MPI into existing serial code is very easy. There are about three or four calls. 3. Thanks to Sean Dilda I can now write files to /tmp on each node. I haven't tried to read from /tmp files but I'm guessing this will be straightforward. So, rather than work on some system modification that could be clumsy, why not modify the research code to take advantage of the features of Scyld? I think you could use a combination of bash shell scripts and MPI calls to accomplish what you say you want. Use the bash script to put the appropriate input files on the various nodes and use it to start the mpi job. Then let the computational codes open the files on the various nodes and use them in the specific calculations.
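As a rough sketch of that bash-plus-MPI recipe: the program name, file names and node numbers below are invented for illustration, and it leans on bpsh forwarding stdin/stdout, so check the details against your own Scyld install:

#!/bin/sh
# stage each node's private input into its local /tmp using bpsh
NODES="0 1 2 3"                          # placeholder slave node numbers
for n in $NODES; do
    bpsh $n dd of=/tmp/mycode.input < input.node$n.dat
done

# start one MPI rank per node; each rank simply opens /tmp/mycode.input locally
mpirun -np 4 ./mycode

# pull the per-node results back to the master afterwards
for n in $NODES; do
    bpsh $n cat /tmp/mycode.output > output.node$n.dat
done

Which node ends up with which MPI rank is up to mpirun, so in practice the staging may need to key off the rank rather than the raw node number.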
This is a slightly different paradigm and I know from my own experience that one would rather simply keep doing what they know. Once I got use to it there was no big deal. Sorry if I wasted your time. Art Edwards On Thu, May 24, 2001 at 02:07:04PM -0400, Brian C Merrell wrote: > On Thu, 24 May 2001, Sean Dilda wrote: > > > Is there any reason the program itself can't run itself in the special > > way they want? Anything you can do with rlogin or rsh can be done with > > bpsh, except for an interactive shell. However, this can be mimiced > > through bpsh. If you can give me some idea of what they are wanting to > > do, I might be able to help you find a way to do it without requiring an > > interactive shell. Scyld clusters are designed to run background jobs > > on all of the slave nodes, not to run login services for users on the > > slave nodes. > > > > Hmmm. I guess this warrants some background info. > > The cluster is not a new cluster. It was previously built by someone else > who is now gone. The cluster master node crashed, taking the system and > most of their data with it. I am now trying to rebuild the cluster. The > cluster previously used RH6.1 stock and followed more of a NOW model than > a beowulf model, although all the hardware was dedicated to the cluster, > not on people's desks. I'm now trying to use Scyld's distro to bring the > cluster back up. I'm pretty happy with it, and managed to get the master > node up with a SCSI software RAID array, and a few test nodes up with boot > floppies. Seems fine to me. BUT.... > > There are three reasons that they want to be able to rlogin to the > machines: 1) first, there are a number of people with independent > projects who use the cluster. They are used to being able to simply login > to the master, rlogin to a node, and start their projects on one or more > nodes, so that they take up only a chunk of the cluster. 2) Also, at > least one researcher was previously able to and wants to be able to > continue to login to separate nodes and run slightly different (and > sometimes non-parallelizable) programs on his data. 3) ALSO, they have > code that they would rather not change. > > > It is possible to use BProc with a full install on every slave node > > however this reduces a lot of the easy administration features we've > > trying to put into our distro. > > > > I just set this up, and realize what you mean. I had to statically define > IP addresses, users, etc. At first it wasn't a pain, but I realized after > the first two that doing all 24 would be. Even though it is now possible > to rlogin to different nodes, it wasn't what I was hoping for. I imagine > it will be particularly unpleasant when software upgrades need to be > performed. :( > > I'm still hoping to find some happy medium, but I'm going to present these > options to the group and see what they think. The problem is that they > are mathematicians and physicists, not computer people. They really don't > want to have to change, even though it seems to be the same. > > Also one thing I'm still trying to find a solution to: how can the nodes > address each other? 
Previously they used a hosts file that had listings > for L001-L024 (and they would like to keep it that way) I guess with the > floppy method they don't have to, because the BProc software maps node > numbers to IP addresses, > > -brian > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From edwards at icantbelieveimdoingthis.com Thu May 24 21:48:54 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <20010524110528.A26997@blueraja.scyld.com>; from agrajag@scyld.com on Thu, May 24, 2001 at 11:05:28AM -0400 References: <20010524110528.A26997@blueraja.scyld.com> Message-ID: <20010524224854.A18332@icantbelieveimdoingthis.com> > What are you using to run your jobs? If you're using MPI, the rank == 0 > job is always run on the master node (unless you give mpirun the > -nolocal option) I just tried to issue the following command jarrett/home/edwardsa>mpirun -np 4 -nolocal pi3 Failed to exec target program: No such file or directory When I execute jarrett/home/edwardsa>mpirun -np 4 pi3 -nolocal The code runs, but ps -x reveals that it is computing on the head node. I really don't want compute jobs executing on the head node and it seems that -nolocal has no effect. What are my options? Art Edwards P. S. Incidentally, -nolocal doesn't appear on the MPI man page From edwards at icantbelieveimdoingthis.com Thu May 24 22:33:43 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:21 2009 Subject: [Mailer-Daemon@icantbeli Message-ID: <20010524233343.A19069@icantbelieveimdoingthis.com> I have confirmed that I can place files in /tmp on each node, read from them and write to other files in /tmp, so, in principle, I can do anything I want with files. I would like to point out that in a typical multi processor run, node 0 is completely ignored. Only when I ask for n+1 nodes, where n is the number of slave nodes, does node0 activate. How do I assure that node 0 is, well, node 0? Art Edwards On Thu, May 24, 2001 at 02:37:28PM -0400, Sean Dilda wrote: > On Thu, 24 May 2001, Brian C Merrell wrote: > > > On Thu, 24 May 2001, Sean Dilda wrote: > > > > > Is there any reason the program itself can't run itself in the special > > > way they want? Anything you can do with rlogin or rsh can be done with > > > bpsh, except for an interactive shell. However, this can be mimiced > > > through bpsh. If you can give me some idea of what they are wanting to > > > do, I might be able to help you find a way to do it without requiring an > > > interactive shell. Scyld clusters are designed to run background jobs > > > on all of the slave nodes, not to run login services for users on the > > > slave nodes. > > > > > > > Hmmm. I guess this warrants some background info. > > > > The cluster is not a new cluster. It was previously built by someone else > > who is now gone. The cluster master node crashed, taking the system and > > most of their data with it. I am now trying to rebuild the cluster. The > > cluster previously used RH6.1 stock and followed more of a NOW model than > > a beowulf model, although all the hardware was dedicated to the cluster, > > not on people's desks. I'm now trying to use Scyld's distro to bring the > > cluster back up. 
I'm pretty happy with it, and managed to get the master > > node up with a SCSI software RAID array, and a few test nodes up with boot > > floppies. Seems fine to me. BUT.... > > > > There are three reasons that they want to be able to rlogin to the > > machines: 1) first, there are a number of people with independent > > projects who use the cluster. They are used to being able to simply login > > to the master, rlogin to a node, and start their projects on one or more > > nodes, so that they take up only a chunk of the cluster. 2) Also, at > > least one researcher was previously able to and wants to be able to > > continue to login to separate nodes and run slightly different (and > > sometimes non-parallelizable) programs on his data. 3) ALSO, they have > > code that they would rather not change. > > Ok, I understand now. All of these things can be handled with bpsh. > Do you think these people will be happy with doing something like 'rsh > ' instead of rsh'ing in to get a shell and then run the > command? If so, you could probablly get away with just symlinking > /usr/bin/rsh to /usr/bin/bpsh > > > > > It is possible to use BProc with a full install on every slave node > > > however this reduces a lot of the easy administration features we've > > > trying to put into our distro. > > > > > > > I just set this up, and realize what you mean. I had to statically define > > IP addresses, users, etc. At first it wasn't a pain, but I realized after > > the first two that doing all 24 would be. Even though it is now possible > > to rlogin to different nodes, it wasn't what I was hoping for. I imagine > > it will be particularly unpleasant when software upgrades need to be > > performed. :( > > This is one of the advantages of our software. It is setup in such a > way that you don't have to do so much work to keep the slave nodes up to > date. > > > > > I'm still hoping to find some happy medium, but I'm going to present these > > options to the group and see what they think. The problem is that they > > are mathematicians and physicists, not computer people. They really don't > > want to have to change, even though it seems to be the same. > > > > Also one thing I'm still trying to find a solution to: how can the nodes > > address each other? Previously they used a hosts file that had listings > > for L001-L024 (and they would like to keep it that way) I guess with the > > floppy method they don't have to, because the BProc software maps node > > numbers to IP addresses, > > Perhaps you could write some sort of rsh replacement script that turns > the L001-L024 names into the BProc node numbers, then call bpsh. Would > that be a happy medium? ----- End forwarded message ----- From ds10025 at hermes.cam.ac.uk Fri May 25 01:18:24 2001 From: ds10025 at hermes.cam.ac.uk (ds10025@hermes.cam.ac.uk) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets - cost of ownership... Message-ID: <2939826374.990782304@DATABUG> Are they any Universities or colleges both in the States & UK are prepare to run courses in building low cost custom cluster cabinets? Dan --On 24 May 2001, 13:44 -0700 alvin@Mail.Linux-Consulting.com wrote: > > hi Josip... > > yes.. cost of bulding and owernship includes at least > an almost endless list of stuff to keey the system up... 
> - costs of parts ( off the shelf ) > - costs of customized parts - paid to vendors > - costs of internal designers - to specify what they want/getting > - power it up and see how long to get it going smoothly/reliably > - saving all work to another system in case the system loses > its mind and starts erasing data > ( if you have a bad sdram ... it cn erase data on disks ) > - costs to admin the system instead of using the system for work > - costs of the building and space and rack > - costs of all the office admin and support staff... > .. blah .. blah .. > > - all the hardware costs becomes a miniml fraction of the entire > project costs > > -- off the shelf... if number of servers is important and doing > some predefined tasks... Transmeta cursoe seems to be a very good > alternative as it supports 24 servers in 3U space.. > > have fun > alvin > http://www.Linux-1U.net > > > On Thu, 24 May 2001, Josip Loncaric wrote: > >> Bari Ari wrote: >> > >> > We don't envision very dense clusters being priced above the cost of off >> > the shelf built clusters. The cost of building one enclosure is less >> > than eight enclosures. >> >> The cost of building an item is NOT the price charged for the item. The >> manufacturer's development cost must also be recovered. Dense packaging >> adds value not found in the mass market, so hardware vendors can price >> them higher even if the unit production cost is actually lower. Some of >> the difference goes to their profit, but a lion's share covers their >> development costs. >> >> I remain skeptical about your pricing projections. Until the mass >> market for PCs drastically changes, I doubt that we'll see high >> performance dense packages selling for the same price as equally capable >> commodity alternatives. >> >> Sincerely, >> Josip >> >> -- >> Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu >> ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ >> NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov >> Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From MAHRF at de.ibm.com Fri May 25 01:14:54 2001 From: MAHRF at de.ibm.com (MAHRF@de.ibm.com) Date: Wed Nov 25 01:01:21 2009 Subject: ifenslave error in Channel Bonding Message-ID: Hi! I'm no expert in bonding, but maybe it works if you try it in the opposite order. First bond, then ifconfig your device up. Best regards Ferdinand From alvin at Mail.Linux-Consulting.com Fri May 25 01:24:05 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: custom cluster cabinets - cost of ownership... In-Reply-To: <2939826374.990782304@DATABUG> Message-ID: hi ds10025 costs is relative to what you would like to hve...vs what it costs to get the hardware... - these hard costs are NOT too negotiable... though yu can make specific minimum specs to keep your within oyur budget or grow the budge support and operating costs are variable and can be adjusted up or down to keep teh servers up and running.. no courses for this stuff ?? ... just a simple list of cost of ownership analysis??? 
you can probably build a custom cabinet for abut $500 or so.. vs the general production cabinets at $1000 -$2000 you see... just deends on what you keep and what you throw out.. majority is cosmetics and governmental requirements that adds costs to your cabinets have fun alvin http://www.Linux-1U.net/Racks ... list of cabinet manufacturers On Fri, 25 May 2001 ds10025@hermes.cam.ac.uk wrote: > Are they any Universities or colleges both in the States & UK are prepare to > run courses in building low cost custom cluster cabinets? > > Dan > --On 24 May 2001, 13:44 -0700 alvin@Mail.Linux-Consulting.com wrote: > > > > > hi Josip... > > > > yes.. cost of bulding and owernship includes at least > > an almost endless list of stuff to keey the system up... > > - costs of parts ( off the shelf ) > > - costs of customized parts - paid to vendors > > - costs of internal designers - to specify what they want/getting > > - power it up and see how long to get it going smoothly/reliably > > - saving all work to another system in case the system loses > > its mind and starts erasing data > > ( if you have a bad sdram ... it cn erase data on disks ) > > - costs to admin the system instead of using the system for work > > - costs of the building and space and rack > > - costs of all the office admin and support staff... > > .. blah .. blah .. > > > > - all the hardware costs becomes a miniml fraction of the entire > > project costs > > > > -- off the shelf... if number of servers is important and doing > > some predefined tasks... Transmeta cursoe seems to be a very good > > alternative as it supports 24 servers in 3U space.. > > From rgb at phy.duke.edu Fri May 25 04:12:56 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. In-Reply-To: Message-ID: On Thu, 24 May 2001 alvin@Mail.Linux-Consulting.com wrote: > > hi GSBCP > > yes.... xinetd.d is a mess... Not exactly a mess -- it is just different. In one sense every change like this is egregious -- things were perfectly easy to manage with /etc/inetd.conf and /etc/rc.[0-6] (BSD/SunOS-style) if you knew what you were doing. On the other hand, things are perfectly easy to manage with /etc/rc.d/init.d and /etc/xinetd.d, and the layout is arguably more self-consistent. If you know what you are doing. The only problem is learning about the differences. The GOOD thing about 7.1 is it forces you to deal with services to some extent during a standard install, and its standard install configuration is a firewalled configuration with pretty much only sshd punched through. This is actually just about right for a node or a non-server desktop anyway. chkconfig --list now lists the boot-configuration status of xinetd.d based services AS WELL AS init.d based services. So it is easy to see when services are on or off. One thing to remember is that one has to send xinetd either SIGUSR1 or SIGUSR2 to force a soft or hard restart after reconfiguring to force the changes to take effect (read man xinetd to see the difference). Or you can follow the Red Hat Way and do /etc/init.d/xinetd restart. Don't get me wrong -- I still find quite a bit of RH's boot and services and configuration layout immensely annoying. In particular, they seem to have mixed configuration scripts, configuration data, and boot scripts up in strange ways. In many ways I vastly prefer the "old days" when all the configuration parameters could be set by hand in a few flatfiles in /etc (toplevel). 
I also think that things like /etc/sysconfig/network-scripts are an abomination and offense in the eyes of the Penguin -- these are not things intended to be managed by human hand, which takes RH farther and farther from the Unix Way (front end all you like, but leave the basic interface hand-manageable). Still, Unix has always been expert-friendly, and this just adds yet-another-bloody-interface to learn to manage, with a few advantages and a few disadvantages. Sigh. rgb > > turn off all of them thingies.... > you have to explicitly say "disable = yes" in each file > in /etc/xinetd.d/* to turn that service off > > - make a set of changes...tar it up and copy it to > the other servers... > > from www.linux-Sec.net/harden.gwif.html, ( middle of the page ) > you can find xinetd tutorials/info: > > http://www.macsecurity.org/resources/xinetd/tutorial.shtml > http://cwrulug.cwru.edu/archive/cwrulug/200011/0043.html > http://www.synack.net/xinetd > > have fun > alvin > > > On Thu, 24 May 2001, Georgia Southern Beowulf Cluster Project wrote: > > > Hello Everyone, > > > > I'm assembling a new cluster and I want to know what people's experiences > > are with the new RedHat 7.1. I've assembled a couple with RH6.2, but I'm > > noticing a fair amount of differences with the 7.x line of RH. Especially, > > can someone point me in a direction to find good documentation for xinet.d. > > I can't seem to find much through google. > > > > Thank you, > > > > Wesley Wells > > _________________________________________________________________ > > Get your FREE download of MSN Explorer at http://explorer.msn.com > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From alvin at Mail.Linux-Consulting.com Fri May 25 04:19:45 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. In-Reply-To: Message-ID: hi ya... i like the /etc/rc.d/init.d/foo stop and start method i didnt like that all the individual services are now explicity defined in each file in /etc/xinetd.d - went around to each file ... one by one to check it... ( to explicitly disable it ) where as, in the old setup, all the config items was all in one file /etc/inetd.conf. though it was not really confiurable/flexible but was easy to add stuff and add/delete services ) at least its easily fixable a dozen different ways ??? c ya alvin http://www.Linux-1U.net On Fri, 25 May 2001, Robert G. Brown wrote: > On Thu, 24 May 2001 alvin@Mail.Linux-Consulting.com wrote: > > > > > hi GSBCP > > > > yes.... xinetd.d is a mess... > > Not exactly a mess -- it is just different. In one sense every change > like this is egregious -- things were perfectly easy to manage with > /etc/inetd.conf and /etc/rc.[0-6] (BSD/SunOS-style) if you knew what you > were doing. On the other hand, things are perfectly easy to manage with > /etc/rc.d/init.d and /etc/xinetd.d, and the layout is arguably more > self-consistent. 
If you know what you are doing. The only problem is > learning about the differences. > > From rgb at phy.duke.edu Fri May 25 06:37:55 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. In-Reply-To: Message-ID: On Fri, 25 May 2001 alvin@Mail.Linux-Consulting.com wrote: > > hi ya... > > i like the /etc/rc.d/init.d/foo stop and start method > > i didnt like that all the individual services are > now explicity defined in each file in /etc/xinetd.d > - went around to each file ... one by one to > check it... ( to explicitly disable it ) Or use the chkconfig one-liner as previously mentioned. I think that there are some decent reasons for the xinetd changes: a) By putting all inetd configuration data for a service in a single file per service, it becomes very easy to package inetd-based services. Before, one had to write clever sed-ism's like (the uninstall part of procstatd.spec): %postun if grep "^procstatd" /etc/inetd.conf > /dev/null then # There is a procstatd line installed. Remove it (and the # rest of the procstatd modifications). Should be no need to # backup the files as I'm just going to sed them outta there. sed -e "/^# procstatd/d;/^procstatd/d" /etc/inetd.conf > /tmp/inetd.conf.tmp sed -e "/^procstatd/d" /etc/services > /tmp/services.tmp sed -e "/^procstatd/d" /etc/hosts.allow > /tmp/hosts.allow.tmp mv /tmp/inetd.conf.tmp /etc/inetd.conf mv /tmp/services.tmp /etc/services mv /tmp/hosts.allow.tmp /etc/hosts.allow killall -HUP inetd fi Note that this is a real pain and fraught with terror for error -- lots of ways to screw things up royally if the system dies in mid-script, even though the script fragment is written to be at least moderately robust agains this. Ditto on the install side -- one has to remove any old procstatd lines from /etc/inetd.conf and put in the new one (as the default configuration might have changed) while saving the old as /etc/inetd.conf.rpmsave, but even this latter isn't stable as one might be installing a string of RPM's and each one might overwrite the one saved by the original. If you HAD hand-tuned procstatd's inetd port or ownership or permissions, by the time you got around to restoring it after a reinstall or upgrade the original line might long be gone. In the next procstatd release (coming soon) this will all be much simpler -- just pop in /etc/xinetd/procstatd on install (saving the old one in /etc/xinetd/procstatd.rpmsave), remove it on uninstall. b) It allows inetd to be managed with chkconfig. This is actually not a bad thing at all -- a single tool provides a consistent user interface. chkconfig --list shows you at a glance what is on and what is off and where/when everything is started for BOTH script-started services and xinetd-managed services. And of course there can be higher order GUI tools and such for novices or ex-Windows or Mac users who don't do command lines. c) I personally never really cared for an /etc/inetd.conf that was basically all commented out anyway, with just one or two functional lines (if that) at the very bottom. Messy and inefficient and dangerous (easy to miss commmenting something out on one client in a department-sized network), but those lines get in there pretty much no matter what in a default install followed by years of updating, upgrading, adding this and removing that. This layout encourages the removal of irrelevant packages and their xinetd.d configuration, or better their non-inclusion in a kickstart-based install. 
It further encourages the separate packaging of each separate service rather than packaging in an "inetd" package that contains all of them at once (but turned off by default). You want/need certain services in your particular environment? Include them (and ONLY them) in your kickstart configuration and you don't have to worry about what is and isn't commented out -- the service just ain't on the system. (A special bonus is that one can do an rpm --erase linuxconf and have it go AWAY and/or keep it OUT of a kickstart configuration -- this is one that seems to come back like a bad penny and add a line to /etc/inetd.conf each time it comes back. I've had as many as two or three commented out linuxconf lines in old systems after upgrades.) It still isn't perfect. If I were to whine about xinetd it would primarily be about documentation -- there is a man page for xinetd and another for xinetd.conf which aren't bad, but they don't explain the chkconfig mechanism at all and xinetd is different enough that it really deserves a HOWTO to help buffer the change -- and the format of /etc/inetd.d/whatever. It should be done in XML, goddammit. Everything should be moving there at this point. The latter is a personal religious gripe. In the Beginning there was the Flatfile, formed in the Bounty of Diskful Space, and it was Good. It allowed information to be easily read and edited by the administrator with a Single Universal Tool, the Editor. The Good of the Flatfile, however, contained the Seeds of Evil. A Babel of Cacaphonous Voices arose creating configuration files with ":" as separators in this, "\t" as separators in that. New Fields were Added, and old Fields were Taken Away as systems Evolved to a Higher State of Being. Soon the Users and Administrators in all the Lands were Crying out under the Burden of this Evil. What Began as a Good Idea of Universal Access to Power had created a Cult of the Guru, for veritably only a Guru could keep straight the Fields and the Separators and the Formats of lo, the many Files. Then the Wall said, Let There Be Perl and there was Perl, and lo, regular expression parsing became easy at the script level. This removed some of the Oppression from the shoulders of the Administrators across the land, but if anything the Evil Power of the Cult grew stronger, as Deep Knowledge of the Ways was required to write the appropriate Perl Incantations to manage simple systems functions. This same Evil existed in the Other Realms of Computing. Complexity grew everywhere, but Control was left to the Whim of the application designer as each application had its own Interface. Parsing of the Interface required Dark Secret Lore known only to the most puissant and powerful Wizards, who thereby commanded great Sacrifice and Tribute. Finally, those high in the privy councils of the Web and Service design, who sought to bring the Power of the Universal Interface to Application Design spoke, and said Parsing Each Tom, Dick and Harry's Flatfile Data is Darkest Evil and Not Extensible or Portable. They meditated and prayed and finally said, Let There be XML and lo, there was XML. Among the gods, the Sun god, the Daemon god, and the Penguin looked on XML and Smiled, for it was Good. At last there was an Interface Standard (and tools for parsing it) and its Dream of the Universal Portable Interface was one step closer to reality. 
Although the god of all Windows plotted and schemed to seize the Power of XML and suborn it his Dark Schemes, the strength of the rest of the gods was United and Opposed -- XML remained Open and Free. Alas, the Wizards were Steeped in the Ways of Sin. They would not easily give up their bedamned foo bar (tab separated pairs), their foo,bar (comma separated pairs), their foo = bar (equal sign separated pairs), their foo bar (whitespace separated pairs), their foo:bar (colon separated pairs), or evillest of all, their foo bar fields=fubar for:real (a bunch of crap strung out on a single line) where adding a new field or using a space instead of a tab created great Torment and Misery to Administrators and Users everywhere (as the Dream of the Penguin was to make every User an Administrator while preserving the Power and Freedom of the Fountain of Open Source and so make the Evil of Windows dissipate like the Mist of a Dream). But it was not to be. Not Yet. The Penguin Sleeps, but in its Mind's Eye it beholds these Evildoers, these Necromantic Wizards who turn away from the Light of XML, and one day they shall be cast into the Outer Darkness and forced to work as Windows Programmers and MCSEs if they do not turn away from Sin and convert their configuration and control interfaces to XML. Yea, a special fiery furnace awaits those who perpetuate the Misery that is nameservice, the Hell that is Pam, even the Grinding Agony that is /etc/passwd. Oblivion is the Best that could befall all who Shun the Light that is XML. Seriously, life would be so infinitely better if all systems configuration files were converted to an XML-compliant structural layout and parsed in the associated services and front end configurators and managers with standard XML parsing tools. It would lengthen the files, sure, but it would also make checking syntactical correctness trivial, eliminate all possible confusion over which column is which field, get rid of the tab vs equal vs whitespace vs comma vs period crap, and make it possible (nay, EASY) to build a universal configurator tool that dynamically builds a GUI application configurator out of the XML-based configuration file itself (just parse the damn thing, identify the fields, create a matching layout of column-or-otherwise labelled entrybox widgets, pop up the window, done). Right now the "universal configurator tool" is the editor of your choice, but without XML you have to basically know exactly what the each field is for whatever you are editing (and what the correct formatting is and what matters in the syntax). This sucks. No wonder non-Unix gurus think that Unixoid systems are "difficult" to figure out... rgb > > where as, in the old setup, all the config items > was all in one file /etc/inetd.conf. though it was not really > confiurable/flexible but was easy to add stuff and add/delete services ) > > at least its easily fixable a dozen different ways ??? > > c ya > alvin > http://www.Linux-1U.net > > > > On Fri, 25 May 2001, Robert G. Brown wrote: > > > On Thu, 24 May 2001 alvin@Mail.Linux-Consulting.com wrote: > > > > > > > > hi GSBCP > > > > > > yes.... xinetd.d is a mess... > > > > Not exactly a mess -- it is just different. In one sense every change > > like this is egregious -- things were perfectly easy to manage with > > /etc/inetd.conf and /etc/rc.[0-6] (BSD/SunOS-style) if you knew what you > > were doing. 
On the other hand, things are perfectly easy to manage with > > /etc/rc.d/init.d and /etc/xinetd.d, and the layout is arguably more > > self-consistent. If you know what you are doing. The only problem is > > learning about the differences. > > > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rsand at d.umn.edu Fri May 25 07:28:29 2001 From: rsand at d.umn.edu (Robert Sand) Date: Wed Nov 25 01:01:21 2009 Subject: PVM with a SCYLD cluster. Message-ID: <3B0E6C0D.19A7C5DC@d.umn.edu> Hello all, I have a customer that is more familiar with using pvm rather than mpi so I need some instructions on how to get pvm working with the SCYLD cluster. Is there anyone out there using pvm on a scyld cluster and if so can I get instructions to get pvm to work with the cluster? TIA. -- Robert Sand. mailto:rsand@d.umn.edu US Mail University of Minnesota Duluth 10 University Dr. Information Technology Systems and Services MWAH 176 144 MWAH Duluth, MN 55812 Phone 218-726-6122 fax 218-726-7674 "Walk behind me I may not lead, Walk in front of me I may not follow, Walk beside me and we walk together" UTE Tribal proverb. From Eugene.Leitl at lrz.uni-muenchen.de Fri May 25 09:33:16 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene.Leitl@lrz.uni-muenchen.de) Date: Wed Nov 25 01:01:21 2009 Subject: [Fwd: Announcing M-VIA 1.2b1] Message-ID: <3B0E894C.5DAD20E1@lrz.uni-muenchen.de> -------- Original Message -------- From: Michael Welcome Subject: Announcing M-VIA 1.2b1 To: via-announce@nersc.gov Announcing the release of M-VIA version 1.2b1. This is a beta release which includes support for the (optional) VIPL peer-to-peer connection protocol, as well as support for the Linux 2.2 and Linux 2.4 kernels. The distribution has been split into multiple tarballs: mvia-1.2b1.tar.gz M-VIA core distribution mvia-devs-2.2-1.2b1.0.tar.gz Linux 2.2 driver set mvia-devs-2.4-1.2b1.0.tar.gz Linux 2.4 driver set The M-VIA core tarball contains the source code for VIPL, the Kernel Agent as well as the ERing and Loopback drivers. The driver sets contain the Fast Ethernet and Gigabit Ethernet drivers specific to Linux 2.2 and Linux 2.4. You will need to download both the core distribution and one (or both) of the driver sets. See the "INSTALL" in the top level directory of the M-VIA distribution for installation instructions. This is a beta release of M-VIA 1.2, which will also contain a reliable delivery option not included in this release. For download instructions, see http://www.nersc.gov/research/FTG/via. From carlos at baldric.uwo.ca Fri May 25 07:39:20 2001 From: carlos at baldric.uwo.ca (Carlos O'Donell Jr.) Date: Wed Nov 25 01:01:21 2009 Subject: [OT] rgb - XML Admin & Heterogenous Clusters [Was: Re: RH7.1 question.] Message-ID: <20010525103920.A1793@megatonmonkey.net> rgb, Excellent rant. I'm part of a student group at the University of Western Ontario, and we are in the process of building a 48 Node PA-RISC cluster (one of the latest and greatest linux ports, enabling us to use aging hardware for a higher moral purpose!). Lately, any tool that I develop has some type of well documented XML interface. And if performance/size is ever an issue, we rely on changing the IO stubs and writing binary XML (another open standard). In a beowulf-ian utopia, heterogeneous systems will have to interoperate on a level that we can't yet imagine. 
My utility will require a few bobs of information from your utility. What format do we impose? XML. Can my utility validate the output of your utility? Sure. Use an XSD for that document. How many lines of code does it take to validate your own configuration files? About 3. Instantiate the schema, load the document, check it. Most of us have homogeneous clusters to administer. Imagine if you had a large 200-node heterogeneous cluster. It might be possible that not all your nodes run Linux :) Administration nightmare? I think most people would need effexor or naxin at least to battle the migraines and depression from such a cluster ;) I await the day, like a patient disciple, when everything will speak, read and understand an open communication format ... XML. When browsers will enforce it, and refuse to view malformed pages. When all is quiet on the western front. (and possibly when mozilla runs as fast as lynx, but with all of its creature comforts!) Peace, Carlos. ------------------------- Baldric Project University of W. Ontario http://www.baldric.uwo.ca ------------------------- From edwards at icantbelieveimdoingthis.com Fri May 25 07:52:47 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <3B0E6DF4.8B70C329@cab.cnea.gov.ar>; from darie@cab.cnea.gov.ar on Fri, May 25, 2001 at 11:36:36AM -0300 References: <20010524110528.A26997@blueraja.scyld.com> <20010524224854.A18332@icantbelieveimdoingthis.com> <3B0E6DF4.8B70C329@cab.cnea.gov.ar> Message-ID: <20010525085247.A26835@icantbelieveimdoingthis.com> On Fri, May 25, 2001 at 11:36:36AM -0300, Enzo Dari wrote: > Art Edwards wrote: > > ... > > I just tried to issue the following command > > > > jarrett/home/edwardsa>mpirun -np 4 -nolocal pi3 > > Failed to exec target program: No such file or directory > > ... > It seems that mpirun can't find "pi3" in the nodes. > Did you try with absolute pathnames? (i.e. /home/edwardsa/pi3) I did just try that with the same result. Interestingly, when I execute with -local it works. I'm guessing that Scyld MPI has not enabled the -nolocal option. > > > When I execute > > ... > In the command above -nolocal has no effect on mpirun, it is > passed as an argument to program pi3 Thanks for pointing that out. I should have known that. > > > ... > > P. S. Incidentally, -nolocal doesn't appear on the MPI man page > > ... > It should appear in the mpirun man page. BTW the MPI standard > says nothing about how a program should be started. It happens > that most implementations of MPI have the command mpirun to > start n copies of the same program. > > -- > Saludos, > O__ > Enzo. ,>/ > ____________________________________________________()_\()____ > Enzo A. Dari | Instituto Balseiro / Centro Atomico Bariloche > 8400-S. C.
de Bariloche, Argentina | darie@cab.cnea.gov.ar > Phone: 54-2944-445208, 54-2944-445100 Fax: 54-2944-445299 > Web page: http://cabmec1.cnea.gov.ar/darie/darie.htm From sgaudet at angstrommicro.com Fri May 25 08:10:32 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium Message-ID: <990803432.3b0e75e87c860@localhost> FYI:http://www.zdnet.com/zdnn/stories/news/0,4586,2765266,00.html Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From ptelegin at jscc.ru Fri May 25 08:18:27 2001 From: ptelegin at jscc.ru (Pavel Telegin) Date: Wed Nov 25 01:01:21 2009 Subject: ifenslave error in Channel Bonding References: <200105240102.VAA12949@me1.eng.wayne.edu> Message-ID: <00d101c0e52d$f43b7440$f728d0c3@ptelegin> Hi Hao He, What exact model of 3c905 cards do you have? We had a problem when we received by mistake 3c905CX. With RH 6.2 distribution there were the following symptoms 1. 3c59x driver did not work with the card, only 3c90x did. 2. When I attempted to bond I got a similar problem. I did not investigate the problem further, because after replacing cards to 3c905B the problem has gone. ----- Original Message ----- From: Hao He To: Sent: Thursday, May 24, 2001 5:55 AM Subject: ifenslave error in Channel Bonding > Hi, all. > > I am trying to bond our cluster with 3C905 cards. > Since my Linux distribution is SuSE 6.1 (2.2.5 kernel upgraded to 2.4.4), I have to run ifconfig and ifenslave at command line. > Finally I got success in one try, I think, but failed in all others. I am confused. > Here are the details. When I ran > ifconfig bond0 192.168.1.1 up > No error prompted. When I check ifconfig, I find that bond0 got IP 192.168.1.1 and > HWADDR is 00:00:00:00:00:00. Seems it is OK. Then I ran > ifenslave bond0 eth0 > I got following error message: > SIOCSIFHWADDR on bond0 failed: Device or resource busy. > The master device bond0 is busy: it must be idle before running this command. > What's wrong? > > Could you tell me how to correct this problem? > Your advice will be highly appreciated. Thanks a lot! 8-) > > Best regards, > Hao He > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From bari at onelabs.com Fri May 25 08:28:24 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium References: <990803432.3b0e75e87c860@localhost> Message-ID: <3B0E7A18.1080803@onelabs.com> Steve Gaudet wrote: > FYI:http://www.zdnet.com/zdnn/stories/news/0,4586,2765266,00.html > > "Initial price lists indicated that the chip would range in cost from $4,227 for an 800MHz Itanium with 4MB of performance-enhancing tertiary cache memory to over $3,500 for a 733MHz Itanium with 2MB of tertiary cache." 
Unless the floating point performance is 22X a PIII or Athlon or until the IA-64 prices plummet I don't think we'll be seeing many of these in the Beowulf world :-) Bari From keithu at parl.clemson.edu Fri May 25 08:49:30 2001 From: keithu at parl.clemson.edu (Keith Underwood) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <20010524224854.A18332@icantbelieveimdoingthis.com> Message-ID: Since no one else seems to have answered yet... You can use a p4pg file located on node 0 with a command line (under bash) that looks like this: NO_INLINE_MPIRUN=true bpsh 0 app -p4pg /tmp/p4pgfile Keith On Thu, 24 May 2001, Art Edwards wrote: > > What are you using to run your jobs? If you're using MPI, the rank == 0 > > job is always run on the master node (unless you give mpirun the > > -nolocal option) > > I just tried to issue the following command > > jarrett/home/edwardsa>mpirun -np 4 -nolocal pi3 > Failed to exec target program: No such file or directory > > When I execute > > jarrett/home/edwardsa>mpirun -np 4 pi3 -nolocal > > The code runs, but ps -x reveals that it is computing on the head node. I > really don't want compute jobs executing on the head node and it seems that > -nolocal has no effect. What are my options? > > Art Edwards > > P. S. Incidentally, -nolocal doesn't appear on the MPI man page > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu@parl.clemson.edu Clemson University From agrajag at scyld.com Fri May 25 08:50:57 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <20010524224854.A18332@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Thu, May 24, 2001 at 10:48:54PM -0600 References: <20010524110528.A26997@blueraja.scyld.com> <20010524224854.A18332@icantbelieveimdoingthis.com> Message-ID: <20010525115057.A14374@blueraja.scyld.com> On Thu, 24 May 2001, Art Edwards wrote: > When I execute > > jarrett/home/edwardsa>mpirun -np 4 pi3 -nolocal > > The code runs, but ps -x reveals that it is computing on the head node. I > really don't want compute jobs executing on the head node and it seems that > -nolocal has no effect. What are my options? BProc masquerades your process pids so that you can still see the pids for all the processes on the slave nodes on the master node. If you want to see which node the processes are running on, you can either do 'bpstat -p' which will just tell you all the masqueraded pids and where they're running, or you can do 'ps -x|bpstat -P' This will modify the output of ps so that there's an extra column at the beginning of the line that shows the node the process is running on. (if the column is empty, it means the process is running on the master node) > > Art Edwards > > P. S. Incidentally, -nolocal doesn't appear on the MPI man page It should be in 'man mpirun' -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010525/ddc06d6c/attachment.bin From brian at patriot.net Fri May 25 08:45:02 2001 From: brian at patriot.net (Brian C Merrell) Date: Wed Nov 25 01:01:21 2009 Subject: Scyld, local access to nodes, and master node as compute node In-Reply-To: <20010524220818.A17790@icantbelieveimdoingthis.com> Message-ID: To be honest, I agree that this would be the way to go. But they have fairly specific requirements. Also, they want to use the PGI CDK because of its fortran 90 compiler. This presents a problem since the PGI stuff (v 3.1) won't install on the Scyld nodes. Also, I don't seem to be able to install any PVM stuff and have it work with the Scyld nodes, although I've only been tinkering with that so far. I've started installing a mix of full nodes and partial (Scyld) nodes to see how I can resolve my issues, since I would prefer to use the scyld stuff too (the BProc stuff is pretty slick). I expect two more days of hard work and maybe a few hours on top of that of little things. (BTW, I'm doing this on contract :) Thanks for everyone's help. Hope to have either a successful report soon, but if not then just more questions. :) -brian -- Brian C. Merrell P a t r i o t N e t Systems Staff brian@patriot.net http://www.patriot.net (703) 277-7737 PatriotNet ICBM address: 38.845 N, 77.3 W On Thu, 24 May 2001, Art Edwards wrote: > For what it's worth: > > I have been using a truly serial code on a very high performance cluster that > supports MPI. For some political and some actual reasons, even though I am > running seperate serial jobs, I actually run them under MPI. I'm also starting > to use a Scyld cluster and the analogies are heartening. MPI is part of > Scyld so I would like to state some obvious things. > > 1. You can assign as many nodes as you want using MPI (Read you can assign one) > > 2. Installing MPI into existing serial code is very easy. There are about three > or four calls. > > 3. Thanks to Sean Dilda I can now write files to /tmp on each node. I haven't > tried to read from /tmp files but I'm guessing this will be straight > forward. > > So, rather than work on some system modification that could be clumbsy, why > not modfiy the research code to take advantage of the features of Scyld? > > I think you could use a combination of bash shell scripts and MPI calls to > accomplish what you say you want. Use the bash script to put the appropriate > input files on the various nodes and use it to start the mpi job. Then let > the computational codes open the files on the various nodes and use them in > the specific calculations. > > This is a slightly different paradigm and I know from my own experience that > one would rather simply keep doing what they know. Once I got use to it there > was no big deal. > > Sorry if I wasted your time. > > Art Edwards > On Thu, May 24, 2001 at 02:07:04PM -0400, Brian C Merrell wrote: > > On Thu, 24 May 2001, Sean Dilda wrote: > > > > > Is there any reason the program itself can't run itself in the special > > > way they want? Anything you can do with rlogin or rsh can be done with > > > bpsh, except for an interactive shell. However, this can be mimiced > > > through bpsh. If you can give me some idea of what they are wanting to > > > do, I might be able to help you find a way to do it without requiring an > > > interactive shell. 
Scyld clusters are designed to run background jobs > > > on all of the slave nodes, not to run login services for users on the > > > slave nodes. > > > > > > > Hmmm. I guess this warrants some background info. > > > > The cluster is not a new cluster. It was previously built by someone else > > who is now gone. The cluster master node crashed, taking the system and > > most of their data with it. I am now trying to rebuild the cluster. The > > cluster previously used RH6.1 stock and followed more of a NOW model than > > a beowulf model, although all the hardware was dedicated to the cluster, > > not on people's desks. I'm now trying to use Scyld's distro to bring the > > cluster back up. I'm pretty happy with it, and managed to get the master > > node up with a SCSI software RAID array, and a few test nodes up with boot > > floppies. Seems fine to me. BUT.... > > > > There are three reasons that they want to be able to rlogin to the > > machines: 1) first, there are a number of people with independent > > projects who use the cluster. They are used to being able to simply login > > to the master, rlogin to a node, and start their projects on one or more > > nodes, so that they take up only a chunk of the cluster. 2) Also, at > > least one researcher was previously able to and wants to be able to > > continue to login to separate nodes and run slightly different (and > > sometimes non-parallelizable) programs on his data. 3) ALSO, they have > > code that they would rather not change. > > > > > It is possible to use BProc with a full install on every slave node > > > however this reduces a lot of the easy administration features we've > > > trying to put into our distro. > > > > > > > I just set this up, and realize what you mean. I had to statically define > > IP addresses, users, etc. At first it wasn't a pain, but I realized after > > the first two that doing all 24 would be. Even though it is now possible > > to rlogin to different nodes, it wasn't what I was hoping for. I imagine > > it will be particularly unpleasant when software upgrades need to be > > performed. :( > > > > I'm still hoping to find some happy medium, but I'm going to present these > > options to the group and see what they think. The problem is that they > > are mathematicians and physicists, not computer people. They really don't > > want to have to change, even though it seems to be the same. > > > > Also one thing I'm still trying to find a solution to: how can the nodes > > address each other? Previously they used a hosts file that had listings > > for L001-L024 (and they would like to keep it that way) I guess with the > > floppy method they don't have to, because the BProc software maps node > > numbers to IP addresses, > > > > -brian > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From j.c.burton at gats-inc.com Fri May 25 08:46:04 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium In-Reply-To: <3B0E7A18.1080803@onelabs.com> References: <990803432.3b0e75e87c860@localhost> <3B0E7A18.1080803@onelabs.com> Message-ID: <20010525.15460400@piper.gats-inc.com> Later on in the article it mentions that "actual" (not initial) prices would be more along the lines of $1000 for the 733Mhz... 
>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<< On 5/25/01, 11:28:24 AM, Bari Ari wrote regarding Re: Intel is finally shipping the 64-bit Itanium: > Steve Gaudet wrote: > > FYI:http://www.zdnet.com/zdnn/stories/news/0,4586,2765266,00.html > > > > > "Initial price lists indicated that the chip would range in cost from > $4,227 for an 800MHz Itanium with 4MB of performance-enhancing tertiary > cache memory to over $3,500 for a 733MHz Itanium with 2MB of tertiary > cache." > Unless the floating point performance is 22X a PIII or Athlon or until > the IA-64 prices plummet I don't think we'll be seeing many of these in > the Beowulf world :-) > Bari > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bari at onelabs.com Fri May 25 09:07:31 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium References: <990803432.3b0e75e87c860@localhost> <3B0E7A18.1080803@onelabs.com> <20010525.15460400@piper.gats-inc.com> Message-ID: <3B0E8343.4010708@onelabs.com> John Burton wrote: > > Later on in the article it mentions that "actual" (not initial) prices > would be more along the lines of $1000 for the 733Mhz... > The Itaniums will make for nice SMP clusters though with its fast front side bus. Multiple IA-64s sharing a FSB along with many GB of shared memory and Infinband for very fast interconnects will be nice. Clusters will probably have a completely different face a year from now. It will probably be another year until the AMD Hammer series comes out that there is a real price drop in the processors. Bari From tony at MPI-Softtech.Com Fri May 25 11:31:10 2001 From: tony at MPI-Softtech.Com (Tony Skjellum) Date: Wed Nov 25 01:01:21 2009 Subject: Running Giganet cLAN on Linux 2.4 Kernel Message-ID: Folks, has anyone successfully hacked the open source cLAN drivers for the 2.4 kernel? No Giganet updates exist as far as we can tell, and we've asked them about it before. It may be that this is just around the corner, but we're anxious to upgrade both at MSTI and at MSU. Thanks, Tony Anthony Skjellum, PhD, President (tony@mpi-softtech.com) MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759 +1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com "Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters." From alvin at Mail.Linux-Consulting.com Fri May 25 13:55:10 2001 From: alvin at Mail.Linux-Consulting.com (alvin@Mail.Linux-Consulting.com) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 question. -- good In-Reply-To: Message-ID: hi Robert good explanations and reasoning.. good thing unix/linux supports many different ways to do the same thing... have fun alvin > > now explicity defined in each file in /etc/xinetd.d > > - went around to each file ... one by one to > > check it... ( to explicitly disable it ) > > Or use the chkconfig one-liner as previously mentioned. > > I think that there are some decent reasons for the xinetd changes: > ... snipped good explanations ... 
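For anyone not already familiar with the Red Hat 7.x xinetd layout being discussed above, the two approaches look roughly like the sketch below; the service name is only an example, and the exact fields in a stock /etc/xinetd.d file vary a little from package to package:

    # /etc/xinetd.d/telnet -- one file per service; "disable = yes" turns it off
    service telnet
    {
            disable         = yes
            socket_type     = stream
            wait            = no
            user            = root
            server          = /usr/sbin/in.telnetd
    }

    # or flip the same flag with the chkconfig one-liner:
    chkconfig telnet off
    # and, if chkconfig has not already done it for you, have xinetd re-read its config:
    /etc/rc.d/init.d/xinetd reload

Either way the end state is the same: the per-service file under /etc/xinetd.d carries "disable = yes".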
From tony at MPI-Softtech.Com Sat May 26 06:11:46 2001 From: tony at MPI-Softtech.Com (Tony Skjellum) Date: Wed Nov 25 01:01:21 2009 Subject: Running Giganet cLAN on Linux 2.4 Kernel In-Reply-To: Message-ID: Hi, For those with specific interest in Giganet cLAN, a few of us have decided to have an unofficial mailing group to exchange ideas and information about drivers, etc.: You can subscribe to that group by e-mailing to: giganet-network-users-subscribe@yahoogroups.com We're hoping to specifically discuss issues with getting support for new Linux drivers as a first topic of discussion... so that thread will continue there. Regards, Tony Anthony Skjellum, PhD, President (tony@mpi-softtech.com) MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759 +1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com "Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters." On Fri, 25 May 2001, Tony Skjellum wrote: > Folks, has anyone successfully hacked the open source cLAN drivers for the > 2.4 kernel? No Giganet updates exist as far as we can tell, and we've > asked them about it before. It may be that this is just around the > corner, but we're anxious to upgrade both at MSTI and at MSU. > > Thanks, > Tony > > Anthony Skjellum, PhD, President (tony@mpi-softtech.com) > MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759 > +1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com > "Best-of-breed Software for Beowulf and Easy-to-Own Commercial Clusters." > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From perchrh at stud.math.ntnu.no Sat May 26 08:29:09 2001 From: perchrh at stud.math.ntnu.no (Per Christian Henden) Date: Wed Nov 25 01:01:21 2009 Subject: problems with channel bonding Message-ID: Hi, I'm having some problems with channel bonding. Setup: Two otherwise-working PCs with identical kernels (Linux 2.4.4) Both have three NICs. eth0 is used to access the Internet on both computers, while eth1 and eth2 (Realtek 8139-C, all four of them) are connected to the other node with two crossover TP cables. I'm trying to bond the connection between the computers. I'm using the driver 8139too v.0.9.13 for my Realtek NICs because the one included in 2.4.4 (..17) doesn't work properly (for me at least). Kernel support for bonding is enabled, and I'm using ifenslave from http://pdsf.nersc.gov/linux/ifenslave.c Is this the right version to use? On each computer I execute: ifconfig bond0 172.16.0.x netmask 255.255.255.0 up ifenslave bond0 eth1 ifenslave bond0 eth2 without any errors. (x is either "1" or "2") In /etc/modules.conf I have "alias bond0 bonding" on both computers. The nodes can ping each other, but almost all packets are lost. They are not counted as lost by the kernel. ifconfig shows RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:nnnn errors:0 dropped:0 overruns:0 carrier:0 for devices eth1,2 and bond0 The reason I say that packets are lost is that only a small number of pings are followed by a ping reply. route -n on each node returns 172.16.0.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0 172.16.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1 172.16.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth2 aaa.bbb.ccc.0 0.0.0.0 255.255.254.0 U 0 0 0 eth0 0.0.0.0 aaa.bbb.ccc.1 0.0.0.0 UG 0 0 0 eth0 where aaa.bbb.ccc.1 is my gateway to the Internet. 
/var/log/debug contains these errors on both nodes: eth1: Abnormal interrupt, status 00002020. eth2: Abnormal interrupt, status 00002020. The connection between the machines when not using bonding works fine. Ideas, anyone? Cheers, Per Christian Henden, pchenden@nlc.no From dan at rapidascent.com Sat May 26 09:41:17 2001 From: dan at rapidascent.com (Dan Fitzpatrick) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 - 3Com PCI 3c905C Tornado - Interrupt posted but not delivered -- IRQ blocked by another Message-ID: <3B0FDCAD.E09D08A2@rapidascent.com> I have 2 identical 3Com network cards in a Compaq 6400R server. I just installed RedHat 7.1 (a clean install) and I'm getting an IRQ conflict on one of the network cards: Interrupt posted but not delivered -- IRQ blocked by another device? Eth0 works fine and I can connect to the server through it. Eth1 is not accessible. I'm not a Linux master. I can't find anything obviously wrong with the configuration. I've included the following debug info. (Sorry it is so long): dmesg /proc/pci ifconfig Any help would be appreciated. Dan _________________________ dmesg CPU: Common caps: 0383fbff 00000000 00000000 00000000 CPU0: Intel Pentium III (Katmai) stepping 02 per-CPU timeslice cutoff: 2929.88 usecs. Getting VERSION: 40011 Getting VERSION: 40011 Getting ID: 3000000 Getting ID: c000000 Getting LVT0: 700 Getting LVT1: 400 enabled ExtINT on CPU#0 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 CPU present map: c Booting processor 1/2 eip 3000 Setting warm reset code and vector. 1. 2. 3. Asserting INIT. Waiting for send to finish... +Deasserting INIT. Waiting for send to finish... +#startup loops: 2. Sending STARTUP #1. After apic_write. Initializing CPU#1 CPU#1 (phys ID: 2) waiting for CALLOUT Startup point 1. Waiting for send to finish... +Sending STARTUP #2. After apic_write. Startup point 1. Waiting for send to finish... +After Startup. Before Callout 1. After Callout 1. CALLIN, before setup_local_APIC(). masked ExtINT on CPU#1 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 Calibrating delay loop... 999.42 BogoMIPS Stack at about c144dfb8 CPU: Before vendor init, caps: 0383fbff 00000000 00000000, vendor = 0 CPU: L1 I cache: 16K, L1 D cache: 16K CPU: L2 cache: 1024K Intel machine check reporting enabled on CPU#1. CPU: After vendor init, caps: 0383fbff 00000000 00000000 00000000 CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 CPU: Common caps: 0383fbff 00000000 00000000 00000000 OK. CPU1: Intel Pentium III (Katmai) stepping 02 CPU has booted. Before bogomips. Total of 2 processors activated (1995.57 BogoMIPS). Before bogocount - setting activated=1. Boot done. ENABLING IO-APIC IRQs ...changing IO-APIC physical APIC ID to 8 ... ok. Synchronizing Arb IDs. init IO_APIC IRQs IO-APIC (apicid-pin) 8-0, 8-3, 8-5, 8-9, 8-10, 8-11, 8-15, 8-24, 8-32, 8-33, 8-34 not connected. ..TIMER: vector=49 pin1=2 pin2=-1 number of MP IRQ sources: 37. number of IO-APIC #8 registers: 35. testing the IO APIC....................... IO APIC #8...... .... register #00: 08000000 ....... : physical APIC id: 08 .... register #01: 00220011 ....... : max redirection entries: 0022 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... 
IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 000 00 1 0 0 0 0 0 0 00 01 003 03 0 0 0 0 0 1 1 39 02 003 03 0 0 0 0 0 1 1 31 03 000 00 1 0 0 0 0 0 0 00 04 003 03 0 0 0 0 0 1 1 41 05 000 00 1 0 0 0 0 0 0 00 06 003 03 0 0 0 0 0 1 1 49 07 003 03 0 0 0 0 0 1 1 51 08 003 03 0 0 0 0 0 1 1 59 09 000 00 1 0 0 0 0 0 0 00 0a 000 00 1 0 0 0 0 0 0 00 0b 000 00 1 0 0 0 0 0 0 00 0c 003 03 0 0 0 0 0 1 1 61 0d 003 03 0 0 0 1 0 1 1 69 0e 003 03 0 0 0 0 0 1 1 71 0f 000 00 1 0 0 0 0 0 0 00 10 003 03 1 1 0 1 0 1 1 79 11 003 03 1 1 0 1 0 1 1 81 12 003 03 1 1 0 1 0 1 1 89 13 003 03 1 1 0 1 0 1 1 91 14 003 03 1 1 0 1 0 1 1 99 15 003 03 1 1 0 1 0 1 1 A1 16 003 03 1 1 0 1 0 1 1 A9 17 003 03 1 1 0 1 0 1 1 B1 18 000 00 1 0 0 0 0 0 0 00 19 003 03 1 1 0 1 0 1 1 B9 1a 003 03 1 1 0 1 0 1 1 C1 1b 003 03 1 1 0 1 0 1 1 C9 1c 003 03 1 1 0 1 0 1 1 D1 1d 003 03 1 1 0 1 0 1 1 D9 1e 003 03 1 1 0 1 0 1 1 E1 1f 003 03 1 1 0 1 0 1 1 E9 20 000 00 1 0 0 0 0 0 0 00 21 000 00 1 0 0 0 0 0 0 00 22 000 00 1 0 0 0 0 0 0 00 IRQ to pin mappings: IRQ0 -> 0:2 IRQ1 -> 0:1 IRQ4 -> 0:4 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ12 -> 0:12 IRQ13 -> 0:13 IRQ14 -> 0:14 IRQ16 -> 0:16 IRQ17 -> 0:17 IRQ18 -> 0:18 IRQ19 -> 0:19 IRQ20 -> 0:20 IRQ21 -> 0:21 IRQ22 -> 0:22 IRQ23 -> 0:23 IRQ25 -> 0:25 IRQ26 -> 0:26 IRQ27 -> 0:27 IRQ28 -> 0:28 IRQ29 -> 0:29 IRQ30 -> 0:30 IRQ31 -> 0:31 .................................... done. calibrating APIC timer ... ..... CPU clock speed is 499.8320 MHz. ..... host bus clock speed is 99.9660 MHz. cpu: 0, clocks: 999660, slice: 333220 CPU0 cpu: 1, clocks: 999660, slice: 333220 CPU1 checking TSC synchronization across CPUs: passed. Setting commenced=1, go go go PCI: PCI BIOS revision 2.10 entry at 0xf0084, last bus=9 PCI: Using configuration type 1 PCI: Probing PCI hardware PCI: Searching for i450NX host bridges on 00:10.0 Unknown bridge resource 0: assuming transparent Unknown bridge resource 1: assuming transparent PCI->APIC IRQ transform: (B0,I1,P0) -> 31 PCI->APIC IRQ transform: (B0,I2,P0) -> 29 PCI->APIC IRQ transform: (B0,I11,P0) -> 27 PCI->APIC IRQ transform: (B0,I13,P0) -> 26 PCI->APIC IRQ transform: (B0,I13,P1) -> 25 PCI: Device 00:78 not found by BIOS PCI: Device 00:80 not found by BIOS PCI: Device 00:90 not found by BIOS PCI: Device 00:a0 not found by BIOS isapnp: Scanning for PnP cards... isapnp: No Plug & Play device found Linux NET4.0 for Linux 2.4 Based upon Swansea University Computer Society NET3.039 Initializing RT netlink socket apm: BIOS not found. 
Starting kswapd v1.8 pty: 256 Unix98 ptys configured block: queued sectors max/low 168888kB/56296kB, 512 slots per queue RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize Uniform Multi-Platform E-IDE driver Revision: 6.31 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx PIIX4: IDE controller on PCI bus 00 dev 79 PIIX4: chipset revision 1 PIIX4: not 100% native mode: will probe irqs later ide0: BM-DMA at 0x3000-0x3007, BIOS settings: hda:pio, hdb:pio PIIX4: IDE controller on PCI bus 00 dev 80 PIIX4: device not capable of full native PCI mode PIIX4: device disabled (BIOS) hda: CD-224E, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Floppy drive(s): fd0 is 1.44M FDC 0 is a National Semiconductor PC87306 RAMDISK: Compressed image found at block 0 Freeing initrd memory: 341k freed Serial driver version 5.02 (2000-08-09) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled ttyS00 at 0x03f8 (irq = 4) is a 16550A Real Time Clock Driver v1.10d md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 md.c: sizeof(mdp_super_t) = 4096 autodetecting RAID arrays autorun ... ... autorun DONE. NET4: Linux TCP/IP 1.0 for NET4.0 IP Protocols: ICMP, UDP, TCP, IGMP IP: routing cache hash table of 2048 buckets, 16Kbytes TCP: Hash tables configured (established 16384 bind 16384) Linux IP multicast router 0.06 plus PIM-SM NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. VFS: Mounted root (ext2 filesystem). SCSI subsystem driver Revision: 1.00 sym53c8xx: at PCI bus 0, device 13, function 0 sym53c8xx: 53c876 detected sym53c8xx: at PCI bus 0, device 13, function 1 sym53c8xx: 53c876 detected sym53c876-0: rev 0x14 on pci bus 0 device 13 function 0 irq 26 sym53c876-0: ID 7, Fast-20, Parity Checking sym53c876-0: on-chip RAM at 0xc6eb0000 sym53c876-0: restart (scsi reset). sym53c876-0: Downloading SCSI SCRIPTS. sym53c876-1: rev 0x14 on pci bus 0 device 13 function 1 irq 25 sym53c876-1: NCR clock is 40218KHz sym53c876-1: ID 7, Fast-20, Parity Checking sym53c876-1: on-chip RAM at 0xc6e90000 sym53c876-1: restart (scsi reset). sym53c876-1: Downloading SCSI SCRIPTS. scsi0 : sym53c8xx - version 1.6b scsi1 : sym53c8xx - version 1.6b Vendor: COMPAQ Model: BD009122BA Rev: 3B07 Type: Direct-Access ANSI SCSI revision: 02 Vendor: COMPAQ Model: BD009122BA Rev: 3B07 Type: Direct-Access ANSI SCSI revision: 02 sym53c876-0-<0,0>: tagged command queue depth set to 8 sym53c876-0-<1,0>: tagged command queue depth set to 8 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0 sym53c876-0-<0,0>: wide msgout: 1-2-3-1. sym53c876-0-<0,0>: wide msgin: 1-2-3-1. sym53c876-0-<0,0>: wide: wide=1 chg=0. sym53c876-0-<0,0>: wide msgout: 1-2-3-1. sym53c876-0-<0,0>: wide msgin: 1-2-3-1. sym53c876-0-<0,0>: wide: wide=1 chg=0. sym53c876-0-<0,0>: sync msgout: 1-3-1-c-10. sym53c876-0-<0,0>: sync msg in: 1-3-1-c-f. sym53c876-0-<0,0>: sync: per=12 scntl3=0x90 scntl4=0x0 ofs=15 fak=0 chg=0. sym53c876-0-<0,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15) SCSI device sda: 17773524 512-byte hdwr sectors (9100 MB) Partition check: sda: sda1 sda2 < sda5 > sym53c876-0-<1,0>: wide msgout: 1-2-3-1. sym53c876-0-<1,0>: wide msgin: 1-2-3-1. sym53c876-0-<1,0>: wide: wide=1 chg=0. sym53c876-0-<1,0>: wide msgout: 1-2-3-1. sym53c876-0-<1,0>: wide msgin: 1-2-3-1. sym53c876-0-<1,0>: wide: wide=1 chg=0. sym53c876-0-<1,0>: sync msgout: 1-3-1-c-10. sym53c876-0-<1,0>: sync msg in: 1-3-1-c-f. 
sym53c876-0-<1,0>: sync: per=12 scntl3=0x90 scntl4=0x0 ofs=15 fak=0 chg=0. sym53c876-0-<1,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15) SCSI device sdb: 17773524 512-byte hdwr sectors (9100 MB) sdb: sdb1 sdb2 < sdb5 sdb6 sdb7 > VFS: Mounted root (ext2 filesystem) readonly. change_root: old root has d_count=3 Trying to unmount old root ... okay Freeing unused kernel memory: 252k freed Adding Swap: 530104k swap-space (priority -1) Winbond Super-IO detection, now testing ports 3F0,370,250,4E,2E ... SMSC Super-IO detection, now testing Ports 2F0, 370 ... parport0: PC-style at 0x378 [PCSPP,TRISTATE] parport0: cpp_daisy: aa5500ff(38) parport0: assign_addrs: aa5500ff(38) parport0: cpp_daisy: aa5500ff(38) parport0: assign_addrs: aa5500ff(38) ip_conntrack (2047 buckets, 16376 max) 3c59x.c:LK1.1.13 27 Jan 2001 Donald Becker and others. http://www.scyld.com/network/vortex.html See Documentation/networking/vortex.txt eth0: 3Com PCI 3c905C Tornado at 0x2000, 00:01:02:72:bd:d4, IRQ 29 product code 4552 rev 00.13 date 07-14-00 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. MII transceiver found at address 24, status 782d. Enabling bus-master transmits and whole-frame receives. eth0: scatter/gather disabled. h/w checksums enabled eth1: 3Com PCI 3c905C Tornado at 0x4000, 00:01:02:74:59:91, IRQ 15 product code 4552 rev 00.13 date 07-14-00 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. MII transceiver found at address 24, status 782d. Enabling bus-master transmits and whole-frame receives. eth1: scatter/gather disabled. h/w checksums enabled eth0: using NWAY device table, not 8 eth1: using NWAY device table, not 8 NETDEV WATCHDOG: eth1: transmit timed out eth1: transmit timed out, tx_status 00 status e601. diagnostics: net 0cfa media 8880 dma 0000003a. eth1: Interrupt posted but not delivered -- IRQ blocked by another device? Flags; bus-master 1, dirty 16(0) current 16(0) Transmit list 00000000 vs. cf14b200. 0: @cf14b200 length 8000002a status 0001002a 1: @cf14b240 length 8000002a status 0001002a 2: @cf14b280 length 8000002a status 0001002a 3: @cf14b2c0 length 8000002a status 0001002a 4: @cf14b300 length 8000002a status 0001002a 5: @cf14b340 length 8000002a status 0001002a 6: @cf14b380 length 8000002a status 0001002a 7: @cf14b3c0 length 8000002a status 0001002a 8: @cf14b400 length 8000002a status 0001002a 9: @cf14b440 length 8000002a status 0001002a 10: @cf14b480 length 8000002a status 0001002a 11: @cf14b4c0 length 8000002a status 0001002a 12: @cf14b500 length 8000002a status 0001002a 13: @cf14b540 length 8000002a status 0001002a 14: @cf14b580 length 8000002a status 8001002a 15: @cf14b5c0 length 8000002a status 8001002a eth1: Resetting the Tx ring pointer. NETDEV WATCHDOG: eth1: transmit timed out eth1: transmit timed out, tx_status 00 status e601. diagnostics: net 0cfa media 8880 dma 0000003a. eth1: Interrupt posted but not delivered -- IRQ blocked by another device? Flags; bus-master 1, dirty 32(0) current 32(0) Transmit list 00000000 vs. cf14b200. 
0: @cf14b200 length 8000002a status 0001002a 1: @cf14b240 length 8000002a status 0001002a 2: @cf14b280 length 8000002a status 0001002a 3: @cf14b2c0 length 8000002a status 0001002a 4: @cf14b300 length 80000075 status 00010075 5: @cf14b340 length 80000075 status 00010075 6: @cf14b380 length 80000052 status 00010052 7: @cf14b3c0 length 80000052 status 00010052 8: @cf14b400 length 80000052 status 00010052 9: @cf14b440 length 80000052 status 00010052 10: @cf14b480 length 80000075 status 00010075 11: @cf14b4c0 length 80000075 status 00010075 12: @cf14b500 length 80000075 status 00010075 13: @cf14b540 length 8000002a status 0001002a 14: @cf14b580 length 80000075 status 80010075 15: @cf14b5c0 length 8000002a status 8001002a eth1: Resetting the Tx ring pointer. NETDEV WATCHDOG: eth1: transmit timed out eth1: transmit timed out, tx_status 00 status e601. diagnostics: net 0cfa media 8880 dma 0000003a. eth1: Interrupt posted but not delivered -- IRQ blocked by another device? Flags; bus-master 1, dirty 48(0) current 48(0) Transmit list 00000000 vs. cf14b200. 0: @cf14b200 length 8000002a status 0001002a 1: @cf14b240 length 8000002a status 0001002a 2: @cf14b280 length 8000002a status 0001002a 3: @cf14b2c0 length 8000002a status 0001002a 4: @cf14b300 length 800000e1 status 000100e1 5: @cf14b340 length 800000de status 000100de 6: @cf14b380 length 8000002a status 0001002a 7: @cf14b3c0 length 8000002a status 0001002a 8: @cf14b400 length 8000002a status 0001002a 9: @cf14b440 length 8000002a status 0001002a 10: @cf14b480 length 800000d2 status 000100d2 11: @cf14b4c0 length 8000002a status 0001002a 12: @cf14b500 length 8000002a status 0001002a 13: @cf14b540 length 8000002a status 0001002a 14: @cf14b580 length 8000002a status 8001002a 15: @cf14b5c0 length 8000002a status 8001002a eth1: Resetting the Tx ring pointer. NETDEV WATCHDOG: eth1: transmit timed out eth1: transmit timed out, tx_status 00 status e601. diagnostics: net 0cfa media 8880 dma 0000003a. eth1: Interrupt posted but not delivered -- IRQ blocked by another device? Flags; bus-master 1, dirty 64(0) current 64(0) Transmit list 00000000 vs. cf14b200. 0: @cf14b200 length 8000002a status 0001002a 1: @cf14b240 length 8000002a status 0001002a 2: @cf14b280 length 8000002a status 0001002a 3: @cf14b2c0 length 8000002a status 0001002a 4: @cf14b300 length 8000002a status 0001002a 5: @cf14b340 length 8000002a status 0001002a 6: @cf14b380 length 8000002a status 0001002a 7: @cf14b3c0 length 8000002a status 0001002a 8: @cf14b400 length 8000002a status 0001002a 9: @cf14b440 length 8000002a status 0001002a 10: @cf14b480 length 8000002a status 0001002a 11: @cf14b4c0 length 8000002a status 0001002a 12: @cf14b500 length 8000002a status 0001002a 13: @cf14b540 length 8000002a status 0001002a 14: @cf14b580 length 8000002a status 8001002a 15: @cf14b5c0 length 800000de status 800100de eth1: Resetting the Tx ring pointer. ___________________________ /proc/pci PCI devices found: Bus 0, device 1, function 0: PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 5). Master Capable. Latency=64. Min Gnt=13. Bus 0, device 1, function 1: Memory controller: Intel Corporation 80960RP [i960RP Microprocessor] (rev 5). IRQ 31. Master Capable. Latency=64. Prefetchable 32 bit memory at 0xc4fc0000 [0xc4ffffff]. Bus 0, device 2, function 0: Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 116). IRQ 29. Master Capable. Latency=64. Min Gnt=10.Max Lat=10. I/O at 0x2000 [0x207f]. Non-prefetchable 32 bit memory at 0xc6ef0000 [0xc6ef007f]. 
Bus 4, device 4, function 0: Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (#2) (rev 116). IRQ 15. Master Capable. Latency=64. Min Gnt=10.Max Lat=10. I/O at 0x4000 [0x407f]. Non-prefetchable 32 bit memory at 0xc6ff0000 [0xc6ff007f]. Bus 0, device 11, function 0: PCI Hot-plug controller: Compaq Computer Corporation PCI Hotplug Controller (rev 4). IRQ 27. Non-prefetchable 32 bit memory at 0xc6ee0000 [0xc6ee00ff]. Bus 4, device 11, function 0: PCI Hot-plug controller: Compaq Computer Corporation PCI Hotplug Controller (#2) (rev 4). IRQ 5. Non-prefetchable 32 bit memory at 0xc6fe0000 [0xc6fe00ff]. Bus 0, device 12, function 0: System peripheral: Compaq Computer Corporation Advanced System Management Controller (rev 0). I/O at 0x1800 [0x18ff]. Non-prefetchable 32 bit memory at 0xc6ed0000 [0xc6ed00ff]. Bus 0, device 13, function 0: SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 20). IRQ 26. Master Capable. Latency=255. Min Gnt=17.Max Lat=64. I/O at 0x2400 [0x24ff]. Non-prefetchable 32 bit memory at 0xc6ec0000 [0xc6ec00ff]. Non-prefetchable 32 bit memory at 0xc6eb0000 [0xc6eb0fff]. Bus 0, device 13, function 1: SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (#2) (rev 20). IRQ 25. Master Capable. Latency=255. Min Gnt=17.Max Lat=64. I/O at 0x2800 [0x28ff]. Non-prefetchable 32 bit memory at 0xc6ea0000 [0xc6ea00ff]. Non-prefetchable 32 bit memory at 0xc6e90000 [0xc6e90fff]. Bus 0, device 14, function 0: VGA compatible controller: ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC] (rev 122). Master Capable. No bursts. Min Gnt=8. Prefetchable 32 bit memory at 0xc3000000 [0xc3ffffff]. I/O at 0x2c00 [0x2cff]. Non-prefetchable 32 bit memory at 0xc6e80000 [0xc6e80fff]. Bus 0, device 15, function 0: ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2). Bus 0, device 15, function 1: IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1). Master Capable. Latency=64. I/O at 0x3000 [0x300f]. Bus 0, device 15, function 2: USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1). Master Capable. Latency=64. I/O at 0x3020 [0x303f]. Bus 0, device 15, function 3: Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2). IRQ 9. Bus 0, device 16, function 0: Host bridge: Intel Corporation 450NX - 82451NX Memory & I/O Controller (rev 3). Bus 0, device 18, function 0: Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (rev 2). Bus 0, device 20, function 0: Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (#2) (rev 2). Bus 1, device 0, function 0: VGA compatible controller: Cirrus Logic GD 5446 (rev 0). Prefetchable 32 bit memory at 0xc5000000 [0xc5ffffff]. 
________________________ ifconfig eth0 Link encap:Ethernet HWaddr 00:01:02:72:BD:D4 inet addr:192.168.100.9 Bcast:192.168.190.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:11156 errors:0 dropped:0 overruns:0 frame:0 TX packets:413 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 Interrupt:29 Base address:0x2000 eth1 Link encap:Ethernet HWaddr 00:01:02:74:59:91 inet addr:209.247.191.9 Bcast:209.247.191.31 Mask:255.255.255.224 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:99 errors:0 dropped:0 overruns:1611 frame:0 TX packets:70 errors:4 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:100 Interrupt:15 Base address:0x4000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:8 errors:0 dropped:0 overruns:0 frame:0 TX packets:8 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 From hahn at coffee.psychology.mcmaster.ca Sat May 26 13:24:18 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium In-Reply-To: <3B0E8343.4010708@onelabs.com> Message-ID: > The Itaniums will make for nice SMP clusters though with its fast front > side bus. Multiple IA-64s sharing a FSB along with many GB of shared > memory and Infinband for very fast interconnects will be nice. choosing a shared FSB (shared remote DRAM) is a fairly bold statement about the expected uses of a machine. for instance, it makes sense if you expect very good local cache hitrates. and it makes sense if you expect zero cache hitrates. the alternative (local DRAM) is really an explicitly-managed, much larger, CPU-local cache. I'm guessing, definitely WAG, that Itanium will be pessimal for compute clusters. suppose, in 1-2 years, we have this scenario: 1. Itanium machines, perhaps 8-way, with CPUs sitting on a 3.2 to 6.4 GB/s bus, talking to DRAM. each CPU is roughly the same speed as a P4/2GHz, and has some small number of MB's of local cache. 2. Athlon machine, with each CPU connected to its own 2.1-3.2 GB/s DRAM array, using 6.4 GB/s hyperchannel to to maintain coherence, etc. now, which do you think will perform better? the AMD approach has a HUGE advantage if your working set (as seen by a single CPU) is more like 2^30 bytes, rather than 2^20. *and* assuming that you can arrange this data reasonably locally. personally, I'd much prefer the optimistic architecture that scales my DRAM bandwidth with ncpus. in fact, this is really the whole idea of clustering, at a different scale. I believe that many-way Itaniums are aimed at "commercial" applications, which seem to be mainly pumping blocks from one place to another. clearly if your DRAM is mainly just a staging area for disk/net IO, these working-set issues are pretty irrelevant. afakt, this is the rationale for Intel's current 8x Xeon high-end, which would seem to suck rocks for any computational purpose (2 clusters of 4 cpus starving on a measly little .8 GB/s bus!) who knows, maybe a 4M local cache really is enough to make up for the fact that big-SMP machines have always delivered pathetic dram latency (I recall >500 ns for Sun's high-end of a year or so ago, versus 150 ns for local/uniprocessor)... I can't imagine Itanium being a mass-market item for years, if ever. and I pledge allegiance to the Orthodox Church of Beowulf, which holds that if it's not mass-market, it's not cluster-Kosher ;) regards, mark hahn. 
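To put rough numbers on the bus-sharing argument above (back-of-envelope only, derived from the bandwidth figures quoted in the message, not from measurements):

    shared FSB:  6.4 GB/s bus / 8 CPUs  =  0.8 GB/s per CPU at best,
                 and less once the CPUs actually contend for the bus
    local DRAM:  2.1-3.2 GB/s per CPU, i.e. roughly 17-26 GB/s of
                 aggregate DRAM bandwidth across an 8-way box

Which column matters depends, as the message says, on whether the working set lives in cache or in DRAM.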
From crhea at mayo.edu Sat May 26 22:23:42 2001 From: crhea at mayo.edu (Cris Rhea) Date: Wed Nov 25 01:01:21 2009 Subject: Help on cluster hang problem... Message-ID: <200105270523.AAA23169@sijer.mayo.edu> I've been using Linux for several years, but am new to Linux cluster computing. I set up a "proof of concept cluster" with 4 nodes- each node is a 1.2GHz Athlon on a MicroStar K7TPro2-A motherboard with 1GB of RAM (RackSaver 1200). RedHat 7.1 is loaded locally on each system. Also loaded mpich-1.2.0-10.i386.rpm on each system and set up the rhosts/hosts.equiv to allow all the rsh stuff... Systems are interconnected with Intel 10/100 Ethernet cards. One of the research PhD's in my group has a program that has run successfully on other supercomputer-class systems (Cray and SGI). Very CPU-intensive, but does nothing fancy other than using MPI for communication (very little disk I/O, etc.). /home file system is NFS mounted on each system. I've tried NFS server is the master node or another system outside the cluster. Even though this code runs as a normal user (not root), it will hard-hang the "master" node in about 10 minutes. "Hard-hang" means nothing on console, disk light on solid, doesn't respond to reset or power switches- have to reset by pulling plug. I've tried the stock 2.4.2-2 kernel that loads with RedHat 7.1, I've tried the 2.4.2 kernel recompiled to specifically call the CPU an Athlon, and I've tried downloading/using the 2.4.4 kernel. All of my attempts produce the same result- his program can crash the system every time it is run. I've searched the normal dejanews/altavista sites for Linux/Athlon/hang, but nothing interesting pops out. I must be missing something simple- the 2.4.X kernels can't be that unstable. Does this ring a bell with anyone in the group? TIA- -- Cris --- Cristopher J. Rhea Mayo Foundation Research Computing Facility Pavilion 2-25 crhea@Mayo.EDU Rochester, MN 55905 Fax: (507) 266-4486 (507) 284-0587 From kragen at pobox.com Sun May 27 01:33:32 2001 From: kragen at pobox.com (kragen@pobox.com) Date: Wed Nov 25 01:01:21 2009 Subject: Disk reliability (Was: Node cloning) Message-ID: <200105270833.EAA11102@kirk.dnaco.net> Josip Loncaric writes: > JackM wrote: > > You can try using hdparm to turn the DMA off. Of course, it does slow > > down data transfer rates considerably. > > As Mark said, BadCRC only means that the transfer was retried. If a few > BadCRC messages are the only problem, I would not turn off DMA. What size of CRCs are being used? If it's a 32-bit CRC and the errors involved are likely to involve several bits, I think your chances of having an uncaught data error are only four billion to one. Four billion microseconds is about eighty minutes, a billion milliseconds is about a month and a half, and four billion seconds is about 125 years. From hahn at coffee.psychology.mcmaster.ca Sun May 27 09:23:02 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:21 2009 Subject: Disk reliability (Was: Node cloning) In-Reply-To: <200105270833.EAA11102@kirk.dnaco.net> Message-ID: > > > You can try using hdparm to turn the DMA off. Of course, it does slow > > > down data transfer rates considerably. > > > > As Mark said, BadCRC only means that the transfer was retried. If a few > > BadCRC messages are the only problem, I would not turn off DMA. > > What size of CRCs are being used? 
If it's a 32-bit CRC and the errors > involved are likely to involve several bits, I think your chances of > having an uncaught data error are only four billion to one. Four > billion microseconds is about eighty minutes, a billion milliseconds > is about a month and a half, and four billion seconds is about 125 > years. hmm, I'll admit I never actually looked at the details. the CRC is 16b (not really surprising, since ATA is that wide): G(X) = X15 + X12 + X5 + 1. so I think your point was to be less blase' about badCRC reports, and you're certainly right. hmm, so the chance of undetected errors depends on tranfers/second, right? so figuring a worst-case ATA100 and nothing but 4K transfers, we'd see something like 20K t/s. hmm, how do you go from those numbers to mean time to undetected failure? I think your back-of-envelope numbers were assuming 1 transfer per us, right? so with 16b CRC, you'd expect an uncaught error in 64K/20K=3 s. but is that assuming some particular distribution of errors? thanks, mark hahn. From bari at onelabs.com Sun May 27 11:47:30 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium References: Message-ID: <3B114BC2.1050205@onelabs.com> Mark Hahn wrote: >> > I can't imagine Itanium being a mass-market item for years, if ever. > and I pledge allegiance to the Orthodox Church of Beowulf, which > holds that if it's not mass-market, it's not cluster-Kosher ;) > The AMD Sledge/Hammer series will also be nice for clusters whenever they finally make it to market. Hopefully there will be some nice chipset support to go along with them. For the time being Mips has the price performance edge since nobody has taken the ARM 10 to market yet and Intel yanked the FPU out of the XScale before they released it. It's great to see Beowulf clusters offering similar performance to traditional supercomputers for coarse grained applications and even some fine grained for a fraction of the cost, but X86 with OTS motherboards will also always be a kludge. X86 has 20 years of baggage for legacy support and also produce enormous amounts of heat as compared to RISC. Low cost RISC clusters will outperform any x86 mass-market OTS clusters. RISC offers lower cost, smaller footprint, far less heat along with higher fixed and floating point performance. Bari Ari From jtao at artsci.wustl.edu Sun May 27 12:20:45 2001 From: jtao at artsci.wustl.edu (Jian Tao) Date: Wed Nov 25 01:01:21 2009 Subject: Several questions obout Beofdisk In-Reply-To: <200105271600.MAA10156@blueraja.scyld.com> Message-ID: Only one node could be partitioned properly with beofdisk. Using "beostatus", I can only monitor the usage of CPU, Memory, Swap disk of that node. All other nodes stops at "node_up: Setting system clock." at phase 3. BTW: I used default configuration files only. From wsb at paralleldata.com Sun May 27 14:31:34 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium References: <3B114BC2.1050205@onelabs.com> Message-ID: <3B117236.6ADDE94A@paralleldata.com> Bari Ari wrote: > > Mark Hahn wrote: > > >> > > I can't imagine Itanium being a mass-market item for years, if ever. > > and I pledge allegiance to the Orthodox Church of Beowulf, which > > holds that if it's not mass-market, it's not cluster-Kosher ;) > > > The AMD Sledge/Hammer series will also be nice for clusters whenever > they finally make it to market. 
Hopefully there will be some nice > chipset support to go along with them. For the time being Mips has the > price performance edge since nobody has taken the ARM 10 to market yet > and Intel yanked the FPU out of the XScale before they released it. > Help me out. I look at the SPEC2000 results for MIPS R14K and it can't get to a P4 1.3Ghz level for either INT or FP. So, you're telling me I can buy a MIPS 500Mhz R14K for less than $185? > It's great to see Beowulf clusters offering similar performance to > traditional supercomputers for coarse grained applications and even some > fine grained for a fraction of the cost, but X86 with OTS motherboards > will also always be a kludge. X86 has 20 years of baggage for legacy > support and also produce enormous amounts of heat as compared to RISC. > > Low cost RISC clusters will outperform any x86 mass-market OTS clusters. > RISC offers lower cost, smaller footprint, far less heat along with > higher fixed and floating point performance. > SPEC says you're incorrect on performance. I suspect your pricing is off also, at least for R14K's. So, that leaves heat/power consumption, which I'd say is probably true. Something you may not take into account that at least matters in what I do is the raw performance per cpu. I prefer to have fewer high performance nodes than, say twice as many lower performance nodes. That can reduce the communication burden between nodes because you give each node a larger part of the problem to solve. Depends of course on your algorithm. Wes From hahn at coffee.psychology.mcmaster.ca Sun May 27 14:17:05 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium In-Reply-To: <3B114BC2.1050205@onelabs.com> Message-ID: > will also always be a kludge. X86 has 20 years of baggage for legacy > support and also produce enormous amounts of heat as compared to RISC. that's a nice sentiment. but the only RISC that competes well with ia32 is the Alpha, which has never been known for running cool (or being cheap, for that matter). > Low cost RISC clusters will outperform any x86 mass-market OTS clusters. where are these low-cost RISC components available? > RISC offers lower cost, smaller footprint, far less heat along with > higher fixed and floating point performance. again, sounds great. where's the beef? From bari at onelabs.com Sun May 27 14:47:42 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium References: Message-ID: <3B1175FE.3080708@onelabs.com> Mark Hahn wrote: >>> >> chipset support to go along with them. For the time being Mips has the >> price performance edge since nobody has taken the ARM 10 to market yet >> and Intel yanked the FPU out of the XScale before they released it. > > > well MIPS has never delivered competitive performance, and seems to > be entirely out of the mass-market, as is ARM. do you know of someone > who is trying to mass-produce MIPS or ARM-based boxes? > Other than our low cost nodes, I am not aware of any. >> fine grained for a fraction of the cost, but X86 with OTS motherboards >> will also always be a kludge. X86 has 20 years of baggage for legacy >> support and also produce enormous amounts of heat as compared to RISC. > > > a very traditional, conservative response. alas, ia32 is the fastest > processor available excepting Alpha. and alas, Alpha is not exactly > price-competive in the usual sense. 
> > You're not factoring cost/performance/heat/footprint. Alpha also comes out highest as far as heat and cost with the P4 hot on its tail. They also require companion chips like x86s that eat up $$ and board space. The 700 MHz Alphas wouldn't be so bad if they were <$200, same with the PPC 750cxe. For fixed point using OTS XScale you get around 1000 Mips/W for around $50. Mips CPUs are about double that cost and 4x heat with the FPU. >> Low cost RISC clusters will outperform any x86 mass-market OTS clusters. > > > please give specific > >> RISC offers lower cost, smaller footprint, far less heat along with >> higher fixed and floating point performance. > > > I can't imagine why you say that, except reading too much PR. > for instance, the two fastest processors you can buy (spec int/fp) > are Alpha and P4. both are roughly comparable in heat. it's obviously > not the case that RISC systems in general are delivering any better > FP performance. > > or are you talking about some other more specialized measure? Sure the Alpha comes out on top if you just look at spec int/fp followed by P4. If you compare the systems cost vs GFLOPs/Watts/cu.in., X86 and Alpha come out as highest cost, and much higher heat. Bari Ari From bari at onelabs.com Sun May 27 14:57:48 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium References: <3B114BC2.1050205@onelabs.com> <3B117236.6ADDE94A@paralleldata.com> Message-ID: <3B11785C.8020601@onelabs.com> W Bauske wrote: > > Help me out. I look at the SPEC2000 results for MIPS R14K and it > can't get to a P4 1.3Ghz level for either INT or FP. So, you're > telling me I can buy a MIPS 500Mhz R14K for less than $185? > We never considered the Mips R14K, even SGI is moving away from them. > SPEC says you're incorrect on performance. I suspect your pricing is > off also, at least for R14K's. So, that leaves heat/power consumption, > which I'd say is probably true. > > Something you may not take into account that at least matters in what > I do is the raw performance per cpu. I prefer to have fewer high > performance nodes than, say twice as many lower performance nodes. > That can reduce the communication burden between nodes because you > give each node a larger part of the problem to solve. Depends of > course on your algorithm. > I agree. Some apps can get by just fine with 9600 baud between nodes and others 10Gbps is to slow. That's why we also work with x86. If Alphas start moving into .13um and the prices drop they will look interesting again. Bari Ari From bob at drzyzgula.org Sun May 27 17:33:38 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:21 2009 Subject: Intel is finally shipping the 64-bit Itanium In-Reply-To: <3B114BC2.1050205@onelabs.com> Message-ID: Bari, With all due respect (I've designed multilayer PCBs and have even read much of Howard Johnson's book on signal integrity analysis, so this respect is in fact considerable), I think that the problem that I (and I suspect, many others on this list) have with your statements here is that you would appear to be redefining the market space in order to achieve what you predict will be a more optimal solution. IIRC, the *entire* point of Beowulf clusters, as they were originally defined, was to build high-performance parallel-processing clusters out of COTS, commodity parts. 
The idea was that you could take surplus or otherwise underutilized systems and/or components and assemble them into a new aggregate which would enable you to do scientific calculations which were not otherwise feasible given your budget realities. This idea has been refined somewhat over time, and the success of the approach has brought new funding that will often, if not usually, allow the purchase of the latest, greatest COTS hardware for new clusters, but I think that for most of us the core philosophy still holds. Here, on the other hand, you are talking about doing custom board designs which are specifically optimized for cluster computing. This is a fine approach to scientific computing, and you are of course not the first person to do this, but I'm not sure that, in doing so, you fully appreciate the extent to which you are departing from the mainstream of the Beowulf cluster computing philosophy, as opposed to mere technology. In the end, it may in fact be true that you will be able to build and sell turn-key computing clusters which are competitive, from a price/performance perspective, with turn-key clusters built from COTS motherboards and chassis, and to do so in such a way that the bulk of the Beowulf software base can still be used. However, part of the attraction of Beowulf clusters to many organizations is the extent to which one can leverage available staff and student labor, the creativity and frugality of determined but under-funded scientists, below-the-line budget expenses such as power and HVAC (never underestimate the value of getting another department to foot much of the bill :-), as well as sunk costs such as data center space freed up by shrinking or disappearing mainframes, to vastly reduce the apparent acquisition cost for high-performance computers. One major consideration is that, through the use of parts, the most expensive of which may cost about $500 or so, many organizations (mine in particular) will find that a cluster can be purchased out of operating funds, rather than as a capital expense. This can give an extraordinary boost to the organization's flexibility down the road. $100,000 spent on a turn-key cluster might have to be capitalized -- and prove useful -- over a period of three or four years. OTOH, spread that cost over a bunch of budget line items, use student labor and racks from Home Depot or Costco, and you might just be able to put the thing together without the bean counters even noticing. Moreover, the cluster that was slapped together, if it proves not to do the job or has outlived it's usefulness after a few months, can probably be torn apart and the parts used for some other application. This kind of insurance policy against fiscal error is extraordinarily difficult to replicate for capital purchases of large, turn-key systems. Although there are several high-profile and many lower-profile instances of whole clusters being purchased and delivered on a single manifest, my impression is that the vast majority of Beowulf clusters continue to be built on an ad-hoc, piece-meal basis (am I wrong about this?). 
While I applaud your creativity in searching for new ways to improve the price/performance of available computing solutions, I think that it's pretty important to keep in mind that the many people on this list who are building the latter sort of cluster will remain quite unconvinced of your approach until (a) they can build their own system nodes from your boards at an *absolute* price competitive with those they are building today (note that, for some of us, once the node cost crosses a threshold from operating and into capital, they might as well cost a million dollars each, so price/performance is only one part of the equation), (b) they have a reasonable level of confidence that the parts they acquire from you will not be so specialized as to be virtually worthless outside of the context of a Beowulf, and (c) that your design will not be so unique that the expandability and maintainability of their cluster would be shot to hell in the event your company goes belly-up. Not that we don't wish you well... Best regards, --Bob From tkimball at tampabay.rr.com Sun May 27 18:31:06 2001 From: tkimball at tampabay.rr.com (Tim K.) Date: Wed Nov 25 01:01:21 2009 Subject: firewire networking Message-ID: <006301c0e715$e3a2ca40$c5c95c18@tampabay.rr.com> Can this be used on a beowulf? From bogdan.costescu at iwr.uni-heidelberg.de Mon May 28 02:45:10 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:01:21 2009 Subject: RH7.1 - 3Com PCI 3c905C Tornado - Interrupt posted but not delivered -- IRQ blocked by another In-Reply-To: <3B0FDCAD.E09D08A2@rapidascent.com> Message-ID: On Sat, 26 May 2001, Dan Fitzpatrick wrote: > I have 2 identical 3Com network cards in a Compaq 6400R server. > I just installed RedHat 7.1 (a clean install) and I'm getting an > IRQ conflict on one of the network cards: > > Interrupt posted but not delivered -- IRQ blocked by another device? This is an APIC error which seems to appear more often with 2.4 kernels. It's not related to the network card, juts that the 3c59x driver tests for this condition and prints this warning. Probably your best bet is to boot with "noapic" kernel option. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From Christophe.Boulain at ceram.fr Mon May 28 09:52:30 2001 From: Christophe.Boulain at ceram.fr (Christophe BOULAIN) Date: Wed Nov 25 01:01:22 2009 Subject: Bonding and acenic drivers Message-ID: Hi all, Does anybody know if I can use two 3Com 3C985B Gigabit ethernet cards (acenic driver) with the Linux bonding driver ??? The bonding driver doesn't switch when one link goes down ! The mii-diag tool from Donald Becker reports an 'operation not supported'. Does it mean that there's no MII on these boards ? Is there another way to notify the bonding driver that the link is down ? Thanks a lot. Christophe Boulain. System Manager CERAM Sophia Antipolis Christophe.Boulain@ceram.fr -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20010528/54378434/attachment.html From bogdan.costescu at iwr.uni-heidelberg.de Mon May 28 10:55:05 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:01:22 2009 Subject: Bonding and acenic drivers In-Reply-To: Message-ID: On Mon, 28 May 2001, Christophe BOULAIN wrote: > Is there another way to notify the bonding driver that the link is down ? Is there _any_ way to notify somebody that the link is down ? AFAIK, there is no standard way in the kernel right now to pass this info (gathered at driver level) somewhere in the upper levels. Basically, if the driver detects lost link, it has to try its best to re-establish the link. However, as most of the current net drivers use 60 seconds polling to check link state, by the time the driver detects lost link, the driver already signalled Tx timeout errors. The polling cannot be done very often (like once every second or so), because reading the MII registers is a time consuming operation. The only solution to this problem is to use NICs that can generate an interrupt when the link status changes (and I think that AceNIC can do this) - you can then avoid polling at all. But as this is not standard across the Ethernet world, this info is not used right now in the Linux kernel outside the NIC driver (f.e. the 3C905C cards also support this feature, but the 3c59x driver doesn't - yet). OTOH, bonding (the Linux version) was not intended for link resiliance, but only for improved bandwidth. I don't know what you are trying to achieve, but there are user-level approaches for this problem. Look for Linux-HA (high availability). Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From perchrh at stud.math.ntnu.no Mon May 28 12:01:26 2001 From: perchrh at stud.math.ntnu.no (Per Christian Henden) Date: Wed Nov 25 01:01:22 2009 Subject: problems with channel bonding (fwd) Message-ID: I thought I'd share this information with the list, as it probably is of some importance to others. ---------- Forwarded message ---------- Date: Mon, 28 May 2001 10:19:27 -0700 From: Aaron Van Couwenberghe To: Per Christian Henden Subject: Re: problems with channel bonding (...) At work we have tested some 8139 cards and gotten the same kind of results as you. Realtek cards just won't work properly with bonding; we have no idea whether this is a bug in the driver or the hardware. If you need generic cards try tulip, as many people seem to have success with that. However, at my work we are using intel cards (large rx/tx cache == great performance with bonding) and 3com cards (smaller cache but still very fast). From vbeddo at ucla.edu Mon May 28 12:40:45 2001 From: vbeddo at ucla.edu (Vanessa Beddo) Date: Wed Nov 25 01:01:22 2009 Subject: Statistical Work Message-ID: Hello, I am a Ph.D. student in statistics and my dissertation concerns parallel programming. I was hoping that some of you may have heard of other statistical work (papers, projects, applications, etc.) being done using parallel computation (not restricted to cluster computing). Your input on this would be very much appreciated. 
Best, Vanessa Beddo UCLA Department of Statistics From okeefe at borg.umn.edu Mon May 28 14:20:38 2001 From: okeefe at borg.umn.edu (Matthew O'Keefe) Date: Wed Nov 25 01:01:22 2009 Subject: [declerck@sistina.com: [gfs-announce] GFS v4.1 released] Message-ID: <20010528162037.A55096@brule.borg> All, GFS version 4.1 has been released and is available via the Sistina web site -> http://www.sistina.com/gfs/software/ All known bugs have been documented in the Release Notes. This is not to say that there are not other bugs, just that we have not seen them in our testing. If you experience any problems please report them via Bugzilla at Sistina.com -> Bugs ==> http://bugzilla.sistina.com If you would like to tell us how you are using GFS version 4.1 so that we can provide a better product please fill out the survery at -> Feedback ==> http://www.sistina.com/gfs/Pages/gfs_eval.html Thank you for your continued support. Now, go grab and start using it! ****************************************************************************** Features / Bug Fixes for GFS v4.1 ****************************************************************************** ########################################################################### --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ############################################################################ The addition of Lock Value Blocks (LVBs) to GFS. Please see the note in the `Caveats and Usage' in the Release Notes for instructions on how to upgrade from a prior release of v4.x.y to v4.1. ########################################################################### --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ############################################################################ o Support for Linux kernel 2.4.4 o FCNTL and FLOCK support o A complete rewrite of Pool tools with new command line options and enhanced functionality o Continued improvements to the IP lock server - `memexpd' o Performance improvements for GFS when it is utilized as a local filesystem instead of its normal context as a cluster filesystem. o Improved `df' performance due to the addition of LVB support o `atime' bug has been fixed o New STOMITH methods (Vixel switches and updates to the Brocade methods) o New mount options (please see the man page) --- Matt O'Keefe Sistina Software, Inc. _______________________________________________ gfs-announce mailing list gfs-announce@sistina.com http://lists.sistina.com/mailman/listinfo/gfs-announce Read the GFS HOWTO http://www.sistina.com/gfs/Pages/howto.html ----- End forwarded message ----- From elehman at sunflower.com Mon May 28 19:39:30 2001 From: elehman at sunflower.com (Eddie Lehman) Date: Wed Nov 25 01:01:22 2009 Subject: Linux Clusters: The HPC Revolution References: <20010528162037.A55096@brule.borg> Message-ID: <002001c0e7e8$967d5900$5695fea9@lawrence.ks.us> Linux Clusters: The HPC Revolution A conference for high-performance Linux cluster users and system administrators, organized by the National Computational Science Alliance. June 25-27, 2001 at the National Center for Supercomputing Applications (NCSA), University of Illinois, Urbana, IL. http://www.ncsa.uiuc.edu/LinuxRevolution/index.html From jtao at artsci.wustl.edu Tue May 29 02:50:34 2001 From: jtao at artsci.wustl.edu (Jian Tao) Date: Wed Nov 25 01:01:22 2009 Subject: Anyone could give me some advice other than changing cards ? 
Message-ID: <200105290851.f4T8peA01850@ascc.artsci.wustl.edu> We have two kinds of Linksys EtherFast 10/100 LAN Cards in out cluster. Card 1 : EtherFast 10/100 LAN Card, Model No. LNE100TX, VERSION 4.1 Card 2 : EtherFast 10/100 LAN Card, Model No. LNE100TX, VERSION 2.0 Card 1 works very well and any node with Card 1 can be set up smoothly, but nodes with Card 2 stop at some point when booting up. (I switched cards in different nodes to make sure that the other parts in nodes are good) Following is the log file It seems that the node stoped when it was configuring loopback interface. Anyone encountered this problem ? How to deal with it other than changing card ? (:-<) ************************************************************************* */var/log/beowulf/node.2 ************************************************************************* node_up: Setting system clock. mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09 ext2fs_check_if_mountFilesystem label= OS type: Linux Block size=1024 (log=0) Fragment size=1024 (log=0) 128 inodes, 1024 blocks 51 blocks (4.98%) reserved for the super user First data block=1 1 block group 8192 blocks per group, 8192 fragments per group 128 inodes per group Writing inode tables: 0/1^H^H^Hdone Writing superblocks and filesystem accounting information: done : No such file or directory while determining whether /dev/ram1 is mounted.^M node_up: TODO set interface netmask. node_up: Configuring loopback interface. ************************************************************************ Yours, Jian Tao jtao@artsci.wustl.edu From josip at icase.edu Tue May 29 06:49:29 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:22 2009 Subject: Disk reliability (Was: Node cloning) References: Message-ID: <3B13A8E9.1F2B7CBD@icase.edu> Mark Hahn wrote: > > > What size of CRCs are being used? > > hmm, I'll admit I never actually looked at the details. > the CRC is 16b (not really surprising, since ATA is that wide): > G(X) = X15 + X12 + X5 + 1. The ATA channel is usually quite reliable. BadCRC errors are rarely seen, and the probability of missing one despite the 16-bit CRC is 2^16 times lower. This means that there is no need to worry unless the system is regularly reporting lots of BadCRC events. If your target MTBF is one year, "lots" means about a few hundred BadCRC events per day. Sincerely, Josip P.S. You need at least UltraDMA mode 2 to get CRC protection (=> you do not want to turn off DMA without a really good reason). Lower modes (PIO or multiword DMA) do not have CRC protection. -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From deadline at plogic.com Tue May 29 06:53:05 2001 From: deadline at plogic.com (Douglas Eadline) Date: Wed Nov 25 01:01:22 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: <3B0BFEDB.3080805@onelabs.com> Message-ID: On Wed, 23 May 2001, Bari Ari wrote: --snip-- > I really don't see P4's for dense clusters. ULV PIIIs and Athlon4 with > SMP makes much more sense. IA-64 with SMP will probably come out ahead > in MFOPLS per watt and $$. We're working with parts now that offer 160 > MFLOPS per watt vs. 20 MFLOPS per watt on the P4. Fixed point processors > are down to 1 watt per 1000 Mips. Keep in mind there is no easy way to correlate dense CPUs or MFLOPS per watt to actual performance. 
Packing CPUs in a custom 1U box may be a big win for some problems, but a big waste of money for others. Also, the "lego we use" in the Beowulf community is largely what is produced for other much larger markets and while we would like to see some things done differently, there is a reasonable trade-off between cost/flexibility/performance. I find it interesting that in all the talk of P4 systems there is little discussion about the Intel chipset for the P4 only supporting 32-bit PCI. If you need anything other than Fast Ethernet this could be a real drawback (i.e. an imbalanced system with bottlenecks). The new Xeon chipset has 2 64-bit slots. Of course there are other chipsets on the immediate horizon, but the market seems to be making a clear differentiation between the "desktop" and "server" products. Doug -- ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com -------------------------------------------------------------------
From henken at seas.upenn.edu Tue May 29 07:25:31 2001 From: henken at seas.upenn.edu (henken) Date: Wed Nov 25 01:01:22 2009 Subject: Announce: Clubmask Cluster Tools Message-ID: <3B13B15B.7040001@seas.upenn.edu>
What is clubmask ---------------- Clubmask is a set of open source Beowulf Cluster management tools being developed at the University of Pennsylvania. These tools are unique in that the aim of the project is to provide a "Physicist Proof", completely turnkey set of tools for installing and managing clusters.
Links -------------------------------- Home page: http://grove.cis.upenn.edu/~henken/clubmask
What's available presently (alpha1) --------------------------------- Installation and rudimentary administration via cfengine Extensibility via python classes but not database
Current Support --------------------------------- The current state of the Clubmask software allows the user to install an entire cluster by creating kickstart configuration files through the use of a web page. The web page steps the user through all of the necessary steps in the install procedure that are pertinent to a cluster install. There have been options in RedHat's kickstart method that have been removed or been used as default settings. For example, the option to configure X Windows has been removed due to the fact that one does not need X on the remote nodes. The auto-reboot after install function has been set to 'on' all of the time due to a small python script that communicates with the clubmask server during the install and halts the install until the user removes the floppy and presses a key to signal that he/she has done so. The software will also generate the /etc/hosts, /etc/hosts.equiv and /root/.rhosts files each time a machine is installed using the Clubmask software. This allows the user to use a short naming scheme for each of the machines, such as eio6, fileserver1, etc. Also installed by default is the rsh service. Rsh is a standard way to communicate inside the cluster without the need to log in each time. These services are set through cfengine files generated by the Clubmask software.
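
As a rough illustration of the kind of file generation described above, the short Python sketch below builds /etc/hosts and /etc/hosts.equiv style entries from a short node naming scheme. The node names, addresses and output file names are invented for the example; this is not Clubmask's actual code, which would take the node list from its own configuration rather than hard-coding it.

    # Sketch only: build /etc/hosts and /etc/hosts.equiv style entries from a
    # short node naming scheme.  Node names, addresses and output file names
    # are illustrative assumptions, not Clubmask's real implementation.

    nodes = {f"eio{i}": f"192.168.1.{10 + i}" for i in range(1, 17)}
    nodes["fileserver1"] = "192.168.1.5"

    def hosts_lines(node_map):
        """One '<IP> <name>' pair per node, suitable for /etc/hosts."""
        return [f"{ip}\t{name}" for name, ip in sorted(node_map.items())]

    def equiv_lines(node_map):
        """One trusted hostname per line, suitable for /etc/hosts.equiv or
        /root/.rhosts, so rsh works inside the cluster without a password."""
        return sorted(node_map)

    if __name__ == "__main__":
        with open("hosts.generated", "w") as out:
            out.write("127.0.0.1\tlocalhost\n")
            out.write("\n".join(hosts_lines(nodes)) + "\n")
        with open("hosts.equiv.generated", "w") as out:
            out.write("\n".join(equiv_lines(nodes)) + "\n")
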
Nicholas Henke University of Pennsylvania Engineeriing '02 From agrajag at scyld.com Tue May 29 08:03:17 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:22 2009 Subject: Several questions obout Beofdisk In-Reply-To: ; from jtao@artsci.wustl.edu on Sun, May 27, 2001 at 02:20:45PM -0500 References: <200105271600.MAA10156@blueraja.scyld.com> Message-ID: <20010529110317.A20187@blueraja.scyld.com> On Sun, 27 May 2001, Jian Tao wrote: > Only one node could be partitioned properly > > with beofdisk. Using "beostatus", I can only monitor > > the usage of CPU, Memory, Swap disk of that node. > > > All other nodes stops at "node_up: Setting system clock." > > at phase 3. > > BTW: I used default configuration files only. What version of Scyld Beowulf? 'cat /etc/scyld-release' will tell you which version you have. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010529/ad8b479f/attachment.bin From bari at onelabs.com Tue May 29 08:46:37 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:22 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: Message-ID: <3B13C45D.1000706@onelabs.com> Douglas Eadline wrote: > Keep in mind there is no easy way to correlate dense CPUs or MFLOPS per > watt to actual performance. Packing CPUs in a custom 1U box > may be a big win for some problems, but a big waste of money > for others. > The StoneSoup BeoClusters will always be the best approach for many as others have pointed out here due to budget, reuse and management/organizational constraints. > Also, the "lego we use" in the Beowulf community is largely > what is produced for other much larger markets and while we would like > to see some things done differently, there is a reasonable > trade-off between cost/flexibility/performance. The turn-key clusters of 16 nodes and greater targeted at high speed with high bandwith/low latency interconnects seems to be where things can be improved significantly. Maybe some new "legos" are needed. > > I find it interesting that in all the talk of P4 systems > there is little discussion about the Intel chips-set > for the P4 only supporting 32 PCI. If you need anything > other than Fast Ethernet this could be a real drawback > (i.e. an imbalanced system with bottle necks) > The new Xeon chip set has 2 64-bit slots. Of course there > are other chip-sets on the immediate horizon, but > the market seems to be making a clear differentiation about the > "desktop" vs. "server" product. Some applications will churn away at a piece of data for hours or days before spitting out a few bits to pass on or compare to what the other nodes have as results. A 300bps link between nodes may be adequate here. Other applications may only run through a few CPU cycles before passing on a chunk of data where even 10Gb/sec interconnections are bottlenecks. It's this high speed end of clustering where I see a need for improvements. I have seen interest in the P4 for it's raw performance only and not for the lack of current support for DDR and 64/66 PCI. It's all around the corner though from all the chipset vendors. Infinband will really make a big difference for high speed and high bandwidth applications when it comes standard in chipsets. 
Bari From deadline at plogic.com Tue May 29 09:16:57 2001 From: deadline at plogic.com (Douglas Eadline) Date: Wed Nov 25 01:01:22 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: <3B13C45D.1000706@onelabs.com> Message-ID: On Tue, 29 May 2001, Bari Ari wrote: > Douglas Eadline wrote: > > > Keep in mind there is no easy way to correlate dense CPUs or MFLOPS per > > watt to actual performance. Packing CPUs in a custom 1U box > > may be a big win for some problems, but a big waste of money > > for others. > > > The StoneSoup BeoClusters will always be the best approach for many as > others have pointed out here due to budget, reuse and > management/organizational constraints. > > > Also, the "lego we use" in the Beowulf community is largely > > what is produced for other much larger markets and while we would like > > to see some things done differently, there is a reasonable > > trade-off between cost/flexibility/performance. > > The turn-key clusters of 16 nodes and greater targeted at high speed > with high bandwith/low latency interconnects seems to be where things > can be improved significantly. Maybe some new "legos" are needed. The new "legos" we get are usually out of our control and created through market forces much larger then Beowulf clusters. We are to some degree parasites. Indeed, there are many among us that have been burned on "proprietary legos" (legos that do work the other kids toys) and now will only use those legos we can get from the bigger markets because we know that they will be low cost and have an upgrade path and play well with others. > > > > > I find it interesting that in all the talk of P4 systems > > there is little discussion about the Intel chips-set > > for the P4 only supporting 32 PCI. If you need anything > > other than Fast Ethernet this could be a real drawback > > (i.e. an imbalanced system with bottle necks) > > The new Xeon chip set has 2 64-bit slots. Of course there > > are other chip-sets on the immediate horizon, but > > the market seems to be making a clear differentiation about the > > "desktop" vs. "server" product. > > Some applications will churn away at a piece of data for hours or days > before spitting out a few bits to pass on or compare to what the other > nodes have as results. A 300bps link between nodes may be adequate here. > Other applications may only run through a few CPU cycles before passing > on a chunk of data where even 10Gb/sec interconnections are bottlenecks. > It's this high speed end of clustering where I see a need for improvements. And in some cases many slower less expensive CPUs may be better than expensive faster ones. With clusters, focusing on one part of the system (usually the cpu) can be a dangerous. Other aspects of the system from software to hardware need to be considered. > > I have seen interest in the P4 for it's raw performance only and not for > the lack of current support for DDR and 64/66 PCI. It's all around the > corner though from all the chipset vendors. Infinband will really make a > big difference for high speed and high bandwidth applications when it > comes standard in chipsets. An increase of processor speed without a corresponding increase in network speed can reduce scalability of an application. Of course it depends on the application. If you are doing rendering, then you will not see the effect. If you are calculating molecular orientation of some kind, you would be be well rewarded to consider a balanced system. 
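
The scalability point above can be made concrete with a back-of-the-envelope model: if the time a node spends communicating per step stays fixed while its compute time shrinks, parallel efficiency falls. The Python sketch below uses invented numbers purely for illustration, not measurements of any real code.

    # Rough illustration (made-up numbers, not measurements): if per-node
    # compute time shrinks but communication time per step stays fixed,
    # parallel efficiency drops as CPUs get faster.

    def efficiency(t_compute, t_comm):
        """Fraction of wall-clock time spent doing useful computation."""
        return t_compute / (t_compute + t_comm)

    t_comm = 0.2  # seconds per step spent in communication (assumed fixed)

    for speedup in (1.0, 1.5, 2.0, 4.0):   # hypothetical CPU speed increases
        t_compute = 1.0 / speedup          # baseline: 1 second of compute per step
        print(f"CPU {speedup:.1f}x faster -> efficiency {efficiency(t_compute, t_comm):.0%}")

    # CPU 1.0x faster -> efficiency 83%
    # CPU 4.0x faster -> efficiency 56%
    # i.e. a 4x faster CPU buys much less than 4x in overall throughput
    # unless the interconnect speeds up too.
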
Doug > > Bari > > > -- ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From bari at onelabs.com Tue May 29 10:02:06 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:22 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) References: Message-ID: <3B13D60E.80909@onelabs.com> Douglas Eadline wrote: > >> The turn-key clusters of 16 nodes and greater targeted at high speed >> with high bandwith/low latency interconnects seems to be where things >> can be improved significantly. Maybe some new "legos" are needed. > > > The new "legos" we get are usually out of our control and > created through market forces much larger then Beowulf clusters. > We are to some degree parasites. Indeed, there are many among us > that have been burned on "proprietary legos" (legos that > do work the other kids toys) and now will only use those legos > we can get from the bigger markets because we know that they > will be low cost and have an upgrade path and play well with others. > > > And in some cases many slower less expensive CPUs may be better than > expensive faster ones. With clusters, focusing on one part of the system > (usually the cpu) can be a dangerous. Other aspects of the > system from software to hardware need to be considered. > Low cost "legos" with control in the hands of the fringe cluster integrators is a challenge most vendors with market forces much larger than BeoClusters won't or can't offer. An elegant dual P4 or quad K7 ATX mainboard that operates in a 1U enclosure is what you'll probably not find anywhere on Pricewatch but will be available from other sources that will not violate any of the Beowulf pledge to affordability, upgradeability, reusability and flexibility. Very low cost nodes can also be built using OTS system-on-chip components not much larger than a deck of cards, simliar to what's becoming the mainstream in the bowels of PDA, IA and handheld computers. Bari From dvos12 at calvin.edu Tue May 29 11:19:27 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:22 2009 Subject: KVM Switch In-Reply-To: Message-ID: Actually the problem is a little worse. Every time you plug a PS/2 keyboard into a motherboard, you have a chance of ruining the motherboard. With good, digital KVM's they simulate a constant connection so that the computer never realizes the keyboard's been unplugged. We're using the Belkin OmniView PRO 16-Port that was recommended earlier, and it has been working pretty well for us. David On Mon, 21 May 2001 alvin@Mail.Linux-Consulting.com wrote: > > hi ... > > it depends on the motherboard and the PS/2 mouse... > some mouse wont hang the motherboard when the mouse is unplugged... > from whatw we've seen... > unplugging can also be switching between mb with KVMs > > c ya > alvin > http://www.Linux-1U.net > > On Sat, 12 May 2001, Gerry Creager N5JXS wrote: > > > We/ve been using the Belkin switch in our operations, both for my > > cluster and for a lot of other departmental stacks of PCs. They seem to > > be just about bulletproof, but are certainly not the cheapeest ones out > > there. > > they seem to be less suspceptible > > > > > > > Yes, I found this out the hard way. Abit VP6 hangs if you unplug/plug > > > PS/2 mouse. 
> > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From dvos12 at calvin.edu Tue May 29 12:55:09 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:22 2009 Subject: Gaussian 98 Message-ID: Some professors are interested in using Gaussian 98 on our Beowulf. I'm researching to find out what would all be involved to get it working. I remember people talking about this awhile back, but I think the discussion was during the time the archives weren't working, because I couldn't find much on them. It is a 16 node Beowulf running Scyld 27BZ-7, and people will be viewing the results on windows machines. 1. What kind of licensing is envolved? Can we just get a site license for Gaussian 98 or do we also need to get a site license for GaussViewW also? If we have to pay extra for GaussView W, then we might just use a free web interface being developed at Hope College. Does the license for BLAS cost extra? 2. What compilers are needed to use Gaussian 98? www.gausian.com says that I need the Portland Group Fortran compiler. Why? Would it not work with g77? 3. What does it take to install it? Do I just run make, or do I have to do something special to run on a Beowulf. Is is an MPI app or something? 4. Are there any web sites someone could point me to that deal with Gaussian 98 on a Beowulf? I know that quite a few people have done this. David From ksfacinelli at yahoo.com Tue May 29 14:04:51 2001 From: ksfacinelli at yahoo.com (Kevin Facinelli) Date: Wed Nov 25 01:01:22 2009 Subject: Intel is finally shipping the 64-bit Itanium In-Reply-To: <3B117236.6ADDE94A@paralleldata.com> Message-ID: <20010529210451.81812.qmail@web13506.mail.yahoo.com> I once saw a chart that comparied diferent aspects of processors: Speed, Power Consumption, Heat, Transitors...ect Does anyone know where I can find this type of summary chart??? Kevin ===== Kevin Facinelli www.colosource.com webmaster@colosource.com __________________________________________________ Do You Yahoo!? Yahoo! Auctions - buy the things you want at great prices http://auctions.yahoo.com/ From hahn at coffee.psychology.mcmaster.ca Tue May 29 16:11:13 2001 From: hahn at coffee.psychology.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:22 2009 Subject: Disk reliability (Was: Node cloning) In-Reply-To: <3B13A8E9.1F2B7CBD@icase.edu> Message-ID: > P.S. You need at least UltraDMA mode 2 to get CRC protection (=> you do > not want to turn off DMA without a really good reason). Lower modes > (PIO or multiword DMA) do not have CRC protection. afaik, there do exist UDMA modes 0 and 1 - they're not commonly used, but they're valid, and should definitely have the same benefit of CRC. 
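
For reference, the 16-bit CRC discussed in this thread is generated by the CCITT polynomial x^16 + x^12 + x^5 + 1 (0x1021 in the usual shorthand). The short Python sketch below computes that CRC bit-serially; it is meant only to illustrate the check itself, and does not reproduce the initial value or exact bit/byte ordering an actual ATA implementation uses.

    # Minimal bit-serial CRC-16 sketch using the CCITT generator
    # x^16 + x^12 + x^5 + 1 (0x1021), the polynomial class UltraDMA uses for
    # burst protection.  The initial value and exact ordering of a real ATA
    # implementation are not reproduced here.

    def crc16_ccitt(data: bytes, crc: int = 0x0000) -> int:
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                if crc & 0x8000:
                    crc = ((crc << 1) ^ 0x1021) & 0xFFFF
                else:
                    crc = (crc << 1) & 0xFFFF
        return crc

    if __name__ == "__main__":
        block = bytes(range(256))
        print(hex(crc16_ccitt(block)))          # CRC of an intact block
        corrupted = bytes([block[0] ^ 0x01]) + block[1:]
        print(hex(crc16_ccitt(corrupted)))      # a single flipped bit gives a different CRC

Any single-bit error changes a CRC with this generator, which is why a climbing BadCRC count is a usable health indicator for the cable even though the check is only 16 bits wide.
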
From lindahl at conservativecomputer.com Tue May 29 16:44:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: Intel is finally shipping the 64-bit Itanium In-Reply-To: ; from bob@drzyzgula.org on Sun, May 27, 2001 at 08:33:38PM -0400 References: <3B114BC2.1050205@onelabs.com> Message-ID: <20010529194430.A3353@wumpus.dhcp.fnal.gov> On Sun, May 27, 2001 at 08:33:38PM -0400, Bob Drzyzgula wrote: > Although there are several high-profile and many lower-profile instances of > whole clusters being purchased and delivered on a single manifest, my > impression is that the vast majority of Beowulf clusters continue to be > built on an ad-hoc, piece-meal basis (am I wrong about this?). Beats me: are you counting money, or number of clusters? If you're counting the number of clusters, 2-3 machine clusters in people's basements are probably a majority. If you're counting money, you might be surprised, but there's no good way to get a total for either category. I figure it's not worth arguing. -- greg From lindahl at conservativecomputer.com Tue May 29 16:45:43 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: ; from deadline@plogic.com on Tue, May 29, 2001 at 09:53:05AM -0400 References: <3B0BFEDB.3080805@onelabs.com> Message-ID: <20010529194543.B3353@wumpus.dhcp.fnal.gov> On Tue, May 29, 2001 at 09:53:05AM -0400, Douglas Eadline wrote: > I find it interesting that in all the talk of P4 systems > there is little discussion about the Intel chips-set > for the P4 only supporting 32 PCI. It certainly has been noted by Myrinet sites; it's a bit of a pain. But I suspect the numbers will play out that I'll be doing dual P4 systems, which don't have that little limitation. -- g From lindahl at conservativecomputer.com Tue May 29 16:50:10 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: Help on cluster hang problem... In-Reply-To: <200105270523.AAA23169@sijer.mayo.edu>; from crhea@mayo.edu on Sun, May 27, 2001 at 12:23:42AM -0500 References: <200105270523.AAA23169@sijer.mayo.edu> Message-ID: <20010529195010.C3353@wumpus.dhcp.fnal.gov> On Sun, May 27, 2001 at 12:23:42AM -0500, Cris Rhea wrote: > "Hard-hang" means nothing on console, disk light on solid, doesn't > respond to reset or power switches- have to reset by pulling plug. That's kind of weird. Now the "power switch" on modern systems is a toggle switch that signals the power supply that you'd like it to change state. If it's ignoring that, then I'd suspect you've got something really wrong, along the lines of you don't have a power supply that can supply peak need for the system under weird load. You can test this by moving the power supply to a new node and see if it does it there. Nasty work, yes, but it would prove the point. Or, maybe I don't understand power switches and it actually is bios catchable or something. -- g From abc at quux.chem.emory.edu Tue May 29 19:00:19 2001 From: abc at quux.chem.emory.edu (Ben Cornett) Date: Wed Nov 25 01:01:22 2009 Subject: Gaussian 98 In-Reply-To: ; from dvos12@calvin.edu on Tue, May 29, 2001 at 03:55:09PM -0400 References: Message-ID: <20010529220019.A22084@quux.chem.emory.edu.> On Tue, May 29, 2001 at 03:55:09PM -0400, David Vos wrote: > Some professors are interested in using Gaussian 98 on our Beowulf. I'm > researching to find out what would all be involved to get it working. 
I > remember people talking about this awhile back, but I think the discussion > was during the time the archives weren't working, because I couldn't find > much on them. > > It is a 16 node Beowulf running Scyld 27BZ-7, and people will be viewing > the results on windows machines. > > 1. What kind of licensing is envolved? Can we just get a site license for > Gaussian 98 or do we also need to get a site license for GaussViewW > also? If we have to pay extra for GaussView W, then we might just use a > free web interface being developed at Hope College. You don't need GaussView to run Gaussian. GaussView just gives you a nice way of setting up jobs and visuallizing output. > > Does the license for BLAS cost extra? > > 2. What compilers are needed to use Gaussian 98? www.gausian.com says > that I need the Portland Group Fortran compiler. Why? Would it not work > with g77? > > 3. What does it take to install it? Do I just run make, or do I have to > do something special to run on a Beowulf. Is is an MPI app or something? Gaussian uses the Linda parallel programming environment. It's not an MPI app. > > 4. Are there any web sites someone could point me to that deal with > Gaussian 98 on a Beowulf? I know that quite a few people have done this. > > David > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dvos12 at calvin.edu Tue May 29 19:22:04 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:22 2009 Subject: Gaussian 98 In-Reply-To: <20010529220019.A22084@quux.chem.emory.edu.> Message-ID: On Tue, 29 May 2001, Ben Cornett wrote: > You don't need GaussView to run Gaussian. GaussView just gives you a nice way > of setting up jobs and visuallizing output. Yeah, but you still need to visuallize the output. > Gaussian uses the Linda parallel programming environment. It's not an > MPI app. From dvos12 at calvin.edu Tue May 29 19:26:39 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:22 2009 Subject: Help on cluster hang problem... In-Reply-To: <20010529195010.C3353@wumpus.dhcp.fnal.gov> Message-ID: Hmmm. I've seen Windows do that to enough computers I doubt the problem is the power supply. Although to make Linux hang like that is usually a hardware problem. David On Tue, 29 May 2001, Greg Lindahl wrote: > On Sun, May 27, 2001 at 12:23:42AM -0500, Cris Rhea wrote: > > > "Hard-hang" means nothing on console, disk light on solid, doesn't > > respond to reset or power switches- have to reset by pulling plug. > > That's kind of weird. Now the "power switch" on modern systems is a > toggle switch that signals the power supply that you'd like it to > change state. If it's ignoring that, then I'd suspect you've got > something really wrong, along the lines of you don't have a power > supply that can supply peak need for the system under weird load. > > You can test this by moving the power supply to a new node and see if > it does it there. Nasty work, yes, but it would prove the point. > > Or, maybe I don't understand power switches and it actually is bios > catchable or something. 
> > -- g > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From lindahl at conservativecomputer.com Tue May 29 19:54:37 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: Help on cluster hang problem... In-Reply-To: ; from dvos12@calvin.edu on Tue, May 29, 2001 at 10:26:39PM -0400 References: <20010529195010.C3353@wumpus.dhcp.fnal.gov> Message-ID: <20010529225437.A1992@wumpus> On Tue, May 29, 2001 at 10:26:39PM -0400, David Vos wrote: > Hmmm. I've seen Windows do that to enough computers I doubt the problem > is the power supply. Although to make Linux hang like that is usually a > hardware problem. If it weren't for the "power button doesn't work", which I haven't seen before, I'd certainly agree that it's likely a random hardware problem. You've seen Windows hang machines to the point where the power button doesn't do anything? I never have. But then again I don't use Windows much. -- g From edwards at icantbelieveimdoingthis.com Tue May 29 20:42:40 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:22 2009 Subject: Scyld Message-ID: <20010529214240.B25886@icantbelieveimdoingthis.com> I want to thank Sean Dilda and Keith Undd for pointing to ways to use a Scyld cluster without running jobs on the head node and taking advantage of local disk. The remedy for the latter is quite straightforward. However, the use of p4pg files is virtually unworkable in a multiuser environment. To avoid crowding the head node, each user would have to query all the nodes to find the inactive ones, writea prpg file specifying which nodes they want, and deposit it on the node they decide will be node0 for their run. The chances for collision are finite. In coming releases of Scyld I hope the -nolocal option is activated. This is, to my mind, the cleaner way to get off of the head node. Art Edwards -- Arthur H. Edwards 712 Valencia Dr. NE Abq. NM 87108 (505) 256-0834 From rsand at d.umn.edu Wed May 30 04:49:50 2001 From: rsand at d.umn.edu (Robert Sand) Date: Wed Nov 25 01:01:22 2009 Subject: PVM with the scyld cluster. Message-ID: <3B14DE5E.B5097734@d.umn.edu> Hello all, I have a customer that is more familiar with using pvm rather than mpi so I need some instructions on how to get pvm working with the SCYLD cluster. Is there anyone out there using pvm on a scyld cluster and if so can I get instructions to get pvm to work with the cluster? -- Robert Sand. mailto:rsand@d.umn.edu US Mail University of Minnesota Duluth 10 University Dr. Information Technology Systems and Services MWAH 176 144 MWAH Duluth, MN 55812 Phone 218-726-6122 fax 218-726-7674 "Walk behind me I may not lead, Walk in front of me I may not follow, Walk beside me and we walk together" UTE Tribal proverb. From deadline at plogic.com Wed May 30 06:42:02 2001 From: deadline at plogic.com (Douglas Eadline) Date: Wed Nov 25 01:01:22 2009 Subject: custom cluster cabinets (was Re: 1U P4 Systems) In-Reply-To: <20010529194543.B3353@wumpus.dhcp.fnal.gov> Message-ID: On Tue, 29 May 2001, Greg Lindahl wrote: > On Tue, May 29, 2001 at 09:53:05AM -0400, Douglas Eadline wrote: > > > I find it interesting that in all the talk of P4 systems > > there is little discussion about the Intel chips-set > > for the P4 only supporting 32 PCI. > > It certainly has been noted by Myrinet sites; it's a bit of a pain. 
> But I suspect the numbers will play out that I'll be doing dual P4 > systems, which don't have that little limitation. BTW: My understanding is that Intel is now referring to the dual P4 (those CPUS that can be used on dual motherboards) as Xeon. They are not called P4-Xeon, just Xeon. In terms of price the are about the same as P4's. They do not have larger caches, but allow dual CPUs and have the "Netburst" thingie. (http://www.tomshardware.com/technews/technews-20010521.html) So it looks more like some kind of Intel marketing thing to move users of dual systems to to "Xeon" platforms. I suspect a little bait and switch is coming (i.e. higher priced Xeons once we all have our dual "Xeon" systems) It will be interesting to see how the dual Athlon 4 effects this strategy. Doug > > -- g > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ------------------------------------------------------------------- Paralogic, Inc. | PEAK | Voice:+610.814.2800 130 Webster Street | PARALLEL | Fax:+610.814.5844 Bethlehem, PA 18015 USA | PERFORMANCE | http://www.plogic.com ------------------------------------------------------------------- From wiseowl at accessgate.net Wed May 30 07:28:43 2001 From: wiseowl at accessgate.net (Douglas M. Shubert) Date: Wed Nov 25 01:01:22 2009 Subject: Intel is finally shipping the 64-bit Itanium References: <20010529210451.81812.qmail@web13506.mail.yahoo.com> Message-ID: <3B15039B.607D7180@accessgate.net> looks like Intel is benching the Itanium to the UltraSparc II. http://www.intel.com/eBusiness/products/ia64/overview/bm012101.htm > I once saw a chart that comparied diferent aspects of > processors: Speed, Power Consumption, Heat, > Transitors...ect > > Does anyone know where I can find this type of summary > chart??? > > Kevin > > ===== > Kevin Facinelli > www.colosource.com > webmaster@colosource.com > > __________________________________________________ > Do You Yahoo!? > Yahoo! Auctions - buy the things you want at great prices > http://auctions.yahoo.com/ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Help find a cure. Sponsor a computer today! http://www.computers4acure.org From rgb at phy.duke.edu Wed May 30 07:35:43 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:22 2009 Subject: Help on cluster hang problem... In-Reply-To: Message-ID: On Tue, 29 May 2001, David Vos wrote: > Hmmm. I've seen Windows do that to enough computers I doubt the problem > is the power supply. Although to make Linux hang like that is usually a > hardware problem. > > David > > On Tue, 29 May 2001, Greg Lindahl wrote: > > > Or, maybe I don't understand power switches and it actually is bios > > catchable or something. I don't think it is Linux or Windows -- I think it is just a mismatch between the power supply capacity and the hardware configuration. I've definitely seen difficulty with an ATX system turning itself on and off with an inferior power supply -- I once had to go through three on a brand new system to get the damn thing to power up as the vendor clearly hadn't read the motherboard spec (or powered it up before shipping it, grrr). 
Note that all three had the proper lineout voltages (I checked) -- they simply didn't have the peak power capacity required to do the switching. Thus by "inferior" I mean unable to provide the >>peak<< current required on the switching line to make an ATX board (given the hardware loading of the total system configuration) turn on or off, not that there is anything necessarily "wrong with" or cheap about the power supply itself. Note also that the supply can actually have plenty of nominal capacity measured in aggregate watts -- it is its ability to deliver power on ONE LINE that matters. Since I tend to get the cheapest possible systems, I probably see this more often than some. In my own experience, it is not at all unusual that the front panel toggle (which is the thing that controls this) on a "loaded" hardware configurations can turn the system on when it is basically unloaded but cannot seem to turn the system off when running (presumably it could provide enough juice for the first with the system "off", but when the motherboard is under even idle load it cannot manage the second). I've got a couple of these systems sitting in the room with me right now. One is "loaded" -- CD-RW, a couple of HD's, a floppy, dual CPUs, lots of memory, a NIC, a high end video card. The other isn't as loaded but has an older motherboard and a smaller case and power supply. They run only Linux -- this isn't an OS issue. Motherboards often come with their switching current requirements indicated somewhere in the technical specs, but given the vast range of motherboards, cases and "generic" power supplies, and hardware configurations within the case (with every element making its own demands, in many cases with e.g. NICs powered up even before the system is turned "on") it really isn't that surprising that some systems are mismatched or end up operating on the margins of the switching power range. Systems that have a hard time on the front panel switch also generally can't manage to do a proper powerdown "halt" in software. A LOT of systems come with both the front panel "hot" switch and a rocker switch on the power supply itself, and even if the front panel switch is tired and doesn't want to turn a system off the back one always works. So does pulling the plug;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From djanies at amnh.org Wed May 30 08:19:39 2001 From: djanies at amnh.org (djanies@amnh.org) Date: Wed Nov 25 01:01:22 2009 Subject: parallelism conference AMNH NYC Message-ID: NEW DIRECTIONS IN CLUSTER SUPERCOMPUTING Convened by: American Museum of Natural History in collaboration with National Aeronautics and Space Administration (Ames Research Center, Office of Fundamental Biology Program) Wednesday and Thursday, June 13 and 14, 2001 American Museum of Natural History Central Park West at 79th Street New York, NY 10024 Registration fee: $10 Over the last ten years, parallel supercomputer machines have come to prominence in computation. Research areas such as astrophysics and genomics generate huge data sets of immense complexity. Only this new computing paradigm can give the scientific research community the computational power to make sense of the flood of data these fields present. 
Over two days, the diverse industries that make up this community will gather to discuss the effect of supercomputing on biology, astrophysics, and research, as well as air traffic, finance and the entertainment world. In addition to AMNH and NASA Ames, IT professionals and scientists from Celera, The Dogma Project, IBM, National Cancer Institute, National Center for Supercomputing Applications, Stanford University and Brigham Young University and will be speaking on a diverse range of topics. Sponsored by: Compaq For a full agenda, or to register, click here http://www.amnh.org/supercomputing/?src=CSCL Daniel Janies, PhD American Museum of Natural History Division of Invertebrates Central Park West at 79th Street New York, NY 10024 212-313-7538 voice 212-769-5277 fax http://research.amnh.org/~djanies ftp://ftp.amnh.org/pub/molecular From math at velocet.ca Wed May 30 08:36:07 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:22 2009 Subject: cheap DDR systems Message-ID: <20010530113607.K1360@velocet.ca> Well whatever corollary of Moore's law that talks about prices of technology dropping exponentially has struck: PcChips M817 LMR board for around $145 CDN. On board 100BTX ether, but unlike the M810, no on board video (dang!). And, according to the confusing website at www.crucial.com they supposedly have DDR ram for very cheap, like 256Mb DDR PC2100 for $65 US (but I cant confirm this - what's a 'memory upgrade', do I buy something from them then upgrade it?). Putting these two together with a 1.2Ghz Tbird and a cheapass videocard (I havent tested wether the board boots without a videocard, would be nice, but is unlikley) gives you a diskless node with 512Mb for around $700 CDN (build yer own cabinet out of sheetmetal). Not a month ago people were talking about full 512Mb DDR systems for $1700 CDN (depends on the HD you throw in - these boards have at least ATA66 on them). I found about a 1.5x speed increase of the 1.2Ghz Tbird w/512Mb DDR over the 900Mhz Tbird w/512Mb PC133 for our Gaussian98 work. Now that this 1.5x advantage is no longer a 2.3x cost premium (but more like 1.3x if all these prices pan out), and the addition of videocards to my design isnt a huge cost increase (Im searching the net to order 1Mb PCI vidcards to just boot these things - cost ~ $15 CDN), then this is seemingly viable. Price estimates subject to bogosity as the CDN$ falls into the toilet... If anyone can confirm crucial.com's ordering stuff, let me know. I might just order 512Mb just as a test for the company and see where I get... /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From agrajag at scyld.com Wed May 30 09:15:50 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:22 2009 Subject: cheap DDR systems In-Reply-To: <20010530113607.K1360@velocet.ca>; from math@velocet.ca on Wed, May 30, 2001 at 11:36:07AM -0400 References: <20010530113607.K1360@velocet.ca> Message-ID: <20010530121550.A6605@blueraja.scyld.com> On Wed, 30 May 2001, Velocet wrote: > And, according to the confusing website at www.crucial.com they supposedly > have DDR ram for very cheap, like 256Mb DDR PC2100 for $65 US (but I cant > confirm this - what's a 'memory upgrade', do I buy something from them > then upgrade it?). 'memory upgrade' is marketing playing tricks on you. Its just a standard stick of RAM. I assume the use the word 'upgrade' as they think you'll only be using it to upgrade existing systems. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010530/9d955589/attachment.bin From bari at onelabs.com Wed May 30 09:23:30 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:22 2009 Subject: cheap DDR systems References: <20010530113607.K1360@velocet.ca> <20010530121550.A6605@blueraja.scyld.com> Message-ID: <3B151E82.8020605@onelabs.com> Sean Dilda wrote: > On Wed, 30 May 2001, Velocet wrote: > > >> And, according to the confusing website at www.crucial.com they supposedly >> have DDR ram for very cheap, like 256Mb DDR PC2100 for $65 US (but I cant >> confirm this - what's a 'memory upgrade', do I buy something from them >> then upgrade it?). > > > 'memory upgrade' is marketing playing tricks on you. Its just a > standard stick of RAM. I assume the use the word 'upgrade' as they > think you'll only be using it to upgrade existing systems. Take a look at http://www.pricewatch.com/ There are at least a dozen vendors with 256MB DDR PC2100 for under $70 US and 256MB DDR PC2400 for $107 - $ 130. Bari From rgb at phy.duke.edu Wed May 30 09:29:26 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:22 2009 Subject: cheap DDR systems In-Reply-To: <20010530121550.A6605@blueraja.scyld.com> Message-ID: On Wed, 30 May 2001, Sean Dilda wrote: > On Wed, 30 May 2001, Velocet wrote: > > > And, according to the confusing website at www.crucial.com they supposedly > > have DDR ram for very cheap, like 256Mb DDR PC2100 for $65 US (but I cant > > confirm this - what's a 'memory upgrade', do I buy something from them > > then upgrade it?). > > 'memory upgrade' is marketing playing tricks on you. Its just a > standard stick of RAM. I assume the use the word 'upgrade' as they > think you'll only be using it to upgrade existing systems. Now if they would just start dropping the prices of the 512 MB DDR in a similar way, life would be good. Very good, actually. It's too late for my last cluster upgrade, but it is just right on time for the next two... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From edwards at icantbelieveimdoingthis.com Wed May 30 08:59:46 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:22 2009 Subject: Scyld In-Reply-To: <015501c0e8c7$52534a20$dc61fea9@beejay>; from bobbyh@MPI-Softtech.Com on Wed, May 30, 2001 at 12:13:52AM -0500 References: <20010529214240.B25886@icantbelieveimdoingthis.com> <015501c0e8c7$52534a20$dc61fea9@beejay> Message-ID: <20010530095946.A27701@icantbelieveimdoingthis.com> Bobby: Thanks for the reply. On Wed, May 30, 2001 at 12:13:52AM -0500, Bobby Hunter wrote: > Art, > > We have a version of MPI/Pro for Scyld that uses a "node" file, much in the > same way one uses a machine file. You simply list the nodes in a file. i.e. > 1 > 2 > 3 > 6 > 8 Does your system choose from those nodes not in use but that are in your machine file? > and then launch from the head node using > 'mpirun -np 5 -mach_file myapp' > You can download a copy at www.mpi-softtech.com/downloads . The versions > available for download are free for up to 16 way jobs. > > Regards, > > Bobby > ------------- > > Bobby Hunter > Software Engineer > MPI Software Technology, Inc. > 662-320-4300 x. 
16 > > ----- Original Message ----- > From: "Art Edwards" > To: > Cc: "Art Edwards" > Sent: Tuesday, May 29, 2001 10:42 PM > Subject: Scyld > > > > I want to thank Sean Dilda and Keith Undd for pointing to ways to use a > Scyld > > cluster without running jobs on the head node and taking advantage of > local > > disk. The remedy for the latter is quite straightforward. However, the use > > of p4pg files is virtually unworkable in a multiuser environment. To avoid > > crowding the head node, each user would have to query all the nodes to > find the > > inactive ones, writea prpg file specifying which nodes they want, and > deposit it > > on the node they decide will be node0 for their run. The chances for > collision > > are finite. In coming releases of Scyld I hope the -nolocal option is > activated. > > This is, to my mind, the cleaner way to get off of the head node. > > > > Art Edwards > > -- > > Arthur H. Edwards > > 712 Valencia Dr. NE > > Abq. NM 87108 > > > > (505) 256-0834 > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > Art Edwards -- Arthur H. Edwards 712 Valencia Dr. NE Abq. NM 87108 (505) 256-0834 From bari at onelabs.com Wed May 30 09:55:47 2001 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:01:22 2009 Subject: Itanium Processor and i460 Chipset Specs Message-ID: <3B152613.5070201@onelabs.com> FYI .... Intel finally released the Itanium specs today http://developer.intel.com/design/ia-64/ Bari From dvos12 at calvin.edu Wed May 30 10:18:01 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:22 2009 Subject: Help on cluster hang problem... In-Reply-To: Message-ID: On Wed, 30 May 2001, Robert G. Brown wrote: > A LOT of systems come with both the front panel "hot" switch and a > rocker switch on the power supply itself, and even if the front panel > switch is tired and doesn't want to turn a system off the back one > always works. So does pulling the plug;-) It should also work to just hold the power-switch in for 4 seconds. > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > From josip at icase.edu Wed May 30 11:11:25 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:22 2009 Subject: Help on cluster hang problem... References: Message-ID: <3B1537CD.6CEB6489@icase.edu> "Robert G. Brown" wrote: > > On Tue, 29 May 2001, David Vos wrote: > > > Hmmm. I've seen Windows do that to enough computers I doubt the problem > > is the power supply. Although to make Linux hang like that is usually a > > hardware problem. > > I don't think it is Linux or Windows -- I think it is just a mismatch > between the power supply capacity and the hardware configuration. I've seen both types of failures: (1) insufficient power supply capacity (fixed by upgrading to 400W power supplies) and (2) total machine crash where even the power button (pressed >5sec) did not work (rare but not fixed; typically caused by malfunctioning applications using VIA userspace access to devices). ATX power is under software control. In the case (1) the power supply can drop its 'power good' signal and the machine shuts off. In the case (2) the CPU fails to tell the power supply to shut off. 
The power switch is just a momentary contact switch, which the PC is supposed to read, and then interpret the length of time the switch was closed as 'suspend' or 'power off' requests (this is usually defined in BIOS), then send the appropriate signal to the power supply. When the machine is totally crashed, this process cannot be carried out as intended. Unfortunately, inexpensive ATX power supplies seldom include normal power switches. Of course, pulling the power cord always works :-) Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From cblack at eragen.com Wed May 30 11:41:07 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:22 2009 Subject: make and cluster software Message-ID: <20010530144107.A3979@getafix.EraGen.com> We are trying to move our large sets of batch jobs (several thousand) into a make type environment. We already have a cluster running OpenPBS, but are looking at alternatives to PBS since we have been having problems with hangs and such. I am interested in MOSIX since it seems like it would integrate well with our multiple independant processes model and with make. But I also read in the MOSIX FAQ that java processes don't get migrated. Most of our programs are java code that links in fast native code for algorithms. I am not that familiar with MOSIX and I don't know if migrating processes would even be necessary for us. Ideally I would like to be able to do a: make -j50 and run 50 processes at a time. Right now we just have make execute a bunch of qsubs to queue jobs into PBS. We are also looking at Codine/GRD engine. So, does anyone have any suggestions? Any experience with MOSIX and make? Would this need the ability to migrate processes? Is MOSIX not a good solution for this type of application? Anyone else setting up make to use a cluster semi-transparently? Thanks, Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010530/cf34f479/attachment.bin From RSchilling at affiliatedhealth.org Wed May 30 12:03:37 2001 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed Nov 25 01:01:22 2009 Subject: make and cluster software Message-ID: <51FCCCF0C130D211BE550008C724149E01165632@mail1.affiliatedhealth.org> You might try the clusterit toolkit. It allows you to spawn jobs across several machines quite easily. You can find it on the FreeBSD web site (www.freebsd.org). Richard Schilling Web Integration Programmer/Webmaster phone: 360.856.7129 fax: 360.856.7166 URL: http://www.affiliatedhealth.org Affiliated Health Services Information Systems 1971 Highway 20 Mount Vernon, WA USA > -----Original Message----- > From: Chris Black [mailto:cblack@eragen.com] > Sent: Wednesday, May 30, 2001 11:41 AM > To: beowulf@beowulf.org > Subject: make and cluster software > > > We are trying to move our large sets of batch jobs (several thousand) > into a make type environment. We already have a cluster running > OpenPBS, but are looking at alternatives to PBS since we have been > having problems with hangs and such. I am interested in MOSIX since > it seems like it would integrate well with our multiple independant > processes model and with make. 
But I also read in the MOSIX FAQ that > java processes don't get migrated. Most of our programs are java > code that links in fast native code for algorithms. I am not that > familiar with MOSIX and I don't know if migrating processes would > even be necessary for us. Ideally I would like to be able to do a: > make -j50 > > and run 50 processes at a time. Right now we just have make execute > a bunch of qsubs to queue jobs into PBS. > We are also looking at Codine/GRD engine. > > So, does anyone have any suggestions? Any experience with MOSIX and > make? Would this need the ability to migrate processes? Is MOSIX not > a good solution for this type of application? Anyone else setting up > make to use a cluster semi-transparently? > > Thanks, > Chris > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010530/072f97a5/attachment.html From math at velocet.ca Wed May 30 12:20:15 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:22 2009 Subject: cheap DDR systems In-Reply-To: <3B151E82.8020605@onelabs.com>; from bari@onelabs.com on Wed, May 30, 2001 at 11:23:30AM -0500 References: <20010530113607.K1360@velocet.ca> <20010530121550.A6605@blueraja.scyld.com> <3B151E82.8020605@onelabs.com> Message-ID: <20010530152015.O1360@velocet.ca> On Wed, May 30, 2001 at 11:23:30AM -0500, Bari Ari's all... > Sean Dilda wrote: > > > On Wed, 30 May 2001, Velocet wrote: > > > > > >> And, according to the confusing website at www.crucial.com they supposedly > >> have DDR ram for very cheap, like 256Mb DDR PC2100 for $65 US (but I cant > >> confirm this - what's a 'memory upgrade', do I buy something from them > >> then upgrade it?). > > > > > > 'memory upgrade' is marketing playing tricks on you. Its just a > > standard stick of RAM. I assume the use the word 'upgrade' as they > > think you'll only be using it to upgrade existing systems. > > Take a look at http://www.pricewatch.com/ > > There are at least a dozen vendors with 256MB DDR PC2100 for under $70 > US and 256MB DDR PC2400 for $107 - $ 130. This says $65 USD. $65 << $107. Here's the direct paste from the page 184-pin DIMM 256MB DDR PC2100 CL=2.5 Non-parity CT3264Z265 $72.99 $65.69 $7.30 Yes (the last prices are "web price" and "you save"). Here's the URL. Seems bogus to me to be that far under market. http://www.crucial.com/store/PartSpecs.asp?imodule=CT3264Z265 Thats $USD. This converts to ~$102 CDN which might have confused people when I was talking about system prices. Lets stick to $USD for now. $65.69. /kc From wsb at paralleldata.com Wed May 30 13:00:09 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:22 2009 Subject: [Fwd: Intel is finally shipping the 64-bit Itanium] Message-ID: <3B155149.44A9041@paralleldata.com> "Douglas M. Shubert" wrote: > > looks like Intel is benching the Itanium to the UltraSparc II. > http://www.intel.com/eBusiness/products/ia64/overview/bm012101.htm > That SPECfp2000 of 711 is king of the hill at the moment it appears. The SPECint2000 is a tad slow at 404. I suspect the prices won't be palatable though. Wes From pu at ku.ac.th Wed May 30 12:45:54 2001 From: pu at ku.ac.th (Putchong Uthayopas) Date: Wed Nov 25 01:01:22 2009 Subject: make and cluster software References: <20010530144107.A3979@getafix.EraGen.com> Message-ID: <005501c0e941$2395f4b0$2f10dd8c@uthayopa> There is a tool called pvm make . I never use it but come across it sometime. Please checkout. 
http://203.162.44.72/books/system/pade/node1.html Putchong ----- Original Message ----- From: "Chris Black" To: Sent: Thursday, May 31, 2001 1:41 AM Subject: make and cluster software From pu at ku.ac.th Wed May 30 12:48:21 2001 From: pu at ku.ac.th (Putchong Uthayopas) Date: Wed Nov 25 01:01:22 2009 Subject: make and cluster software References: <20010530144107.A3979@getafix.EraGen.com> Message-ID: <005701c0e941$7b518520$2f10dd8c@uthayopa> Also, how to apply mpi todo that http://www.mpi.nd.edu/downloads/mpidc95/papers/html/devaney/Mmg.html ----- Original Message ----- From: "Chris Black" To: Sent: Thursday, May 31, 2001 1:41 AM Subject: make and cluster software From tim.carlson at pnl.gov Wed May 30 14:09:52 2001 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed Nov 25 01:01:22 2009 Subject: [Fwd: Intel is finally shipping the 64-bit Itanium] In-Reply-To: <3B155149.44A9041@paralleldata.com> Message-ID: On Wed, 30 May 2001, W Bauske wrote: > "Douglas M. Shubert" wrote: > > > > looks like Intel is benching the Itanium to the UltraSparc II. > > http://www.intel.com/eBusiness/products/ia64/overview/bm012101.htm > > > > That SPECfp2000 of 711 is king of the hill at the moment it appears. > > The SPECint2000 is a tad slow at 404. > > I suspect the prices won't be palatable though. Check http://www.hp.com/workstations/products/itanium/i2000/summary.html for pricing of the machine used in the above test. With a Dell Precision 530 workstation at 1.7Ghz giving specfp2000 of 593 and specint2000 of 575 for half the price... Why would you want to waste money on Itanium boxes in a cluster configuration? Tim Tim Carlson Voice: (509) 375-5978 Email: Tim.Carlson@pnl.gov From wsb at paralleldata.com Wed May 30 14:47:20 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:22 2009 Subject: [Fwd: Intel is finally shipping the 64-bit Itanium] References: Message-ID: <3B156A68.8BD61B2B@paralleldata.com> Tim Carlson wrote: > > On Wed, 30 May 2001, W Bauske wrote: > > > "Douglas M. Shubert" wrote: > > > > > > looks like Intel is benching the Itanium to the UltraSparc II. > > > http://www.intel.com/eBusiness/products/ia64/overview/bm012101.htm > > > > > > > That SPECfp2000 of 711 is king of the hill at the moment it appears. > > > > The SPECint2000 is a tad slow at 404. > > > > I suspect the prices won't be palatable though. > > Check http://www.hp.com/workstations/products/itanium/i2000/summary.html > for pricing of the machine used in the above test. > > With a Dell Precision 530 workstation at 1.7Ghz giving specfp2000 of 593 > and specint2000 of 575 for half the price... Why would you want to waste > money on Itanium boxes in a cluster configuration? > Compared to an Alpha 833Mhz, it's probably not so bad. Wes From omri at NMR.MGH.Harvard.EDU Wed May 30 14:43:54 2001 From: omri at NMR.MGH.Harvard.EDU (Omri Schwarz) Date: Wed Nov 25 01:01:22 2009 Subject: make and cluster software In-Reply-To: <005501c0e941$2395f4b0$2f10dd8c@uthayopa> Message-ID: You could also have makefiles that invoke OpenPBS. I have two perl script, pbsubmit and launch to do that (pbsubmit is a wrapper to qsub, and launch is a wrapper for jobs with dependecy relations): http://www.mit.edu/~ocschwar/pbsubmit http://www.mit.edu/~ocschwar/launch http://www.mit.edu/~ocschwar/launch.1 (man page) Omri Schwarz --- omri@nmr.mgh.harvard.edu Timeless wisdom of biomedical engineering: "Noise is principally due to the presence of the patient." -- R.F. 
Farr From cblack at eragen.com Wed May 30 14:59:44 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:22 2009 Subject: make and cluster software In-Reply-To: ; from omri@NMR.MGH.Harvard.EDU on Wed, May 30, 2001 at 05:43:54PM -0400 References: <005501c0e941$2395f4b0$2f10dd8c@uthayopa> Message-ID: <20010530175944.G3979@getafix.EraGen.com> On Wed, May 30, 2001 at 05:43:54PM -0400, Omri Schwarz wrote: > You could also have makefiles that invoke > OpenPBS. > > I have two perl script, pbsubmit and launch > to do that (pbsubmit is a wrapper to qsub, > and launch is a wrapper for jobs with > dependecy relations): > > http://www.mit.edu/~ocschwar/pbsubmit > http://www.mit.edu/~ocschwar/launch > http://www.mit.edu/~ocschwar/launch.1 (man page) > Our makefiles currently invoke a qsub wrapper for PBS. This works fine except that we are starting to have problems with PBS locking up. I have been on the PBS mailing lists for awhile and have tweaked some things that make it much better, but PBS just doesn't seem to handle having thousands of jobs in its queue very well. Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010530/b7ac0a79/attachment.bin From lindahl at conservativecomputer.com Wed May 30 15:05:13 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: [Fwd: Intel is finally shipping the 64-bit Itanium] In-Reply-To: ; from tim.carlson@pnl.gov on Wed, May 30, 2001 at 02:09:52PM -0700 References: <3B155149.44A9041@paralleldata.com> Message-ID: <20010530180513.A2237@wumpus> On Wed, May 30, 2001 at 02:09:52PM -0700, Tim Carlson wrote: > With a Dell Precision 530 workstation at 1.7Ghz giving specfp2000 of 593 > and specint2000 of 575 for half the price... Why would you want to waste > money on Itanium boxes in a cluster configuration? Because places like pnl.gov want 64-bit machines? ;-) Seriously: The peak of the Itanium is great if your application can get there, and a few codes like Lattice QCD can. You have more flexability in main memory size. But these benefits might not be enough to outweigh the higher cost than a P4 or Athlon or whatever. -- greg From Thomas_Hoeffel at chiron.com Wed May 30 16:39:13 2001 From: Thomas_Hoeffel at chiron.com (Hoeffel, Thomas) Date: Wed Nov 25 01:01:22 2009 Subject: Question: Task Farm and Private Networks. Message-ID: <938CCE0495C5D411AD9B0001027598DE01298F57@emvshiva.chiron.com> Hi, I currently have a small cluster in which the slave nodes are on a private network. It is used primarily as a task farm and not as a true parallel machine. Only the master node sees our other systems (which are on their own switch). This casues problems with certain remote job submissions via some commercial packages since they write both local temp files and scratch temp files. Question: What is the drawback to giving each slave it's own true IP address and allowing them to NFS mount the same file systems as the master node? Thanks Thomas J. Hoeffel Computational Chemistry Chiron Corporation 4560 Horton St. Emeryville, CA 94608 510.923.8346 office 510.923.2010 fax From wiseowl at accessgate.net Wed May 30 17:21:23 2001 From: wiseowl at accessgate.net (Douglas M. 
Shubert) Date: Wed Nov 25 01:01:22 2009 Subject: new hp 4108gl switch Message-ID: <3B158E83.62B42F76@accessgate.net> Since Intel dropped the 420T this may be the next best switch for port density. http://www.hp.com/rnd/products/switches/switch4108GL/summary.htm 72 10/100 ports and a switch fabric speed of 36.6Gbps. -Doug -- Help find a cure. Sponsor a computer today! http://www.computers4acure.org From andreas at amy.udd.htu.se Thu May 31 00:48:33 2001 From: andreas at amy.udd.htu.se (Andreas Boklund) Date: Wed Nov 25 01:01:22 2009 Subject: Question: Task Farm and Private Networks. In-Reply-To: <938CCE0495C5D411AD9B0001027598DE01298F57@emvshiva.chiron.com> Message-ID: > I currently have a small cluster in which the slave nodes are on a private > network. It is used primarily as a task farm and not as a true parallel > machine. Only the master node sees our other systems (which are on their > own switch). This casues problems with certain remote job submissions via > some commercial packages since they write both local temp files and scratch > temp files. > > Question: What is the drawback to giving each slave it's own true IP address > and allowing them to NFS mount the same file systems as the master node? > > Thanks Hi, I have a system where the nodes are on a private network and they mount the same NFS shares as all other UNIX computers on our network. The reason that we do this is that the users sometimes want stuff from their /home/ dirs. I didnt use true IP-adresses because our IP-adressspace is limited. What i did was just to place a NAT on the master node that translates the node adresses to the master nodes adress. Actually this was one of the reasons that i used the 2.3pre release kernels in this cluster, i wanted to use IP-tables. It is very easy to set it up this way. And if you dont want the traffic to go through the master node you can always add communications node that will only handle traffic to and from your backbone, a P 200 64MB ram should be able to forward enough traffic for two FastEthernet NICs, and i bet you can find one lying around somwehere. Best Regards //Andreas F-Lab Research ********************************************************* * Administator of Amy and Sfinx(Iris23) * * * * Voice: 070-7294401 * * ICQ: 12030399 * * Email: andreas@shtu.htu.se, boklund@linux.nu * * * * That is how you find me, How do -I- find you ? * ********************************************************* From rgb at phy.duke.edu Thu May 31 07:51:12 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:22 2009 Subject: Question: Task Farm and Private Networks. In-Reply-To: <938CCE0495C5D411AD9B0001027598DE01298F57@emvshiva.chiron.com> Message-ID: bOn Wed, 30 May 2001, Hoeffel, Thomas wrote: > Hi, > > I currently have a small cluster in which the slave nodes are on a private > network. It is used primarily as a task farm and not as a true parallel > machine. Only the master node sees our other systems (which are on their > own switch). This casues problems with certain remote job submissions via > some commercial packages since they write both local temp files and scratch > temp files. > > Question: What is the drawback to giving each slave it's own true IP address > and allowing them to NFS mount the same file systems as the master node? In a real "compute farm" (where the tasks are embarrassingly parallel and don't communicate) none that I can think of. Indeed, it is the only sane way to go. 
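For the NAT-on-the-master arrangement Andreas describes above, a 2.4 kernel with iptables needs little more than masquerading on the outward-facing interface plus packet forwarding turned on. A rough sketch, assuming eth0 on the master faces the department network, eth1 faces the private node network, and the nodes sit on 192.168.1.0/24 with the master at 192.168.1.1 (all of those interface names and addresses are illustrative):

    # on the master node (2.4.x kernel with netfilter/iptables)
    echo 1 > /proc/sys/net/ipv4/ip_forward
    # rewrite traffic from the private node network so it leaves eth0
    # carrying the master's public address
    iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE

    # on each node, send non-local traffic via the master
    route add default gw 192.168.1.1

NFS through a NAT box has its own quirks (UDP mounts, lock daemons), which is one more reason to weigh the flattened-network option discussed below against it.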
There are many kinds of clusters, only a few of which are true "beowulfs" in the narrow sense of the definition of the architecture. For the task mix you describe (lots of embarrassingly parallel work run as separate jobs on the various "nodes") there is very little benefit to using a true beowulf architecture and plenty of additional costs in the form of scripting solutions to problems that arise due to a lack of a shared filesystem and so forth. Yes, recent list discussion has shown that you "can" use a scyld beowulf as a compute farm; it has also shown that it is a bit clumsy and difficult to do so, so why bother? It should be very easy to flatten your network -- either connect the inner switch to the outer switch (rationalizing e.g. the IP space and routing and all that) or arrange for the master node to act as a router and pass the NFS mounts through it. The In most cases I think the former makes more sense; in a few (mostly when the master is idle enough that the overhead of its acting as a router isn't "expensive" in terms of time to complete work) the latter might. Pop a more or less standard linux on each node (remembering that the nodes are now openly accessible and hence need to be configured with probably only sshd open as a means of access to minimize security hassles). You can strip the node configuration a bit -- if they are headless they probably don't need X servers, for example, and can likely live without games, KDE and/or Gnome desktops and tools, mail, news, web browsers, and the like. If they have big disks, though, there isn't much point in stripping the configuration a lot -- heterogeneity in a network costs more money in time than extra space costs in disk. Users can then login to each node and run jobs, or a remote job submission package can do it for them or you can install MOSIX on the nodes and let them login to a single node to run jobs and let MOSIX migrate them around to balance load. You may still want a tool like procstatd to monitor load on the cluster, especially if users are logging into nodes to run their jobs -- it can easily reveal which nodes are idle and ready for more work. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jsmith at structbio.vanderbilt.edu Thu May 31 08:43:47 2001 From: jsmith at structbio.vanderbilt.edu (Jarrod Smith) Date: Wed Nov 25 01:01:22 2009 Subject: ethernet switch recommendation? Message-ID: <3B1666B3.24BF3EF8@structbio.vanderbilt.edu> I'm sorry to post such a question here, but I'm getting no advice (good or bad) from the folks around here :) I need a 24-port switch with a reasonable backplane fabric that is preferably rack-mountable. I'm interested in getting two of them and trying to use channel bonding on a 17-node cluster. We're using the supermicro 1610H, and it has dual Intel NICs onboard. I've looked at the Cisco catalyst 2924XL, but at $1300, I'm not sure it's the most economical solution that will get me what I need, which is fairly basic I think. I look forward to hearing some of your suggestions. -- Jarrod A. Smith Research Asst. Professor, Biochemistry Asst. 
Director, Center for Structural Biology Computation and Molecular Graphics Vanderbilt University jsmith@structbio.vanderbilt.edu From leunen.d at fsagx.ac.be Thu May 31 08:54:50 2001 From: leunen.d at fsagx.ac.be (David Leunen) Date: Wed Nov 25 01:01:22 2009 Subject: MPI performance test for Scyld Message-ID: <3B16694A.8BF688FD@fsagx.ac.be> Hello, I need a program that do a performance test of a cluster. I saw that there isn't such a program in the Scyld distribution (have I missed it?). Can you tell me where I can find either source code or executable that run for MPI-Beowulf of Scyld. Thank you. David From bogdan.costescu at iwr.uni-heidelberg.de Thu May 31 09:07:23 2001 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:01:22 2009 Subject: ethernet switch recommendation? In-Reply-To: <3B1666B3.24BF3EF8@structbio.vanderbilt.edu> Message-ID: On Thu, 31 May 2001, Jarrod Smith wrote: > I need a 24-port switch with a reasonable backplane fabric that is > preferably rack-mountable. I'm interested in getting two of them and trying > to use channel bonding on a 17-node cluster. We're using the supermicro > 1610H, and it has dual Intel NICs onboard. I've had good experience with Nortel's BayStack 350-24T and 3Com's SuperStack 3300 XM (without additional ports). They both come in "desktop" version, but are accompanied by rack mounting brackets and screws. Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From enano at fi.udc.es Thu May 31 09:25:59 2001 From: enano at fi.udc.es (Miguel Barreiro Paz) Date: Wed Nov 25 01:01:22 2009 Subject: ethernet switch recommendation? In-Reply-To: Message-ID: Hi, > I've had good experience with Nortel's BayStack 350-24T and 3Com's > SuperStack 3300 XM (without additional ports). They both come in "desktop" > version, but are accompanied by rack mounting brackets and screws. We've been using several SuperStack II 3300 for several years without problems, but note that their backplane is limited to 2.1Gbps (if Nortel ads are to be believed, that is). Maybe the newer SuperStack3 are faster. Regards, Miguel From lindahl at conservativecomputer.com Thu May 31 09:31:04 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: MPI performance test for Scyld In-Reply-To: <3B16694A.8BF688FD@fsagx.ac.be>; from leunen.d@fsagx.ac.be on Thu, May 31, 2001 at 05:54:50PM +0200 References: <3B16694A.8BF688FD@fsagx.ac.be> Message-ID: <20010531123104.B1662@wumpus> On Thu, May 31, 2001 at 05:54:50PM +0200, David Leunen wrote: > I need a program that do a performance test of a cluster. I saw that > there isn't such a program in the Scyld distribution (have I missed > it?). I would recommend the "mpptest" program distributed with mpich, for simple tests. A more comprehensive suite is available from Pallas or from SKaMPI. http://www.pallas.com/pages/pmbd.htm http://liinwww.ira.uka.de/~skampi/ -- g From dvos12 at calvin.edu Thu May 31 09:50:32 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:22 2009 Subject: MPI performance test for Scyld In-Reply-To: <3B16694A.8BF688FD@fsagx.ac.be> Message-ID: Scyld comes with linpack. /usr/bin/linpack is a script you can edit to run with different parameters, if you wish. 
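If you do edit that script, or run HPL directly, the main knob is the problem size N: for the best numbers the N by N double-precision matrix should fill most of the cluster's combined memory. A back-of-the-envelope sketch, with the node count and memory per node as placeholder figures:

    # rough HPL problem-size estimate: 8 bytes per matrix element,
    # aim for roughly 80% of total RAM so the run neither swaps nor dies
    NODES=16            # illustrative
    MB_PER_NODE=256     # illustrative
    N=`echo "sqrt(0.80 * $NODES * $MB_PER_NODE * 1024 * 1024 / 8)" | bc -l | cut -d. -f1`
    echo "try a problem size around N = $N"

Greg's point further down applies as well: on swapless nodes, overshooting N kills the run outright instead of merely slowing it down.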
David On Thu, 31 May 2001, David Leunen wrote: > Hello, > > I need a program that do a performance test of a cluster. I saw that > there isn't such a program in the Scyld distribution (have I missed > it?). > > Can you tell me where I can find either source code or executable that > run for MPI-Beowulf of Scyld. > > > Thank you. > > David > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From JParker at coinstar.com Thu May 31 09:52:55 2001 From: JParker at coinstar.com (JParker@coinstar.com) Date: Wed Nov 25 01:01:22 2009 Subject: How do I link a C program to a FORTRAN library Message-ID: G'Day ! Sorry if these are the wrong lists to post to but I am having difficulty in my search of docs, and you guys/gals are the most knowledgable ... Basically I am using a GTK/C frontend to a cluster app. The app needs to call routines in the SCALAPACK library for some matrix operations. I am using a Debian 2.2 install, and associated GCC and SCALAPACK versions (latest/greatest stable). Portabilty is not an issue. If this is not possible, are there C libraries that offer the same functionality ? Thanks for your help. cheers, Jim Parker Sailboat racing is not a matter of life and death .... It is far more important than that !!! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010531/77c522a0/attachment.html From leunen.d at fsagx.ac.be Thu May 31 09:56:21 2001 From: leunen.d at fsagx.ac.be (David Leunen) Date: Wed Nov 25 01:01:22 2009 Subject: MPI performance test for Scyld References: Message-ID: <3B1677B5.1D122CA5@fsagx.ac.be> > Scyld comes with linpack. /usr/bin/linpack is a script you can edit to > run with different parameters, if you wish. thank you, but linpack seems to be missing. I probably have a too old version of Scyld Beowulf (2.0 Preview Release). From sgaudet at angstrommicro.com Thu May 31 10:00:43 2001 From: sgaudet at angstrommicro.com (Steve Gaudet) Date: Wed Nov 25 01:01:22 2009 Subject: ethernet switch recommendation? In-Reply-To: <3B1666B3.24BF3EF8@structbio.vanderbilt.edu> References: <3B1666B3.24BF3EF8@structbio.vanderbilt.edu> Message-ID: <991328443.3b1678bb94f1b@localhost> Hello Jarrod, > I'm sorry to post such a question here, but I'm getting no advice (good > or > bad) from the folks around here :) > > I need a 24-port switch with a reasonable backplane fabric that is > preferably rack-mountable. I'm interested in getting two of them and > trying > to use channel bonding on a 17-node cluster. We're using the > supermicro > 1610H, and it has dual Intel NICs onboard. > > I've looked at the Cisco catalyst 2924XL, but at $1300, I'm not sure > it's > the most economical solution that will get me what I need, which is > fairly > basic I think. > > I look forward to hearing some of your suggestions. We've had good luck with Extreme Networks. http://www.extremenetworks.com/products/products.asp Cheers, Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com From rauch at inf.ethz.ch Thu May 31 10:35:31 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed Nov 25 01:01:22 2009 Subject: ethernet switch recommendation? 
In-Reply-To: <3B1666B3.24BF3EF8@structbio.vanderbilt.edu> Message-ID: On Thu, 31 May 2001, Jarrod Smith wrote: > I've looked at the Cisco catalyst 2924XL, but at $1300, I'm not sure > it's the most economical solution that will get me what I need, > which is fairly basic I think. I'm not very familiar with Cisco-Switches, but we once had to use a Catalyst 2900 XL and it sucked. The performance was gone as soon as more then 12 machines were communicating at full speed *Therefore we could not use the switch to install all the nodes with our cloning tool Dolly at reasonable speeds). We didn't have these problems with ATI CentreCom 742i switches, but I think they are in "desktop" cases. There are some performance figures from our own tests comparing the two switches: http://www.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Agg.pdf http://www.inf.ethz.ch/~rauch/tmp/FE.Catalyst2900XL.Stream.pdf http://www.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Agg.pdf http://www.inf.ethz.ch/~rauch/tmp/FE.CentreCom742i.Stream.pdf Regards, Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From frankmcb at microsoft.com Thu May 31 10:36:58 2001 From: frankmcb at microsoft.com (Frank McBath) Date: Wed Nov 25 01:01:22 2009 Subject: ethernet switch recommendation? Message-ID: <90FDFCE1CC02B147B4948A1F17CC182F0257B4F5@crd-msg-02.northamerica.corp.microsoft.com> for what it's worth... http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=1241681092 2924XL for $500 unused in box. -----Original Message----- From: Steve Gaudet [mailto:sgaudet@angstrommicro.com] Sent: Thursday, May 31, 2001 1:01 PM To: Jarrod Smith Cc: beowulf@beowulf.org Subject: Re: ethernet switch recommendation? Hello Jarrod, > I'm sorry to post such a question here, but I'm getting no advice (good > or > bad) from the folks around here :) > > I need a 24-port switch with a reasonable backplane fabric that is > preferably rack-mountable. I'm interested in getting two of them and > trying > to use channel bonding on a 17-node cluster. We're using the > supermicro > 1610H, and it has dual Intel NICs onboard. > > I've looked at the Cisco catalyst 2924XL, but at $1300, I'm not sure > it's > the most economical solution that will get me what I need, which is > fairly > basic I think. > > I look forward to hearing some of your suggestions. We've had good luck with Extreme Networks. http://www.extremenetworks.com/products/products.asp Cheers, Steve Gaudet ..... <(???)> ---------------------- Angstrom Microsystems 200 Linclon St., Suite 401 Boston, MA 02111-2418 pH:617-695-0137 ext 27 home office:603-472-5115 cell:603-498-1600 e-mail:sgaudet@angstrommicro.com http://www.angstrommicro.com _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at conservativecomputer.com Thu May 31 10:58:33 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:22 2009 Subject: bwbug: Baltimore-Washington Beowulf Users Group Message-ID: <20010531135833.A2070@wumpus> I'm forwarding this message on behalf of David Rhoades, drhoades@conservativecomputer.com. ---------------------------------------------------------------------- Well, after a brief hiatus we're hoping to restart the local Beowulf User Group. 
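Coming back to Jim's question earlier in this digest about calling SCALAPACK from a GTK/C front end: SCALAPACK is Fortran, so with g77-built libraries the usual recipe is to declare each routine in C with a trailing underscore (for example pdgesv_), pass every argument by address, and pull in the Fortran runtime when linking. A sketch of the link step only; frontend.c is a stand-in for the real source, and the exact library names and paths depend on how the Debian SCALAPACK, BLACS and BLAS packages were built, so treat the ones below as placeholders:

    # compile the C front end as usual
    gcc -c frontend.c
    # link against SCALAPACK, the MPI flavour of the BLACS, BLAS/LAPACK,
    # the g77 runtime (libg2c) and MPI; names and paths are illustrative
    gcc -o frontend frontend.o \
        -lscalapack -lblacsCinit -lblacs -lblacsF77init \
        -llapack -lblas -lg2c -lm -lmpich

Another route some people take is to hide the SCALAPACK calls behind one thin Fortran wrapper subroutine, so only a single, simple interface has to be matched from the C side.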
I think we owe a big thank-you to Howard Levinson and others at VA for doing such a great job in the past. To date, we have been promised the corporate support of Logicon, Compaq, Scyld and SteelEye in providing meeting places and speakers. Of course, those are just the ones we've had contact with so far, and I'm sure others will also support the group. A couple of items need your inputs: The first is meeting times and places. Some folks can't make it during the afternoons, some can't make it in the evenings. Which group are you in? It seems like MD is where most of the meetings are held. How many of you don't go because of location? How many of you couldn't go to Northern Virginia for a meeting? The second is focus. My thought is that the goal is to see the Beowulf system mature and expand, but is that your interest? The web site http://www.bwbug.org/, has been rehosted (thanks to Greg and Greg) and we have a cute web form to join/leave the mailing list. If you know someone who might be interested, please point them at the site. For those of you who would prefer to NOT get every list e-mail singly, but bundled together is groups of 30 or so there is a bwbug-digest list. Some of you may not know me. Greg Lindahl (whom most of you DO know) and I were formerly at HPTi and I ran the business unit that delivered the Alpha cluster to the Forecast Systems Laboratory in Boulder, CO. We've since left HPTi and formed Conservative Computer to offer commercial clusters. Expect a meeting in June based on your inputs. ______________________________________ David Rhoades Conservative Computer, Inc. 703-244-0579 drhoades@conservativecomputer.com From heyward_k at summitcds.org Thu May 31 15:25:32 2001 From: heyward_k at summitcds.org (Kent Heyward) Date: Wed Nov 25 01:01:23 2009 Subject: partitioning HD for use of swap & for booting Message-ID: <4550C97CEF5CD4118DC300A0C9FB2FCE07EE53@mail.SUMMIT.ORG> 1. What is the correct partitioning of the hard drive? Currently, I am booting from the floppy and am not able to use a swap file. 2. Does having a swap file contribute to performance? 3. When I run linpack, I see in the beostatus monitor, the cpu usage go to 100% and the memory utilization on my 256k node move from 24% to 32%(59m to 80m)and never get any higher. On another node with 128m, it will utilize 58 to 90 m. It appears that adding more memory does not improve performance on the benchmark. 4. Another cluster uses scalapack as one of their benchmarking tools. How does it compare with linpack? From lindahl at conservativecomputer.com Thu May 31 21:42:41 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:23 2009 Subject: partitioning HD for use of swap & for booting In-Reply-To: <4550C97CEF5CD4118DC300A0C9FB2FCE07EE53@mail.SUMMIT.ORG>; from heyward_k@summitcds.org on Thu, May 31, 2001 at 06:25:32PM -0400 References: <4550C97CEF5CD4118DC300A0C9FB2FCE07EE53@mail.SUMMIT.ORG> Message-ID: <20010601004241.C2001@wumpus> On Thu, May 31, 2001 at 06:25:32PM -0400, Kent Heyward wrote: > 2. Does having a swap file contribute to performance? Generally not. The object of swap is to not ever use it ;-) > 3. When I run linpack, I see in the beostatus monitor, the cpu usage > go to 100% and the memory utilization on my 256k node move from 24% to > 32%(59m to 80m)and never get any higher. On another node with 128m, it > will utilize 58 to 90 m. It appears that adding more memory does not > improve performance on the benchmark. 
In order to get the highest linpack number, you want to increase the size of the global array until it uses all the memory it can. Since you have no swap, if you set this too large, the job will die. -- g From arkich at worldonline.dk Thu May 31 11:30:07 2001 From: arkich at worldonline.dk (Arnold K. Christensen) Date: Wed Nov 25 01:01:23 2009 Subject: Help on cluster hang problem... References: <20010529195010.C3353@wumpus.dhcp.fnal.gov> <20010529225437.A1992@wumpus> Message-ID: <3B168DAF.5B9BAF12@worldonline.dk> Greg Lindahl wrote: > On Tue, May 29, 2001 at 10:26:39PM -0400, David Vos wrote: > > > Hmmm. I've seen Windows do that to enough computers I doubt the problem > > is the power supply. Although to make Linux hang like that is usually a > > hardware problem. > > If it weren't for the "power button doesn't work", which I haven't > seen before, I'd certainly agree that it's likely a random hardware > problem. You've seen Windows hang machines to the point where the > power button doesn't do anything? I never have. But then again I don't > use Windows much. > > -- g > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf Well I've seen once or twice on my Linux desktop, it starts out with the keyboard and mouse not working. Then when I press the power botton, the keyboard resets (led's flash) and then nothing... But I doubt it is a hardware problem 'cause then it should have happend more often (?) ------- Arnold From paul.milligan at 4thwaveimaging.com Thu May 31 10:00:24 2001 From: paul.milligan at 4thwaveimaging.com (Paul Milligan) Date: Wed Nov 25 01:01:23 2009 Subject: MPI performance test for Scyld References: <3B16694A.8BF688FD@fsagx.ac.be> Message-ID: <3B1678A8.86FD9F7@4thwaveimaging.com> Yes, Scyld does supply some benchmark utilities with their software, see: /usr/bin/linpack it comes in: hpl-1.0-1.rpm Paul. David Leunen wrote: > > Hello, > > I need a program that do a performance test of a cluster. I saw that > there isn't such a program in the Scyld distribution (have I missed > it?). > > Can you tell me where I can find either source code or executable that > run for MPI-Beowulf of Scyld. > > Thank you. > > David > --------------- Paul A. Milligan 4th Wave Imaging 949-464-0943 x6 (work) e-mail: paul.milligan@4thwaveimaging.com "DYNAMIC LINKING ERROR: Your mistake is now everywhere." From walle at amnh.org Tue May 29 08:25:51 2001 From: walle at amnh.org (Ann Walle) Date: Wed Nov 25 01:01:23 2009 Subject: Supercomputing Conference - Please post Message-ID: NEW DIRECTIONS IN CLUSTER SUPERCOMPUTING Convened by: American Museum of Natural History in collaboration with National Aeronautics and Space Administration (Ames Research Center, Office of Fundamental Biology Program) Wednesday and Thursday, June 13 and 14, 2001 American Museum of Natural History Central Park West at 79th Street New York, NY 10024 Registration fee: $10 Over the last ten years, parallel supercomputer machines have come to prominence in computation. Research areas such as astrophysics and genomics generate huge data sets of immense complexity. Only this new computing paradigm can give the scientific research community the computational power to make sense of the flood of data these fields present. 
Over two days, the diverse industries that make up this community will gather to discuss the effect of supercomputing on biology, astrophysics, and research, as well as air traffic, finance and the entertainment world. In addition to AMNH and NASA Ames, IT professionals and scientists from Celera, The Dogma Project, IBM, National Cancer Institute, National Center for Supercomputing Applications, Stanford University and Brigham Young University and will be speaking on a diverse range of topics. Sponsored by: Compaq For a full agenda, or to register, click here http://www.amnh.org/supercomputing/?src=CSCL From tbecker at linuxnetworx.com Thu May 31 10:26:56 2001 From: tbecker at linuxnetworx.com (Ted Becker) Date: Wed Nov 25 01:01:23 2009 Subject: SPEC 2000 fp and int benchmarks Message-ID: Hi there. Does anyone have benchmarks for the Dual AMD systems with SPEC 2000 int and SPEC 2000 fp? Send me the results if so. It would be great if the systems had a gig of ram for the tests. Best, Ted Becker 8689 South 700 West Sandy, UT 84070 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010531/4c98d4a8/attachment.html From laytonjb at bellsouth.net Sun May 27 13:37:05 2001 From: laytonjb at bellsouth.net (Jeff Layton) Date: Wed Nov 25 01:01:23 2009 Subject: Intel is finally shipping the 64-bit Itanium References: <3B114BC2.1050205@onelabs.com> Message-ID: <3B116571.CDDA16ED@bellsouth.net> Bari Ari wrote: > Mark Hahn wrote: > > >> > > I can't imagine Itanium being a mass-market item for years, if ever. > > and I pledge allegiance to the Orthodox Church of Beowulf, which > > holds that if it's not mass-market, it's not cluster-Kosher ;) > > > The AMD Sledge/Hammer series will also be nice for clusters whenever > they finally make it to market. Hopefully there will be some nice > chipset support to go along with them. For the time being Mips has the > price performance edge since nobody has taken the ARM 10 to market yet > and Intel yanked the FPU out of the XScale before they released it. > > It's great to see Beowulf clusters offering similar performance to > traditional supercomputers for coarse grained applications and even some > fine grained for a fraction of the cost, but X86 with OTS motherboards > will also always be a kludge. X86 has 20 years of baggage for legacy > support and also produce enormous amounts of heat as compared to RISC. > > Low cost RISC clusters will outperform any x86 mass-market OTS clusters. > RISC offers lower cost, smaller footprint, far less heat along with > higher fixed and floating point performance. Grasshopper. You forget the wonders of commodity components. This wonder is what has driven the Beowulf "revolution." There are many arenas where x86 performs very well. Several billion chips versus a few hundred thousand allow many great things to develop. Couple this, grasshopper, with Open-Source OS, compilers, message passing, queuing systems, and many dedicated people and you have the Beowulf revolution. (Removing my teacher mask for a moment). One ALWAYS needs to benchmarks their app(s) on all potential cluster solutions. For example, we tested our primary application on Intel and Clusters, SGI Origin systems, Cray T3E, and IBM SP2s. On a pure performance level, the Intel cluster outperformed the SGI, Cray, and IBM systems. It also outperformed the 21164 Alpha clusters. 
Only when we got to the 21264 Alpha running at higher frequencies than the Intel did we see any performance gain over the Intels. Remember, this is PERFORMANCE only. Guess what happens when we went to price-performance? The Intel clusters where 4 TIMES better than the Alpha clusters. As for the SGI, Cray, and IBM, forget it. We didn't even compute it. Again, this is our application. YMMV. On the other hand, I've seen some benchmark results for another Lockheed application. The Alphas eat it for lunch. The Intels don't perform too badly though. When one gets down to price-performance, guess who wins? Alternatively, when we consider the most power for a fixed price, guess who wins? I'm not trying to say that Intel is the way to go always. Sometimes you need as much speed as possible and in this case I would guess that the RISC stuff will do pretty well (although as I mentioned on our code it didn't do too well until we hit the bleeding edge of current RISC CPUs). But, while many of us are performance junkies, we are also not endowed with large budgets. Therefore, in many cases, commodity pricing does a wonderful thing for price-performance. Moreover, in many cases, for a fixed total price, you will get more bang for the buck from Intels (although that's not always true). Personally, I try to stay CPU agnostic. I really don't care what I run on as long as I get the most speed for a fixed price. If it's Intel, AMD, Alpha, Transmeta, some Russian abomination, a Chinese copy, whatever, I don't care. As long as I can get an OS to run on it, good compilers, god message-passing, and someone to support it when I need it, then it's a candidate. Enjoy the holiday! Jeff Layton P.S. My father is a historian. One of his favorite quotes is, "Those who refuse to study history are doomed to repeat it." (or something like that). I like to modify it for those new to Beowulfs, "Those who don't study commodity components are doomed to be crushed by it." > > > Bari Ari > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mprinkey at aeolusresearch.com Wed May 23 10:44:57 2001 From: mprinkey at aeolusresearch.com (Michael T. Prinkey) Date: Wed Nov 25 01:01:23 2009 Subject: 64bit/66MHz PCI mobos (Intel STL2, Asus CUR-DLS) References: <20010522184655.E3180@getafix.EraGen.com> <3B0BB77D.44F248E6@scali.no> <20010523131659.B7870@getafix.EraGen.com> Message-ID: <3B0BF719.8CA7BC08@aeolusresearch.com> Chris, I tried at length to work with the IDE controller on the 370DLE without much success. For the cluster that I built with them, I only cared about hard drive performance on the head node, so I used two Promise Ultra controllers for those drives. I found that it was worth $25 each for those boards to avoid the problem. Mike Prinkey Aeolus Research, Inc. Chris Black wrote: > > On Wed, May 23, 2001 at 03:13:33PM +0200, Steffen Persvold wrote: > > Chris Black wrote: > > > > > > We have been looking into motherboards that provide 64-bit 66MHz > > > PCI slots and haven't had much luck. We are now evaluating the > [stuff deleted] > > > > The SuperMicro (http://www.supermicro.com) 370DE{6,R} cards are actually > > quite nice. They both have onboard SCSI-3. > > > > The SuperMicro 370DL{3,E,R} are a cheaper variant (LE chipset) and the > > 370DLE is without SCSI. 
> > > > I haven't checked too much, but I believe all of these boards are > > cheaper than both the Intel and the ASUS boards. > > Have you or anyone used the onboard IDE on these motherboards? > The person working with the Intel serverworks board seems to be > having trouble getting IDE working in ultradma mode. Also, do > any of these boards have onboard video/ethernet? > > Chris > > ------------------------------------------------------------------------ > Part 1.2Type: application/pgp-signature From Todd_Henderson at readwo.com Thu May 24 13:16:40 2001 From: Todd_Henderson at readwo.com (Todd Henderson) Date: Wed Nov 25 01:01:23 2009 Subject: scyld, mpich, and bpsh References: <200102220857.f1M8vWw24101@scispor.dolphinics.no> Message-ID: <3B0D6C28.3A0C5277@readwo.com> Is it possible to use the standard mpich distribution and the -p4pg option with mpirun on a scyld? I've compiled a program and mpich 1.2.1 with PGI's compilers and created a p4pg file with the following: .-1 0 /home/tools/bin/cobalt .0 1 /home/tools/bin/cobalt When I run it, I get: .0: Connection refused When I configured mpich I used the options -rsh=/usr/bin/bpsh and -rshnol. Thanks, Todd From declerck at sistina.com Tue May 22 08:52:31 2001 From: declerck at sistina.com (Michael J. Declerck) Date: Wed Nov 25 01:01:23 2009 Subject: [gfs-devel] GFS v4.1 released Message-ID: <20010522155231.9EDC932631@spook> All, GFS version 4.1 has been released and is available via the Sistina web site -> http://www.sistina.com/gfs/software/ All known bugs have been documented in the Release Notes. This is not to say that there are not other bugs, just that we have not seen them in our testing. If you experience any problems please report them via Bugzilla at Sistina.com -> Bugs ==> http://bugzilla.sistina.com If you would like to tell us how you are using GFS version 4.1 so that we can provide a better product please fill out the survery at -> Feedback ==> http://www.sistina.com/gfs/Pages/gfs_eval.html Thank you for your continued support. Now, go grab and start using it! ****************************************************************************** Features / Bug Fixes for GFS v4.1 ****************************************************************************** ########################################################################### --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ############################################################################ The addition of Lock Value Blocks (LVBs) to GFS. Please see the note in the `Caveats and Usage' in the Release Notes for instructions on how to upgrade from a prior release of v4.x.y to v4.1. ########################################################################### --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ATTENTION --- ############################################################################ o Support for Linux kernel 2.4.4 o FCNTL and FLOCK support o A complete rewrite of Pool tools with new command line options and enhanced functionality o Continued improvements to the IP lock server - `memexpd' o Performance improvements for GFS when it is utilized as a local filesystem instead of its normal context as a cluster filesystem. 
o Improved `df' performance due to the addition of LVB support o `atime' bug has been fixed o New STOMITH methods (Vixel switches and updates to the Brocade methods) o New mount options (please see the man page) --- Michael Declerck, declerck@sistina.com +1.510.823.7991 _______________________________________________ gfs-devel mailing list gfs-devel@sistina.com http://lists.sistina.com/mailman/listinfo/gfs-devel Read the GFS Howto: http://www.sistina.com/gfs/Pages/howto.html From James at armstrong.uk.net Thu May 31 09:38:13 2001 From: James at armstrong.uk.net (James@armstrong.uk.net) Date: Wed Nov 25 01:01:23 2009 Subject: ethernet switch recommendation? Message-ID: Hi, We use a D-Link DES-3225G 24 port switch, it has a 10Gbps backplane and can process 5mil pps. It cost us at the time £676 but I don't know what it would be now. Hope this helps James Email: James@Armstrong.uk.net WWW: http://rsazure.swan.ac.uk/~rsjames Smail: Rockfield Software Ltd. Innovation Centre, SA2 8PP United Kingdom. Tel: +44 (0)1792 295551 Fax: +44 (0)1792 295532 From jeffrey.b.layton at lmco.com Tue May 22 03:29:54 2001 From: jeffrey.b.layton at lmco.com (Jeffrey B Layton) Date: Wed Nov 25 01:01:23 2009 Subject: Disk reliability (Was: Node cloning) References: Message-ID: <3B0A3FA2.3B659572@lmco.com> Mark Hahn wrote: > > I hate to dredge up this topic again, but ... . I've got a machine > > with an IBM drive that is giving me the following errors, > > it's not an error. > > > kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } > > kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC } > > translation: "the udma checksum detected a corrupt transaction > due to noise on the cable; it was automatically retried". > > this *only* happens due to corruption of the signal on the cable; > normally, that's because someone has a bogus cable, or is overclocking. > if it happens at relatively low rate, there is no performance cost. Not overclocking. So, it looks like a cable? It's a typical IDE cable since I'm running the IBM drive in a standard IDE connection (couldn't get the silly HPT366 to work consistently). > > > (for reference, all non-bogus IDE cables must be 18" or less, > with both ends plugged in (no stub). for a mode > udma33, > the 80-conductor cable must also be used, and yes, it's still > at most 18".) Check. > > > > as discussed in previous emails on the list. I followed the pointers > > that Josip gave and ran the IBM code on the drive. It said the drive > > the code in question probably just configured the drive > to default to udma33 or something modest. this shouldn't ever > be necessary, since the bios shouldn't misconfigure a too-high > speed, and any modern Linux will not. (though you can choose your > own mode using hdparm, if you wish.) Yep, set the drive parameters with hdparm in rc.local. Thanks, Jeff From scott at moriarty.chem.ualberta.ca Thu May 31 09:35:36 2001 From: scott at moriarty.chem.ualberta.ca (Scott Delinger) Date: Wed Nov 25 01:01:23 2009 Subject: ethernet switch recommendation? In-Reply-To: Message-ID: We're currently looking at HP 2524 switches (managed wirespeed), and if you're pressed for $$ there is also an unmanaged version (2324?). Scott -- Scott Delinger scott.delinger@ualberta.ca I.T. 
Administrator Department of Chemistry University of Alberta Edmonton, Alberta, Canada T6G 2G2 From ewt at redhat.com Wed May 23 14:24:48 2001 From: ewt at redhat.com (Erik Troan) Date: Wed Nov 25 01:01:23 2009 Subject: Kickstart/DHCP In-Reply-To: <20010523140253.B26265@blueraja.scyld.com> Message-ID: On Wed, 23 May 2001, Sean Dilda wrote: > On Tue, 22 May 2001, Sam Pottle wrote: > > > I have a question about using (Redhat 7.0) Kickstart to do the automagical > > headless install on my compute nodes. The boxes have two NICs apiece (for > > eventual channelbonding purposes), and when a node kickstarts off the floppy, > > the first thing it does is to ask which device to install from (eth0/eth1), > > at which point the installation stops dead because I'm not there to answer. > > > > How can I get the installer not to ask this question? This happens before > > any DHCP request is made, so putting things in the kickstart file (which is > > located on the head node) won't help, as the installer hasn't seen it yet. > > The reference manual for Red Hat Linux 7.0 doesn't show anything about > this. However, the people on kickstart-list@redhat.com might have some > other ideas. Add ksdevice=eth0 to your boot arguments. Erik ------------------------------------------------------------------------------- | "Amazingly, there are significant numbers of 30-something women now who | | want to learn how to box - something that most sensible American men gave | | up years ago." - New York Times Magazine | From Todd_Henderson at readwo.com Wed May 30 11:08:22 2001 From: Todd_Henderson at readwo.com (Todd Henderson) Date: Wed Nov 25 01:01:23 2009 Subject: Scyld References: <20010529214240.B25886@icantbelieveimdoingthis.com> Message-ID: <3B153716.BE53535F@readwo.com> I once built a script for a cluster of workstations where I never new if people would be on them or not that went out and checked a couple of things, if anyone was physically on it, and if certain programs were running and then built a list of free machines. Seems like with some of the tools with scyld, it wouldn't be that hard to see what machines were being used and which were and script it up to build a p4pg file. You could create a wrapper for mpirun or something? Just a thought. Todd Art Edwards wrote: > I want to thank Sean Dilda and Keith Undd for pointing to ways to use a Scyld > cluster without running jobs on the head node and taking advantage of local > disk. The remedy for the latter is quite straightforward. However, the use > of p4pg files is virtually unworkable in a multiuser environment. To avoid > crowding the head node, each user would have to query all the nodes to find the > inactive ones, writea prpg file specifying which nodes they want, and deposit it > on the node they decide will be node0 for their run. The chances for collision > are finite. In coming releases of Scyld I hope the -nolocal option is activated. > This is, to my mind, the cleaner way to get off of the head node. > > Art Edwards > -- > Arthur H. Edwards > 712 Valencia Dr. NE > Abq. 
NM 87108 > > (505) 256-0834 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From brian at valinux.com Thu May 24 12:56:25 2001 From: brian at valinux.com (Brian Elliott Finley) Date: Wed Nov 25 01:01:23 2009 Subject: [rcferri@us.ibm.com: Problem with va-systemimager] Message-ID: <20010524145625.D2020@thefinleys.com> Jon, I just had this forwarded to me from the beowulf list. If I remember right, you posted this issue to the SystemImager list about a week ago. Can I presume that your issues didn't go away when you tried prepareclient with the -e option? This is an issue that I've not encountered before and I'd like to find out why it's happening. I'll certainly be getting a copy of Partition Magic for myself to test this out, but I'd also like to find out more about your setup. o If you *don't* fix the error, does the system operate properly? o If you "fix the error" with Partition Magic, does the system still operate properly? o Do you have the "error" on your golden client? o If you install directly from your Linux distribution media, do you have the same error? o Which distribution and version are you using? o Does Linux give you any errors in any of it's logs? o Are your disks ide, scsi? o If scsi, what kind of controller? o Are the disks on each of your systems identical? o Was there a non-linux operating system on these machines prior to autoinstalling them via SystemImager? o Does the disk geometry reported by Linux match the disk geometry reported by Partition Magic? Example: [bfinley@dragonfly:~] $ cat /proc/ide/hda/geometry physical 16383/15/63 logical 1559/240/63 or with sfdisk: [bfinley@dragonfly:~] $ sudo sfdisk -g /dev/hda /dev/hda: 1559 cylinders, 240 heads, 63 sectors/track And a couple of quick notes about SystemImager and disk partitioning: o sfdisk is the utility used o if you run "prepareclient" without the -explicit-geometry option, then your client's disks will be sized based on megabytes instead of sectors (MB is the default). The last partition will be dynamically sized to the end of the disk. Cheers, -Brian ----- Forwarded message from Richard C Ferri ----- Envelope-to: brian@valinux.com Delivery-date: Mon, 21 May 2001 18:49:18 -0700 Subject: Problem with va-systemimager To: brian@valinux.com From: "Richard C Ferri" Brian, wasn't sure if you follow the beowulf forum, this question popped up... Rich ---------------------- Forwarded by Richard C Ferri/Poughkeepsie/IBM on 05/21/2001 09:37 PM --------------------------- Jon Tegner @beowulf.org on 05/14/2001 02:45:22 PM Sent by: beowulf-admin@beowulf.org To: beowulf@beowulf.org cc: Subject: Problem with va-systemimager Are about to set up a cluster and figured systemimager would be a good way (have used kickstart previously - is there a consensus of which method is "best"?). However, when testing on a fresh system Partition Magic detects some kind of error: "Partition Magic has detected an error 116 on the partition starting at sector 17157420 at disk 1. The starting LBA value is 17157420 and the CHS value is 16450559. The LBA and the CHS values must be equal, Partition Magic has verified that the LBA value is correct." 
Partition Magic can even fix this error, but I don't want to to load in Partition Magic on all nodes (would take too long time), so I was wondering if there is another easy way to fix this problem, or preventing it from occurring in the first place (nothing seems to be wrong when I make the image). Regards, /jon _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ----- End forwarded message ----- -- ------------------------------------------------------- Brian Elliott Finley VA Linux http://valinux.com/ http://thefinleys.com/ phone: 972.447.9563 http://systemimager.org/ phax: 801.912.6057 CSA, C2000, CNE, CLSE, MCP, and Certifiable Linux Nut ------------------------------------------------------- From haohe at me1.eng.wayne.edu Wed May 23 18:14:09 2001 From: haohe at me1.eng.wayne.edu (Hao He) Date: Wed Nov 25 01:01:24 2009 Subject: SOS: Channel Bonding Problem Message-ID: <200105240020.UAA12814@me1.eng.wayne.edu> Hi, all. I am trying to bond our cluster with 3C905 cards. Since my Linux distribution is SuSE 6.1 (2.2.5 kernel upgraded to 2.4.4), I have to run ifconfig and ifenslave at command line. Finally I got success in one try, I think, but failed in all others. I am confused. Here are the details. When I ran ifconfig bond0 192.168.1.1 up No error prompted. When I check ifconfig, I find that bond0 got IP 192.168.1.1 and HWADDR is 00:00:00:00:00:00. Seems it is OK. Then I ran ifenslave bond0 eth0 I got following error message: SIOCSIFHWADDR on bond0 failed: Device or resource busy. The master device bond0 is busy: it must be idle before running this command. What's wrong? Could you tell me how to correct this problem? Youradvice will be highly appreciated. Best regards, Hao He From kinghorn at pqs-chem.com Mon May 21 15:22:45 2001 From: kinghorn at pqs-chem.com (Donald B. Kinghorn) Date: Wed Nov 25 01:01:24 2009 Subject: Heat pipes? Message-ID: <3B099535.D81ABEDF@pqs-chem.com> Has anyone looked into using heat pipes for cpu cooling ... surely someone has done this(?) -Don From paul.milligan at 4thwaveimaging.com Mon May 21 17:36:17 2001 From: paul.milligan at 4thwaveimaging.com (Paul Milligan) Date: Wed Nov 25 01:01:24 2009 Subject: Inter-process I/O on dual CPU machines Message-ID: <3B09B481.9D9FDB5C@4thwaveimaging.com> We are currently running the Scyld bproc system on our cluster of 8 dual P3 machines, and I am curious about how processes assigned to run on the same slave node, but on separate CPUs, communicate with each other using MPI. Do they have to use the external network connection, or do they talk to each other over the local loopback interface? If they indeed do use the local loopback, then perhaps one could specifically use this 'feature' to lessen I/O through the switch node. Perhaps someone at Scyld could answer my question. Paul. ---------------- Paul A. Milligan 4th Wave Imaging 949-464-0943 x6 (work) e-mail: paul.milligan@4thwaveimaging.com "DYNAMIC LINKING ERROR: Your mistake is now everywhere." From keith.mcdonald-fct at btinternet.com Sat May 26 03:08:54 2001 From: keith.mcdonald-fct at btinternet.com (keith.mcdonald-fct@btinternet.com) Date: Wed Nov 25 01:01:24 2009 Subject: PVM with a Scyld cluster Message-ID: <3b0f80b6.1056.0@btinternet.com> Robert Sands wrote: I have a customer that is more familiar with using pvm rather than mpi so I need some instructions on how to get pvm working with the SCYLD cluster. 
Is there anyone out there using pvm on a scyld cluster and if so can I get instructions to get pvm to work with the cluster? I found one way to run PVM. On the Master, edit the inetd.conf and uncomment the rshd and rlogin parts. Issue a 'killall -HUP inetd.conf' to re-read the file. Prepare a suitable /etc/exports file to allow the nodes to have nfs access. You will need to stop and restart nfs to re-read the new exports file. Prepare a suitable .rhosts file in $HOME to rsh without a password. The following commands should then be run: bpsh hostname for each of your nodes. I do this for friendliness though it is not necessary. You will need a suitable /etc/hosts file if you use this method. Mount the following: bpsh -a mount -t nfs -n 192.168.1.1:/home /home (if not mounted already) bpsh -a mount -t nfs -n 192.168.1.1:/etc /etc bpsh -a mount -t nfs -n 192.168.1.1:/bin /bin bpsh -a mount -t nfs -n 192.168.1.1:/usr /usr bpsh -a mount -t nfs -n 192.168.1.1:/lib /lib The -n flag stops any problems with mtab. Run bpsh -a /usr/sbin/inetd Start a pvm session on the master and then add your nodes by hostname. Once done, check pvm>conf to see that all is in order. This is very much the long way round and should really be scripted! I hope that this helps, Keith McDonald From Jon.Tegner at wiglaf.se Sat May 26 23:15:21 2001 From: Jon.Tegner at wiglaf.se (Jon Tegner) Date: Wed Nov 25 01:01:24 2009 Subject: Help on cluster hang problem... References: <200105270523.AAA23169@sijer.mayo.edu> Message-ID: <3B109B79.B80474B6@wiglaf.se> > I've been using Linux for several years, but am new to Linux cluster computing. > > I set up a "proof of concept cluster" with 4 nodes- each node is a 1.2GHz Athlon > on a MicroStar K7TPro2-A motherboard with 1GB of RAM (RackSaver 1200). > > RedHat 7.1 is loaded locally on each system. Also loaded mpich-1.2.0-10.i386.rpm > on each system and set up the rhosts/hosts.equiv to allow all the rsh stuff... > > Systems are interconnected with Intel 10/100 Ethernet cards. > > One of the research PhD's in my group has a program that has run successfully on > other supercomputer-class systems (Cray and SGI). Very CPU-intensive, but > does nothing fancy other than using MPI for communication (very little disk I/O, > etc.). > > /home file system is NFS mounted on each system. I've tried NFS server is the master > node or another system outside the cluster. > > Even though this code runs as a normal user (not root), it will hard-hang the > "master" node in about 10 minutes. "Hard-hang" means nothing on console, disk light on > solid, doesn't respond to reset or power switches- have to reset by pulling plug. > > I've tried the stock 2.4.2-2 kernel that loads with RedHat 7.1, I've tried the 2.4.2 > kernel recompiled to specifically call the CPU an Athlon, and I've tried > downloading/using the 2.4.4 kernel. All of my attempts produce the same result- > his program can crash the system every time it is run. > > I've searched the normal dejanews/altavista sites for Linux/Athlon/hang, but nothing > interesting pops out. I must be missing something simple- the 2.4.X kernels > can't be that unstable. > > Does this ring a bell with anyone in the group? Hi, doubt this is the reason, but could be worth checking out... 
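On Keith's PVM recipe above, which he notes should really be scripted: the per-node part reduces to a short loop once the one-off edits (inetd.conf, /etc/exports, ~/.rhosts) have been made on the master. A minimal sketch, assuming the master is 192.168.1.1 as in his example:

    #!/bin/sh
    # prep-pvm-nodes: repeat the per-node steps from Keith's note on every slave
    MASTER=192.168.1.1
    for fs in /home /etc /bin /usr /lib ; do
        # mount -n avoids the mtab trouble he mentions
        bpsh -a mount -t nfs -n $MASTER:$fs $fs
    done
    # start inetd on the nodes so pvmd can be started over rsh
    bpsh -a /usr/sbin/inetd

After that, adding the nodes by hostname from a pvm session on the master should behave as described.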
/jon ----- Forwarded message from Theodore Tso ----- Envelope-to: bfinley@valinux.com Delivery-date: Mon, 21 May 2001 20:15:27 -0700 From: Theodore Tso To: tech@lists.valinux.com Subject: [VA-Tech] [tytso@MIT.EDU: Re: repeated forced fsck] Sender: tech-admin@lists.valinux.com Warning, it looks like there may be some cases where Red Hat 7.1's partitioning software may be producing corrupt partition tables. This can cause filesystem corruption on the root partition if you're using LILO to boot your system. Folks who are thinking about installing Red Hat 7.1 may want to check and make sure their partition table looks sane.... - Ted -- Envelope-to: tytso@localhost Delivery-date: Mon, 21 May 2001 23:05:05 -0400 From: Theodore Tso To: eichin-oa@boxedpenguin.com Cc: tytso@mit.edu Subject: Re: repeated forced fsck On Mon, May 21, 2001 at 08:43:50PM -0400, eichin-oa@boxedpenguin.com wrote: > This sounds a little odd, but didn't you mention having a problem on a > debian install with fsck happening on every boot? I don't remembering seeing such a problem with a Debian install, but I have seen this problem before, and I may have mentioned it to you. The problem is that the Linux kernel uses the LBA value to determine where the partition starts, but the LILO uses the CHS value to determine the partition start location (which it has to since it's using the BIOS functions) when it's writing out the first sector of the LILO map file, which it does on each boot because of a desire to make lilo -R a one-shot. So doing this will cause filesystem corruption (or at least some kind of corruption), since LILO will write out the map file to the wrong place. Whatever fdisk-like program Red Hat is using in 71, it's definitely really, really buggy. I'm surprised they didn't catch this in their testing. - Ted > > ------- Start of forwarded message ------- > To: Simon Josefsson > Subject: Re: [OpenAFS] rh71, oafs 1.04: unloading unused kernel module crash machine > Message-ID: <990430724.3b08c60494374@mail1.nada.kth.se> > From: tegner@nada.kth.se > Cc: openafs-info@openafs.org > References: > MIME-Version: 1.0 > Content-Type: text/plain; charset=iso-8859-1 > Content-Transfer-Encoding: 8bit > Date: Mon, 21 May 2001 09:38:44 +0200 (MET DST) > > Probably not related, but have had disk problems with RH 7.1 (e.g. constantly > forced fsck on reboot). This has happened on two (independent) machines which > were upgraded from RH 6.2, and seems to be a result of incompatible LBA and CHS > values. Obtained the following from Partition Magic > > ``Partition Magic has detected an error on the partition starting at sector > 19390455 on disk 1. The starting LBA value is 19390455 and the CHS value is > 16450559. The LBA value and the CHS value must be equal. Partition Magic has > verified that the LBA value is correct and can fix the CHS value''. > > Have also experienced this on a fresh install of RH 7.1 on a machine with a 75 > Gb disk. > > /jon From thomas.wainwright at noaa.gov Mon May 21 14:26:11 2001 From: thomas.wainwright at noaa.gov (Tom Wainwright) Date: Wed Nov 25 01:01:24 2009 Subject: MPI or PVM enabled jre? References: <3B092FA1.F851DB65@aeolusresearch.com> <20010521194411.B29415@unthought.net> Message-ID: <3B0987F3.D934117E@noaa.gov> It's obvious from this thread that there is little overlap between the Beowulf and Java numerics communities. There is much going on on numerical work in Java. 
For an introduction, see the Java Numerics page: http://math.nist.gov/javanumerics/ which has some tantalizing benchmarks suggesting the best Java jre's have ca. 90% the speed of optimized C for some numerical problems. Also, IBM is working on improving java efficiency: see their Ninja project: http://www.research.ibm.com/ninja/ Also, a simple web search on "Java MPI" or "Java PVM" will bring up several implementations of each. Personally, I am working on a Java/Beowulf ecosystem modeling project, but am too early in the development to say anything regarding performance, etc. (I don't even have the machines plugged in yet.) If anyone gets to testing the Java MPI/PVM implementations before me, I'd love to hear your reviews. -- Tom Wainwright NOAA/NMFS/NWFSC 2030 S Marine Science Dr Newport, OR 97365 USA thomas.wainwright@noaa.gov