From treed at ultraviolet.org Wed Dec 1 14:30:44 2004 From: treed at ultraviolet.org (Tracy R Reed) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Small packets causing context switch thrashing? Message-ID: <20041201223044.GE6633@copilotcom.com> Ok, this is not exactly beowulf or supercomputer related but it is definitely a form of high performance computing and I am hoping the beowulf community has applicable experience. I am building a box to convert VOIP traffic from H323 to SIP. The system is an AMD64. Both of these protocols use RTP to transmit the voice data which means many many small packets. We are currently looking at 8000 packets per second due to 96 simultaneous voice channels and the box is already at 50% cpu. I really think this box should be able to handle a lot more than this. I have seen people talk about proxying 2000 RTP streams on a P4. We get around 15,000 context switches and 8000 interrupts per second and the box is heavily loaded and the load average starts going up. Is 9000 packets per second a lot? I would not have thought so but it is hammering our box. I have applied several of the applicable tuning suggestions (tcp stuff is not applicable since RTP is all UDP) from: http://216.239.57.104/search?q=cache:0VItqrkQdO0J:datatag.web.cern.ch/datatag/howto/tcp.html+linux+maximum+network+buffer+size&hl=en but the improvement has been minimal. We have some generic 100Mb ethernet chipset in the box. I have seen a number of high performance computing guys talk about interrupt coalescence in mailing list archives found via google while researching this problem. Can the Pro 100 card do this or do I need the 1000? Does it seem likely that if I run down to the store and pay my $25 for an Intel Pro 1000 card and load up the driver with the InterruptThrottleRate set (and what is a good value for this?) that I will get dramatically improved performance? I would do it right now just to see but the box is in a colo a considerable drive away so I want to have a good idea that it will work before we make the drive. Ideally I would like to get 10x (76800pps) the performance out of the box but could settle for 5x (38400pps). Thanks for any tips you can provide! -- Tracy Reed http://copilotcom.com This message is cryptographically signed for your protection. Info: http://copilotconsulting.com/sig -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041201/88f1c3e2/attachment.bin From reuti at staff.uni-marburg.de Wed Dec 1 11:56:29 2004 From: reuti at staff.uni-marburg.de (Reuti) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Re:Gaussian in parallel (Geoff Galitz) In-Reply-To: <200411301604.08364.kinghorn@pqs-chem.com> References: <200411301604.08364.kinghorn@pqs-chem.com> Message-ID: <1101930989.41ae21ed312b7@home.staff.uni-marburg.de> > Gaussian tests with SuSE 9.0 for the Opteron and had some difficulties with > > SuSE 9.1. They usually recommend SuSE 9.0. SuSE 9.1 we have only on 32 bit, but maybe it's similar on Opteron: The "difficulties" with SuSE 9.1 were for us with the latest updates of SuSE. You will get a glibc which is too new and a missing (renamed) symbol. Since we compile on our own, we only needed a small update of Linda to 7.1a, then it worked again. If you have G03 binaries, it may be similar. 
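On the interrupt-coalescence question above: the Intel e1000 driver does take an InterruptThrottleRate module parameter, so once a Pro/1000 is in the box the idea is cheap to test. A minimal sketch follows; the value 3000 is only an assumed starting point for experiment, not a recommendation, and the modprobe.conf path varies by distribution.

# from the console, not over the NIC being reloaded:
rmmod e1000
modprobe e1000 InterruptThrottleRate=3000
# make it persistent across reboots (2.6-era path; adjust for your distro):
echo 'options e1000 InterruptThrottleRate=3000' >> /etc/modprobe.conf
# then watch the 'in' (interrupts) and 'cs' (context switches) columns:
vmstat 1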
Cheers - Reuti From daniel.kidger at quadrics.com Thu Dec 2 01:55:10 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Cluster reviews In-Reply-To: References: Message-ID: <200412020955.10019.daniel.kidger@quadrics.com> Carlos, > Hi, I saw your e-mail address on a beowulf webpage. I'm trying to build a > cluster system, but there's a lot of questions and I was wondering if you > can help me with some information about clustering on Linux SuSE 9.0. I > know that it uses a file system called Lustre, but I haven't found how to > configure it. Can you give me some links or if you know how to do > it, please help me, Thanks a lot. Lustre is a distributed parallel file system. It is not a part of SuSE 9.0, and indeed it can be used under a wide variety of Linux distributions. It just needs a few kernel patches and a small rpm of binaries. It was developed by Cluster File Systems Inc. with funding from the US Government. Its current status is that you can download it for free, but naturally need to pay if you want support. Lustre has its own website, eponymously titled www.lustre.org. This links you to https://wiki.clusterfs.com/lustre/FrontPage and thence to https://wiki.clusterfs.com/lustre/LustreHowto which is essential first reading. Feel free to ask more questions once you have read this (probably best though to use your own email account rather than Hotmail; folk will likely be more helpful :-) ) Remember that Lustre has its own mailing lists, which may be more appropriate than the beowulf list - however I have noticed Lustre is increasingly becoming the parallel filesystem of choice for Linux clusters. Hope this helps, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From nixon at nsc.liu.se Wed Dec 1 04:51:33 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Gaussian in parallel In-Reply-To: <20041130215129.GA2539@greglaptop.internal.keyresearch.com> (Greg Lindahl's message of "Tue, 30 Nov 2004 13:51:30 -0800") References: <9FC8D98E-4309-11D9-AE5F-000A95A5025C@galitz.org> <20041130215129.GA2539@greglaptop.internal.keyresearch.com> Message-ID: <87y8gikwq2.fsf@nsc.liu.se> Greg Lindahl writes: > But, hey, I don't know, I've never seen the source! You have my congratulations. The Gaussian build process is the second worst I have ever seen. And I have, unfortunately, seen some horrific stuff. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ From nixon at nsc.liu.se Thu Dec 2 02:21:57 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Re:Gaussian in parallel (Geoff Galitz) In-Reply-To: <200411301604.08364.kinghorn@pqs-chem.com> (Donald Kinghorn's message of "Tue, 30 Nov 2004 16:04:08 -0600") References: <200411301604.08364.kinghorn@pqs-chem.com> Message-ID: <87eki9huey.fsf@nsc.liu.se> Donald Kinghorn writes: > Compiling is sometimes a nuisance but no more than you'd expect for a huge > program that is supported on so many platforms. What? Are we talking about the same program here? I mean, a Makefile that promptly removes all produced *.o files, so that each time you run the build script (written in csh, for crissakes!)
every source file must be recompiled? Don't get me started, but http://deaddog.duch.udel.edu/~frey/research/gaussian.php contains a good rant and instructions for working around some of the major braindead stuff. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ From eugen at leitl.org Thu Dec 2 02:33:05 2004 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Cluster reviews In-Reply-To: <200412020955.10019.daniel.kidger@quadrics.com> References: <200412020955.10019.daniel.kidger@quadrics.com> Message-ID: <20041202103304.GO9221@leitl.org> On Thu, Dec 02, 2004 at 09:55:10AM +0000, Dan Kidger wrote: > Lustre is a distributed parallel file system. It is not a part of suse9.0 and > indeed it can be used under a wide variety of Linux distributions. It just > needs a few kernel patches and a small rpm of binaries. No 2.6 kernels yet, though. No way to check out betas from CVS. Since 1.4 is not yet shipping to paying customers, it will take more than a year before it's available publicly. -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041202/743c7062/attachment.bin From rgb at phy.duke.edu Thu Dec 2 07:08:03 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:35 2009 Subject: [Beowulf] Small packets causing context switch thrashing? In-Reply-To: <20041201223044.GE6633@copilotcom.com> References: <20041201223044.GE6633@copilotcom.com> Message-ID: On Wed, 1 Dec 2004, Tracy R Reed wrote: > Ok, this is not exactly beowulf or supercomputer related but it is > definitely a form of high performance computing and I am hoping > the beowulf community has applicable experience. > > I am building a box to convert VOIP traffic from H323 to SIP. The system > is an AMD64. Both of these protocols use RTP to transmit the voice data > which means many many small packets. We are currently looking at 8000 > packets per second due to 96 simultaneous voice channels and the box is > already at 50% cpu. I really think this box should be able to handle a lot > more than this. I have seen people talk about proxying 2000 RTP streams on > a P4. We get around 15,000 context switches and 8000 interrupts per second > and the box is heavily loaded and the load average starts going up. Is > 9000 packets per second a lot? I would not have thought so but it is I worked through a lot of the math associated with this sort of thing in a series of columns on TCP/IP and network protocols in CMW over the last 4-5 months. Measurements also help you understand things -- look into * lmbench: http://www.bitkeeper.com * netperf: http://www.netperf.org * netpipe: http://www.scl.ameslab.gov/Projects/NetPIPE/NetPIPE.html as network testing/benchmark tools. IIRC, netperf may actually be the most relevant tool for you with its RR tests, but all of these tools will measure packet latencies. In one of those columns I present the results of measuring 100 BT latency with all three tools, getting a number on the order of 150 usec. 
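For concreteness, a minimal netperf request/response run of the kind behind those latency numbers is sketched below. The hostname s01 is a placeholder for a machine running netserver; the UDP_RR test is shown because RTP rides on UDP, and the TCP_RR form used for the figures further down is identical apart from the test name. For 1-byte payloads the reported transactions per second is roughly the inverse of the round-trip latency.

./netperf -l 60 -H s01 -t UDP_RR -- -r 1,1
# TCP variant with confidence intervals, as used for the figures below:
./netperf -l 60 -H s01 -t TCP_RR -i 10,3 -I 99,5 -- -r 1,1 -s 0 -S 0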
The inverse of 1.50 x 10^-4 is 6666 packets per second, using relatively old/slow hardware throughout, so 8000 pps is not at all unreasonable for faster/more modern hardware. Now, you are not alone in looking into this. I found: www.cs.columbia.edu/~dutta/research/sip-ipv6.pdf which looks like it might be relevant to your efforts and maybe would provide you with somebody to collaborate with (I was looking for a description of H23, which is not a protocol I'm familiar with, making it hard to know just what your limits are going to be). > hammering our box. I have applied several of the applicable tuning > suggestions (tcp stuff is not applicable since RTP is all UDP) from: > > http://216.239.57.104/search?q=cache:0VItqrkQdO0J:datatag.web.cern.ch/datatag/howto/tcp.html+linux+maximum+network+buffer+size&hl=en > > but the improvement has been minimal. We have some generic 100Mb ethernet Not surprising. TCP or UDP, if you want to end up with a reliable transmission protocol, you have to include pretty much the same features that are found in TCP anyway, and chances are excellent that unless you really know what you are doing and work very hard, you'll end up with something that is ultimately less efficient and/or reliable than TCP anyway. Besides, a goodly chunk of a latency hit is at the IP level and protocol independent (the wire, the switch, the cards, the kernel interface pre-TCP). In fact, you might be better off running a TCP based protocol and using one of the (relatively) new cards that support onboard TCP and RDMA. That might offload a significant amount of the packet header processing onto a NIC-based co-processor and spare your CPU from having to manage all those interrupts. > chipset in the box. I have seen a number of high performance computing > guys talk about interrupt coalescence in mailing list archives found via > google while researching this problem. Can the Pro 100 card do this or do > I need the 1000? Does it seem likely that if I run down to the store and > pay my $25 for an Intel Pro 1000 card and load up the driver with the > InterruptThrottleRate set (and what is a good value for this?) that I will > get dramatically improved performance? I would do it right now just to see > but the box is in a colo a considerable drive away so I want to have a > good idea that it will work before we make the drive. Ideally I would like > to get 10x (76800pps) the performance out of the box but could settle for > 5x (38400pps). > > Thanks for any tips you can provide! Well, it does seem (if you are indeed using 100BT) that an obvious first thing to try is to go to gigabit ethernet. I don't think it is going to get you out to where you want to go "easily" (or cheaply), but it might get there. For example, here is a pair of dual Opteron 242's using gigabit ethernet in a netperf TCP RR test: Testing with the following command line: ./netperf -l 60 -H s01 -t TCP_RR -i 10,3 -I 99,5 -- -r 1,1 -s 0 -S 0 TCP REQUEST/RESPONSE TEST to s01 : +/-2.5% @ 99% conf. Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. 
per sec 16384 87380 1 1 60.00 17431.26 65536 262142 My dual Opterons (Penguin Altus 1000E's) have integrated dual gigabit ethernet; it is not unlikely that they could sustain close to twice this rate on both channels going flat out at the same time, which would get you to your 35 Kpps just about exactly and would likely give you a bit of change from your processor dollar so that you can actually do things with the packet streams as well while upper/lower half handlers do their thing. However, in a real asynchronous environment where the packets are not just streaming in, your performance will likely be somewhat lower. Note also that your pps (latency) performance will gradually drop as the packets themselves carry more than a single byte of data until they reach the data/wirespeed bounds as opposed to the latency bounds. I'm seeing a 10-20% drop off in the TCP RR results as packet payload sizes get closer to 100 bytes, and would expect them to drop to a rate determined by a mix of the MTU selected and wirespeed as they get out to the MTU and beyond in size. >From this it looks to me like you will have marginally acceptable performance with gigabit ethernet, at best, although I >>am<< using a relatively cheap gigE switch and there are likely switches out there that cost more money that can deliver better switch latency. However, you'll also have the problem of partitioning your data stream onto two switches, and this may or may not be terribly easy. This suggests that you look into faster networks. You haven't mentioned the actual context of the conversion -- how it is being fed a packet stream, where the output packet stream goes. This seems to me to be as much of an issue as "the box" that does the actual conversion. The same limits are going to be in place at ALL LEVELS of the up/down stream networks -- a single host is only going to be able to feed your conversion box at MOST at the rates you measure for an ideal connection at the network you eventually select, very likely degraded, posssibly SIGNIFICANTLY degraded, by asynchronous contention for the resource if you are hammering the conversion box with switched packet streams from twenty or thirty hosts at once. You might actually need to consider an architecture where several hosts accept those incoming packet streams (providing a "high availability" type interface to the outside world, where traffic to the "conversion host" is dynamically rerouted to one of a small farm of conversion servers) and then either distributing the conversion process (probably smartest) or using a faster (lower latency) network to funnel the traffic back to a single conversion host. This is presuming that you can't already use a faster network between the sources of the conversion stream and the conversion host, which seems unlikely unless it is already embedded in an architecture de facto "like" this one. Hope this helps. rgb > > -- > Tracy Reed http://copilotcom.com > This message is cryptographically signed for your protection. > Info: http://copilotconsulting.com/sig > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From epaulson at cs.wisc.edu Wed Dec 1 21:38:58 2004 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] mpirun and batch systems Message-ID: <20041202053858.GA30470@cobalt.cs.wisc.edu> Hello - I have a question about how various batch systems run MPI jobs. 
Say I have a head node and 16 compute nodes. I submit a job that requires 8 nodes, and the batch system allocates me nodes 4,5,6,7,8,9,10,11. Where does my 'mpirun' get executed from? On the head node? Or on node 4? Thanks, -Erik From rgb at phy.duke.edu Thu Dec 2 07:09:20 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Cluster reviews In-Reply-To: References: Message-ID: On Tue, 30 Nov 2004, Carlos Castañeda B. wrote: > Hi, I saw your e-mail address on a beowulf webpage. I'm trying to build a cluster system, but there's a lot of questions and I was wondering if you can help me with some information about clustering on Linux SuSE 9.0. I know that it uses a file system called Lustre, but I haven't found how to configure it. Can you give me some links or if you know how to do it, please help me, Thanks a lot. www.lustre.com (remember, google is your friend). rgb > Carlos Castañeda. Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From daniel.kidger at quadrics.com Thu Dec 2 08:11:46 2004 From: daniel.kidger at quadrics.com (daniel.kidger@quadrics.com) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] mpirun and batch systems Message-ID: <30062B7EA51A9045B9F605FAAC1B4F62811FF7@exch01.quadrics.com> Erik, > I have a question about how various batch systems run MPI jobs. > > Say I have a head node and 16 compute nodes. I submit a job > that requires 8 nodes, and the batch system allocates me > nodes 4,5,6,7,8,9,10,11. > > Where does my 'mpirun' get executed from? On the head node? Or on > node 4? This can depend upon the batch system and how it is configured. What is common is for the 'first' (not necessarily the lowest numbered) of the list of nodes assigned to a job to be the one where the batch script executes. The most common alternative is that the batch script gets executed on what is classed as a 'login node'. Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From daniel.kidger at quadrics.com Thu Dec 2 08:15:32 2004 From: daniel.kidger at quadrics.com (daniel.kidger@quadrics.com) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] RE: Lustre (was Cluster reviews) Message-ID: <30062B7EA51A9045B9F605FAAC1B4F6280A856@exch01.quadrics.com> Robert Brown wrote in an unusually brief manner: > www.lustre.com > > (remember, google is your friend). make that www.lustre.org. Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From rgb at phy.duke.edu Thu Dec 2 09:25:41 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] RE: Lustre (was Cluster reviews) In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F6280A856@exch01.quadrics.com> References: <30062B7EA51A9045B9F605FAAC1B4F6280A856@exch01.quadrics.com> Message-ID: On Thu, 2 Dec 2004 daniel.kidger@quadrics.com wrote: > Robert Brown wrote in an unusually brief manner: > > www.lustre.com > > > > (remember, google is your friend). > > make that www.lustre.org. Uh, oops.
But google is still your friend, even if you ARE an idiot like me..;-) rgb > > > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From angel.leiva at uam.es Thu Dec 2 09:08:12 2004 From: angel.leiva at uam.es (Rafael Garcia Leiva) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Announce: quattor 1.0.0 is out Message-ID: <200412021808.12175.angel.leiva@uam.es> Dear all, After more than three years of hard work, tests, and production usage at CERN computing centre (Geneva), we are confident to make a public, widely available, release of the quattor toolsuite (see http://www.quattor.org). Quattor is a tool suite for the automatic installation, configuration and management of computer fabrics and clusters based on Linux. Among the benefits of quattor we can mention the following: * Centralized management of configuration information: with the help of a new configuration description language (called Pan.) * Configuration information validation: the configuration information is validated before its deployment, minimizing the errors due to misconfiguration. * Automatic installations: the only thing we have to do with new machines is to define them in quattor, and they will get automatically installed and configured through the network. * Configuration components: today with quattor we can configure a wide range of services on our client nodes (user accounts, grub, iptables, NFS, SSH, etc.) including Grid services. * Software packages management: with the use of a managed software repository (administrators, ACLs, multiple platforms, ...), and with the new powerful software management tool SPMA (equivalent to yum or apt-get but with the possibility to install any version of software packages, and with support for software downgrading.) * Many other facilities: SQL-based queries of configuration information, rollback of configuration deployments, highly modular, based on well known standards, etc. Quattor is being distributed under the European Union DataGrid license (an OpenSource license). Source code and binaries can be freely downloaded from the quattor web page: http://www.quattor.org, together with the "quattor installation and user guide". Typical use cases of quattor are: installation and management of Linux fabrics (servers, workstations, personal desktops, ...), installation and configuration of clusters, management of Grid environments, and so on. Quattor is being used today in many production environments: CERN IT (more than 2000 machines managed), UAM University in Madrid (desktop management), LAL Orsay-Paris (Grid computing), NIKHEF Amsterdam, CNAF-INFN, CC-IN2P3, DESY Zeuthen and DESY Hamburg, Poznan Supercomputing Center, Forschungszentrum Karlsruhe, UCS-CESGA, LIP Lisboa, etc. Best regards. 
Rafael -- Rafael Angel Garcia Leiva Universidad Autonoma Madrid http://www.uam.es/angel.leiva From kinghorn at pqs-chem.com Thu Dec 2 09:06:40 2004 From: kinghorn at pqs-chem.com (Donald Kinghorn) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Re:Gaussian in parallel (Geoff Galitz) Message-ID: <200412021106.40187.kinghorn@pqs-chem.com> OK, I'll concede that :-) I've done quite a few Gaussian installs for people, binary and source. It's true that I have never had one that didn't give me some kind of trouble (usually different trouble from the last install) However, I've always managed to get it working. The first thing I do now days is go in and change their install scripts because I don't use csh (and I don't see why anyone would ...no flames please :-) But, there are worse installs. I'm always stunned when scientific code builds and installs without a hitch. Best wishes to all -Don > Donald Kinghorn writes: > > > Compiling is sometimes a nuisance but no more than you'd expect for a huge > > program that is supported on so many platforms. > > What? Are we talking about the same program here? I mean, a Makefile > that promptly removes all produced *.o files, so that each time you > run the build script (written in csh, for crissakes!) every source file > must be recompiled? > > Don't get me started, but > > http://deaddog.duch.udel.edu/~frey/research/gaussian.php > > contains a good rant and instructions for working around some of the > major braindead stuff. > > -- > Leif Nixon Systems expert > ------------------------------------------------------------ > National Supercomputer Centre Linkoping University > ------------------------------------------------------------ -- Dr. Donald B. Kinghorn Parallel Quantum Solutions LLC http://www.pqs-chem.com From reuti at staff.uni-marburg.de Thu Dec 2 07:40:20 2004 From: reuti at staff.uni-marburg.de (Reuti) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] mpirun and batch systems In-Reply-To: <20041202053858.GA30470@cobalt.cs.wisc.edu> References: <20041202053858.GA30470@cobalt.cs.wisc.edu> Message-ID: <1102002020.41af3764e5666@home.staff.uni-marburg.de> > I have a question about how various batch systems run MPI jobs. > > Say I have a head node and 16 compute nodes. I submit a job > that requires 8 nodes, and the batch system allocates me > nodes 4,5,6,7,8,9,10,11. > > Where does my 'mpirun' get executed from? On the head node? Or on > node 4? For SGE, but I think for others also, on one of the nodes e.g. node 4 when the job is scheduled. I always try to speak to my users of "master node" of the cluster and "head node" of a MPI job to avoid confusion. - Reuti From reuti at staff.uni-marburg.de Thu Dec 2 09:41:21 2004 From: reuti at staff.uni-marburg.de (Reuti) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] mpirun and batch systems In-Reply-To: <30062B7EA51A9045B9F605FAAC1B4F62811FF7@exch01.quadrics.com> References: <30062B7EA51A9045B9F605FAAC1B4F62811FF7@exch01.quadrics.com> Message-ID: <1102009281.41af53c1e4f1a@home.staff.uni-marburg.de> Quoting daniel.kidger@quadrics.com: Hi, > Erik, > > I have a question about how various batch systems run MPI jobs. > > > > Say I have a head node and 16 compute nodes. I submit a job > > that requires 8 nodes, and the batch system allocates me > > nodes 4,5,6,7,8,9,10,11. > > > > Where does my 'mpirun' get executed from? On the head node? Or on > > node 4? > > This can depend upon the batch system and how it is configured. 
What is > common is for the 'first' (not necessarily the lowest numbered) of the list > of nodes assigned to a job is the one where the batch scripts executes. > The most common alternative is that batch script gets executed on what is > classed as a 'login node'. what's your definition of a "login node"? We have one "login node" where the users can "log in", prepare their scripts and input files, one file server/queue master (SGE), and many computing nodes. But the submitted batch scripts are always executed on the computing nodes only. - Reuti From mathog at mendel.bio.caltech.edu Thu Dec 2 11:37:31 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] S2466 nodes poweroff, stay off Message-ID: A while back I upgraded my S2466 nodes from RH 7.3 to Mandrake 10.0 with a 2.6.8-1 kernel (from kernel.org). Recently I've discovered that poweroff on these nodes tends to be permanent. The node does power down, and afterword pressing the power button on the front panel lights it up and the fans spin - but it won't boot, not even to the BIOS. It doesn't beep error codes either, it just sits there. Usually the reset button does nothing, but sometimes it will allow a reboot. Sometimes hitting the reset button 2 or 3 times in rapid succession will boot it. Sometimes not. All of these different behaviors have been observed on a single node, it isn't that one node does it one way and another the other way. I hadn't noticed this poweroff to never never land since the upgrade previously because the only time one was powered down was to pull a node, and that required unplugging it. Unplugging it for a while resets the problem and it will start. The only way to reliably boot one now following a poweroff is to: unplug for 1 minute replug power on (and for that extra je ne sais quois which seems to raise the success rate to 100%) [ wait a few seconds, then hit reset ] After that it boots normally. One possible clue, when logging to a serial line the end of the poweroff sequence is: #normal shutdown sequence messages deleted Power down. acpi_power_off called ACPI-0352: *** Error: Looking up [IO2B] in namespace, AE_NOT_FOUND search_node f7f4f220 start_node f7f4f220 return_node 00000000 ACPI-1133: *** Error: Method execution failed [\_PTS] (Node f7f4f220), AE_NOT_FOUND /etc/modprobe.preload contains a "button" line. button is loading (lsmod shows it). If it didn't the front panel button wouldn't respond at all after a power down. lilo.conf has "acpi=on" for starting the kernel. Some possibly relevant BIOS settings (the same on all nodes) ACPI enabled ECC SCRUB enabled Quickboot enabled Diagnostic disabled Summary disabled In /var/log/messages it says: Dec 2 12:15:52 monkey08 kernel: ACPI: Subsystem revision 20040326 Dec 2 12:15:52 monkey08 kernel: ACPI: Interpreter enabled Dec 2 12:15:52 monkey08 kernel: ACPI: Using IOAPIC for interrupt routing Dec 2 12:15:52 monkey08 kernel: ACPI: PCI Root Bridge [PCI0] (00:00) Dec 2 12:15:52 monkey08 kernel: ACPI: PCI Interrupt Link [LNKA] (IRQs 3 5 10 *11) Dec 2 12:15:52 monkey08 kernel: ACPI: PCI Interrupt Link [LNKB] (IRQs 3 5 10 11) *0, disabled. Dec 2 12:15:52 monkey08 kernel: ACPI: PCI Interrupt Link [LNKC] (IRQs 3 5 10 11) *0, disabled. 
Dec 2 12:15:52 monkey08 kernel: ACPI: PCI Interrupt Link [LNKD] (IRQs 3 5 *10 11) Dec 2 12:15:52 monkey08 kernel: PCI: Using ACPI for IRQ routing Dec 2 12:15:52 monkey08 kernel: ACPI: PCI interrupt 0000:00:08.0[A] -> GSI 20 (level, low) -> IRQ 20 Dec 2 12:15:52 monkey08 kernel: ACPI: PCI interrupt 0000:02:08.0[A] -> GSI 19 (level, low) -> IRQ 19 Dec 2 12:15:52 monkey08 kernel: apm: overridden by ACPI. Dec 2 12:15:52 monkey08 kernel: ACPI: (supports S0 S1 S4 S5) Dec 2 12:15:52 monkey08 kernel: ACPI: Power Button (FF) [PWRF] Dec 2 12:15:52 monkey08 kernel: ACPI: Sleep Button (FF) [SLPF] Dec 2 12:15:52 monkey08 kernel: ACPI: PCI interrupt 0000:02:08.0[A] -> GSI 19 (level, low) -> IRQ 19 I use exactly the same kernel and settings on an S2468UGN and it is happy enough to reboot following a poweroff. These S2466 nodes used to reboot following poweroff (not with 100% reliability, but much better than now) using RH 7.3. Any ideas what this might be or how to fix it? Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From bill at cse.ucdavis.edu Thu Dec 2 13:48:10 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Tyan S2882 + Chenbro incompatibility (and solution) Message-ID: <20041202214810.GA26079@cse.ucdavis.edu> I was installing a new head node for a cluster, Tyan S2882 motherboard, dual opteron, dual 3ware cards (installed on seperate pci-x busses), and 16 400 GB SATA disks. The headnode seemed mostly fine, fast, no error messages, but occasionally during burn in it would hang. At that point I couldn't turn the machine off with the power switch. When unplugged and replugged in the machine would immediate spin up all fans, link lights on both network interfaces, but not boot up, no video sync, nothing on the serial console. Troubleshooting involved removing raid cards which fixed the problem, but it turns out it wasn't the cards but the screws holding the PCI-x cards in. Further exploration showed that the motherboard has 12 mounting holes but the case had 13 standoffs. One directly under the 2 raid cards did not have a matching motherboard hole. After repeating this effect several dozen times I confirmed that this was the problem. A large pair of channel locks removed the offending motherboard post. After a careful wipedown of the inside of the case (to remove any conductive particles) the server seems to be working well. I'm not sure if I should blame Chenbro, Tyan, or the person who assembled the machine, but in any case I figure I'd let people know about it. I've heard occasional stories about flaky tyan motherboards and it's possible this is a contributing factor. So if you have any flakey tyan boards especially if in a chenbro enclosure make sure that your mounting posts and motherboard mounting holes line up. -- Bill Broadley Computational Science and Engineering UC Davis From mathog at mendel.bio.caltech.edu Fri Dec 3 13:05:01 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] serious NFS problem on Mandrake 10.0 Message-ID: I'm seeing a serious NFS problem with MDK 10.0 (using a 2.6.8-1 kernel.org kernel). All of the machines run the same OS version. In a 20 node cluster each node NFS mounts /u1 from the master. They run a calculation and generate a file in /tmp of about 26000 lines coming to 1.3Mb (both the number of lines and total size vary a little). 
When it completes the process on each end node does: mv /tmp/blah.$NODENAME . mv -f /tmp/blah.$NODENAME /tmp/SAVEblah The home directory (".") is a couple of levels down under /u1, so this effectively performs a network copy from /tmp on the compute node to /u1 on the master node. The copies are largely asynchronous since the end nodes complete at various times. On the master node there are occasionally (defined as: 1 bad line, out of 20 files, every 3rd or 4th run) a very long bad line. Here are four lines from the original file on /tmp: '4827135'=='-22004070' (3254 9815 3391 9675) 22 '4827135'=='-22004070' (75050 11805 75081 11774) 0 '4827086'=='-22004070' (79588 9817 79809 9594) 28 '4827086'=='-22004070' (34069 11794 34308 11555) 34 Here are the four lines from the copy on /u1 . '4827135'=='-22004070' (3254 9815 3391 9675) 22 '4827135'=='-22004070' (75050 11805 75081 11774) 0 '4827086'=='-22004070' (79588 9817 798(MANY times)1 '4156131'=='+22004070' (58122 9687 58250 9818) 11 The final line on /u1 does appear in /tmp, but much, much farther into the file. I very carefully cut out the missing text from the original file, pasted it into a new file, and found: % wc deleted.txt 642 3849 32769 deleted.txt So it looks like a block of 32768 bytes was lost (+1 probably for an extra EOL in my deleted.txt file) during the mv operation and all bytes replaced with . On repeated runs on the same data (same output files each time) the problem line never occurs twice in the same place, and it hops from node to node, suggesting that it's a rare event somewhere in the data transport (mv) operation. This is very, very, VERY bad. No relevant messages show up in /var/log/messages. /u1 is /dev/sde1 and smartctl -a on that device shows no errors. On the master /u1 is in /etc/fstab as: LABEL=usrdisk /u1 ext2 defaults,quota 1 2 and is exported as: /u1 *.cluster(rw,no_root_squash) Has anybody else seen this bug? Is there a patch for it? Possibly relevant software: coreutils-5.1.2-1mdk #/bin/mv nfs-utils-clients-1.0.6-1mdk #nfs client kernel 2.6.8-1 #kernel.org Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From hahn at physics.mcmaster.ca Fri Dec 3 15:41:44 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] serious NFS problem on Mandrake 10.0 In-Reply-To: Message-ID: > > > This is very, very, VERY bad. > > > > indeed. is it safe to assume your machines are quite stable > > (memtest86-wise)? > > Yes. They run days and days without any errors. (S2466 motherboards > with single Athlon MP 2200+ processors, ECC enabled.) hmm. I don't think I've ever talked to anyone who had a cluster of AthlonMP's that they considered completely stable. at least not at the level of stability of clusters of Opterons, Xeons, etc. > just running two at a time in a couple of tries so total load > seems to matter, presumably on the master node, but maybe on > the switch? I'd be most surprised if the switch was implicated, simply because the 32K doesn't correspond well with switch behavior. > Every node logged these sorts of errors and the variation > looks like random scatter. Given the low rate (relatively speaking) > it is probably one event per file, and since the files are around > 1.3 Mb each, or about 40 blocks of 32k per mv, it seems like > the error rate per 32k block is 142/(1800*40) = .00197. not bad ;) seriously, I guess its also worth asking whether the 32K is aligned. 
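As a quick check on that 32K figure: the rsize/wsize a client actually negotiated shows up in /proc/mounts, and dropping the wsize is a cheap experiment to see whether the corruption tracks the transfer size. A rough sketch; the mount point /u1 is taken from the report above, while the server name 'master' and the value 8192 are only examples.

# on a compute node: show the options the /u1 NFS mount is really using
grep ' /u1 ' /proc/mounts
# experiment: unmount and remount with a smaller write size (example value)
umount /u1
mount -t nfs -o rsize=8192,wsize=8192 master:/u1 /u1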
> > does your > > server have DIRECT_IO enabled on NFS? > > CONFIG_NFS_DIRECTIO=y hmm. that was a "this is marked experimental and could be a race of some sort " sort of hmm. > > what kind of block device is it > > writing too, > > It's an IBM scsi disk going through the Adaptec controller > on the Tyan S2468UGN motherboard. Shows up as /dev/sde. > Not sure if that answers the question. that's fine - the driver/disk seems unlikley to just lose track of the occasional 32K chunk ;) > >and what filesystem for that matter? > ext2 OK, so 32K doesn't really match there, since ext2 tends to think in terms of 4k blocks. > > or have you already > > tried a different filesystem? > > No spare disk to build another filesystem on. I was mainly thinking ahead if you said xfs or reiserfs; the former has some larger-sized blocks, and the latter is something I don't really trust. I would definitely not rank ext2 as a high risk here. > Doesn't seem likely to be the file system since if it was > giving those error rates in normal writes that disk would be > swiss cheese by now. right, though this is a different load than you generate by other means, probably. > One last thing, one of these events was registered: > > Dec 3 14:52:10 safserver ifplugd(eth1)[1649]: Link beat lost. > Dec 3 14:52:11 safserver ifplugd(eth1)[1649]: Link beat detected. > > but it wouldn't explain all the errors because they were scattered > through the run time, and that took a lot longer than 1 second > to complete. > > Network is 100baseT through a DLINK DSS-24 switch. ouch! no jumbo frames then :( > cp /tmp/SAVELASTMEGABLAST.txt /tmp/TESTLAST.txt > mv /tmp/TESTLAST.txt ./TESTLAST.txt.$NODE > set `md5sum TESTLAST.txt.$NODE` > NEWMD=$1 > /bin/rm ./TESTLAST.txt.$NODE > if [ "$NEWMD" != "$HOLDMD" ] hmm. you're doing both the writes and reads from the slave node here. was that part of your original description? I'm wondering about bad writes vs bad reads. what happens if you run the md5sum on the master instead? in any case, I think I'd turn off DIRECT_IO first. it's an attractive feature, but it's easy to imagine how it might not quite work right, given the length of time nfs has been doing IO only to/from page cache. switching to a different wsize would be even easier. regards, mark hahn. From hvidal at tesseract-tech.com Fri Dec 3 16:30:13 2004 From: hvidal at tesseract-tech.com (H.Vidal, Jr.) Date: Wed Nov 25 01:03:36 2009 Subject: [Fwd: [Beowulf] serious NFS problem on Mandrake 10.0] Message-ID: <41B10515.4010006@tesseract-tech.com> Which filesystem do you use? On recent research into NAS appliances, we found out that things like ext2 and ext3 really don't play so nice with NFS, especially under older 2.4 kernels. However, in general, our vendor warned us to keep away from large, multi-user shares via NFS under ext<2|3> filesystems. Don't know if this is helpful, perhaps just a data point..... hv David Mathog wrote: > I'm seeing a serious NFS problem with MDK 10.0 > (using a 2.6.8-1 kernel.org kernel). All of the > machines run the same OS version. > > In a 20 node cluster each node NFS mounts /u1 from > the master. They run a calculation and generate > a file in /tmp of about 26000 lines coming to > 1.3Mb (both the number of lines and total size > vary a little). When it completes the process on > each end node does: > > mv /tmp/blah.$NODENAME . 
> mv -f /tmp/blah.$NODENAME /tmp/SAVEblah > > The home directory (".") is a couple of > levels down under /u1, so this effectively performs > a network copy from /tmp on the compute node to /u1 > on the master node. The copies are largely asynchronous > since the end nodes complete at various times. > > On the master node there are occasionally > (defined as: 1 bad line, out of 20 files, every > 3rd or 4th run) a very long bad line. > > Here are four lines from the original file on /tmp: > > '4827135'=='-22004070' (3254 9815 3391 9675) 22 > '4827135'=='-22004070' (75050 11805 75081 11774) 0 > '4827086'=='-22004070' (79588 9817 79809 9594) 28 > '4827086'=='-22004070' (34069 11794 34308 11555) 34 > > Here are the four lines from the copy on /u1 . > > '4827135'=='-22004070' (3254 9815 3391 9675) 22 > '4827135'=='-22004070' (75050 11805 75081 11774) 0 > '4827086'=='-22004070' (79588 9817 798(MANY times)1 > '4156131'=='+22004070' (58122 9687 58250 9818) 11 > > The final line on /u1 does appear in /tmp, but much, much > farther into the file. I very carefully cut out the missing > text from the original file, pasted it into a new file, and found: > > % wc deleted.txt > 642 3849 32769 deleted.txt > > So it looks like a block of 32768 bytes was lost > (+1 probably for an extra EOL in my deleted.txt file) > during the mv operation and all bytes replaced > with . On repeated runs on the same data (same > output files each time) the problem line never occurs > twice in the same place, and it hops from node to node, > suggesting that it's a rare event somewhere in the > data transport (mv) operation. > > This is very, very, VERY bad. > > No relevant messages show up in /var/log/messages. > /u1 is /dev/sde1 and smartctl -a on that device shows > no errors. On the master /u1 is in /etc/fstab as: > > LABEL=usrdisk /u1 ext2 defaults,quota 1 2 > > and is exported as: > > /u1 *.cluster(rw,no_root_squash) > > Has anybody else seen this bug? > > Is there a patch for it? Possibly relevant software: > > > coreutils-5.1.2-1mdk #/bin/mv > nfs-utils-clients-1.0.6-1mdk #nfs client > kernel 2.6.8-1 #kernel.org > > Thanks, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An embedded message was scrubbed... From: "David Mathog" Subject: [Beowulf] serious NFS problem on Mandrake 10.0 Date: Fri, 03 Dec 2004 13:05:01 -0800 Size: 5077 Url: http://www.scyld.com/pipermail/beowulf/attachments/20041203/394fe724/BeowulfseriousNFSproblemonMandrake10.mht From mathog at mendel.bio.caltech.edu Fri Dec 3 15:23:02 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] serious NFS problem on Mandrake 10.0 Message-ID: > > This is very, very, VERY bad. > > indeed. is it safe to assume your machines are quite stable > (memtest86-wise)? Yes. They run days and days without any errors. (S2466 motherboards with single Athlon MP 2200+ processors, ECC enabled.) >the fact that it's 32K is interesting, since > I suspect your NFS block size is that (see /proc/mounts to verify). Yes, that is the size for /u1 in /proc/mounts. I wrote a little script to beat on the NFS system with copies from remote nodes to the master, it's attached after my signature. 
It did the mv over NFS operation 100 times on each of 18 nodes and then did the md5sum when it got there. All running simultaneously. For 1800 of these network copies there were 142 where the md5sum didn't match. Note that I couldn't get an error out of this just running two at a time in a couple of tries so total load seems to matter, presumably on the master node, but maybe on the switch? Every node logged these sorts of errors and the variation looks like random scatter. Given the low rate (relatively speaking) it is probably one event per file, and since the files are around 1.3 Mb each, or about 40 blocks of 32k per mv, it seems like the error rate per 32k block is 142/(1800*40) = .00197. > does your > server have DIRECT_IO enabled on NFS? CONFIG_NFS_DIRECTIO=y > what kind of block device is it > writing too, It's an IBM scsi disk going through the Adaptec controller on the Tyan S2468UGN motherboard. Shows up as /dev/sde. Not sure if that answers the question. >and what filesystem for that matter? ext2 > or have you already > tried a different filesystem? No spare disk to build another filesystem on. Doesn't seem likely to be the file system since if it was giving those error rates in normal writes that disk would be swiss cheese by now. One last thing, one of these events was registered: Dec 3 14:52:10 safserver ifplugd(eth1)[1649]: Link beat lost. Dec 3 14:52:11 safserver ifplugd(eth1)[1649]: Link beat detected. but it wouldn't explain all the errors because they were scattered through the run time, and that took a lot longer than 1 second to complete. Network is 100baseT through a DLINK DSS-24 switch. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech cat testmv.sh #this may wrap!!!! #!/bin/sh cd ~safrun NODE=`hostname` count=100 set `md5sum /tmp/SAVELASTMEGABLAST.txt` HOLDMD=$1 echo "initial md5sum is $HOLDMD" > /tmp/ERRORS.$NODE while [ $count -gt 1 ] do count=`expr $count - 1` cp /tmp/SAVELASTMEGABLAST.txt /tmp/TESTLAST.txt mv /tmp/TESTLAST.txt ./TESTLAST.txt.$NODE set `md5sum TESTLAST.txt.$NODE` NEWMD=$1 /bin/rm ./TESTLAST.txt.$NODE if [ "$NEWMD" != "$HOLDMD" ] then echo "error: md5sum is $NEWMD at $count" >>/tmp/ERRORS.$NODE fi done echo "error: done" >>/tmp/ERRORS.$NODE From mathog at mendel.bio.caltech.edu Fri Dec 3 16:07:38 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] serious NFS problem on Mandrake 10.0 Message-ID: > > > cp /tmp/SAVELASTMEGABLAST.txt /tmp/TESTLAST.txt > > mv /tmp/TESTLAST.txt ./TESTLAST.txt.$NODE > > set `md5sum TESTLAST.txt.$NODE` > > NEWMD=$1 > > /bin/rm ./TESTLAST.txt.$NODE > > if [ "$NEWMD" != "$HOLDMD" ] > > hmm. you're doing both the writes and reads from the slave node here. > was that part of your original description? I'm wondering about > bad writes vs bad reads. what happens if you run the md5sum on > the master instead? It was originally found as corrupted data on the master. Then it was confirmed that the data looked corrupted from the slave too, so the script ran entirely on the slave. You do have a point though, presumably the slave is rereading the data back across the net for the md5sum, so there are two passes where it could go wrong, and the script didn't check to see that the corrupted data was of the same type. 
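One way to split those two passes apart is to checksum the same file both through the NFS client and directly on the server, just before the rm in the test script above. A sketch in the same style as that script; the server name 'master', the use of ssh, and the server-side path are assumptions.

# after the mv onto the NFS-mounted home directory, before the rm:
set `md5sum ./TESTLAST.txt.$NODE`                       # read back through NFS
CLIENTMD=$1
set `ssh master md5sum /u1/safrun/TESTLAST.txt.$NODE`   # read locally on the server
SERVERMD=$1
if [ "$CLIENTMD" != "$HOLDMD" -o "$SERVERMD" != "$HOLDMD" ]
then
    echo "error: client $CLIENTMD server $SERVERMD at $count" >> /tmp/ERRORS.$NODE
fi
# client and server sums that agree but differ from the original point at a bad
# write; a bad client sum with a good server sum points at a bad read-back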
Poked around in bugzilla for kernel.org, this sounds like it may be the same or a closely related problem, if so, it's still around in 2.6.9: http://bugzilla.kernel.org/show_bug.cgi?id=3608 I'll try some of your suggested changes next week - not the sort of thing to attempt late on a Friday... Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From hvidal at tesseract-tech.com Sat Dec 4 08:05:36 2004 From: hvidal at tesseract-tech.com (H.Vidal, Jr.) Date: Wed Nov 25 01:03:36 2009 Subject: [Fwd: Re: [Fwd: [Beowulf] serious NFS problem on Mandrake 10.0]] Message-ID: <41B1E050.30208@tesseract-tech.com> Keep traffic on list for benefit of others. No, this is something I cannot share because I cannot 'direct' you to the references. It is vendor data, based on their experimentation and testing, in house and at customer sites. I will try to see if I can get more info on this from engineers at vendor, then post same. hv Ariel Sabiguero wrote: > This thing is very interesting. > Is the research available somewhere? > I would appreciate if you help me accessing to that study. > > Regards. > > Ariel > > H.Vidal, Jr. wrote: > >> Which filesystem do you use? >> >> On recent research into NAS appliances, we found out >> that things like ext2 and ext3 really don't play so nice >> with NFS, especially under older 2.4 kernels. However, >> in general, our vendor warned us to keep away from large, >> multi-user shares via NFS under ext<2|3> filesystems. >> >> Don't know if this is helpful, perhaps just a data point..... >> >> hv >> >> David Mathog wrote: >> >>> I'm seeing a serious NFS problem with MDK 10.0 >>> (using a 2.6.8-1 kernel.org kernel). All of the >>> machines run the same OS version. >>> >>> In a 20 node cluster each node NFS mounts /u1 from >>> the master. They run a calculation and generate >>> a file in /tmp of about 26000 lines coming to >>> 1.3Mb (both the number of lines and total size >>> vary a little). When it completes the process on >>> each end node does: >>> >>> mv /tmp/blah.$NODENAME . mv -f /tmp/blah.$NODENAME /tmp/SAVEblah >>> >>> The home directory (".") is a couple of >>> levels down under /u1, so this effectively performs >>> a network copy from /tmp on the compute node to /u1 >>> on the master node. The copies are largely asynchronous >>> since the end nodes complete at various times. >>> >>> On the master node there are occasionally >>> (defined as: 1 bad line, out of 20 files, every >>> 3rd or 4th run) a very long bad line. >>> >>> Here are four lines from the original file on /tmp: >>> >>> '4827135'=='-22004070' (3254 9815 3391 9675) 22 >>> '4827135'=='-22004070' (75050 11805 75081 11774) 0 >>> '4827086'=='-22004070' (79588 9817 79809 9594) 28 >>> '4827086'=='-22004070' (34069 11794 34308 11555) 34 >>> >>> Here are the four lines from the copy on /u1 . >>> >>> '4827135'=='-22004070' (3254 9815 3391 9675) 22 >>> '4827135'=='-22004070' (75050 11805 75081 11774) 0 >>> '4827086'=='-22004070' (79588 9817 798(MANY times)1 >>> '4156131'=='+22004070' (58122 9687 58250 9818) 11 >>> >>> The final line on /u1 does appear in /tmp, but much, much >>> farther into the file. I very carefully cut out the missing >>> text from the original file, pasted it into a new file, and found: >>> >>> % wc deleted.txt >>> 642 3849 32769 deleted.txt >>> >>> So it looks like a block of 32768 bytes was lost >>> (+1 probably for an extra EOL in my deleted.txt file) >>> during the mv operation and all bytes replaced >>> with . 
On repeated runs on the same data (same >>> output files each time) the problem line never occurs >>> twice in the same place, and it hops from node to node, >>> suggesting that it's a rare event somewhere in the >>> data transport (mv) operation. >>> >>> This is very, very, VERY bad. >>> No relevant messages show up in /var/log/messages. >>> /u1 is /dev/sde1 and smartctl -a on that device shows >>> no errors. On the master /u1 is in /etc/fstab as: >>> >>> LABEL=usrdisk /u1 ext2 defaults,quota 1 2 >>> >>> and is exported as: >>> >>> /u1 *.cluster(rw,no_root_squash) >>> >>> Has anybody else seen this bug? >>> >>> Is there a patch for it? Possibly relevant software: >>> >>> >>> coreutils-5.1.2-1mdk #/bin/mv >>> nfs-utils-clients-1.0.6-1mdk #nfs client >>> kernel 2.6.8-1 #kernel.org >>> >>> Thanks, >>> >>> David Mathog >>> mathog@caltech.edu >>> Manager, Sequence Analysis Facility, Biology Division, Caltech >>> _______________________________________________ >>> Beowulf mailing list, Beowulf@beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >> >> >> ------------------------------------------------------------------------ >> >> Subject: >> [Beowulf] serious NFS problem on Mandrake 10.0 >> From: >> "David Mathog" >> Date: >> Fri, 03 Dec 2004 13:05:01 -0800 >> To: >> beowulf@beowulf.org >> >> To: >> beowulf@beowulf.org >> >> >>I'm seeing a serious NFS problem with MDK 10.0 >>(using a 2.6.8-1 kernel.org kernel). All of the >>machines run the same OS version. >> >>In a 20 node cluster each node NFS mounts /u1 from >>the master. They run a calculation and generate >>a file in /tmp of about 26000 lines coming to >>1.3Mb (both the number of lines and total size >>vary a little). When it completes the process on >>each end node does: >> >> mv /tmp/blah.$NODENAME . >> mv -f /tmp/blah.$NODENAME /tmp/SAVEblah >> >>The home directory (".") is a couple of >>levels down under /u1, so this effectively performs >>a network copy from /tmp on the compute node to /u1 >>on the master node. The copies are largely asynchronous >>since the end nodes complete at various times. >> >>On the master node there are occasionally >>(defined as: 1 bad line, out of 20 files, every >>3rd or 4th run) a very long bad line. >> >>Here are four lines from the original file on /tmp: >> >>'4827135'=='-22004070' (3254 9815 3391 9675) 22 >>'4827135'=='-22004070' (75050 11805 75081 11774) 0 >>'4827086'=='-22004070' (79588 9817 79809 9594) 28 >>'4827086'=='-22004070' (34069 11794 34308 11555) 34 >> >>Here are the four lines from the copy on /u1 . >> >>'4827135'=='-22004070' (3254 9815 3391 9675) 22 >>'4827135'=='-22004070' (75050 11805 75081 11774) 0 >>'4827086'=='-22004070' (79588 9817 798(MANY times)1 >>'4156131'=='+22004070' (58122 9687 58250 9818) 11 >> >>The final line on /u1 does appear in /tmp, but much, much >>farther into the file. I very carefully cut out the missing >>text from the original file, pasted it into a new file, and found: >> >>% wc deleted.txt >> 642 3849 32769 deleted.txt >> >>So it looks like a block of 32768 bytes was lost >>(+1 probably for an extra EOL in my deleted.txt file) >>during the mv operation and all bytes replaced >>with . On repeated runs on the same data (same >>output files each time) the problem line never occurs >>twice in the same place, and it hops from node to node, >>suggesting that it's a rare event somewhere in the >>data transport (mv) operation. >> >>This is very, very, VERY bad. 
>> >>No relevant messages show up in /var/log/messages. >>/u1 is /dev/sde1 and smartctl -a on that device shows >>no errors. On the master /u1 is in /etc/fstab as: >> >>LABEL=usrdisk /u1 ext2 defaults,quota 1 2 >> >>and is exported as: >> >>/u1 *.cluster(rw,no_root_squash) >> >>Has anybody else seen this bug? >> >>Is there a patch for it? Possibly relevant software: >> >> >>coreutils-5.1.2-1mdk #/bin/mv >>nfs-utils-clients-1.0.6-1mdk #nfs client >>kernel 2.6.8-1 #kernel.org >> >>Thanks, >> >>David Mathog >>mathog@caltech.edu >>Manager, Sequence Analysis Facility, Biology Division, Caltech >>_______________________________________________ >>Beowulf mailing list, Beowulf@beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> >> >> >>------------------------------------------------------------------------ >> >>_______________________________________________ >>Beowulf mailing list, Beowulf@beowulf.org >>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> >> > -------------- next part -------------- An embedded message was scrubbed... From: Ariel Sabiguero Subject: Re: [Fwd: [Beowulf] serious NFS problem on Mandrake 10.0] Date: Sat, 04 Dec 2004 13:54:29 +0100 Size: 19257 Url: http://www.scyld.com/pipermail/beowulf/attachments/20041204/7d635a70/BeowulfseriousNFSproblemonMandrake10.mht From ctierney at HPTI.com Sat Dec 4 09:42:41 2004 From: ctierney at HPTI.com (Craig Tierney) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? Message-ID: <1102182161.3107.11.camel@hpti10.fsl.noaa.gov> I have noticed some significant differences in disk performance when moving some of my cluster I/O servers from linux-2.4 to linux-2.6. For several different raid systems with several linux-2.6 kernel and distributions, the read performance is quite poor. Distros: SuSE 9.1 Professional AMD64, Redhat 4.0 beta, IA64 Kernels: Standard distro kernels, also tried stock 2.6.9 on Redhat 4/IA64. For the 2.4 kernels, I needed to tweak /proc/sys/vm/max_readahead to improve performance. In 2.6, the command is blockdev --setra. Using blockdev helps some, but not much. Adjusting blockdev gets performance for one raid system from 10 MB/s to 30 MB/s. Under 2.4 the read performance is 90 MB/s. I looked at a few of the anticipatory scheduler options and also tried using the deadline scheduler. Nothing helped. This problem appears on different storage. I have tried DDN (raid 3), Nexsan (raid 5), and Infotrend (raid 3 and 5). For those using PVFS1/2, Lustre, or building fast NFS servers with Linux 2.6, have you seen this? Thanks, Craig From eugen at leitl.org Sat Dec 4 12:48:19 2004 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] [Lustre-announce] Lustre 1.2.8 is now available (fwd from phil@clusterfs.com) Message-ID: <20041204204819.GU9221@leitl.org> ----- Forwarded message from Phil Schwan ----- From: Phil Schwan Date: Sat, 04 Dec 2004 15:30:19 -0500 To: Subject: [Lustre-announce] Lustre 1.2.8 is now available User-Agent: Microsoft-Entourage/11.1.0.040913 Lustre 1.2.8 has been released. Given the imminent release of Lustre 1.4.0, the Lustre 1.2.x series will soon come to an end. There are some significant, user-visible changes in this release: - a defect in the 1.2.7 networking code caused small messages (stat, create, etc.) to move very slowly over TCP/IP. This has been fixed. 
- users should notice a more accurate "mtime" during and after file writes - an issue was fixed in which the signals sent by "strace" could cause a Lustre RPC to abort too soon - if an asynchronous write fails (because a server disk has failed, for example), we now do our best to tell the application that issued the write (although it may not always be possible) - Lustre 1.2.7 contained a partial fix for problems which, in certain cases, prevented binaries from being executed properly when stored in Lustre. This introduced different problems, so this code has been removed from 1.2.8 -- mmap support has reverted to the same state as in 1.2.6. A proper fix will appear in an early 1.4.x release. - An issue was fixed which could cause a page to be incorrectly partially zeroed on the server (resulting in data loss of between 1 and PAGE_SIZE-1 bytes). This was extremely rare, and most likely when using large-page (IA64) clients to write to small-page (IA32, x86-64) servers. - Mounting Lustre could result in significantly-degraded NFS read performance on the same node, because Lustre disables kernel readahead in favour of its own. We now do this in a way that does not impact other file systems. A complete list of changes can be found at http://www.clusterfs.com/changelog.html Lustre 1.2.8 RPMs are available immediately to customers with a CFS support contract, and will be made available to the general public within 12 months. A link to the download page can be found at http://www.clusterfs.com/lustre.html If you are not already a CFS customer and would like early access to 1.2.8, details regarding support and evaluation can be found at http://clusterfs.com/services.html. In addition to the newest Lustre releases and expert CFS file system support, CFS support customers gain access to Lustre tools, a copy of the administration manual and discounts for classroom training. For more information, feel free to contact sales@clusterfs.com For the Lustre team, -Phil _______________________________________________ Lustre-announce mailing list Lustre-announce@lists.clusterfs.com https://lists.clusterfs.com/mailman/listinfo/lustre-announce ----- End forwarded message ----- -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041204/93469fbd/attachment.bin From daniel.pfenniger at obs.unige.ch Mon Dec 6 00:19:13 2004 From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] S2466 nodes poweroff, stay off In-Reply-To: References: Message-ID: <41B41601.7020601@obs.unige.ch> Hi, David Mathog wrote: > A while back I upgraded my S2466 nodes from RH 7.3 to Mandrake 10.0 > with a 2.6.8-1 kernel (from kernel.org). Recently I've > discovered that poweroff on these nodes tends to be permanent. > ... This has probably nothing to do with the software, or even the BIOS, but rather the ATX power supply. We had similar bizarre symptoms due to ageing power supplies. 
Regards, Dan From ctierney at HPTI.com Mon Dec 6 08:50:20 2004 From: ctierney at HPTI.com (Craig Tierney) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: References: Message-ID: <1102351819.3088.15.camel@hpti10.fsl.noaa.gov> On Mon, 2004-12-06 at 09:39, Bogdan Costescu wrote: > On Sat, 4 Dec 2004, Craig Tierney wrote: > > > This problem appears on different storage. I have tried DDN (raid 3), > > Nexsan (raid 5), and Infotrend (raid 3 and 5). > > So the subject is a bit misleading as you are not using software RAID, > but only connecting a hardware RAID box through SCSI or FC; is this > right ? If so, what SCSI/FC card and what Linux module are you using ? > For the 2.4 test, the hardware RAID box was connected to the same > computer or to another one ? All of these solutions are hardware raid. Two of them (DDN and Nexsan) were connected by Fibre channel. The infotrend box was U320 SCSI. The FC is QLA2200F (1 Gbit/s) using the vendor driver. This is v8.x in the linux-2.6 and v6.x for the linux-2.4 kernel. The SCSI was the built-in card on Itanium systems (not sure which one right now, system is down). I tried several 2.4 systems and all performed consistently after tweaking the max_readahead. I tested a dual P3 with Fedora Core 1 (linux 2.4.26), an a Opteron dual 1.8 Ghz and an Itanium dual 1.4 Ghz, both running White Box. When I tested the 2.6 kernel, I tested SuSE 9.1 Pro on the Opteron and Redhat Enterprise Beta on the Itanium. The same physical computer was not used in all tests, or even for the same architecture. However, since the behavior seems to cross different architectures with different interfaces, I don't think the problem is with an individual server. Thanks, Craig From akhtar_samo at yahoo.com Mon Dec 6 00:39:44 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] problem in mounting Message-ID: <20041206083944.65998.qmail@web20026.mail.yahoo.com> hi, i m unable to share the /home of server. when i mount -a on the slave an error is generated mount: host1:/home failed, reason given by server: permission denied the /etc/hosts files on both server and slave r same. on the server the contents of file /etc/exports are /home *.192.168.0.20/24(rw,sync,no_root_squash) Akhtar --------------------------------- Do you Yahoo!? The all-new My Yahoo! – What will yours do? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041206/100f0cdb/attachment.html From bogdan.costescu at iwr.uni-heidelberg.de Mon Dec 6 08:39:36 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: <1102182161.3107.11.camel@hpti10.fsl.noaa.gov> Message-ID: On Sat, 4 Dec 2004, Craig Tierney wrote: > This problem appears on different storage. I have tried DDN (raid 3), > Nexsan (raid 5), and Infotrend (raid 3 and 5). So the subject is a bit misleading as you are not using software RAID, but only connecting a hardware RAID box through SCSI or FC; is this right ? If so, what SCSI/FC card and what Linux module are you using ? For the 2.4 test, the hardware RAID box was connected to the same computer or to another one ? 
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From clwang at cs.hku.hk Sun Dec 5 17:38:12 2004 From: clwang at cs.hku.hk (clwang@cs.hku.hk) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] CFP: ISPA 2005 Message-ID: <1102297092.41b3b8044fb3b@intranet.cs.hku.hk> Third International Symposium on Parallel and Distributed Processing and Applications (ISPA 2005) Nanjing, China, Nov. 2-5, 2005 URL: http://keysoftlab.nju.edu.cn/ispa2005/ Following the traditions of previous successful ISPA conferences, ISPA '03 (held in Aizu-Wakamatsu City, Japan) and ISPA '04 (held in Hong Kong), the objective of ISPA '05 is to provide a forum for scientists and engineers in academia and industry to exchange and discuss their experiences, new ideas, research results, and applications about all aspects of parallel and distributed computing and networking. ISPA '05 will feature session presentations, workshops, tutorials and keynote speeches. Topics of particular interest include, but are not limited to : Computer networks Network routing and communication algorithms Parallel/distributed system architectures Tools and environments for software development Parallel/distributed algorithms Parallel compilers Parallel programming languages Distributed systems Wireless networks, mobile and pervasive computing Reliability, fault-tolerance, and security Performance evaluation and measurements High-performance scientific and engineering computing Internet computing and Web technologies Database applications and data mining Grid and cluster computing Parallel/distributed applications High performance bioinformatics Submissions should include an abstract, key words, the e-mail address of the corresponding author, and must not exceed 15 pages, including tables and figures, with PDF, PostScript, or MS Word format. Electronic submission through the submission website is strongly encouraged. Hard copies will be accepted only if electronic submission is not possible. Submission of a paper should be regarded as an undertaking that, should the paper be accepted, at least one of the authors will register and attend the conference to present the work. Important Dates: Workshop proposals due: April 1, 2005 Paper submission due: April 1, 2005 Acceptance notification: July 1, 2005 Camera-ready due: July 30, 2005 Conference: Nov. 2-5, 2005 Publication: The proceedings of the symposium will be published in Springer's Lecture Notes in Computer Science. A selection of the best papers for the conference will be published in a special issue of The Journal of Supercomputing and International Journal of High Performance Computing and Networking (IJHPCN). 
General Co-Chairs: Jack Dongarra, University of Tennessee, USA Jiannong Cao, Hong Kong Polytechnic University, China Jian Lu, Nanjing University, China Program Co-Chair: Yi Pan, Georgia State University, USA Daoxu Chen, Nanjing University, China Vice Program Co-Chairs: Algorithms Ivan Stojmenovic, University of Ottawa, Canada Architecture and Networks Mohamed Ould-Khaoua, University of Glasgow, UK Middleware and Grid Computing Mark Baker, University of Portsmouth, UK Software Jingling Xue, University of New South Wales, Australia Applications Zhi-Hua Zhou, Nanjing University, China Steering Committee Co-Chairs Sartaj Sahni, University of Florida, USA Yaoxue Zhang, Ministry of Education, China Minyi Guo, University of Aizu, Japan Steering Committee: Jiannong Cao, Hong Kong PolyU, China Francis Lau, Univ. of Hong Kong, China Yi Pan, Georgia State Univ. USA Li Xie, Nanjing University, China Jie Wu, Florida Altantic Univ. USA Laurence T. Yang, St. Francis Xavier Univ. Canada Hans P. Zima, California Institute of Technology, USA Weiming Zheng, Tsinghua University, China Local Organizing Committee Co-Chairs: Xianglin Fei, Nanjing University, China Baowen Xu, Southeast University, China Ling Chen, Yangzhou University, China Workshop Chair Guihai Chen, Nanjing University, China Tutorial Chair Yuzhong Sun, Institute of Computing Technology, CAS, China Publicity Chair: Cho-Li Wang, Univ. of Hong Kong, China Publication Chair: Hui Wang, University of Aizu, Japan Registration Chair: Xianglin Fei, Nanjing University, China Program Committee: See web page http://keysoftlab.nju.edu.cn/ispa2005/ for details. ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From jrajiv at hclinsys.com Mon Dec 6 02:32:58 2004 From: jrajiv at hclinsys.com (Rajiv) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] PBS Job submission Message-ID: <013701c4db7e$f4e754e0$39140897@PMORND> Dear All, I would like to create a pbs script which can submit a job to no. of nodes. Pls guide me with good sites and batch programs for the same. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041206/ebc79959/attachment.html From jakob at unthought.net Mon Dec 6 11:23:34 2004 From: jakob at unthought.net (Jakob Oestergaard) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] problem in mounting In-Reply-To: <20041206083944.65998.qmail@web20026.mail.yahoo.com> References: <20041206083944.65998.qmail@web20026.mail.yahoo.com> Message-ID: <20041206192334.GF347@unthought.net> On Mon, Dec 06, 2004 at 12:39:44AM -0800, akhtar Rasool wrote: > hi, > i m unable to share the /home of server. when i mount -a on the slave an error is generated > > mount: host1:/home failed, reason given by server: permission denied > > > the /etc/hosts files on both server and slave r same. > on the server the contents of file /etc/exports are > > /home *.192.168.0.20/24(rw,sync,no_root_squash) man exports Try: /home 192.168.0.20/25(rw,sync,no_root_squash) -- / jakob From kums at mpi.mpi-softtech.com Mon Dec 6 11:37:53 2004 From: kums at mpi.mpi-softtech.com (Kumaran Rajaram) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] problem in mounting In-Reply-To: <20041206083944.65998.qmail@web20026.mail.yahoo.com> References: <20041206083944.65998.qmail@web20026.mail.yahoo.com> Message-ID: Akthar, You need to export the file system/directories on the server (exportfs -ra). 
Also, make sure to start the nfs services on the clients (on redhat: service nfs start). Also, remove the leading dot in your exportfs entry (see below) On Mon, 6 Dec 2004, akhtar Rasool wrote: > hi, > i m unable to share the /home of server. when i mount -a on the slave an error is generated > > mount: host1:/home failed, reason given by server: permission denied > > > the /etc/hosts files on both server and slave r same. > on the server the contents of file /etc/exports are > > /home *.192.168.0.20/24(rw,sync,no_root_squash) > should be /home 192.168.0.20/24(rw,sync,no_root_squash) or /home *(rw,sync,no_root_squash) If necessary, stop the iptables service. HTH, -Kums > > > Akhtar > > > --------------------------------- > Do you Yahoo!? > The all-new My Yahoo! – What will yours do? From akhtar_samo at yahoo.com Mon Dec 6 21:58:37 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Password less ssh Message-ID: <20041207055837.48073.qmail@web20027.mail.yahoo.com> Hi, Actually i m unable to achieve paswordless ssh. Without this i cannot successfully install MPICH. What i have did - server's /home is shared with the nodes - kuser on a node has generated its key like ssh-keygen -t dsa which is saved in /home/kuser/.ssh of the node. - now i have copied this key(id_dsa.pub) to server's /root/.ssh/authorized_keys scp /home/kuser/.ssh/id_dsa.pub server:/root/.ssh/authorized_keys - then i have restarted sshd.... Still password asking.. kindly solve if rsh is easy to use for a cluster tell me that Akhtar --------------------------------- Do you Yahoo!? Yahoo! Mail - Easier than ever with enhanced search. Learn more. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041206/eebbcde4/attachment.html From bogdan.costescu at iwr.uni-heidelberg.de Tue Dec 7 05:07:09 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: <1102351819.3088.15.camel@hpti10.fsl.noaa.gov> Message-ID: On Mon, 6 Dec 2004, Craig Tierney wrote: > an a Opteron dual 1.8 Ghz and an Itanium dual 1.4 Ghz, both running > White Box. I seriously doubt this. WhiteBox for Itanium ? Since when ? :-) > The same physical computer was not used in all tests, or even for > the same architecture. That's messy. Different devices can behave differently with the same driver on different architectures, due to differences in PCI implementation, cache alignment, etc. I don't say however that this explains the big difference that you saw between 2.4 and 2.6... > However, since the behavior seems to cross different architectures > with different interfaces, I don't think the problem is with an > individual server. Based on the description so far, I would lean towards file-system differences. But you did not specify how you got the speed figures: hdparm, bonnie, zcav, iozone, tiobench, something else ? The main question being: at device level or file-system level ? 
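For anyone wanting quick numbers at both levels, plain dd is usually enough; a minimal sketch, assuming the array shows up as /dev/sdc and is mounted on /mnt/raid (both placeholders), and bearing in mind that re-reading a file that is still in the page cache gives wildly inflated figures unless the file is much larger than RAM:

    # device level: stream 1 GB straight off the block device (read-only)
    time dd if=/dev/sdc of=/dev/null bs=1M count=1024
    # file-system level: write a large file, then read it back
    time dd if=/dev/zero of=/mnt/raid/ddtest bs=1M count=1024
    time dd if=/mnt/raid/ddtest of=/dev/null bs=1M count=1024
    rm /mnt/raid/ddtest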
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From csamuel at vpac.org Mon Dec 6 22:44:17 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: <1102182161.3107.11.camel@hpti10.fsl.noaa.gov> References: <1102182161.3107.11.camel@hpti10.fsl.noaa.gov> Message-ID: <200412071744.19448.csamuel@vpac.org> On Sun, 5 Dec 2004 04:42 am, Craig Tierney wrote: > For the 2.4 kernels, I needed to tweak /proc/sys/vm/max_readahead > to improve performance. Out of interest, what was your tweak ? cheers! Chris -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041207/9483c76b/attachment.bin From csamuel at vpac.org Mon Dec 6 22:56:07 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Cluster reviews In-Reply-To: <200412020955.10019.daniel.kidger@quadrics.com> References: <200412020955.10019.daniel.kidger@quadrics.com> Message-ID: <200412071756.10109.csamuel@vpac.org> On Thu, 2 Dec 2004 08:55 pm, Dan Kidger wrote: > Its current status is that you can download it for free, but > naturally need to pay if you want support. Only the 1.0 version of Lustre is available under the GPL at the moment, 1.2 is still non-free, though that will change according to ClusterFS. See: https://lists.clusterfs.com/pipermail/lustre-discuss/2004-April/000244.html for more info. Chris -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041207/b519c4eb/attachment.bin From pesch at attglobal.net Tue Dec 7 14:51:24 2004 From: pesch at attglobal.net (pesch@attglobal.net) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Opteron performance References: <77673C9ECE12AB4791B5AC0A7BF40C8F1542E2@exchange02.fed.cclrc.ac.uk> Message-ID: <41B633EC.61017270@attglobal.net> Where can I find a decent comparison betwenn Opteron and Athlon64 Paul "Kozin, I (Igor)" wrote: > > > > On Fri, 2004-11-26 at 06:29, Kozin, I (Igor) wrote: > > > > > I tested the same executable with GNU 2.6.8 kernel > > > > Linux is not a GNU product. I think the term you want is a 'vanilla > > 2.6.8 kernel' or '2.6.8 kernel from kernel.org' > > My apologies. Yes, this is what I wanted to say. > > > > > As far as GNU kernel's go, their kernel is called the GNU Hurd, and I > > don't believe its reached 1.0 yet. 
> > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hartner at cs.utah.edu Tue Dec 7 10:23:57 2004 From: hartner at cs.utah.edu (Mark Hartner) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <20041207055837.48073.qmail@web20027.mail.yahoo.com> Message-ID: > Hi, > Actually i m unable to achieve paswordless ssh. Without this i cannot successfully install MPICH. > What i have did > - server's /home is shared with the nodes > - kuser on a node has generated its key like > ssh-keygen -t dsa > which is saved in /home/kuser/.ssh of the node. > - now i have copied this key(id_dsa.pub) to server's /root/.ssh/authorized_keys > scp /home/kuser/.ssh/id_dsa.pub server:/root/.ssh/authorized_keys > - then i have restarted sshd.... > > Still password asking.. kindly solve What are you trying to do here? What you have set up is the ability for kuser to ssh into 'server' as root without a password (assuming remote root logins are enabled and you have an ssh-agent running). root's home directory is not NFS mounted, so you will have to copy the authorized keys file to every machine. Mark From mathog at mendel.bio.caltech.edu Tue Dec 7 09:59:33 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] serious NFS problem on Mandrake 10.0 Message-ID: Greg Lindahl pointed out the problem - soft nfs mounts. I'm still not entirely clear why this was triggering problems in the original case, where the entire set of NFS copies took <4.5 seconds. From my understanding of timeo and retrans there should have been at least .7 + 1.4 + 2.8 delays for 3 retrans before a major timeout was declared. That adds up to 4.9 seconds. In any case, changing the mount to "hard" eliminated the problem. Maybe it only happened when there was a short burst of other net activity at the same time? Oddly, leaving the mount at soft and changing retrans to 50 did not completely eliminate the problem when the test script ran, it only reduced it to 2 errors out of 1800 transfers. The script ran in <360 seconds, so apparently the 60 second minor timeout is promoted to a major timeout no matter how many retrans are left. Either that or something else went wrong unrelated to retrans. Is there any facility in linux, or as an add on, to serialize file transfers? In other words, in this case we know that N files of roughly the same size must be transferred to one disk. The current method sends the data asynchronously and so there is some interference between the nodes. It also hops the head around on the disk as it tries to write simultaneously to all N files. (Subject to whatever the disk subsystem can do to sort that out.) Ideally rather than just doing "mv /tmp/blah.nodename /wherever" on each compute node in this situation a script could do instead: "queuemv /tmp/blah.nodename /wherever" where "queuemv" would take care of moving the data as fast as possible over the network _without contention_ and writing it sequentially to the N files, one file at a time. Is there something like queuemv available? I can see how to do this using a standard SGE sort of qsub but the overhead for a conventional queue system is awfully high for this particular application. 
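Nothing called queuemv seems to exist as a stock tool, but a crude version is only a dozen lines of shell if a lock directory on the shared disk is acceptable; the sketch below is hypothetical (the script name, the lock path and the one-second polling interval are all made up, and it does nothing about a stale lock left behind by a killed copy):

    #!/bin/sh
    # queuemv (sketch): serialize moves onto the NFS-mounted /u1 so that
    # only one node writes to the server at a time.
    # usage: queuemv /tmp/blah.$NODENAME /u1/wherever/
    SRC=$1
    DST=$2
    LOCK=/u1/.queuemv.lock          # assumed shared lock directory
    # mkdir either creates the directory or fails, atomically on the server,
    # so it works as a simple mutual-exclusion primitive even over NFS.
    while ! mkdir "$LOCK" 2>/dev/null; do
        sleep 1                     # another node holds the lock; wait
    done
    mv "$SRC" "$DST"
    rc=$?
    rmdir "$LOCK"                   # release the lock
    exit $rc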
If it was just one node then my msgqueue application (a command line interface to ipcs) could be used: ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/msgqueue.html but this particular operation requires synchronization between multiple nodes, and ipcs doesn't share message queues across the network. Hmm, I suppose that each node could rsh to the master node and run the msgqueue in a script there. Alternatively, is there a network/cluster variant of ipcs??? Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From pjs at eurotux.com Tue Dec 7 10:20:47 2004 From: pjs at eurotux.com (Paulo Silva) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <20041207055837.48073.qmail@web20027.mail.yahoo.com> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> Message-ID: <1102443647.4080.6.camel@valen> Hello, Seg, 2004-12-06 ?s 21:58 -0800, akhtar Rasool escreveu: > Hi, > Actually i m unable to achieve paswordless ssh. Without this i cannot > successfully install MPICH. > What i have did > - server's /home is shared with the nodes > - kuser on a node has generated its key like > ssh-keygen -t dsa > which is saved in /home/kuser/.ssh of the node. > - now i have copied this key(id_dsa.pub) to > server's /root/.ssh/authorized_keys > scp /home/kuser/.ssh/id_dsa.pub server:/root/.ssh/authorized_keys Did you changed the permissions of the /root/.ssh/authorized_keys file to something like 600? > - then i have restarted sshd.... > > Still password asking.. kindly solve > > if rsh is easy to use for a cluster tell me that > > Akhtar -- Paulo Silva Eurotux Inform?tica, SA -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Esta =?ISO-8859-1?Q?=E9?= uma parte de mensagem assinada digitalmente Url : http://www.scyld.com/pipermail/beowulf/attachments/20041207/d50c5698/attachment.bin From list-beowulf at onerussian.com Tue Dec 7 10:32:16 2004 From: list-beowulf at onerussian.com (Yaroslav Halchenko) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <20041207055837.48073.qmail@web20027.mail.yahoo.com> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> Message-ID: <20041207183216.GJ31133@washoe.rutgers.edu> Are you trying to ssh to root account? Then make sure you permit root to login: ~>grep Root /etc/ssh/sshd_config PermitRootLogin yes if you want to ssh to kuser account why the hack you copy public key to root's .ssh? because your home is shared just cp id_dsa.pub authorized_keys and then make sure that permissions on .ssh directory are restrictive. hope this helps -- Yarik On Mon, Dec 06, 2004 at 09:58:37PM -0800, akhtar Rasool wrote: > Hi, > Actually i m unable to achieve paswordless ssh. Without this i cannot > successfully install MPICH. > What i have did > - server's /home is shared with the nodes > - kuser on a node has generated its key like > ssh-keygen -t dsa > which is saved in /home/kuser/.ssh of the node. > - now i have copied this key(id_dsa.pub) to > server's /root/.ssh/authorized_keys > scp /home/kuser/.ssh/id_dsa.pub server:/root/.ssh/authorized_keys > - then i have restarted sshd.... > Still password asking.. kindly solve > if rsh is easy to use for a cluster tell me that > Akhtar > _________________________________________________________________ > Do you Yahoo!? > Yahoo! Mail - Easier than ever with enhanced search. [1]Learn more. > References > 1. 
http://us.rd.yahoo.com/evt=29916/*http://info.mail.yahoo.com/mail_250 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- .-. =------------------------------ /v\ ----------------------------= Keep in touch // \\ (yoh@|www.)onerussian.com Yaroslav Halchenko /( )\ ICQ#: 60653192 Linux User ^^-^^ [175555] Key http://www.onerussian.com/gpg-yoh.asc GPG fingerprint 3BB6 E124 0643 A615 6F00 6854 8D11 4563 75C0 24C8 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Digital signature Url : http://www.scyld.com/pipermail/beowulf/attachments/20041207/e12355ed/attachment.bin From lindahl at pathscale.com Tue Dec 7 11:23:32 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Opteron performance In-Reply-To: <41B633EC.61017270@attglobal.net> References: <77673C9ECE12AB4791B5AC0A7BF40C8F1542E2@exchange02.fed.cclrc.ac.uk> <41B633EC.61017270@attglobal.net> Message-ID: <20041207192331.GC1554@greglaptop.internal.keyresearch.com> On Tue, Dec 07, 2004 at 02:51:24PM -0800, pesch@attglobal.net wrote: > Where can I find a decent comparison betwenn Opteron and Athlon64 Unfortunately not much Linux-relevant performance info is published about Athlon64; nobody publishes SPECcpu numbers, for example. Lots of people publish Windows desktop performance info for Athlon64, but often Windows people don't publish Opteron numbers. So... -- greg From ctierney at HPTI.com Tue Dec 7 11:23:34 2004 From: ctierney at HPTI.com (Craig Tierney) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: References: Message-ID: <1102447413.3088.7.camel@hpti10.fsl.noaa.gov> On Tue, 2004-12-07 at 06:07, Bogdan Costescu wrote: > On Mon, 6 Dec 2004, Craig Tierney wrote: > > > an a Opteron dual 1.8 Ghz and an Itanium dual 1.4 Ghz, both running > > White Box. > > I seriously doubt this. WhiteBox for Itanium ? Since when ? :-) I meant the White Box distribution, which is a rebuild of Red Hat Enterprise 3. It was an early release for Opteron and Itanium. Since then, the Gelato Foundation has taken over the rebuilds, which should be available from them. Now if you are referring to 'White box' Itanium machines, then I guess that depends on how you define white box. I can go buy a non-Intel Itanium motherboard, CPUs, and memory and build one myself if I so choose. > > > The same physical computer was not used in all tests, or even for > > the same architecture. > > That's messy. Different devices can behave differently with the same > driver on different architectures, due to differences in PCI > implementation, cache alignment, etc. I don't say however that this > explains the big difference that you saw between 2.4 and 2.6... True. However I did do two tests on the same physical hardware. I ran the White Box distro based on a 2.4 kernel and SuSE 9.1 Professional based on a 2.6 kernel on an Opteron system. I saw the same behavior. It isn't an exact comparison, but every single 2.6 based system I have tried has shown this problem, and no 2.4 based system has. > > > However, since the behavior seems to cross different architectures > > with different interfaces, I don't think the problem is with an > > individual server.
> > Based on the description so far, I would lean towards file-system > differences. But you did not specify how you got the speed figures: > hdparm, bonnie, zcav, iozone, tiobench, something else ? The main > question being: at device level or file-system level ? I primarily used lmdd to test the performance of the filesystem. All I care about is big streaming IO. I did try sgpdd which accesses the device directly and I saw the same behavior. Craig From ctierney at HPTI.com Tue Dec 7 11:25:44 2004 From: ctierney at HPTI.com (Craig Tierney) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: <200412071744.19448.csamuel@vpac.org> References: <1102182161.3107.11.camel@hpti10.fsl.noaa.gov> <200412071744.19448.csamuel@vpac.org> Message-ID: <1102447544.3088.11.camel@hpti10.fsl.noaa.gov> On Mon, 2004-12-06 at 23:44, Chris Samuel wrote: > On Sun, 5 Dec 2004 04:42 am, Craig Tierney wrote: > > > For the 2.4 kernels, I needed to tweak /proc/sys/vm/max_readahead > > to improve performance. > > Out of interest, what was your tweak ? > > cheers! > Chris The max_readahead is too small for high performance RAID hardware. I typically set: echo 511 > /proc/sys/vm/max_readahead But it depends. If I am using LVM to stripe across volumes I have seen better performance with 1023. I would test out different values, but I have never seen where the default value of 31 was adequate. I also changed /proc/sys/vm/min_readahead, but that didn't seem to make a difference on the large streaming IO opterations. Craig From sdutta at cfa.harvard.edu Tue Dec 7 11:03:13 2004 From: sdutta at cfa.harvard.edu (Suvendra Nath Dutta) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <1102443647.4080.6.camel@valen> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> Message-ID: On this note, I know this has been rehashed many times before, but using OpenSSH 3.8 on SUSE 9.1, I couldn't get host authentication to work. I followed all the instructions out in the web but everything failed. I ended up copying the root's dsa key to every user's ssh directory and using public-key authentication. Has someone successfully implemented host authentication using SSH (hopefully v2) and has written it up in a nice How To? Suvendra. On Dec 7, 2004, at 1:20 PM, Paulo Silva wrote: > Hello, > > Seg, 2004-12-06 ?s 21:58 -0800, akhtar Rasool escreveu: >> Hi, >> Actually i m unable to achieve paswordless ssh. Without this i cannot >> successfully install MPICH. >> What i have did >> - server's /home is shared with the nodes >> - kuser on a node has generated its key like >> ssh-keygen -t dsa >> which is saved in /home/kuser/.ssh of the node. >> - now i have copied this key(id_dsa.pub) to >> server's /root/.ssh/authorized_keys >> scp /home/kuser/.ssh/id_dsa.pub server:/root/.ssh/authorized_keys > > Did you changed the permissions of the /root/.ssh/authorized_keys file > to something like 600? > >> - then i have restarted sshd.... >> >> Still password asking.. 
kindly solve >> >> if rsh is easy to use for a cluster tell me that >> >> Akhtar > > -- > Paulo Silva > Eurotux Inform?tica, SA > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From josh.kayse at gmail.com Tue Dec 7 14:23:40 2004 From: josh.kayse at gmail.com (Josh Kayse) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] diskless cluster nfs Message-ID: <7c8f279204120714232806210a@mail.gmail.com> Ok, my first post, so please be gentle. I've recently been tasked to build a diskless cluster for one of our engineers. This was easy because we already had an image for the set of machines. Once we started testing, the performance was very poor. Basic setup follows: Master node: system drive is 1 36GB SCSI drive /home raid5 5x 36GB SCSI drives Master node exports /tftpboot/192.168.1.x for the nodes. all of the nodes are diskless and get their system from the master node over gigabit ethernet. All that worsk fine. The engineers use files over nfs for message passing, and no, they will not change their code to mpi even though it would be an improvement in terms of manageability and probably performance. Basically, my question is: what are some ways of testing the performance of nfs ande then, how can I improve the performance? Thanks for any help in advance. PS: nfs mount options: async,rsize=8192,wsize=8192,hard file sizes: approx 2MB -- Joshua Kayse Computer Engineering From ctierney at HPTI.com Tue Dec 7 14:56:46 2004 From: ctierney at HPTI.com (Craig Tierney) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: <1102460206.3088.42.camel@hpti10.fsl.noaa.gov> On Tue, 2004-12-07 at 15:23, Josh Kayse wrote: > Ok, my first post, so please be gentle. > > I've recently been tasked to build a diskless cluster for one of our > engineers. This was easy because we already had an image for the set > of machines. Once we started testing, the performance was very poor. What performance is poor? Is it the whole code that is slowing down or is it just the disk IO? > Basic setup follows: > > Master node: system drive is 1 36GB SCSI drive > /home raid5 5x 36GB SCSI drives Have you tuned the performance of your raid5 device? Depending on your controller, should probably be seeing 100 MB/s for both read and write. Or is this software raid? > Master node exports /tftpboot/192.168.1.x for the nodes. > > all of the nodes are diskless and get their system from the master > node over gigabit ethernet. > All that worsk fine. > > The engineers use files over nfs for message passing, and no, they So this used to work ok? Did it work with the same NFS server before you had it export the diskless filesystem? > will not change their code to mpi even though it would be an > improvement in terms of manageability and probably performance. > > Basically, my question is: what are some ways of testing the > performance of nfs ande then, how can I improve the performance? > > Thanks for any help in advance. > > PS: nfs mount options: async,rsize=8192,wsize=8192,hard > file sizes: approx 2MB If possible, increase the MTU on your interfaces. Also test your NFS performance and compare it to the raw disk performance. 
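One rough way to make that comparison for this particular workload, as a sketch (the /home/nfstest path and the file count are arbitrary, and the client-side figures will include whatever caching the async mount allows):

    # make a 2 MB dummy file, about the size of the real message files
    dd if=/dev/zero of=/tmp/msg.dat bs=1M count=2
    mkdir -p /home/nfstest
    # on a compute node: push 50 copies through the NFS mount and time it
    time sh -c 'for i in `seq 1 50`; do cp /tmp/msg.dat /home/nfstest/msg.$i; done'
    # run the same loop on the master against the raid directly for a baseline,
    # then check RPC retransmissions on the node (retrans should stay near zero)
    nfsstat -c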
Craig From csamuel at vpac.org Tue Dec 7 15:23:00 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: <200412081023.05139.csamuel@vpac.org> On Wed, 8 Dec 2004 09:23 am, Josh Kayse wrote: > Master node: system drive is 1 36GB SCSI drive > /home raid5 5x 36GB SCSI drives What distro, kernel and filesystem are you using ? > all of the nodes are diskless and get their system from the master > node over gigabit ethernet. Is your network infrastructure capable of jumbo frames ? > The engineers use files over nfs for message passing, and no, they > will not change their code to mpi even though it would be an > improvement in terms of manageability and probably performance. You have my considerable sympathies. :-) > Basically, my question is: what are some ways of testing the > performance of nfs ande then, how can I improve the performance? A good start is the NFS Performance HOWTO at: http://nfs.sourceforge.net/nfs-howto/performance.html A lot of folks report that XFS makes a better underlying FS for NFS than ext3, and we're planning to use that on our next NFS server we build. > Thanks for any help in advance. > > PS: nfs mount options: async,rsize=8192,wsize=8192,hard > file sizes: approx 2MB Out of interest, what does /proc/mounts say when you do and when you don't specify the rsize and wsize when mounting the filesystem ? cheers, Chris -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041208/cc738f1b/attachment.bin From joelja at darkwing.uoregon.edu Tue Dec 7 15:36:30 2004 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Opteron performance In-Reply-To: <41B633EC.61017270@attglobal.net> References: <77673C9ECE12AB4791B5AC0A7BF40C8F1542E2@exchange02.fed.cclrc.ac.uk> <41B633EC.61017270@attglobal.net> Message-ID: On Tue, 7 Dec 2004 pesch@attglobal.net wrote: > Where can I find a decent comparison betwenn Opteron and Athlon64 Functionally, the only difference between the two is memory bandwidth and the number of HT ports. > Paul > > "Kozin, I (Igor)" wrote: > >>> >>> On Fri, 2004-11-26 at 06:29, Kozin, I (Igor) wrote: >>> >>>> I tested the same executable with GNU 2.6.8 kernel >>> >>> Linux is not a GNU product. I think the term you want is a 'vanilla >>> 2.6.8 kernel' or '2.6.8 kernel from kernel.org' >> >> My apologies. Yes, this is what I wanted to say. >> >>> >>> As far as GNU kernel's go, their kernel is called the GNU Hurd, and I >>> don't believe its reached 1.0 yet.
>>> >>> >>> >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja@darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2 From bernd-schubert at web.de Tue Dec 7 16:31:25 2004 From: bernd-schubert at web.de (Bernd Schubert) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: <200412080131.25259.bernd-schubert@web.de> Hi Joshua, > > PS: nfs mount options: async,rsize=8192,wsize=8192,hard > file sizes: approx 2MB Better switch to tcp; it's always suggested on the linux NFS mailing list by the kernel maintainers. So additionally give the mount option tcp. Do you export async or sync (/etc/exports)? It's not important for reading, but exporting async may give much better write throughput. On the other hand it might cause data loss on a server crash, but if one needs speed it's often necessary. Hope it helps, Bernd PS: We are using a pretty nice diskless environment, probably with a completely different approach than you are doing ;) Just replace ClusterNFS by unfs3, I will update this howto over X-mas. -- Bernd Schubert Physikalisch Chemisches Institut / Theoretische Chemie Universität Heidelberg INF 229 69120 Heidelberg e-mail: bernd.schubert@pci.uni-heidelberg.de From bill at cse.ucdavis.edu Wed Dec 8 01:08:46 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Opteron performance In-Reply-To: References: <77673C9ECE12AB4791B5AC0A7BF40C8F1542E2@exchange02.fed.cclrc.ac.uk> <41B633EC.61017270@attglobal.net> Message-ID: <20041208090846.GA14091@cse.ucdavis.edu> On Tue, Dec 07, 2004 at 03:36:30PM -0800, Joel Jaeggli wrote: > On Tue, 7 Dec 2004 pesch@attglobal.net wrote: > >Where can I find a decent comparison betwenn Opteron and Athlon64 > > Functionally, the only difference between the two is memory bandwidth and the > number of HT ports. Er, it's a bit more complicated than that. Athlon 64's are available in 3 flavors: s754 athlon 64s have a 64 bit memory bus (unregistered memory, 1 less cycle) s939 athlon 64s have a 128 bit memory bus (unregistered memory, 1 less cycle) s940 athlon 64s have a 128 bit memory bus (registered, 1 more cycle) The FX is available in both s939 and s940, but is "unlocked" so you can change the multiplier (useful mostly for overclocking). Athlon 64s have 2 cache sizes, 512K and 1024K. Be careful: often the same cpu rating (like the athlon 64 3200) is available in 2 cache sizes and 2 clock speeds. So the main performance differences are 1/2 the memory bandwidth for s754 chips, and 1/2 the cache size for some athlon 64's. The effect of either will depend on the application. AFAIK, the s939 athlon 64's have 3 HT's just like the opteron, but none of them are coherent, so you can't run SMP with them. -- Bill Broadley Computational Science and Engineering UC Davis From rgb at phy.duke.edu Wed Dec 8 06:21:51 2004 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: On Tue, 7 Dec 2004, Josh Kayse wrote: > Ok, my first post, so please be gentle. > > I've recently been tasked to build a diskless cluster for one of our > engineers. This was easy because we already had an image for the set > of machines. Once we started testing, the performance was very poor. > Basic setup follows: > > Master node: system drive is 1 36GB SCSI drive > /home raid5 5x 36GB SCSI drives > Master node exports /tftpboot/192.168.1.x for the nodes. > > all of the nodes are diskless and get their system from the master > node over gigabit ethernet. > All that worsk fine. > > The engineers use files over nfs for message passing, and no, they > will not change their code to mpi even though it would be an > improvement in terms of manageability and probably performance. > > Basically, my question is: what are some ways of testing the > performance of nfs ande then, how can I improve the performance? > > Thanks for any help in advance. > > PS: nfs mount options: async,rsize=8192,wsize=8192,hard > file sizes: approx 2MB Is this a trick question? You begin by saying performance is poor. Then you say that you (they) won't take the obvious step to improve your performance. Sigh. OK, let's start by analyzing the problem. You haven't said much about the application. Testing NFS is a fine idea, but before spending too much time on any single metric of peformance let's analyze your cluster and task. You say you have gigabit ethernet between nodes. You don't say how MANY nodes you have, or how fast/what kind they are (even in general terms), or how much memory they have, or whether they have one or two processors. These all matter. Then there is the application. If the nodes compute for five minutes, then write a 2 MB file, then read a 2 MB file (or even several 2 MB files) parallel scaling is likely to be pretty good, even on top of NFS. If they compute 0.001 seconds, then write 2 MB and read 2+ MB, parallel scaling is likely to be poor (NFS or not). Why? If you don't already know the answer, you should check out my online book (http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php) and read up on Amdahl's Law and parallel scaling. Let's do some estimation. Forget NFS. The theoretical peak bandwidth of gigabit ethernet is 1000/8 = 125 MB/sec (this ignores headers and all sorts of reality). It takes (therefore) a minimum of 0.016 seconds to send 2 MB. In the real world, bandwidth is generally well under 125 MB/sec for a variety of reasons -- say 100 MB/sec. If you are computing for only 0.001 seconds and then communicating for 0.04 seconds, parallel scaling will be, um, "poor", MPI or NFS notwithstanding. Once you understand that fundamental ratio, you can determine what the EXPECTED parallel scaling is of the application in a good world might be. A good world would be one where each node only communicated with one other node (2 MB each way) AND the communications could proceed in parallel. A worse but still tolerable world might be one where the communications can proceed at least partially in parallel without a bottleneck -- one to many communications require (e.g. tree) algorithms or broadcasts to proceed efficiently. However, you do NOT live in a good world. 
You have N hosts engaged in what sounds like a synchronous computation (where everybody has to finish a step before going on to the next) with a single communications master (the NFS server). Writing to the NFS server and reading from the NFS server is strictly serialized. If you have N hosts, it will take at least Nx0.02 seconds to write all of the output files from a step of computation, at least Nx0.02 seconds to READ all of the output files (and that's assuming each node just reads one) and now you've got something like 0.001 seconds of computation compared to Nx(0.04) or worse seconds of communication. The more nodes you add, the slower it goes! In fact, if you just KEEP the data on a single node it takes Nx0.001 seconds to advance the computation a step compared to 0.001+Nx0.04 seconds in the cluster! Even if you were computing one second instead of 0.001, this sort of scaling relation will kill parallel speedup at some number of nodes. Note that I've gone into some detail here, because you are going to have to explain this, in some detail, to your engineers after working out the parallel scaling for the task at hand. There is no way out of this or around this. Tuning the hell out of NFS is going to yield at most a factor of 2 or so in speedup of the communications phase, and your problem is probably a SCALING relation and bottlenecked COMMUNICATIONS PATTERN that could care less about factors of 2 and is intrinsic to serialized NFS. In other words, your engineers are either going to have to accept algebraically derived reality and restructure their code so it speeds up (almost certainly abandoning the NFS communications model) or accept that it not only won't speed up, it will actually slow down run in parallel on your cluster. Engineers tend to be pretty smart, especially about the constraints imposed by unforgiving nature. If you give them a short presentation and teach them about parallel scaling, they'll probably understand that it isn't a matter of tweaking something and NFS working, it is a fundamental mathematical relation that keeps NFS from EVER working in this context as an IPC channel. Unless, of course, you compute for minutes compared to communicate for seconds, in which case your speedup should be fine. That's why you have to analyze the problem itself and learn the details of its work pattern BEFORE designing a cluster or parallelization. rgb > -- > Joshua Kayse > Computer Engineering > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jimlux at earthlink.net Wed Dec 8 06:46:28 2004 From: jimlux at earthlink.net (Jim Lux) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] NEC4 and beowulf Message-ID: <001801c4dd34$b3becb80$32a8a8c0@LAPTOP152422> I'm looking for information on anyone who has run NEC4 (Numerical Electromagnetics Code) on a Beowulf. The source is F77 (well, actually, DEC Powerstation Visual Fortran v.6, but doesn't use any graphics, so the visual is sort of superfluous). Aside from simple parallelizing schemes where you farm out multiple invocations to multiple processors, I'm looking for approaches where the inner grunt work is spread out a bit. 
It's mostly matrix math (solving a big matrix at one point, but, I don't know if it uses standard library calls). It does support some form of intelligent partitioning because you can set a total model size and separately set a smaller "in core matrix size" when you compile, so that it can swap to disk. Before I start really digging into the source, I thought I'd just ask. Jim Lux JPL From agrajag at dragaera.net Wed Dec 8 06:27:04 2004 From: agrajag at dragaera.net (Sean Dilda) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] Password less ssh In-Reply-To: References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> Message-ID: <41B70F38.6030403@dragaera.net> Suvendra Nath Dutta wrote: > On this note, I know this has been rehashed many times before, but using > OpenSSH 3.8 on SUSE 9.1, I couldn't get host authentication to work. I > followed all the instructions out in the web but everything failed. I > ended up copying the root's dsa key to every user's ssh directory and > using public-key authentication. Has someone successfully implemented > host authentication using SSH (hopefully v2) Yes and has written it up in a > nice How To? No :) Some stuff that might be useful: in ssh_config: HostbasedAuthentication yes EnableSSHKeysign yes # This may not be needed, depending on your version of ssh and the 'HostbasedAuthentication' flag needs to be set in sshd_config as well. You also need to make sure all the appropriate keys are in /etc/ssh/ssh_known_hosts And /etc/ssh/shosts.equiv needs to be setup. I did mine with netgroups. And if you want root to be able to ssh in with host based, you need to setup /root/.shosts as well. I did this on RHL9 and RHEL3. From asabigue at fing.edu.uy Wed Dec 8 01:22:12 2004 From: asabigue at fing.edu.uy (Ariel Sabiguero) Date: Wed Nov 25 01:03:36 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: <41B6C7C4.8080107@fing.edu.uy> Josh Kayse wrote: >Ok, my first post, so please be gentle. > > bienvenido! >Basic setup follows: > >Master node: system drive is 1 36GB SCSI drive > /home raid5 5x 36GB SCSI drives >Master node exports /tftpboot/192.168.1.x for the nodes. > > I would also recomend you to have some sort of fault tolerance on the / filesystem. >The engineers use files over nfs for message passing, and no, they >will not change their code to mpi even though it would be an >improvement in terms of manageability and probably performance. > > I do not know what are your "messages" like, but how "big" are they? If size is not a problem you might want to create a ramfs drive and export it for message-passing. Maybe an extra GB of ram at the server solves your problem and messages never get to the disk. You might want to split what is message passing and application data this way. Maybe you need to recode "something" to use different directories for messages and data (my apologies for those engineers who don't like recoding!), but if the concept of "message" that the application uses is not the one of a petabyte-message this may help. >Basically, my question is: what are some ways of testing the >performance of nfs ande then, how can I improve the performance? > > I believe that previous messages on the list gave you great clues on how to speed-up the network (MTU+jumboframes), nfs and the underlying raid. My approach is different: why to persist something volatile? 
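A sketch of that idea with tmpfs rather than ramfs, so the size can be capped (the mount point, the 1 GB size, the export options and the host name "master" are all assumptions; memory-backed file systems generally need an explicit fsid to be exportable, and of course the contents vanish on reboot):

    # on the master: a RAM-backed scratch area just for the message files
    mkdir -p /msgscratch
    mount -t tmpfs -o size=1g tmpfs /msgscratch
    # add a line like this to /etc/exports, then re-export:
    #   /msgscratch 192.168.1.0/24(rw,async,no_root_squash,fsid=1)
    exportfs -ra
    # on each node: mount it alongside the disk-backed share
    mkdir -p /msgscratch
    mount master:/msgscratch /msgscratch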
Any way, MTU, jumboframes and nfs stuff is also worth trying here. >Thanks for any help in advance. > > Just remember helping-back in the future ;-) Ariel >PS: nfs mount options: async,rsize=8192,wsize=8192,hard > file sizes: approx 2MB > > From jrajiv at hclinsys.com Wed Dec 8 00:53:49 2004 From: jrajiv at hclinsys.com (Rajiv) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] CPU Benchmark Result Message-ID: <018901c4dd03$81ebc850$0f120897@PMORND> Dear Sir, I would like to get CPU benchmark results for various architectures. Any good sites where I could find this information. I found the site http://www.unc.edu/atn/hpc/performance/ useful. But I am unable to access - I am asked for username and password. How can I access this site. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041208/28b420cc/attachment.html From ed at eh3.com Tue Dec 7 16:08:35 2004 From: ed at eh3.com (Ed Hill) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: <1102464515.3089.64.camel@localhost.localdomain> On Tue, 2004-12-07 at 17:23 -0500, Josh Kayse wrote: > I've recently been tasked to build a diskless cluster for one of our > engineers. This was easy because we already had an image for the set > of machines. Once we started testing, the performance was very poor. > Basic setup follows: Hi Josh, We've spent some time tuning NFS on our clusters and still have NFS performance thats somewhere between "not so good" and "awful" relative to raw disk speeds. The short list of things to check is: - async option (which you're using--good) - increasing the number of nfsd processes - having an SMP (or even HyperThreaded) NFS server All other things equal, we'd seen substantially better NFS performance (throughput) when using SMP NFS servers. Also, you may want to look into lustre if throughput is a big concern. Ed -- Edward H. Hill III, PhD office: MIT Dept. of EAPS; Rm 54-1424; 77 Massachusetts Ave. Cambridge, MA 02139-4307 emails: eh3@mit.edu ed@eh3.com URLs: http://web.mit.edu/eh3/ http://eh3.com/ phone: 617-253-0098 fax: 617-253-4464 From rgb at phy.duke.edu Wed Dec 8 09:18:51 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <41B70F38.6030403@dragaera.net> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> Message-ID: On Wed, 8 Dec 2004, Sean Dilda wrote: > Suvendra Nath Dutta wrote: > > On this note, I know this has been rehashed many times before, but using > > OpenSSH 3.8 on SUSE 9.1, I couldn't get host authentication to work. I > > followed all the instructions out in the web but everything failed. I > > ended up copying the root's dsa key to every user's ssh directory and > > using public-key authentication. Has someone successfully implemented > > host authentication using SSH (hopefully v2) > > Yes > > and has written it up in a > > nice How To? > > No :) Actually, there IS a mini-HOWTO out there on the web. I can't remember the URL, but google for it and you'll find it. Here is a snippet from my March CWM column of last year as well. Probably doesn't contain anything "new" compared to what folks have already told you, but just in case. 
%< snip snip snippet ==============(forgive the markup)================= Now, let's arrange it so that we can login to a remote host (also running sshd) without a password. Let's start by seeing if we can login to the remote host at all, with a password:

   rgb@lucifer|T:151>ssh lilith
   The authenticity of host 'lilith (192.168.1.131)' can't be established.
   RSA key fingerprint is 8d:55:10:15:8b:6c:64:65:17:00:a7:84:a3:35:9f:f6.
   Are you sure you want to continue connecting (yes/no)? yes
   Warning: Permanently added 'lilith,192.168.1.131' (RSA) to the list of known hosts.
   rgb@lilith's password:
   rgb@lilith|T:101>

So far, so good. Note that the FIRST time we remotely login, ssh will ask you to verify that the host you are connecting to is really that host. When you answer yes it will save its key fingerprint and use it thereafter to automatically verify that the host is who you think it is. This is one small part of the ssh security benefit. However, we had to enter a password to login. This is no big deal for a single host, but is a BIG deal if you have to do it 1024 times on a big cluster just to get pvm started up! To avoid this, we use the ssh-keygen command to generate a public/private ssh key pair of our very own:

   rgb@lucifer|T:104>ssh-keygen -t rsa
   Generating public/private rsa key pair.
   Enter file in which to save the key (/home/rgb/.ssh/id_rsa):
   Enter passphrase (empty for no passphrase):
   Enter same passphrase again:
   Your identification has been saved in /home/rgb/.ssh/id_rsa.
   Your public key has been saved in /home/rgb/.ssh/id_rsa.pub.
   The key fingerprint is:
   c3:aa:6b:ba:35:57:95:aa:7b:45:48:94:c3:83:81:11

This generates a default 1024 bit RSA key; alternatively we could have made a DSA key or increased or decreased the number of bits in the key (decreasing being a Bad Idea). Note that we used a blank passphrase; this will keep ssh from prompting us for a passphrase when we connect. The last step is to create an authorized keys file in your ~/.ssh directory. If your home directory is NFS exported to all the nodes, then you are done; otherwise you'll also need to copy the ~/.ssh directory to all the hosts that don't already have it mounted. The following illustrates the steps and a test.

   rgb@lucifer|T:113>cd .ssh
   rgb@lucifer|T:114>ls
   id_rsa id_rsa.pub known_hosts
   rgb@lucifer|T:115>cp id_rsa.pub authorized_keys
   rgb@lucifer|T:116>cd ..
   rgb@lucifer|T:118>scp -r .ssh lilith:
   rgb@lilith's password:
   known_hosts     100% |*****************************| 231 00:00
   id_rsa          100% |*****************************| 883 00:00
   id_rsa.pub      100% |*****************************| 220 00:00
   authorized_keys 100% |*****************************| 220 00:00
   rgb@lucifer|T:120>ssh lilith
   rgb@lilith|T:101>

Note that with the last ssh we logged into lilith with no password! ssh is really pretty easy to set up this way; if you read the man page(s) you can learn how to generate and add additional authorized keys and do fancier things with it, but many users will need no more than what we've done so far. A warning - it is a good idea to log into each host in your cluster one time after setting it up before proceeding further, to build up the known_hosts file so that you aren't prompted for each host the first time PVM starts up a virtual machine. Go do that, and then we'll get PVM itself going. (Sorry about the "PVM", but this was a column on running PVM. Obviously it is the same set of steps for MPI). rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Wed Dec 8 09:50:57 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] CPU Benchmark Result In-Reply-To: <018901c4dd03$81ebc850$0f120897@PMORND> References: <018901c4dd03$81ebc850$0f120897@PMORND> Message-ID: On Wed, 8 Dec 2004, Rajiv wrote: > Dear Sir, > I would like to get CPU benchmark results for various architectures. > Any good sites where I could find this information. I found the site > http://www.unc.edu/atn/hpc/performance/ useful. But I am unable to > access - I am asked for username and password. How can I access this > site. Here are at least some of the the primary/famous benchmarks: SPEC -- probably the "best" of the application-level benchmark suites. Fairly tight rules, but deep-pocketed vendors doubtless maintain an edge beyond just having decent hardware. lmbench -- I think without question the best of the microbenchmark suites. If you want to find out how fast the CPU does any basic operation, this is probably the first place to look. This suite is heavily used by the linux kernel developers including Linus Himself because it provides accurate and reproducible timings of things like interrupt handling rates, context switch rates, memory latency and bandwidth, and some selected CPU operational rates. stream -- If you are interested in CPU-memory combined rates in operations on streaming vectors (e.g. copy, add, multiply-add) stream is the microbenchmark of choice. Its one weakness is that it doesn't provide one (easily) with a picture of rates as a function of vector size, so that one cannot observe the variation as one increases the vector size across the various CPU cache sizes. It is therefore better suited (as a predictor of application performance) for people running large applications involving linear algebra than for people operating on small blocks of data. Oh, and another weakness is that it doesn't provide any measure that includes the division operation. This is important because some code REQUIRES division in a streaming context or otherwise, and division is often several times slower than multiplication. linpack -- Another linear algebra type benchmark. Not terribly relevant at the application level any more, and a bit too complex to be a microbenchmark -- IMHO this is a benchmark that could be retired without anyone really missing it for practical reasons. However, it is has been around a long time and there is a fair bit of data derived from it. When someone tells you how many "MFLOPS" a system has, they are probably referring to Linpack MFLOPS. Historically, this has been a highly misleading predictor of relative systems performance at the application level and has also proven relatively easy to "cheat" on a bit at the hardware and software level, but there it is. savage -- This is a nearly forgotten benchmark that measures how fast a system does transcendental function evaluations (e.g. sin, tan). These are typically library calls, but some CPUs have had them built into microcode so that they execute several times faster (typically) than library code. Libaries can also exhibit some variation depending on the algorithms used for evaluation. Some of these benchmarks are wrapped into one another. For example, the HPC Challenge suite will contain stream, and I recall that lmbench has stream available in it as well now (don't shoot me if that is wrong -- I'm just remembering and could be mistaken). 
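If you want the picture of rate versus vector size that stock stream doesn't give you, the crude route is simply to rebuild it at several sizes. A rough sketch, assuming a C stream.c whose array length is set by a "#define N <number>" line (true of the older C versions; check the macro name in your copy), GNU sed and gcc:

   for sz in 20000 100000 500000 2000000 8000000
   do
       # rewrite only the array-size macro (NTIMES etc. are left alone)
       sed "s/define N[[:space:]]\+[0-9]\+/define N $sz/" stream.c > stream_$sz.c
       gcc -O3 -o stream_$sz stream_$sz.c -lm
       echo "=== N = $sz ==="
       ./stream_$sz | grep -E 'Copy|Scale|Add|Triad'
   done

Plotting the Triad column against N makes the cache boundaries show up quite clearly.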
My own benchmark wrapper, cpu_rate (available on my website below under either General or Beowulf, can't remember which) contains stream WITH a variable length vector size, a stream-like measure of "bogomflops" (arithmetic mean of +-*/ times/rates), savage, and a memory read/write test that permits one to shuffle the order of access to compare streaming with random access rates. It is still a bit buggy and is on my list for more work (along with about four other projects:-) over Xmas break, but what it is really designed to be is a shell for drop-in microbenchmarks of your own design (arbitrary code fragments). Benchmarking whole applications is easy -- just use wall-clock time. Benchmarking small code fragments is remarkably difficult, especially if their execution time is comparable to the time required to read the most accurate system clock avaiable (typically the onboard CPU cycle counter). Benchmarking e.g. library calls is difficult to do completely accurately, but you can get a decent idea from using the -p flag and gmon (profiler) where there is a bit of heisenberg uncertainty in all of these -- the process of measurement can change the results, hopefully not too much to be useful. I'm not providing URLs because all of the above can easily be found with google, and because I don't know the exact URLs of lists of results derived from the benchmarks anyway. SPEC is pretty good about publishing a result list per submitted architecture. stream has started to do this as well, although it is also (unfortunately) playing a variant of the "Top X" Game where vendors get to tune and are "ranked". lmbench has the strictest rules of them all -- no vendor tuning whatsoever and you have to publish a whole SUITE of results if you publish any one. The more I look at and write about this stuff, the more I appreciate what Larry (McVoy) is fighting against... rgb > Regards, > Rajiv -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From anoony at yahoo.co.uk Wed Dec 8 10:55:12 2004 From: anoony at yahoo.co.uk (Ayaz Ali) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] parallel debugger for MPICH under WINDOWS Message-ID: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> Hello, Can anybody tell me if there is an mpi debugger available in the market? I suppose MS is planning a release soon. Can I get hold of a beta release for testing? Regards, Ayaz __________________________________ Do you Yahoo!? All your favorites on one personal page – Try My Yahoo! http://my.yahoo.com From bogdan.costescu at iwr.uni-heidelberg.de Wed Dec 8 08:10:40 2004 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: <1102447413.3088.7.camel@hpti10.fsl.noaa.gov> Message-ID: On Tue, 7 Dec 2004, Craig Tierney wrote: > I meant White Box distribution which is a rebuild of Red Hat Enterprise > 3. It was an early release for Opteron and Itanium. Since then, > the Gelato Foundation has taken over the rebuilds and should be > available. I don't think that this is "official" WhiteBox Linux. Their web page only mentions x86 and x86_86 architectures. They might even don't know about it... ;-) The guy that initially built WhiteBox x64_64 and ia64, Pasi Pirhonen, is now their TaoLinux maintainer (along with s390(x)). 
> It isn't exact, but every single 2.6 based system I have tried > has shown this problem. Every 2.4 based system has not shown this > problem. It is indeed strange... > I primarily used lmdd to test the performance of the filesystem. All > I care about is big streaming IO. ext3 and xfs (at least) care about the underlying sector size or RAID stripe size. Have you paid attention to this when you formatted the device (if you formatted after moving to the new computers) ? Do you use something else between the physical device and the file-system, like lvm (lvm1 in 2.4, lvm2 in 2.6) or software RAID ? > I did try sgpdd which accesses the device directly and I saw the > same behavior. I never heard of this tool. A simple search also didn't found anything. Care to provide a link ? Could you run hdparm/zcav/dd/etc. reading directly from real SCSI device (no lv*, md*) ? Yes, I know, some of these tools are not very precise, but we're talking about almost an order of magnitude... -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From i.kozin at dl.ac.uk Wed Dec 8 10:11:35 2004 From: i.kozin at dl.ac.uk (Kozin, I (Igor)) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] CPU Benchmark Result Message-ID: <77673C9ECE12AB4791B5AC0A7BF40C8F154302@exchange02.fed.cclrc.ac.uk> Hi Rajiv, try this http://www.cse.clrc.ac.uk/disco/hw-perf.shtml we don't charge for it ;) Igor I. Kozin (i.kozin at dl.ac.uk) CCLRC Daresbury Laboratory tel: 01925 603308 http://www.cse.clrc.ac.uk/disco -----Original Message----- From: Rajiv [mailto:jrajiv@hclinsys.com] Sent: 08 December 2004 08:54 To: beowulf@beowulf.org Subject: [Beowulf] CPU Benchmark Result Dear Sir, I would like to get CPU benchmark results for various architectures. Any good sites where I could find this information. I found the site http://www.unc.edu/atn/hpc/performance/ useful. But I am unable to access - I am asked for username and password. How can I access this site. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041208/e7bf20c5/attachment.html From sdutta at cfa.harvard.edu Wed Dec 8 08:16:50 2004 From: sdutta at cfa.harvard.edu (Suvendra Nath Dutta) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <41B70F38.6030403@dragaera.net> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> Message-ID: This is exactly the steps I followed from another past email in this list. But it didn't work for me. Which is why I wondered if something was different about this particular version of OpenSSH or SUSE. Suvendra On Wed, 8 Dec 2004, Sean Dilda wrote: > Suvendra Nath Dutta wrote: >> On this note, I know this has been rehashed many times before, but using >> OpenSSH 3.8 on SUSE 9.1, I couldn't get host authentication to work. I >> followed all the instructions out in the web but everything failed. I ended >> up copying the root's dsa key to every user's ssh directory and using >> public-key authentication. Has someone successfully implemented host >> authentication using SSH (hopefully v2) > > Yes > > and has written it up in a >> nice How To? 
> > No :) > > Some stuff that might be useful: > > in ssh_config: > > HostbasedAuthentication yes > EnableSSHKeysign yes # This may not be needed, depending on your version of > ssh > > and the 'HostbasedAuthentication' flag needs to be set in sshd_config as > well. > > You also need to make sure all the appropriate keys are in > /etc/ssh/ssh_known_hosts > > And /etc/ssh/shosts.equiv needs to be setup. I did mine with netgroups. > > And if you want root to be able to ssh in with host based, you need to setup > /root/.shosts as well. > > I did this on RHL9 and RHEL3. > From tmattox at gmail.com Wed Dec 8 10:46:08 2004 From: tmattox at gmail.com (Tim Mattox) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] diskless cluster nfs In-Reply-To: <7c8f279204120714232806210a@mail.gmail.com> References: <7c8f279204120714232806210a@mail.gmail.com> Message-ID: Hello Josh, We used to run our diskless clusters with hacked-together root over NFS, but have switched to a more manageable system based on ramdisks, called Warewulf. Using the Warewulf cluster management tools, the contents of the root filesystem on the nodes is easily maintained, updated, changed, tweaked, customised, and whatever else you care to do with them. And in your particular case, there can be a significant reduction in the NFS traffic on your cluster's network, since all critical system files are loaded into RAM on the nodes at boot time. Since you are just getting started, it is probably worth your time to check out at least one of these cluster management tools for "diskless" clusters: Warewulf http://warewulf-cluster.org/ OneSIS http://onesis.sourceforge.net/ They can save you a lot of headaches later. Disclaimer: I liked Warewulf so much, I became one of it's developers earlier this year. I have never used OneSIS, but it appears to be a viable approach if you really like root over NFS. In your case, I would suspect you would want to reduce any extraneous NFS traffic as best you can. Good luck. -- Tim P.S. - Take a good look at RGB's advice about the scalability of your problem before you bang your head against the "performance wall" for very long. P.P.S - I sometimes think that NFS actually stands for Not a File System... ;-) It has no direct way to force a "sync". I'd be very wary of any messaging scheme that used NFS as the medium. On Tue, 7 Dec 2004 17:23:40 -0500, Josh Kayse wrote: > Ok, my first post, so please be gentle. > > I've recently been tasked to build a diskless cluster for one of our > engineers. This was easy because we already had an image for the set > of machines. Once we started testing, the performance was very poor. > Basic setup follows: > > Master node: system drive is 1 36GB SCSI drive > /home raid5 5x 36GB SCSI drives > Master node exports /tftpboot/192.168.1.x for the nodes. > > all of the nodes are diskless and get their system from the master > node over gigabit ethernet. > All that worsk fine. > > The engineers use files over nfs for message passing, and no, they > will not change their code to mpi even though it would be an > improvement in terms of manageability and probably performance. > > Basically, my question is: what are some ways of testing the > performance of nfs ande then, how can I improve the performance? > > Thanks for any help in advance. 
> > PS: nfs mount options: async,rsize=8192,wsize=8192,hard > file sizes: approx 2MB > -- > Joshua Kayse > Computer Engineering -- Tim Mattox - tmattox@gmail.com - http://homepage.mac.com/tmattox/ From rgb at phy.duke.edu Wed Dec 8 13:01:18 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> Message-ID: On Wed, 8 Dec 2004, Suvendra Nath Dutta wrote: > This is exactly the steps I followed from another past email in this list. > But it didn't work for me. Which is why I wondered if something was > different about this particular version of OpenSSH or SUSE. I doubt it, although I don't use SUSE so I cannot be certain. I think (in agreement with several others on the list) that the problem is that you were doing things as root that are really dangerous, really bad things to do as root. For example, if you REALLY copied root's /root/.ssh directory to all your users' directories and had set root's directory up so that password-free login was possible, it is quite possible that now all of your users can login as root without a password. EACH user has to set up password-free logins for THEMSELVES, one at a time. You cannot do this for them, or well, I suppose you could but you'd need to do it by running the keygen-thing one user at a time, as those users. Not something you really want to be doing. The best that you could do is wrap it up in a script for users to run to do it in one step without knowing what they are doing. This would give you a degree of control over certain choices such as rsa vs dsa, number of bits in the key. rgb > > Suvendra > > > On Wed, 8 Dec 2004, Sean Dilda wrote: > > > Suvendra Nath Dutta wrote: > >> On this note, I know this has been rehashed many times before, but using > >> OpenSSH 3.8 on SUSE 9.1, I couldn't get host authentication to work. I > >> followed all the instructions out in the web but everything failed. I ended > >> up copying the root's dsa key to every user's ssh directory and using > >> public-key authentication. Has someone successfully implemented host > >> authentication using SSH (hopefully v2) > > > > Yes > > > > and has written it up in a > >> nice How To? > > > > No :) > > > > Some stuff that might be useful: > > > > in ssh_config: > > > > HostbasedAuthentication yes > > EnableSSHKeysign yes # This may not be needed, depending on your version of > > ssh > > > > and the 'HostbasedAuthentication' flag needs to be set in sshd_config as > > well. > > > > You also need to make sure all the appropriate keys are in > > /etc/ssh/ssh_known_hosts > > > > And /etc/ssh/shosts.equiv needs to be setup. I did mine with netgroups. > > > > And if you want root to be able to ssh in with host based, you need to setup > > /root/.shosts as well. > > > > I did this on RHL9 and RHEL3. > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From mwill at penguincomputing.com Wed Dec 8 12:23:56 2004 From: mwill at penguincomputing.com (Michael Will) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] parallel debugger for MPICH under WINDOWS In-Reply-To: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> References: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> Message-ID: <200412081223.56393.mwill@penguincomputing.com> There are several under Linux. Maybe you can contact their sales group to see if they also offer a windows product - or just switch over to linux where you can expect better performance and less license fees anyways. 1. Streamline Computing "ddt" patched up version of gdb with frontend for clustering, resold by absoft who claim it works for Scyld: http://www.absoft.com/Products/Debuggers/ddt/ddt.html 2. PGI Cluster Kit "pgdb" Also, the PGI cluster kit comes with pgdb, did anybody ever evaluate that? http://www.pgroup.com/products/pgdbg.htm Supports: * Linux for 32bit and 64bit processors * MPICH and LAM-MPI as well as OpenMPI? * up to 64 processes (DDT can do 1024) Michael Will On Wednesday 08 December 2004 10:55 am, Ayaz Ali wrote: > Hello, > Can anybody tell me if there is an mpi debugger > available in the market? > I suppose MS is planning a release soon. Can I get > hold of a beta release for testing? > Regards, > Ayaz > > > > __________________________________ > Do you Yahoo!? > All your favorites on one personal page ? Try My Yahoo! > http://my.yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Michael Will, Linux Sales Engineer NEWS: We have moved to a larger iceberg :-) NEWS: 300 California St., San Francisco, CA. Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899 www.penguincomputing.com From Angel.R.Rivera at conocophillips.com Wed Dec 8 11:48:44 2004 From: Angel.R.Rivera at conocophillips.com (Rivera, Angel R) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyoneseen this? Message-ID: I am trying to catch up here so please bear with me. We have been doing a lot of testing with the 2.6 kernel and are in fact moving to it for our disk nodes because of issues with the 2.4 kernel. We have seen an across the board increase in performance over the 2.4 kernel. Our tests have included DAS NFS boxes as well as SAN attached dual Opteron puppies w/ Qlogic cards. I can tell you that it does take a little bit of work to make it all play together well, but so does the 2.4 kernel. From sdutta at cfa.harvard.edu Wed Dec 8 12:14:43 2004 From: sdutta at cfa.harvard.edu (Suvendra Nath Dutat) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> Message-ID: <1102536883.24032.4.camel@itctestcluster.cfa.harvard.edu> On Wed, 2004-12-08 at 16:01 -0500, Robert G. Brown wrote: > On Wed, 8 Dec 2004, Suvendra Nath Dutta wrote: > > > This is exactly the steps I followed from another past email in this list. > > But it didn't work for me. Which is why I wondered if something was > > different about this particular version of OpenSSH or SUSE. > > I doubt it, although I don't use SUSE so I cannot be certain. 
> > I think (in agreement with several others on the list) that the problem > is that you were doing things as root that are really dangerous, really > bad things to do as root. For example, if you REALLY copied root's > /root/.ssh directory to all your users' directories and had set root's > directory up so that password-free login was possible, it is quite > possible that now all of your users can login as root without a > password. > With trepidation (always advised when speaking to someone who harnesses the Brahma), I wonder if this absolutely true. Because, public keys don't identify users, they identify machines. So although every user uses public keys generated by the root user, they all just identify the originating machine. SSH verifies the machine is who they claim to be, and allow access to the user (but only as the user). If someone now says ssh -l root clientmachine they'll be asked for the root password. This is I believe as it should be and easily verified to be true (I just did it before emailing to be sure). Suvendra. From rgb at phy.duke.edu Wed Dec 8 17:47:30 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <1102536883.24032.4.camel@itctestcluster.cfa.harvard.edu> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> <1102536883.24032.4.camel@itctestcluster.cfa.harvard.edu> Message-ID: On Wed, 8 Dec 2004, Suvendra Nath Dutat wrote: > On Wed, 2004-12-08 at 16:01 -0500, Robert G. Brown wrote: > > On Wed, 8 Dec 2004, Suvendra Nath Dutta wrote: > > > > > This is exactly the steps I followed from another past email in this list. > > > But it didn't work for me. Which is why I wondered if something was > > > different about this particular version of OpenSSH or SUSE. > > > > I doubt it, although I don't use SUSE so I cannot be certain. > > > > I think (in agreement with several others on the list) that the problem > > is that you were doing things as root that are really dangerous, really > > bad things to do as root. For example, if you REALLY copied root's > > /root/.ssh directory to all your users' directories and had set root's > > directory up so that password-free login was possible, it is quite > > possible that now all of your users can login as root without a > > password. > > > > With trepidation (always advised when speaking to someone who harnesses > the Brahma), I wonder if this absolutely true. Because, public keys > don't identify users, they identify machines. So although every user > uses public keys generated by the root user, they all just identify the > originating machine. SSH verifies the machine is who they claim to be, > and allow access to the user (but only as the user). If someone now says > ssh -l root clientmachine they'll be asked for the root password. This > is I believe as it should be and easily verified to be true (I just did > it before emailing to be sure). Try it not as root. In fact, if you've copied the same keypairs into all your user's directories: a) su to root b) su to the first user of your choice (user1) c) ssh machine -l user2 and you should be able to login as user2 from user1's account without a password. In the best experimental tradition, I just tried this, and it most definitely >>can<< work. Whether or not it DOES work, and whether or not it works for root in particular, depends (IIRC) on the contents of various files in /etc/pam.d and settings in /etc/ssh/ssh*_config. 
As in I believe that one can set it up so that passwordless root logins from any source are always forbidden -- or not -- in the authentication stack in various places. I think this is one of the reasons that ssh seems so complicated and seems to work differently for different persons on different machines. I also could be mistaken -- I'm not a PAM expert and am not totally familiar with the effect of all the controls therein, although I have played with it various times in the past to try to get things to work. That's the (double) reason I was warning you, as I don't know whether or not there are things in root's authentication chain that will prevent password free login in your particular SUSE setup, but it is very likely that what you've done will enable any user to become any other user at will. This is obviously just as bad. Each user needs their own private keypair, or Bad Things Can Happen. Hmmm, on some of MY systems (at home inside my firewall), I've just set it up so one CAN do ssh hostname -l root if one copies the appropriate public key into /root/.ssh/authorized_keys. So that certainly can work as well. Yessir, Bad Things. You Have Been Warned. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jcownie at etnus.com Thu Dec 9 01:49:16 2004 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] parallel debugger for MPICH under WINDOWS In-Reply-To: Message from Ayaz Ali of "Wed, 08 Dec 2004 10:55:12 PST." <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> References: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> Message-ID: <20041209094916.1DA653F4D8@amd64.cownie.net> > Can anybody tell me if there is an mpi debugger available in the > market? If you're on Linux (or other Unix) platform it's worth looking at our TotalView debugger which has support for most of the MPI implementations and many different Linux architectures (x86, x86-64, Power). You can visit our web site and download a copy and a free (time-limited but full function) demo license (for up to 8 processors, I think). (Disclaimer: I get paid to write TotalView, so I won't make any recommendations :-). -- -- Jim -- James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From rgb at phy.duke.edu Thu Dec 9 08:04:48 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: <41B8610A.3020709@dragaera.net> References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> <1102536883.24032.4.camel@itctestcluster.cfa.harvard.edu> <41B8610A.3020709@dragaera.net> Message-ID: On Thu, 9 Dec 2004, Sean Dilda wrote: > Robert G. Brown wrote: > > > > > Try it not as root. In fact, if you've copied the same keypairs into > > all your user's directories: > > Rob, I believe you've responded to the wrong person. The original > poster, named akhtar Rasool, did a really weird and potentially > dangerous thing with user keys. Then later on Suvendra Nath Dutat asked > about hostbased authentication in ssh which uses host keys instead of > user keys. These are two different people with two different setups. Maybe I misread things, sorry. > As for sshing as different users. I know that the hostbased will not > let you do that, as I'm using hostbased in my cluster. 
Actually, the only difference between host based and user based ssh authentication is where the host keys were stored and how reliable they are likely to be (see e.g. man sshd). In fact I recall a time where passwordless login only tended to be permitted by sysadmins if you DID have a ssh_known_hosts table, as this is the only form of host authentication likely to be valid. In host based, host keys are collected by the sysadmin (presumably a trusted and perfectly knowledgeable agent) and put in /etc/ssh/ssh_known_hosts on all hosts. Doing this actualy allows users to skip the silly tell-me-again step where it asks users to verify that the host key of the host they are connecting to the first time is correct (as if they have any way they are every likely to use to tell, or even CAN use without connecting to the host in question first). At best, building up ~user/.ssh/known_hosts in this way adds a questionable amount to the overall security of any LAN. At worst over a WAN it is probably an open but unnoticed invitation for MitM attacks. I suspect that the thing that prevents users from using authorized key based (passwordless) authentication is a PAM setting or setting in /etc/ssh/ssh*.config, but pam is really hard for me to untangle in a truly deterministic way (too many settings, too many complicated interactions). sshd_config is pretty deterministic, though; look at: PermitEmptyPasswords PermitRootLogin which enable/disable most of the stuff we've been talking about and which have settings that vary according to the whim of the packager in any given distribution for their defaults. Usually I just tweak these settings a bit (and sometimes end up having to mess with PAM) and eventually find a combination that permits user and/or root login with or without passwords required, as the environment and my needs seems to require. FWIW, I just did another simple experiment and proved that I could (still) install ssh_known_hosts on two nodes in my home cluster (running pretty much stock dulug RH 9), delete the host entries in my ~/.ssh/known_hosts file, copy my id_dsa.pub into a son's authorized_keys file, and ssh directly to my son's account without either typing a password or "approving" the host key and having a new table entry in my ~/.ssh/known_hosts. So I'm >>certain<< that this isn't actually relevant to passworded vs passwordless login in the authentication stack or the dangerous elements of sharing keypairs among different individuals with a desire not to have their mail or files or ssh encrypted datastreams (all keyed to this pair) openly accessible to others. ssh_keyscan can be used to easily gather ssh host keys and build an ssh_known_hosts file. Doing this likely marginally increases the security of your ssh connections (IF you do it under circumstances that you know cannot be spoofed, e.g. inside a firewall and not over a WAN connection, of course) and keeps users from having to constantly "validate" host keys. If you keep it well-maintained, you can also avoid having ten users complain about the man-in-the-middle warning (and having to tell them what to do about it) that inevitably pops up after a reinstall unless you carefully preserve and restore the old keypairs. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From agrajag at dragaera.net Thu Dec 9 06:28:26 2004 From: agrajag at dragaera.net (Sean Dilda) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Password less ssh In-Reply-To: References: <20041207055837.48073.qmail@web20027.mail.yahoo.com> <1102443647.4080.6.camel@valen> <41B70F38.6030403@dragaera.net> <1102536883.24032.4.camel@itctestcluster.cfa.harvard.edu> Message-ID: <41B8610A.3020709@dragaera.net> Robert G. Brown wrote: > > Try it not as root. In fact, if you've copied the same keypairs into > all your user's directories: Rob, I believe you've responded to the wrong person. The original poster, named akhtar Rasool, did a really weird and potentially dangerous thing with user keys. Then later on Suvendra Nath Dutat asked about hostbased authentication in ssh, which uses host keys instead of user keys. These are two different people with two different setups. As for sshing in as different users: I know that hostbased will not let you do that, as I'm using hostbased in my cluster. From akhtar_samo at yahoo.com Thu Dec 9 01:26:13 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] errors while testing machines Message-ID: <20041209092613.86335.qmail@web20023.mail.yahoo.com> After extracting MPICH in /usr/local:
1. tcsh
2. ./configure --with-comm=shared --prefix=/usr/local
3. make
4. make install
5. util/tstmachines
In the 5th step the error was:
Errors while trying to run rsh 192.168.0.25 -n /bin/ls /usr/local/mpich/mpich-1.2.5.2/mpichfoo
unexpected response from 192.168.0.25:
/bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: no such file or directory
The ls test failed on some machines. This usually means that you do not have a common filesystem on all of the machines in your machines list; MPICH requires this for mpirun (it is possible to handle this in a procgroup file; see the...) Other possible problems include:
- The remote shell command rsh does not allow you to run ls. See the documentation about remote shell and rhosts.
- You have a common filesystem, but with inconsistent names. See the documentation on the automounter fix.
1 error was encountered while testing the machines list for LINUX. Only these machines seem to be available: host1
Now, since this is only a two node cluster, host1 is the server onto which MPICH is being installed, and 192.168.0.25 is the client. rsh logins work freely on both nodes. On the server side the file "machines.LINUX" contains:
192.168.0.25
host1
Kindly help Akhtar --------------------------------- Do you Yahoo!? Yahoo! Mail - Helps protect you from nasty viruses. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041209/9a613596/attachment.html From roger at ERC.MsState.Edu Thu Dec 9 08:18:27 2004 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Oldest functioning clusters In-Reply-To: References: Message-ID: There weren't all that many responses to my original post about the oldest still-functional cluster system. I decided not to post mine immediately, but now I thought I'd post it to see if anyone here has one older. Our oldest cluster is the Super MSPARC. It is based on 8 Sun SPARCstation 10 workstations, each with four 90MHz Ross processor modules and 288MB of RAM. The interconnects between the nodes include Myrinet (this is one of the earliest Myrinet-based systems), ethernet (10Mb/s), and ATM (OC3).
It was originally developed as a research testbed, but still sees some use as a teaching tool for a parallel algorithms class. The according to our records, the components for the system were originally purchased in June of 1993, so the system is now 11 years old! Here's a link to a photo and description of it, as well as some of our other cluster projects: http://www.erc.msstate.edu/about/facilities/clusterhistory.html On Mon, 22 Nov 2004, Roger L. Smith wrote: > > During a conversation at the LECCIBG at SC'04 this year, I openly wondered > where and what the oldest still-functioning cluster system is. > > So, who has the oldest cluster on this list? For my curiousity, I'm not > limiting the submissions to Intel or Linux, so it doesn't have to be a > traditional "Beowulf" system, but it should be a system designed and used > exclusively as a cluster, and it should still be in service of some sort > today. > > > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ > | Roger L. Smith Phone: 662-325-3625 | > | Sr. Systems Administrator FAX: 662-325-7692 | > | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | > | Mississippi State University | > |____________________________________ERC__________________________________| > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| From joachim at ccrl-nece.de Thu Dec 9 00:30:17 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] parallel debugger for MPICH under WINDOWS In-Reply-To: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> References: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> Message-ID: <41B80D19.1030903@ccrl-nece.de> Ayaz Ali wrote: > Hello, > Can anybody tell me if there is an mpi debugger > available in the market? > I suppose MS is planning a release soon. Can I get > hold of a beta release for testing? Check the beowulf archives: http://www.beowulf.org/pipermail/beowulf/2004-June/010200.html Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de From john.hearns at streamline-computing.com Thu Dec 9 01:39:12 2004 From: john.hearns at streamline-computing.com (John Hearns) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] parallel debugger for MPICH under WINDOWS In-Reply-To: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> References: <20041208185512.77725.qmail@web25001.mail.ukl.yahoo.com> Message-ID: <1102585152.5375.11.camel@Vigor51> On Wed, 2004-12-08 at 10:55 -0800, Ayaz Ali wrote: > Hello, > Can anybody tell me if there is an mpi debugger > available in the market? With the proviso that I work for Streamline, you could look at ddt from Allinea http://www.allinea.com If you have any questions, send me an email off-list. 
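Failing all of the above, the zero-cost fallback is to attach a plain serial debugger to one rank at a time. A rough sketch, assuming gdb is installed on the nodes and a hypothetical MPI binary called myapp built with -g; the pid is whatever ps reports for the misbehaving rank:

   # on the node where the suspect rank is running
   ps -ef | grep myapp              # note the pid, say 12345
   gdb /path/to/myapp 12345         # attach gdb to the running rank
   (gdb) bt                         # where is it stuck?
   (gdb) detach
   (gdb) quit

Clumsy beyond a handful of ranks, but it needs nothing more than gdb and whatever remote shell the cluster already uses.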
From mathog at mendel.bio.caltech.edu Wed Dec 8 14:25:55 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along Message-ID: It's official, the Tyan S2466 nodes get "biggest PITA award" for systems that I've used. The two nodes that were crashing frequently had their power supplies replaced and then they were stable for a couple of months. Now they've both become unstable again. Evil motherboard juju eating power supplies? Who knows? Not that I can make them crash at will, oh no, that would be too easy. cpuburn (20 minutes) doesn't even make them hiccup They run memtest86+ 24 hours without a glitch. The problem never moved with memory anyway. but leave them running linux, doing not much of anything, and you never know when they're going to come down. Sometimes there's an oops, sometimes not. When there is an oops it can be in any piece of software. Today I upgraded the BIOS to 4.06 and, naturally, it didn't fix any of the many little annoyances the S2466N produces, ie, "who me boot?". I don't seriously expect it to fix the unstability. So rather than keep trying to fix these monsters I'm starting to think about the cheapest way to keep the cluster running by replacing just the mobo/CPU with something else (as I'm not expecting enough $$$ anytime soon to do more, and obtaining Athlon MPs and S2466N mobos now is problematical anyway.) I'll happily give up Tyan's serial line bios access for a system where I don't have to employ that feature quite so often! The S2466N is an ATX form factor, each one has one Athlon MP 2200+ and 1 Gb of 2100 DDR RAM, a 40G ATA disk, a floppy and a little PCI graphics card in a 2U case. If I could find a nice mobo/CPU combo for, oh, <$200 that could replace the S2466N and Athlon MP, and still do ECC, then I'd probably go that route to patch systems up as they break. Best if it has at least as much cache as the MP though. Is there anything out there fitting that description? Historically ECC support isn't something that shows up on cheap mobos but maybe on some low end Athlon 64 variant? Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From tmattox at gmail.com Thu Dec 9 07:58:25 2004 From: tmattox at gmail.com (Tim Mattox) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Raid disk read performance issue in Linux 2.6, anyone seen this? In-Reply-To: References: <1102447413.3088.7.camel@hpti10.fsl.noaa.gov> Message-ID: Hello Craig, This just released article from LWN may shed some light on your IO performance issues with the 2.6 kernel: "Which is the fairest I/O scheduler of them all?" http://lwn.net/Articles/114770/ That is currently a subscriber only article, but will be available to all on December 16th. (I highly recommend a subscription, LWN is a fantastic resource.) You should look into which IO scheduler works best for your workload. The 2.6 kernel has a few to choose from... Also, you may be interested in the latest CentOS 3.3 distribution, since it has an actively supported x86_64 port. The upcoming cAos-2 distribution is also worth a look for x86-64 users... for info about both see: http://caosity.org/ On Wed, 8 Dec 2004 17:10:40 +0100 (CET), Bogdan Costescu wrote: > On Tue, 7 Dec 2004, Craig Tierney wrote: > > > I meant White Box distribution which is a rebuild of Red Hat Enterprise > > 3. It was an early release for Opteron and Itanium. 
Since then, > > the Gelato Foundation has taken over the rebuilds and should be > > available. > > I don't think that this is "official" WhiteBox Linux. Their web page > only mentions x86 and x86_86 architectures. They might even don't know > about it... ;-) > The guy that initially built WhiteBox x64_64 and ia64, Pasi Pirhonen, > is now their TaoLinux maintainer (along with s390(x)). > > > It isn't exact, but every single 2.6 based system I have tried > > has shown this problem. Every 2.4 based system has not shown this > > problem. > > It is indeed strange... > > > I primarily used lmdd to test the performance of the filesystem. All > > I care about is big streaming IO. > > ext3 and xfs (at least) care about the underlying sector size or RAID > stripe size. Have you paid attention to this when you formatted the > device (if you formatted after moving to the new computers) ? Do you > use something else between the physical device and the file-system, > like lvm (lvm1 in 2.4, lvm2 in 2.6) or software RAID ? > > > I did try sgpdd which accesses the device directly and I saw the > > same behavior. > > I never heard of this tool. A simple search also didn't found > anything. Care to provide a link ? > Could you run hdparm/zcav/dd/etc. reading directly from real SCSI > device (no lv*, md*) ? Yes, I know, some of these tools are not very > precise, but we're talking about almost an order of magnitude... > > > > -- > Bogdan Costescu > > IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen > Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY > Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 > E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Tim Mattox - tmattox@gmail.com - http://homepage.mac.com/tmattox/ From roger at ERC.MsState.Edu Thu Dec 9 08:38:05 2004 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Oldest functioning clusters In-Reply-To: References: Message-ID: On Thu, 9 Dec 2004, Roger L. Smith wrote: > The according to our records, the components for the system were > originally purchased in June of 1993, so the system is now 11 years old! > > Here's a link to a photo and description of it, as well as some of our > other cluster projects: > > http://www.erc.msstate.edu/about/facilities/clusterhistory.html Oops, I mistyped the URL! It should be: http://www.erc.msstate.edu/about/facilities/clusterhistory/ _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Sr. Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |____________________________________ERC__________________________________| From hvidal at tesseract-tech.com Thu Dec 9 10:19:03 2004 From: hvidal at tesseract-tech.com (H.Vidal, Jr.) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Oldest functioning clusters In-Reply-To: References: Message-ID: <41B89717.9070805@tesseract-tech.com> Roger L. Smith wrote: > > Here's a link to a photo and description of it, as well as some of our > other cluster projects: > > http://www.erc.msstate.edu/about/facilities/clusterhistory.html sounds interesting, but this link is not found on site. Do you know corrected URL? 
hv From atp at piskorski.com Thu Dec 9 09:55:09 2004 From: atp at piskorski.com (Andrew Piskorski) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: References: Message-ID: <20041209175509.GA32083@piskorski.com> On Wed, Dec 08, 2004 at 02:25:55PM -0800, David Mathog wrote: > cpuburn (20 minutes) doesn't even make them hiccup They run > memtest86+ 24 hours without a glitch. The problem never moved with > memory anyway. but leave them running linux, doing not much of > anything, and you never know when they're going to come down. Do these machines have hard drives? If so, you definitely want to use the drive somehow while doing stability testing, as that can be necessary for a bad power supply to show itself (I have seen this). Unfortunatley, AFAIK you can't do that with memtest86. > The S2466N is an ATX form factor, each one has one Athlon MP > 2200+ and 1 Gb of 2100 DDR RAM, a 40G ATA disk, a floppy > and a little PCI graphics card in a 2U case. If I could > find a nice mobo/CPU combo for, oh, <$200 that could > replace the S2466N and Athlon MP, and still do ECC, then I'd Why exactly did you buy the more expensive Athlon MPs and dual motherboards, and then use only 1 cpu per motherboard? I believe the socket 754 and 939 Athlon 64s do not support ECC, while the socket 940 Athlon 64 and Opteron do. Prices per cpu seem to start around $125 for socket 754 or 939, $190 for socket 940, and go up from there, so you're going to have a very hard time squeaking in under $300 for the CPU + motherboard that you want, never mind $200. Under $400, that you could do. A cheaper option would be to keep the Athlon MP and simply replace the motherboard. The Athlon MP will work just fine in any motherboard taking the Athlon XP, but finding a non-dual motherboard that supports ECC might be tricky. Of course, if you're willing to try a different dual Athlon MP board, they'll pretty much all support ECC, and should certainly be under $200. I have a Tyan Tiger MP S2460 dual Athlon MP workstation that has given me no problems at all, but then, I don't use it with cluster-like workloads either. -- Andrew Piskorski http://www.piskorski.com/ From mathog at mendel.bio.caltech.edu Thu Dec 9 10:22:12 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along Message-ID: > > > The S2466N is an ATX form factor, each one has one Athlon MP > > 2200+ and 1 Gb of 2100 DDR RAM, a 40G ATA disk, a floppy > > and a little PCI graphics card in a 2U case. If I could > > find a nice mobo/CPU combo for, oh, <$200 that could > > replace the S2466N and Athlon MP, and still do ECC, then I'd > > Why exactly did you buy the more expensive Athlon MPs and dual > motherboards, and then use only 1 cpu per motherboard? 1. We were hoping the price of the MP chips would fall dramatically, allowing us to fill out the servers. What happened instead is that it leveled out at around $150/chip and then they disappeared. (The same trick had worked once before for us, with some Intel Pentium II 400s and ASUS motherboards, the second CPUs were dirt cheap when we bought them a few years after the initial system purchase and they extended the life of those workstations.) 2. We needed ECC - these nodes run continuously. ECC for Athlons was very hard to find at the time we purchased these machines. 
> > I believe the socket 754 and 939 Athlon 64s do not support ECC, while > the socket 940 Athlon 64 and Opteron do. It's hard to tell what the 754 boards support since many say they will _accept_ ECC memory but they don't say that they can actually _use_ it. > Prices per cpu seem to start > around $125 for socket 754 or 939, $190 for socket 940, and go up from > there, so you're going to have a very hard time squeaking in under > $300 for the CPU + motherboard that you want, never mind $200. Under > $400, that you could do. I was thinking of a 754 option primarily - 70 for motherboard + 130 for Athlon 64 2800 CPU (more or less.) > > A cheaper option would be to keep the Athlon MP and simply replace the > motherboard. The Athlon MP will work just fine in any motherboard > taking the Athlon XP, but finding a non-dual motherboard that supports > ECC might be tricky. Tricky is right - name _one_ Athlon XP motherboard that supports ECC. Besides, it may well be that the CPUs are bad and the motherboards are all right. Not having any spare known good CPUs or known good motherboards there's no way to play mix and match to figure out which component is the problem. Well, not unless we sacrifice one of the working nodes, and I'm really hesitant to do that in case it's one of those horrible situations where component A breaks component B, which would result in us having 3 flakey nodes instead of 2. In any case, if the CPUs are bad they'll be around $150 to replace from a vendor and the motherboard about $190 (somewhat less on Ebay, but that's not how I want to buy components.) Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From rgb at phy.duke.edu Thu Dec 9 11:53:26 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: References: Message-ID: On Wed, 8 Dec 2004, David Mathog wrote: > It's official, the Tyan S2466 nodes get "biggest PITA award" > for systems that I've used. The two nodes that were crashing > frequently had their power supplies replaced and then they > were stable for a couple of months. Now they've both become > unstable again. As you know, you have more than just my sympathies. We have people who are lined up with baseball bats in hand to give our dual 2466's a lick if/when we can finally afford to move them out. In fact, one person I know is preparing a small but powerful explosive device to use on the whole pile...;-) > So rather than keep trying to fix these monsters I'm starting > to think about the cheapest way to keep the cluster running by > replacing just the mobo/CPU with something else (as I'm not > expecting enough $$$ anytime soon to do more, and obtaining > Athlon MPs and S2466N mobos now is problematical anyway.) > I'll happily give up Tyan's serial line bios access for a system > where I don't have to employ that feature quite so often! > > The S2466N is an ATX form factor, each one has one Athlon MP > 2200+ and 1 Gb of 2100 DDR RAM, a 40G ATA disk, a floppy > and a little PCI graphics card in a 2U case. If I could > find a nice mobo/CPU combo for, oh, <$200 that could > replace the S2466N and Athlon MP, and still do ECC, then I'd > probably go that route to patch systems up as they break. > Best if it has at least as much cache as the MP though. > Is there anything out there fitting > that description? 
Historically ECC support isn't something > that shows up on cheap mobos but maybe on some low end > Athlon 64 variant? I just bought an intermediate AMD64 mobo for my home cluster three days ago. The motherboard (ASUS) was $155, the CPU was $245 (for a slot 754 3400, which is really 2400 MHz, 128 MB L1, 512 MB L2 cache according to their numbering scheme). Also populated with a gigabyte of PC 3200 non-ECC memory it was about $500, and I'm very interested in seeing it go head to head with my opterons. I have NOT installed it yet so I can't give you any speed reports. Looking over the other prices from that vendor (intrex.com, although I go to a local store) their cheapest AMD64 is the 2800 for $150, and they had an MSI socket 754 motherboard for $100. All these motherboards seem to want non-ECC DDR400 (or slower) memory -- to get maximum performance you'd likely want to replace the memory anyway. So the absolute minimum sounds like it would be around $250, with a gig of new non-ECC memory around $350. Full speed ECC memory adds a LOT to this. Going socket 939 adds almost nothing (and I'm wondering if I should have gone this way after Bill's review of the AMD64 configs a few days ago). Going socket 940 adds a ton of money -- too much for a home box (and probably too much for your upgrade). Note that these are not pricewatch prices, so they are probably 20% or so more than you could find on the street if you tried hard. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Dec 9 12:07:46 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: <20041209175509.GA32083@piskorski.com> References: <20041209175509.GA32083@piskorski.com> Message-ID: On Thu, 9 Dec 2004, Andrew Piskorski wrote: > I have a Tyan Tiger MP S2460 dual Athlon MP workstation that has given > me no problems at all, but then, I don't use it with cluster-like > workloads either. We have a stack (or had, we may finally have pitched them) of close to 100 of these pieces of crap. These are what we bought the 2466's to replace, as they are actually BETTER than 2460's. They still haven't got the bios right in the 2466's, but the 2460 bios -- just don't get me started. We are still, shall we say, "sensitive" about purchasing Tyan products at this point. Although they have been pretty good about replacing all the 2466 motherboards that have fried for no visible reason in the last year or so. Counting the days, we are. I've got 16 nodes that have to last another year; somebody else has 32 or so nodes that might have to do a bit more than that (to get to 3 years), and a third group here is actually right at 3 years, although part of that time was spent with 2460's before we spent several thousand dollars replacing them all with 2466's. At 3 years max (sooner if people get new grant money for Opterons) those nodes will be outa here slicker than a greased goose. Or some other fast metaphor. rgb > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Dec 9 12:12:06 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: References: Message-ID: On Thu, 9 Dec 2004, David Mathog wrote: > Besides, it may well be that the CPUs are bad and the > motherboards are all right. Not having any spare known > good CPUs or known good motherboards there's no way to > play mix and match to figure out which component is the problem. > Well, not unless we sacrifice one of the working nodes, and > I'm really hesitant to do that in case it's one of those horrible > situations where component A breaks component B, which would > result in us having 3 flakey nodes instead of 2. In any case, if > the CPUs are bad they'll be around $150 to replace from a vendor > and the motherboard about $190 (somewhat less on Ebay, but that's not > how I want to buy components.) Both AMD and Tyan have been pretty decent about fullfilling the terms of their 3 year mfrs warranty on both the CPUs and motherboards. So before you throw anything away, most definitely look into RMAing broken parts less than 3 years old. If nothing else, you could put replaced processors into the second socket of good motherboards, and keep replaced motherboards around as spares. If you kept one "known good" system more or less powered off, it would give you a testbed of sorts, although we see problems with heating fans and more that only surface (as you note) under heavy or variable load in a sealed case and not open running just a single exerciser on the bench. rgb > > Regards, > > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From bill at cse.ucdavis.edu Thu Dec 9 14:59:22 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: References: Message-ID: <20041209225922.GA23566@cse.ucdavis.edu> > Looking over the other prices from that vendor (intrex.com, although I > go to a local store) their cheapest AMD64 is the 2800 for $150, and > they had an MSI socket 754 motherboard for $100. All these motherboards > seem to want non-ECC DDR400 (or slower) memory -- to get maximum Hrm, I'm unsure, many motherboards like the MSI Neo line mention ECC as "compatible" but I'm not sure that means that the ECC functionality is enabled. The north bridge is onchip and does support ECC: http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_9487%5E9493,00.html Where it mentions: 72-bit DDR SDRAM memory (64-bit interface + 8-bit ECC) I've yet to find a s939 or s754 motherboard that specifically mentions compatibility with ECC as well as the functionality. Some of the older athlon xp motherboards pulled this trick. In particular having BIOS allow turning ECC on/off would be reassuring. 
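For what it's worth, some of this can be probed from a running node rather than trusting the motherboard manual. A minimal sketch, assuming a Linux box with root access and with dmidecode and pciutils installed; the register offset mentioned below is an assumption from memory and should be checked against AMD's BKDG before reading anything into the value:

    # What the BIOS claims for the memory array (SMBIOS data, run as root):
    dmidecode | grep -i "error correction"

    # On a K8 the memory controller is on-chip and shows up as PCI device
    # 00:18.2 (function 2 is the DRAM controller).  Dumping a config
    # register at least shows what the BIOS programmed; 0x90 is assumed
    # here to be the DRAM configuration register.
    lspci -s 00:18.2
    setpci -s 00:18.2 90.l

    # Any ECC events the kernel has already logged:
    dmesg | grep -i ecc

If dmidecode reports "None" for error correction even with ECC modules installed, that is a fairly strong hint the board accepts the DIMMs but never turns the feature on.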
-- Bill Broadley Computational Science and Engineering UC Davis From bari at onelabs.com Thu Dec 9 15:41:52 2004 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: References: <20041209175509.GA32083@piskorski.com> Message-ID: <41B8E2C0.5020705@onelabs.com> Robert G. Brown wrote: > We have a stack (or had, we may finally have pitched them) of close to > 100 of these pieces of crap. These are what we bought the 2466's to > replace, as they are actually BETTER than 2460's. They still haven't > got the bios right in the 2466's, but the 2460 bios -- just don't get me > started. Have you looked at using LinuxBIOS for them? Tyan has a full-time LinuxBIOS developer. He posts near daily on the Linuxbios@clustermatic.org list and I hear that LANL uses plenty of them. Bari Ari From mathog at mendel.bio.caltech.edu Thu Dec 9 14:24:06 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along Message-ID: > Both AMD and Tyan have been pretty decent about fullfilling the terms of > their 3 year mfrs warranty on both the CPUs and motherboards. Unfortunately these nodes fail at relatively long intervals (days to weeks). I did speak to Tyan support and they offered to RMA the motherboards, but they didn't want the CPUs to go with them. AMD is presumably the other way around. One can easily imagine the resulting "it's their problem" finger pointing following a 24 diagnostic hour test by both manufacturers during which neither's component failed. I wouldn't hesitate to RMA if a node failed dramatically, preferably with a nicely scorched component to indicate the exact point of failure. Regards David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From hvidal at tesseract-tech.com Thu Dec 9 22:30:04 2004 From: hvidal at tesseract-tech.com (H.Vidal, Jr.) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] large memory use Message-ID: <41B9426C.1010609@tesseract-tech.com> Fellow beowulf-ers: do you have any references to use and optimization of large memory installations for large datasets, perhaps with resulting visualization? say x86 machines with 16-32G of ram, or more or perhaps notes on AMD 64-bit memory gains and optimizations? This would be very helpful. Many thanks. Hernando Vidal, Jr. Tesseract Technology From rgb at phy.duke.edu Fri Dec 10 05:49:37 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Keeping the Athlon MP cluster limping along In-Reply-To: <41B8E2C0.5020705@onelabs.com> References: <20041209175509.GA32083@piskorski.com> <41B8E2C0.5020705@onelabs.com> Message-ID: On Thu, 9 Dec 2004, Bari Ari wrote: > Robert G. Brown wrote: > > > We have a stack (or had, we may finally have pitched them) of close to > > 100 of these pieces of crap. These are what we bought the 2466's to > > replace, as they are actually BETTER than 2460's. They still haven't > > got the bios right in the 2466's, but the 2460 bios -- just don't get me > > started. > > Have you looked at using LinuxBIOS for them? Tyan has a full-time > LinuxBIOS developer. He posts near daily on the > Linuxbios@clustermatic.org list and I hear that LANL uses plenty of them. > > Bari Ari We just want all the 246x systems to go away. 
When the hardware doesn't break, we've got the cluster set up so that it will boot and run reasonably well (the systems are, or were, very admirable performers numerically). At this point, it is human interaction and human time wasted on hardware problems that are the real expense. If it weren't opportunity cost time vs money we don't have, we would have thrown the entire cluster away long ago and replaced it. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From akhtar_samo at yahoo.com Fri Dec 10 21:10:44 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] errors while testing machines Message-ID: <20041211051044.82812.qmail@web20025.mail.yahoo.com> After the extraction of MPICH in /usr/local 1- tcsh 2- ./configure –with-comm=shared --prefix=/usr/local 3- make 4- make install 5- util/tstmachines in the 5th step error was Errors while trying to run rsh 192.168.0.25 –n /bin/ls /usr/local/mpich/mpich-1.2.5.2/mpichfoo unexpected response from 192.168.0.25 n > /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: n no such file or directory The ls test failed on some machines. This usually means that u donot have a common filesystem on all of the machines in your machines list; MPICH requires this for mpirun (it is possible to handle this in a procgroup file; see the……) Other possible problems include:- The remote shell command rsh doesnot allow you to run ls. See the doc abt remote shell & rhosts You have common filesystem, but with inconsistent names See the doc on the automounter fix 1 error were encountered while testing the machines list for LINUX only these machines seem to be available host1 now since this is only a two node cluster host1 is the server on to which MPICH is being installed. & 192.168.0.25 is the client….. rsh on both nodes is logging freely……. On the server side the file “ machines.LINUX “ contains -192.168.0.25 -host1 Kindly help Akhtar --------------------------------- Do you Yahoo!? The all-new My Yahoo! – What will yours do? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041210/b80a0605/attachment.html From mechti01 at luther.edu Fri Dec 10 18:08:21 2004 From: mechti01 at luther.edu (Timo Mechler) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] History of Beowulf Computing Message-ID: <6.2.0.14.0.20041210200531.01fc5300@pop.luther.edu> Hello, I'm going to be writing a paper on Beowulf and Clustered Computing and its applications. I'm specifically looking for some resources that share a little bit about the History of Beowulf and Clustered Computing. If someone knows of any papers or online articles I could check out, please let me know. Thanks in advance for your help. Regards, -Timo Mechler From rgb at phy.duke.edu Sat Dec 11 14:37:47 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] History of Beowulf Computing In-Reply-To: <6.2.0.14.0.20041210200531.01fc5300@pop.luther.edu> References: <6.2.0.14.0.20041210200531.01fc5300@pop.luther.edu> Message-ID: On Fri, 10 Dec 2004, Timo Mechler wrote: > Hello, > > I'm going to be writing a paper on Beowulf and Clustered Computing and its > applications. I'm specifically looking for some resources that share a > little bit about the History of Beowulf and Clustered Computing. 
If > someone knows of any papers or online articles I could check out, please > let me know. Thanks in advance for your help. There are actually quite a few resources available, but you'll probably have to google some to find them. The beowulf site has a short history: http://www.beowulf.org/overview/history.html I've archived the following from a mirror/snapshot of the much earlier beowulf site: http://www.phy.duke.edu/resources/computing/brahma/Resources/beowulf/intro.html Then there are various articles published in online interviews and articles, e.g. -- http://www.hq.nasa.gov/hpcc/insights/vol7/beowulf.htm (and others google turns up with e.g. "don becker beowulf history" or other nifty strings). There's probably a similar but less authoritative blurb on this in my online book, too. rgb > > Regards, > > -Timo Mechler > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu
From Glen.Gardner at verizon.net Sat Dec 11 19:46:41 2004 From: Glen.Gardner at verizon.net (Glen Gardner) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] errors while testing machines References: <20041211051044.82812.qmail@web20025.mail.yahoo.com> Message-ID: <41BBBF21.6020304@verizon.net> The error in the 5th step is caused by a chatty login message. This makes mpi complain but it ought to work anyway. You want to turn off motd, and if using freebsd create a file called ".hushlogin" and put it in the user's home directory. The next error is to do with paths to mpich and to the program being launched. All the nodes need to be able to "see" the mpi binaries and need to be able to see the executable program. The paths to mpi and the program being launched need to be the same for all nodes and for the root node. Make sure the path is set up properly in the environment. You may need to check your mount points and set up NFS properly. The last one probably has to do with name resolution. The root node usually won't need to be in the machines.LINUX file, but all other nodes need to be. I believe you need to list machines by hostname, not IP addresses, so be sure that both machines have the same hostfile, same .rhosts, etc. Glen The next message indicates that the path to the executable "mpichfoo" was not found.
akhtar Rasool wrote: > After the extraction of MPICH in /usr/local > > > > 1- tcsh > > 2- ./configure -with-comm=shared --prefix=/usr/local > > 3- make > > 4- make install > > 5- util/tstmachines > > in the 5th step error was > > Errors while trying to run rsh 192.168.0.25 -n /bin/ls > /usr/local/mpich/mpich-1.2.5.2/mpichfoo unexpected response from > 192.168.0.25 > > > > n > /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: > > n no such file or directory > > The ls test failed on some machines. > > This usually means that u donot have a common filesystem on all of the > machines in your machines list; MPICH requires this for mpirun (it is > possible to handle this in a procgroup file; see the......) > > Other possible problems include:- > > The remote shell command rsh doesnot allow you to run ls.
> > See the doc abt remote shell & rhosts > > > > You have common filesystem, but with inconsistent names > > See the doc on the automounter fix > > 1 error were encountered while testing the machines list for LINUX > > only these machines seem to be available > > host1 > > > > > > > > > > now since this is only a two node cluster host1 is the server on to > which MPICH is being installed. & 192.168.0.25 is the client..... > > rsh on both nodes is logging freely....... > > On the server side the file " machines.LINUX " contains > > -192.168.0.25 > > -host1 > > Kindly help > > > > > > Akhtar > > ------------------------------------------------------------------------ > Do you Yahoo!? > The all-new My Yahoo! - What will yours do? > >------------------------------------------------------------------------ > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Glen E. Gardner, Jr. AA8C AMSAT MEMBER 10593 Glen.Gardner@verizon.net http://members.bellatlantic.net/~vze24qhw/index.html -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041211/7b1a3d46/attachment.html From sfh103 at york.ac.uk Sun Dec 12 12:36:22 2004 From: sfh103 at york.ac.uk (SF Husain) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Picking a processor Message-ID: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> Hi I'm currently designing a beowulf style parallel processor and am trying to decide which processor to use for the nodes. My project requires my final design for the parallel processor to be able to provide a sustained throuput of 0.25 TFlops. My research tells me that in general that the flop rate scales up linearly. My trouble is that I'm having trouble finding estimates for the flop rates of the processors I'm looking at. I've looked at the specfp2000 results but as far as I can tell their numbers do not easily convert to a flop rate. Could anyone tell me how I can find estimates for the flop rates of processors or if there is any rough sort of conversion that I can do on these spec (or any other) benchmark results. I'm aware that the actual rate is dependent on type of work given to the processors. However my project is only a design exercise aimed at developing research skills so even a very rough conversion or sourse of sample results would be suitable for my purposes. If anyone could help that would be great. Thanks Sufi From rgb at phy.duke.edu Mon Dec 13 05:13:37 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Picking a processor In-Reply-To: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> References: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> Message-ID: On Sun, 12 Dec 2004, SF Husain wrote: > Hi > > I'm currently designing a beowulf style parallel processor and am trying to > decide which processor to use for the nodes. My project requires my final > design for the parallel processor to be able to provide a sustained throuput > of 0.25 TFlops. > > My research tells me that in general that the flop rate scales up linearly. > My trouble is that I'm having trouble finding estimates for the flop rates > of the processors I'm looking at. > > I've looked at the specfp2000 results but as far as I can tell their numbers > do not easily convert to a flop rate. 
Could anyone tell me how I can find > estimates for the flop rates of processors or if there is any rough sort of > conversion that I can do on these spec (or any other) benchmark results. > > I'm aware that the actual rate is dependent on type of work given to the > processors. However my project is only a design exercise aimed at developing > research skills so even a very rough conversion or sourse of sample results > would be suitable for my purposes. > > If anyone could help that would be great. Sigh. I suppose that the first question one has to ask is "what's a FLOPS", isn't it? "Floating point operations per second" seems a bit ambiguous, given the wide range of things that can be considered a floating point operation. The second question one MIGHT ask is why you are designing a system with a targeted FLOPS rating regardless of budget and regardless of the relationship between FLOPS and the work you actually want to accomplish. This is not a specious question -- in actual fact the "correct" thing to do for a variety of fairly obvious reasons is to design a system with a mind towards a particular work capacity of the work you want to accomplish, or more reasonably, to take what you can afford to spend and design a machine that can do as much of that work as possible per dollar spent. Optimizing cost-benefit is what the game is all about, not being able to boast of 0.25 TFLOPS (whatever that means). This matters quite a bit, because an optimal design for a real parallel project may well spend your budget in ways than just optimizing FLOPS. This will almost certainly be true if your application has any sort of rich structure at all -- interprocessor communications over the network, a large memory footprint, a mix of local and nonlocal memory accesses, trancendental function calls. After the lecture, I suppose I'll answer your question. There are several benchmarks that return FLOPS. "The" benchmark that returns FLOPS is likely linpack, a linear algebra benchmark. However, stream returns MFLOPS fractionated across four distinct floating point operations -- copying a vector of floats, scaling a vector of floats, multiplying two vectors of floats, and multiplying and adding three vectors of floats (multiply/add is a single pipelined operation on many processors). Stream acts strictly on a sequential vector too large (>4x) to fit in cache and is as much a memory speed benchmark as it is floating point. cpu_rate contains embedded stream with the ability to vary vector size, so you can measure FLOPS for vectors inside cache. This will (naturally) give you a much higher rate (and let you reach your design goal with many fewer processors) but won't necessarily mean anything to your application, which you assert doesn't matter but obviously does. It also gives you a stream-like +-*/ (including DIVISION, which is much slower than multiplication or addition) test that returns "bogoFLOPS" and which will let you pump the number of processors much HIGHER. It contains a "savage" benchmark which measures trancendental rates. lmbench contains microbenchmark code for directly measuring floating point rates. Finally, you can always look up vendor spec per processor and use "theoretical peak" FLOPS, which will be something like a multiplier like 0.5-2 x the CPU clock. Hmmm, that doesn't really answer your question, but it does indicate why the question is a silly one. 
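As a concrete illustration of how far apart two of those numbers can be, here is a minimal sketch comparing a back-of-the-envelope theoretical peak with a measured out-of-cache rate from stream; the clock rate, flops-per-cycle figure and CPU count are made-up illustration values, and stream.c is assumed to already be on hand:

    # "Theoretical peak": clock * flops-per-cycle * number of CPUs.
    # 2 flops/cycle is a common (and optimistic) per-core figure; use
    # whatever the vendor datasheet claims for the chip being priced.
    clock_ghz=2.0; flops_per_cycle=2; ncpus=64
    echo "$clock_ghz * $flops_per_cycle * $ncpus" | bc   # aggregate peak, GFLOPS

    # A measured number from the same hardware will be far lower.  stream
    # runs its copy/scale/add/triad kernels on vectors much larger than
    # cache and reports a rate for each:
    gcc -O3 stream.c -o stream
    ./stream

Quoting the first number in a design document and delivering the second is exactly the ambiguity at issue here.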
Pick your definition and you can push the FLOPS rating of any given CPU up and down over nearly an order of magnitude, and be able to fully justify the number either way. You might as well design a system with a target aggregate (bits*clock) where bits is the datapath width and clock is the CPU clock. That's likely to be comparable to within a factor of two or so across many architectures.... rgb > > Thanks > > Sufi > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu
From hahn at physics.mcmaster.ca Mon Dec 13 11:38:38 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Picking a processor In-Reply-To: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> Message-ID: > I'm currently designing a beowulf style parallel processor and am trying to > decide which processor to use for the nodes. My project requires my final > design for the parallel processor to be able to provide a sustained throuput > of 0.25 TFlops. by what measure? > My research tells me that in general that the flop rate scales up linearly. well, there are factors which can cause sublinearity. > My trouble is that I'm having trouble finding estimates for the flop rates > of the processors I'm looking at. www.top500.org basically, top500 is a ranking of the fastest 500 computers in the world, when running a benchmark which is FP-intensive. it's a real code, but not a very real real code ;) the critical numbers are Rpeak and Rmax: Rpeak = ncpus*clock*flops-per-cycle; it's the peak theoretical aggregate flops of the machine/cluster. interestingly, you can get a pretty decent approximation of Rmax (the actual HPL score) using: Rmax ~= Rpeak * interconnect-efficiency with:
    interconnect    rmax/rpeak
    quadrics        .75
    myrinet         .7
    infiniband      .7
    gigabit         .6
this is not too surprising - it would be strange if gigabit were not less efficient, and quadrics is pretty much the premium interconnect (unless you count numaflex/etc). there are undoubtedly other factors which might be conflated here - for instance, I'd expect HPL scaling to depend on memory-size-per-cpu as well as memory-bandwidth-per-cpu. and for a slower interconnect, you can probably get higher efficiency by maximizing on-node work (minimizing interconnect dependency.) needless to say, real and useful apps are probably going to achieve lower useful flops than HPL. note also that HPL strongly rewards chips which have fused multiply-add, which can be entirely irrelevant to real codes... regards, mark hahn.
From becker at scyld.com Mon Dec 13 16:00:25 2004 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] BWBUG December 14, 2004: sFlow and Cluster Interconnects Message-ID: --- Special Notes: -- This month's meeting will be held in Greenbelt MD, not in Virginia -- This meeting returns to our usual second Tuesday of the month schedule -- The meeting will focus on advances in Ethernet cluster networks -- See http://www.bwbug.org/ for full information Date: Tuesday December 14, 2004 Time: 2:45 PM - 5:00 PM Location: Northrop Grumman IT, Greenbelt MD Titles: sFlow, A new Network Monitoring System Interconnect and Switch Fabrics for Clusters.
Speakers: Mr. Wes Medley Federal Systems Engineer with Foundry Networks Donald Becker CTO of Penguin Computing, and CS of the Scyld Software division Abstracts: Foundry Networks and Inmon Corporation will provide a technology brief on sFlow, a new network monitoring technology. sFlow provides a network-wide view of user level traffic flows. sFlow is a scalable technique for measuring network traffic, collecting, storing, and analyzing network traffic data. sFlow allows administrators of Supercomputing Clusters to understand possible congestion points within their application traffic flow information and to analysis traffic flows to ensure correct cluster operation. Donald Becker will give a brief review of the Interconnect and Switch Fabrics for Clusters. Donald is one of the world experts on network drivers, having created most of the network drivers for the Linux during the first decade of its existance. He was the founder of the first company that specialized in commerical cluster management software, SCYLD. ____ This month's meeting will be in our Maryland venue, the Northrop Grumman Information Technology Offices at 7501 Greenway Center Drive, Suite 1200 Greenbelt, MD, 20770 See http://www.bwbug.org/ web page for directions. Registration on the web site is highly encourage to speed sign-in. As usual there will be door prizes, food and refreshments. Essential questions: Need to be a member?: No, and guests are welcome. Parking and parking fees: Free surface lot parking is readily available Ease of access: 30 seconds from the D.C. beltway Also as usual, the organizer and host for the meeting is T. Michael Fitzmaurice, Jr. 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell mail michael.fitzmaurice at ngc.com -- Donald Becker becker@scyld.com Scyld Software Scyld Beowulf cluster systems 914 Bay Ridge Road, Suite 220 www.scyld.com Annapolis MD 21403 410-990-9993 From maurice at harddata.com Fri Dec 10 22:12:13 2004 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 In-Reply-To: <200412092000.iB9K0AhQ003111@bluewest.scyld.com> References: <200412092000.iB9K0AhQ003111@bluewest.scyld.com> Message-ID: <41BA8FBD.1000302@harddata.com> Couple of comments below: David Mathog wrote: 1. We were hoping the price of the MP chips would fall dramatically, allowing us to fill out the servers. What happened instead is that it leveled out at around $150/chip and then they disappeared. (The same trick had worked once before for us, with some Intel Pentium II 400s and ASUS motherboards, the second CPUs were dirt cheap when we bought them a few years after the initial system purchase and they extended the life of those workstations.) Currently AthlonMP 2600+ with 512K cache can still be bout new, in 3 year warranty boxed version with heatsink and fan for around $160 2. We needed ECC - these nodes run continuously. ECC for Athlons was very hard to find at the time we purchased these machines. >> I believe the socket 754 and 939 Athlon 64s do not support ECC, while >> the socket 940 Athlon 64 and Opteron do. > > It's hard to tell what the 754 boards support since many say they will _accept_ ECC memory but they don't say that they can actually _use_ it. If you want ECC in Opterons, you need socket 940 Socket 939 supports 128 bit memory path, no ECC Socket 754 supports 64 bit memory path, no ECC Some may accept it, but will NOT be able to use it. 
I was thinking of a 754 option primarily - 70 for motherboard + 130 for Athlon 64 2800 CPU (more or less.) I would look at Socket939 128 bit memory path And HT bus is at 2000 on the current VIA chipset. It is still only at 1600 on the socket 940 boards. Also, the new nVidia nForce4 chipsets supports 940, with 2000HT bus. Also has PCI Express 16X, and there are already dual Opteron Socket 940 with this chipset. Look, for example at the Tyan S2895 which just came out. There is really nothing wrong with the S2466N-4M. I would be more inclined to distrust power, cooling or RAM. Have you installed lm_sensors on these? BTW, for a good load/burning try running the distributed.net keygen. One can easily use all RAM and CPU on that. memtest86 is only a memory tester, and even then does not work it hard enough to create significant heating. Also, if using factory heatsink fans you might want to check them. AMD had a bum bunch of AthlonMP fans in the time frame of April to December 2002. If asked they will replace them. >> >> A cheaper option would be to keep the Athlon MP and simply replace the > motherboard. The Athlon MP will work just fine in any motherboard > taking the Athlon XP, but finding a non-dual motherboard that supports ECC might be tricky. > > Doubtful you will find ANY AthlonMP boards any more. This stuff is now end of life. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 This email, message, and content, should be considered confidential, and is the copyrighted property of Hard Data Ltd., unless stated otherwise. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041210/8d3e78e2/attachment.html From john.hearns at streamline-computing.com Sun Dec 12 23:47:56 2004 From: john.hearns at streamline-computing.com (John Hearns) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Job in New Zealand Message-ID: <1102924076.3661.14.camel@Vigor51> A friend emailed me this job opportunity. Work on the bigg http://jobs.massey.ac.nz/positiondetail.asp?p=2989 Beowulf cluster, 26 dual-Opteron nodes running Rocks http://double-helix.massey.ac.nz/ From john.hearns at streamline-computing.com Sun Dec 12 23:52:10 2004 From: john.hearns at streamline-computing.com (John Hearns) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Picking a processor In-Reply-To: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> References: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> Message-ID: <1102924331.3661.17.camel@Vigor51> On Sun, 2004-12-12 at 20:36 +0000, SF Husain wrote: > Hi > > I'm currently designing a beowulf style parallel processor and am trying to > decide which processor to use for the nodes. My project requires my final > design for the parallel processor to be able to provide a sustained throuput > of 0.25 TFlops. > Sufi, I work for one of the leading clustering companies in the UK. We will be happy to offer you advice like that. We frequently help customers benchmark codes etc. Will contact you off-list if that is OK. 
John Hearns Streamline Computing From mark.westwood at ohmsurveys.com Mon Dec 13 02:00:23 2004 From: mark.westwood at ohmsurveys.com (Mark Westwood) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Picking a processor In-Reply-To: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> References: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> Message-ID: <41BD6837.5090404@ohmsurveys.com> Hi I think you'll find at least the beginning of what you want on the Top500 site (you have found that haven't you ?). Many of the results reported there are measured in Gflops. For theoretical performance figures the processor manufacturers web-sites are usually informative. Regards Mark Westwood SF Husain wrote: > Hi > > I'm currently designing a beowulf style parallel processor and am trying to > decide which processor to use for the nodes. My project requires my final > design for the parallel processor to be able to provide a sustained throuput > of 0.25 TFlops. > > My research tells me that in general that the flop rate scales up linearly. > My trouble is that I'm having trouble finding estimates for the flop rates > of the processors I'm looking at. > > I've looked at the specfp2000 results but as far as I can tell their numbers > do not easily convert to a flop rate. Could anyone tell me how I can find > estimates for the flop rates of processors or if there is any rough sort of > conversion that I can do on these spec (or any other) benchmark results. > > I'm aware that the actual rate is dependent on type of work given to the > processors. However my project is only a design exercise aimed at developing > research skills so even a very rough conversion or sourse of sample results > would be suitable for my purposes. > > If anyone could help that would be great. > > Thanks > > Sufi > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > -- Mark Westwood Software Engineer OHM Ltd The Technology Centre Offshore Technology Park Claymore Drive Aberdeen AB23 8GD United Kingdom +44 (0)870 429 6586 www.ohmsurveys.com From Florent.Calvayrac at univ-lemans.fr Mon Dec 13 02:14:34 2004 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Picking a processor In-Reply-To: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> References: <000301c4e08a$3eafbdf0$2d852090@csrv.ad.york.ac.uk> Message-ID: <41BD6B8A.4090205@univ-lemans.fr> SF Husain wrote: > Hi > > I'm currently designing a beowulf style parallel processor and am trying to > decide which processor to use for the nodes. My project requires my final > design for the parallel processor to be able to provide a sustained throuput > of 0.25 TFlops. > > My research tells me that in general that the flop rate scales up linearly. > My trouble is that I'm having trouble finding estimates for the flop rates > of the processors I'm looking at. > > I've looked at the specfp2000 results but as far as I can tell their numbers > do not easily convert to a flop rate. Could anyone tell me how I can find > estimates for the flop rates of processors or if there is any rough sort of > conversion that I can do on these spec (or any other) benchmark results. > > I'm aware that the actual rate is dependent on type of work given to the > processors. 
However my project is only a design exercise aimed at developing > research skills so even a very rough conversion or sourse of sample results > would be suitable for my purposes. > > Hi you should either focus on codes whose cost is well known (system resolution in Scalapack), hence speed easy to compute, or if you are using a custom program run it with either hard or soft performance counters (code instrumenting), which can be quite reliable within a 50% margin of error. From my experience you get say 100 MFlops per Ghz on a normal Pentium and a reasonably well written typical computational code (not too optimized). If you use linear algebra and large matrixes (more than 10000 squared) only with, say, ATLAS, you can get theoretical performance (ie 1GFlops / GHz on a Pentium). If you look at specialized hardware (workstations, vector computers) you can get much more than that, and if you run physicist grade Fortran with complicated formulas or self-proclaimed IT expert C++ with heavy objects and indirections, and not too much effort in cache locality in either case you get only 20 MFlops/Ghz. greetings -- Florent Calvayrac | Tel : 02 43 83 26 26 | Fax : 02 43 83 35 18 Laboratoire de Physique de l'Etat Condense | UMR-CNRS 6087 Inst. de Recherche en Ingenierie Moleculaire et Matx Fonctionnels FR CNRS 2575 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 !!!!!!! NOUVEAU : OFFICE HOURS de 8h00 ? 12h00 si present !!!!!!!! From akhtar_samo at yahoo.com Mon Dec 13 10:43:02 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Upgrading a Cluster Message-ID: <20041213184302.77698.qmail@web20026.mail.yahoo.com> Hi, Actually I’ve made a two node MPICH based LINUX CLUSTER. Now I want to add Two more nodes do I’ve to reinstall MPICH, or just making file system common, and editing some files like hosts.equiv, hosts.allow,/etc/hosts,machines.LINUX Akhtar --------------------------------- Do you Yahoo!? Send holiday email and support a worthy cause. Do good. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041213/867221a0/attachment.html From schuang21 at yahoo.com Mon Dec 13 11:10:03 2004 From: schuang21 at yahoo.com (SC Huang) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] PBS question -- qdel does not kill jobs Message-ID: <20041213191003.73654.qmail@web52605.mail.yahoo.com> Hi, We have a new system set up. The vendor set up the PBS for us. For administration reasons, we created a new queue "dque" (set to default) using the "qmgr" command: create queue dque queue_type=e s q dqueue enabled=true, started=true I was able to submit jobs using the "qsub" command to queue "dque". However, when I use "qdel" to kill a job, the job disappears from the job list shown by "qstat -a", but the executable is still running on the compute nodes. Every time I have to login the corresponding the compute node and kill the running job. I am wondering if I missed something in setting up the queue so that I am unable to kill the job completely using "qdel". Thanks. __________________________________ Do you Yahoo!? Yahoo! Mail - You care about security. So do we. 
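For reference, a sketch of what that queue definition normally looks like in qmgr (OpenPBS/Torque syntax), plus one common fix when qdel removes the job from qstat but rsh-spawned MPI processes keep running on the nodes: a per-node epilogue script. The spool path, and the assumption that it is safe to kill everything owned by the job's user on that node, are assumptions of this sketch rather than a diagnosis of this particular system:

    # (1) A plain execution queue, set as the server default:
    qmgr -c "create queue dque queue_type=execution"
    qmgr -c "set queue dque enabled = true"
    qmgr -c "set queue dque started = true"
    qmgr -c "set server default_queue = dque"

    # (2) pbs_mom runs an epilogue script after every job, with the job
    #     id in $1 and the user name in $2.  Installed on each node as
    #     <PBS_HOME>/mom_priv/epilogue (root-owned, mode 0500), something
    #     like this sweeps up orphaned ranks.  It is deliberately blunt
    #     and assumes the user has no other job or login on that node.
    #!/bin/sh
    jobid=$1
    user=$2
    pkill -KILL -u "$user"
    exit 0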
http://promotions.yahoo.com/new_mail From atp at piskorski.com Mon Dec 13 17:02:45 2004 From: atp at piskorski.com (Andrew Piskorski) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Job in New Zealand In-Reply-To: <1102924076.3661.14.camel@Vigor51> References: <1102924076.3661.14.camel@Vigor51> Message-ID: <20041214010245.GA86537@piskorski.com> On Mon, Dec 13, 2004 at 07:47:56AM +0000, John Hearns wrote: > A friend emailed me this job opportunity. > Work on the bigg > > http://jobs.massey.ac.nz/positiondetail.asp?p=298 > > Beowulf cluster, 26 dual-Opteron nodes running Rocks > http://double-helix.massey.ac.nz/ Are salaries for recent computer science grads really that shockingly low in New Zealand? They seem to be offering $30 to $45 k NZD per year for that job, which according to www.xe.com (0.709252 USD/NZD) is only $22 to $32 k USD. Or does "computing graduate" mean a current graduate student, which would make that that a part time position? -- Andrew Piskorski http://www.piskorski.com/ From hahn at physics.mcmaster.ca Mon Dec 13 17:05:23 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 In-Reply-To: <41BA8FBD.1000302@harddata.com> Message-ID: > >> I believe the socket 754 and 939 Athlon 64s do not support ECC, while > >> the socket 940 Athlon 64 and Opteron do. > > > It's hard to tell what the 754 boards support since many say they > will _accept_ ECC memory but they don't say that they can actually > _use_ it. all K8's support ECC - AMD definitely does not disable ECC on some chips. I just checked the s754 functional spec, and it certainly supports ECC. it's conceivable that MB vendors are such twits that they'd fail to detect ECC dimms and enable ECC in the bios. it's even possible they'd be such idiots as to fail to connect memcheck pins to the dimm slots. > If you want ECC in Opterons, you need socket 940 > Socket 939 supports 128 bit memory path, no ECC > Socket 754 supports 64 bit memory path, no ECC > Some may accept it, but will NOT be able to use it. perhaps; I'm guessing that *ANY* lack of ECC is strictly due to the bios-writer's choice. perhaps linuxbios is salvation here. it would actually be amusing to try doing this from within linux, since reconfiguring the memory controller is probably doable via setpci or wrmsr. if you try, please backup beforehand ;) regards, mark hahn. From schuang21 at yahoo.com Mon Dec 13 16:28:12 2004 From: schuang21 at yahoo.com (SC Huang) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Linux on Alpha? In-Reply-To: <200411222125.iAMLOUFw019961@bluewest.scyld.com> Message-ID: <20041214002812.72138.qmail@web52602.mail.yahoo.com> I maintain an "older" 16-node (16 processors) cluster of Alpha EV6 (500MHz). It runs RH 7.2. It is heavily used -- we run long-time job on it and it is pretty stable. The interconnecting network is 100Mbps ethernet. -- SC Huang > On Sun, Nov 21, 2004 at 02:58:27PM -0600, Erik Paulson wrote: > > If people are still using Linux on Alpha for HPC, what distros are > you > > using? > > I maintain a *very* small cluster (4 nodes, 6 CPUs) of 666MHz Alphas. > They are in > fairly heavy use still since for 2 reasons: 1) They (still!) > perform > reasonably well (comparable to 1.6Ghz P4s) for the code they run, and > 2) lack > of other computational resources for that group, so they run wherever > they can. > > Sadly, they still run RH7.2 (the last version released for Alphas). 
> Were I to > set them up now, I'd look at Debian or Gentoo as a distribution. > > -- > Jesse Becker > GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0 2720 0083 0931 9A2B 06A2 > -------------- next part -------------- __________________________________ Do you Yahoo!? Yahoo! Mail - Easier than ever with enhanced search. Learn more. http://info.mail.yahoo.com/mail_250 From kim.branson at csiro.au Mon Dec 13 18:01:50 2004 From: kim.branson at csiro.au (Kim Branson) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Job in New Zealand In-Reply-To: <20041214010245.GA86537@piskorski.com> References: <1102924076.3661.14.camel@Vigor51> <20041214010245.GA86537@piskorski.com> Message-ID: <1E9C3EB6-4D74-11D9-AA28-000A9579AE94@csiro.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Postdoc postions in australia are in the order of 45-50k, so for a grad position (i'm presuming a bachelors degree, since the url is borked) thats not to bad. Cost of living is less too, so it all works out. maybe.. Kim On 14/12/2004, at 12:02 PM, Andrew Piskorski wrote: > On Mon, Dec 13, 2004 at 07:47:56AM +0000, John Hearns wrote: >> A friend emailed me this job opportunity. >> Work on the bigg >> >> http://jobs.massey.ac.nz/positiondetail.asp?p=298 >> >> Beowulf cluster, 26 dual-Opteron nodes running Rocks >> http://double-helix.massey.ac.nz/ > > Are salaries for recent computer science grads really that shockingly > low in New Zealand? They seem to be offering $30 to $45 k NZD per > year for that job, which according to www.xe.com (0.709252 USD/NZD) is > only $22 to $32 k USD. > > Or does "computing graduate" mean a current graduate student, which > would make that that a part time position? > > -- > Andrew Piskorski > http://www.piskorski.com/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > Kim Branson Diffraction and Theory CSIRO Health Sciences and Nutrition 343 Royal Parade, Parkville Melbourne Ph +613 9662 7136 kim.branson@csiro.au -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (Darwin) iD8DBQFBvkmPer2hmGbHcokRAm0TAJ9tjZvHShW2KxbFbM+E2PqOxollhgCgnQki 0SK9FaWBdc+CkHLxorpl6e0= =AziW -----END PGP SIGNATURE----- From rgb at phy.duke.edu Mon Dec 13 19:02:00 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:37 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 In-Reply-To: Message-ID: On Mon, 13 Dec 2004, Mark Hahn wrote: > > >> I believe the socket 754 and 939 Athlon 64s do not support ECC, while > > >> the socket 940 Athlon 64 and Opteron do. > > > > > It's hard to tell what the 754 boards support since many say they > > will _accept_ ECC memory but they don't say that they can actually > > _use_ it. > > all K8's support ECC - AMD definitely does not disable ECC on some chips. > I just checked the s754 functional spec, and it certainly supports ECC. > > it's conceivable that MB vendors are such twits that they'd fail to > detect ECC dimms and enable ECC in the bios. it's even possible they'd > be such idiots as to fail to connect memcheck pins to the dimm slots. FWIW, I now am installing my ASUS K8NE, and it caims in its user guide to support unbuffered ECC or non-ECC SDRAM (up to PC3200). In a wee short bit I'll probably have some stream results. I'm flashing an upgrade from RH9 to i386 FC2 as I sit here. 
However, it will take me a few days and a reinstall to get to x64 binaries as I've got to rsync FC x86_64 from Duke over a DSL connection, so whatever I post will be just play. I didn't GET ECC, BTW, because this is a home cluster box and won't be used (I expect) for anything more than testing, benchmarks, games, and rare production runs. rgb > > > If you want ECC in Opterons, you need socket 940 > > Socket 939 supports 128 bit memory path, no ECC > > Socket 754 supports 64 bit memory path, no ECC > > Some may accept it, but will NOT be able to use it. > > perhaps; I'm guessing that *ANY* lack of ECC is strictly due > to the bios-writer's choice. perhaps linuxbios is salvation here. > > it would actually be amusing to try doing this from within linux, > since reconfiguring the memory controller is probably doable > via setpci or wrmsr. if you try, please backup beforehand ;) > > regards, mark hahn. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hahn at physics.mcmaster.ca Tue Dec 14 09:05:25 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 In-Reply-To: Message-ID: > Back to the original thought, which was recycling the PC2100 > memory from the Tyan S2466N boards. The consensus seems to be that > registered ECC recycled from the S2466N systems will not work in > boards designed for unbuffered ECC, which means that both > the 754 and 939 are out. I believe the relation is the other way: you cannot use unbuffered dimms in a board which assumes buffered dimms. that is, the dependecy is based on what fanout the drivers can handle: unbuffered dimms have more devices hanging off the same signals. I don't think there's a difference in pinout. note that sidedness of the dimm also matters (some dimm vendors call double-sided dimms "dual-rank". > Very unclear to me what, if anything is gained by using > registered ECC vs. unbuffered ECC on a smallish system. On a big 1 clock latency (actually, I think it's a major cycle, so for ddr, it's the equivalent of two transfers) overhead for registered dimms (theoretically, there are also buffered dimms which have pass-through tranceivers which mitigate the drive/fanout problem, but don't impose a whole cycle latency. but I think no one does that anymore.) > be a plus. But if the system only holds 1Gb of RAM in one > or two memory slots the unbuffered memory should be slightly > faster, and with the ECC enabled, just as reliable. Correct? that's my understanding... From rgb at phy.duke.edu Tue Dec 14 09:31:24 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 In-Reply-To: References: Message-ID: On Tue, 14 Dec 2004, David Mathog wrote: > Very unclear to me what, if anything is gained by using > registered ECC vs. unbuffered ECC on a smallish system. On a big > system, with lots of memory, I can see where registered ECC would > be a plus. But if the system only holds 1Gb of RAM in one > or two memory slots the unbuffered memory should be slightly > faster, and with the ECC enabled, just as reliable. Correct? 
I hope that is rhetorical instead of asking >>me<< in particular. All I know on the issue is from what people like Don Becker on list have written about it over the years, and that's probably all muddled up. To my own personal experience, buffered/unbuffered ECC/nonECC memory all pretty much works perfectly as long as one a) doesn't overclock; b) doesn't overheat; c) aren't in the process of physically failing even without overclocking or overheating. At least, if I've dropped bits over the years, I have little overt evidence of it, or else it seems to occur just as often on systems with ECC memory (like those good old 2466's) as it does on my many cheaper systems without it. More often, even. I vaguely recall Don saying something about ECC being essentially redundant with checks that take place anyway and just slowing you down, but it was a lot time ago and I also recall various also smart persons arguing that ECC was essential (and offering an example of a cluster where without it they would throw an error a week or something like that. Maybe Josip? So I plead ignorance on the technical front and inadequate personal/anecdotal experience. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From becker at scyld.com Tue Dec 14 10:21:50 2004 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Reminder: Today's BWBUG meeting is at Greenbelt MD Message-ID: The BWBUG.org site seem to be down, and the archive on Beowulf.org hasn't yet been updated so I'll repeat and add to the announcement. The meeting is at the Greenbelt MD location 7501 Greenway Center Drive, Suite 1200 (6th floor -- there will be signs) Greenbelt, MD, 20770 http://maps.yahoo.com/maps_result?ed=lz2hg.p_0TqL0Yhiv1qfwXsEKcUexu7GY8N7srmXDCUrNE.i&csz=20770&country=us&new=1&name=&qty= The map only shows the government roads -- you may also cut through the shopping center parking lot. The building is on the southeast side of the BW Parkway - D.C beltway intersection, adjacent to the Maryland Trade Center buildings. ________________ --- Special Notes: -- This months meeting will be held in Greenbelt MD, not in Virginia -- This meeting returns to our usual second Tuesday of the month schedule -- The meeting will focus on advances in Ethernet cluster networks -- See http://www.bwbug.org/ for full information Date: Tuesday December 14, 2004 Time: 2:45 PM - 5:00 PM Location: Northrop Grumman IT, Greenbelt MD Titles: sFlow, A new Network Monitoring System Interconnect and Switch Fabrics for Clusters. Speakers: Mr. Wes Medley Federal Systems Engineer with Foundry Networks Donald Becker CTO of Penguin Computing, and CS of the Scyld Software division Abstracts: Foundry Networks and Inmon Corporation will provide a technology brief on sFlow, a new network monitoring technology. sFlow provides a network-wide view of user level traffic flows. sFlow is a scalable technique for measuring network traffic, collecting, storing, and analyzing network traffic data. sFlow allows administrators of Supercomputing Clusters to understand possible congestion points within their application traffic flow information and to analysis traffic flows to ensure correct cluster operation. Donald Becker will give a brief review of the Interconnect and Switch Fabrics for Clusters. 
Donald is one of the world experts on network drivers, having created most of the network drivers for the Linux during the first decade of its existance. He was the founder of the first company that specialized in commerical cluster management software, SCYLD. ____ This month's meeting will be in our Maryland venue, the Northrop Grumman Information Technology Offices at 7501 Greenway Center Drive, Suite 1200 Greenbelt, MD, 20770 See http://www.bwbug.org/ web page for directions. Registration on the web site is highly encourage to speed sign-in. As usual there will be door prizes, food and refreshments. Essential questions: Need to be a member?: No, and guests are welcome. Parking and parking fees: Free surface lot parking is readily available Ease of access: 30 seconds from the D.C. beltway Also as usual, the organizer and host for the meeting is T. Michael Fitzmaurice, Jr. 8110 Gatehouse Road, Suite 400W Falls Church, VA 22042 703-205-3132 office 240-475-7877 cell mail michael.fitzmaurice at ngc.com -- Donald Becker becker@scyld.com Scyld Software Scyld Beowulf cluster systems 914 Bay Ridge Road, Suite 220 www.scyld.com Annapolis MD 21403 410-990-9993 From akhtar_samo at yahoo.com Tue Dec 14 00:06:39 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Let me know abt Bench Marks Message-ID: <20041214080639.85064.qmail@web20027.mail.yahoo.com> Hi, Actually I want to run an benchmark on my MPICH based cluster, its just a 4 node cluster. Kindly let me know what to do , where to get it from with proper installation method Akhtar --------------------------------- Do you Yahoo!? Send holiday email and support a worthy cause. Do good. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041214/6d8f9a4e/attachment.html From scheinin at crs4.it Tue Dec 14 00:46:37 2004 From: scheinin at crs4.it (Alan Louis Scheinine) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Opteron with PCI Express Message-ID: <41BEA86D.7050104@crs4.it> Maurice W. Hilarius wrote: > Also has PCI Express 16X, and there are already dual > Opteron Socket 940 with this chipset. > Look, for example at the Tyan S2895 which just came out. Where did it come out, Australia? The Tyan site has an announcement of an announcement to be made in November in Australia but I don't see a followup of the content of the announcement and I do not see the S2895 listed among the Opteron products. Of course, marketing is not the role of beowulf mailing list members, but PCI Express with Opteron is interesting, so indications of COTS boards is useful. (For example, some developers here program the GPU (graphics chip) and the faster I/O of PCI express is needed.) -- Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna Center for Advanced Studies, Research, and Development in Sardinia Postal Address: | Physical Address for FedEx, UPS, DHL: --------------- | ------------------------------------- Alan Scheinine | Alan Scheinine c/o CRS4 | c/o CRS4 C.P. n. 25 | Loc. 
Pixina Manna Edificio 1 09010 Pula (Cagliari), Italy | 09010 Pula (Cagliari), Italy Email: scheinin@crs4.it Phone: 070 9250 238 [+39 070 9250 238] Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220] Operator at reception: 070 9250 1 [+39 070 9250 1] Mobile phone: 347 7990472 [+39 347 7990472] From fabrice.pardo at lpn.cnrs.fr Tue Dec 14 07:02:30 2004 From: fabrice.pardo at lpn.cnrs.fr (Fabrice Pardo) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Re: probs with mpdboot for 4 nodes Message-ID: <41BF0086.1070508@lpn.cnrs.fr> Hi, Your problem comes from the fact that "beomaster" initiate only 2 ssh connections, the lchild one and the rchild one, to beoslave and beoslave1 nodes. The subsequent connections are initiated in the same left-right childs schema from one of these slaves. Then you need to be able to do a ssh from beoslave to beoslave2 or from beoslave1 to beoslave2 without password. A way is to suppress your passphrase. beomaster$ ssh-keygen -p (empty for no passphrase) Warning, your secret key .ssh/id_Xsa is now unencrypted. Now, try beomaster$ ssh beoslave beslave$ ssh beolslave2 beomaster$ ssh beoslave1 beslave1$ ssh beolslave2 This is not documented in mpich2-1.0/README, but the --debug listing is a little bit more explicit (the name of the ssh initiator is written) Regards. -- Fabrice From mathog at mendel.bio.caltech.edu Tue Dec 14 07:59:16 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 Message-ID: > On Mon, 13 Dec 2004, Mark Hahn wrote: > > > > >> I believe the socket 754 and 939 Athlon 64s do not support ECC, while > > > >> the socket 940 Athlon 64 and Opteron do. > > > > > > > It's hard to tell what the 754 boards support since many say they > > > will _accept_ ECC memory but they don't say that they can actually > > > _use_ it. > > > > all K8's support ECC - AMD definitely does not disable ECC on some chips. > > I just checked the s754 functional spec, and it certainly supports ECC. > > > > it's conceivable that MB vendors are such twits that they'd fail to > > detect ECC dimms and enable ECC in the bios. it's even possible they'd > > be such idiots as to fail to connect memcheck pins to the dimm slots. > > FWIW, I now am installing my ASUS K8NE, and it caims in its user guide > to support unbuffered ECC or non-ECC SDRAM (up to PC3200). > Back to the original thought, which was recycling the PC2100 memory from the Tyan S2466N boards. The consensus seems to be that registered ECC recycled from the S2466N systems will not work in boards designed for unbuffered ECC, which means that both the 754 and 939 are out. The 940 boards require registered ECC but they cost more than the other two and the CPUs that go in them also cost more. Very unclear to me what, if anything is gained by using registered ECC vs. unbuffered ECC on a smallish system. On a big system, with lots of memory, I can see where registered ECC would be a plus. But if the system only holds 1Gb of RAM in one or two memory slots the unbuffered memory should be slightly faster, and with the ECC enabled, just as reliable. Correct? 
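One way to see from a running node what kind of modules are actually installed, for whatever it is worth: a minimal sketch assuming dmidecode and the lm_sensors userspace tools of that era (decode-dimms.pl) are present, and that the right i2c bus driver for the chipset is already loaded; the exact field names vary between BIOS and tool versions:

    # SMBIOS "Memory Device" records; registered modules usually report
    # "Registered (Buffered)" in the Type Detail field and unbuffered
    # ones "Unbuffered (Unregistered)":
    dmidecode | grep -A14 "Memory Device" | egrep "Size|Type Detail|Speed"

    # The SPD EEPROM on each DIMM records the same information; with the
    # eeprom driver loaded the lm_sensors decode script will print the
    # module's registered/unbuffered and ECC attributes:
    modprobe eeprom
    decode-dimms.pl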
Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From Bogdan.Costescu at iwr.uni-heidelberg.de Tue Dec 14 10:39:37 2004 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 16 In-Reply-To: Message-ID: [ took out the cc-list, everybody's subscribed ] On Tue, 14 Dec 2004, Mark Hahn wrote: > > The consensus seems to be that registered ECC recycled from the > > S2466N systems will not work in boards designed for unbuffered > > ECC, which means that both the 754 and 939 are out. > I believe the relation is the other way: you cannot use unbuffered > dimms in a board which assumes buffered dimms. Hmm, my experience was with an Intel 875P based mainboard that was advertised as being able to use only unbuffered ECC: it did work with unbuffered ECC, but not with buffered ECC - both memory modules were from Kingston and the only difference in specs was the buffering thing. Furthermore, it was my understading that, for systems that did not have support for ECC, using unbuffered ECC was still possible, as the ECC-related signals were not connected or ignored, but the rest of the signalling was the same. However, buffered ECC would mostly not work in such systems - and this was later supported by practice: a buffered ECC module (incidentally also from a retired Tyan S2460 system) did not work in most non-ECC mainboards that I've had on hand to try; it only worked in an AMD760-based single CPU board that was also not advertised to support ECC, but I guess that the pins were not connected while the signalling was just common in the chipset with the MP version. This data supports the original statement, while not really contradicting Mark's ;-) -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From fant at pobox.com Tue Dec 14 15:10:41 2004 From: fant at pobox.com (Andrew D. Fant) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] MPI Implementations for SMP use Message-ID: <41BF72F1.2090707@pobox.com> This may be opening up a can of worms, but does anyone have any information about the relative merits of the various open implementations of MPI in SMP systems? I'm setting up a Linux server with multiple CPUs and I wanted to know if one implementation is significantly faster than others under these conditions. Thanks, Andy From maurice at harddata.com Wed Dec 15 00:07:31 2004 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Opteron with PCI Express In-Reply-To: <200412141834.iBEIXrwS030343@bluewest.scyld.com> References: <200412141834.iBEIXrwS030343@bluewest.scyld.com> Message-ID: <41BFF0C3.7010708@harddata.com> Alan Louis Scheinin wrote: >Maurice W. Hilarius wrote: > > Also has PCI Express 16X, and there are already dual > > Opteron Socket 940 with this chipset. > > Look, for example at the Tyan S2895 which just came out. >Where did it come out, Australia? The Tyan site has an >announcement of an announcement to be made in November in >Australia but I don't see a followup of the content of the >announcement and I do not see the S2895 listed among the >Opteron products. 
> > Of course, marketing is not the role of beowulf mailing list >members, but PCI Express with Opteron is interesting, so indications >of COTS boards is useful. (For example, some developers here program >the GPU (graphics chip) and the faster I/O of PCI expr > Tyan has boards arriving in their USA warehouse as of the 15th of December. PCI Express is especially interesting in HPC as many of the advanced network boards are now coming out in PCI Express interface. GbE and Infiniband already, and Myrinet, and others soon. As we have already hit the wall of PCI-X with these devices, Express adds a significant improvement in bandwidth and latency. Opteron HTX will add even more performance when it come out in 2005. The main reason I posted is that I thought it significant that the first Opteron dual board with PCI-Express is finally out. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 This email, message, and content, should be considered confidential, and is the copyrighted property of Hard Data Ltd., unless stated otherwise. From mathog at mendel.bio.caltech.edu Wed Dec 15 10:01:11 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] RE: S2466 systems won't reboot after linux poweroff Message-ID: > Problem: each node in a 20 node beowulf typically will > not reboot following a linux poweroff command. Power comes > back on, but it never even shows the BIOS screens. > > Hardware: > > S2466 MPX mobo Two of these nodes are flakey and aren't in the compute pool. These were both upgraded to BIOS v4.06. This DID resolve the problem with a "poweroff" followed by "turning power switch on" not rebooting. In other words, they now boot as they should following a poweroff/power switch on cycle. The oddball message cited in the first post that comes out the serial line at the end of "poweroff" remains. Tests: "poweroff" followed by "power switch on": worked 5/5 times "reboot": worked 5/5 times However, the new BIOS didn't make these two nodes any more stable - they still crash at about the same rate. Conclusion, it might be worth the effort to upgrade the BIOS if your cluster is down for some reason anyway. WARNING1. All my nodes seemed to "forget" how to read floppy disks. If the nodes had been up for a while and then were rebooted, and a known good floppy placed in the drive, they would NOT boot from it. If, however, while the node was up, the same floppy was put into the drive and explicitly mounted, listed, and unmounted a couple of times, THEN on the subsequent reboot the system could read from the floppy. I've never seen this on any other system (Tyan's are just full of suprises :-( ). Subsequent to the V4.06 upgrade these nodes seem to recognize the floppy better and so far have not had any problems rebooting directly from a floppy without the kludge described above. However, if you are at Bios V4.03 (which is what they were at, not V4.01 as I had previously posted) you may have the same problems booting from floppy in order to do the BIOS upgrades. So either flash from the net (I have no idea how) or verify that your floppy drives work before rebooting the nodes to be upgraded. WARNING2: update with: >phlash16 244v406.rom left the BIOS settings as they were. But: >flash which ran flash.bat, WIPED the BIOS settings. 
WARNING3: These are BIOS settings seem to be equivalent: v4.03 v4.06 quickboot enabled disabled diagnostic disabled disabled summary disabled disabled If quickboot is enabled in v4.06 it appears to skip the BIOS memory test entirely. It boots MUCH faster but you may have a hard time ever getting in an F2 to get back to change the BIOS. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From rgb at phy.duke.edu Wed Dec 15 14:49:09 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... Message-ID: Just for those of you who were asking after AMD64's as viable compute platforms, I just ran stream and the bogomflops benchmark in my renamed "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+ (metatron) and on a 1.8 MHz P2 (lucifer). Of course I also ran stream by hand on them just to make sure it was giving correct results. They are all below. Executive summary is that the AMD barely beats (real) clock speed scaling compared to the P2 for stream. I suspect that this is not yet the end of the story, though, as I see little difference between the i386 benchmark results and the x86_64 results when running the program compiled both ways on metatron. The INTERESTING story is in bogomflops, which includes division. There metatron was a whopping 2.8x faster than lucifer, while its clock is only 1.33x faster. It more than doubled its relative clockspeed advantage, so to speak. One can see how having 64 bits would really speed up 64 bit division compared to doing it in software across multiple 32 bit registers... It should also be carefully noted that metatron is running Fedora Core 3, x86_64. In other words, blood is dripping down the installation. I wouldn't be terribly surprised to learn that I've screwed up the libraries (or they were conservative with the package binaries) or something so that I'm not getting full 64 bit speed out of it. I'd really expect to see a bit more of an advantage on stream relative to clock from AMD's wide data path and faster memory (PC3200). Hope this is interesting/useful to somebody. I put "real stream" at the very end. "real stream" uses the best time where benchmaster uses the average time so benchmaster results are typically a few percent lower (and likely just that much more realistic as well). rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: metatron # CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz) # CPU: L2 cache: 512 KB bogomips: 4767.74 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 3.009 # Test: stream copy # Test Description: d[i] = a[i] (8 byte double vector) # # full iterations = 2 empty iterations = 524288 # time full = 21476487.571592 (nsec) time empty = 3.335250 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream copy" 2000000 1 1.07e+01 1.35e-02 1.490e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. 
Brown # # hostname: metatron # CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz) # CPU: L2 cache: 512 KB bogomips: 4767.74 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 3.009 # Test: stream scale # Test Description: d[i] = xtest*d[i] (8 byte double vector) # # full iterations = 2 empty iterations = 524288 # time full = 22124394.180132 (nsec) time empty = 3.336800 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream scale" 2000000 1 1.11e+01 1.52e-02 1.446e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: metatron # CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz) # CPU: L2 cache: 512 KB bogomips: 4767.74 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 3.022 # Test: stream add # Test Description: d[i] = a[i] + b[i] (8 byte double vector) # # full iterations = 2 empty iterations = 524288 # time full = 29229924.717210 (nsec) time empty = 3.334787 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream add" 2000000 1 1.46e+01 1.42e-02 1.642e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: metatron # CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz) # CPU: L2 cache: 512 KB bogomips: 4767.74 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 2.837 # Test: stream triad # Test Description: d[i] = a[i] + xtest*b[i] (8 byte double vector) # # full iterations = 2 empty iterations = 524288 # time full = 29402273.082914 (nsec) time empty = 3.334403 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream triad" 2000000 1 1.47e+01 1.40e-02 1.633e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: metatron # CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz) # CPU: L2 cache: 512 KB bogomips: 4767.74 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 2.940 # Test: bogomflops # Test Description: d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector) # # full iterations = 2 empty iterations = 524288 # time full = 17716108.582773 (nsec) time empty = 3.333979 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "bogomflops" 2000000 1 2.21e+00 2.36e-03 4.516e+02 .......................................................................... #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. 
Brown # # hostname: lucifer # CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz) # CPU: L2 cache: 512 KB bogomips: 3555.32 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.392 # Test: stream copy # Test Description: d[i] = a[i] (8 byte double vector) # # full iterations = 2 empty iterations = 131072 # time full = 30751936.233069 (nsec) time empty = 14.403535 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream copy" 2000000 1 1.54e+01 4.97e-02 1.041e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: lucifer # CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz) # CPU: L2 cache: 512 KB bogomips: 3555.32 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.363 # Test: stream scale # Test Description: d[i] = xtest*d[i] (8 byte double vector) # # full iterations = 2 empty iterations = 131072 # time full = 30298036.984022 (nsec) time empty = 12.929858 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream scale" 2000000 1 1.51e+01 3.61e-02 1.056e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: lucifer # CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz) # CPU: L2 cache: 512 KB bogomips: 3555.32 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.270 # Test: stream add # Test Description: d[i] = a[i] + b[i] (8 byte double vector) # # full iterations = 2 empty iterations = 131072 # time full = 39540020.016525 (nsec) time empty = 13.573450 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream add" 2000000 1 1.98e+01 3.55e-02 1.214e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. Brown # # hostname: lucifer # CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz) # CPU: L2 cache: 512 KB bogomips: 3555.32 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.315 # Test: stream triad # Test Description: d[i] = a[i] + xtest*b[i] (8 byte double vector) # # full iterations = 2 empty iterations = 131072 # time full = 39778004.853398 (nsec) time empty = 12.493777 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "stream triad" 2000000 1 1.99e+01 5.81e-02 1.207e+03 #======================================================================== # Microtimer 1.0.0 # Copyright 2004 Robert G. 
Brown # # hostname: lucifer # CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz) # CPU: L2 cache: 512 KB bogomips: 3555.32 # Memory: 0 # cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.160 # Test: bogomflops # Test Description: d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector) # # full iterations = 2 empty iterations = 131072 # time full = 49586377.513218 (nsec) time empty = 12.461900 (nsec) # # test name vlen stride time +/- sigma (nsec) megarate #======================================================================== "bogomflops" 2000000 1 6.20e+00 5.03e-03 1.613e+02 (metatron x86_64 binary) # Function Rate (MB/s) RMS time Min time Max time Copy: 1563.8717 0.0205 0.0205 0.0207 Scale: 1540.5368 0.0209 0.0208 0.0213 Add: 1729.2831 0.0283 0.0278 0.0290 Triad: 1731.3502 0.0278 0.0277 0.0280 (metatron i386 binary) # Function Rate (MB/s) RMS time Min time Max time Copy: 1542.3957 0.0209 0.0207 0.0213 Scale: 1525.1148 0.0213 0.0210 0.0218 Add: 1732.2291 0.0280 0.0277 0.0286 Triad: 1698.8726 0.0284 0.0283 0.0286 (lucifer i386 binary) # Function Rate (MB/s) RMS time Min time Max time Copy: 1076.4977 0.0300 0.0297 0.0314 Scale: 1078.9293 0.0298 0.0297 0.0302 Add: 1231.8450 0.0392 0.0390 0.0401 Triad: 1230.7681 0.0392 0.0390 0.0403 From bill at cse.ucdavis.edu Wed Dec 15 18:29:56 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: References: Message-ID: <20041216022956.GC16587@cse.ucdavis.edu> Group reply: On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote: > Just for those of you who were asking after AMD64's as viable compute > platforms, I just ran stream and the bogomflops benchmark in my renamed > "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+ That is a s754 amd64? > They are all below. Executive summary is that the AMD barely beats > (real) clock speed scaling compared to the P2 for stream. I suspect > that this is not yet the end of the story, though, as I see little > difference between the i386 benchmark results and the x86_64 results > when running the program compiled both ways on metatron. Double registers only help if you need them. Most codes won't automatically utilize native 64 bit ints or pointers to any significant advantage. > The INTERESTING story is in bogomflops, which includes division. There > metatron was a whopping 2.8x faster than lucifer, while its clock is > only 1.33x faster. It more than doubled its relative clockspeed > advantage, so to speak. One can see how having 64 bits would really > speed up 64 bit division compared to doing it in software across > multiple 32 bit registers... Interesting data point. > Hope this is interesting/useful to somebody. I put "real stream" at the > very end. "real stream" uses the best time where benchmaster uses the > average time so benchmaster results are typically a few percent lower > (and likely just that much more realistic as well). Similar data points for an opteron, dual (stream using 1 cpu) 2.2 GHz, with PC3200 memory (915.5MB array). Not sure why the timer is so lousy, I had to make the array large to get a reasonably accurate time: I suspect the below numbers would be higher if I had a uniprocessor system (never have a remote memory access or wait for the memory coherency) or with a 2.6 Kernel (which is better about insuring that pages and the process acting on the page is on the same cpu). Kudos for the pathscale-1.4 compiler with -O3. 
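For anyone who hasn't looked inside STREAM: the four kernels are just simple loops over three arrays of N doubles, so the working set is roughly 3*N*8 bytes -- the 915.5MB above works out to N of about 40 million if it is the total for all three arrays. A rough sketch of the kernels (not the actual stream.c, and with all the timing code left out):

    #define N 40000000                 /* ~915.5 MB total for three double arrays */
    static double a[N], b[N], c[N];

    void stream_kernels(double q)
    {
        long i;
        for (i = 0; i < N; i++) c[i] = a[i];             /* Copy  */
        for (i = 0; i < N; i++) b[i] = q * c[i];         /* Scale */
        for (i = 0; i < N; i++) c[i] = a[i] + b[i];      /* Add   */
        for (i = 0; i < N; i++) a[i] = b[i] + q * c[i];  /* Triad */
    }

The reported MB/s is just the bytes each kernel touches (16 per element for Copy and Scale, 24 for Add and Triad) divided by the loop time.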
gcc-3.2.3 -O1: Function Rate (MB/s) Avg time Min time Max time Copy: 2206.8823 0.3010 0.2900 0.3800 Scale: 2285.7067 0.2880 0.2800 0.3700 Add: 2285.7087 0.4140 0.4200 0.5300 Triad: 2285.7152 0.3910 0.4200 0.4700 -O2 Function Rate (MB/s) Avg time Min time Max time Copy: 1777.7736 0.3240 0.3600 0.3600 Scale: 1777.7783 0.3240 0.3600 0.3600 Add: 1882.3495 0.4590 0.5100 0.5100 Triad: 1882.3530 0.4590 0.5100 0.5100 -O3 Function Rate (MB/s) Avg time Min time Max time Copy: 1777.7924 0.3260 0.3600 0.3700 Scale: 1828.4723 0.3230 0.3500 0.3600 Add: 1882.3679 0.4640 0.5100 0.5200 Triad: 1846.1717 0.4720 0.5200 0.5300 gcc-3.4.3 -O1: Function Rate (MB/s) Avg time Min time Max time Copy: 1729.6823 0.3330 0.3700 0.3700 Scale: 1828.5184 0.3230 0.3500 0.3600 Add: 1846.1048 0.4680 0.5200 0.5200 Triad: 1846.1040 0.4680 0.5200 0.5200 -O2: Function Rate (MB/s) Avg time Min time Max time Copy: 2133.3337 0.2960 0.3000 0.3500 Scale: 2133.3337 0.2980 0.3000 0.3500 Add: 2232.5578 0.4270 0.4300 0.5100 Triad: 2181.8132 0.4310 0.4400 0.5100 -O3: Function Rate (MB/s) Avg time Min time Max time Copy: 2285.6561 0.2630 0.2800 0.3600 Scale: 2285.6581 0.2580 0.2800 0.3100 Add: 2341.4071 0.3800 0.4100 0.4700 Triad: 2285.6555 0.3880 0.4200 0.5200 Pathscale-1.4 -O1: Function Rate (MB/s) Avg time Min time Max time Copy: 1999.9498 0.2880 0.3200 0.3200 Scale: 2064.4625 0.2840 0.3100 0.3200 Add: 2232.5009 0.3950 0.4300 0.4400 Triad: 2232.4910 0.3930 0.4300 0.4400 -O2 Function Rate (MB/s) Avg time Min time Max time Copy: 2461.5205 0.2410 0.2600 0.2700 Scale: 2285.6970 0.2530 0.2800 0.2900 Add: 2341.4466 0.3730 0.4100 0.4200 Triad: 2399.9765 0.3670 0.4000 0.4100 -O3 Function Rate (MB/s) Avg time Min time Max time Copy: 3764.6831 0.1540 0.1700 0.1800 Scale: 3764.6831 0.1530 0.1700 0.1700 Add: 4173.8781 0.2080 0.2300 0.2400 Triad: 4173.8781 0.2110 0.2300 0.2400 -- Bill Broadley Computational Science and Engineering UC Davis From hahn at physics.mcmaster.ca Wed Dec 15 20:35:32 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: > > They are all below. Executive summary is that the AMD barely beats > > (real) clock speed scaling compared to the P2 for stream. I suspect sure - stream is normally dram test, not a CPU test. > Double registers only help if you need them. Most codes won't > automatically utilize native 64 bit ints or pointers to any > significant advantage. indeed, going 64b often costs a noticable overhead in code size expansion and inflation of space to store pointers. the real appeal of x86-64 is that you get twice as many registers. yes, being able to actually use more than about 2.5 GB is nice, and important to some people. but almost any real code will take advantage of having twice as many registers (integer and SIMD). > or with a 2.6 Kernel (which is better about insuring that pages and the > process acting on the page is on the same cpu). don't forget to turn on node interleave in the bios, too. > Kudos for the pathscale-1.4 compiler with -O3. ironically, icc -xW generates pretty good-for-opteron code, though of course, it's 32b. I haven't tried using icc to generate em64t/and64 code. regards, mark hahn. From bill at cse.ucdavis.edu Wed Dec 15 22:29:43 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... 
In-Reply-To: References: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: <20041216062943.GA19789@cse.ucdavis.edu> > and important to some people. but almost any real code will take > advantage of having twice as many registers (integer and SIMD). Indeed, assuming you have the source to recompile. > don't forget to turn on node interleave in the bios, too. Assuming a 2.4 kernel I believe that helps single processes running on a dual, but doesn't help when 2 processes are running. With 2.6 I think it's usually faster to have node interleaving off with 2 processes (if not 1). > > > Kudos for the pathscale-1.4 compiler with -O3. > > ironically, icc -xW generates pretty good-for-opteron code, > though of course, it's 32b. I haven't tried using icc to > generate em64t/and64 code. With 8.0 and 8.1 I can't seem to get it working on a RHEL x86-64 box or a Rocks (RHEL-3 based) x86-64 box. I'll retry with a nacoma based machine and see if I can get the intel compiler working. I get errors with 8.0 like: /usr/bin/ld: skipping incompatible /opt/intel_cc_80/lib/libsvml.a when searching for -lsvml or with 8.1: ld: skipping incompatible /usr/lib64/libm.so when searching for -lm -- Bill Broadley Computational Science and Engineering UC Davis From bill at cse.ucdavis.edu Wed Dec 15 22:59:52 2004 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: References: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: <20041216065952.GA20370@cse.ucdavis.edu> > ironically, icc -xW generates pretty good-for-opteron code, > though of course, it's 32b. I haven't tried using icc to > generate em64t/and64 code. > > regards, mark hahn. Ah, got icc-8.1 to cooperate, dual 2.2 Ghz opteron+pc3200+2.4 kernel, 915.5MB array: -O1 Function Rate (MB/s) Avg time Min time Max time Copy: 2285.8039 0.2640 0.2800 0.3200 Scale: 2206.9798 0.2690 0.2900 0.3000 Add: 2341.5554 0.3740 0.4100 0.4200 Triad: 2181.9031 0.4060 0.4400 0.4800 -O2 Function Rate (MB/s) Avg time Min time Max time Copy: 2370.4856 0.2570 0.2700 0.3400 Scale: 2285.8280 0.2670 0.2800 0.3400 Add: 2461.6513 0.3710 0.3900 0.4600 Triad: 2285.8229 0.3920 0.4200 0.5000 -O3 Function Rate (MB/s) Avg time Min time Max time Copy: 2461.5867 0.2730 0.2600 0.3400 Scale: 2370.4237 0.2910 0.2700 0.3500 Add: 2526.3684 0.4050 0.3800 0.4800 Triad: 2341.5151 0.4320 0.4100 0.5100 The strange thing is they are 32 bit binaries, despite being built on a 64 bit os on a 64 bit hardware. I played around with various mentioned optimizations (including -xW) on the manpage, I never managed a 64 bit binary with icc-8.1 though. The man page has numerous i32em and em64t references. -- Bill Broadley Computational Science and Engineering UC Davis From lindahl at pathscale.com Thu Dec 16 02:21:42 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <20041216022956.GC16587@cse.ucdavis.edu> References: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: <20041216102142.GA1649@greglaptop.t-mobile.de> On Wed, Dec 15, 2004 at 06:29:56PM -0800, Bill Broadley wrote: > Kudos for the pathscale-1.4 compiler with -O3. Thank you! The not-so-secret secret is to use non-temporal stores, which we do automagically where needed with plain -O3. -- greg From rgb at phy.duke.edu Thu Dec 16 05:08:21 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... 
In-Reply-To: <20041216022956.GC16587@cse.ucdavis.edu> References: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: On Wed, 15 Dec 2004, Bill Broadley wrote: > Group reply: > > On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote: > > Just for those of you who were asking after AMD64's as viable compute > > platforms, I just ran stream and the bogomflops benchmark in my renamed > > "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+ > > That is a s754 amd64? Yes (as per earlier discussion, an Asus K8NE, but I should have restated it -- the P2 is an MSI mobo but I'm downstairs and don't remember which one). > > > They are all below. Executive summary is that the AMD barely beats > > (real) clock speed scaling compared to the P2 for stream. I suspect > > that this is not yet the end of the story, though, as I see little > > difference between the i386 benchmark results and the x86_64 results > > when running the program compiled both ways on metatron. > > Double registers only help if you need them. Most codes won't > automatically utilize native 64 bit ints or pointers to any > significant advantage. Well, stream is as much a memory bandwidth test as it is a floating point test per se anyway. I always hope for something dramatic when I use faster/wider memory, but usually reality is fairly sedate. > > The INTERESTING story is in bogomflops, which includes division. There > > metatron was a whopping 2.8x faster than lucifer, while its clock is > > only 1.33x faster. It more than doubled its relative clockspeed > > advantage, so to speak. One can see how having 64 bits would really > > speed up 64 bit division compared to doing it in software across > > multiple 32 bit registers... > > Interesting data point. > > > Hope this is interesting/useful to somebody. I put "real stream" at the > > very end. "real stream" uses the best time where benchmaster uses the > > average time so benchmaster results are typically a few percent lower > > (and likely just that much more realistic as well). > > Similar data points for an opteron, dual (stream using 1 cpu) 2.2 GHz, > with PC3200 memory (915.5MB array). Not sure why the timer is so lousy, > I had to make the array large to get a reasonably accurate time: You should try stream inside my leedle harness that uses the CPU cycle counter clock. It autotunes iterations and so forth and generates an SD as well as mean. That's the "clock granularity" thing in my test results. Note that it is 3 nsec on the AMD 64 and almost 100 nsec on the P2. This is also an interesting data point -- it suggests that integer instructions may be considerably faster on the AMD64. I'll have to run a mixed code program to find out, though. > I suspect the below numbers would be higher if I had a uniprocessor system > (never have a remote memory access or wait for the memory coherency) > or with a 2.6 Kernel (which is better about insuring that pages and the > process acting on the page is on the same cpu). > > Kudos for the pathscale-1.4 compiler with -O3. Now that's something to try. I still haven't started my thirty day three trial that I signed up for two months ago (I should know better than to do that right before the end of classes). I've got almost a good month of reduced teaching ahead -- maybe I'll start it now. From everything I've heard and seen, I'm going to end up buying a license or two anyway -- it seems like it is just a really, really good product being maintained by some very serious people. 
Of course, a factor of two in speed (for certain code) for the cost of a software license is a hell of a lot cheaper than buying a cluster twice as large. That helps. rgb > > gcc-3.2.3 -O1: > Function Rate (MB/s) Avg time Min time Max time > Copy: 2206.8823 0.3010 0.2900 0.3800 > Scale: 2285.7067 0.2880 0.2800 0.3700 > Add: 2285.7087 0.4140 0.4200 0.5300 > Triad: 2285.7152 0.3910 0.4200 0.4700 > > -O2 > Function Rate (MB/s) Avg time Min time Max time > Copy: 1777.7736 0.3240 0.3600 0.3600 > Scale: 1777.7783 0.3240 0.3600 0.3600 > Add: 1882.3495 0.4590 0.5100 0.5100 > Triad: 1882.3530 0.4590 0.5100 0.5100 > > -O3 > Function Rate (MB/s) Avg time Min time Max time > Copy: 1777.7924 0.3260 0.3600 0.3700 > Scale: 1828.4723 0.3230 0.3500 0.3600 > Add: 1882.3679 0.4640 0.5100 0.5200 > Triad: 1846.1717 0.4720 0.5200 0.5300 > > gcc-3.4.3 -O1: > Function Rate (MB/s) Avg time Min time Max time > Copy: 1729.6823 0.3330 0.3700 0.3700 > Scale: 1828.5184 0.3230 0.3500 0.3600 > Add: 1846.1048 0.4680 0.5200 0.5200 > Triad: 1846.1040 0.4680 0.5200 0.5200 > > -O2: > Function Rate (MB/s) Avg time Min time Max time > Copy: 2133.3337 0.2960 0.3000 0.3500 > Scale: 2133.3337 0.2980 0.3000 0.3500 > Add: 2232.5578 0.4270 0.4300 0.5100 > Triad: 2181.8132 0.4310 0.4400 0.5100 > > -O3: > Function Rate (MB/s) Avg time Min time Max time > Copy: 2285.6561 0.2630 0.2800 0.3600 > Scale: 2285.6581 0.2580 0.2800 0.3100 > Add: 2341.4071 0.3800 0.4100 0.4700 > Triad: 2285.6555 0.3880 0.4200 0.5200 > > Pathscale-1.4 -O1: > Function Rate (MB/s) Avg time Min time Max time > Copy: 1999.9498 0.2880 0.3200 0.3200 > Scale: 2064.4625 0.2840 0.3100 0.3200 > Add: 2232.5009 0.3950 0.4300 0.4400 > Triad: 2232.4910 0.3930 0.4300 0.4400 > > -O2 > Function Rate (MB/s) Avg time Min time Max time > Copy: 2461.5205 0.2410 0.2600 0.2700 > Scale: 2285.6970 0.2530 0.2800 0.2900 > Add: 2341.4466 0.3730 0.4100 0.4200 > Triad: 2399.9765 0.3670 0.4000 0.4100 > > -O3 > Function Rate (MB/s) Avg time Min time Max time > Copy: 3764.6831 0.1540 0.1700 0.1800 > Scale: 3764.6831 0.1530 0.1700 0.1700 > Add: 4173.8781 0.2080 0.2300 0.2400 > Triad: 4173.8781 0.2110 0.2300 0.2400 > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Dec 16 05:49:44 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> References: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> Message-ID: On Thu, 16 Dec 2004, Kozin, I (Igor) wrote: > Hi Bill, very interesting results. > > > Ah, got icc-8.1 to cooperate, dual 2.2 Ghz opteron+pc3200+2.4 kernel, > > 915.5MB array: > > -O1 > > Function Rate (MB/s) Avg time Min time Max time > > Copy: 2285.8039 0.2640 0.2800 0.3200 > > Scale: 2206.9798 0.2690 0.2900 0.3000 > > Add: 2341.5554 0.3740 0.4100 0.4200 > > Triad: 2181.9031 0.4060 0.4400 0.4800 > > > > -O2 > > Function Rate (MB/s) Avg time Min time Max time > > Copy: 2370.4856 0.2570 0.2700 0.3400 > > Scale: 2285.8280 0.2670 0.2800 0.3400 > > Add: 2461.6513 0.3710 0.3900 0.4600 > > Triad: 2285.8229 0.3920 0.4200 0.5000 > > pls note that your "average time" is sometimes less than "min time". 
In the words of James Bond, "Well, that's a neat trick...";-) The min/max times are also a bit odd in that they are all two digit numbers, and the average is a three digit number (%6.4f format). Something is Not Right. Good eye, Igor. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Dec 16 06:43:32 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: References: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: On Thu, 16 Dec 2004, Joshua Baker-LePain wrote: > On Thu, 16 Dec 2004 at 8:08am, Robert G. Brown wrote > > > On Wed, 15 Dec 2004, Bill Broadley wrote: > > > > > Group reply: > > > > > > On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote: > > > > Just for those of you who were asking after AMD64's as viable compute > > > > platforms, I just ran stream and the bogomflops benchmark in my renamed > > > > "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+ > > > > > > That is a s754 amd64? > > > > Yes (as per earlier discussion, an Asus K8NE, but I should have restated > > it -- the P2 is an MSI mobo but I'm downstairs and don't remember which > > one). ^^ > > You keep using that word. I do not think it means what you think it > means. > \end[castilian]{accent} > > Didn't your initial post state that the Intel CPU tested was a > "GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)"? Or > am I missing something (I haven't had my coffee yet, you see)? Well, y'see, all current Intel processors are successors to the P5, making the whole processor set: Pentium Pro, Celeron, PII, PIII and P4 into P6's. As is well-known by all Sicilians, P6 minus P4 clearly equals P2, so I cannot run the processor close to me. But Celerons are sometimes referred to as "Celeries", and Celery is well known to be a laxative and tasteless, so I cannot run the processor close to you. Wait, wait, I'm just getting going! If you cripple a PIII by removing its cache, you get a Celeron, making a P6 that (knowing that all men are mortal) you would want to keep as far from you as possible, so I clearly cannot run the processor near to me...so you run the processor in front of you and I'll run the one in front of me In the meantime, I've got to P...2. rg-head-already-aching-b -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rbw at ahpcrc.org Thu Dec 16 07:16:45 2004 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <20041216102142.GA1649@greglaptop.t-mobile.de> References: <20041216022956.GC16587@cse.ucdavis.edu> <20041216102142.GA1649@greglaptop.t-mobile.de> Message-ID: <41C1A6DD.2020208@ahpcrc.org> All, Here are the data again comparing gcc, PGI, the Pathscale compilers on our cluster and Bill's Opteron with prefetching turned on in PGI and gcc as well. Our system has the same clock as Bill's, 2.2 GHz, but slower memory (PC2700). I have thrown in some X1 SSP timings are well. The numbers demonstrate the importance of explicitly asking for prefetching on the non-Pathscale compilers. 
Pathscale still comes out on top (at about half the X1 SSP rate) here, but the numbers are now much closer, and these differences may be somewhat accounted for by Bill's system's faster memory (PC32000 versus PC2700 for our system). I include the X1 single SSP data as well. Of course if you are focused on raw bandwidth, you should get numbers with and without prefetching otherwise you are silently including cache effects. The equivalent *one processo*r megaflop ratings for the triad data below are: gcc (noprefetch): 186 MFLOPs gcc (prefetch): 279 MFLOPs pgcc (prefetch): 300 MFLOPs pscalecc (prefetch): 347 MFLOPs x1cc (vector, 1ssp): 780 MFLOPs Dual processor ratings should be close to double these on the Opteron. So I expect one node (two CPUs) on the Opteron is almost equal one SSP on the X1. Enjoy and prefetch! rbw gcc-3.2.3 -O4 -Wall -pedantic: Function Rate (MB/s) RMS time Min time Max time Copy: 2004.8056 0.0095 0.0080 0.0099 Scale: 2044.7551 0.0099 0.0078 0.0105 Add: 2272.3092 0.0133 0.0106 0.0137 Triad: 2237.3599 0.0134 0.0107 0.0137 gcc-3.2.3 -O4 -fprefetch-loop-arrays -Wall -pedantic: Function Rate (MB/s) RMS time Min time Max time Copy: 3259.9273 0.0049 0.0049 0.0052 Scale: 3294.9803 0.0049 0.0049 0.0049 Add: 3306.7241 0.0073 0.0073 0.0073 Triad: 3349.1914 0.0072 0.0072 0.0072 pgcc -fast -Mvect=sse -Mnontemporal Function Rate (MB/s) RMS time Min time Max time Copy: 3227.6291 0.0050 0.0050 0.0052 Scale: 3210.1824 0.0050 0.0050 0.0050 Add: 3571.3935 0.0067 0.0067 0.0068 Triad: 3604.1280 0.0067 0.0067 0.0068 Pathscale-1.4 -O3 Function Rate (MB/s) Avg time Min time Max time Copy: 3764.6831 0.1540 0.1700 0.1800 Scale: 3764.6831 0.1530 0.1700 0.1700 Add: 4173.8781 0.2080 0.2300 0.2400 Triad: 4173.8781 0.2110 0.2300 0.2400 X1cc -c -h inline3,scalar3,vector3 -h stream0 Function Rate (MB/s) RMS time Min time Max time Copy: 7600.2280 0.0022 0.0021 0.0022 Scale: 7600.5529 0.0024 0.0021 0.0030 Add: 9259.1164 0.0026 0.0026 0.0027 Triad: 9360.5935 0.0026 0.0026 0.0026 Greg Lindahl wrote: >On Wed, Dec 15, 2004 at 06:29:56PM -0800, Bill Broadley wrote: > > > >>Kudos for the pathscale-1.4 compiler with -O3. >> >> > >Thank you! The not-so-secret secret is to use non-temporal stores, >which we do automagically where needed with plain -O3. > >-- greg > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > From ctierney at HPTI.com Thu Dec 16 08:12:21 2004 From: ctierney at HPTI.com (Craig Tierney) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: References: Message-ID: <1103213541.3091.6.camel@hpti10.fsl.noaa.gov> On Wed, 2004-12-15 at 21:35, Mark Hahn wrote: > > > They are all below. Executive summary is that the AMD barely beats > > > (real) clock speed scaling compared to the P2 for stream. I suspect > > sure - stream is normally dram test, not a CPU test. > > > Double registers only help if you need them. Most codes won't > > automatically utilize native 64 bit ints or pointers to any > > significant advantage. > > indeed, going 64b often costs a noticable overhead in code size > expansion and inflation of space to store pointers. > > the real appeal of x86-64 is that you get twice as many registers. > yes, being able to actually use more than about 2.5 GB is nice, > and important to some people. but almost any real code will take > advantage of having twice as many registers (integer and SIMD). 
> > > or with a 2.6 Kernel (which is better about insuring that pages and the > > process acting on the page is on the same cpu). > > don't forget to turn on node interleave in the bios, too. Why? If you are planning to have a single process access the memory of all of the nodes (cpus) then yes. If you are running MPI jobs or multiple processes that stay local to their own memory, they don't you want bank interleave on but node interleave off? I have seen better performance for MPI jobs, 1 process per cpu, with node interleave off. Craig > > > Kudos for the pathscale-1.4 compiler with -O3. > > ironically, icc -xW generates pretty good-for-opteron code, > though of course, it's 32b. I haven't tried using icc to > generate em64t/and64 code. > > regards, mark hahn. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mwill at penguincomputing.com Thu Dec 16 10:16:35 2004 From: mwill at penguincomputing.com (Michael Will) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <1103213541.3091.6.camel@hpti10.fsl.noaa.gov> References: <1103213541.3091.6.camel@hpti10.fsl.noaa.gov> Message-ID: <200412161016.35994.mwill@penguincomputing.com> On Thursday 16 December 2004 08:12 am, Craig Tierney wrote: > > don't forget to turn on node interleave in the bios, too. > > Why? If you are planning to have a single process access > the memory of all of the nodes (cpus) then yes. If you are > running MPI jobs or multiple processes that stay local to their > own memory, they don't you want bank interleave on but node > interleave off? > > I have seen better performance for MPI jobs, 1 process per > cpu, with node interleave off. This assumes you have a 2.6 or numa kernel that makes sure that the process stays on the CPU with the memory. A year ago I did benchmarking on SLES8 on a dual opteron and when running it over and over again I would observe two very distinct results, depending on if the process and its ram were on the same CPU or not. Turning on node-interleave averages the results out, which is not needed in real life except for when you need to have consistent publishable results, or are worried about timing variations making your debugging harder. Michael > Craig > > > > > > > Kudos for the pathscale-1.4 compiler with -O3. > > > > ironically, icc -xW generates pretty good-for-opteron code, > > though of course, it's 32b. I haven't tried using icc to > > generate em64t/and64 code. > > > > regards, mark hahn. > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Michael Will, Linux Sales Engineer NEWS: We have moved to a larger iceberg :-) NEWS: 300 California St., San Francisco, CA. Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899 www.penguincomputing.com From rgb at phy.duke.edu Thu Dec 16 15:06:34 2004 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] benchmaster Message-ID: OK, I spent a day and a half cleaning up cpu_rate, renaming it (as it no longer just tests CPU per se anyway), making it a neat little website, and packaging it all up to go. I even made the repository yum-distributable, so in FC3 you can drop a benchmaster.repo into your yum repo directory and automagically update from it. Of course the primary purpose for a benchmark harness isn't to be installed via a package, it is to play with it. So you might want to get the tarball instead of the binary rpm so you can e.g. change compilers and flags and so forth. This is fully GPL code with no restrictions on it other than the purely nominal "beverage" clause. It is designed for you to be able to add your own code fragments and time them fairly easily. Its other main virtue is that it makes it very easy to generate a list of benchmark results for different values of the parameters (such as vector length in stream). This lets you make plots, and plots carry a lot more information than a short table of numbers. For example, look at the jpegs on the lucifer.html page below. Main page: http://www.phy.duke.edu/~rgb/General/benchmaster.php Sample/example page: http://www.phy.duke.edu/~rgb/General/benchmaster/lucifer.html I've worked on this enough that I'm HOPING that it is largely decrufted and reasonably bug free and even tolerably documented. But no guarantees. It IS free software, after all...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From deadline at linux-mag.com Fri Dec 17 12:02:36 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: Message-ID: FYI, starting in January Jeff Layton and I will be running a series in ClusterWorld magazine on how to build a cluster for under $2500 (that is right two zeros). Obviously not a great interest to the hardcore readers of the list, but maybe of some interest to the newbies. You can find out more (and pictures) at: http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com From mwill at penguincomputing.com Fri Dec 17 12:22:07 2004 From: mwill at penguincomputing.com (Michael Will) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: References: Message-ID: <200412171222.07796.mwill@penguincomputing.com> Quite impressive - the compute nodes have 256 bytes of ram ? Michael PS: put in 'mega' there On Friday 17 December 2004 12:02 pm, Douglas Eadline, Cluster World Magazine wrote: > > FYI, starting in January Jeff Layton and I will be running a series > in ClusterWorld magazine on how to build a cluster for under $2500 > (that is right two zeros). > > Obviously not a great interest to the hardcore readers of the list, but > maybe of some interest to the newbies. 
You can find out more (and > pictures) at: > > http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 > > Doug > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > Desk: 610.865.6061 > Fax: 610.865.6618 www.clusterworld.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Michael Will, Linux Sales Engineer NEWS: We have moved to a larger iceberg :-) NEWS: 300 California St., San Francisco, CA. Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899 www.penguincomputing.com From deadline at linux-mag.com Fri Dec 17 12:38:55 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: <200412171222.07796.mwill@penguincomputing.com> Message-ID: On Fri, 17 Dec 2004, Michael Will wrote: > Quite impressive - the compute nodes have 256 bytes of ram ? well we had to cut costs somewhere! Doug > > Michael > PS: put in 'mega' there > On Friday 17 December 2004 12:02 pm, Douglas Eadline, Cluster World Magazine wrote: > > > > FYI, starting in January Jeff Layton and I will be running a series > > in ClusterWorld magazine on how to build a cluster for under $2500 > > (that is right two zeros). > > > > Obviously not a great interest to the hardcore readers of the list, but > > maybe of some interest to the newbies. You can find out more (and > > pictures) at: > > > > http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 > > > > Doug > > ---------------------------------------------------------------- > > Editor-in-chief ClusterWorld Magazine > > Desk: 610.865.6061 > > Fax: 610.865.6618 www.clusterworld.com > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Fri Dec 17 13:44:33 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: References: Message-ID: <6.1.1.1.2.20041217134312.041c8850@mail.jpl.nasa.gov> A year or so back, there was a discussion on the list about building a cluster entirely of stuff bought from WalMart (based on the $199 computers Walmart was selling back then). At 12:02 PM 12/17/2004, Douglas Eadline, Cluster World Magazine wrote: >FYI, starting in January Jeff Layton and I will be running a series >in ClusterWorld magazine on how to build a cluster for under $2500 >(that is right two zeros). > >Obviously not a great interest to the hardcore readers of the list, but >maybe of some interest to the newbies. You can find out more (and >pictures) at: > >http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 > >Doug >---------------------------------------------------------------- >Editor-in-chief ClusterWorld Magazine >Desk: 610.865.6061 >Fax: 610.865.6618 www.clusterworld.com > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From deadline at linux-mag.com Fri Dec 17 17:57:50 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: <6.1.1.1.2.20041217134312.041c8850@mail.jpl.nasa.gov> Message-ID: On Fri, 17 Dec 2004, Jim Lux wrote: > A year or so back, there was a discussion on the list about building a > cluster entirely of stuff bought from WalMart (based on the $199 computers > Walmart was selling back then). The Walmart systems are what started this idea. I had purchased a $199 box with a VIA C3, then some Duron boxes. The problem with the "Walmart/Microtel" approach was that the consistency of the boxes was not very good. Indeed, several times I got something better than advertised! Motherboards would change, larger hard-drives, slightly faster processor, etc. So it could be hard building a reproducible cluster. Plus the cases they were using were vented on sides -- not good for close stacking. So we decided it would be better to use parts that were available from several sources so our efforts could be easily duplicated. Doug > > > At 12:02 PM 12/17/2004, Douglas Eadline, Cluster World Magazine wrote: > > >FYI, starting in January Jeff Layton and I will be running a series > >in ClusterWorld magazine on how to build a cluster for under $2500 > >(that is right two zeros). > > > >Obviously not a great interest to the hardcore readers of the list, but > >maybe of some interest to the newbies.
You can find out more (and > >pictures) at: > > > >http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 > > > >Doug > >---------------------------------------------------------------- > >Editor-in-chief ClusterWorld Magazine > >Desk: 610.865.6061 > >Fax: 610.865.6618 www.clusterworld.com > > > > > >_______________________________________________ > >Beowulf mailing list, Beowulf@beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > From maurice at harddata.com Wed Dec 15 00:00:27 2004 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Memory for Athlon64 In-Reply-To: <200412140021.iBE0LBKJ029650@bluewest.scyld.com> References: <200412140021.iBE0LBKJ029650@bluewest.scyld.com> Message-ID: <41BFEF1B.2010901@harddata.com> Since this topic came up last week I tripped over an article that may be useful to many: http://www.lostcircuits.com/memory/reg_ddr/ Also, earlier I had stated that the S754 and S939 Athlon64 CPUs could not use ECC RAM. I stand corrected. However few maybe none) of the motherboard makers provide for this in the BIOS. The assumption is probably that if you want ECC, you want an Opteron board. At present Opterons and Athlon64 FX CPUs REQUIRE ECC RAM. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 This email, message, and content, should be considered confidential, and is the copyrighted property of Hard Data Ltd., unless stated otherwise. From kyron at jrtad.com Wed Dec 15 12:56:05 2004 From: kyron at jrtad.com (Eric Thibodeau) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] MPI Implementations for SMP use Message-ID: <1103144165.16504@van.jrtad.com> Probably opening up another can of worms but, you might want to seriously consider a hybrid (MPI + OpenMP or Pthreads) approach in the case of SMP machines. Local exeution of OpenMP generated code is considerably faster than the locally executed MPI equivalent. PThreads will give you even more gain but requires some code modification (as opposed to OpenMP code if you stick to the #pragma approach). With the advent of HT processors and more affordable dual motherboards comming on the market, threading cumputation loops (hybrid approach) will become a more and more "logical" approach in structuring parallel programs. Cheers, Eric "Andrew D. Fant" wrote .. > This may be opening up a can of worms, but does anyone have any > information about the relative merits of the various open > implementations of MPI in SMP systems? I'm setting up a Linux server > with multiple CPUs and I wanted to know if one implementation is > significantly faster than others under these conditions. > > Thanks, > Andy > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From csamuel at vpac.org Wed Dec 15 16:52:03 2004 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... 
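A minimal sketch of such a hybrid loop (the names and the work() function are made up; MPI carries the decomposition across nodes while the OpenMP pragma fans the local chunk out over the CPUs of one SMP box):

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    /* stand-in for the real per-element computation */
    static double work(int i) { return 1.0 / (double)(i + 1); }

    int main(int argc, char **argv)
    {
        int i, rank, n_local = 1000000;      /* this process's share of the work */
        double local = 0.0, global = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* OpenMP splits the local loop across the CPUs of this node */
        #pragma omp parallel for reduction(+:local)
        for (i = 0; i < n_local; i++)
            local += work(i);

        /* MPI combines the per-node partial sums across the cluster */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }

Build it with an OpenMP-capable compiler wrapped by mpicc (e.g. icc -openmp or pgcc -mp) and start one MPI process per node rather than one per CPU.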
In-Reply-To: References: Message-ID: <200412161152.05780.csamuel@vpac.org> On Thu, 16 Dec 2004 09:49 am, Robert G. Brown wrote: > and on a 1.8 MHz P2 (lucifer) I didn't know they made 1.8MHz Pentium 2's :-) -- Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041216/266ff0f1/attachment.bin From michael at beethovenweb.com Wed Dec 15 19:31:26 2004 From: michael at beethovenweb.com (Michael R. Hines) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] looking for a switch to purchase Message-ID: <6.2.0.14.2.20041215222758.02ff1078@garnet.acns.fsu.edu> Hi everyone, I'm a graduate student at Florida State University and I'm looking to buy a basic, level 2 gigabit switch that supports jumbo frames. I only need a few ports on this switch: no more than 16, really. From the searching on the internet I've been doing, I'm having trouble understanding exactly how to eliminate the switches from different companies. There are tons of them, and many of them have support for other levels, ( 3 through 7), which I don't really need. My price range is a maximum of $500. Anyone have any suggestions? Your help is much appreciated. /* ----------------------------------------------- */ Michael R. Hines Graduate Student, Dept. Computer Science Florida State University Jusqu'? ce que le futur vienne........ /* ----------------------------------------------- */ From kshitij_sanghi at yahoo.com Wed Dec 15 22:11:30 2004 From: kshitij_sanghi at yahoo.com (Kshitij Sanghi) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] OpenPBS vs Condor? Message-ID: <20041216061130.79612.qmail@web11413.mail.yahoo.com> Hi, I'm new to Linux clusters. I wanted to know how does OpenPBS compare with Condor. We already have a small grid running Condor but wanted some scheduling facilities for our jobs. Is it necessary to shift to OpenPBS to provide job scheduling or will Condor do? The scheduler we require is nothing complex even the most basic one would do. Thanks and Regards, Kshitij ___________________________________________________________ Win a castle for NYE with your mates and Yahoo! Messenger http://uk.messenger.yahoo.com From mb at gup.jku.at Thu Dec 16 01:08:12 2004 From: mb at gup.jku.at (Markus Baumgartner) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <20041216065952.GA20370@cse.ucdavis.edu> References: <20041216022956.GC16587@cse.ucdavis.edu> <20041216065952.GA20370@cse.ucdavis.edu> Message-ID: <41C1507C.5090101@gup.jku.at> Bill Broadley wrote: > > The strange thing is they are 32 bit binaries, despite being built > on a 64 bit os on a 64 bit hardware. > > I played around with various mentioned optimizations (including -xW) > on the manpage, I never managed a 64 bit binary with icc-8.1 though. > The man page has numerous i32em and em64t references. You probably need to download the EM64T edition of the Intel compiler 8.1. It is called l_cce_pc_8.1.xxx. (Note the "e" in the filename). There are also versions l_cc_... that do not create code for EM64T. 
$ file stream_d stream_d: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped This is what I get on my dual Opteron 1.6 GHz workstation (using Gentoo Linux, icc -O3): Function Rate (MB/s) RMS time Min time Max time Copy: 1769.1100 0.2923 0.2894 0.3154 Scale: 1695.7910 0.3021 0.3019 0.3023 Add: 1646.2104 0.4668 0.4665 0.4670 Triad: 1793.2239 0.4286 0.4283 0.4294 Markus -- Markus Baumgartner Institute of Graphics and Parallel Processing, JKU Linz, Austria www.gup.uni-linz.ac.at From i.kozin at dl.ac.uk Thu Dec 16 05:27:33 2004 From: i.kozin at dl.ac.uk (Kozin, I (Igor)) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... Message-ID: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> Hi Bill, very interesting results. > Ah, got icc-8.1 to cooperate, dual 2.2 Ghz opteron+pc3200+2.4 kernel, > 915.5MB array: > -O1 > Function Rate (MB/s) Avg time Min time Max time > Copy: 2285.8039 0.2640 0.2800 0.3200 > Scale: 2206.9798 0.2690 0.2900 0.3000 > Add: 2341.5554 0.3740 0.4100 0.4200 > Triad: 2181.9031 0.4060 0.4400 0.4800 > > -O2 > Function Rate (MB/s) Avg time Min time Max time > Copy: 2370.4856 0.2570 0.2700 0.3400 > Scale: 2285.8280 0.2670 0.2800 0.3400 > Add: 2461.6513 0.3710 0.3900 0.4600 > Triad: 2285.8229 0.3920 0.4200 0.5000 pls note that your "average time" is sometimes less than "min time". > -O3 > Function Rate (MB/s) Avg time Min time Max time > Copy: 2461.5867 0.2730 0.2600 0.3400 > Scale: 2370.4237 0.2910 0.2700 0.3500 > Add: 2526.3684 0.4050 0.3800 0.4800 > Triad: 2341.5151 0.4320 0.4100 0.5100 > > The strange thing is they are 32 bit binaries, despite being built > on a 64 bit os on a 64 bit hardware. how do you know they are not 64bit? From what I see it is. quad 2.2 Opteron, 9 GB, SLES 9, 2.4.21 it seems my memory is a bit slower than yours. PARAMETER (n=32000000,offset=0,ndim=n+offset,ntimes=50) i.e. using 732 MB pathscale 1.4 -O3 Function Rate (MB/s) Avg time Min time Max time Copy: 3555.5778 0.1444 0.1440 0.1450 Scale: 3483.0084 0.1473 0.1470 0.1480 Add: 3588.8372 0.2142 0.2140 0.2150 Triad: 3605.6772 0.2134 0.2130 0.2140 ifort -O3 -xW Copy: 3657.1588 0.1458 0.1400 0.1500 Scale: 3657.1588 0.1475 0.1400 0.1500 Add: 3490.9503 0.2273 0.2200 0.2300 Triad: 3339.1509 0.2317 0.2300 0.2400 > Not sure why the timer is so lousy, > I had to make the array large to get a reasonably accurate time: This is indeed another interesting point. I'd really like to understand it. In addition when I re-run stream the rates vary quite a bit despite the high loop count (50) and very small std dev (min & max are pretty close). e.g. two more times ifort -O3 -xW Copy: 2560.0146 0.2094 0.2000 0.2200 Scale: 2560.0146 0.2094 0.2000 0.2200 Add: 2477.4389 0.3219 0.3100 0.3300 Triad: 2400.0023 0.3285 0.3200 0.3300 Copy: 3657.1588 0.1454 0.1400 0.1500 Scale: 3657.1588 0.1473 0.1400 0.1500 Add: 3490.9503 0.2256 0.2200 0.2300 Triad: 3339.1371 0.2300 0.2300 0.2300 Igor > I played around with various mentioned optimizations (including -xW) > on the manpage, I never managed a 64 bit binary with icc-8.1 though. > The man page has numerous i32em and em64t references. > > > > > -- > Bill Broadley > Computational Science and Engineering > UC Davis > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > I. 
Kozin (i.kozin at dl.ac.uk) CCLRC Daresbury Laboratory tel: 01925 603308 http://www.cse.clrc.ac.uk/disco From jlb17 at duke.edu Thu Dec 16 06:05:53 2004 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: References: <20041216022956.GC16587@cse.ucdavis.edu> Message-ID: On Thu, 16 Dec 2004 at 8:08am, Robert G. Brown wrote > On Wed, 15 Dec 2004, Bill Broadley wrote: > > > Group reply: > > > > On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote: > > > Just for those of you who were asking after AMD64's as viable compute > > > platforms, I just ran stream and the bogomflops benchmark in my renamed > > > "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+ > > > > That is a s754 amd64? > > Yes (as per earlier discussion, an Asus K8NE, but I should have restated > it -- the P2 is an MSI mobo but I'm downstairs and don't remember which > one). ^^ You keep using that word. I do not think it means what you think it means. \end[castilian]{accent} Didn't your initial post state that the Intel CPU tested was a "GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)"? Or am I missing something (I haven't had my coffee yet, you see)? -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From kus at free.net Thu Dec 16 07:58:16 2004 From: kus at free.net (Mikhail Kuzminsky) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <20041216062943.GA19789@cse.ucdavis.edu> Message-ID: In message from Bill Broadley (Wed, 15 Dec 2004 22:29:43 -0800): >> and important to some people. but almost any real code will take >> advantage of having twice as many registers (integer and SIMD). > >Indeed, assuming you have the source to recompile. > >> don't forget to turn on node interleave in the bios, too. > >Assuming a 2.4 kernel I believe that helps single processes running >on a dual, but doesn't help when 2 processes are running. With 2.6 I >think >it's usually faster to have node interleaving off with 2 processes >(if not >1). >> >> > Kudos for the pathscale-1.4 compiler with -O3. >> >> ironically, icc -xW generates pretty good-for-opteron code, >> though of course, it's 32b. I haven't tried using icc to >> generate em64t/and64 code. > >With 8.0 and 8.1 I can't seem to get it working on a RHEL x86-64 box >or a Rocks (RHEL-3 based) x86-64 box. > >I'll retry with a nacoma based machine and see if I can get the intel >compiler working. > >I get errors with 8.0 like: >/usr/bin/ld: skipping incompatible /opt/intel_cc_80/lib/libsvml.a >when searching for -lsvml >or with 8.1: >ld: skipping incompatible /usr/lib64/libm.so when searching for -lm > Yes, it's because ld need to know that you work w/32 bits, i.e. you should add -Wl,-melf_i386 (if I remember keys right :-)) Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow >-- >Bill Broadley >Computational Science and Engineering >UC Davis >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf From mbaumgar at gup.jku.at Thu Dec 16 12:28:26 2004 From: mbaumgar at gup.jku.at (Markus Baumgartner) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... 
In-Reply-To: <20041216191532.GC27290@cse.ucdavis.edu> References: <20041216022956.GC16587@cse.ucdavis.edu> <20041216065952.GA20370@cse.ucdavis.edu> <41C1507C.5090101@gup.jku.at> <20041216191532.GC27290@cse.ucdavis.edu> Message-ID: <41C1EFEA.5060802@gup.jku.at> Bill Broadley wrote: >>You probably need to download the EM64T edition of the Intel compiler >>8.1. It is called l_cce_pc_8.1.xxx. (Note the "e" in the filename). >>There are also versions l_cc_... that do not create code for EM64T. >> >> > >lftp download.intel.com:/software/products/compilers/downloads> ls l_cc* >-r-xr-xr-x 1 owner group 1843 Sep 17 2002 l_cc_p_6.0.1.304.htm >-r-xr-xr-x 1 owner group 2867 Aug 27 2002 l_cc_p_6.0.139.htm >-r-xr-xr-x 1 owner group 1799 Nov 20 2002 l_cc_p_7.0.065.htm >-r-xr-xr-x 1 owner group 64921600 Nov 20 2002 l_cc_p_7.0.065.tar >-r-xr-xr-x 1 owner group 63406080 Apr 1 2003 l_cc_p_7.1.006.tar >-r-xr-xr-x 1 owner group 67399682 Dec 8 2003 l_cc_p_8.0.055.tar.gz >-r-xr-xr-x 1 owner group 131436906 Sep 14 19:12 l_cc_p_8.1.021.tar.gz >-r-xr-xr-x 1 owner group 133091963 Dec 2 22:20 l_cc_pu_8.1.024.tar.gz >lftp download.intel.com:/software/products/compilers/downloads> > >I don't see one. > > The files cannot be found on the public FTP server. You have to register at the intel.com site to get them. They usually have newer releases there than on the FTP, too. $ ls -l l_* -rw-r--r-- 1 mbaumgar gup 19232 Dec 3 12:04 l_cce_pc_8.1.022_RN.zip -rw-r--r-- 1 mbaumgar gup 15359419 Dec 3 12:04 l_cce_pc_8.1.023.tar.gz -rw-r--r-- 1 mbaumgar gup 20260 Dec 3 12:04 l_fce_pc_8.1.022_RN.zip -rw-r--r-- 1 mbaumgar gup 17789430 Dec 3 12:04 l_fce_pc_8.1.023.tar.gz Markus -- Markus Baumgartner Institute of Graphics and Parallel Processing, JKU Linz, Austria www.gup.uni-linz.ac.at From mathog at mendel.bio.caltech.edu Thu Dec 16 14:12:04 2004 From: mathog at mendel.bio.caltech.edu (David Mathog) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... Message-ID: > Well, stream is as much a memory bandwidth test as it is a floating > point test per se anyway. I always hope for something dramatic when I > use faster/wider memory, but usually reality is fairly sedate. > > Enjoy and prefetch! > > rbw > > gcc-3.2.3 -O4 -Wall -pedantic: > Function Rate (MB/s) RMS time Min time Max time > Copy: 2004.8056 0.0095 0.0080 0.0099 > Scale: 2044.7551 0.0099 0.0078 0.0105 > Add: 2272.3092 0.0133 0.0106 0.0137 > Triad: 2237.3599 0.0134 0.0107 0.0137 > > gcc-3.2.3 -O4 -fprefetch-loop-arrays -Wall -pedantic: > Function Rate (MB/s) RMS time Min time Max time > Copy: 3259.9273 0.0049 0.0049 0.0052 > Scale: 3294.9803 0.0049 0.0049 0.0049 > Add: 3306.7241 0.0073 0.0073 0.0073 > Triad: 3349.1914 0.0072 0.0072 0.0072 > That was what happened even on the Athlon MP - the prefetch tricks made a pretty big difference, although not so great percentage wise, apparently, as for the Opteron / Athlon64. 1.2 was a typical prefetch/normal ratio, whereas here it seems to be 1.6. Much better prefetch in the newer chips. Stream is pretty simple code though and I have found a couple of instances in the last few years in other programs where just enabling -prefetch in gcc didn't work that well - the prefetch pattern was too complex for the compiler to figure out. Maybe later versions of gcc have fixed this. Anyway, to hand tune prefetches in gcc add something like this: # # Prefetch 192 bytes ahead of the current pointer. # The "w" form is for data that will be written. # How far upstream to prefetch depends on the code. 
# Prefetch too close and it won't be in cache when needed. # Prefetch too far and it may swap out before it gets used. # #if defined(AMD_PREFETCH) static __inline__ void CPU_prefetchwR(const void *s) { __asm__ ("prefetchw 192(%0)" :: "r" ((s)) ); } static __inline__ void CPU_prefetchR(const void *s) { __asm__ ("prefetch 192(%0)" :: "r" ((s)) ); } #endif And then sprinkle these in as needed: #if defined(AMD_PREFETCH) CPU_prefetchR(&a[i]); CPU_prefetchwR(&c[i]); #endif for instance, before this: a[i]=c[i]; For more or less sane code you can generally just do a few runs varying the 192 (above) to place the prefetch in the optimal position. That's only for sane code though. For something awful like this: b=a[i]; d=c[b]; f[d]=e[b]; you have to know a priori what's (likely) to be in a[] and c[] to guess ahead of time what b,d will be, so that f[d] and e[b] can be prefetched. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From akhtar_samo at yahoo.com Thu Dec 16 14:48:56 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] About Bench Marks Message-ID: <20041216224856.5166.qmail@web20027.mail.yahoo.com> Hi, Actually I want to run an benchmark on my MPICH based cluster, its just a 4 node cluster. Kindly let me know what to do , where to get it from with proper installation method Akhtar --------------------------------- Do you Yahoo!? Yahoo! Mail - Find what you need with new enhanced search. Learn more. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041216/9ac6b963/attachment.html From jrajiv at hclinsys.com Thu Dec 16 20:52:58 2004 From: jrajiv at hclinsys.com (Rajiv) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] HPC and SAN Message-ID: <003f01c4e3f4$47695fb0$0f120897@PMORND> Dear All, Is there any thing like Beowulf cluster and SAN. I would like to have all the data in the Beowulf cluster to be in SAN also. Pls excuse if in case you find my question silly. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041217/152fcea7/attachment.html From michael at beethovenweb.com Fri Dec 17 11:19:52 2004 From: michael at beethovenweb.com (Michael R. Hines) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] looking for a switch to purchase Message-ID: <6.2.0.14.2.20041217141948.02ff0a10@garnet.acns.fsu.edu> Hi everyone, I'm a graduate student at Florida State University and I'm looking to buy a basic, level 2 gigabit switch that supports jumbo frames. I only need a few ports on this switch: no more than 16, really. From the searching on the internet I've been doing, I'm having trouble understanding exactly how to eliminate the switches from different companies. There are tons of them, and many of them have support for other levels, ( 3 through 7), which I don't really need. My price range is a maximum of $500. Anyone have any suggestions? Your help is much appreciated. /* ----------------------------------------------- */ Michael R. Hines Graduate Student, Dept. Computer Science Florida State University Jusqu'? ce que le futur vienne........ /* ----------------------------------------------- */ From laytonjb at charter.net Fri Dec 17 12:30:44 2004 From: laytonjb at charter.net (Jeffrey B. 
Layton) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: <200412171222.07796.mwill@penguincomputing.com> References: <200412171222.07796.mwill@penguincomputing.com> Message-ID: <41C341F4.6020304@charter.net> Gotcha - thanks for the catch! Jeff >Quite impressive - the compute nodes have 256 bytes of ram ? > >Michael >PS: put in 'mega' there >On Friday 17 December 2004 12:02 pm, Douglas Eadline, Cluster World Magazine wrote: > > >>FYI, starting in January Jeff Layton and I will be running a series >>in ClusterWorld magazine on how to build a cluster for under $2500 >>(that is right two zeros). >> >>Obviously not a great interest to the hardcore readers of the list, but >>maybe of some interest to the newbies. You can find out more (and >>pictures) at: >> >>http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 >> >>Doug >>---------------------------------------------------------------- >>Editor-in-chief ClusterWorld Magazine >>Desk: 610.865.6061 >>Fax: 610.865.6618 www.clusterworld.com >> >> >> From nathan at iwantka.com Fri Dec 17 12:28:48 2004 From: nathan at iwantka.com (Nathan Littlepage) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: Message-ID: <200412172025.iBHKPEKZ029905@bluewest.scyld.com> Anything that's a value add is of interest. > -----Original Message----- > From: beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org] On Behalf Of Douglas > Eadline, Cluster World Magazine > Sent: Friday, December 17, 2004 2:03 PM > To: Beowulf Mailing List > Subject: [Beowulf] The Value Cluster > > > FYI, starting in January Jeff Layton and I will be running a > series in ClusterWorld magazine on how to build a cluster for > under $2500 (that is right two zeros). > > Obviously not a great interest to the hardcore readers of the > list, but maybe of some interest to the newbies. You can find > out more (and > pictures) at: > > http://www.clusterworld.com/article.pl?sid=04/12/17/1954203 > > Doug > ---------------------------------------------------------------- > Editor-in-chief ClusterWorld Magazine > Desk: 610.865.6061 > Fax: 610.865.6618 www.clusterworld.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org To change your > subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From josip at lanl.gov Thu Dec 16 08:17:06 2004 From: josip at lanl.gov (Josip Loncaric) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: References: Message-ID: <41C1B502.2080900@lanl.gov> Robert G. Brown wrote: > [...] One can see how having 64 bits would really > speed up 64 bit division compared to doing it in software across > multiple 32 bit registers... Correct me if I'm wrong, but doesn't the floating point unit normally use an internal iterative process to perform the division? This would not involve 32-bit registers... I'm not so sure about *integer* 64-bit division. Integer division may involve multiple 32-bit integer registers. Good ole' Cray-1 used an iterative process for floating point division which worked like this: given a floating point number x, use the first 8 bits of the mantissa to index into a lookup table containing initial guesses, then do a few steps of Newton-Raphson iteration involving only multiply-add operations to get the fully converged reciprocal mantissa, fix the exponent, thus obtaining 1/x, then multiply y*(1/x) to get y/x. 
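For concreteness, here is a minimal C sketch of the reciprocal iteration described above. The lookup-table seed is replaced by an arbitrary starting guess, so this only illustrates the multiply-add structure of the method, not how any particular FPU actually implements FDIV:

#include <stdio.h>

/* Newton-Raphson reciprocal: r <- r*(2 - x*r).  Real hardware seeds r
 * from a small table indexed by the leading mantissa bits; a crude
 * fixed guess is used here purely for illustration. */
static double nr_reciprocal(double x, double seed, int steps)
{
    double r = seed;
    int i;
    for (i = 0; i < steps; i++)
        r = r * (2.0 - x * r);      /* multiplies and adds only, no divide */
    return r;
}

int main(void)
{
    double x = 7.0, y = 3.0;
    double r = nr_reciprocal(x, 0.1, 5);
    printf("y/x ~= %.15g (exact %.15g)\n", y * r, y / x);
    return 0;
}

Each step roughly doubles the number of correct bits, which is why a decent table seed plus two or three iterations is enough for double precision.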
As I recall, the famous Pentium FDIV bug involved some corner cases in a similar iterative process, all of which is internal to the floating point unit. Moreover, in addition to following the 32/64-bit IEEE 754 standard for floating point arithmetic, some implementations (e.g. Pentium, Opteron) support x87 legacy internal 80-bit representations of floating point numbers, which can really help when accumulating long sums and computing square roots, etc. Prof. Kahane has numerous arguments in favor of this internal 80-bit representation... Sincerely, Josip From hvidal at tesseract-tech.com Thu Dec 16 22:06:50 2004 From: hvidal at tesseract-tech.com (H.Vidal, Jr.) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] benchmaster In-Reply-To: References: Message-ID: <41C2777A.9060002@tesseract-tech.com> Many, many thanks for this little suite of software stuff..... hv Robert G. Brown wrote: > > > Main page: > > http://www.phy.duke.edu/~rgb/General/benchmaster.php > > Sample/example page: > > http://www.phy.duke.edu/~rgb/General/benchmaster/lucifer.html > > I've worked on this enough that I'm HOPING that it is largely decrufted > and reasonably bug free and even tolerably documented. > > But no guarantees. It IS free software, after all...;-) > > rgb > From joachim at ccrl-nece.de Thu Dec 16 07:32:23 2004 From: joachim at ccrl-nece.de (Joachim Worringen) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <41C1A6DD.2020208@ahpcrc.org> References: <20041216022956.GC16587@cse.ucdavis.edu> <20041216102142.GA1649@greglaptop.t-mobile.de> <41C1A6DD.2020208@ahpcrc.org> Message-ID: <41C1AA87.3000103@ccrl-nece.de> Richard Walsh wrote: > X1cc -c -h inline3,scalar3,vector3 -h stream0 > Function Rate (MB/s) RMS time Min time Max time > Copy: 7600.2280 0.0022 0.0021 0.0022 > Scale: 7600.5529 0.0024 0.0021 0.0030 > Add: 9259.1164 0.0026 0.0026 0.0027 > Triad: 9360.5935 0.0026 0.0026 0.0026 I think you should check the X1 numbers - it should have a much higher bandwidth (about 20GB/s/CPU, IIRC). Or am I missing something with this SSP issue? Joachim -- Joachim Worringen - NEC C&C research lab St.Augustin fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de From lindahl at pathscale.com Fri Dec 17 21:58:00 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> References: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> Message-ID: <20041218055800.GA2181@greglaptop.attbi.com> On Thu, Dec 16, 2004 at 01:27:33PM -0000, Kozin, I (Igor) wrote: > quad 2.2 Opteron, 9 GB, SLES 9, 2.4.21 > it seems my memory is a bit slower than yours. Perhaps node interleave is set "on" in your BIOS? It should be off. Then again, with 9 GB of mem on a quad, you may not have a symmetrical memory configuration, unless it's 2.25 GB/cpu... -- greg From lindahl at pathscale.com Fri Dec 17 22:00:30 2004 From: lindahl at pathscale.com (Greg Lindahl) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] MPI Implementations for SMP use In-Reply-To: <1103144165.16504@van.jrtad.com> References: <1103144165.16504@van.jrtad.com> Message-ID: <20041218060030.GB2181@greglaptop.attbi.com> On Wed, Dec 15, 2004 at 08:56:05PM +0000, Eric Thibodeau wrote: > Probably opening up another can of worms but, you might want to > seriously consider a hybrid (MPI + OpenMP or Pthreads) approach in > the case of SMP machines. 
Local exeution of OpenMP generated code is > considerably faster than the locally executed MPI > equivalent. You would think that, but the nice thing about pure MPI is that locality is perfect. So in most cases, a pure MPI code beats a hybrid code. That's good, because hybrid programming is more complicated than straight MPI or straight OpenMP. Most of the folks interested in hybrid models a few years ago have now given it up. -- greg From maurice at harddata.com Fri Dec 17 21:20:59 2004 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] Re: looking for a switch to purchase (Michael R. Hines) In-Reply-To: <200412180457.iBI4vdZY009491@bluewest.scyld.com> References: <200412180457.iBI4vdZY009491@bluewest.scyld.com> Message-ID: <41C3BE3B.3010801@harddata.com> The DLink DGS-1224T is a pretty good bang for the buck. 24 ports of GbE, 2 uplinks, and right on your budget point of $500 >Hi everyone, > > I'm a graduate student at Florida State University and I'm looking >to buy a basic, level 2 gigabit switch that supports jumbo frames. I only >need a few ports on this switch: no more than 16, really. > > From the searching on the internet I've been doing, I'm having trouble >understanding exactly how to eliminate the switches from different >companies. There are tons of them, and many of them have support for other >levels, ( 3 through 7), which I don't really need. > >My price range is a maximum of $500. > >Anyone have any suggestions? Your help is much appreciated. > >/* ----------------------------------------------- */ >Michael R. Hines With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 This email, message, and content, should be considered confidential, and is the copyrighted property of Hard Data Ltd., unless stated otherwise. From landman at scalableinformatics.com Sat Dec 18 07:24:37 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] MPI Implementations for SMP use In-Reply-To: <41BF72F1.2090707@pobox.com> References: <41BF72F1.2090707@pobox.com> Message-ID: <41C44BB5.2050404@scalableinformatics.com> Hi Andy: Short version: Use shared memory based devices (--comm=shared, or use ch_shmem for MPI that will not go outside of this box) for MPI on a single SMP node. OpenMP is (IMO, and no, this is not intended to be troll bait) very easy to use/program with. MPI is (also IMO, and also this is not intended to be troll bait) a bit harder to program with, but, and this is quite important, it forces you to think in distributed mode, which means that you typically are more focused upon the job of spatial localization of data, and data layout in general. That stricter discipline is very helpful. There will be people who take issue with both of these statements, and it is worth while to have a healthy discussion of the relative merits of each. I should be covering a little of this in a class I am going to teach next semester (see http://www.cs.wayne.edu/winter2005courses.htm# ) . Joe Andrew D. Fant wrote: > This may be opening up a can of worms, but does anyone have any > information about the relative merits of the various open > implementations of MPI in SMP systems? I'm setting up a Linux server > with multiple CPUs and I wanted to know if one implementation is > significantly faster than others under these conditions. 
> > Thanks, > Andy > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 612 4615 From james.p.lux at jpl.nasa.gov Sat Dec 18 07:30:37 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster References: Message-ID: <001401c4e516$866a9670$32a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Douglas Eadline, Cluster World Magazine" To: "Jim Lux" Cc: "Beowulf Mailing List" Sent: Friday, December 17, 2004 5:57 PM Subject: Re: [Beowulf] The Value Cluster > On Fri, 17 Dec 2004, Jim Lux wrote: > > > A year or so back, there was a discussion on the list about building a > > cluster entirely of stuff bought from WalMart (based on the $199 computers > > Walmart was selling back then). > > > The Walmart systems are what started this idea. I had purchased a $199 box > with a VIA C3, then some Duron boxes. The problem with the > "Walmart/Microtel" approach was that the consistency of the boxes was not > very good. Indeed, several times I got something better than advertised! > Motherboards would change, larger hard-drives, slightly faster processor, > etc. So it could be hard building a reproducible cluster. Plus the cases > they were using were vented on sides -- not good for close stacking. So we > decided it would be better to use parts that were available from several > sources so our efforts could be easily duplicated. My Microtel is somewhat noisy, as well. In your $2500 cluster challenge you laid out some requirements, but didn't have something like "minimum number of processors"... Is a cluster of two legal? I know you're using 8, which seems a reasonable number to demonstrate lots of things (and you certainly run into scaling problems by that time). Do they have to be in cases? Think of the several clusters made of bare mobos stacked on a shelf, or threaded rod, or,.... Save yourself $50/case, and on 8 cases, you might buy another processor? Since the fundamental purpose of a $2500 cluster is pedagogy, it's instructive to go through this tradeoff for all scales, from 4 processors to 1024. One suggestion I have is to assign some value or limit to "fooling around time", just to keep the concept of the dirt-cheap cluster sound. Jim Lux From jimlux at earthlink.net Sat Dec 18 07:52:41 2004 From: jimlux at earthlink.net (Jim Lux) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] $2500 cluster. What it's good for? Message-ID: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> I think it would be interesting to contemplate potential uses of a $2500 cluster. Once you've had the thrill of putting it together and rendering something with POVray, what next? You want to avoid the "gosh, I can run 8 times as many Seti@Home units as I could before" or "Look, I can calculate Pi" kind of not-particularly-value-laden-to-the-casual-observer tasks. Sure, there's some value in learning how to build and manage a cluster, but I think the real value is in doing something useful with that $2500. So, what sort of "useful" could one do? Say you were to negotiate with your spouse to get $2500 to play with (or you were able to get a "mini-grant" at a high school). 
Is there something that is useful to the "general consumer public" that could be done better with a cluster than with a $2500 desktop machine? One computationally intensive task that might be applicable is making panoramas from multiple digital photos. It's incredibly tedious and time consuming to stitch together 30 or 40 digital photos into one seamless panorama (google for PanoTools and PTGui for ideas). What about kids in school? Is there some simulation that, if clusterized, would be more interactive and useful? What about interactive rendering from one of NASA's world view databases: layering the terrain models and imagery to do "fly bys"? Are there consumer type iterative optimization problems that could profit from a cluster? In my own fooling around, I do lots of antenna simulations, which are essentially embarassingly parallel. The ham radio community likes "scrounged and homebuilt" solutions to problems, so the $2500 cluster is a potential winner there. What about outreach to poverty stricken branches of academe who don't use computers much? literary analysis searching texts for common phrases? figuring out how to fit potsherds together? Jim Lux From hahn at physics.mcmaster.ca Sat Dec 18 09:45:51 2004 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] HPC and SAN In-Reply-To: <003f01c4e3f4$47695fb0$0f120897@PMORND> Message-ID: > Is there any thing like Beowulf cluster and SAN. sure, but why? SAN is just short for "breathtakingly expensive fibrechannel storage nonesense". but if you've got the money, there's no reason you couldn't do it. put a FC HBA on each node and plug them all into some godawful FC switch that your storage targets are also plugged into. there's the rub: even a small beowulf cluster these days is, say, 64 nodes, and to be at all interesting, bandwidth-wise, you'll need approximately 64 storage targets. oops! what's the hang-up on SAN? just that you've bought the marketing crap about how SAN managability is the only way to go? I find that the managability/virtualization jabber comes from "enterprise" folk, who really have no clue about HPC. for instance, I basically never want to partition anything - as big storage chunks as possible means better sharing of resources. and I don't change the chunks either, I add more bigger/faster chunks. (at least in the funding environment here, where money comes in large chunks at multi-year intervals.) > I would like to have all the data in the Beowulf cluster to be in SAN > also. Pls excuse if in case you find my question silly. it's like asking whether you can do webserving from beowulf. sure you can, and it might even make sense in some niche. but beowulf is mostly about message-passing HPC. as such, it often has serious IO issues, but SAN solves a different problem (how to take a slice of a FC volume from enginering because the accounting DB needs more space.) that said, the current HPC trend of using fast cluster interconnects along with filesystems like lustre/pvfs could be considered a SAN approach. technically, I'd say it's between SAN and NAS, since the protocol is some block-like (SAN) properties, and some file-level (NAS) ones... regards, mark hahn. 
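To put rough numbers on the bandwidth argument above, a back-of-the-envelope sketch in C follows. The 64 nodes are Mark's example figure and the ~50 MB/s per local SATA spindle is the figure used elsewhere in this thread; the ~200 MB/s for a 2Gb fibre channel link is only an approximation:

#include <stdio.h>

int main(void)
{
    /* Illustrative figures only -- see the discussion above. */
    const int    nodes        = 64;
    const double local_mb_s   = 50.0;   /* sustained, per-node local disk  */
    const double fc_link_mb_s = 200.0;  /* roughly what a 2Gb FC link moves */
    const double target_mb_s  = 50.0;   /* what one storage target's disks
                                           might actually sustain           */
    double aggregate = nodes * local_mb_s;

    printf("aggregate local-disk bandwidth: %6.0f MB/s\n", aggregate);
    printf("2Gb FC links needed to match  : %6.0f\n", aggregate / fc_link_mb_s);
    printf("disk-limited storage targets  : %6.0f\n", aggregate / target_mb_s);
    return 0;
}

Even with the links driven flat out you need on the order of 16 of them, and if each target's disks only sustain ~50 MB/s you are back to roughly one storage target per compute node, which is Mark's point.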
From mwill at penguincomputing.com Sat Dec 18 10:33:53 2004 From: mwill at penguincomputing.com (Michael Will) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] HPC and SAN In-Reply-To: <003f01c4e3f4$47695fb0$0f120897@PMORND> References: <003f01c4e3f4$47695fb0$0f120897@PMORND> Message-ID: <200412181033.54167.mwill@penguincomputing.com> A lot of times clusters need fast access to terabytes of storage. The cheap solution is to have DAS (directly attached storage) added to a separate fileserver or even just the headnode, serving it via NFS through gigabit ethernet. Upside: no special software needed. Downside: bottleneck and single point of failure in the fileserver. The more extreme counter-example is to have a fibre channel card in each compute node and hook that up to a fibre switch which has fibre-attached storage attached as well. Unless you want every compute node have their own separate volume, you then still need a cluster filesystem that allows read/write access to the same volume from more than one node - the linux filesystems as they come out of the box are not capable of doing so. Think about the cached data invisible to the other nodes etc. Veritas has something called VxFS that could be used for that, and there also special cluster-filesystems like gfs and lustre that are supposed to solve that problem. In that case, you can also have just some compute nodes act as storage nodes, and so you don't need fibre channel cards in all of them. The storage nodes then act similar to redundant nfs servers. Another interesting case is PVFS (and hopefully soon PVFS2) that accumulates local storage of the nodes into a parallel virtual filesystem allowing distributed storage and access. In case of PVFS the data is not distributed redundandly, which means that one node going down means part of your filesystem data disappears - so unless you have rock solid nodes connected to a UPS, this might be good only for a large fast /tmp. Michael Will On Thursday 16 December 2004 08:52 pm, Rajiv wrote: > Dear All, > Is there any thing like Beowulf cluster and SAN. I would like to have all the data in the Beowulf cluster to be in SAN also. Pls excuse if in case you find my question silly. > > Regards, > Rajiv -- Michael Will, Linux Sales Engineer NEWS: We have moved to a larger iceberg :-) NEWS: 300 California St., San Francisco, CA. Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899 www.penguincomputing.com From deadline at linux-mag.com Sat Dec 18 12:03:52 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] The Value Cluster In-Reply-To: <001401c4e516$866a9670$32a8a8c0@LAPTOP152422> Message-ID: On Sat, 18 Dec 2004, Jim Lux wrote: > > ----- Original Message ----- > From: "Douglas Eadline, Cluster World Magazine" > To: "Jim Lux" > Cc: "Beowulf Mailing List" > Sent: Friday, December 17, 2004 5:57 PM > Subject: Re: [Beowulf] The Value Cluster > > --snip-- > > My Microtel is somewhat noisy, as well. > > In your $2500 cluster challenge you laid out some requirements, but didn't > have something like "minimum number of processors"... Is a cluster of two > legal? I know you're using 8, which seems a reasonable number to > demonstrate lots of things (and you certainly run into scaling problems by > that time). Well in the magazine we mention that you can certainly use less (or more) nodes. And, a two node/X-over cable cluster should not break the smallest budget. 
We chose the 8/$2500 because it was a good balance between nodes/cost (and as you know most small switches have 5 or 8 ports). we also want to have a nice "pile of CPUs" to play with later. Of course there some other issue such as using standard household/office electrical service, heat, and space. BTW, we will be introducing some scripts to do "poweroff"/"Wake on LAN" kinds of things so that if you are paying the electric bill and the cluster is idle, you can cut down on the power usage. This is also why homogeneous hardware is important -- "If you use what we use, then we can be sure it will work for you kind of thing." > > Do they have to be in cases? Think of the several clusters made of bare > mobos stacked on a shelf, or threaded rod, or,.... Save yourself $50/case, > and on 8 cases, you might buy another processor? > Since the fundamental purpose of a $2500 cluster is pedagogy, it's > instructive to go through this tradeoff for all scales, from 4 processors to > 1024. Well, this could get interesting. My view is that there should be no custom work/parts involved and there should be enough "best practice" information to support problems. Let me explain. You could save $40 for the cases, but then that requires some other kind of custom work which means you are out on a limb of sorts. For instance, you could be more space efficient, but the hassle of mounting motherboards, power supplies in some custom enclosure would soon out weigh the an easier commodity approach. Plus, if you run into problems, there are plenty of books and web sites that explain how to "build a basic PC" with commodity parts. I am often tempted to say, "Gee these Micro ATX boards are so small, I could just ...", but then it becomes a custom project and harder to reproduce. One of the goals is to provide a low cost and reproducible cluster on which people can play. The way I figure it, the more "cluster play" the better. > > One suggestion I have is to assign some value or limit to "fooling around > time", just to keep the concept of the dirt-cheap cluster sound. > The idea for the magazine series is to minimize this time. We are providing an 8 node recipe and if you follow it your chances of success are pretty good. You can also stray a bit from the path and invest some "fooling around time" and build a variation of the "recipe". The article also explains much of the rational we used in selecting components. I believe one of the biggest time and cost savings is that we identified low cost parts that work well. Of course, they are not "server quality", but at least we know they work as described (which can not be guaranteed for all low coat "value hardware"). For instance, I believe one of the triumphs was finding an inexpensive case that was well constructed and did not slice up you fingers. Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com From alex at DSRLab.com Sat Dec 18 09:04:26 2004 From: alex at DSRLab.com (Alex Vrenios) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] FW: $2500 cluster. What it's good for? Message-ID: <200412181700.iBIH0G4v030532@bluewest.scyld.com> Hi, The SETI@home group could use some extra cycles, and I understand that the AIDS research people have a similar approach. The question now becomes: "How do I run a Win screen saver on my cluster? Speaking of "value clusters" my 8-node Compaq DeskPro 386 cluster cost me just under $2000 back in 1995. 
It runs Linux Red Hat 4.2 because nothing more recent would fit on a 30 MB hard drive! I am looking into a project for it. Running MOSIX might be fun and I would love to parallelize the SETI code - any thoughts? The only math reference I could find is Papagiannis, "The Search for Extraterrestrial Life: Recent Developments," Int'l Astro Union, Symp #112. Dr. Alex in Phoenix > -----Original Message----- > From: beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jim Lux > Sent: Saturday, December 18, 2004 8:53 AM > To: beowulf@beowulf.org > Subject: [SPAM] [Beowulf] $2500 cluster. What it's good for? > > I think it would be interesting to contemplate potential uses > of a $2500 cluster. Once you've had the thrill of putting it > together and rendering something with POVray, what next? > > You want to avoid the "gosh, I can run 8 times as many > Seti@Home units as I could before" or "Look, I can calculate > Pi" kind of not-particularly-value-laden-to-the-casual-observer tasks. > > Sure, there's some value in learning how to build and manage > a cluster, but I think the real value is in doing something > useful with that $2500. So, what sort of "useful" could one > do? Say you were to negotiate with your spouse to get $2500 > to play with (or you were able to get a "mini-grant" at a > high school). Is there something that is useful to the > "general consumer public" that could be done better with a > cluster than with a $2500 desktop machine? > > One computationally intensive task that might be applicable > is making panoramas from multiple digital photos. It's > incredibly tedious and time consuming to stitch together 30 > or 40 digital photos into one seamless panorama (google for > PanoTools and PTGui for ideas). > > What about kids in school? Is there some simulation that, if > clusterized, would be more interactive and useful? > > What about interactive rendering from one of NASA's world > view databases: > layering the terrain models and imagery to do "fly bys"? > > Are there consumer type iterative optimization problems that > could profit from a cluster? In my own fooling around, I do > lots of antenna simulations, which are essentially > embarassingly parallel. The ham radio community likes > "scrounged and homebuilt" solutions to problems, so the $2500 > cluster is a potential winner there. > > What about outreach to poverty stricken branches of academe > who don't use computers much? literary analysis searching > texts for common phrases? > figuring out how to fit potsherds together? > > Jim Lux > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org To change your > subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From ed at eh3.com Sat Dec 18 07:40:56 2004 From: ed at eh3.com (Ed Hill) Date: Wed Nov 25 01:03:38 2009 Subject: [Beowulf] MPI Implementations for SMP use In-Reply-To: <20041218060030.GB2181@greglaptop.attbi.com> References: <1103144165.16504@van.jrtad.com> <20041218060030.GB2181@greglaptop.attbi.com> Message-ID: <1103384456.5441.92.camel@localhost.localdomain> On Fri, 2004-12-17 at 22:00 -0800, Greg Lindahl wrote: > On Wed, Dec 15, 2004 at 08:56:05PM +0000, Eric Thibodeau wrote: > > > Probably opening up another can of worms but, you might want to > > seriously consider a hybrid (MPI + OpenMP or Pthreads) approach in > > the case of SMP machines. 
Local exeution of OpenMP generated code is > > considerably faster than the locally executed MPI > > equivalent. > > You would think that, but the nice thing about pure MPI is that > locality is perfect. So in most cases, a pure MPI code beats a hybrid > code. That's good, because hybrid programming is more complicated than > straight MPI or straight OpenMP. > > Most of the folks interested in hybrid models a few years ago have now > given it up. Hi Greg, Do you have any references concerning hybrid pthreads+MPI vs. MPI-only on clusters of SMP/NUMA systems? I'm not at all trying to dispute your claim! I'd just like to learn more about the details. I'm interested because our code (MITgcm.org) has the ability (at least, theoretically) to do ptheads, MPI, or hybrids though people rarely use pthreads and almost never try hybrid arrangements. It seems that there might be some very real benefits from the hybrid approach. But thats just my intuition speaking! Perhaps there are MPI implementations that are competitive with threads since they can very efficiently handle this "two-level" nature of communication: keeping it local for processes on the same node while still *simultaneously* taking care of the over-a-network bits for the rest. Ed -- Edward H. Hill III, PhD office: MIT Dept. of EAPS; Rm 54-1424; 77 Massachusetts Ave. Cambridge, MA 02139-4307 emails: eh3@mit.edu ed@eh3.com URLs: http://web.mit.edu/eh3/ http://eh3.com/ phone: 617-253-0098 fax: 617-253-4464 From landman at scalableinformatics.com Sat Dec 18 17:39:46 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] HPC and SAN In-Reply-To: <200412181033.54167.mwill@penguincomputing.com> References: <003f01c4e3f4$47695fb0$0f120897@PMORND> <200412181033.54167.mwill@penguincomputing.com> Message-ID: <41C4DBE2.40204@scalableinformatics.com> Michael Will wrote: >Veritas has something called VxFS that could be used for that, and there also special cluster-filesystems > > Hmmm... Last I heard VxFS was limited to 4 or 8 hosts. Not very HPC like... >like gfs and lustre that are supposed to solve that problem. In that case, you can also have just some >compute nodes act as storage nodes, and so you don't need fibre channel cards in all of them. The >storage nodes then act similar to redundant nfs servers. > > I remain skeptical on the value proposition for a SAN in a cluster. In short, you need to avoid single points of information flow within clusters. The absolute best aggregate bandwidth you are going to get will be local storage. At 50+ MB/s, a SATA drive in a compute node multipled by N compute nodes rapidly outdistances all (save one) hardware storage design that I am aware of. And it does it at a tiny fraction of the cost. Unfortunately you have N namespaces for your files (think of the file URI as file://node/path/to/filename.ext, and the value of "node" varies). Most code designs assume a single shared storage, or common namespace for the files. This is where the file systems folks earn their money (well one does anyway IMO). >Another interesting case is PVFS (and hopefully soon PVFS2) that accumulates local storage of the >nodes into a parallel virtual filesystem allowing distributed storage and access. 
In case of PVFS > > Having used PVFS (or at least tried to use PVFS) for a project, I discovered rather quickly some of the missing functionality (soft links, etc), resulted in large chunks of wrapper code not working (and no, it made no sense to change the wrapper code to suit this file system), and at least 2 MPI codes that I played with did not like it. I don't want to knock all the hard work that went into it, but I am not sure I would try PVFS2 without a very convincing argument that it implements full unix file system (POSIX) interfaces, and things work transparently. >the data is not distributed redundandly, which means that one node going down means part of >your filesystem data disappears - so unless you have rock solid nodes connected to a UPS, this >might be good only for a large fast /tmp. > > There are alternatives to this that work today. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 612 4615 From rgb at phy.duke.edu Sun Dec 19 07:26:06 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> References: <77673C9ECE12AB4791B5AC0A7BF40C8F15430B@exchange02.fed.cclrc.ac.uk> Message-ID: > > Not sure why the timer is so lousy, > > I had to make the array large to get a reasonably accurate time: > > This is indeed another interesting point. I'd really like to understand it. > In addition when I re-run stream the rates vary quite a bit despite > the high loop count (50) and very small std dev (min & max are pretty close). One useful question is what kernel are you running? Running (e.g.) FC 1 and the 2.4 kernel one would expect to get a wide range of times because that kernel was more or less broken on X86_64 duals. I would see vastly different times for the same job run in multiple instances on the system, which would certainly increase the spread. I haven't observed this at all on FC2 (2.6 kernels) and so far FC3 is behaving itself on the AMD64. I've got a nasty cold and will probably stay tucked up with my laptop at least part of the day (it generates enough heat to function as a sort of hot-water bottle:-). I may try to run a couple of application-level benchmarks on metatron to get a rough idea of timings in mixed code. My microbenchmark results suggest that it will do pretty well in my Monte Carlo program because it involves sqrt, ln, exp and some trig per site in a lattice problem and is hence heavily CPU bound (as opposed to memory). If the CPU really does these things 2x-3x faster than a P4 AND has a higher clock than most of what I have available at home or at work, I will a happy camper be. BTW, anyone who tried to visit the benchmaster site I posted earlier and found the download link broken -- this was pointed out offline and it is all fixed now. I'm working on making all the stuff I build and maintain yum-downloadable, and my onsite php functions had to be hacked to support the requisite repository structure. There are binary rpm's for centos 3.3 (i386 only, which will probably work for RHEL even without a rebuild), FC2, FC3, and RH 9 (i386 only), as well as source RPMs and tarballs to match. I'd be very interested in feedback on how these work as straight RPMs. I've tested a download/install on a few of my own systems and they seem to work for me. 
Of course, the real fun in having a benchmarking program is being able to recompile it and play with things, but there is also at least some virtue in running the exact same code on multiple platforms. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Sun Dec 19 07:43:18 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] AMD64 results... In-Reply-To: <41C1B502.2080900@lanl.gov> References: <41C1B502.2080900@lanl.gov> Message-ID: On Thu, 16 Dec 2004, Josip Loncaric wrote: > Robert G. Brown wrote: > > [...] One can see how having 64 bits would really > > speed up 64 bit division compared to doing it in software across > > multiple 32 bit registers... > > Correct me if I'm wrong, but doesn't the floating point unit normally > use an internal iterative process to perform the division? This would > not involve 32-bit registers... > > I'm not so sure about *integer* 64-bit division. Integer division may > involve multiple 32-bit integer registers. > > Good ole' Cray-1 used an iterative process for floating point division > which worked like this: given a floating point number x, use the first 8 > bits of the mantissa to index into a lookup table containing initial > guesses, then do a few steps of Newton-Raphson iteration involving only > multiply-add operations to get the fully converged reciprocal mantissa, > fix the exponent, thus obtaining 1/x, then multiply y*(1/x) to get y/x. > > As I recall, the famous Pentium FDIV bug involved some corner cases in a > similar iterative process, all of which is internal to the floating > point unit. Moreover, in addition to following the 32/64-bit IEEE 754 > standard for floating point arithmetic, some implementations (e.g. > Pentium, Opteron) support x87 legacy internal 80-bit representations of > floating point numbers, which can really help when accumulating long > sums and computing square roots, etc. Prof. Kahane has numerous > arguments in favor of this internal 80-bit representation... This may well be -- I used to hand code the 8087 back on the IBM PC and thought that the 80 bit internal representation was peachy keen at the time. I haven't tracked precisely how the x87 coprocessor model has evolved (legacy or not) into P6-class processors, though -- the mixing of RISC, CISC, CISC-interpreted-to-RISC-onchip left me confused years ago. I was really just making an empirical observation, and struggling to understand it. As I pointed out yesterday, trancendental evals seem to be much faster as well, which would certainly be consistent with a resurrection of an efficient internal x87 architecture. If so, I'm all for it -- HPC code (at least MY HPC code:-) tends to have more than just triad-like operations on vectors -- things like the trig functions, exponentials and logs, floating point division. I remember when my Sun 386i could turn in a savage that compared pretty well with the otherwise much faster Sun 110 and Sparc 1 because it had a real CISC 80387 and Sun was doing all of its trancendental calls in (RISC) software. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From deadline at linux-mag.com Sun Dec 19 10:58:28 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> Message-ID: On Sat, 18 Dec 2004, Jim Lux wrote: > I think it would be interesting to contemplate potential uses of a $2500 > cluster. Once you've had the thrill of putting it together and rendering > something with POVray, what next? That is the $64,000 dollar question. Here is my 2 cent answer. BTW, your ideas are great. I would love to see a discussion like this continue because we all know the hardware is easy part! There is part of this project which has a "build it and they will come (and write software)" dream. Not being that naive, I believe there are some uses for systems like this. The indented audience are not the uber-cluster-geeks on this list, but rather the education, home, hacker, crowd. In regards to education, I think if cluster technology is readily available, then perhaps students will look to these technologies to solve problems. And who knows maybe the "Lotus 123 of the cluster" will be built by some person or persons with some low cost hardware and an idea everyone said would not work. If you have followed the magazine, you will see that we highlighted many open projects that are useful today. From an educational standpoint, a small chemistry/biology department that can do quantum chemistry, protein folding, or sequence analysis is pretty interesting to me. There are others ares as well. There are also some other immediate things like running Mosix or Condor on the cluster. A small group that has a need for a computation server could find this useful for single process computational jobs. I also have an interest in seeing a cluster version of Octave or SciLab set to work like a server. (as I recall rgb had some reasons not to use these high level tools, but we can save this discussion for later) What I can say as part of the project, we will be collecting a software list of applications and projects. Finally, once we all have our local clusters and software running to our hearts content, maybe we can think about a grid to provide spare compute cycles to educational and public projects around the world. Oh well, enough Sunday afternoon philosophizing. Doug > > You want to avoid the "gosh, I can run 8 times as many Seti@Home units as I > could before" or "Look, I can calculate Pi" kind of > not-particularly-value-laden-to-the-casual-observer tasks. > > Sure, there's some value in learning how to build and manage a cluster, but > I think the real value is in doing something useful with that $2500. So, > what sort of "useful" could one do? Say you were to negotiate with your > spouse to get $2500 to play with (or you were able to get a "mini-grant" at > a high school). Is there something that is useful to the "general consumer > public" that could be done better with a cluster than with a $2500 desktop > machine? > > One computationally intensive task that might be applicable is making > panoramas from multiple digital photos. It's incredibly tedious and time > consuming to stitch together 30 or 40 digital photos into one seamless > panorama (google for PanoTools and PTGui for ideas). > > What about kids in school? Is there some simulation that, if clusterized, > would be more interactive and useful? 
> > What about interactive rendering from one of NASA's world view databases: > layering the terrain models and imagery to do "fly bys"? > > Are there consumer type iterative optimization problems that could profit > from a cluster? In my own fooling around, I do lots of antenna simulations, > which are essentially embarassingly parallel. The ham radio community likes > "scrounged and homebuilt" solutions to problems, so the $2500 cluster is a > potential winner there. > > What about outreach to poverty stricken branches of academe who don't use > computers much? literary analysis searching texts for common phrases? > figuring out how to fit potsherds together? > > Jim Lux > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Cell: 610.390.7765 Redefining High Performance Computing Fax: 610.865.6618 www.clusterworld.com From james.p.lux at jpl.nasa.gov Sun Dec 19 12:02:32 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? References: Message-ID: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Douglas Eadline, Cluster World Magazine" To: "Jim Lux" Cc: Sent: Sunday, December 19, 2004 10:58 AM Subject: Re: [Beowulf] $2500 cluster. What it's good for? > On Sat, 18 Dec 2004, Jim Lux wrote: > > > I think it would be interesting to contemplate potential uses of a $2500 > > cluster. Once you've had the thrill of putting it together and rendering > > something with POVray, what next? > > That is the $64,000 dollar question. Here is my 2 cent answer. > BTW, your ideas are great. I would love to see a discussion like this > continue because we all know the hardware is easy part! > > There is part of this project which has a "build it and they will come > (and write software)" dream. Not being that naive, I believe there are > some uses for systems like this. The indented audience are not the > uber-cluster-geeks on this list, but rather the education, home, hacker, > crowd. In regards to education, I think if cluster technology is readily > available, then perhaps students will look to these technologies to solve > problems. And who knows maybe the "Lotus 123 of the cluster" will be built > by some person or persons with some low cost hardware and an idea everyone > said would not work. > > If you have followed the magazine, you will see that we highlighted > many open projects that are useful today. From an educational standpoint, > a small chemistry/biology department that can do quantum chemistry, > protein folding, or sequence analysis is pretty interesting to me. > There are others ares as well. I was thinking of the cluster video wall idea, however the video hardware would be kind of pricey (more than the cluster!). Something like using the cluster to provide the crunch to provide an immersive environment might be interesting. > > There are also some other immediate things like running Mosix or Condor > on the cluster. A small group that has a need for a computation server > could find this useful for single process computational jobs. This brings up an interesting optimization question. 
Just like in many things (I'm thinking RF amplifiers in specific) it's generally cheaper/more cost effective to buy one big thing IF it's fast enough to meet the requirements. Once you get past what ONE widget can do, then, you're forced to some form of parallelism or combining smaller widgets, and to a certain extent it matters not how many you need to combine (to an order of magnitude). The trade comes from the inevitable increase in system management/support/infrastructure to support N things compared to supporting just one. (This leaves aside high availability/high reliability kinds of things). So, for clusters, where's the breakpoint? Is it at whatever the fastest currently available processor is? This is kind of the question that's been raised before.. Do I buy N processors now with my grant money, or do I wait a year and buy N processors that are 2x as fast and do all the computation in the second of two years? If one can predict the speed of future processors, this might guide you whether you should wait for that single faster processor, or decide that no matter if you wait 3 years, you'll need more than the crunch of a single processor to solve your problem, so you might as well get cracking on the cluster. Several times, I've contemplated a cluster to solve some problem, and then, by the time I had it all spec'd out and figured out and costed, it turned out that I'd been passed by AMD/Intel, and it was better just to go buy a (single) faster processor. There are some interesting power/MIPS trades that are non-obvious in this regime, as well as anomalous application environments where the development cycle is much slower (not too many "Rad Hard" Xeons out there). There are also inherently parallel kinds of tasks where you want to use commodity hardware to get multiples of some resource, rather than some special purpose thing (say, recording multi-track audio or the aforementioned video wall). Another thing is some sort of single input stream, multiple parallel processes for multiple outputs. High performance speech recognition might be an example. What about some sort of search process with applicability to casual users (route finding for robotics or such...) > > I also have an interest in seeing a cluster version of Octave or SciLab > set to work like a server. (as I recall rgb had some reasons not to use > these high level tools, but we can save this discussion for later) I'd be real interested in this... Mathworks hasn't shown much interest in accomodating clusters in the Matlab model, and I spend a fair amount of time running Matlab code. > > What I can say as part of the project, we will be collecting a software > list of applications and projects. > > Finally, once we all have our local clusters and software running to our > hearts content, maybe we can think about a grid to provide spare compute > cycles to educational and public projects around the world. > From jimlux at earthlink.net Sun Dec 19 12:07:52 2004 From: jimlux at earthlink.net (Jim Lux) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? References: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> Message-ID: <002701c4e606$6c7cd890$32a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Dean Johnson" To: "Jim Lux" Cc: Sent: Sunday, December 19, 2004 11:49 AM Subject: Re: [Beowulf] $2500 cluster. What it's good for? 
> > On Dec 18, 2004, at 9:52 AM, Jim Lux wrote: > > > I think it would be interesting to contemplate potential uses of a > > $2500 > > cluster. Once you've had the thrill of putting it together and > > rendering > > something with POVray, what next? > > While not terribly elegant, scientifically interesting, or > cluster-sexy, you could do distcc. "Look ma, I'm compiling over my > cluster". High school kids might like to do cluster BLAST type stuff. > There is certain value in getting the dog-looking-at-the-tv look from > their parents when they explain their activities. "Well Son, I got no > damn idea what you are talking about, but your Mom and I are very > proud". Having spent the morning watching Jimmy Neutron cartoons with the kids... Why not do Particle in Cell simulations for nuclear fusion? You could optimize the design of the grids in your very own inertial electrostatic confinement fusion apparatus (you, too, can do table top fusion for <$1000... google for "fusor" and "farnsworth").
From dtj at uberh4x0r.org Sun Dec 19 11:49:28 2004 From: dtj at uberh4x0r.org (Dean Johnson) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> References: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> Message-ID: <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> On Dec 18, 2004, at 9:52 AM, Jim Lux wrote: > I think it would be interesting to contemplate potential uses of a > $2500 > cluster. Once you've had the thrill of putting it together and > rendering > something with POVray, what next? While not terribly elegant, scientifically interesting, or cluster-sexy, you could do distcc. "Look ma, I'm compiling over my cluster". High school kids might like to do cluster BLAST type stuff. There is certain value in getting the dog-looking-at-the-tv look from their parents when they explain their activities. "Well Son, I got no damn idea what you are talking about, but your Mom and I are very proud". Maybe someone should put together something akin to the OpenCD for cluster usage by novices. Put together a bunch of the standard cluster-aware apps in nice buildable or RPM type packages. --Dean
From akhtar_samo at yahoo.com Sun Dec 19 13:34:49 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Regarding Benchmarks Message-ID: <20041219213449.3693.qmail@web20023.mail.yahoo.com> Hi, Actually I want to run a benchmark on my MPICH based cluster; it's just a 4-node cluster. Kindly let me know what to do, where to get it from, and the proper installation method. Akhtar --------------------------------- Do you Yahoo!? Yahoo! Mail - You care about security. So do we. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041219/0cb1d333/attachment.html
From andrewxwang at yahoo.com.tw Sun Dec 19 15:00:08 2004 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] OpenPBS vs Condor? In-Reply-To: <20041216061130.79612.qmail@web11413.mail.yahoo.com> Message-ID: <20041219230008.58325.qmail@web18002.mail.tpe.yahoo.com> OpenPBS is dead (no new version for several years); you should use SGE or Torque. http://gridengine.sunsource.net/ http://www.supercluster.org/torque Andrew. --- Kshitij Sanghi wrote: > Hi, > > I'm new to Linux clusters. I wanted to know how does > OpenPBS compare with Condor.
We already have a small > grid running Condor but wanted some scheduling > facilities for our jobs. Is it necessary to shift to > OpenPBS to provide job scheduling or will Condor do? > The scheduler we require is nothing complex even the > most basic one would do. > > Thanks and Regards, > Kshitij > > > > ___________________________________________________________ > > Win a castle for NYE with your mates and Yahoo! > Messenger > http://uk.messenger.yahoo.com > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________________________________ Yahoo! Kimo Mail - 250MB free mailbox http://mail.yahoo.com.tw/
From idooley2 at uiuc.edu Sun Dec 19 15:27:07 2004 From: idooley2 at uiuc.edu (Isaac Dooley) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Re: MPI Implementations for SMP use In-Reply-To: <200412191956.iBJJu25v009271@bluewest.scyld.com> References: <200412191956.iBJJu25v009271@bluewest.scyld.com> Message-ID: <41C60E4B.1080304@uiuc.edu> Charm++ and AMPI (an adaptive MPI implementation) also do this (http://charm.cs.uiuc.edu). They conceptually run multiple Virtual Processors on each node, and they have both local and remote delivery options. There are some real benefits to this. On a technical level they can use our own user-level threads, which I believe are faster than pthreads. Isaac Dooley >But that's just my intuition speaking! Perhaps there are MPI >implementations that are competitive with threads since they can very >efficiently handle this "two-level" nature of communication: keeping it >local for processes on the same node while still *simultaneously* taking >care of the over-a-network bits for the rest. >Ed
From idooley2 at uiuc.edu Sun Dec 19 15:37:20 2004 From: idooley2 at uiuc.edu (Isaac Dooley) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Re: MPI Implementations for SMP use In-Reply-To: <200412191956.iBJJu25v009271@bluewest.scyld.com> References: <200412191956.iBJJu25v009271@bluewest.scyld.com> Message-ID: <41C610B0.7070603@uiuc.edu> >>You would think that, but the nice thing about pure MPI is that >>locality is perfect. So in most cases, a pure MPI code beats a hybrid >>code. That's good, because hybrid programming is more complicated than >>straight MPI or straight OpenMP. >> >> Not necessarily. Charm++ uses an abstraction that does not concern the programmer with the location/node of a given object. Hence it is not more complicated than straight MPI. Its AMPI (an adaptive MPI implementation) works the same way, and both are free (http://charm.cs.uiuc.edu). There are some papers at the site on this topic. Also, locality may be an issue for some, but as we all know, applications have different bottlenecks and issues. Locality may be useful for small programs which fit in cache. However, there are many real applications which have network bottlenecks, IO bottlenecks or other problems which make the CPU and/or memory locality not the bottleneck. Maybe I'm just misunderstanding your reference to locality though.
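A minimal sketch of the kind of two-level measurement being discussed here -- generic MPI C, not Charm++/AMPI or any particular implementation's internals -- which also happens to be about the simplest first benchmark for a small MPICH cluster: rank 0 and rank 1 ping-pong a small message, and you run it once with both ranks on one node and once with them on two different nodes to see the intra-node versus inter-node gap for yourself.

/* pingpong.c -- average small-message latency between rank 0 and rank 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i, iters = 10000;
    char buf[64] = "ping";            /* small message: latency regime */
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) printf("run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("average one-way latency: %.2f usec\n",
               (t1 - t0) / iters / 2.0 * 1e6);
    MPI_Finalize();
    return 0;
}

With MPICH it should build with something like "mpicc pingpong.c -o pingpong" and run with "mpirun -np 2 ./pingpong"; a machine file controls whether the two ranks land on the same host or on two different ones.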
cheers, Isaac From idooley2 at uiuc.edu Sun Dec 19 15:40:40 2004 From: idooley2 at uiuc.edu (Isaac Dooley) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] use for a cluster -- wall of monitors In-Reply-To: <200412191956.iBJJu25v009271@bluewest.scyld.com> References: <200412191956.iBJJu25v009271@bluewest.scyld.com> Message-ID: <41C61178.9030106@uiuc.edu> The NCSA has a project called Display Wall-in-a-Box http://www.ncsa.uiuc.edu/Projects/AllProjects/Projects82.html Their system lets you build a wall of displays... Isaac From gmpc at sanger.ac.uk Sun Dec 19 03:37:26 2004 From: gmpc at sanger.ac.uk (Guy Coates) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] HPC and SAN In-Reply-To: <41C4DBE2.40204@scalableinformatics.com> References: <003f01c4e3f4$47695fb0$0f120897@PMORND> <200412181033.54167.mwill@penguincomputing.com> <41C4DBE2.40204@scalableinformatics.com> Message-ID: > I remain skeptical on the value proposition for a SAN in a cluster. > In short, you need to avoid single points of information flow within > clusters. True, and the grown up cluster filesystems (GPFS, Lustre) allow you to avoid those. You take N storage nodes with locally attached disk (IDE, SCSI or FC) and export those to the cluster over a LAN, and glue it all together with a cluster file-system. The larger you make N, the faster your IO goes, as the file-systems automatically stripe IO across all your storage nodes. The speed of the individual disks attached to your nodes doesn't actually matter too much, so long as you have enough of them. On our clusters, we see the GPFS limiting factor for single client access is how fast they can drive their gigabit cards, and the limiting factor for multiple clients is how much non-blocking LAN bandwidth we can put between the storage nodes and clients. The only time SAN attached storage helps is in the case of storage node failures, as you have redundant paths between storage nodes and disks. (You can set up redundant IO nodes even without a SAN.) Whether this matters to you or not depends on what QoS you are trying to maintain. The other big win is that we can also achieve these IO rates under production conditions. Users can run unmodified binaries and code and get the benefit of massive IO without having to re-write apps to use specific APIs such as MPI-IO or PVFS. Cheers, Guy Coates -- Dr. Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK Tel: +44 (0)1223 834244 ex 7199 From laytonjb at charter.net Sun Dec 19 14:30:55 2004 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> Message-ID: <41C6011F.4000809@charter.net> Jim Lux wrote: >>On Sat, 18 Dec 2004, Jim Lux wrote: >> >> >> >>>I think it would be interesting to contemplate potential uses of a $2500 >>>cluster. Once you've had the thrill of putting it together and >>> >>> >rendering > > >>>something with POVray, what next? >>> >>> >>That is the $64,000 dollar question. Here is my 2 cent answer. >>BTW, your ideas are great. I would love to see a discussion like this >>continue because we all know the hardware is easy part! >> >>There is part of this project which has a "build it and they will come >>(and write software)" dream. Not being that naive, I believe there are >>some uses for systems like this. 
The indented audience are not the >>uber-cluster-geeks on this list, but rather the education, home, hacker, >>crowd. In regards to education, I think if cluster technology is readily >>available, then perhaps students will look to these technologies to solve >>problems. And who knows maybe the "Lotus 123 of the cluster" will be built >>by some person or persons with some low cost hardware and an idea everyone >>said would not work. >> >>If you have followed the magazine, you will see that we highlighted >>many open projects that are useful today. From an educational standpoint, >>a small chemistry/biology department that can do quantum chemistry, >>protein folding, or sequence analysis is pretty interesting to me. >>There are others ares as well. >> >> > > >I was thinking of the cluster video wall idea, however the video hardware >would be kind of pricey (more than the cluster!). Something like using the >cluster to provide the crunch to provide an immersive environment might be >interesting. > > I think this is coming RSN. Have you seen the prices of home projectors? They are dropping very fast. So fast that I gave up trying to track them for my own home. Of course, the projectors and the nodes are only part of the whole system. There was a very cool article in ClusterWorld from some people at NCSA that have developed a video wall in a box kind of thing. >>There are also some other immediate things like running Mosix or Condor >>on the cluster. A small group that has a need for a computation server >>could find this useful for single process computational jobs. >> >> > >This brings up an interesting optimization question. Just like in many >things (I'm thinking RF amplifiers in specific) it's generally cheaper/more >cost effective to buy one big thing IF it's fast enough to meet the >requirements. Once you get past what ONE widget can do, then, you're forced >to some form of parallelism or combining smaller widgets, and to a certain >extent it matters not how many you need to combine (to an order of >magnitude). The trade comes from the inevitable increase in system >management/support/infrastructure to support N things compared to supporting >just one. (This leaves aside high availability/high reliability kinds of >things). > >So, for clusters, where's the breakpoint? Is it at whatever the fastest >currently available processor is? This is kind of the question that's been >raised before.. Do I buy N processors now with my grant money, or do I wait >a year and buy N processors that are 2x as fast and do all the computation >in the second of two years? If one can predict the speed of future >processors, this might guide you whether you should wait for that single >faster processor, or decide that no matter if you wait 3 years, you'll need >more than the crunch of a single processor to solve your problem, so you >might as well get cracking on the cluster. > > You've hit the nail on the head! I think it's good to start thinking about parallelizing your codes or your ideas and testing them on a small but useful cluster. You can learn from them - find where the bottlenecks are, try different approaches, try different filesystems even - and then adjust your code/algorithm. You can also learn some interesting things along the way, such as being able to tune a code for specific cluster hardware (BTW - only clusters can do this). You can also learn how clusters are put together, to some degree, which you can use later when and if you want to buy a large production style cluster. 
This information will help you make reasonable trade-offs and judgments about what you want in the cluster. It will also help you keep the vendor honest :) Then once the code has been tuned and you are ready for production runs, get a bigger cluster, either by building one or by buying one from a vendor, and have at it! >Several times, I've contemplated a cluster to solve some problem, and then, >by the time I had it all spec'd out and figured out and costed, it turned >out that I'd been passed by AMD/Intel, and it was better just to go buy a >(single) faster processor. There are some interesting power/MIPS trades >that are non-obvious in this regime, as well as anomalous application >environments where the development cycle is much slower (not too many "Rad >Hard" Xeons out there). > >There are also inherently parallel kinds of tasks where you want to use >commodity hardware to get multiples of some resource, rather than some >special purpose thing (say, recording multi-track audio or the >aforementioned video wall). Another thing is some sort of single input >stream, multiple parallel processes for multiple outputs. High performance >speech recognition might be an example. > > I was working on an article about unique uses for clusters and I interviewed a guy who was using a small cluster with OpenMOSIX to rip his massive album/tape/CD collection into MP3's. He built a nice automated system in his garage with a fairly large but inexpensive storage system for his MP3's. >What about some sort of search process with applicability to casual users >(route finding for robotics or such...) > > > >>I also have an interest in seeing a cluster version of Octave or SciLab >>set to work like a server. (as I recall rgb had some reasons not to use >>these high level tools, but we can save this discussion for later) >> >> > >I'd be real interested in this... Mathworks hasn't shown much interest in >accomodating clusters in the Matlab model, and I spend a fair amount of time >running Matlab code. > > Mathworks has a new toolbox that allows you to do parallel computations. HOWEVER, the current version only allows embarrassingly parallel operations - i.e. no nodal communication. I've also followed some discussions on the Octave mailing list about incorporating MPI. I don't think they've quite made it there yet. I'd also like to see them incorporate something like PLAPACK to create a "parallel" version of the computational capabilities. If only I had some time to work on it.... :) Jeff From Bogdan.Costescu at iwr.uni-heidelberg.de Sun Dec 19 15:18:49 2004 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] MPI Implementations for SMP use In-Reply-To: <20041218060030.GB2181@greglaptop.attbi.com> Message-ID: On Fri, 17 Dec 2004, Greg Lindahl wrote: > You would think that, but the nice thing about pure MPI is that > locality is perfect. So in most cases, a pure MPI code beats a > hybrid code. Would this still apply to multi-core CPUs ? The mighty forces have all decided that multi-core is the way of the future. I'm especially concerned about resources that will be shared between the cores, like memory interface, on-chip cache, etc. MPI, even with shmem, requires at least one copy of the data from one rank to the other. Then each rank would access different memory areas, meaning that the on-chip cache data cannot be shared and the main memory interface will need to be used "more often". 
On the contrary, threads could share the same data - so there is no copy between ranks - and with the right access pattern they could even share cached data. -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From james.p.lux at jpl.nasa.gov Sun Dec 19 17:48:22 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] use for a cluster -- wall of monitors References: <200412191956.iBJJu25v009271@bluewest.scyld.com> <41C61178.9030106@uiuc.edu> Message-ID: <000401c4e635$fd8b0c60$32a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Isaac Dooley" To: Sent: Sunday, December 19, 2004 3:40 PM Subject: [Beowulf] use for a cluster -- wall of monitors > The NCSA has a project called Display Wall-in-a-Box > http://www.ncsa.uiuc.edu/Projects/AllProjects/Projects82.html > Their system lets you build a wall of displays... > > Isaac Indeed, it was the article in ClusterWorld that gave me this idea, but currently, you'd spend more on the monitors than on the cluster. Turns out (having looked into extensively) that monitors intended for use with computers are VERY different from those you watch TV on. 1) Design life is MUCH shorter on the monitors than the TVs. Typically the design is for half brightness after some 8-10k hrs of use. 2) The spot size on a TV is bigger than on a computer monitor (it makes it brighter, but has less resolution, no big deal on a TV that reproduces, at best, about 500x500 pixels). (For the technically inclined, it has to to with the accelerating voltage, the beam current, and focus electrodes in the electron gun) There's also the issue of seams. If you look for projectors (the NCSA approach), they're still quite pricey, at least for decent ones. From alvin at Mail.Linux-Consulting.com Sun Dec 19 18:36:22 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> Message-ID: On Sun, 19 Dec 2004, Dean Johnson wrote: > On Dec 18, 2004, at 9:52 AM, Jim Lux wrote: > > > I think it would be interesting to contemplate potential uses of a > > $2500 > > cluster. Once you've had the thrill of putting it together and > > rendering > > something with POVray, what next? > > While not terribly elegant, scientifically interesting, or > cluster-sexy, you could do distcc. "Look ma, I'm compiling over my > cluster". High school kids might like to do cluster BLAST type stuff. it's purrfect for training and building a "real cluster" and not screwing things up while install all the various tools and libs or that if things goes bonkers, its not any loss of real work and real productivity c ya alvin From alvin at Mail.Linux-Consulting.com Sun Dec 19 18:44:48 2004 From: alvin at Mail.Linux-Consulting.com (Alvin Oga) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] use for a cluster -- wall of monitors In-Reply-To: <41C61178.9030106@uiuc.edu> Message-ID: On Sun, 19 Dec 2004, Isaac Dooley wrote: > The NCSA has a project called Display Wall-in-a-Box > http://www.ncsa.uiuc.edu/Projects/AllProjects/Projects82.html > Their system lets you build a wall of displays... 
i didn't find the wall of monitors ( a picture of it ) etc but there's some other "Video Walls" with pics and config files http://www.Linux-1U.net/X11/Quad/ - simple quad display stuff Video-Wall: 2x2 ( 4 monitors ) http://www.Linux-1U.net/X11/Quad/gstreamer.net/video-wall-howto.html Video-Whale: 4x4 ( 16 video monitors ) http://www.Linux-1U.net/X11/Quad/gstreamer.net/vw/vw.html c ya alvin From rgb at phy.duke.edu Mon Dec 20 05:58:24 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: References: Message-ID: On Sun, 19 Dec 2004, Douglas Eadline, Cluster World Magazine wrote: > On Sat, 18 Dec 2004, Jim Lux wrote: > > > I think it would be interesting to contemplate potential uses of a $2500 > > cluster. Once you've had the thrill of putting it together and rendering > > something with POVray, what next? > > That is the $64,000 dollar question. Here is my 2 cent answer. > BTW, your ideas are great. I would love to see a discussion like this > continue because we all know the hardware is easy part! I'll kick in a pennysworth. What I use my somewhat more disorganized home cluster for is largely prototyping and development. Production clusters hate giving up cycles to little test runs, as they tend to slow down the entire (tightly coupled) computation. Having a small/cheap cluster that is big enough to be able to learn something useful about the scaling of the task and to debug parallel code is very useful. In addition, as already noted, all sorts of embarrassingly parallel computations can be run on the test cluster WHILE it is thus being used without much loss of efficiency, as EP tasks can finish whereever, and if you steal five minutes from them here or there it is no big loss. The CWM cluster is already better than one of my "remnant" clusters that I run at Duke (systems with the dread capacitor problem on the mobos that are gradually dying as the capacitors eventually blow) that is certainly useful in production, old/slow as it is. Learning has been mentioned -- I'm informally advising some five or six different students at different institutions in India who have picked building a cluster as a serious academic project. For these students the issue is building a cluster and then running some toy tasks that can demonstrate parallel scaling (generalized Amdahl's Law) relations, such as the ones that I published last year in CWM or that are available in examples directories in e.g. PVM. They also learn all sorts of useful things about networking and systems administration that are excellent preparation for a career in IT, cluster-oriented or not. Many of these students cannot afford to spend even $2500 on a cluster -- they make them out of obsolete or cast-off systems, adding perhaps $100 worth of additional/new hardware and a lot of figurative elbow grease. This is fine -- five year old systems (even ones that also run Windows as well as linux in a dual boot or diskless boot configuration) are precisely what I was lusting after >>six<< years ago to do real work! They don't make economic sense now for production when a single new system is much faster than a whole obsolete cluster (Moore's Law is brutal) but they are fabulous for learning. Finally, (Doug's remarks below notwithstanding) I actually think that it would be lovely if e.g. octave had a fully parallel component. We currently have several matlab clusters on campus at this point. 
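To put one concrete number behind the scaling-demo point above: the plain Amdahl relation itself fits in a one-screen C program (the 5% serial fraction below is made up purely for illustration; this is not the CWM example):

/* amdahl.c -- ideal speedup S(N) = 1/(s + (1-s)/N) for serial fraction s. */
#include <stdio.h>

int main(void)
{
    double s = 0.05;                    /* assumed serial fraction: 5% */
    int n;
    printf("#  N   speedup   efficiency\n");
    for (n = 1; n <= 16; n *= 2) {
        double sp = 1.0 / (s + (1.0 - s) / n);
        printf("%4d   %7.2f   %9.1f%%\n", n, sp, 100.0 * sp / n);
    }
    return 0;
}

Comparing a table like this against wall-clock times measured on 1, 2, 4 and 8 scrounged nodes is already a respectable student exercise, and the gap between the two is where all the interesting lessons about communication overhead live.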
Matlab, mathematica, octave -- these sorts of environments are perfectly great for a particular class of researcher. They fill a very similar niche to the one perl or python fills for programmers. For these researchers, the time required to "do a parallel computation right" vastly exceeds the time saved by doing the parallel computation right, if "right" is interpreted as maximally efficiently with PVM or MPI or raw sockets or something. If "right" means "in such a way as to maximize the productive work done per unit of their invested time and money" then using matlab with a suitable parallel library that hides the detail of parallelism from them entirely is as right as it can get, compared to the investment of as long as years learning C or Fortran, studying parallel algorithms, learning PVM or MPI, analyzing their task, and efficiently implementing their problem in parallel code (when perhaps their problem is just to solve a set of coupled equations that parallelizes well and transparently behind a single call). So sure, a little minicluster like this can be very useful indeed for folks who do this sort of work, although in a lot of cases tools like matlab/octave are memory hogs and the nodes will need to be equipped with a lot more than 256 MB of RAM to be useful. OK, so more than just a pennysworth... rgb > > There is part of this project which has a "build it and they will come > (and write software)" dream. Not being that naive, I believe there are > some uses for systems like this. The indented audience are not the > uber-cluster-geeks on this list, but rather the education, home, hacker, > crowd. In regards to education, I think if cluster technology is readily > available, then perhaps students will look to these technologies to solve > problems. And who knows maybe the "Lotus 123 of the cluster" will be built > by some person or persons with some low cost hardware and an idea everyone > said would not work. > > If you have followed the magazine, you will see that we highlighted > many open projects that are useful today. From an educational standpoint, > a small chemistry/biology department that can do quantum chemistry, > protein folding, or sequence analysis is pretty interesting to me. > There are others ares as well. > > There are also some other immediate things like running Mosix or Condor > on the cluster. A small group that has a need for a computation server > could find this useful for single process computational jobs. > > I also have an interest in seeing a cluster version of Octave or SciLab > set to work like a server. (as I recall rgb had some reasons not to use > these high level tools, but we can save this discussion for later) > > What I can say as part of the project, we will be collecting a software > list of applications and projects. > > Finally, once we all have our local clusters and software running to our > hearts content, maybe we can think about a grid to provide spare compute > cycles to educational and public projects around the world. > > Oh well, enough Sunday afternoon philosophizing. > > Doug > > > > > You want to avoid the "gosh, I can run 8 times as many Seti@Home units as I > > could before" or "Look, I can calculate Pi" kind of > > not-particularly-value-laden-to-the-casual-observer tasks. > > > > Sure, there's some value in learning how to build and manage a cluster, but > > I think the real value is in doing something useful with that $2500. So, > > what sort of "useful" could one do? 
Say you were to negotiate with your > > spouse to get $2500 to play with (or you were able to get a "mini-grant" at > > a high school). Is there something that is useful to the "general consumer > > public" that could be done better with a cluster than with a $2500 desktop > > machine? > > > > One computationally intensive task that might be applicable is making > > panoramas from multiple digital photos. It's incredibly tedious and time > > consuming to stitch together 30 or 40 digital photos into one seamless > > panorama (google for PanoTools and PTGui for ideas). > > > > What about kids in school? Is there some simulation that, if clusterized, > > would be more interactive and useful? > > > > What about interactive rendering from one of NASA's world view databases: > > layering the terrain models and imagery to do "fly bys"? > > > > Are there consumer type iterative optimization problems that could profit > > from a cluster? In my own fooling around, I do lots of antenna simulations, > > which are essentially embarassingly parallel. The ham radio community likes > > "scrounged and homebuilt" solutions to problems, so the $2500 cluster is a > > potential winner there. > > > > What about outreach to poverty stricken branches of academe who don't use > > computers much? literary analysis searching texts for common phrases? > > figuring out how to fit potsherds together? > > > > Jim Lux > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From deadline at linux-mag.com Mon Dec 20 05:56:39 2004 From: deadline at linux-mag.com (Douglas Eadline, Cluster World Magazine) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> Message-ID: On Sun, 19 Dec 2004, Jim Lux wrote: --snip-- > > This brings up an interesting optimization question. Just like in many > things (I'm thinking RF amplifiers in specific) it's generally cheaper/more > cost effective to buy one big thing IF it's fast enough to meet the > requirements. Once you get past what ONE widget can do, then, you're forced > to some form of parallelism or combining smaller widgets, and to a certain > extent it matters not how many you need to combine (to an order of > magnitude). The trade comes from the inevitable increase in system > management/support/infrastructure to support N things compared to supporting > just one. (This leaves aside high availability/high reliability kinds of > things). > > So, for clusters, where's the breakpoint? Is it at whatever the fastest > currently available processor is? This is kind of the question that's been > raised before.. Do I buy N processors now with my grant money, or do I wait > a year and buy N processors that are 2x as fast and do all the computation > in the second of two years? If one can predict the speed of future > processors, this might guide you whether you should wait for that single > faster processor, or decide that no matter if you wait 3 years, you'll need > more than the crunch of a single processor to solve your problem, so you > might as well get cracking on the cluster. 
> > Several times, I've contemplated a cluster to solve some problem, and then, > by the time I had it all spec'd out and figured out and costed, it turned > out that I'd been passed by AMD/Intel, and it was better just to go buy a > (single) faster processor. There are some interesting power/MIPS trades > that are non-obvious in this regime, as well as anomalous application > environments where the development cycle is much slower (not too many "Rad > Hard" Xeons out there). > > There are also inherently parallel kinds of tasks where you want to use > commodity hardware to get multiples of some resource, rather than some > special purpose thing (say, recording multi-track audio or the > aforementioned video wall). Another thing is some sort of single input > stream, multiple parallel processes for multiple outputs. High performance > speech recognition might be an example. > > What about some sort of search process with applicability to casual users > (route finding for robotics or such...) > Jim, Here is my "soap box" speech about this issue. The question of a cluster versus next years processor has always been a worthwhile consideration. For modestly parallel programs, say 3-4 times faster on 6-8 processors, this is definitely an issue. If however, you are seeing a 30-40 times faster on 60-80 processors (on a problem that will not fit on a workstation), then next years model will not help much. Now, the cost to go 30-40 times faster may be an issue to some. For small clusters this is more of an issue. For instance, on our $2500 cluster, we have eight 1.75GHz Semprons and 2304 MB of RAM. Using a very naive argument that we have 14.00 GHz to apply to a problem (or some other metric that is 8 time a single CPU), then if we can achieve 50% scalability (4x times faster on 8 CPUs) we are getting 7 GHz out for the system. I would *guess* that this is close to a dual desk top box. Of course, highly scalable things would push the cluster ahead. Now, it gets more interesting when you ask "Well should I wait for next year and get fast processors for my $2500 cluster?" As always it depends. If all that changes are faster CPUs (and lets assume the memory gets faster as well), then using the same interconnect, GigE, the scalability of some applications gets less and a cluster may not be the best choice. These types of arguments have been important "parallel computing" issues for quite some time. However, this was based on a Moore's Law assumption that single CPU speed will keep increasing. This assumption has held up until now. The introduction of dual core processors is an indication that scaling up frequency is harder than scaling out processors. So now the question will become, is it better to have two quad boxes (two dual motherboards with dual core processors), or four dual boxes (four single motherboards with dual core processors), or eight single boxes. Who knows? What I do know is that the issues we have been talking about on this little list will very soon become big issues to the rest of the market. Doug ---------------------------------------------------------------- Editor-in-chief ClusterWorld Magazine Desk: 610.865.6061 Fax: 610.865.6618 www.clusterworld.com From rgb at phy.duke.edu Mon Dec 20 06:55:38 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? 
In-Reply-To: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> Message-ID: On Sun, 19 Dec 2004, Jim Lux wrote: > This brings up an interesting optimization question. Just like in many > things (I'm thinking RF amplifiers in specific) it's generally cheaper/more > cost effective to buy one big thing IF it's fast enough to meet the > requirements. Once you get past what ONE widget can do, then, you're forced > to some form of parallelism or combining smaller widgets, and to a certain > extent it matters not how many you need to combine (to an order of > magnitude). The trade comes from the inevitable increase in system > management/support/infrastructure to support N things compared to supporting > just one. (This leaves aside high availability/high reliability kinds of > things). > > So, for clusters, where's the breakpoint? Is it at whatever the fastest > currently available processor is? This is kind of the question that's been > raised before.. Do I buy N processors now with my grant money, or do I wait > a year and buy N processors that are 2x as fast and do all the computation > in the second of two years? If one can predict the speed of future > processors, this might guide you whether you should wait for that single > faster processor, or decide that no matter if you wait 3 years, you'll need > more than the crunch of a single processor to solve your problem, so you > might as well get cracking on the cluster. This has actually been discussed on list several times, and some actual answers posted. The interesting thing is that it is susceptible to algebraic analysis and can actually be answered, at least in a best approximation (since there are partially stochastic delays that contribute to the actual optimal solution). The optimal solution depends on a number of parameters, of course: The problem. EP problems are far more flexible as far as mixing CPU speeds and hardware types goes. Synchronous fine grained computations are far more difficult to efficiently implement on mixed hardware. Moore's Law (smoothed) for all the various components. You have to be able to predict the APPROXIMATE rate of growth in hardware speed at constant cost to be able to determine how to spend your money optimally. Moore's Law (corrected). Moore's Law advances are NOT smooth -- they are discrete and punctuated by sudden jumps. Worse, those jumps aren't even uniform -- sometimes a processor or chipset is introduced that speeds up some operations by X and others by Y, so mere clockspeed scaling isn't a good predictor -- or where one of the underlying e.g. memory subsystems is suddenly changed while the processor remains the same. One cannot tell the future with any precision, but one needs to pay attention to (for example) the "roadmaps" published by Intel and AMD and IBM and Motorola and all the other major chip manufacturers that make key components that affect the work flow for your task(s). "TCO". Gawd, I hate that term, because it is much-abused by marketeers, but truly it IS something to think about. There are (economic) risks associated with building a cluster with bleeding-edge technology. There are risks associated with mixing hardware from many low-bid vendors. There are administrative costs (sometimes big ones) associated from mixing hardware architectures, even generally similar ones such as Intel and AMD or i386 and X86_64. Maintenance costs are sometimes as important to consider as pure Moore's Law and hardware costs. 
Human time requirements can vary wildly and are often neglected when doing the CBA for a cluster. Infrastructure costs are also an important specific factor in TCO. In fact, they (plus Moore's Law) tend to put an absolute upper bound on the useful lifetime of any given cluster node. Node power consumption (per CPU) scales up, but it seems to be following a much slower curve than Moore's Law -- slower than linear. A "node CPU" has cost in the ballpark of 100W form quite a few years now -- a bit over 100W for the highest clock highest end nodes, but well short of the MW that would be required if they followed anything like a ML trajectory from e.g. the original IBM PC. Consequently, just the cost of the >>power<< to run and cool older nodes at some point exceeds the cost of buying and running a single new node of equivalent aggregate compute power. This is probably the most predictable point of all -- a sort of "corallary" to Moore's Law. If one assumes a node cost of $1000/CPU and a node power cost of $100/year (for 100W nodes) and a ML doubling time of 18 months, then sometime between year four and year six -- depending on the particular discrete jumps -- it will be break even to buy a new node for $1000 and pay $100 for its power versus operate 11 nodes for the year. Except that Amdahl's Law guarantees that this is an upper bound time, and for most non-EP tasks the break even point will come earlier. Except that TCO costs for maintaining the node start to escalate after roughly year three (when most extended warranties stop and getting replacement hardware gets very difficult indeed). Finally, there is one consideration that often trumps all of the above. Many clusters if not most clusters are built to perform some specific, grant funded or corporate funded, piece of work. Even if it turns out to be "optimal" to wait until the end of year three, buy all your hardware then, and work for one year to complete the absolute most work that could be done on a four year grant, it is simply impossible to actually DO this. So people do the opposite -- spend all their money in year one and waste the work they could have accomplished riding ML, or if they are very clever and their task permits, spend it in e.g. 1/3's and get (1/3)*4*1 + (1/3)*2.5*2.0 + (1/3)*1*4 = 13/3 = 4 1/3 work units done instead of the 4 they'd get done on a flat year one investment. It is amusing to note that it is break even to buy in year one and run for four years versus buy at the end of year three and run for one year, EXCEPT for TCO. TCO makes the latter much, much cheaper, as it includes the infrastructure and administrative cost for running the nodes for four years instead of one, which are likely to equal or exceed the cost of all the hardware combined! However, you will convince very few researchers or granting agencies that the best/optimal course is for them to do nothing for the next three years and then work for one year -- and it probably isn't true. The truth is that there are nonlinear social and economic benefits from doing the work over time, even at a less than totally efficient rate. If there is a rule of thumb, though, it is that a true optimum given this sort of macroeconomic consideration is likely the distributed expenditure model. It is generally better for MANY kinds of tasks or task organizations to take any fixed budget for N>3 years and split it up into N-1 chunks or thereabouts and try to ride the ML breaks as they come. 
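For anyone who wants to plug their own numbers into the two bits of arithmetic above, here is the same toy calculation in a few lines of C, using the same assumed figures ($1000 per node, $100 per node-year of power, an 18 month doubling time); it is emphatically back-of-the-envelope, not a TCO model:

/* ml_breakeven.c -- toy arithmetic for the power break-even and the
 * split-the-budget-in-thirds comparison discussed above.
 * Assumptions (taken from the discussion, not measurements): node cost
 * $1000, power $100/node/year, aggregate speed doubles every 1.5 years.  */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double node_cost = 1000.0, power = 100.0, doubling = 1.5;

    /* Running k old nodes for a year costs k*power; one new node of the
     * same aggregate speed costs node_cost + power.                       */
    double k = (node_cost + power) / power;
    double t = doubling * log2(k);

    /* Work over a 4 year horizon: everything spent at year 0, versus a
     * third of the budget at years 0, 1.5 and 3, where the speed bought
     * at time t is 2^(t/doubling).                                        */
    double all_now = 1.0 * 4.0;
    double staged  = (1.0 / 3) * 4.0 * 1.0
                   + (1.0 / 3) * 2.5 * pow(2.0, 1.5 / doubling)
                   + (1.0 / 3) * 1.0 * pow(2.0, 3.0 / doubling);

    printf("one new node matches %.0f old ones after about %.1f years\n", k, t);
    printf("work units over 4 years: all at year 0 = %.2f, thirds = %.2f\n",
           all_now, staged);
    return 0;
}

Built with "cc ml_breakeven.c -lm", it reproduces the roughly five-year, eleven-old-node break-even and the 4 versus 4 1/3 work-unit comparison quoted above.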
This means that in your organization you always have access to a cluster that is new/current technology and can exploit its nonlinear benefits; you have access to a workhorse cluster than is only 1-2 years old. You have access to a mish-mosh cluster that is 2-4 years old but still capable of doing useful work for lots of kinds of tasks (including e.g. prototyping, code development, EP tasks as well as some production). From there, as warranties expire and maintenance costs escalate, you retire them and ultimately (one hopes) recycle them in some socially responsible way. rgb > > I also have an interest in seeing a cluster version of Octave or SciLab > > set to work like a server. (as I recall rgb had some reasons not to use > > these high level tools, but we can save this discussion for later) > > I'd be real interested in this... Mathworks hasn't shown much interest in > accomodating clusters in the Matlab model, and I spend a fair amount of time > running Matlab code. I believe that there is an MPI library and some sort of compiler thing for making your own libraries, though. I don't use the tool and don't keep close track, although that will change next year as I'll be using it in teaching. The real problem is that people who CAN program matlab to do stuff in parallel aren't the people who are likely to use matlab in the first place. And since matlab is far, far from open source -- actually annoyingly expensive to run and carefully licensed -- the people who might be the most inclined to invest the work don't/can't do so in a way that is generally useful. One of the many evils of closed source, non-free applications. So I think Doug is on track here -- work should really be devoted to octave, where it can nucleate a serious community development effort and possibly give researchers a solid reason to choose octave instead of matlab in the first place. rgb > > > > > What I can say as part of the project, we will be collecting a software > > list of applications and projects. > > > > Finally, once we all have our local clusters and software running to our > > hearts content, maybe we can think about a grid to provide spare compute > > cycles to educational and public projects around the world. > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Mon Dec 20 07:17:45 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] LCD monitors, please... In-Reply-To: <000401c4e635$fd8b0c60$32a8a8c0@LAPTOP152422> References: <200412191956.iBJJu25v009271@bluewest.scyld.com> <41C61178.9030106@uiuc.edu> <000401c4e635$fd8b0c60$32a8a8c0@LAPTOP152422> Message-ID: On Sun, 19 Dec 2004, Jim Lux wrote: > 1) Design life is MUCH shorter on the monitors than the TVs. Typically the > design is for half brightness after some 8-10k hrs of use. > 2) The spot size on a TV is bigger than on a computer monitor (it makes it > brighter, but has less resolution, no big deal on a TV that reproduces, at > best, about 500x500 pixels). (For the technically inclined, it has to to > with the accelerating voltage, the beam current, and focus electrodes in the > electron gun) > > There's also the issue of seams. 
> > If you look for projectors (the NCSA approach), they're still quite pricey, > at least for decent ones. I'm getting LCD monitors exclusively at this point. This is for several reasons. a) They use perhaps 1/3 the power. This defrays perhaps $100 of their cost over a 3 year plus projected lifetime. b) They are coming way down in cost. I just (yesterday) bought a 17", 1280x1024 LCD display for my house for $270 -- even adding a 3 year service contract for $50 it was only $320 plus tax. After Christmas I imagine they will drop another 20%. Kick in the $100 in power savings and they are pretty close to break even with CRTs in amortized cost already. c) CRTs are dangerous and environmentally unsound. Dangerous because they produce wavelengths that can be damaging to skin and eyes under chronic long-term exposure conditions. Environmentally unsound because the screens are full of lead to protect you against the radiation and because they waste a lot of energy. Duke currently charges folks $10 per monitor to safely recycle the monitors and keep them out of landfills, where they will >>eventually<< contaminate the ground water with lead and arsenic. My eyes have really deteriorated over the last two years -- some of it is doubtless plain old age-related presbyopia, but some fraction of it may well be due to close to 30 years worth of hours every day spent in front of an electron gun shooting ionizing radiation straight at my eyes. Radiation absorption is a quantum process; so is radiation damage. So a leaded screen reduces the probability that a soft X-ray gets through but doesn't reduce the damage done by the ones that GET through. d) They are a hell of a lot more convenient in all other ways than CRTs -- smaller footprint, lighter, portable, less supporting electronics. Building a wall of screens, you can actually hang them ON a wall without any special shelving or being likely to tear down the wall by their sheer weight. You can probably unmount them from their frames altogether and assemble arrays of them with very narrow seams (perhaps <1 cm total seam). e) CRT-based TV's and monitors both are obsolete technology. I predict that in three years they simply disappear. A number of companies are now mass-producing the LCD screens and are just enjoying the end of the period of relatively high margin sales as competition and increased production capacity drives prices down. I expect 17" monitors and flatpanel TVs to cost in the lower/upper $200's by this time next year, and at that cost who will buy CRTs? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From james.p.lux at jpl.nasa.gov Mon Dec 20 07:06:44 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> Message-ID: <003801c4e6a5$8d46db80$32a8a8c0@LAPTOP152422> ----- Original Message ----- From: "Robert G. Brown" To: "Jim Lux" Cc: "Douglas Eadline, Cluster World Magazine" ; Sent: Monday, December 20, 2004 6:55 AM Subject: Re: [Beowulf] $2500 cluster. What it's good for? > On Sun, 19 Dec 2004, Jim Lux wrote: > > > This brings up an interesting optimization question. Just like in many > > things (I'm thinking RF amplifiers in specific) it's generally cheaper/more > > > This has actually been discussed on list several times, and some actual > answers posted. 
The interesting thing is that it is susceptible to > algebraic analysis and can actually be answered, at least in a best > approximation (since there are partially stochastic delays that > contribute to the actual optimal solution). > > "TCO". Gawd, I hate that term, because it is much-abused by > marketeers, but truly it IS something to think about. There are > (economic) risks associated with building a cluster with bleeding-edge > technology. There are risks associated with mixing hardware from many > low-bid vendors. There are administrative costs (sometimes big ones) > associated from mixing hardware architectures, even generally similar > ones such as Intel and AMD or i386 and X86_64. Maintenance costs are > sometimes as important to consider as pure Moore's Law and hardware > costs. Human time requirements can vary wildly and are often neglected > when doing the CBA for a cluster. And TCO with bleeding edge equipment is where the one vs many managment problem becomes so important. Managing the idiosyncracies of one high end machine may be within the realm of possibility. Managing 8/16/1024 is probably unreasonable. So, as you point out, there's a value/cost that can be associated with various generations of equipment with less bleeding edge generally being lower cost (and the ever present potential for "having a bad day" and getting a zillion copies of an unreliable component). > > Infrastructure costs are also an important specific factor in TCO. In > fact, they (plus Moore's Law) tend to put an absolute upper bound on the > useful lifetime of any given cluster node. Node power consumption (per > CPU) scales up, but it seems to be following a much slower curve than > Moore's Law -- slower than linear. A "node CPU" has cost in the > ballpark of 100W form quite a few years now -- a bit over 100W for the > highest clock highest end nodes, but well short of the MW that would be > required if they followed anything like a ML trajectory from e.g. the > original IBM PC. Consequently, just the cost of the >>power<< to run > and cool older nodes at some point exceeds the cost of buying and > running a single new node of equivalent aggregate compute power. This > is probably the most predictable point of all -- a sort of "corallary" > to Moore's Law. If one assumes a node cost of $1000/CPU and a node > power cost of $100/year (for 100W nodes) and a ML doubling time of 18 > months, then sometime between year four and year six -- depending on the > particular discrete jumps -- it will be break even to buy a new node for > $1000 and pay $100 for its power versus operate 11 nodes for the year. I'm going to guess that the 100W number derives from two things: the desire to use existing power supply designs; and probably more important; the desire to use standard IEC power cords, which are limited to 7 Amps, and decent design practice which would limit the "real" load to roughly half that (say, 400-450W, peak, into the PS). There are other regulatory issues with building things that draw significant power. The 7Amp cordset drives component values and ratings for inexpensive components such as power switches, relays, fuses, etc. > > > > > I also have an interest in seeing a cluster version of Octave or SciLab > > > set to work like a server. (as I recall rgb had some reasons not to use > > > these high level tools, but we can save this discussion for later) > > > > I'd be real interested in this... 
Mathworks hasn't shown much interest in > > accomodating clusters in the Matlab model, and I spend a fair amount of time > > running Matlab code. > > I believe that there is an MPI library and some sort of compiler thing > for making your own libraries, though. I don't use the tool and don't > keep close track, although that will change next year as I'll be using > it in teaching. The real problem is that people who CAN program matlab > to do stuff in parallel aren't the people who are likely to use matlab > in the first place. And since matlab is far, far from open source -- > actually annoyingly expensive to run and carefully licensed -- the > people who might be the most inclined to invest the work don't/can't do > so in a way that is generally useful. I'll say it's annoyingly expensive.. from what I've been told, you need a license for each cpu of the cluster. That makes running matlab on the new JPL 1024 Xeon cluster a bit impractical. From amacater at galactic.demon.co.uk Sun Dec 19 16:31:32 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> Message-ID: <20041220003132.GA15943@galactic.demon.co.uk> On Sun, Dec 19, 2004 at 12:02:32PM -0800, Jim Lux wrote: > > > > > There are also some other immediate things like running Mosix or Condor > > on the cluster. A small group that has a need for a computation server > > could find this useful for single process computational jobs. > > This brings up an interesting optimization question. Just like in many > things (I'm thinking RF amplifiers in specific) it's generally cheaper/more > cost effective to buy one big thing IF it's fast enough to meet the > requirements. Once you get past what ONE widget can do, then, you're forced > to some form of parallelism or combining smaller widgets, and to a certain > extent it matters not how many you need to combine (to an order of > magnitude). The trade comes from the inevitable increase in system > management/support/infrastructure to support N things compared to supporting > just one. (This leaves aside high availability/high reliability kinds of > things). > Someone else who's thought of hybrid combiners and "stuff" to produce more RF - and potentially discovered all the fun of imbalances :) > by the time I had it all spec'd out and figured out and costed, it turned > out that I'd been passed by AMD/Intel, and it was better just to go buy a > (single) faster processor. There are some interesting power/MIPS trades > that are non-obvious in this regime, as well as anomalous application > environments where the development cycle is much slower (not too many "Rad > Hard" Xeons out there). > If you have a long running problem - DON'T start it now. If it needs to run for two years - buy next year's equipment (which is twice as fast as today's) and run it for just one year. One year wait then one years intensive compute - and you're still ahead. Next year's computer is _automatically_ faster and potentially much better value for your $$ :) > There are also inherently parallel kinds of tasks where you want to use > commodity hardware to get multiples of some resource, rather than some > special purpose thing (say, recording multi-track audio or the > aforementioned video wall). Another thing is some sort of single input > stream, multiple parallel processes for multiple outputs. 
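That single-input, many-outputs shape maps almost directly onto a broadcast-then-diverge MPI program: rank 0 owns the input stream, every rank receives each block, and each rank runs its own analysis on it. A minimal sketch, where the snprintf() and printf() calls are hypothetical stand-ins for a real reader and for per-rank codecs, recognisers or filters:

/* fanout.c - one input stream in, one independent output per rank.
   Sketch only: the "read" and "process" steps are stand-ins. */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 4096

int main(int argc, char **argv)
{
    char buf[BLOCK];
    int rank, nprocs, iter;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (iter = 0; iter < 10; iter++) {
        if (rank == 0)                      /* rank 0 reads the next block */
            snprintf(buf, BLOCK, "input block %d", iter);

        /* every rank sees the same input ... */
        MPI_Bcast(buf, BLOCK, MPI_CHAR, 0, MPI_COMM_WORLD);

        /* ... but does something different with it and writes its own output */
        printf("rank %d of %d: ran analysis %d on \"%s\"\n",
               rank, nprocs, rank, buf);
    }
    MPI_Finalize();
    return 0;
}

mpicc -o fanout fanout.c and mpirun -np 8 ./fanout shows the fan-out; the only serial piece left is whatever feeds rank 0.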
High performance > speech recognition might be an example. > High quality codecs on individual parts of a signal? Travelling salesman type problems? Finite element modelling or NEC type antenna modelling? De-noising pictures / signals? [Or, conversely, recovering coherent signals from close to the noise floor] Real time RF propagation correlation with all observed magnetic/auroral/weather/other propagation factors and propagation prediction. > What about some sort of search process with applicability to casual users > (route finding for robotics or such...) > Correlating spammers with IP ranges: correlating spam patterns with originators [working out ICBM missile co-ordinates for their hosting networks and zombies :) ] > > > > > I also have an interest in seeing a cluster version of Octave or SciLab > > set to work like a server. (as I recall rgb had some reasons not to use > > these high level tools, but we can save this discussion for later) > > > > > Finally, once we all have our local clusters and software running to our > > hearts content, maybe we can think about a grid to provide spare compute > > cycles to educational and public projects around the world. > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From amacater at galactic.demon.co.uk Sun Dec 19 17:18:00 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> References: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> Message-ID: <20041220011800.GA16241@galactic.demon.co.uk> On Sun, Dec 19, 2004 at 01:49:28PM -0600, Dean Johnson wrote: > > On Dec 18, 2004, at 9:52 AM, Jim Lux wrote: > > >I think it would be interesting to contemplate potential uses of a > >$2500 > >cluster. Once you've had the thrill of putting it together and > >rendering > >something with POVray, what next? > > While not terribly elegant, scientifically interesting, or > cluster-sexy, you could do distcc. "Look ma, I'm compiling over my > cluster". High school kids might like to do cluster BLAST type stuff. > There is certain value in getting the dog-looking-at-the-tv look from > their parents when they explain their activities. "Well Son, I got no > damn idea what you are talking about, but your Mom and I are very > proud". > > Maybe someone should put together something akin to the OpenCD for > cluster usage by novices. Put together a bunch of the standard > cluster-aware apps in nice buildable or RPM type packages. > Quantian anybody ?? > --Dean > Shades of talking through all of this in 1997 or 1998 when talking about a distribution-agnostic version of Extreme Linux. Google will find references to this even if the archives - the third hit [a post on the debian-devel list pointing to Drake Diedrich's usual intelligent-type response to a clueless newbie :) ] should help. Andy > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From charliep at cs.earlham.edu Sun Dec 19 18:29:17 2004 From: charliep at cs.earlham.edu (Charlie Peck) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? 
In-Reply-To: <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> References: <000501c4e519$9c224050$32a8a8c0@LAPTOP152422> <17D51947-51F7-11D9-9205-000A959A2956@uberh4x0r.org> Message-ID: On Dec 19, 2004, at 2:49 PM, Dean Johnson wrote: > > On Dec 18, 2004, at 9:52 AM, Jim Lux wrote: > >> I think it would be interesting to contemplate potential uses of a >> $2500 >> cluster. Once you've had the thrill of putting it together and >> rendering >> something with POVray, what next? > > While not terribly elegant, scientifically interesting, or > cluster-sexy, you could do distcc. "Look ma, I'm compiling over my > cluster". High school kids might like to do cluster BLAST type stuff. > There is certain value in getting the dog-looking-at-the-tv look from > their parents when they explain their activities. "Well Son, I got no > damn idea what you are talking about, but your Mom and I are very > proud". > > Maybe someone should put together something akin to the OpenCD for > cluster usage by novices. Put together a bunch of the standard > cluster-aware apps in nice buildable or RPM type packages. Check-out the Bootable Cluster CD, http://bccd.cs.uni.edu/ charlie peck From epaulson at cs.wisc.edu Sun Dec 19 20:19:19 2004 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] OpenPBS vs Condor? In-Reply-To: <20041219230008.58325.qmail@web18002.mail.tpe.yahoo.com> References: <20041216061130.79612.qmail@web11413.mail.yahoo.com> <20041219230008.58325.qmail@web18002.mail.tpe.yahoo.com> Message-ID: <20041220041918.GA21604@cobalt.cs.wisc.edu> On Mon, Dec 20, 2004 at 07:00:08AM +0800, Andrew Wang wrote: > OpenPBS is dead )no new version for several years), > you should use SGE or Torque. > > http://gridengine.sunsource.net/ > http://www.supercluster.org/torque > Or stick to Condor - you don't give any examples of what you want to do, so it's hard to say if Condor will work for you or not. -Erik > Andrew. > > > --- Kshitij Sanghi > ???T???G > > Hi, > > > > I'm new to Linux clusters. I wanted to know how does > > OpenPBS compare with Condor. We already have a small > > grid running Condor but wanted some scheduling > > facilities for our jobs. Is it necessary to shift to > > OpenPBS to provide job scheduling or will Condor do? > > The scheduler we require is nothing complex even the > > most basic one would do. > > > > Thanks and Regards, > > Kshitij > > > > > > > > > ___________________________________________________________ > > > > Win a castle for NYE with your mates and Yahoo! > > Messenger > > http://uk.messenger.yahoo.com > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________________________________ > Yahoo!?_???q?l?H?c > 250MB ?W?j?K?O?H?c?A?H???A?h?]?????I > http://mail.yahoo.com.tw/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From steve_heaton at ozemail.com.au Sun Dec 19 21:37:02 2004 From: steve_heaton at ozemail.com.au (steve_heaton@ozemail.com.au) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Cheapie cluster comments Message-ID: <20041220053703.EUHA16400.swebmail01.mail.ozemail.net@localhost> G'day all I'd like to share my experience on cheap cluster development. 
I've just completed the 1st round of a similar exercise :) Slogging towards my Masters in astronomy (part time), I've developed a passion for Large Scale Structure dynamics (galaxies whizzing around at 100's km/s). Modelling this kind of environment means lots of N-body calculations (gravitationally bound particles). I knew what I wanted my cluster to do :) I agree that "custom" frameworks are dead money. Down this path would have been aluminium sections, nuts and bolts, lots of drilling and cutting etc etc. Instead, while researching cheapie cases, I found a custom "case" product from Lubic. Their kits are basically a framework of ready cut aluminium sections, slider plates, screws to suit and some other PC friendly extensions. I could build the framework I needed without cutting anything (except MDF for mounting the mobos). I can also easily change the "chassis" to suit the cluster as it evolves. If I decide to decommission the cluster, I'm sure the framework will end up as my next robotics platform... or a coffee table ;) There are probably similar (cheaper) frame solutions out there too. I recently saw some retail shop displays made of similar materials :) Custom frames can be a good solution if made with reusable materials. My (mostly) 2nd hand 4 node dual Slot1 P3 500MHz, 128MB, 10GB IDE HDD, onboard FastE + PCI GigaE has come in a shade under A$1500 :) I hope to have some benchmarking results and photos available after silly season. Obviously not bleeding edge performance but on this code it kills my P4 3GHz, 512MB box that cost a similar amount. Current generation CPUs don't come in under my budget criteria. It'll be interesting when I retire the P3's to see how the bang-per-buck Xeon vs Opteron go in a couple of years! =) ...now if I add in the KKH dwarf clusters to the LG and some hydrodynamics code for the intergalactic gas... Cheers Stevo This message was sent through MyMail http://www.mymail.com.au From maurice at harddata.com Sun Dec 19 22:00:45 2004 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Re: Beowulf Digest, Vol 10, Issue 32 In-Reply-To: <200412191956.iBJJu25v009271@bluewest.scyld.com> References: <200412191956.iBJJu25v009271@bluewest.scyld.com> Message-ID: <41C66A8D.5010308@harddata.com> >----- Original Message ----- >From: "Douglas Eadline, Cluster World Magazine" >To: "Jim Lux" >Cc: > >With our best regards, > >Maurice W. Hilarius Telephone: 01-780-456-9771 >Hard Data Ltd. FAX: 01-780-456-9772 >11060 - 166 Avenue email:maurice@harddata.com >Edmonton, AB, Canada http://www.harddata.com/ > T5X 1Y3 > > >This email, message, and content, should be considered confidential, >and is the copyrighted property of Hard Data Ltd., unless stated otherwise. >Sent: Sunday, December 19, 2004 10:58 AM >Subject: Re: [Beowulf] $2500 cluster. What it's good for? ... >Several times, I've contemplated a cluster to solve some problem, and then, >by the time I had it all spec'd out and figured out and costed, it turned >out that I'd been passed by AMD/Intel, and it was better just to go buy a >(single) faster processor. There are some interesting power/MIPS trades >that are non-obvious in this regime, as well as anomalous application >environments where the development cycle is much slower (not too many "Rad >Hard" Xeons out there). That process has changed with the Opterons and dual versus single CPU. With the CPU's prior to that there were lots of cases we did not get very great computational gains on dual CPU machines. 
That is because a dual SMP was around 50% more computationally powerful than a single CPU implementation. Also older kernels were not very efficient in SMP compared to single. In many cases the only real reason for going dual was it was still slightly more of a gain than building lots more single CPU machines, especially if we used expensive interconnects. Further we often had to use SMP machines simply because that was often the only way to get the better PCI interfaces, PCI-64 or PCI- versus PCI32. Now however the Opteron offers SMP at about 90% efficiency, and that certainly skews the calculation of performance single versus SMP dual machines. Add to that the better scheduler in the newer 2.6 kernels and it is a vastly different model than we used to see. From larryas2 at fastmail.us Mon Dec 20 05:11:01 2004 From: larryas2 at fastmail.us (Larry Schuler) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: References: Message-ID: <41C6CF65.6000500@fastmail.us> >On Dec 18, 2004, at 9:52 AM, Jim Lux wrote: > > >I think it would be interesting to contemplate potential uses of a >$2500 >cluster. Once you've had the thrill of putting it together and >rendering >something with POVray, what next? > What about the creation of a short animation film (ala Dreamworks/Pixar)? I don't know if there's any open-source software to do this, but it would probably have: universal appeal, possibilities in many realms, scales well for cluster size, easy to visualize results? Is there anything to do this out there? --larry From rmiguel at senamhi.gob.pe Mon Dec 20 06:25:34 2004 From: rmiguel at senamhi.gob.pe (Richard Miguel San=?ISO-8859-1?Q?_Mart=EDn?=) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] about switches TrendNet In-Reply-To: References: <200412191956.iBJJu25v009271@bluewest.scyld.com> <41C61178.9030106@uiuc.edu> <000401c4e635$fd8b0c60$32a8a8c0@LAPTOP152422> Message-ID: <20041220141247.M39496@senamhi.gob.pe> Hi, Im building a cluster based in HPDL360 Xeon Processors, and Im trying of choice a good switch of lower cost and was seeing this model: TEG-160WS Switch Web-Based Smart 16 ports 10/100/1000Mbps Gigabit The question is ... what is the most important features of a switch for HPC. Mi application is a numerical model for weather, this model run in parallel. Someone have experience with Trendnet switches? Thanks ------------------------------ Richard Miguel San Martin CPN - SENAMHI Telf. 6141414 Anexo 464 Cel. 98540364 ------------------------------ From kartik at cs.fsu.edu Mon Dec 20 10:37:05 2004 From: kartik at cs.fsu.edu (Kartik Gopalan) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Performance of SMC Gigabit switches Message-ID: Would anyone on this list have experience using the SMC unmanaged switches (specifically one of the SMC8505T, SMC8508T, SMC8516T or SMC8524T switches). These seem to be inexpensive switches that claim to support jumbo frames. Before purchasing them, I am interested in finding out whether or not these switches actually deliver close to gigabit/sec throughput in practice. Any feedback is appreciated. Thanks in advance, - Kartik From dtj at uberh4x0r.org Mon Dec 20 12:35:36 2004 From: dtj at uberh4x0r.org (Dean Johnson) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? 
In-Reply-To: <20041220003132.GA15943@galactic.demon.co.uk> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> <20041220003132.GA15943@galactic.demon.co.uk> Message-ID: <1103574936.26548.74.camel@terra> On Sun, 2004-12-19 at 18:31, Andrew M.A. Cater wrote: > If you have a long running problem - DON'T start it now. If it needs to > run for two years - buy next year's equipment (which is twice as fast as > today's) and run it for just one year. One year wait then one years intensive > compute - and you're still ahead. Next year's computer is > _automatically_ faster and potentially much better value for your $$ :) > If you have a long running problem - don't start it right now. Spend some time adding checkpointing and then start it. That way you can start it in a month, get 11 months in, and then restart it on new faster, cheaper hardware. If you are running a code that pops out '42' after two years time, you got "some issues". One must not forget that the cluster part may be the fun part, but ultimately the science is the important part. It would be fun to report to a funding organization at a review that your sole progress is that you clicked the refresh button on a pricewatch.com webpage 63,714 times for the preceding quarter. -- -Dean From rgb at phy.duke.edu Mon Dec 20 14:06:10 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <20041220003132.GA15943@galactic.demon.co.uk> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> <20041220003132.GA15943@galactic.demon.co.uk> Message-ID: On Mon, 20 Dec 2004, Andrew M.A. Cater wrote: > If you have a long running problem - DON'T start it now. If it needs to > run for two years - buy next year's equipment (which is twice as fast as > today's) and run it for just one year. One year wait then one years intensive > compute - and you're still ahead. Next year's computer is > _automatically_ faster and potentially much better value for your $$ :) The only real problem with this is that Moore's Law is just about exactly where this argument says that we should never attack any long running problem. For example, people who do lattice gauge simulations used to complain that there wasn't enough CPU on the planet to do their computations (this was 10-12 years ago). This of course didn't stop them from doing them anyway, in spite of the fact that they would have gotten as much or more net work done if they hadn't done any computations at all until a year ago and then spent all their money on a massive supercluster to do it all at once. In the meantime, many deserving high-energy theorists have been saved from begging in the street, many graduate students have been graduated, hundreds of administrators (systems and otherwise), many employees working in many companies making hardware have been kept from starvation. Indeed the ongoing high volume purchases of relatively high end hardware is one of the things that keeps prices dropping and Moore's Law on track. This isn't to say that your argument has no merit, just that it is more complicated than just this. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From amacater at galactic.demon.co.uk Mon Dec 20 13:26:49 2004 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? 
In-Reply-To: References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> <20041220003132.GA15943@galactic.demon.co.uk> Message-ID: <20041220212649.GA5589@galactic.demon.co.uk> On Mon, Dec 20, 2004 at 05:06:10PM -0500, Robert G. Brown wrote: > On Mon, 20 Dec 2004, Andrew M.A. Cater wrote: > > > If you have a long running problem - DON'T start it now. If it needs to > > run for two years - buy next year's equipment (which is twice as fast as > > today's) and run it for just one year. One year wait then one years intensive > > compute - and you're still ahead. Next year's computer is > > _automatically_ faster and potentially much better value for your $$ :) > I _know_ this - didn't you catch the smiley above :) Withal, this is, as ever, a useful project and discussion point. It's _always_ worth getting a Doug Eadline / Jim Lux / rgb conversation going - a whole lot of stuff comes out of the woodwork. It may be symptomatic of a new interest in clusters / better publicity for Beowulfs or whatever but we're starting to see newcomers on the list again after what seems a long while without too many. As (still) a relatively clueless cluster person, it seems to me that it is still the best policy to read the newsgroup and ponder deeply upon what you read. Most of the answers are on this list somewhere : the only difficult thing to deal with is finding that someone else asked your question six years ago :) All the best to all the list - whether you celebrate or not - can I wish everyone happy holidays. Andy From mbaumgar at gup.jku.at Mon Dec 20 13:57:09 2004 From: mbaumgar at gup.jku.at (Markus Baumgartner) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] Performance of SMC Gigabit switches In-Reply-To: References: Message-ID: <41C74AB5.5070906@gup.jku.at> Kartik Gopalan wrote: >Would anyone on this list have experience using the SMC unmanaged switches >(specifically one of the SMC8505T, SMC8508T, SMC8516T or SMC8524T >switches). These seem to be inexpensive switches that claim to support >jumbo frames. > >Before purchasing them, I am interested in finding out whether or not >these switches actually deliver close to gigabit/sec throughput in >practice. > >Any feedback is appreciated. Thanks in advance, >- Kartik > > I'd stay away from the SMC8516T. We have one of those and are experiencing the following problem: we have a dual-opteron machine that tends to crash from time to time. This machine is connected to the SMC switch. Whenever the machine crashes, the switch stops working and all machines connected to the switch are unreachable. Obviously, some illegal frame or signal from the crashed machine make the switch freeze, too. Even after disconnecting the crashed machine the switch won't work. Only after resetting the switch it will work again. IMHO, a switch should be robust enough to never have to be reset manually. -- Markus Baumgartner Institute of Graphics and Parallel Processing, University of Linz, Austria www.gup.uni-linz.ac.at From James.P.Lux at jpl.nasa.gov Mon Dec 20 14:53:01 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> <20041220003132.GA15943@galactic.demon.co.uk> Message-ID: <6.1.1.1.2.20041220143920.042c8de8@mail.jpl.nasa.gov> At 02:06 PM 12/20/2004, Robert G. Brown wrote: >On Mon, 20 Dec 2004, Andrew M.A. Cater wrote: > > > If you have a long running problem - DON'T start it now. 
If it needs to > > run for two years - buy next year's equipment (which is twice as fast as > > today's) and run it for just one year. One year wait then one years > intensive > > compute - and you're still ahead. Next year's computer is > > _automatically_ faster and potentially much better value for your $$ :) > >The only real problem with this is that Moore's Law is just about >exactly where this argument says that we should never attack any long >running problem. For example, people who do lattice gauge simulations >used to complain that there wasn't enough CPU on the planet to do their >computations (this was 10-12 years ago). This of course didn't stop >them from doing them anyway, in spite of the fact that they would have >gotten as much or more net work done if they hadn't done any >computations at all until a year ago and then spent all their money on a >massive supercluster to do it all at once. > >In the meantime, many deserving high-energy theorists have been saved >from begging in the street, many graduate students have been graduated, >hundreds of administrators (systems and otherwise), many employees >working in many companies making hardware have been kept from >starvation. One could argue whether or not this is merely a form of "white collar welfare", raising all sorts of social implications. Would society be better served by funding physics researchers and putting others (choose your currently favored down-and-outer) on the street, or, should the physicists be shown the door, and the others given jobs doing something else. I think that as a class, one could make the argument that crime rates might decrease if the current streetcorner drug dealers were given jobs and physicists turned out to ply their trade for handouts (will theorize for food), physicists not being known for their propensity for crime. >Indeed the ongoing high volume purchases of relatively high >end hardware is one of the things that keeps prices dropping and Moore's >Law on track. Gosh, and Jack Valenti had me believing that what drove the CPU business was increased demand for capacity to download MP3s and movies. Realistically, the cluster market is probably a tiny, tiny fraction of the sales of high end processors. The symbiotic relationship between consumer hardware and consumer software vendors is probably more powerful (faster processors engender software which needs more power, which increases demand for faster processors). I'm not quite sure why MSExcel or MSWord should really require 10x the CPU speed for acceptable performance today as compared to, say, 5 years ago, considering the underlying computational task isn't much different. Is rendering the screen that much more complex? Where ARE all those CPU cycles going? Maybe it's real time virus checking? It's certainly not because they've started checking array bounds and making sure that pointers point somewhere real before using them. All those zillions of components in the most recent versions of Windows are theoretically added functionality and don't execute unless invoked. Perhaps there's some ever growing list of functions that gets checked on each system call, or some huge amount of registry searching that goes on to virtualized everything? >This isn't to say that your argument has no merit, just that it is more >complicated than just this. 
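The wait-versus-start-now trade quoted above is easy to put numbers on. A small sketch, using only the figures already in the thread, an 18-month doubling time and a job needing two years of compute on today's box; both are assumptions, not measurements:

/* waitcalc.c - time to the answer if the purchase is delayed by w months.
   Assumed inputs, taken from the thread: performance doubles every
   18 months and the job needs 24 months of compute at today's speed. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double doubling = 18.0;   /* months per performance doubling */
    const double work     = 24.0;   /* compute needed, in today-months */
    double wait;

    for (wait = 0.0; wait <= 36.0; wait += 6.0) {
        double speedup = pow(2.0, wait / doubling);
        double runtime = work / speedup;
        printf("wait %4.0f months, run %5.1f months, answer after %5.1f months\n",
               wait, runtime, wait + runtime);
    }
    return 0;
}

Compiled with cc -o waitcalc waitcalc.c -lm, it shows the answer arriving soonest if you start now for these particular inputs; waiting only starts to pay once the job needs more than roughly one and a half doubling periods of compute, which is part of the complication rgb is pointing at.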
Returning to your earlier comment about this sort of thing being solvable as a set of equations, it might be interesting to try and bound all those icky externalities (maintenance cost, admin hassles, etc.). This IS sort of a classic Operations Research (if that term is still used) problem. It is highly nonlinear... admin costs tend to go in jumps corresponding to the need to hire another person, for instance. Perhaps a good Monte Carlo type simulation (on a cluster naturally) could provide some insight? \James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From James.P.Lux at jpl.nasa.gov Mon Dec 20 15:21:32 2004 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <20041220212649.GA5589@galactic.demon.co.uk> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> <20041220003132.GA15943@galactic.demon.co.uk> <20041220212649.GA5589@galactic.demon.co.uk> Message-ID: <6.1.1.1.2.20041220151657.0430b670@mail.jpl.nasa.gov> Here's an intriguing possibility for use on a cheap cluster: http://simulationresearch.lbl.gov/GO/ Genopt is a generic optimization program. It invokes an external program to evaluate the cost function, and implements a variety of ways to do the optimization. Some of these might very amenable to EP execution on a cluster. (Particle Swarm and Pattern Search for instance). Some sort of GA might also be a good fit to a cluster. Genopt is fairly non-optimized. It spits out a text file to your evaluation/simulation program, then it reads the text file generated by the evalutor to extract the cost. It's more of an optmizer wrapper around some other tool. James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From laytonjb at charter.net Mon Dec 20 15:36:37 2004 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] $2500 cluster. What it's good for? In-Reply-To: <6.1.1.1.2.20041220151657.0430b670@mail.jpl.nasa.gov> References: <001401c4e605$b8269de0$32a8a8c0@LAPTOP152422> <20041220003132.GA15943@galactic.demon.co.uk> <20041220212649.GA5589@galactic.demon.co.uk> <6.1.1.1.2.20041220151657.0430b670@mail.jpl.nasa.gov> Message-ID: <41C76205.70908@charter.net> I've been experimenting with parallel GA's for a few years now. Doing something that requires parallel evaluation of cost functions coupled with a parallel GA is a good application for small clusters. It allows you to consider a wider range of applications since you have more horsepower. It also allows you to consider more members in the population, which should help with finding the optimal point. Another really good GA application is multi-objective optimization. In these types of problems you are trying to find the pareto-optimal front, which sometimes means that you need a large population to define the front. Also, as you add objectives, you will need a larger population. In either case, this means more horsepower, more collective memory, and perhaps the application of parallel techniques to improve the search techniques. You know I should keep a list of what people are posting to summarize this discussion. 
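On the cluster side, both the GenOpt-style wrapper and the parallel GA come down to the same thing: hand each node a slice of candidate solutions, evaluate the cost function independently, and pull the numbers back. A minimal MPI sketch; the sphere-function cost() here is only a stand-in for the real simulation or external evaluator:

/* fitness_farm.c - evaluate a population's cost function in parallel.
   Assumes POP is divisible by the number of ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define POP 64          /* individuals per generation */
#define DIM 8           /* parameters per individual  */

static double cost(const double *x)    /* stand-in objective */
{
    double s = 0.0;
    int i;
    for (i = 0; i < DIM; i++) s += x[i] * x[i];
    return s;
}

int main(int argc, char **argv)
{
    int rank, nprocs, i, j, share;
    double pop[POP][DIM], fit[POP];
    double *mypop, *myfit;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    share = POP / nprocs;

    if (rank == 0)                     /* master builds the population */
        for (i = 0; i < POP; i++)
            for (j = 0; j < DIM; j++)
                pop[i][j] = 10.0 * rand() / (double)RAND_MAX - 5.0;

    mypop = malloc(share * DIM * sizeof(double));
    myfit = malloc(share * sizeof(double));

    /* hand each rank an equal slice of individuals ... */
    MPI_Scatter(pop, share * DIM, MPI_DOUBLE,
                mypop, share * DIM, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < share; i++)        /* ... evaluate them locally ... */
        myfit[i] = cost(&mypop[i * DIM]);

    /* ... and collect the fitness values back on the master */
    MPI_Gather(myfit, share, MPI_DOUBLE, fit, share, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        double best = fit[0];
        for (i = 1; i < POP; i++) if (fit[i] < best) best = fit[i];
        printf("best cost this generation: %g\n", best);
    }

    free(mypop); free(myfit);
    MPI_Finalize();
    return 0;
}

The GA bookkeeping (selection, crossover, mutation) stays on rank 0 and only the expensive evaluations are farmed out, which is why even a modest 4-node box helps as long as one evaluation costs much more than the scatter and gather.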
I'll probably have to do it at some point for the BOB column in CW. :) Thanks! Jeff > Here's an intriguing possibility for use on a cheap cluster: > http://simulationresearch.lbl.gov/GO/ > > Genopt is a generic optimization program. It invokes an external > program to evaluate the cost function, and implements a variety of > ways to do the optimization. Some of these might very amenable to EP > execution on a cluster. (Particle Swarm and Pattern Search for instance). > > Some sort of GA might also be a good fit to a cluster. > > Genopt is fairly non-optimized. It spits out a text file to your > evaluation/simulation program, then it reads the text file generated > by the evalutor to extract the cost. It's more of an optmizer wrapper > around some other tool. > > > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 From srgadmin at cs.hku.hk Mon Dec 20 16:13:48 2004 From: srgadmin at cs.hku.hk (srg-admin) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] CFP: ICCNMC'05 Message-ID: <41C76ABC.9090500@cs.hku.hk> Call for Papers 2005 International Conference on Computer Networks and Mobile Computing (ICCNMC'05) Zhangjiajie, China, August 2-4, 2005 http://www.iccnmc.org -------------------------------------- Sponsored by: China Computer Federation Co-Sponsored by IEEE Computer Society Technical Committee on Distributed Processing In cooperation with IEEE Computer Society Beijing Center Hunan Computer Society ------------------------------------- Scope The conference provides a forum for engineers and scientists in academia, industry and government to present their latest research findings in the field of computer networks and mobile computing. Topics of interest include, but are not limited to: Network architecture Protocol design and analysis Mobile computing Routing and scheduling Congestion management/QoS Admission control Internet and web applications Multimedia systems Network security and privacy Optical networks -------------------------------------- Paper Submission Form of Manuscript: Not to exceed 20 double-spaced, 8.5 x 11-inch pages (including figures, tables and references) in 10-12 point font. Number each page. Include an abstract, keywords, the technical area(s) most relevant to your paper, and the corresponding author's e-mail address. Electronic Submission: Web-based submissions are required. Please see the conference web page for details. Papers should be sent to jianjun_bai@163.net Important Dates Feb. 26, 2005 Submission Deadline April 10, 2005 Author Notification May 10, 2005 Final Manuscript Due -------------------------------------- Organizing & Program Committee Honorary Chair Ming T. (Mike) Liu, Ohio State Univ., USA General Co-Chairs Chita Das, Pennsylvania State Univ., USA Hequan Wu, Chinese Academy of Engineering, China Program Co-Chairs Xicheng Lu, National Univ. of Defense Tech, China Wei Zhao, Texas A&M Univ., USA Program Vice-Chairs Bo Li, Hong Kong Univ. of Science & Tech., China Jinshu Su, National Univ. of Defense Tech., China Jie Wu, Florida Atlantic Univ., USA Program Committee Members Giuseppe Anastasi, Univ. of Pisa, Italy Guohong Cao, Pennsylvania State Univ., USA Jianer Chen, Texas A&M Univ., USA Sajal K. Das, The Univ. of Texas at Arlington, USA Alois Ferscha, Univ. of Linz, Austria Chuanshan Gao, Fudan Univ., China Zhenghu Gong, National Univ. 
of Defense Tech., China Weijia Jia, Hong Kong City Univ., China Jie Li, Univ. of Tskuba, Japan Xiaoming Li, Peking Univ., China Prasant Mohapatra, Univ. of California at Davis, USA Stephan Olariu, Old Dominion Univ., USA Depei Qian, Xi'an Jiaotong Univ., China Hualin Qian, Chinese Academy of Science, China Mukesh Singhal, Univ. of Kentucky, USA Bala Srinivasan, Monash Univ., Australia Ivan Stojmenovic, Univ. of Ottawa, Canada Chengzheng Sun, Griffith Univ., Australia Jianping Wu, Tsinghua Univ., China Li Xiao, Michgan State Univ., USA Yuanyuan Yang, State Univ. of N. Y. at Stony Brook, USA Steering Committee Chair Benjamin W. Wah, Univ. of Illinois, USA Publication Chair Jiannong Cao, Hong Kong Polytechnic Univ., China Publicity Chair Cho-Li Wang, Univ. of Hong Kong, China Award Chair Wenhua Dou, National Univ. of Defense Tech., China Organizing Chair Ming Xu, National Univ. of Defense Tech., China IEEE Beijing Section, Director Zhiwei Xu, Chinese Academy of Science, China Note: The Proceedings will be published by Springer's Lecture Notes in Computer Science Series From srgadmin at cs.hku.hk Mon Dec 20 17:46:40 2004 From: srgadmin at cs.hku.hk (srg-admin) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] CFP: MPPS05 (New Submission Deadline) Message-ID: <41C78080.5050209@cs.hku.hk> ******************** NEW SUBMISSION DEADLINE *************************** Call for Papers The First International Workshop on Mobility in Peer-to-peer Systems (MPPS05) (http://mx.nthu.edu.tw/~ctking/MPPS05.htm) in conjunction with The 25th IEEE International Conference on Distributed Computing Systems (ICDCS 2005) June 6-9, 2005 Columbus, Ohio, USA ----------------------------------------------------------------------------------- THEME Peer-to-peer (P2P) has emerged as a promising paradigm for developing large-scale distributed systems. P2P systems are characterized as being fully decentralized, self-organizing, and self-repairing. Early P2P systems were designed with an Internet-like network infrastructure in mind. As the emergence and prevalence of new wireless networking techniques, such as wireless mesh networks, wireless LANs, and 3G cellular networks, the need to move P2P paradigm into wireless networking and to support mobile computing is increasing. How does a P2P system encompass wired and heterogeneous wireless networks? How does a P2P system exploit and aggregate the resources in such an environment? How should a P2P system manage node mobility? How should a mobile P2P system ensure security and privacy? These issues become very challenging. The goal of the workshop is to examine the mobility issues in P2P systems over heterogeneous wired/wireless networks. TOPICS Topics of interest include, but are not limited to: * P2P systems over wireless mesh and ad hoc networks * mobile P2P applications and systems * power-aware and energy-efficient mobile P2P systems * topology-aware mobile P2P systems * mobility and resource management in mobile P2P systems * P2P systems for wireless grid * security in mobile P2P systems * trust and access control in mobile P2P systems * anonymity and anti-censorship for mobile P2P systems * modeling and analysis for mobile P2P systems PAPER SUBMISSION Authors are invited to submit an electronic version of original, unpublished manuscripts, not to exceed 20 double-spaced pages, to ctking@mx.nthu.edu.tw. Submissions should be in PDF or Postscript format. Submissions must be received by the deadline. 
All submitted papers will be refereed by reviewers in terms of originality, contribution, correctness, and presentation. The accepted papers will be published by the IEEE Computer Society Press. IMPORTANT DATES - Paper submission: December 31, 2004 (extended) - Author notification: February 1, 2005 - Final Manuscript: March 1, 2005 ORGANIZATION Workshop Chair: Lionel M. Ni, Hong Kong University of Science and Technology, Hong Kong Program Chairs: Chung-Ta King, National Tsing Hua University, Taiwan Jie Wu, Florida Atlantic University, USA Program Committee: James Aspnes, Yale University, U.S.A. Jiannong Cao, Hong Kong Polytechnic University, Hong Kong Bernady O. Apduhan, Kyushu Sangyo University, Japan Dan Grigoras, University College Cork, Ireland Hung-Chang Hsiao, National Tsing Hua University, Taiwan Charlie Hu, Purdue University, U.S.A. Yiming Hu, University of Cincinnati, U.S.A. Jehn-Ruey Jiang, National Central University, Taiwan Fabian Kuhn, ETH, Zurich Xiaoming Li, Peking University Yunhao Liu, Hong Kong University of Science and Technology, Hong Kong Chunqiang Tang, IBM Watson Research Center, U.S.A. Li Xiao, Michigan State University, U.S.A. Aaron Zollinger, ETH, Zurich ------------------------------------------------ From andrewxwang at yahoo.com.tw Mon Dec 20 22:05:15 2004 From: andrewxwang at yahoo.com.tw (Andrew Wang) Date: Wed Nov 25 01:03:39 2009 Subject: [Beowulf] OpenPBS vs Condor? In-Reply-To: <20041220041918.GA21604@cobalt.cs.wisc.edu> Message-ID: <20041221060515.87049.qmail@web18007.mail.tpe.yahoo.com> A lot of people don't trust Condor because you guys (u of wisc) said to opensource it for years, but we never see a single line of the Condor source! Andrew. --- Erik Paulson ªº°T®§¡G > Or stick to Condor - you don't give any examples of > what > you want to do, so it's hard to say if Condor will > work > for you or not. > > -Erik > > > > Andrew. > > > > > > --- Kshitij Sanghi > > ???T???G > > > Hi, > > > > > > I'm new to Linux clusters. I wanted to know how > does > > > OpenPBS compare with Condor. We already have a > small > > > grid running Condor but wanted some scheduling > > > facilities for our jobs. Is it necessary to > shift to > > > OpenPBS to provide job scheduling or will Condor > do? > > > The scheduler we require is nothing complex even > the > > > most basic one would do. > > > > > > Thanks and Regards, > > > Kshitij > > > > > > > > > > > > > > > ___________________________________________________________ > > > > > > Win a castle for NYE with your mates and Yahoo! 
> > > Messenger > > > http://uk.messenger.yahoo.com > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or > > > unsubscribe) visit > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > _______________________________________________________________________ > > Yahoo!?_???q?l?H?c > > 250MB ?W?j?K?O?H?c?A?H???A?h?]?????I > > http://mail.yahoo.com.tw/ > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________________________________ Yahoo!©_¼¯¹q¤l«H½c 250MB ¶W¤j§K¶O«H½c¡A«H¥ó¦A¦h¤]¤£©È¡I http://mail.yahoo.com.tw/ From henrique at dmo.fee.unicamp.br Tue Dec 21 10:43:16 2004 From: henrique at dmo.fee.unicamp.br (Carlos Henrique da Silva Santos) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] LAM-MPI problem Message-ID: Dears. I installed a LAM-MPI in my Fedora Machine. I configured the RSH service to remote access and I give permission on /etc/syconfig/iptables. Now, Im running LAM-MPI in the Localhost and the RSH is correct to remote access, but when I give de command "lamboot -v lamhosts", the system request a password. Can you help me with this problem? Configured Archieves: - /etc/securetty - /etc/hosts - /etc/fstab - /etc/exports - $HOME/.rhosts Thanks. Best Regards. ============================================= Carlos Henrique da Silva Santos Master Degree Student State University of Campinas - Unicamp Departament of Microwaves and Optic ============================================= From m.dierks at skynet.be Tue Dec 21 14:31:54 2004 From: m.dierks at skynet.be (Michel Dierks) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Traduction Beowulf history Message-ID: <41C8A45A.7010504@skynet.be> Hello, I ask me if a traduction in french of the "Beowulf history by Phil Merkey" web page from your site is available? Of perhaps anyone who traduce this, will distribute it? Thanks in advance for response. Michel Dierks - Belgium From steve_heaton at ozemail.com.au Tue Dec 21 18:40:19 2004 From: steve_heaton at ozemail.com.au (steve_heaton@ozemail.com.au) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Benchmark reality check please Message-ID: <20041222024019.SRLV21642.swebmail02.mail.ozemail.net@localhost> G'day all I'm not looking to start another fight over benchmarking... really I'm not! ;) I've got my beastie to the point that I'd like to establish a performance *baseline* for the current rig. I've read a lot on the various benchmark suites, approaches etc and became horribly confused. Now that the mental mindstorm has settled somewhat I'd appreciate comments from the collective minds. NASA call this process a "focus review", everybody else calls it a "reality check" ;) The goal is to come up with some numbers (and maybe some cute, simple graphs) that show how my 4 node, 8 CPU, cluster performs. I don't want to spend weeks setting up the benchmark suite if I can avoid it. The 96point flashing neon number will be the wall clock time on a sample Nbody run. 
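Capturing that headline number cleanly takes only a few lines around the solver: barrier, MPI_Wtime(), run, MPI_Wtime(), and report the slowest rank. A sketch, with the actual N-body integrator reduced to a sleep() stub:

/* walltime.c - bracket the real work with MPI_Wtime() and report the
   slowest rank, which is the number the wall clock actually shows.
   nbody_run() is a stub standing in for the real integrator. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

static void nbody_run(void) { sleep(2); }   /* pretend to compute */

int main(int argc, char **argv)
{
    int rank;
    double t0, t1, elapsed, slowest;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);        /* common starting line */
    t0 = MPI_Wtime();
    nbody_run();
    t1 = MPI_Wtime();
    elapsed = t1 - t0;

    /* the run is only as fast as its slowest rank */
    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("wall clock for this run: %.3f s\n", slowest);

    MPI_Finalize();
    return 0;
}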
That's why this Beowulf was built :) However, I'd also like some numbers that provide details on the components underlying that number. I'll give you some ideas of the granulation I'd like and possible comparison points. Things like raw CPU grunt (a 500MHz P3 v's 3GHz P4), cache levels (P3 600MHz with 128 v's 500MHz 512MB L2), System RAM (100MHz v's 133MHz, ECC's v's not and 128 v's 256MB/node), FastEther v's GigaEther interconnect. My proposed benchmark suite looks like this: That RGB fellow's BenchMaster suite would seem to give the CPU/RAM side of things a good workout. I'll give that a burl. Until recently LMBench seemed the go but BenchMaster seems to be a step up (more flexible)? (Someone *other* than RGB's opinion would be nice =P ) Maybe the Netpipe suit for all sorts of juicy network numbers? I'm not looking to kick heads on compilers. Ye olde "g" compilers will be the start point. Maybe a review of the Intel flavours (if they're still free for non-profit/personal/educational type usage) later on. If I've got to pay for more than download bandwidth then it's out. I'm eating tomato sauce sandwiches after the hardware purchases as it is! :) I was thinking about the various MPI options and how to put them through the wringer too. I suspect the wall clock time on the Nbody runs will be the easiest number. What will I do with the results? As I mentioned, this is a baseline ie. What do the numbers look like now? ...then it's off to the sandpit... Tweaking, tweaking and more tweaking. Running the same benchmark suite as I go. Was the tweak a net gain? You, know... the sort of thing that **real** benchmarks are used for! =) Thoughts, comments appreciated as always. Cheers Stevo This message was sent through MyMail http://www.mymail.com.au From akhtar_samo at yahoo.com Tue Dec 21 21:54:02 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Monitoring & using Cluster thru Internet Message-ID: <20041222055402.47486.qmail@web20026.mail.yahoo.com> Hi, How can I monitor & use my cluster ( 4 node Red hat LINUX Cluster ) thru internet(webservices). Akhtar --------------------------------- Do you Yahoo!? Send a seasonal email greeting and help others. Do good. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041221/b22bc7f5/attachment.html From zogas at upatras.gr Wed Dec 22 02:46:17 2004 From: zogas at upatras.gr (Stavros E. Zogas) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Cluster computing Message-ID: <000f01c4e813$77428f90$63ae8c96@zogas> I am interesting for cluster computing topics.I want to set up a cluster from the beginning! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041222/4af60e89/attachment.html From rgb at phy.duke.edu Wed Dec 22 19:55:36 2004 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Benchmark reality check please In-Reply-To: <20041222024019.SRLV21642.swebmail02.mail.ozemail.net@localhost> References: <20041222024019.SRLV21642.swebmail02.mail.ozemail.net@localhost> Message-ID: On Wed, 22 Dec 2004 steve_heaton@ozemail.com.au wrote: > That RGB fellow's BenchMaster suite would seem to give the CPU/RAM > side of things a good workout. I'll give that a burl. Until recently > LMBench seemed the go but BenchMaster seems to be a step up (more > flexible)? 
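For the CPU/RAM side, even a crude sweep over working-set sizes shows where L1 and L2 run out and main memory takes over, which is most of what separates a 512K-cache P3 from a 128K one. A rough sketch only; the suites named above do this far more carefully:

/* memsweep.c - time a simple read/accumulate pass over growing buffers.
   Crude, but the knee in MB/s versus size lands at the cache boundaries. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    size_t size;
    for (size = 4 * 1024; size <= 16 * 1024 * 1024; size *= 2) {
        size_t n = size / sizeof(double), i;
        int rep, reps = (int)(64 * 1024 * 1024 / size) + 1;
        double *a = malloc(n * sizeof(double));
        volatile double sum = 0.0;   /* volatile keeps the loop honest */
        double t;

        for (i = 0; i < n; i++) a[i] = 1.0;   /* touch and fill the buffer */

        t = now();
        for (rep = 0; rep < reps; rep++)
            for (i = 0; i < n; i++) sum += a[i];
        t = now() - t;

        printf("%8lu KB : %7.1f MB/s\n", (unsigned long)(size / 1024),
               (double)size * reps / t / 1e6);
        free(a);
    }
    return 0;
}

Plotting MB/s against size gives a poor man's version of the memory curves LMBench and the benchmaster suite report properly.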
(Someone *other* than RGB's opinion would be nice =P ) Maybe > the Netpipe suit for all sorts of juicy network numbers? Fine, be that way;-) Actually, lmbench is without a doubt a superior benchmark from a technical point of view. It's just a PITA to download, build, and run, as you have to go through bitkeeper and use a networked SCCS auto-extraction tool. Mine you can just grab and go -- if you're running one of the distros I've built for you can probably even install the rpm and just run it, if you aren't going to play the compiler tuning optimization game. lmbench, OTOH, gives you LOTS of very precise measurements on EVERYTHING, and they've been tuning their timing harness MUCH longer than I have (to be fair to Larry and Carl). rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From james.p.lux at jpl.nasa.gov Wed Dec 22 20:06:07 2004 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Benchmark reality check please References: <20041222024019.SRLV21642.swebmail02.mail.ozemail.net@localhost> Message-ID: <000401c4e8a4$bb208e10$32a8a8c0@LAPTOP152422> ----- Original Message ----- From: To: Sent: Tuesday, December 21, 2004 6:40 PM Subject: [Beowulf] Benchmark reality check please > G'day all > > I'm not looking to start another fight over benchmarking... really I'm not! ;) > > > My proposed benchmark suite looks like this: > > That RGB fellow's BenchMaster suite would seem to give the CPU/RAM side of things a good workout. I'll give that a burl. Until recently LMBench seemed the go but BenchMaster seems to be a step up (more flexible)? (Someone *other* than RGB's opinion would be nice =P ) Maybe the Netpipe suit for all sorts of juicy network numbers? > > I'm not looking to kick heads on compilers. Ye olde "g" compilers will be the start point. Maybe a review of the Intel flavours (if they're still free for non-profit/personal/educational type usage) later on. If I've got to pay for more than download bandwidth then it's out. I'm eating tomato sauce sandwiches after the hardware purchases as it is! :) > > I was thinking about the various MPI options and how to put them through the wringer too. I suspect the wall clock time on the Nbody runs will be the easiest number. > > What will I do with the results? As I mentioned, this is a baseline ie. What do the numbers look like now? ...then it's off to the sandpit... Tweaking, tweaking and more tweaking. Running the same benchmark suite as I go. Was the tweak a net gain? You, know... the sort of thing that **real** benchmarks are used for! =) I've always been intrigued by the HINT benchmark stuff. It produces a graph of (essentially) speed vs problem size, and shows up things like where the cache starts missing, etc. I haven't seen much recent development on HINT. The only reference I could find recently was a mirror of the old HINT website. Basically, the problem is a numerical integration of something, where the step size is continually made finer and finer (i.e. more and more slices) I just googled and found: http://hint.byu.edu/ ... there's a PVM version at http://hint.byu.edu/pub/HINT/source/parallel/pvm/ See also: http://hint.byu.edu/pub/HINT/source/doc/porting-guide.html > > Thoughts, comments appreciated as always. 
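The juicy network numbers come down to one measurement repeated at different message sizes: a ping-pong between two ranks, which is essentially what netpipe automates. A minimal sketch, to be run with at least two ranks; the sizes and repetition counts are arbitrary:

/* pingpong.c - round-trip latency and bandwidth between ranks 0 and 1. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, s;
    int sizes[] = { 0, 64, 1024, 65536, 1048576 };
    int reps = 200;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (s = 0; s < 5; s++) {
        int n = sizes[s], r;
        char *buf = malloc(n > 0 ? n : 1);
        double t0, dt;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (r = 0; r < reps; r++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        dt = (MPI_Wtime() - t0) / reps;      /* average round-trip time */
        if (rank == 0)
            printf("%8d bytes : %8.1f us RTT, %8.2f MB/s one-way\n",
                   n, dt * 1e6, n > 0 ? 2.0 * n / dt / 1e6 : 0.0);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}

The zero-byte round trip is the latency figure, the 1 MB line sits close to wire bandwidth, and differences between FastE and GigE, or between MPI stacks, show up immediately.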
> > Cheers > Stevo > > This message was sent through MyMail http://www.mymail.com.au > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Raymond.Norris at mathworks.com Wed Dec 22 20:16:39 2004 From: Raymond.Norris at mathworks.com (Raymond Norris) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] $2500 cluster. What it's good for? Message-ID: Hi Jim- I apologize for the top post. My mail is not showing well who is replying to whom. I wanted to make a slight clarification on your comment about licensing and cost of our distributed computing tools. Previously, you would have had to pay full price for MATLAB plus any toolboxes needed for each node. We have now released two products, the Distributed Computing Toolbox (client) and the MATLAB Distributed Computing Engine (engine). The client is sold per user, similar to a typical toolbox. Each engine is sold in packs (8, 16, 32, etc). A node typically runs one engine (though it could run more), so for each node that is running an engine, it is consuming a license from the pack. However, for those who are familiar with the MathWorks pricing structure, you will see that the average cost of an engine is less than the cost of a single copy of MATLAB, with the average cost per engine dropping as the number of engines per pack goes up. In addition, the engine is granted full use of any toolboxes that the client is licensed for (with the exception of code generation toolboxes) at no extra charge. Regards, Raymond The MathWorks, Inc. -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jim Lux Sent: Monday, December 20, 2004 10:07 AM To: Robert G. Brown Cc: beowulf@beowulf.org Subject: Re: [Beowulf] $2500 cluster. What it's good for? ----- Original Message ----- From: "Robert G. Brown" To: "Jim Lux" Cc: "Douglas Eadline, Cluster World Magazine" ; Sent: Monday, December 20, 2004 6:55 AM Subject: Re: [Beowulf] $2500 cluster. What it's good for? > On Sun, 19 Dec 2004, Jim Lux wrote: > > > This brings up an interesting optimization question. Just like in many > > things (I'm thinking RF amplifiers in specific) it's generally cheaper/more > > > This has actually been discussed on list several times, and some actual > answers posted. The interesting thing is that it is susceptible to > algebraic analysis and can actually be answered, at least in a best > approximation (since there are partially stochastic delays that > contribute to the actual optimal solution). > > "TCO". Gawd, I hate that term, because it is much-abused by > marketeers, but truly it IS something to think about. There are > (economic) risks associated with building a cluster with bleeding-edge > technology. There are risks associated with mixing hardware from many > low-bid vendors. There are administrative costs (sometimes big ones) > associated from mixing hardware architectures, even generally similar > ones such as Intel and AMD or i386 and X86_64. Maintenance costs are > sometimes as important to consider as pure Moore's Law and hardware > costs. Human time requirements can vary wildly and are often neglected > when doing the CBA for a cluster. And TCO with bleeding edge equipment is where the one vs many managment problem becomes so important. Managing the idiosyncracies of one high end machine may be within the realm of possibility. 
Managing 8/16/1024 is probably unreasonable. So, as you point out, there's a value/cost that can be associated with various generations of equipment with less bleeding edge generally being lower cost (and the ever present potential for "having a bad day" and getting a zillion copies of an unreliable component). > > Infrastructure costs are also an important specific factor in TCO. In > fact, they (plus Moore's Law) tend to put an absolute upper bound on the > useful lifetime of any given cluster node. Node power consumption (per > CPU) scales up, but it seems to be following a much slower curve than > Moore's Law -- slower than linear. A "node CPU" has cost in the > ballpark of 100W form quite a few years now -- a bit over 100W for the > highest clock highest end nodes, but well short of the MW that would be > required if they followed anything like a ML trajectory from e.g. the > original IBM PC. Consequently, just the cost of the >>power<< to run > and cool older nodes at some point exceeds the cost of buying and > running a single new node of equivalent aggregate compute power. This > is probably the most predictable point of all -- a sort of "corallary" > to Moore's Law. If one assumes a node cost of $1000/CPU and a node > power cost of $100/year (for 100W nodes) and a ML doubling time of 18 > months, then sometime between year four and year six -- depending on the > particular discrete jumps -- it will be break even to buy a new node for > $1000 and pay $100 for its power versus operate 11 nodes for the year. I'm going to guess that the 100W number derives from two things: the desire to use existing power supply designs; and probably more important; the desire to use standard IEC power cords, which are limited to 7 Amps, and decent design practice which would limit the "real" load to roughly half that (say, 400-450W, peak, into the PS). There are other regulatory issues with building things that draw significant power. The 7Amp cordset drives component values and ratings for inexpensive components such as power switches, relays, fuses, etc. > > > > > I also have an interest in seeing a cluster version of Octave or SciLab > > > set to work like a server. (as I recall rgb had some reasons not to use > > > these high level tools, but we can save this discussion for later) > > > > I'd be real interested in this... Mathworks hasn't shown much interest in > > accomodating clusters in the Matlab model, and I spend a fair amount of time > > running Matlab code. > > I believe that there is an MPI library and some sort of compiler thing > for making your own libraries, though. I don't use the tool and don't > keep close track, although that will change next year as I'll be using > it in teaching. The real problem is that people who CAN program matlab > to do stuff in parallel aren't the people who are likely to use matlab > in the first place. And since matlab is far, far from open source -- > actually annoyingly expensive to run and carefully licensed -- the > people who might be the most inclined to invest the work don't/can't do > so in a way that is generally useful. I'll say it's annoyingly expensive.. from what I've been told, you need a license for each cpu of the cluster. That makes running matlab on the new JPL 1024 Xeon cluster a bit impractical. 
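The node-replacement break-even quoted above is quick to check numerically, using exactly the figures in the quote: $1000 per node, $100 per node-year of power, and an 18-month doubling time. A sketch:

/* breakeven.c - when does buying one new node beat powering the old ones?
   Inputs are the ones assumed in the quoted post, not measured values. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double node_cost = 1000.0, power_cost = 100.0, doubling = 1.5;
    int year;

    for (year = 1; year <= 8; year++) {
        double equiv_old = pow(2.0, year / doubling); /* old nodes one new node replaces */
        double keep_old  = equiv_old * power_cost;    /* power bill for the equivalent farm */
        double buy_new   = node_cost + power_cost;    /* new node plus its power, this year */
        printf("year %d: one new node ~ %5.1f old ones; keep old $%6.0f, buy new $%4.0f%s\n",
               year, equiv_old, keep_old, buy_new,
               keep_old >= buy_new ? "  <- new node wins" : "");
    }
    return 0;
}

Built with cc -o breakeven breakeven.c -lm, it puts the continuous crossover at about 5.2 years, with year six the first full year where the single new node wins outright, consistent with the quoted "between year four and year six".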
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hawaii2005 at vreme.yubc.net Thu Dec 23 01:34:58 2004 From: hawaii2005 at vreme.yubc.net (IPSI-2005 France and Spain) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Invitation to France and Spain, c/bb Message-ID: <200412230934.iBN9Yw4s015346@vreme.yubc.net> Dear potential speaker: On behalf of the organizing committee, I would like to extend a cordial invitation for you to attend one or both of the upcoming IPSI BgD multidisciplinary, interdisciplinary, and transdisciplinary conferences. They take place on two consecutive weekends, in two nearby locations (only about 200km away from each other, in the hills of France and on the coast of Spain). The first one will be in Carcassonne, France (near Toulouse): IPSI-2005 FRANCE (Carcassonne is a UNESCO World Heritage City) Hotel de la Cite (arrival: 23 April 05 / departure: 26 April 05) Deadlines: 27 December 04 (abstract) & 20 January 05 (full paper) The second one will be in Costa Brava, Spain (near Barcelona): IPSI-2005 SPAIN (S'Agaro is the Pearl of Costa Brava) Hostal de la Gavina (arrival: 28 April 05 / departure: 1 May 05) Deadlines: 04 January 05 (abstract) / 27 January 05 (full paper) All IPSI BgD conferences are non-profit. They bring together the elite of the world of science; so far, we have had seven Nobel Laureates speaking at the opening ceremonies. The conferences always take place in some of the most attractive places of the world. All those who come to IPSI conferences once, always love to come back (because of the unique professional quality and the extremely creative atmosphere); lists of past participants are on the web, as well as details of future conferences. These conferences are in line with the newest recommendations of the US National Science Foundation and of the EU research sponsoring agencies, to stress multidisciplinary, interdisciplinary, and transdisciplinary research (M.I.T. research). The speakers and activities at the conferences truly support this type of scientific interaction. Topics of interest include, but are not limited to: * Internet * Computer Science and Engineering * Mobile Communications/Computing for Science and Business * Management and Business Administration * Education * e-Medicine * e-Oriented Bio Engineering/Science and Molecular Engineering/Science * Environmental Protection * e-Economy * e-Law * Technology Based Art and Art to Inspire Technology Developments * Internet Psychology If you would like more information on either conference, please reply to this e-mail message. If you plan to submit an abstract and paper, please let us know immediately for planning purposes. Sincerely Yours, Prof. V. Milutinovic, Chairman IPSI BgD Conferences * * * CONTROLLING OUR E-MAILS TO YOU * * * If you would like to continue to be informed about future IPSI BgD conferences, please reply to this e-mail message with a subject line of SUBSCRIBE. If you would like to be removed from our mailing list, please reply to this e-mail message with a subject line of REMOVE. From cflau at clc.cuhk.edu.hk Wed Dec 22 20:44:28 2004 From: cflau at clc.cuhk.edu.hk (John Lau) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] MPICH with ICC 8.1 Message-ID: <1103777068.2841.51.camel@nuts.clc.cuhk.edu.hk> Hi, I can't compile MPICH 1.2.5.2 with the new Intel compiler 8.1. 
I used to compile it successfully with the ICC 8.0. This is the error message: /usr/intel/intel_cc_80/bin/icc -I. -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src -I../include -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/include -I.. -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/include -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/icc8/include -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/icc8/mpid/ch_p4 -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/mpid/util -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/mpid/ch_p4 -fPIC -O2 -mcpu=pentium4 -gcc-version=320 -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR -DHAVE_MPICHCONF_H -c /usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src/addressf.c /usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src/addressf.c(25): error: identifier "mpi_address_" is undefined #pragma weak mpi_address_ = pmpi_address_ I think it is related to the weak symbol defination checking. Because when I put the function defination before weak symbol defination, it can be compiled with ICC 8.1. So is there any compiler option or workaround for ICC 8.1, so that I dont need to change the sources? Thanks in advance. Best regards, John Lau -- John Lau Chi Fai cflau@clc.cuhk.edu.hk Software Engineer Center for Large-Scale Computation From laytonjb at charter.net Thu Dec 23 13:35:40 2004 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] RLX is leaving hardware business Message-ID: <41CB3A2C.8080109@charter.net> Thought everyone might be interested in this: http://www.theregister.co.uk/2004/12/23/rlx_exits_hardware/ RLX will still be around - just no hardware. Jeff From djholm at fnal.gov Fri Dec 24 07:01:37 2004 From: djholm at fnal.gov (Don Holmgren) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] MPICH with ICC 8.1 In-Reply-To: <1103777068.2841.51.camel@nuts.clc.cuhk.edu.hk> References: <1103777068.2841.51.camel@nuts.clc.cuhk.edu.hk> Message-ID: If you can do without weak symbols for the MPICH profiling interface (i.e., when you want to profile, your code would have to preface the names of mpi calls with "p", as in pmpi_wait instead of mpi_wait), then during the configure step add the switch --disable-weak-symbols I haven't found a compiler option, only this MPICH build option. Don Holmgren Fermilab On Thu, 23 Dec 2004, John Lau wrote: > Hi, > > I can't compile MPICH 1.2.5.2 with the new Intel compiler 8.1. I used to > compile it successfully with the ICC 8.0. This is the error message: > > /usr/intel/intel_cc_80/bin/icc -I. > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src > -I../include > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/include -I.. > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/include > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/icc8/include > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/icc8/mpid/ch_p4 > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/mpid/util > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/mpid/ch_p4 -fPIC -O2 > -mcpu=pentium4 -gcc-version=320 -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR > -DHAVE_MPICHCONF_H > -c /usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src/addressf.c > /usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src/addressf.c(25): error: identifier "mpi_address_" is undefined > #pragma weak mpi_address_ = pmpi_address_ > > I think it is related to the weak symbol defination checking. Because > when I put the function defination before weak symbol defination, it can > be compiled with ICC 8.1. 
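For anyone following along, the pragma in question is how MPICH builds its profiling interface: each mpi_xxx_ entry point is declared as a weak alias of the real pmpi_xxx_ routine. A minimal stand-alone sketch of the pattern, and of the reordering described above, is below; the names are made up for illustration and this is not MPICH source.

/* weak_demo.c - toy illustration of the weak-alias pattern; mpi_demo_ and
 * pmpi_demo_ are invented names, not MPICH symbols.
 * Build: cc -o weak_demo weak_demo.c
 */
#include <stdio.h>

void pmpi_demo_(void)                 /* the "real" implementation */
{
    printf("pmpi_demo_ called\n");
}

void mpi_demo_(void);                 /* declare the alias before the pragma;
                                         this is the reordering that reportedly
                                         satisfies ICC 8.1 */
#pragma weak mpi_demo_ = pmpi_demo_   /* mpi_demo_ becomes a weak alias */

int main(void)
{
    mpi_demo_();                      /* resolves to pmpi_demo_ */
    return 0;
}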
> > So is there any compiler option or workaround for ICC 8.1, so that I > dont need to change the sources? > > Thanks in advance. > > Best regards, > John Lau > -- > John Lau Chi Fai > cflau@clc.cuhk.edu.hk > Software Engineer > Center for Large-Scale Computation > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From djholm at fnal.gov Sat Dec 25 10:19:56 2004 From: djholm at fnal.gov (Don Holmgren) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] MPICH with ICC 8.1 In-Reply-To: References: <1103777068.2841.51.camel@nuts.clc.cuhk.edu.hk> Message-ID: Oops - sorry, I just showed my complete ignorance of the profiling interface. The configure option --disable-weak-symbols will give you separate libpmpich.a and libmpich.a libraries, containing respectively the "PMPI_xxx" and "MPI_xxx" versions of the code. The "PMPI_xxx" versions are the "real" versions. The "MPI_xxx" versions allow implementation of routines which intercept a given call, say to do profiling, and which also call the "PMPI_xxx" versions. If I'm interpreting the documentation correctly, with "--disable-weak-symbols" if you want to profile some calls and not other calls, you'd have to use a link command something like: cc ... -lprof -lpmpi -lmpi where libprof.a contains profiling versions of selected routines, eg, MPI_Send, which in turn call the base version, PMPI_Send, resolved in libpmpi.a. Other routines not defined in libprof.a would be resolved in libmpi. With weak symbols, on the other hand, it is sufficient to use cc ... -lprof -lmpi and then an "MPI_xxx" symbol not defined in libprof.a will be resolved by "PMPI_xxx" in libmpi.a See, for example, http://www.netlib.org/utk/papers/mpi-book/node190.html I just checked on two versions of mvapich on one of my machines, one built with the Intel 7.1 compiler and the other with the 8.1 compiler. The latter required "--disable-weak-symbols". I verified that mpif77, for example, explicitly linked with "-lpmpich -lmpich" on the 8.1 version, but only with "-lmpich" on the 7.1 version. Don Holmgren Fermilab On Fri, 24 Dec 2004, Don Holmgren wrote: > > If you can do without weak symbols for the MPICH profiling interface > (i.e., when you want to profile, your code would have to preface the > names of mpi calls with "p", as in pmpi_wait instead of mpi_wait), then > during the configure step add the switch > > --disable-weak-symbols > > I haven't found a compiler option, only this MPICH build option. > > Don Holmgren > Fermilab > > > > On Thu, 23 Dec 2004, John Lau wrote: > > > Hi, > > > > I can't compile MPICH 1.2.5.2 with the new Intel compiler 8.1. I used to > > compile it successfully with the ICC 8.0. This is the error message: > > > > /usr/intel/intel_cc_80/bin/icc -I. > > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src > > -I../include > > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/include -I.. 
> > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/include > > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/icc8/include > > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/icc8/mpid/ch_p4 > > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/mpid/util > > -I/usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/mpid/ch_p4 -fPIC -O2 > > -mcpu=pentium4 -gcc-version=320 -DUSE_SOCKLEN_T -DUSE_U_INT_FOR_XDR > > -DHAVE_MPICHCONF_H > > -c /usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src/addressf.c > > /usr/src/redhat/BUILD/mpich-1.2.5.2-8chess/src/fortran/src/addressf.c(25): error: identifier "mpi_address_" is undefined > > #pragma weak mpi_address_ = pmpi_address_ > > > > I think it is related to the weak symbol defination checking. Because > > when I put the function defination before weak symbol defination, it can > > be compiled with ICC 8.1. > > > > So is there any compiler option or workaround for ICC 8.1, so that I > > dont need to change the sources? > > > > Thanks in advance. > > > > Best regards, > > John Lau > > -- > > John Lau Chi Fai > > cflau@clc.cuhk.edu.hk > > Software Engineer > > Center for Large-Scale Computation > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From landman at scalableinformatics.com Sun Dec 26 12:59:18 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] quick note on Redhat NFS issues with NAS units Message-ID: <41CF2626.2050005@scalableinformatics.com> Folks: Been looking into why a Redhat EL3 WS x86_64 client hangs when accessing a NAS based upon SuSE 9x. Turns out there are two problems. I can reliably cause the problem to appear/dissappear on my test hardware, and I thought others on this group would like to see what I did to make the problems dissappear. Problem manifests itself with RedHat EL3 WS x86_64 clients. I have not been able to replicate it with non-RHEL3 based clients, on the same hardware, including FC2/FC3/Ubuntu/SuSE9x/... Problem does not show up in 32 bit mode from what I can tell (need more testing but preliminary data seems to support this). Motherboard are Tyan s288x units. All have the Broadcom chipset ethernets. By default RedHat installs tg3 kernel modules to drive these chips. I have not been able to make the problem go away using the tg3 driver. So I replaced the tg3 driver with the bcm570x driver from Broadcom's download site. This did not make the problem go away, though NFS mount now respected the intr option (did not with the tg3). Next, I changed from udp to tcp. The original fstab line was 192.168.2.17:/big /big nfs udp,intr,bg 0 0 and the new one is 192.168.2.17:/big /big nfs tcp,intr,bg,wsize=32768,rsize=32768 0 0 MTU changes did not affect the results (though they did improve on some of the test timing). 
Without both of these changes, the RedHat client hangs with a simple ls (btw: strace is your friend) [root@hammer root]# strace ls /big execve("/bin/ls", ["ls", "/big"], [/* 28 vars */]) = 0 uname({sys="Linux", node="hammer.scalableinformatics.com", ...}) = 0 brk(0) = 0x513000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9566c000 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 ... stat("/big", {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 open("/big", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3 fstat(3, {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 getdents64(3, With the changes, it ls'es quite nicely with no hangs. I can reliably and repeatably get the hang condition by switching back to udp (and the other mount line). I can reliably and repeatedly get the hang condition by switching back to the tg3 driver. This occurred with the Rocks toolkit (based upon RHEL3 WS). The workaround involves using our finishing scripts. I figured I would share the solution, as I spent a bit of time tracking it down and trying to reproduce it and solve it. Joe ps: if there are some Redhat people reading the list, you know, we would like some modern kernels, and not lots of backported stuff, not to mention xfs, and other goodies ... (yeah, I know, wait till EL4, ...) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 612 4615 From fuerforen at gmx.de Sat Dec 25 12:53:03 2004 From: fuerforen at gmx.de (fuerforen@gmx.de) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Beowulf test Software Message-ID: <4094.1104007983@www56.gmx.net> Hi all, I'm searching for software to test my Beowulf cluster, like mp3pvm, which seems to have disappeared from the net. Does anybody know of software I could use for this, before I program my own? Greetings, Ingo -- +++ Save with GMX DSL +++ http://www.gmx.net/de/go/dsl PROMOTION for switchers: DSL rates from 3.99 EUR/month + starting credit From akhtar_samo at yahoo.com Mon Dec 27 00:55:08 2004 From: akhtar_samo at yahoo.com (akhtar Rasool) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Error while tstmachines still not solved In-Reply-To: <41BBBF21.6020304@verizon.net> Message-ID: <20041227085508.57475.qmail@web20022.mail.yahoo.com> Hi, Actually, MPICH is installed on the root (server) node; how would the other nodes be able to see the path to the MPI binaries? As you have written, let me know how the nodes would be able to see the executable program and the MPI libraries. Whatever MPI program I am executing gives output, but the wall clock time increases as the -np argument value increases, because the tasks aren't running on the other nodes, only on the server. I am using a 2-node Linux 9 cluster and MPICH 1.2.5.2 as the MPI implementation. I have to present my project on 30th December, so kindly help solve the problem. Akhtar Glen Gardner wrote: The error in the 5th step is caused by a chatty login message. This makes MPI complain but it ought to work anyway. You want to turn off the motd, and if using FreeBSD create a file called ".hushlogin" and put it in the user's home directory. The next error has to do with the paths to MPICH and to the program being launched. All the nodes need to be able to "see" the MPI binaries and need to be able to see the executable program. The paths to MPI and to the program being launched need to be the same for all nodes and for the root node.
Make sure the path is set up properly in the environment. You may need to check your mount points and set up NFS properly. The last one probably has to do with name resolution. The root node usually won't need to be in the machines.LINUX file, but all other nodes need to be. I believe you need to list machines by hostname, not IP addresses, so be sure that both machines have the same hostfile, same .rhosts, etc. Glen The next message indicates that the path to the executable "mpichfoo" was not found. akhtar Rasool wrote: After the extraction of MPICH in /usr/local 1- tcsh 2- ./configure --with-comm=shared --prefix=/usr/local 3- make 4- make install 5- util/tstmachines In the 5th step the error was: Errors while trying to run rsh 192.168.0.25 -n /bin/ls /usr/local/mpich/mpich-1.2.5.2/mpichfoo unexpected response from 192.168.0.25 n > /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: n no such file or directory The ls test failed on some machines. This usually means that you do not have a common filesystem on all of the machines in your machines list; MPICH requires this for mpirun (it is possible to handle this in a procgroup file; see the ...) Other possible problems include: - The remote shell command rsh does not allow you to run ls. See the documentation about remote shells and rhosts. - You have a common filesystem, but with inconsistent names. See the documentation on the automounter fix. 1 error was encountered while testing the machines list for LINUX; only these machines seem to be available: host1 Now, since this is only a two-node cluster, host1 is the server onto which MPICH is being installed, and 192.168.0.25 is the client. rsh on both nodes is logging in freely. On the server side the file "machines.LINUX" contains -192.168.0.25 -host1 Kindly help. Akhtar --------------------------------- Do you Yahoo!? The all-new My Yahoo! - What will yours do? --------------------------------- _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen E. Gardner, Jr. AA8C AMSAT MEMBER 10593 Glen.Gardner@verizon.net http://members.bellatlantic.net/~vze24qhw/index.html --------------------------------- Do you Yahoo!? Dress up your holiday email, Hollywood style. Learn more. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041227/0a9102c3/attachment.html From henry.gabb at intel.com Mon Dec 27 12:51:37 2004 From: henry.gabb at intel.com (Gabb, Henry) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] RE: MPICH with ICC 8.1 Message-ID: Hello John, I was able to build MPICH-1.2.5.2 without any problems using the Intel 8.1 compilers. The package IDs on my system are l_fc_p_8.1.018 and l_cc_p_8.1.021. What compiler packages are you using? The -V option will give you this information. Here is the sequence of commands that I used to build MPICH: > source /opt/intel_fc_81/bin/ifortvars.sh > source /opt/intel_cc_81/bin/iccvars.sh > export FC=ifort > export CC=icc > configure >& configure.log > make >& make.log I checked the configure and make logs and there were no errors or problems. The mpif77 script compiled ./mpich-1.2.5.2/examples/basic/fpi.f without problems. I performed my tests on an Itanium cluster running Red Hat EL 3.
Best regards, Henry Gabb Intel Parallel and Distributed Solutions Division From knagaraj at cs.rutgers.edu Mon Dec 27 16:10:32 2004 From: knagaraj at cs.rutgers.edu (Kiran Nagaraja) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Systems administration survey Message-ID: <41D0A478.9000507@cs.rutgers.edu> Hi, We are a systems research group at the Computer Science department at Rutgers University, and are conducting a survey to understand details about network, systems and database administration. We hope that this information would help us recreate a realistic environment to help research in 'systems management'. We request network, systems, and database administrators to take this survey. As an incentive, all surveys completed in their entirety will be entered into a drawing of a number of $50 gift certificates (from Amazon.com). We hope you have few minutes to take the survey which is located at: http://vivo.cs.rutgers.edu/administration_survey.html Research in our group: The goal of our research is to improve the overall availability and maintainability of services. Since administrators form an integral part of these services, a key aspect of this work is to build environments and tools that ease the task of service administration. In particular, environments which would help administrators know how their actions might impact the real service (before performing them for real), we believe, would be useful in preventing inadvertent actions. This survey tries to understand the existing environments, what administrators do currently to test the 'validity' of their actions, and the difficulties they face in doing so. The two specific systems we are looking at are networks and databases, as we believe these are important components of many services. If you have any questions regarding this survey or our work, feel free to email us: Kiran Nagaraja (knagaraj@cs.rutgers.edu), or Fabio Oliveira (fabiool@cs.rutgers.edu) Thanks for your time, Kiran Nagaraja Graduate student, Vivo Research Group (http://vivo.cs.rutgers.edu) Rutgers University. From agrajag at dragaera.net Tue Dec 28 06:44:54 2004 From: agrajag at dragaera.net (Sean Dilda) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] quick note on Redhat NFS issues with NAS units In-Reply-To: <41CF2626.2050005@scalableinformatics.com> References: <41CF2626.2050005@scalableinformatics.com> Message-ID: <41D17166.7000008@dragaera.net> Joe Landman wrote: > Folks: > > Been looking into why a Redhat EL3 WS x86_64 client hangs when > accessing a NAS based upon SuSE 9x. Turns out there are two > problems. I can reliably cause the problem to appear/dissappear on my > test hardware, and I thought others on this group would like to see > what I did to make the problems dissappear. > > Problem manifests itself with RedHat EL3 WS x86_64 clients. I have > not been able to replicate it with non-RHEL3 based clients, on the > same hardware, including FC2/FC3/Ubuntu/SuSE9x/... Problem does not > show up in 32 bit mode from what I can tell (need more testing but > preliminary data seems to support this). Motherboard are Tyan s288x > units. All have the Broadcom chipset ethernets. > By default RedHat installs tg3 kernel modules to drive these chips. > I have not been able to make the problem go away using the tg3 > driver. So I replaced the tg3 driver with the bcm570x driver from > Broadcom's download site. This did not make the problem go away, > though NFS mount now respected the intr option (did not with the tg3). > Next, I changed from udp to tcp. 
The original fstab line was > > 192.168.2.17:/big /big nfs udp,intr,bg 0 0 > > and the new one is > > 192.168.2.17:/big /big nfs > tcp,intr,bg,wsize=32768,rsize=32768 0 0 > > MTU changes did not affect the results (though they did improve on > some of the test timing). I'm successfully using RHEL3 x86_64 NFS clients mounting from a RHEL3 x86 server. My only options are 'rsize=8192,wsize=8192', and jumbo frames are enabled. From alex at DSRLab.com Tue Dec 28 09:14:16 2004 From: alex at DSRLab.com (Alex Vrenios) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Re: [Linux-HA] Couldn't get watchdog to work Message-ID: <200412281707.iBSH77M6012538@bluewest.scyld.com> > -----Original Message----- > Paul Chen wrote: > > Both nodes did restart > > heartbeat but none of them reboot or shut down. Am I doing > > something wrong? > > > Alan Robertson wrote: > The watchdog timer will only kill the system if heartbeat goes insane. > It didn't. So, the watchdog timer is happy. > > At this point in time, the watchdog timer is not a > replacement for a STONITH device. > Which is exactly what I am looking into (the STONITH device)... I see two solutions, one hardware and one software. The hardware solution looks expensive, but I believe the software solution will help Mr. Chen (above), and would appreciate comments. I would have my "backup" system execute a command as part of its attempts to assume the identity, responsibilities and resources of the "primary" system. The command is run from backup, as follows: root@backup> ssh root@primary shutdown -h now This will not work in all cases, but it should work in cases like the above. A hardware solution is more general, but it doesn't hurt to run this command in any case. Alex Vrenios DSRLab From pmcdonnell at muncc.marmionacademy.org Tue Dec 28 14:05:08 2004 From: pmcdonnell at muncc.marmionacademy.org (Patrick McDonnell) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Benchmarking a Cluster Message-ID: <200412281605.09720.pmcdonnell@muncc.marmionacademy.org> Hi, My high school has been working on a small beowulf cluster, consisting of several old computers (and a couple newer ones). (More specs available on the website in my sig). While the cluster is by no means powerful enough to impress people with benchmarks, it would be nice to be "buzzword-compliant," and at least have some nice graphs showing "benchmarks." I am not at all familiar with benchmarking, on clusters or otherwise, so I'd appreciate any advice I can get. Basically, what benchmarking utilities are most appropriate, what's the best data to present, the best way to present that data, etc. Currently, I have MPICH and PVM setup and functioning across the cluster. I also have POVRAY-3.50c setup with the PVMPOV patch. So far, my best attempt at a benchmark has been to compare the amount of time it takes POV-Ray to finish rendering its benchmark scene on the head-node vs. head-node + 4 nodes. (approx. 7 minutes, btw). Anyway, I appreciate any help I can get. Thanks. -- Patrick McDonnell ----------------------------------- MUNCC 2 System Administrator http://muncc.marmionacademy.org/ pmcdonnell@muncc.marmionacademy.org ----------------------------------- From fant at pobox.com Tue Dec 28 14:15:14 2004 From: fant at pobox.com (Andrew Fant) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Distributed Parallel Caching Filesystems Message-ID: <41D1DAF2.3090801@pobox.com> Hello and happy holidays to all, I am looking around at high-performance i/o environments as my latest obsessions. 
I am hoping to have a couple of systems available to play with various options, and am looking to see if anyone has implemented a clustered file system that can take span multiple I/O servers with multiple network interfaces and cache to local systems based on locality of data (something akin to the way SGI did CC-Numa for memory in the big Origin 2000/3000 Systems). Coda seems to have some of this functionality, but I really don't want to make users have to explicitly send and release data, and I do need standard posix semantics. I can sketch this out in a png and send it to anyone who wants to talk further, but for the moment, I hope this sets out what I am thinking of. Thanks to anyone who has any ideas. Andy From cflau at clc.cuhk.edu.hk Tue Dec 28 18:37:16 2004 From: cflau at clc.cuhk.edu.hk (John Lau) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] RE: MPICH with ICC 8.1 Message-ID: <1104287836.2709.80.camel@nuts.clc.cuhk.edu.hk> Hi Henry, The packages on my system are intel-icc8-8.1-026 and intel-ifort8-8.1-023. And I use them on a i686 machine. FYI, I have l_cce_pc_8.1.022 and l_fce_pc_8.1.022 on my EM64T and they can compile mpich with no problem. Best regards, John -- John Lau Chi Fai cflau@clc.cuhk.edu.hk Software Engineer Center for Large-Scale Computation From jrajiv at hclinsys.com Tue Dec 28 19:08:34 2004 From: jrajiv at hclinsys.com (Rajiv) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Increasing memory size in Opteron,NACONA,Itanium Message-ID: <001401c4ed53$af4aa630$0f120897@PMORND> Dear All, I would like to increase the memory size for benchmarking larger order matrices for LINPACK. I could do this by setting the environment variable P4_GLOBMEMSIZE in IA32 machines. What is the equivalent in Opteron,NACONA, and Itanium machines. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041229/65feb8b5/attachment.html From nixon at nsc.liu.se Wed Dec 29 01:11:41 2004 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] HPC and SAN In-Reply-To: (Guy Coates's message of "Sun, 19 Dec 2004 11:37:26 +0000 (GMT)") References: <003f01c4e3f4$47695fb0$0f120897@PMORND> <200412181033.54167.mwill@penguincomputing.com> <41C4DBE2.40204@scalableinformatics.com> Message-ID: <87is6ljwoi.fsf@nsc.liu.se> Guy Coates writes: > The only time SAN attached storage helps is in the case of storage node > failures, as you have redundant paths between storage nodes and disks. And the added complexity of a fail-over mechanism might well lower your total MTBF. -- Leif Nixon Systems expert ------------------------------------------------------------ National Supercomputer Centre Linkoping University ------------------------------------------------------------ From landman at scalableinformatics.com Wed Dec 29 05:29:16 2004 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] quick note on Redhat NFS issues with NAS units In-Reply-To: <20041229111002.GA12164@ii.uib.no> References: <41CF2626.2050005@scalableinformatics.com> <20041229111002.GA12164@ii.uib.no> Message-ID: <41D2B12C.7080003@scalableinformatics.com> Jan-Frode Myklebust wrote: >On Sun, Dec 26, 2004 at 03:59:18PM -0500, Joe Landman wrote: > > >>Folks: >> >> Been looking into why a Redhat EL3 WS x86_64 client hangs when >>accessing a NAS based upon SuSE 9x. >> >> > >Great, thanks for this note! 
> >I've been struggeling quite a bit myself with Rocks-3.3 on opteron >(IBM e326), with AIX as file-server. I still don't quite understand >exactly what caused my hangs, but after reverting back to udp, and >default mount options plus increasing the number of lock-daemons on >the AIX-server, I now have a stable NFS. Still struggeling a bit with >the NFS performance.. > >Should maybe test if bcm lets me go back to nfs over tcp. > > I may have spoken a bit early ... It works in my test enviroment, works on the compute nodes, fails on the head node. I can mount and unmount, and intr now works. I can see the top-most directory of the mount. Traverse the mount point by one level (say to any subdirectory) and do an ls, or something that does a stat, and it hangs. Only on the head node. Compute nodes work perfectly now. No hangs. None of the above mentioned behavior. I may reload the head node. I will be trying to force replication of this in my lab, but if I cannot, I will do the head node reload. I am starting to suspect some sort of cached state (which is incorrect) on the head node. > > >>ps: if there are some Redhat people reading the list, you know, we would >>like some modern kernels, and not lots of backported stuff, not to >>mention xfs, and other goodies ... (yeah, I know, wait till EL4, ...) >> >> >> > >Maybe someone should do a kernel-2.6 roll for Rocks... > > I just pulled down the ROCKS source trees with the intention of rolling a 2.6 (with XFS, Trond and others NFS patches, and Andi Kleen's x86_64 bits). If I get this done soon I'll post a note looking for crash dummies^H^H^H^H^H^H^H^H^H^H^H^H^H volunteers to help me test. Joe > > -jf > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 612 4615 From mwill at penguincomputing.com Wed Dec 29 09:05:56 2004 From: mwill at penguincomputing.com (Michael Will) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] HPC and SAN In-Reply-To: <87is6ljwoi.fsf@nsc.liu.se> References: <003f01c4e3f4$47695fb0$0f120897@PMORND> <87is6ljwoi.fsf@nsc.liu.se> Message-ID: <200412290905.56387.mwill@penguincomputing.com> On Wednesday 29 December 2004 01:11 am, Leif Nixon wrote: > Guy Coates writes: > > > The only time SAN attached storage helps is in the case of storage node > > failures, as you have redundant paths between storage nodes and disks. > > And the added complexity of a fail-over mechanism might well lower > your total MTBF. Speaking from experience? The expectation when building a fail-over system is that the systems mtb-total-f is higher even though the mtb-partial-f is shorter (more parts that can fail). Of course the probability, that the failover logic / software is the new single-point-of-failure, is not zero either. Michael -- Michael Will, Linux Sales Engineer NEWS: We have moved to a larger iceberg :-) NEWS: 300 California St., San Francisco, CA. 
Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899 www.penguincomputing.com From Jan-Frode.Myklebust at bccs.uib.no Wed Dec 29 03:10:02 2004 From: Jan-Frode.Myklebust at bccs.uib.no (Jan-Frode Myklebust) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] quick note on Redhat NFS issues with NAS units In-Reply-To: <41CF2626.2050005@scalableinformatics.com> References: <41CF2626.2050005@scalableinformatics.com> Message-ID: <20041229111002.GA12164@ii.uib.no> On Sun, Dec 26, 2004 at 03:59:18PM -0500, Joe Landman wrote: > Folks: > > Been looking into why a Redhat EL3 WS x86_64 client hangs when > accessing a NAS based upon SuSE 9x. Great, thanks for this note! I've been struggeling quite a bit myself with Rocks-3.3 on opteron (IBM e326), with AIX as file-server. I still don't quite understand exactly what caused my hangs, but after reverting back to udp, and default mount options plus increasing the number of lock-daemons on the AIX-server, I now have a stable NFS. Still struggeling a bit with the NFS performance.. Should maybe test if bcm lets me go back to nfs over tcp. > > ps: if there are some Redhat people reading the list, you know, we would > like some modern kernels, and not lots of backported stuff, not to > mention xfs, and other goodies ... (yeah, I know, wait till EL4, ...) > Maybe someone should do a kernel-2.6 roll for Rocks... -jf From roy_grid at soluris.com Wed Dec 29 06:22:17 2004 From: roy_grid at soluris.com (Roye Avidor) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] General info ... Message-ID: <5.1.1.5.2.20041229091553.00bb25f8@pop.concord.tt.slb.com> Hello all, I'm new to the beowulf cluster, and I would like to read detailed information about it. I couldn't find one that will educate me about how exactly the beowulf is working ( in respect to the networking use of it ). and what is the relationship between beowulf and OSCAR. Thanks for your reply, Roye Avidor From Angel.R.Rivera at conocophillips.com Wed Dec 29 10:04:26 2004 From: Angel.R.Rivera at conocophillips.com (Rivera, Angel R) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] HPC and SAN Message-ID: I would not be quite so quick to discount a SAN. We have just received ours and I am adding to our cluster after 3 months of testing. I have worked hard for almost a year to get one in. You can build as much complexity as you want into it-but does not have to be this deep dark hole some might want you to believe it is. For us, it gives us a consolidated location for the disks with sufficient spares, and the Linux heads we get and can monitor. -ARR -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Michael Will Sent: Wednesday, December 29, 2004 11:06 AM To: beowulf@beowulf.org Cc: Leif Nixon Subject: Re: [Beowulf] HPC and SAN On Wednesday 29 December 2004 01:11 am, Leif Nixon wrote: > Guy Coates writes: > > > The only time SAN attached storage helps is in the case of storage node > > failures, as you have redundant paths between storage nodes and disks. > > And the added complexity of a fail-over mechanism might well lower > your total MTBF. Speaking from experience? The expectation when building a fail-over system is that the systems mtb-total-f is higher even though the mtb-partial-f is shorter (more parts that can fail). Of course the probability, that the failover logic / software is the new single-point-of-failure, is not zero either. 
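One crude way to put numbers on that trade-off is below. The model and every figure in it are invented purely for illustration (a single-path availability of 0.995, a failover success rate s, and an availability m for the failover machinery itself, with spurious failovers and split-brain counting against m); it is not data from anyone's SAN.

/* ha_sketch.c - toy availability arithmetic for the fail-over trade-off.
 * All numbers are invented for illustration, not measurements.
 *   a : availability of a single storage path
 *   s : probability that a failover succeeds when it is needed
 *   m : availability of the failover machinery itself
 * Build: cc -o ha_sketch ha_sketch.c
 */
#include <stdio.h>

int main(void)
{
    const double a = 0.995;              /* one path: ~1.8 days down per year */
    const double s_vals[] = { 0.99, 0.90 };
    const double m_vals[] = { 0.9999, 0.995, 0.99 };
    int i, j;

    printf("single path                 : %.5f\n", a);
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) {
            /* up if the machinery behaves AND (primary up, or the
               standby path is up and the failover actually worked) */
            double avail = m_vals[j] * (a + (1.0 - a) * a * s_vals[i]);
            printf("redundant  s=%.2f  m=%.4f : %.5f%s\n",
                   s_vals[i], m_vals[j], avail,
                   avail < a ? "  <- worse than a single path" : "");
        }
    }
    return 0;
}

With very reliable failover machinery the redundant setup wins comfortably; let m drop to around the single-path figure and it is already a net loss, which is the point being made on both sides above.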
Michael -- Michael Will, Linux Sales Engineer NEWS: We have moved to a larger iceberg :-) NEWS: 300 California St., San Francisco, CA. Tel: 415-954-2822 Toll Free: 888-PENGUIN Fax: 415-954-2899 www.penguincomputing.com _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hanzl at noel.feld.cvut.cz Wed Dec 29 16:32:34 2004 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Distributed Parallel Caching Filesystems In-Reply-To: <41D1DAF2.3090801@pobox.com> References: <41D1DAF2.3090801@pobox.com> Message-ID: <20041230013234E.hanzl@unknown-domain> > I am looking around at high-performance i/o environments ... > ... a clustered file system that can take span multiple I/O > servers with multiple network interfaces and cache to local systems > based on locality of data ... Coda seems to have some of > this functionality, but I really don't want to make users have to > explicitly send and release data, and I do need standard posix > semantics. ... I think NFS with fscache could provide much of Coda functionality in very cluster-friendly manner. It is still in testing phase but I consider it one of the 'strategically safest' options for these reasons: - it could get to 2.6 mainline kernel - many non-cluster users will be also happy with it There are (and was) many other projects going this way but quite often there are reasons to be very skeptical about their long time viability or even their actual existence (I mean existence of anything usable behind the hype). It is possible for a project to make headlines, be recommended over and over again and yet have no single user who would attest that it works for him. It is possible to make quite interesting and working kernel modifications going the way you described but it is much harder to keep the thing alive when kernel changes. So, to name something usable I trust ... PVFS comes to my mind, but it might not have all the functionality you want (but it has very nice and friendly team of developers) ... I do not recall much else, others please help me... Regards Vaclav Hanzl http://www.redhat.com/archives/linux-cachefs/2004-October/msg00027.html - 2.6.9-rc4-mm1 patch that will enable NFS (even NFS4) to do persistent file caching on the local harddisk http://www.redhat.com/archives/linux-cachefs/2004-October/msg00004.html - older message explaining what is going on http://www.redhat.com/archives/linux-cachefs/2004-October/msg00019.html - about ways to get this to the mainline kernel http://www.redhat.com/mailman/listinfo/linux-cachefs - list archives and subscription page https://www.redhat.com/archives/linux-cachefs/2004-November/msg00005.html - patch against vanilla 2.6.9 to try it out From eugen at leitl.org Thu Dec 30 04:20:59 2004 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Xgrid and Mosix (fwd from john@rudd.cc) Message-ID: <20041230122059.GL9221@leitl.org> ----- Forwarded message from John Rudd ----- From: John Rudd Date: Wed, 29 Dec 2004 19:37:02 -0800 To: xgrid-users@lists.apple.com Subject: Xgrid and Mosix X-Mailer: Apple Mail (2.619) I see in the archives that someone asked about OpenMosix back in September ( http://lists.apple.com/archives/xgrid-users/2004/Sep/msg00023.html ), but I didn't see any responses. So I thought I'd ask too, but with a little more detail. 
The thing that I find interesting about the Mosix style distributed computing environment is that applications do NOT need to be re-written around them. Mosix abstracts the distributed computing cluster away from the program and developer in the same way that threads abstract multi-processing away from the program and developer. Under Mosix, any program, without having to be written around any special library, without having to be relinked or recompiled, can be moved off to another processing node if there are nodes that are significantly less busy than yours. And, AFAIK, any multi-threaded application can make use of multiple nodes (with threads being spawned on any host that is less loaded than the current node). Imagine taking a completely mundane but multi-threaded application (I'll assume Photoshop is multi-threaded and use that as an example). Suddenly, without having to get Adobe to support Xgrid, you can use Xgrid to speed up your Photoshop rendering. It seems to me that a similar set of features could be added to Xgrid. The threading and processing spawning code within the kernel could be extended by Xgrid to check for lightly loaded Agents, and move the new process or thread to that Agent. Only the IO routines would need to exist on the Client (and even then, maybe not: if every node has similar filesystem image, then only the UI (for user bound applications) or primary network interface code (for network daemons/servers) needs to run on the original Client system). From what I recall, the mach microkernel already makes some infrastructure for this type of thing available, it just needs to be utilized, and done deep enough in the kernel that an application doesn't need to know about it. Though, that does bring up one consideration: I have a friend who did a lot of distributed computing work when he was working for Flying Crocodile (a web hosting company that specialized in porn sites, where his distributed computing code had to support multiple-millions of hits per second). His experience there gave him a concern about Mosix style distributed computing. One of the advantages of something like Beowulf is that the coder often needs to control what things need to be kept low latency (must use threads for SMP on the local processor) and what things can have high latency (can use parallel code on the network), and the programming interface type of distributed computing gives them that flexibility. The idea that I suggested was something like nice/renice in unix, where you could specify certain parallelism parameters to a process before you run it, or after it is already running. For example, instead of "process priority", you might specify a sort of "process affinity" or "thread affinity". For process affinity, a low number (which means high affinity, just like priority and nice numbers) means "when this process creates a child, it must be kept close to the same CPU as the one that spawned it". Thread affinity would be the same, but for threads. A default of zero means "everything must run locally". A high number means "I can tolerate more latency" (so, "latency tolerance" would be the opposite of "affinity"). (it occurs to me after I wrote all of this that it might be easier for the end user to think in terms of "latency tolerance" instead of "process affinity", high latency = high number, instead of the opportunity for confusion that affinity has since the numbers go in the opposite direction ... 
I hope all of that made sense) A process with a low process affinity (high number) and a high thread affinity (low number) means that it can spawn new tasks/processes/applications anywhere in the network, but any threads for it (or its sub-processes) must exist on the same node as its main thread. Or, if you want all of the applications to be running on your workstation/Client, but run their threads all over the network, then you set a high process affinity (low number), and a low thread affinity (high number). I would have the xgrid command line tool have such a facility (I don't know if it does already or not, I haven't really done much with xgrid) similar to both the "nice" and "renice" commands. I would also add a preference pane that allows the user to set a default process affinity, a default thread affinity, and a list of applications and default affinities for each of those applications (so that they can be exceptions to the default, without the user having to set it via command line every time). Last, I would add a tool, possibly attached to the Xgrid tachometer, which would allow me to adjust an affinity after a program was running. The only thing up in the air is the ability to move a running thread from one node to another while it's running (well, during a context switch, really). I know a friend of mine at Ga Tech was doing PhD research on that (portable threads) 10ish years ago, but I don't know if it got anywhere. But, that would allow someone to lower the number of an application's affinity while it's running, thus recalling the threads or processes from a remote Agent to the local Client (the scenario being I have a laptop that is an Xgrid Client, and I start running applications that spread out across the network ... then I get up to leave, so I lower the affinity numbers of everything so that the tasks and threads come back to my laptop, running slower now that they have fewer nodes to run upon, but still running (or sleeping, as the case might be)). So ... all of that leads up to: does anyone know if Xgrid is working on this type of Application-Transparent Distributed Computing that Mosix, OpenMosix, and I think OpenSSI have? I think it would be a natural extension to Xgrid: Apple is trying to make this as "it just works" as possible, so it seems that it should not only be easy for the sysadmin to set up the distributed computing cluster, but easy/transparent for the developer, too (in the same way that threads made Multi-Processing easier and more abstract for the developer, this type of distributed computing makes threads not just a multi-processing model, but a distributed computing model). Ultimately, it even makes distributed computing easy for the user: they don't need to learn how to re-code a program (or coerce a vendor into making a distributed version of their application), any multi-threaded application will use multiple nodes, and even single-threaded non-distributed applications can be run on remote nodes. That seems like a powerful "it just works" capability to me. (the main drawback of Mosix, OpenMosix, and OpenSSI from my perspective is that they're Linux only, specifically developed for the Linux kernel ... but I'd really love to see something like them available for Mac OS X) Thoughts? _______________________________________________ Do not post admin requests to the list. They will be ignored. 
Xgrid-users mailing list (Xgrid-users@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/xgrid-users/eugen%40leitl.org This email sent to eugen@leitl.org ----- End forwarded message ----- -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041230/ed307a9e/attachment.bin From daniel.kidger at quadrics.com Thu Dec 30 07:11:31 2004 From: daniel.kidger at quadrics.com (Dan Kidger) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Increasing memory size in Opteron,NACONA,Itanium In-Reply-To: <001401c4ed53$af4aa630$0f120897@PMORND> References: <001401c4ed53$af4aa630$0f120897@PMORND> Message-ID: <200412301511.31279.daniel.kidger@quadrics.com> Rajiiv, On Wednesday 29 December 2004 3:08 am, Rajiv wrote: > Dear All, > I would like to increase the memory size for benchmarking larger order > matrices for LINPACK. I could do this by setting the environment variable > P4_GLOBMEMSIZE in IA32 machines. What is the equivalent in Opteron,NACONA, > and Itanium machines. the 'P4' in P4_GLOBMEMSIZE refers to a driver model for MPI called 'P4'. It is nothing to do with the Pentium4 I suspect that for Opteron/Nocona and IA64 for that matter you do not need to set this variable at all. Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From john.hearns at streamline-computing.com Wed Dec 29 02:07:20 2004 From: john.hearns at streamline-computing.com (John Hearns) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Benchmarking a Cluster In-Reply-To: <200412281605.09720.pmcdonnell@muncc.marmionacademy.org> References: <200412281605.09720.pmcdonnell@muncc.marmionacademy.org> Message-ID: <1104314840.3768.15.camel@localhost.localdomain> On Tue, 2004-12-28 at 16:05 -0600, Patrick McDonnell wrote: > Hi, > > My high school has been working on a small beowulf cluster, consisting of > several old computers (and a couple newer ones). (More specs available on > the website in my sig). While the cluster is by no means powerful enough to > impress people with benchmarks, it would be nice to be "buzzword-compliant," > and at least have some nice graphs showing "benchmarks." > Congratulations! I hope you have had fun building your project - and I'm sure you learned loads. You've asked the $20 000 question on this list! Discussions about benchmarking take place all the time :-) Actually you have done the right thing - benchmark your setup with some example codes you are interested in running. That is the conclusion of most benchmarking discussions here! The next stage for you, should you be interested, is to go do some research in your local library or on the internet re. benchmarking. You'll get good advice from other people on this list. Another thought I had would be to go to the Top 500 website http://www.top500.org and look at how benchmarking is done for that list. 
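A gentle first step, before tackling the packaged benchmarks, is a hand-rolled MPI ping-pong between two of your nodes. Since you already have MPICH working, the minimal sketch below (a generic example, not one of the established benchmark suites) compiles with mpicc, runs with mpirun -np 2, and reports round-trip latency plus a rough one-way bandwidth for whatever message size you pick.

/* pingpong.c - minimal MPI ping-pong sketch for rough latency/bandwidth
 * numbers between two processes on two nodes.  Generic example code.
 * Build: mpicc -o pingpong pingpong.c
 * Run:   mpirun -np 2 pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000, bytes = 1024;   /* message size to test */
    int rank, size, i;
    char *buf;
    double t0, t;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run me with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }
    buf = calloc(bytes, 1);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t = MPI_Wtime() - t0;

    if (rank == 0)
        printf("%d-byte messages: %.1f us round trip, ~%.2f MB/s one way\n",
               bytes, 1e6 * t / reps, (2.0 * bytes * reps) / (t * 1e6));

    free(buf);
    MPI_Finalize();
    return 0;
}

Comparing the small-message number against a much larger message size separates latency from bandwidth, and running it between different pairs of nodes is a quick way to spot a bad cable or switch port.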
Don't be at all disappointed if you get a low result - it would be a good achievement to get to the stage of running the benchmark, and you don't have a system costing many $$$ But it would be instructive to get your benchmark number, and see what rank your setup would have had ten years ago. For the list - I realise that it is a bad idea to point Patrick to this one benchmark, but the intent is to point the students at the Top 500 and get them to see the systems there. Patrick, there are plenty of other resources on benchmarking, including the book on Beowulf Clustering by Sterling et. al. (MIT Press) http://mitpress.mit.edu/catalog/item/default.asp?tid=8681&ttype=2 The new OReilly book on clusters suggests the following benchmarks: HINT, used to test subsystem performance http://hint.byu.edu High Performance Linpack (as used in Top 500 ranking) http://www.netlib.org/benchmark/hpl Iozone for disk and filesystem benchmarking http://www.iozone.org Iperf to measure network performance http://dast.nlanr.net/Projects/Iperf Just getting some of these benchmarks to run will teach you and your team a lot. Happy New Year when it comes, and don't spend TOO much time in front of those monitors! From jrajiv at hclinsys.com Wed Dec 29 23:01:46 2004 From: jrajiv at hclinsys.com (Rajiv) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] OpenMP/PBS tuning Message-ID: <014001c4ee3d$6d231d10$0f120897@PMORND> Dear All, I am planning to setup a cluster with 3 Nacona machines and 3 Opteron machines - each two processors. Is any OpenMP tuning / PBS tuning required to make the cluster work efficiently. Regards, Rajiv -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20041230/cd70fcd7/attachment.html From Jan-Frode.Myklebust at bccs.uib.no Thu Dec 30 05:53:36 2004 From: Jan-Frode.Myklebust at bccs.uib.no (Jan-Frode Myklebust) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] quick note on Redhat NFS issues with NAS units In-Reply-To: <41D2B12C.7080003@scalableinformatics.com> References: <41CF2626.2050005@scalableinformatics.com> <20041229111002.GA12164@ii.uib.no> <41D2B12C.7080003@scalableinformatics.com> Message-ID: <20041230135336.GC25987@ii.uib.no> On Wed, Dec 29, 2004 at 08:29:16AM -0500, Joe Landman wrote: > > I may have spoken a bit early ... Sorry to hear that.. > It works in my test enviroment, works > on the compute nodes, fails on the head node. I can mount and unmount, > and intr now works. I can see the top-most directory of the mount. > Traverse the mount point by one level (say to any subdirectory) and do > an ls, or something that does a stat, and it hangs. Only on the head > node. Compute nodes work perfectly now. No hangs. None of the above > mentioned behavior. My hangs were also only (?) on the head node, but I couldn't reliably trigger it. After a while (10's of minutes) the hang would start, I could still list directories, but anything touching the files would hang, and only way I found to recover was to reboot. The difference between the head and the compute nodes is mainly that the head node will typically have a lot more users and processes active on the filesystems, while the compute nodes will work more sequential (open one file, read it, close it, open next file, etc..), so maybe my increase of lock-daemons on the server was the cure for me. Do you have any parameters to tune on your NAS-box? Supporting a full cluster might put a different load on it than it was originally aimed at. 
-jf From mlbertog at yahoo.com Thu Dec 30 07:21:19 2004 From: mlbertog at yahoo.com (Mario Leandro Bertogna) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] mpirun -nolocal option seems no to be working Message-ID: <20041230152119.66797.qmail@web41603.mail.yahoo.com> Hi, I've 3 sun sparc ultra 10 with suse 7.3, mpich 1.2.6. and ssh v2 Everything works ok if I run this example from machine c02: #../../bin/mpirun -np 4 cpi Process 0 of 4 on c02.uncoma.edu.ar pi is approximately 3.1415926544231239, Error is 0.0000000008333307 wall clock time = 0.015761 Process 1 of 4 on c03.uncoma.edu.ar Process 2 of 4 on c04.uncoma.edu.ar Process 3 of 4 on c02.uncoma.edu.ar But if i want to use -nolocal options, mpi just runs in one processor, even using -np 4 # ../../bin/mpirun -np 4 -nolocal cpi Process 0 of 1 on c03.uncoma.edu.ar pi is approximately 3.1415926544231341, Error is 0.0000000008333410 wall clock time = 0.003563 I tried this, and everything seems to be OK ./mpirun -np 4 -no-local -v -t ../examples/basic/cpi running /usr/local/mpich-1.2.6/bin/../examples/basic/cpi on 4 LINUX ch_p4 processors Procgroup file: c03 0 /usr/local/mpich-1.2.6/bin/../examples/basic/cpi c04 1 /usr/local/mpich-1.2.6/bin/../examples/basic/cpi c03 1 /usr/local/mpich-1.2.6/bin/../examples/basic/cpi c04 1 /usr/local/mpich-1.2.6/bin/../examples/basic/cpi ssh c03 And I force -p4pg with the procgrout file and gives me the next error: ./mpirun -p4pg pp.txt ../examples/basic/cpi rm_20531: p4_error: rm_start: net_conn_to_listener failed: 3975 p0_30125: p4_error: Child process exited while making connection to remote process on c04: 0 P4 procgroup file is pp.txt. I just want to run mpi but not in the master node, someone has an idea what's happening? Thanks in advance Leandro _________________________________________________________ Do You Yahoo!? Informaci?n de Estados Unidos y Am?rica Latina, en Yahoo! Noticias. Vis?tanos en http://noticias.espanol.yahoo.com From eugen at leitl.org Fri Dec 31 08:19:19 2004 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:03:40 2009 Subject: [Beowulf] Re: Xgrid and Mosix (fwd from prabhaka@apple.com) Message-ID: <20041231161918.GG9221@leitl.org> ----- Forwarded message from Ernest Prabhakar ----- From: Ernest Prabhakar Date: Thu, 30 Dec 2004 12:10:28 -0800 To: John Rudd Cc: xgrid-users@lists.apple.com Subject: Re: Xgrid and Mosix X-Mailer: Apple Mail (2.688) Hi John, On Dec 29, 2004, at 7:37 PM, John Rudd wrote: >So ... all of that leads up to: does anyone know if Xgrid is working >on this type of Application-Transparent Distributed Computing that >Mosix, OpenMosix, and I think OpenSSI have? Thanks for your detailed comments, it was very educational. While I can't comment on future plans, I do want to point out that Xgrid is really a "distributed resource manager" like Sun Grid Engine, not an 'application environment' like MPI or OpenMP. In particular, the Xgrid API and communication model are all designed around submitting and starting jobs -- they have absolutely no awareness of (or control over) what goes inside an application. While I'm not directly familiar with Mosix et al, from your description it sounds like a kernel-level thread migration service, which is roughly analogous to the user-level task migration facilities of Xgrid. 
From what I can tell, MOSIX also requires you to explicitly fork off separate serial threads or tasks, and doesn't provide any inter-process communication, so I'm not sure how it is really any different than Xgrid -- at least from the perspective of the *computation engine*. Xgrid also allows you to take an existing, serializable application and run it (unmodified) on multiple systems. The main difference, as best I can tell, is that MOSIX leverages the kernel's scheduler to automatically migrate discrete processes to other machines, whereas Xgrid requires explicit invocation (e.g., via the xgrid(1) command-line tool). However, I'm no expert on these matters -- perhaps someone who understand both could comment more helpfully on a comparison between the two. Best, Ernie P. ------------------------------------------------------------------------ ----------------- Ernest N. Prabhakar, Ph.D. (408) 974-3075 Xgrid Product Manager, System Software Marketing Apple Computer; 303-4SW 3 Infinite Loop; Cupertino, CA 95014 _______________________________________________ Do not post admin requests to the list. They will be ignored. Xgrid-users mailing list (Xgrid-users@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/xgrid-users/eugen%40leitl.org This email sent to eugen@leitl.org ----- End forwarded message ----- -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20041231/a6cf8e00/attachment.bin