From rgb at phy.duke.edu  Mon Apr  1 05:27:43 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:02:12 2009
Subject: DHCP Help
In-Reply-To: 
Message-ID: 

On Sat, 30 Mar 2002, Adrian Garcia Garcia wrote:

> Hello everybody, I'm a beginner and I have been having problems with my
> dhcp server, I cant assign the ip's to the clients, I dont know exactly
> if the server is not working or the client. I am working with Red Hat 7.1
> and my dhcp client is dhcpcd because I tried with pump but it did not
> work. Please, Please, can anybody give some help, what can I do???? Sorry
> for my poor english, In fact I speak spanish. Please help. Thanks a lot.

Sr. Garcia,

Please find below an example of the configuration I use at home for my
private beowulf.  It is for dhcpd, in /etc/dhcpd.conf, and it is for a
private internal network with 192.168 IP numbers.  Note the three
sections.  This works fine for computers that boot Windows or Linux or
anything else with a dhcp client -- some of my computers at home boot
both.  Note also the addresses:

   range 192.168.1.192 192.168.1.224;

Only these are used for computers not known to the server by registered
ethernet numbers and static addresses.

I hope this helps a little.  And forgive my bad Spanish; it is (I am
sure) worse than your English, but I need the practice.

   rgb

##############################################################################
#
# /etc/dhcpd.conf - configuration file for our DHCP/BOOTP server
#
###########################################################
# Global Parameters
###########################################################
option domain-name "rgb.private.net";
option domain-name-servers 152.3.250.1;
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
use-host-decl-names on;

###########################################################
# Subnets
###########################################################
shared-network RGB {
  subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.192 192.168.1.224;
    default-lease-time 43200;
    max-lease-time 86400;
    option routers 192.168.1.1;
    option domain-name "rgb.private.net";
    option domain-name-servers 152.3.250.1;
    option broadcast-address 192.168.1.255;
    option subnet-mask 255.255.255.0;
  }
}

###########################################################
# Static IP addresses managed by DHCP server
###########################################################

# Personal Computers (MSDOS/Win-3.x/WfW/Win-95/Win-NT/MacOS)
#host hostname {
#  hardware ethernet xx:xx:xx:xx:xx:xx;
#  fixed-address 152.3.xxx.xxx;
#  option host-name hostname;
#  option routers 152.3.xxx.250;
#}

# UNIX systems
#host hostname {
#  hardware ethernet xx:xx:xx:xx:xx:xx;
#  fixed-address 152.3.xxx.xxx;
#  option host-name hostname;
#  option routers 152.3.xxx.250;
#}

# adam future gateway redux?  300MHz Celeron
host adam {
  hardware ethernet 00:20:18:58:27:1a;
  fixed-address 192.168.1.1;
  next-server 192.168.1.131;
  option domain-name "rgb.private.net";
  option host-name "adam";
}

# caine (Linux/Windows workstation)
# (Linux/Windows workstation)
host tyrial {
  hardware ethernet 00:a0:cc:59:45:9b;
  fixed-address 192.168.1.134;
  next-server 192.168.1.131;
  option routers 192.168.1.1;
  option domain-name "rgb.private.net";
  option host-name "tyrial";
}

etc...
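If you are not sure whether it is the server or the client that is broken,
a quick look at both ends usually settles it.  This is only a sketch -- it
assumes the stock ISC dhcpd on the server and the dhcpcd client that ships
with Red Hat 7.1, and eth0 here is an assumption (use whatever interface
actually faces your internal network):

  # On the server: stop the init-script copy and run dhcpd in the
  # foreground, logging to the terminal (-f foreground, -d log to stderr)
  /etc/rc.d/init.d/dhcpd stop
  /usr/sbin/dhcpd -d -f eth0

  # In another window on the server, watch for DHCP traffic arriving
  tcpdump -i eth0 port 67 or port 68

  # On a client: kill any running dhcpcd, then ask again with debugging on
  # (the -k and -d flags are from memory of that dhcpcd; check its man page)
  dhcpcd -k eth0
  dhcpcd -d eth0

If tcpdump never shows a request from the client, the problem is on the
client or the wire; if the request arrives but dhcpd never logs an offer,
the problem is in dhcpd.conf (usually a subnet declaration that does not
match the interface dhcpd is listening on).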
rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


From eugen at leitl.org  Mon Apr  1 13:19:31 2002
From: eugen at leitl.org (Eugen Leitl)
Date: Wed Nov 25 01:02:12 2009
Subject: FY;) Google's secret clustering technology
Message-ID: 

http://www.google.com/technology/pigeonrank.html


From emiller at techskills.com  Mon Apr  1 17:41:18 2002
From: emiller at techskills.com (Eric Miller)
Date: Wed Nov 25 01:02:12 2009
Subject: Syntax for executing
Message-ID: 

Hey all, got a five-node cluster up running 27-z9, preparing for a 30 node
cluster.

- What is the syntax to run an executable in the cluster environment?  For
example, I run NP=5 mpi-mandel to run the test fractal program.  How would
I execute, say, SETI, using the cluster?  Assume that the SETI executable
is in the PATH.  Also, the older version of Scyld had some test code in
/usr/mpi-beowulf/*.  Is that gone?

- What would cause all but one of the processors to show usage in
beostatus?  The node shows "up" in every other way: hardware identical,
memory, swap, network, etc....just when I run something, only that one
processor on one node shows no % usage.

-ETM

 .~.
 /V\
// \\
/( )\
^'~'^


From hanzl at noel.feld.cvut.cz  Tue Apr  2 00:28:38 2002
From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz)
Date: Wed Nov 25 01:02:12 2009
Subject: Newest RPM's?
In-Reply-To: <002c01c1d933$ec02a3c0$c31fa6ac@xp>
References: <1017612125.19271.20.camel@vhwalke.mathsci.usna.edu>
	<002c01c1d933$ec02a3c0$c31fa6ac@xp>
Message-ID: <20020402102838A.hanzl@unknown-domain>

> I am using RH7.2 on my master node and would like to RPM the latest stable
> version of Scyld, instead of using the CD (I have 27Bz-7, based on RH6.2)

I am not sure there is an RH7.2-based Scyld system available yet, though
it is quite possible I missed something.

You may consider Clustermatic - it is similar to Scyld but smaller (and
therefore easier), rpm install on top of RH7.2 works great and you may
download iso images if you want.

http://www.clustermatic.org

See my previous post "Clustermatic: smooth upgrade to new version" for an
rpm-install microhowto:

http://www.beowulf.org/pipermail/beowulf/2002-March/002969.html

HTH

Vaclav Hanzl


From daniel.kidger at quadrics.com  Tue Apr  2 01:10:20 2002
From: daniel.kidger at quadrics.com (Dan Kidger)
Date: Wed Nov 25 01:02:12 2009
Subject: FY;) Google's secret clustering technology
References: 
Message-ID: <002601c1da27$10987090$0100a8c0@spot>

----- Original Message -----
"Eugen Leitl" wrote:
>To: 
>Sent: Monday, April 01, 2002 10:19 PM
>Subject: FY;) Google's secret clustering technology
>
> http://www.google.com/technology/pigeonrank.html
>

This is a very interesting article.  However there is no mention of them
using the Quadrics Interconnect, nor for that matter Myrinet, Scali or
even plain ethernet.

I can only assume the whole cluster is run by just using cereal lines.

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.
daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From hanzl at noel.feld.cvut.cz Tue Apr 2 01:51:11 2002 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:02:12 2009 Subject: 27z-9 ? (Was: Syntax for executing ) In-Reply-To: References: Message-ID: <20020402115111Y.hanzl@unknown-domain> > Hey all, got a five-node cluster up running 27-z9 I can see just a few files at ftp://ftp.scyld.com/pub/beowulf/27z-9/ Please can anybody comment on status of 27z-9 ? Thanks Vaclav From rbw at ahpcrc.org Tue Apr 2 07:48:50 2002 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:02:12 2009 Subject: Uptime data/studies/anecdotes ... ? Message-ID: <200204021548.g32Fmod13276@mycroft.ahpcrc.org> All, What information is available on typical uptimes of large-scale, clusters ... say greater than 256 processors and running a multi-user workload. What gains do single-point-of-administration tools like SCYLD provide? Clearly, there are a great number of things one can do to maximize uptime/utilization (not the same thing really). What are the essentials from the lists point of view? If a good figure is, say, 80% utilization over a 8760 hour year today, what will this number be in three years? Annual utilization for the 1088 processor T3E we run here is about 95%. How long until a similarly sized cluster typically yields the same value? Regards, rbw #--------------------------------------------------- # # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw@networkcs.com, richard.walsh@netaspx.com # #--------------------------------------------------- # "What you can do, or dream you can, begin it; # Boldness has genius, power, and magic in it." # -Goethe #--------------------------------------------------- # "Without mystery, there can be no authority." # -Charles DeGaulle #--------------------------------------------------- # "Why waste time learning when ignornace is # instantaneous?" -Thomas Hobbes #--------------------------------------------------- From roger at ERC.MsState.Edu Tue Apr 2 08:15:00 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:02:12 2009 Subject: Uptime data/studies/anecdotes ... ? In-Reply-To: <200204021548.g32Fmod13276@mycroft.ahpcrc.org> Message-ID: We currently run an average of about 75% utilization on our 586 processor (293 node) cluster. We probably have about one node per week crash and hang for various reasons. We have occasional problems with memory leaks or PBS hangups which require large scale reboots of the cluster. (Actually, PBS just died as I'm typing this, but our pbs heartbeat script should restart it automatically in a few minutes). I'd say we have to do a full reboot of the cluster about every 3-4 months. For a bunch of PC hardware running a free OS, this seems like a pretty good number to me. It's not in the same class as our Sun servers (nor even our SGIs!), but then, none of those systems are this large, either. On Tue, 2 Apr 2002, Richard Walsh wrote: > > All, > > What information is available on typical uptimes > of large-scale, clusters ... say greater than 256 > processors and running a multi-user workload. What > gains do single-point-of-administration tools like > SCYLD provide? 
Clearly, there are a great number > of things one can do to maximize uptime/utilization > (not the same thing really). What are the essentials > from the lists point of view? > > If a good figure is, say, 80% utilization over a > 8760 hour year today, what will this number be in > three years? Annual utilization for the 1088 processor > T3E we run here is about 95%. How long until a similarly > sized cluster typically yields the same value? > > Regards, > > rbw > > #--------------------------------------------------- > # > # Richard Walsh > # Project Manager, Cluster Computing, Computational > # Chemistry and Finance > # netASPx, Inc. > # 1200 Washington Ave. So. > # Minneapolis, MN 55415 > # VOX: 612-337-3467 > # FAX: 612-337-3400 > # EMAIL: rbw@networkcs.com, richard.walsh@netaspx.com > # > #--------------------------------------------------- > # "What you can do, or dream you can, begin it; > # Boldness has genius, power, and magic in it." > # -Goethe > #--------------------------------------------------- > # "Without mystery, there can be no authority." > # -Charles DeGaulle > #--------------------------------------------------- > # "Why waste time learning when ignornace is > # instantaneous?" -Thomas Hobbes > #--------------------------------------------------- > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From rbw at ahpcrc.org Tue Apr 2 10:24:22 2002 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:02:12 2009 Subject: Uptime data/studies/anecdotes ... ? Message-ID: <200204021824.g32IOMa14409@mycroft.ahpcrc.org> On Tue, 2 Apr 2002 10:15:00 Roger Smith wrote: >We currently run an average of about 75% utilization on our 586 processor >(293 node) cluster. We probably have about one node per week crash and >hang for various reasons. > >We have occasional problems with memory leaks or PBS hangups which require >large scale reboots of the cluster. (Actually, PBS just died as I'm typing >this, but our pbs heartbeat script should restart it automatically in a >few minutes). I'd say we have to do a full reboot of the cluster about >every 3-4 months. >For a bunch of PC hardware running a free OS, this seems like a pretty >good number to me. It's not in the same class as our Sun servers (nor >even our SGIs!), but then, none of those systems are this large, either. Thanks for the estimate. Do you use SCYLD or another pseudo-single-system- image tool? I assume that 75% is a steady state number ... how long did it take your group to reach that state? If a full reboot is required only every 3-4 months then is singel node failure your main source of cycle loss? Or are other things like inefficient scheduling and lack of check-point/restart, etc. important? 75% does seem like a reasonably good number. rbw From roger at ERC.MsState.Edu Tue Apr 2 10:46:07 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:02:12 2009 Subject: Uptime data/studies/anecdotes ... ? 
In-Reply-To: <200204021824.g32IOMa14409@mycroft.ahpcrc.org>
Message-ID: 

On Tue, 2 Apr 2002, Richard Walsh wrote:

> Thanks for the estimate.  Do you use SCYLD or another pseudo-single-system-
> image tool?

Nope, we use RH 7.2, PBS, and MPI/Pro, MPICH, and LAM MPI.

> I assume that 75% is a steady state number ... how long did
> it take your group to reach that state?

Our users are a bit "bursty".  The cluster rarely drops below 50%.
Looking back through my records, it hasn't been below 140 processors in
use in several weeks, and has spent most of its time with 400+ in use.
As we near project deadlines, we often have jobs waiting in the queue.
I've seen as many as 1100 processors in use, or requested and waiting.

When we upgraded from 324 to 586 processors, the users were banging on my
door wanting to know when the new nodes were available.  Within an hour
of releasing the new nodes (and without any notification to the users),
they were already using over 500 processors.  I'm currently working on an
expansion to about 1036 processors, and I fully expect to see it slammed
within a few days of release.

> If a full reboot is required only every 3-4 months then is single node
> failure your main source of cycle loss?  Or are other things like
> inefficient scheduling and lack of check-point/restart, etc. important?

PBS is our leading cause of cycle loss.  We now run a cron job on the
headnode that checks every 15 minutes to see if the PBS daemons have
died, and if so, it automatically restarts them.  About 75% of the time
that I have a node fail to accept jobs, it is because its pbs_mom has
died, not because there is anything wrong with the node.
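The watchdog itself is nothing fancy -- roughly along these lines (a
sketch rather than our exact script; the init script path and the daemon
list are assumptions you would adapt to your own PBS installation):

  # /etc/cron.d/pbs-watchdog on the headnode:
  */15 * * * * root /usr/local/sbin/check_pbs.sh

  # and /usr/local/sbin/check_pbs.sh itself:
  #!/bin/sh
  # restart PBS if the server or scheduler daemon has gone away
  for daemon in pbs_server pbs_sched; do
      if ! pidof $daemon > /dev/null 2>&1; then
          logger "PBS watchdog: $daemon not running, restarting PBS"
          /etc/rc.d/init.d/pbs restart
          break
      fi
  done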
_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
| Roger L. Smith                                 Phone: 662-325-3625 |
| Systems Administrator                            FAX: 662-325-7692 |
| roger@ERC.MsState.Edu               http://WWW.ERC.MsState.Edu/~roger |
|                      Mississippi State University                  |
|_______________________Engineering Research Center_______________________|


From gabriel.weinstock at dnamerican.com  Tue Apr  2 13:00:03 2002
From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock)
Date: Wed Nov 25 01:02:12 2009
Subject: Maui scheduler error
Message-ID: <21193187508190@DNAMERICAN.COM>

Hi,

We are having trouble getting the Maui scheduler to work.  We have no
problem starting the server/scheduler and drone programs.  (For testing,
we are not starting the drone on every node in the cluster; is this a
problem?)  The set up is 'mauictl start' on the head node, followed by
'nodectl start' on 2 compute nodes.  'showq' works correctly.  The log
files show all three nodes processing correctly; right up until a user
submits a job, at which point the server node spits out the following
message to its log file and exits:

- log file -
04/02 15:21:25 (Sched.java:299) iteration 36
04/02 15:21:25 (Wiki.java:392) Wiki loop event
04/02 15:21:25 (BackfillMod.java:147) backfill scheduling
04/02 15:21:25 (ReservationsMod.java:105) handling reservations
04/02 15:21:25 (JobChecker.java:220) checkpointing...
04/02 15:21:25 (Sched.java:311) scheduling interval took 0.016 seconds
04/02 15:21:29 (BasicWorker.java:430) mauisubmit
04/02 15:21:29 (MauiSubmit.java:96) mauisubmit
04/02 15:21:29 (MauiSubmit.java:128) LRM cmdfile
04/02 15:21:29 (CMD.java:280) Removing envvar HOSTNAME
04/02 15:21:29 (CMD.java:280) Removing envvar MACHTYPE
04/02 15:21:29 (CMD.java:280) Removing envvar HOSTTYPE
04/02 15:21:29 (CMD.java:280) Removing envvar OSTYPE
04/02 15:21:29 (CMD.java:280) Removing envvar _
04/02 15:21:29 (MauiMySQL.java:268) Changing romeda's job account to no-account
04/02 15:21:29 (MauiSubmit.java:199) checking job on RM=Node
04/02 15:21:29 (BasicPolicy.java:111) pre debiting bank for 7200 slotsecs for job=romeda:1017778889:0
04/02 15:21:29 (MauiXMLHandlerImpl.java:284) FATAL: org.xml.sax.SAXParseException: Illegal XML character: �.
04/02 15:21:29 (BasicWorker.java:244) Ignoring SAX freak-out: Illegal XML character: �.
04/02 15:21:30 (Sched.java:326) ----------------------------------------------------
04/02 15:21:30 (Sched.java:299) iteration 37
04/02 15:21:30 (Wiki.java:392) Wiki loop event
04/02 15:21:30 (BackfillMod.java:147) backfill scheduling
04/02 15:21:30 (BackfillMod.java:164) contemplating job romeda:1017778889:0
04/02 15:21:30 (Sched.java:330) java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
	at unm.maui.rm.SimpleMatcher.getNodeAvailSlotIDs(SimpleMatcher.java:563)
	at unm.maui.rm.SimpleMatcher.getNodesSlots(SimpleMatcher.java:377)
	at unm.maui.rm.SimpleMatcher.getNodesSlots(SimpleMatcher.java:256)
	at unm.maui.rm.SimpleMatcher.findNodesSlots(SimpleMatcher.java:79)
	at unm.maui.sched.BackfillMod.makeReservation(BackfillMod.java:240)
	at unm.maui.sched.BackfillMod.event(BackfillMod.java:169)
	at unm.maui.sched.Sched.fireLoop(Sched.java:922)
	at unm.maui.sched.Sched.run(Sched.java:306)
	at java.lang.Thread.run(Thread.java:484)
04/02 15:21:30 (Sched.java:347) checkpointing scheduler.
04/02 15:21:30 (Wiki.java:385) shutting down RM=Node
04/02 15:21:30 (Sched.java:359) scheduler finished
- end -

If I try to restart the server daemon after this crash, it immediately
exits again with the message in iteration 37
(ArrayIndexOutOfBoundsException.)  The only way to restart the daemon is
to create the mySQL database again (wiping whatever was in it.)  Here is
my .cmd file, which I run with 'mauisubmit maui_job.cmd':

- maui_job.cmd -
IWD == "/tmp"
WCLimit == 3600
Account == "WWGD190053X"
Tasks == 2
Nodes == 2
TaskPerNode == 1
Arch == x86
OS == Linux
JobType == "mpi.ch_gm"
Exec == "/export/mauisched-1.2/bin/runmpi_gm"
Args == "/export/home/romeda/cpi"
Output == "/tmp/$(MAUI_JOB_USER)2x3gm$(MAUI_JOB_ID).out"
Error == "/tmp/$(MAUI_JOB_USER)2x3gm$(MAUI_JOB_ID).err"
Log == "/tmp/$(MAUI_JOB_USER)2x3gm$(MAUI_JOB_ID).log"
Input == "/dev/null"
- end -

Is the XML error related to the out of bounds array exception?  We
compiled with the Sun jdk 1.3.1-02 and JavaCC 2.1.  There is no
information about this error on the web.  Any help would be greatly
appreciated.

Thanks,
Gabe


From emiller at techskills.com  Tue Apr  2 13:34:27 2002
From: emiller at techskills.com (Eric Miller)
Date: Wed Nov 25 01:02:12 2009
Subject: Syntax for executing
In-Reply-To: 
Message-ID: 

disregard.  SETI is not available in an MPI-enabled format.

My apologies.  Can anyone direct me to an URL that lists some available
programs that I can execute on the cluster?  Preferably something with a
continuous (looping?) graphical output (e.g. SETI).
This is a display for students to visualize and promote educational programs for Linux, like a museum peice. >>>>>>>>>>>>>>>>>>>>>>>>>> Hey all, got a five-node cluster up running 27-z9, preparing for a 30 node cluster. - What is the syntax to run an executable in the cluster environment? For example, I run NP=5 mpi-mandel to run the test fractal program. How would I execute say, SETI, using the cluster? Assume that the SETI executable is in the PATH. Also, the older version of Scyld had some test code in /usr/mpi-beowulf/*. Is that gone? - What would cause all but one of the processors to show usage in beostatus? The node shows "up" in every other way: hardware identical, memory, swap, network, etc....just when I run something, only that one processor on one node shows no % usage. -ETM .~. /V\ // \\ /( )\ ^'~'^ From gropp at mcs.anl.gov Tue Apr 2 13:48:19 2002 From: gropp at mcs.anl.gov (William Gropp) Date: Wed Nov 25 01:02:12 2009 Subject: Syntax for executing In-Reply-To: References: Message-ID: <5.1.0.14.2.20020402154730.01bdc3b8@localhost> At 04:34 PM 4/2/2002 -0500, Eric Miller wrote: >disregard. SETI is not available in an MPI-enabled format. > >My apologies. Can anyone direct me to an URL that lists some available >programs that I can execute on the cluster? Preferably something with a >continuous (looping?) graphical output (e.g. SETI). This is a display for >students to visualize and promote educational programs for Linux, like a >museum peice. pmandel in the MPICH distribution has a -loop option for just this purpose. See the README in mpich/mpe/contrib/mandel . Bill From aby_sinha at yahoo.com Tue Apr 2 19:19:42 2002 From: aby_sinha at yahoo.com (Abhishek sinha) Date: Wed Nov 25 01:02:12 2009 Subject: apic problems Message-ID: <3CAA74CE.2070109@yahoo.com> Hi All I am using dual processors with a Tyan Tiger 2505 T board and having so many problems with the APIC on the machine . I have looked around on the newsgroups and mailing list..with no hints... Does the return code in the end of the message 00(02) > APIC error on CPU0: 00(08) > APIC error on CPU0: 08(08) > APIC error on CPU0: 08(08) > APIC error on CPU1: 02(02) > APIC error on CPU1: 02(08) mean that this particular board i m using is crappy or the whole 2505T series cannot handle these kinds of requests I am pasting the dmesg from the server below Linux version 2.4.7-10smp ( bhcompile@stripples.devel.redhat.com ) (gcc > version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Sep 6 > 17:09:31 EDT 2001 > BIOS-provided physical RAM map: > BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) > BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) > BIOS-e820: 0000000000100000 - 000000007fff0000 (usable) > BIOS-e820: 000000007fff0000 - 000000007fff3000 (ACPI NVS) > BIOS-e820: 000000007fff3000 - 0000000080000000 (ACPI data) > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) > Scanning bios EBDA for MXT signature > 1151MB HIGHMEM available. > found SMP MP-table at 000f5660 > hm, page 000f5000 reserved twice. > hm, page 000f6000 reserved twice. > hm, page 000f1000 reserved twice. > hm, page 000f2000 reserved twice. > On node 0 totalpages: 524272 > zone(0): 4096 pages. > zone(1): 225280 pages. > zone(2): 294896 pages. > Intel MultiProcessor Specification v1.4 > Virtual Wire compatibility mode. > OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000 > Processor #0 Pentium(tm) Pro APIC version 17 > Processor #1 Pentium(tm) Pro APIC version 17 > I/O APIC #2 Version 17 at 0xFEC00000. 
> Processors: 2 > Kernel command line: ro root=/dev/hda2 > Initializing CPU#0 > Detected 864.238 MHz processor. > Console: colour VGA+ 80x25 > Calibrating delay loop... 1723.59 BogoMIPS > Memory: 2056920k/2097088k available (1396k kernel code, 37736k reserved, > 102k data, 240k init, 1179584k highmem) > Dentry-cache hash table entries: 262144 (order: 9, 2097152 bytes) > Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) > Mount-cache hash table entries: 32768 (order: 6, 262144 bytes) > Buffer-cache hash table entries: 131072 (order: 7, 524288 bytes) > Page-cache hash table entries: 524288 (order: 10, 4194304 bytes) > CPU: Before vendor init, caps: 0387fbff 00000000 00000000, vendor = 0 > CPU: L1 I cache: 16K, L1 D cache: 16K > CPU: L2 cache: 256K > Intel machine check architecture supported. > Intel machine check reporting enabled on CPU#0. > CPU: After vendor init, caps: 0387fbff 00000000 00000000 00000000 > CPU serial number disabled. > CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 > CPU: Common caps: 0383fbff 00000000 00000000 00000000 > Enabling fast FPU save and restore... done. > Enabling unmasked SIMD FPU exception support... done. > Checking 'hlt' instruction... OK. > POSIX conformance testing by UNIFIX > mtrr: v1.40 (20010327) Richard Gooch ( rgooch@atnf.csiro.au ) > mtrr: detected mtrr type: Intel > CPU: Before vendor init, caps: 0383fbff 00000000 00000000, vendor = 0 > CPU: L1 I cache: 16K, L1 D cache: 16K > CPU: L2 cache: 256K > Intel machine check reporting enabled on CPU#0. > CPU: After vendor init, caps: 0383fbff 00000000 00000000 00000000 > CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 > CPU: Common caps: 0383fbff 00000000 00000000 00000000 > CPU0: Intel Pentium III (Coppermine) stepping 0a > per-CPU timeslice cutoff: 730.77 usecs. > enabled ExtINT on CPU#0 > ESR value before enabling vector: 00000000 > ESR value after enabling vector: 00000000 > Booting processor 1/1 eip 2000 > Initializing CPU#1 > masked ExtINT on CPU#1 > ESR value before enabling vector: 00000000 > ESR value after enabling vector: 00000000 > Calibrating delay loop... 1723.59 BogoMIPS > CPU: Before vendor init, caps: 0387fbff 00000000 00000000, vendor = 0 > CPU: L1 I cache: 16K, L1 D cache: 16K > CPU: L2 cache: 256K > Intel machine check reporting enabled on CPU#1. > CPU: After vendor init, caps: 0387fbff 00000000 00000000 00000000 > CPU serial number disabled. > CPU: After generic, caps: 0383fbff 00000000 00000000 00000000 > CPU: Common caps: 0383fbff 00000000 00000000 00000000 > CPU1: Intel Pentium III (Coppermine) stepping 0a > Total of 2 processors activated (3447.19 BogoMIPS). > ENABLING IO-APIC IRQs > ...changing IO-APIC physical APIC ID to 2 ... ok. > init IO_APIC IRQs > IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 > not connected. > ..TIMER: vector=0x31 pin1=2 pin2=0 > number of MP IRQ sources: 19. > number of IO-APIC #2 registers: 24. > testing the IO APIC....................... > > IO APIC #2...... > .... register #00: 02000000 > ....... : physical APIC id: 02 > .... register #01: 00178011 > ....... : max redirection entries: 0017 > ....... : IO APIC version: 0011 > WARNING: unexpected IO-APIC, please mail > to linux-smp@vger.kernel.org > .... register #02: 00000000 > ....... : arbitration: 00 > .... 
IRQ redirection table: > NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: > 00 000 00 1 0 0 0 0 0 0 00 > 01 003 03 0 0 0 0 0 1 1 39 > 02 003 03 0 0 0 0 0 1 1 31 > 03 003 03 0 0 0 0 0 1 1 41 > 04 003 03 0 0 0 0 0 1 1 49 > 05 003 03 0 0 0 0 0 1 1 51 > 06 003 03 0 0 0 0 0 1 1 59 > 07 003 03 0 0 0 0 0 1 1 61 > 08 003 03 0 0 0 0 0 1 1 69 > 09 003 03 0 0 0 0 0 1 1 71 > 0a 003 03 1 1 0 1 0 1 1 79 > 0b 003 03 1 1 0 1 0 1 1 81 > 0c 003 03 1 1 0 1 0 1 1 89 > 0d 003 03 0 0 0 0 0 1 1 91 > 0e 003 03 0 0 0 0 0 1 1 99 > 0f 003 03 0 0 0 0 0 1 1 A1 > 10 000 00 1 0 0 0 0 0 0 00 > 11 000 00 1 0 0 0 0 0 0 00 > 12 000 00 1 0 0 0 0 0 0 00 > 13 000 00 1 0 0 0 0 0 0 00 > 14 000 00 1 0 0 0 0 0 0 00 > 15 000 00 1 0 0 0 0 0 0 00 > 16 000 00 1 0 0 0 0 0 0 00 > 17 000 00 1 0 0 0 0 0 0 00 > IRQ to pin mappings: > IRQ0 -> 0:2 > IRQ1 -> 0:1 > IRQ3 -> 0:3 > IRQ4 -> 0:4 > IRQ5 -> 0:5 > IRQ6 -> 0:6 > IRQ7 -> 0:7 > IRQ8 -> 0:8 > IRQ9 -> 0:9 > IRQ10 -> 0:10 > IRQ11 -> 0:11 > IRQ12 -> 0:12 > IRQ13 -> 0:13 > IRQ14 -> 0:14 > IRQ15 -> 0:15 > .................................... done. > Using local APIC timer interrupts. > calibrating APIC timer ... > ..... CPU clock speed is 864.2437 MHz. > ..... host bus clock speed is 132.9603 MHz. > cpu: 0, clocks: 1329603, slice: 443201 > CPU0 > cpu: 1, clocks: 1329603, slice: 443201 > CPU1 > checking TSC synchronization across CPUs: passed. > mtrr: your CPUs had inconsistent variable MTRR settings > mtrr: probably your BIOS does not setup all CPUs > PCI: PCI BIOS revision 2.10 entry at 0xfb3e0, last bus=1 > PCI: Using configuration type 1 > PCI: Probing PCI hardware > Unknown bridge resource 0: assuming transparent > Unknown bridge resource 1: assuming transparent > Unknown bridge resource 2: assuming transparent > PCI: Using IRQ router VIA [1106/0686] at 00:07.0 > PCI->APIC IRQ transform: (B0,I6,P0) -> 12 > PCI->APIC IRQ transform: (B0,I7,P3) -> 12 > PCI->APIC IRQ transform: (B0,I7,P3) -> 12 > PCI->APIC IRQ transform: (B0,I13,P0) -> 10 > PCI->APIC IRQ transform: (B0,I14,P0) -> 11 > PCI: Enabling Via external APIC routing > isapnp: Scanning for PnP cards... > isapnp: No Plug & Play device found > Linux NET4.0 for Linux 2.4 > Based upon Swansea University Computer Society NET3.039 > Initializing RT netlink socket > apm: BIOS version 1.2 Flags 0x07 (Driver version 1.14) > apm: disabled - APM is not SMP safe. > mxt_scan_bios: enter > Starting kswapd v1.8 > allocated 64 pages and 64 bhs reserved for the highmem bounces > VFS: Diskquotas version dquot_6.5.0 initialized > Detected PS/2 Mouse Port. 
> pty: 2048 Unix98 ptys configured > Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT > SHARE_IRQ SERIAL_PCI ISAPNP enabled > ttyS00 at 0x03f8 (irq = 4) is a 16550A > Real Time Clock Driver v1.10d > block: queued sectors max/low 1365629kB/1234557kB, 4032 slots per queue > RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize > Uniform Multi-Platform E-IDE driver Revision: 6.31 > ide: Assuming 33MHz PCI bus speed for PIO modes; override with idebus=xx > VP_IDE: IDE controller on PCI bus 00 dev 39 > VP_IDE: chipset revision 6 > VP_IDE: not 100% native mode: will probe irqs later > ide: Assuming 33MHz PCI bus speed for PIO modes; override with idebus=xx > VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1 > ide0: BM-DMA at 0xd400-0xd407, BIOS settings: hda:DMA, hdb:pio > ide1: BM-DMA at 0xd408-0xd40f, BIOS settings: hdc:DMA, hdd:pio > hda: QUANTUM FIREBALLlct20 40, ATA DISK drive > hdc: CDU5211, ATAPI CD/DVD-ROM drive > ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 > ide1 at 0x170-0x177,0x376 on irq 15 > hda: 78177792 sectors (40027 MB) w/418KiB Cache, CHS=4866/255/63, UDMA(33) > ide-floppy driver 0.97 > Partition check: > hda: hda1 hda2 hda3 > FDC 0 is a post-1991 82077 > ide-floppy driver 0.97 > md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 > md: Autodetecting RAID arrays. > md: autorun ... > md: ... autorun DONE. > NET4: Linux TCP/IP 1.0 for NET4.0 > IP Protocols: ICMP, UDP, TCP, IGMP > IP: routing cache hash table of 16384 buckets, 128Kbytes > TCP: Hash tables configured (established 524288 bind 65536) > Linux IP multicast router 0.06 plus PIM-SM > NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. > RAMDISK: Compressed image found at block 0 > Freeing initrd memory: 324k freed > VFS: Mounted root (ext2 filesystem). > Journalled Block Device driver loaded > EXT3-fs: INFO: recovery required on readonly filesystem. > EXT3-fs: write access will be enabled during recovery. > kjournald starting. Commit interval 5 seconds > EXT3-fs: recovery complete. > EXT3-fs: mounted filesystem with ordered data mode. > Freeing unused kernel memory: 240k freed > Adding Swap: 2040244k swap-space (priority -1) > usb.c: registered new driver usbdevfs > usb.c: registered new driver hub > usb-uhci.c: $Revision: 1.259 $ time 17:18:11 Sep 6 2001 > usb-uhci.c: High bandwidth mode enabled > usb-uhci.c: USB UHCI at I/O 0xd800, IRQ 12 > usb-uhci.c: Detected 2 ports > usb.c: new USB bus registered, assigned bus number 1 > hub.c: USB hub found > hub.c: 2 ports detected > usb-uhci.c: USB UHCI at I/O 0xdc00, IRQ 12 > usb-uhci.c: Detected 2 ports > usb.c: new USB bus registered, assigned bus number 2 > hub.c: USB hub found > hub.c: 2 ports detected > usb-uhci.c: v1.251:USB Universal Host Controller Interface driver > EXT3 FS 2.4-0.9.8, 25 Aug 2001 on ide0(3,2), internal journal > kjournald starting. Commit interval 5 seconds > EXT3 FS 2.4-0.9.8, 25 Aug 2001 on ide0(3,1), internal journal > EXT3-fs: mounted filesystem with ordered data mode. > parport0: PC-style at 0x378 [PCSPP,EPP] > parport0: cpp_daisy: aa5500ff(38) > parport0: assign_addrs: aa5500ff(38) > parport0: cpp_daisy: aa5500ff(38) > parport0: assign_addrs: aa5500ff(38) > parport_pc: Via 686A parallel port: io=0x378 > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin > and others > eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:E0:81:20:55:CC, IRQ > 10. 
Board assembly 567812-052, Physical connectors present: RJ45 > Primary interface chip i82555 PHY #1. > General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). > eth1: Intel Corporation 82557 [Ethernet Pro 100] (#2), 00:E0:81:20:55:CD, > IRQ 11. > Board assembly 567812-052, Physical connectors present: RJ45 > Primary interface chip i82555 PHY #1. > General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). > APIC error on CPU1: 00(02) > APIC error on CPU0: 00(08) > APIC error on CPU0: 08(08) > APIC error on CPU0: 08(08) > APIC error on CPU1: 02(02) > APIC error on CPU1: 02(08) PLEASE HELP abby From raysonlogin at yahoo.com Tue Apr 2 20:07:19 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:12 2009 Subject: Uptime data/studies/anecdotes ... ? In-Reply-To: Message-ID: <20020403040719.14849.qmail@web11408.mail.yahoo.com> --- "Roger L. Smith" wrote: > We currently run an average of about 75% utilization on our 586 > processor (293 node) cluster. We probably have about one node per > week crash and hang for various reasons. The OpenPBS backfilling algorithm is really bad. If you are running parallel jobs, you should use PBS+Maui. > We have occasional problems with memory leaks or PBS hangups which > require large scale reboots of the cluster. (Actually, PBS just died > as I'm typing this, but our pbs heartbeat script should restart it > automatically in a few minutes). I'd say we have to do a full reboot > of the cluster about every 3-4 months. One bigger problem is (or was, I haven't been looking at PBS code since last fall) that in each scheduling cycle, the scheduler tries to contact each MOM in the cluster to get resource information, but if one of the MON dies, then the scheduler hangs... and then timeout & restarts. You may try the "Cplant Fault Recovery Patch" and several other patches if you want to stay with PBS. > For a bunch of PC hardware running a free OS, this seems like a > pretty good number to me. It's not in the same class as our Sun > servers (nor even our SGIs!), but then, none of those systems are > this large, either. Another problem (at least in OpenPBS 2.3.12) is that there are some hard limit that is defined in the source (like "#define PBS_ACCT_MAX_RCD 4095", "#define PBS_NET_MAX_CONNECTIONS 256", which may not work in large clusters) If you want something free, then you may try SGE. It scales quite nicely (SGE improved a lot in 5.3), it's open source, and integrates with Maui. I like SGE better than OpenPBS. -- at least when one (or more?) of your nodes dies, the cluster continues to operate, and SGE even re-runs the job for you. Another feature is the shadow master, which restarts the master daemon on other machines if your master node dies. I think someone on this list is planning to tell us his experience with SGE on his beowulf? Rayson P.S. links: OpenPBS public home: http://www-unix.mcs.anl.gov/openpbs/ SGE : http://gridengine.sunsource.net Maui : http://www.supercluster.org __________________________________________________ Do You Yahoo!? Yahoo! 
Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From Drake.Diedrich at anu.edu.au Tue Apr 2 23:11:26 2002 From: Drake.Diedrich at anu.edu.au (Drake Diedrich) Date: Wed Nov 25 01:02:12 2009 Subject: pvm povray help In-Reply-To: <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be> References: <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be> Message-ID: <20020403171126.B26086@duh.anu.edu.au> On Thu, Mar 21, 2002 at 11:39:24AM +0100, Luc Vereecken wrote: > >a very large project. > > That shouldn't be a very large project at all. Read the inputfile The very large part would be in broadcasting the parsed object tree, so as to limit the serial overhead of parsing to just one node, rather than duplicate that effort on all nodes. From opengeometry at yahoo.ca Tue Apr 2 23:31:52 2002 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:02:12 2009 Subject: Hyperthreading in P4 Xeon (question) Message-ID: <20020403023152.A2972@node0.opengeometry.ca> What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not versed in the latest CPU trends. Does it mean that dual-P4Xeon will behave like 4-way SMP? -- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin From didonato at bigpond.net.au Wed Apr 3 00:35:03 2002 From: didonato at bigpond.net.au (Christian Di Donato) Date: Wed Nov 25 01:02:12 2009 Subject: Hyperthreading in P4 Xeon (question) In-Reply-To: <20020403023152.A2972@node0.opengeometry.ca> Message-ID: <000301c1daea$76b06a40$99ca8490@claptop> There is a Whitepaper on the Xeon Processor concerning Hyperthreading over at the intel site http://www.intel.com/eBusiness/products/server/processor/xeon/wp020901_s um.htm -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org] On Behalf Of William Park Sent: Wednesday, 3 April 2002 5:32 PM To: beowulf@beowulf.org Subject: Hyperthreading in P4 Xeon (question) What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not versed in the latest CPU trends. Does it mean that dual-P4Xeon will behave like 4-way SMP? -- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Luc.Vereecken at chem.kuleuven.ac.be Wed Apr 3 03:32:39 2002 From: Luc.Vereecken at chem.kuleuven.ac.be (Luc Vereecken) Date: Wed Nov 25 01:02:12 2009 Subject: pvm povray help In-Reply-To: <20020403171126.B26086@duh.anu.edu.au> References: <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be> <3.0.6.32.20020321113924.00846be0@arrhenius.chem.kuleuven.ac.be> Message-ID: <3.0.6.32.20020403133239.008b8720@arrhenius.chem.kuleuven.ac.be> At 17:11 3/04/02 +1000, Drake Diedrich wrote: >On Thu, Mar 21, 2002 at 11:39:24AM +0100, Luc Vereecken wrote: >> >a very large project. >> >> That shouldn't be a very large project at all. Read the inputfile > > The very large part would be in broadcasting the parsed object tree, so >as to limit the serial overhead of parsing to just one node, rather than >duplicate that effort on all nodes. Would that duplication avoidance gain you anything ? Current case (IIRC, I haven't used pvm povray recently): Every node reads the inputfile (possibly from an inefficient NFS mounted volume), and parses. 
New Case 1 : Read the inputfiles on master, broadcast these N bytes, parse for Q seconds on all nodes. User gained : no need to have the input file on all nodes. Developer gained : easy to implement. New Case 2 : Read the inputfile on master, parse for Q' seconds on master node, broadcast M bytes for parsed object tree. User gained: no need to have the input file on all nodes. In the second case, you have NODES-1 nodes doing nothing, but you might not be able to do anything with that free time, as they e.g. are already allocated to that job, or whatever, especially since the parsing is fairly short compared to the rendering. Assuming identical nodes, the walltime of the parsing is the same everywhere (Q=Q'), and duplicating that effort doesn't require extra walltime (so irrelevant unless you're charged per used cpu time, or if you have multiple jobs per processor (e.g. SMP) to reclaim the idle time). If so, it then depends on whether the parsed object tree (M bytes) is larger or smaller than the text inputfiles and other required files (N bytes). If M > N, it takes longer to broadcast the parsed tree, if N > M, then it is quicker to broadcast the parsed tree. If the Master node is faster than the others, it's parsing time might be shorter than the slowest of the other nodes (Q' < Q), and then it is possible that even with M > N, it might be faster to distribute the parsed tree rather than the inputfiles. The basic question is therefore : how large is (typically) the parsed tree compared to the original input file ? Standard povray include files should be assumed predistributed as they should/can be installed on each node together with the executable. To be honest, I have no idea about this ratio. Luc From didonato at bigpond.net.au Wed Apr 3 04:29:25 2002 From: didonato at bigpond.net.au (Christian Di Donato) Date: Wed Nov 25 01:02:12 2009 Subject: Testing Message-ID: <000401c1db0b$3438b7a0$99ca8490@claptop> Can someone just reply to this list and confirm that they are indeed receiving this. Only one person needs to reply. I'm getting e-mails bouncing back every time I try to send something to beowulf@beowulf.org Thanks in Advance And Kind Regards Christian Di Donato From walke at usna.edu Wed Apr 3 04:46:46 2002 From: walke at usna.edu (LT V. H. Walke) Date: Wed Nov 25 01:02:12 2009 Subject: Testing In-Reply-To: <000401c1db0b$3438b7a0$99ca8490@claptop> References: <000401c1db0b$3438b7a0$99ca8490@claptop> Message-ID: <1017838087.30683.1.camel@vhwalke.mathsci.usna.edu> I read you loud and clear. Vann On Wed, 2002-04-03 at 07:29, Christian Di Donato wrote: > Can someone just reply to this list and confirm that they are indeed > receiving this. Only one person needs to reply. I'm getting e-mails > bouncing back every time I try to send something to beowulf@beowulf.org > > > Thanks in Advance > > And Kind Regards > > > Christian Di Donato > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- ---------------------------------------------------------------------- Vann H. Walke Office: Chauvenet 341 Computer Science Dept. 
Ph: 410-293-6811 572 Holloway Road, Stop 9F Fax: 410-293-2686 United States Naval Academy email: walke@usna.edu Annapolis, MD 21402-5002 http://www.cs.usna.edu/~walke ---------------------------------------------------------------------- From Daniel.Kidger at quadrics.com Wed Apr 3 05:56:28 2002 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:02:12 2009 Subject: Testing Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2D37@stegosaurus.bristol.quadrics.com> Christian Di Donato [mailto:didonato@bigpond.net.au] wrote: >Can someone just reply to this list and confirm that they are indeed >receiving this. Only one person needs to reply. I'm getting e-mails >bouncing back every time I try to send something to beowulf@beowulf.org So why cant that someone reply just to you rather than the whole list? and more importantly - how can anyone know that they are the said 'one person' ! Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From math at velocet.ca Wed Apr 3 06:13:23 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:12 2009 Subject: Testing In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA74D2D37@stegosaurus.bristol.quadrics.com>; from Daniel.Kidger@quadrics.com on Wed, Apr 03, 2002 at 02:56:28PM +0100 References: <010C86D15E4D1247B9A5DD312B7F5AA74D2D37@stegosaurus.bristol.quadrics.com> Message-ID: <20020403091323.J69845@velocet.ca> On Wed, Apr 03, 2002 at 02:56:28PM +0100, Daniel Kidger's all... > > Christian Di Donato [mailto:didonato@bigpond.net.au] wrote: > > >Can someone just reply to this list and confirm that they are indeed > >receiving this. Only one person needs to reply. I'm getting e-mails > >bouncing back every time I try to send something to beowulf@beowulf.org > > So why cant that someone reply just to you rather than the whole list? > > and more importantly > - how can anyone know that they are the said 'one person' ! Because when he asks for 'only one person' there's an implicit semaphore called in the operation. Didnt you heed it? Now look what you've done! :) This would all be funnier if it was still Apr 1. /kc > > > Yours, > Daniel. > > -------------------------------------------------------------- > Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com > One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 > ----------------------- www.quadrics.com -------------------- > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From timm at fnal.gov Wed Apr 3 06:27:42 2002 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:02:12 2009 Subject: Hyperthreading in P4 Xeon (question) In-Reply-To: <20020403023152.A2972@node0.opengeometry.ca> Message-ID: I have one such test machine that we are evaluating at the moment. It's a dual cpu machine but under Linux it shows up looking like it has four cpu's. Haven't actually tried yet to see if it really can run four loads just as well... the specimen we have has DDR SDRAM and already gets bogged down going with two processes at once. Steve Timm ------------------------------------------------------------------ Steven C. 
Timm  (630) 840-8525  timm@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

On Wed, 3 Apr 2002, William Park wrote:

> What is the realistic effect of "hyperthreading" in P4 Xeon?  I'm not
> versed in the latest CPU trends.  Does it mean that dual-P4Xeon will
> behave like 4-way SMP?
>
> --
> William Park, Open Geometry Consulting,
> 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin


From hahn at physics.mcmaster.ca  Wed Apr  3 07:50:06 2002
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:02:12 2009
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To: <20020403023152.A2972@node0.opengeometry.ca>
Message-ID: 

> What is the realistic effect of "hyperthreading" in P4 Xeon?  I'm not
> versed in the latest CPU trends.  Does it mean that dual-P4Xeon will
> behave like 4-way SMP?

for some value of "behave like" ;)
that is, it will definitely NOT get twice as fast.  but it will appear
to have 4 CPUs, and can run 4 threads/procs at once (for values of
"once" > 1 clock cycle ;)

we did a quick test on a dual-prestonia here, and saw a ~5% speedup
on a probably cache-friendly, compute-bound task.


From jurgen at botz.org  Wed Apr  3 10:25:31 2002
From: jurgen at botz.org (Jurgen Botz)
Date: Wed Nov 25 01:02:12 2009
Subject: Linux Software RAID5 Performance
In-Reply-To: Message from mprinkey@aeolusresearch.com (Michael Prinkey)
	of "Sun, 31 Mar 2002 14:33:59 EST."
Message-ID: <18878.1017858331@localhost>

Michael Prinkey wrote:
> Again, performance (see below) is remarkably good, especially considering
> all of the strikes against this configuration: EIDE instead of SCSI, UDMA66
> instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave drives on
> each port instead of a single drive per port.

With regard to the master/slave config... I note that your performance
test is a single reader/writer... in this config with RAID5 I would
expect the performance to be quite good even with 2 drives per IDE
controller.  But if you have several processes doing disk I/O
simultaneously you should see a rather more precipitous drop in
performance than you would with a single drive per IDE controller.

I'm working on testing a very similar config right now and that's one of
my findings (which I had expected) but our application for this is not
very performance sensitive so it's not a big deal.  A more important
issue for me is reliability, and I'm somewhat concerned about failure
modes.  For example, can an IDE drive fail in such a way that it will
disable the controller or the other drive on the same controller?  If
so, that would seriously limit the usefulness of RAID5 in this config.
In general how good is Linux software RAID's failure handling?  Etc.
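One thing I do plan to try on a scratch array is simply failing a member
by hand and watching how md reacts -- a sketch, assuming the raidtools
userland (the device names here are made up, and mdadm's
--fail/--remove/--add would do the same job):

  # watch the array state before, during and after
  cat /proc/mdstat

  # mark one member of the RAID5 set as faulty, then pull it out
  raidsetfaulty /dev/md0 /dev/hdc1
  raidhotremove /dev/md0 /dev/hdc1

  # the array should keep running degraded; put the disk back and
  # /proc/mdstat should show the rebuild progressing
  raidhotadd /dev/md0 /dev/hdc1

Of course that only exercises the clean-failure path; it says nothing
about the ugly case where a dying drive hangs the whole IDE channel.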
:j
-- 
Jürgen Botz        | While differing widely in the various
jurgen@botz.org    | little bits we know, in our infinite
                   | ignorance we are all equal.
                   |                          -Karl Popper


From ron_chen_123 at yahoo.com  Wed Apr  3 11:02:19 2002
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:02:12 2009
Subject: Fwd: FreeBSD port of SGE
Message-ID: <20020403190219.24496.qmail@web14701.mail.yahoo.com>

FreeBSD hackers and Beowulf users,

I am porting SGE (a software for the compute farms, or the so-called
batch systems) to *BSDs, and I am wondering if someone can take over some
of the ports.

I just started porting the code to *BSDs.  Currently, I can get the code
compiled on *BSDs with "#ifdef BSD"s.  I am starting the system specific
part, mainly to get the load, cpu, and stuff like that.

I am not done yet, but I just want to tell you that it is getting there :-)

Someone also started the SGE port to FreeBSD (which means duplicated
work), so if you are interested, or if you want to be the maintainer of
one of the ports (currently, we have FreeBSD, NetBSD, OpenBSD,
Darwin/MacOSX), please contact me.

More info: gridengine.sunsource.net

Thanks,
-Ron

--- I wrote:
> Status of the port(s):
>
> - compiled on FreeBSD, NetBSD, OpenBSD.
> - coding routines to get the load:
>    load: getloadavg(3), kvm_getloadavg(3)
>    #cpu: sysctl(3) hw.ncpu
>    mem : sysctl(3) vm.stats_vm.*
>    proc info: kvm_getprocs(3)
>
> -Ron
>
> --- Andy Schwierskott wrote:
> > Ron,
> >
> > > OK, I think I should write a porting-HOWTO.
> > > Once I am done, can you also include in the "HowTo" page?
> >
> > Of course, we (and certainly many developers) would be more than happy
> > to add such a page;-)
> >
> > Andy
>
> dev-unsubscribe@gridengine.sunsource.net
> For additional commands, e-mail: dev-help@gridengine.sunsource.net


From hahn at physics.mcmaster.ca  Wed Apr  3 11:16:28 2002
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:02:12 2009
Subject: Hyperthreading in P4 Xeon (question)
In-Reply-To: 
Message-ID: 

> I can amplify that point.  A commercial CFD application ran significantly
> slower using 4 threads vs 2 on a dual Prestonia system.  Anything memory
> limited will probably behave the same way.

well, it's an interesting issue.  afaict, the benefit of HT depends on
to what degree your app leaves idle resources.  for instance, if
everything you run is thrashing your dram bandwidth (big arrays,
perhaps), then forget HT - it doesn't add extra dimms!  similarly, if
the CPU has just one fsqrt unit, and that's your bottleneck, HT doesn't
add more units.  there are other resource nonlinearities, like cache
hitrate - the same effect that gives rise to superlinear SMP speedup
will slaughter some apps run on HT...

but if there's other work to be done while one thread is spinning
sqrt's, ie, there are idle resources, then a thread that uses them will
show HT profit...  in some sense, HT works precisely when the system's
resources *don't* match the optimal set your app wants.

I wonder if/when Intel will start pouring in hordes of extra functional
units, since another 50M transistors will only improve the cache hit
rate a little bit...  of course, it's also true that HT makes bigger
TLB's and more associative caches attractive...


From garcia_garcia_adrian at hotmail.com  Wed Apr  3 11:14:06 2002
From: garcia_garcia_adrian at hotmail.com (Adrian Garcia Garcia)
Date: Wed Nov 25 01:02:12 2009
Subject: DHCP Help Again
Message-ID: 

An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20020403/4e0c70c2/attachment.html From crhea at mayo.edu Wed Apr 3 13:04:12 2002 From: crhea at mayo.edu (Cris Rhea) Date: Wed Nov 25 01:02:12 2009 Subject: How do you keep clusters running.... Message-ID: <200204032104.PAA23347@sijer.mayo.edu> What are folks doing about keeping hardware running on large clusters? Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... Sure seems like every week or two, I notice dead fans (each RS-1200 has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). My last fan failure was a CPU fan that toasted the CPU and motherboard. How are folks with significantly more nodes than mine dealing with constant maintenance on their nodes? Do you have whole spare nodes sitting around- ready to be installed if something fails, or do you have a pile of spare parts? Did you get the vendor (if you purchased prebuilt systems) to supply a stockpile of warranty parts? One of the problems I'm facing is that every time something croaks, Racksaver is very good about replacing it under warranty, but getting the new parts delivered usually takes several days. For some things like fans, they sent extras for me to keep on-hand. For my last fan/CPU/motherboard failure, the node pair will be down ~5 days waiting for parts. Comments? Thoughts? Ideas? Thanks- --- Cris ---- Cristopher J. Rhea Mayo Foundation Research Computing Facility Pavilion 2-25 crhea@Mayo.EDU Rochester, MN 55905 Fax: (507) 266-4486 (507) 284-0587 From fraser5 at cox.net Wed Apr 3 13:37:56 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:12 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: <000901c1db57$d222ac90$0300005a@papabear> Sounds to me like you have a heat problem. dual ultra thin's generally run pretty hot. good luck with it. There is just no room for any serious air to move thru that case. The fan diameter is so small that they require ridiculous rpms to move the needed volume making them noisy and prone to fail, add to that the high heat and you accelerate the mtbf to tomorrow. Most fans fail quickly in high heat conditions. I think the basic rack design concept while rugged and strong is fundamentally flawed and over priced. I would invest in a serious rack fan that moves major air out of that case somehow. good luck with it. jim -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Cris Rhea Sent: Wednesday, April 03, 2002 4:04 PM To: beowulf@beowulf.org Subject: How do you keep clusters running.... What are folks doing about keeping hardware running on large clusters? Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... Sure seems like every week or two, I notice dead fans (each RS-1200 has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). My last fan failure was a CPU fan that toasted the CPU and motherboard. How are folks with significantly more nodes than mine dealing with constant maintenance on their nodes? Do you have whole spare nodes sitting around- ready to be installed if something fails, or do you have a pile of spare parts? Did you get the vendor (if you purchased prebuilt systems) to supply a stockpile of warranty parts? One of the problems I'm facing is that every time something croaks, Racksaver is very good about replacing it under warranty, but getting the new parts delivered usually takes several days. 
For some things like fans, they sent extras for me to keep on-hand. For my last fan/CPU/motherboard failure, the node pair will be down ~5 days waiting for parts. Comments? Thoughts? Ideas? Thanks- --- Cris ---- Cristopher J. Rhea Mayo Foundation Research Computing Facility Pavilion 2-25 crhea@Mayo.EDU Rochester, MN 55905 Fax: (507) 266-4486 (507) 284-0587 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From alvin at Maggie.Linux-Consulting.com Wed Apr 3 13:44:14 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:02:12 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: hi ya buy better quality fans... we use $15.oo fans ( 40x40x10mm ) stuff used in 1U chassis ( you can get fans as cheap as $4.oo but is a dead $1,000 server ( worth the cost differences of cheap fans ??? ( not the place to save $$$ ) - similarly ..get better quality (cooler running) powersupply too fans should NOT die... at least not more than once a year ... c ya alvin http:/www.linux-1U.net ... 11" deep 1U chassis w/ amd 1700+ On Wed, 3 Apr 2002, Cris Rhea wrote: > > What are folks doing about keeping hardware running on large clusters? > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > How are folks with significantly more nodes than mine dealing with constant > maintenance on their nodes? Do you have whole spare nodes sitting around- > ready to be installed if something fails, or do you have a pile of > spare parts? Did you get the vendor (if you purchased prebuilt systems) > to supply a stockpile of warranty parts? > > One of the problems I'm facing is that every time something croaks, > Racksaver is very good about replacing it under warranty, but getting > the new parts delivered usually takes several days. > > For some things like fans, they sent extras for me to keep on-hand. > > For my last fan/CPU/motherboard failure, the node pair will be > down ~5 days waiting for parts. > > Comments? Thoughts? Ideas? > From nordwall at pnl.gov Wed Apr 3 14:46:31 2002 From: nordwall at pnl.gov (Doug J Nordwall) Date: Wed Nov 25 01:02:12 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> References: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: <1017873992.2054.42.camel@duke> On Wed, 2002-04-03 at 13:04, Cris Rhea wrote: What are folks doing about keeping hardware running on large clusters? Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... Sure seems like every week or two, I notice dead fans (each RS-1200 has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). You running lm_sensors on your nodes? That's a handy tool for paying attention to things like that. We use ours in combination with ganglia and pump it to a web page and to big brother to see when a cpu might be getting hot, or a fan might be too slow. We actually saved a dozen machines that way...we have 32 4 processor racksaver boxes in a rack, and they rack was not designed to handle racksaver's fan system. 
That is to say, there was a solid sidewall on the rack, and it kept in heat. I set up lm_sensors on all the nodes (homogenous, so configured on one and pushed it out to all), then pumped the data into ganglia (ganglia.sourceforge.net) and then to a web page. I noticed that the temp on a dozen of the machines was extremely high. So, I took off the side panel of the rack. The temp dropped by 15 C on all the nodes, and everything was within normal parameters again. My last fan failure was a CPU fan that toasted the CPU and motherboard. Ya, we would have seen this on ours earlier...excellent tool How are folks with significantly more nodes than mine dealing with constant maintenance on their nodes? Do you have whole spare nodes sitting around- ready to be installed if something fails, or do you have a pile of spare parts? No, we don't actually, but we've talked about it Did you get the vendor (if you purchased prebuilt systems) to supply a stockpile of warranty parts? we use racksaver as well, so our experience is similar. Probably should talk to our people about getting some spare nodes One of the problems I'm facing is that every time something croaks, Racksaver is very good about replacing it under warranty, but getting the new parts delivered usually takes several days. Ya...this is another area where just monitoring the data can be helpful...if a fan is failing, you can see it coming (temperature slowly rises) and you can order it before hand and schedule downtime. ---- Cristopher J. Rhea Mayo Foundation Research Computing Facility Pavilion 2-25 crhea@Mayo.EDU Rochester, MN 55905 Fax: (507) 266-4486 (507) 284-0587 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Douglas J Nordwall http://rex.nmhu.edu/~musashi System Administrator Pacific Northwest National Labs -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20020403/ad990599/attachment.html From tim.carlson at pnl.gov Wed Apr 3 14:59:56 2002 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed Nov 25 01:02:12 2009 Subject: How do you keep clusters running.... In-Reply-To: <000901c1db57$d222ac90$0300005a@papabear> Message-ID: On Wed, 3 Apr 2002, Jim Fraser wrote: > Sounds to me like you have a heat problem. dual ultra thin's generally run > pretty hot. If you are putting these boxes in a rack and are not using the Racksaver rack, you need to take the side off of your rack (assuming you can do that) We've got 32 of these in a rack (4 CPU's per 1U) and they were running really hot until week took the side panel off. 5 minutes later the CPU temps had dropped 10C. Tim Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson@pnl.gov EMSL UNIX System Support From opengeometry at yahoo.ca Wed Apr 3 12:32:27 2002 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:02:12 2009 Subject: Hyperthreading in P4 Xeon (question) In-Reply-To: ; from hahn@physics.mcmaster.ca on Wed, Apr 03, 2002 at 10:50:06AM -0500 References: <20020403023152.A2972@node0.opengeometry.ca> Message-ID: <20020403153227.A15201@node0.opengeometry.ca> On Wed, Apr 03, 2002 at 10:50:06AM -0500, Mark Hahn wrote: > > What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not > > versed in the latest CPU trends. Does it mean that dual-P4Xeon will > > behave like 4-way SMP? 
> > for some value of "behave like" ;) > that is, it will definitely NOT get twice as fast. but it will appear > to have 4 CPUs, and can run 4 threads/procs at once (for values of > "once" > 1 clock cycle ;) > > we did a quick test on a dual-prestonia here, and saw a ~5% speedup > on a probably cache-friendly, compute-bound task. Hi Mark, Steve, and Michael, Can you try compiling your kernel, using make clean; time make bzImage modules >& j1 make clean; time make -j2 bzImage modules >& j2 make clean; time make -j4 bzImage modules >& j4 -- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin From James.P.Lux at jpl.nasa.gov Wed Apr 3 17:15:58 2002 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:02:12 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: <5.1.0.14.2.20020403170033.0248fec0@mail1.jpl.nasa.gov> You know, fans shouldn't fail...... There are fans available with 50,000 hour MTBFs.. sure, they cost a bit more than $5, but, given the cost of the time to replace them (especially if you cook something), it might be a good investment. You might cannibalize one of your failed fans to look for the number and kind of bearings. I have heard that some "ball bearing" fans actually have sleeve bearings, a sure recipe for short life. It's not unheard of to have some fans that are mislabelled. Bear in mind that most fans have two bearings (one on each end of the shaft) and it is entirely possible to build a fan with one sleeve and one ball bearing. At 03:04 PM 4/3/2002 -0600, Cris Rhea wrote: >What are folks doing about keeping hardware running on large clusters? > >Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... > >Sure seems like every week or two, I notice dead fans (each RS-1200 >has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). >Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory 4800 Oak Grove Road, Mail Stop 161-213 Pasadena CA 91109 818/354-2075, fax 818/393-6875 From emiller at techskills.com Wed Apr 3 18:12:46 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:12 2009 Subject: Node boot disk to designate eth0 In-Reply-To: <20020403040719.14849.qmail@web11408.mail.yahoo.com> Message-ID: is there a switch I can pass to the node floppy routine that will cause the node to boot using a designated ethernet adapter? I have one onboard 10mb adapter and a PCI 100 mb adapter (eth0), but the node tries to connect throught the onboard eth1. I cannot disable the onboard adapter in BIOS (compaq :( ), so I need pass a pararmeter at boot time to use eth0. Can this be done? From becker at scyld.com Wed Apr 3 19:37:23 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:12 2009 Subject: Node boot disk to designate eth0 In-Reply-To: Message-ID: On Wed, 3 Apr 2002, Eric Miller wrote: > Subject: Node boot disk to designate eth0 > > is there a switch I can pass to the node floppy routine that will cause the > node to boot using a designated ethernet adapter? I have one onboard 10mb > adapter and a PCI 100 mb adapter (eth0), but the node tries to connect > throught the onboard eth1. I cannot disable the onboard adapter in BIOS > (compaq :( ), so I need pass a pararmeter at boot time to use eth0. Can > this be done? The Scyld system tries all interfaces (using RARP) to find a master. That allows the system to work with all network topologies. 
To avoid using finding the master on eth1, just don't connect that interface to a master. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From emiller at techskills.com Wed Apr 3 19:50:21 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:13 2009 Subject: Fw: Node boot disk to designate eth0 Message-ID: <005e01c1db8b$da3a8590$c31fa6ac@xp> ----- Original Message ----- From: "Eric Miller" To: "Donald Becker" Sent: Wednesday, April 03, 2002 10:48 PM Subject: Re: Node boot disk to designate eth0 > > ----- Original Message ----- > From: "Donald Becker" > To: "Eric Miller" > Cc: > Sent: Wednesday, April 03, 2002 10:37 PM > Subject: Re: Node boot disk to designate eth0 > > > > On Wed, 3 Apr 2002, Eric Miller wrote: > > > > > Subject: Node boot disk to designate eth0 > > > > > > is there a switch I can pass to the node floppy routine that will cause > the > > > node to boot using a designated ethernet adapter? I have one onboard > 10mb > > > adapter and a PCI 100 mb adapter (eth0), but the node tries to connect > > > throught the onboard eth1. I cannot disable the onboard adapter in BIOS > > > (compaq :( ), so I need pass a pararmeter at boot time to use eth0. Can > > > this be done? > > > > The Scyld system tries all interfaces (using RARP) to find a master. > > That allows the system to work with all network topologies. > > > > To avoid using finding the master on eth1, just don't connect that > > interface to a master. > > It's odd, I can see that the drivers for both interfaces are being loaded, > and they are both the correct drivers. It does not seem to be looking on > both interfaces, however. It clearly is looking on only eth1, as it > specifies it line by line during the RARP requests. I've tried all the > obvious, different NIC, different board, etc. > > Thanks, I guess Ill leave it alone. > > > > -- > > Donald Becker becker@scyld.com > > Scyld Computing Corporation http://www.scyld.com > > 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters > > Annapolis MD 21403 410-990-9993 > > > From leandro at ep.petrobras.com.br Thu Apr 4 05:12:54 2002 From: leandro at ep.petrobras.com.br (Leandro Tavares Carneiro) Date: Wed Nov 25 01:02:13 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> References: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: <1017925975.30189.87.camel@linux60> We have here an beowulf cluster with 64 production nodes and 128 processors, and we have some problems like you, about fans. Here, our cluster hardware is very cheap, using motherboards and cases founds easily in the local market, and the problems is critical. We have 5 spare nodes, and only 3 of that are ready to work. All our production nodes and the 3 spare nodes which are read to start are an dual PIII 1GHz, the other 2 spare nodes are an dual PIII 800MHz but this processors are slot 1 (SECC2) and we have one node down because we dont find coolers for this! The cooler vendors say they not producing anymore SECC2 coolers, and i am studying how can i adapt others fans in that coolers... this is sad but true. We have a lot of problems with memory, hard disks and other parts. A 3 months ago, our cluster nodes was one PIII 500 MHz per node, and after the upgrade to dual 1GHz we now have lots of memory and spare disks. 
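A rough sketch of the kind of lm_sensors check described earlier in the thread: run it from cron and it flags any temperature reading above a limit, so a dying fan shows up before it toasts a CPU. The sensors command is assumed to be installed and configured, and the 60 C threshold and the temperature-line matching are illustrative only, since labels and output formats vary by motherboard.

#!/bin/sh
# Rough sketch, not a drop-in script: flag any lm_sensors temperature
# reading above LIMIT.  Assumes the lm_sensors package is configured;
# sensor labels and output formats vary by motherboard.
LIMIT=60
sensors 2>/dev/null | awk -v limit=$LIMIT '
    tolower($0) ~ /temp/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[+-][0-9]/) {        # first signed field = the reading
                val = $i
                gsub(/[^0-9.]/, "", val)    # strip sign and degree/C suffix
                if (val + 0 > limit)
                    print "temperature alarm:", $0
                break
            }
    }'

Cron it every few minutes and mail anything it prints; the same check can just as easily call shutdown instead of only complaining.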
I think this kind of problem is inevitable with cheap PC parts, and can be lower with high-quality (and price) parts. We are making an study to by a new cluster, for another application and we call Compaq and IBM to see what they have in hardware and software, with the hope of a future with less problems... Regards, and sorry about my poor english, i am brazilian and speak portuguese... Em Qua, 2002-04-03 ?s 18:04, Cris Rhea escreveu: > > What are folks doing about keeping hardware running on large clusters? > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > How are folks with significantly more nodes than mine dealing with constant > maintenance on their nodes? Do you have whole spare nodes sitting around- > ready to be installed if something fails, or do you have a pile of > spare parts? Did you get the vendor (if you purchased prebuilt systems) > to supply a stockpile of warranty parts? > > One of the problems I'm facing is that every time something croaks, > Racksaver is very good about replacing it under warranty, but getting > the new parts delivered usually takes several days. > > For some things like fans, they sent extras for me to keep on-hand. > > For my last fan/CPU/motherboard failure, the node pair will be > down ~5 days waiting for parts. > > Comments? Thoughts? Ideas? > > Thanks- > > --- Cris > > > > ---- > Cristopher J. Rhea Mayo Foundation > Research Computing Facility Pavilion 2-25 > crhea@Mayo.EDU Rochester, MN 55905 > Fax: (507) 266-4486 (507) 284-0587 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Leandro Tavares Carneiro Analista de Suporte EP-CORP/TIDT/INFI Telefone: 2534-1427 From rgb at phy.duke.edu Thu Apr 4 06:52:47 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:13 2009 Subject: DHCP Help Again In-Reply-To: Message-ID: On Wed, 3 Apr 2002, Adrian Garcia Garcia wrote: For one thing don't use the range statement -- it tells dhcpd the range of IP numbers to assign UNKNOWN ethernet numbers. You are statically assigning an IP number in your "free" range to a particular host with a KNOWN ethernet number below. I don't know what dhcpd would do in that case -- something sensible one would hope but then, maybe not. The range statement is really there so you can dynamically allocate addresses from the range to hosts you may never have seen before that you don't care to ever address by name (as they might well get a different IP number on the next boot). DHCP servers run by ISP's not infrequently use the range feature to conserve IP numbers -- they only need enough to cover the greatest number of connections they are likely to have at any one time, not one IP number per host that might ever connect. Departments might use it to give IP numbers to laptops brought in by visitors (with the extra benefit that they can assign a subnet block that isn't "trusted" by the usual department servers and/or is firewalled from the outside by an ip-forwarding/masquerading host). You want "only" static IP's in your cluster, as you'd like nodo1 to be the same machine and IP address every time. Be a bit careful about your use of domain names. 
As it happens, I don't find cluster.org registered yet (amazingly enough!) but it is pretty easy to pick one that does exist in nameservice in the outside world. In that case you'll run a serious risk of routing or name resolution problems depending on things like the search order you use in /etc/nsswitch.conf. Even my previous example of rgb.private.net is a bit risky. You should run a nameserver (cache only is fine) on your 192.168.1.1 server, presuming it lives on an external network and you care to resolve global names. Similarly you may want: option routers 192.168.1.1; if you want internal hosts to be able to get out through your (presumed gateway) server. Finally, if you want nodo1 to come up knowing its own name without hardwiring it in on the node itself, add option host-name nodo1; to its definition. I admit that I do tend to lay out my dhcpd.conf a bit differently than you have it below but I don't think that the differences are particularly significant, and you have a copy of the one I use anyway if you want to play with the pieces. You should find a log trace of dhcpd's activities in /var/log/messages, which should help with any further debugging. On your nodo1 host, make sure that: cat /etc/sysconfig/network-scripts/ifcfg-eth0 DEVICE=eth0 BOOTPROTO=dhcp ONBOOT=yes and cat /etc/sysconfig/network NETWORKING=yes HOSTNAME=nodo1 and that in /etc/modules.conf there is something like: cat /etc/modules.conf alias parport_lowlevel parport_pc alias eth0 tulip (or instead of tulip, whatever your network module is). If you then boot your e.g. RH client it SHOULD just come up, automatically try to start the network on device eth0 using dhcp as its protocol for obtaining and IP number, ask the dhcp server for an address and a route, and just "work" when they come back. Hope this helps. rgb > server-name "server.cluster.org" > > subnet 192.168.1.0 netmask 255.255.255.0 > { > range 192.168.1.2 192.168.1.10 #my client has the ip > 192.168.1.2 > #and my > server the static ip 192.168.1.1 > option subnet-mask 255.255.255.0; > option broadcast-address 192.168.1.255; > option domain-name-server 192.168.1.1; > option domain-name "cluster.org"; > > host nodo1.cluster.org > { > hardware ethernet 00:60:97:a1:ef:e0; #here is the address of the > client's card > fixed-address 192.168.1.2; > } > } > > And finally some files on my server. > > NETWORK > ------------------------------------------ > networking = yes > hostname =server.cluster.org > gatewaydev = eth0 > gatewaye= > ------------------------------------------ > > HOSTS ( In my server and in the client I have the same on this file ) > ------------------------------------------ > 127.0.0.1 localhost > 192.168.1.1 server.cluster.org > 192.168.1.2 nodo1.cluster.org > > > Ok thats the information, I am a little confuse, could you help me please > =). I can´t detect the mistake, I dont know if is the server or some card > =s. Thanks for all. > > ________________________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com. > _______________________________________________ Beowulf mailing list, > Beowulf@beowulf.org To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jayne at sphynx.clara.co.uk Thu Apr 4 10:49:03 2002 From: jayne at sphynx.clara.co.uk (Jayne Heger) Date: Wed Nov 25 01:02:13 2009 Subject: commercial parallel libraries Message-ID: Hi, I know this is a beowulf list, but I could do with getting some info on any (if there are) commercial parallel libraries, the equivalent of pvm and mpi. Do any of you know the names of any? Thanks. Jayne From gropp at mcs.anl.gov Thu Apr 4 11:01:59 2002 From: gropp at mcs.anl.gov (William Gropp) Date: Wed Nov 25 01:02:13 2009 Subject: commercial parallel libraries In-Reply-To: Message-ID: <5.1.0.14.2.20020404125850.0197fb88@localhost> At 06:49 PM 4/4/2002 +0000, Jayne Heger wrote: >Hi, > >I know this is a beowulf list, but I could do with getting some info on any >(if there are) commercial parallel libraries, the equivalent of pvm and mpi. > >Do any of you know the names of any? MPI is a standard for which there are both freely available and commercial implementations. Bill From Daniel.Kidger at quadrics.com Thu Apr 4 11:25:29 2002 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:02:13 2009 Subject: commercial parallel libraries Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2D43@stegosaurus.bristol.quadrics.com> -----Original Message----- William Gropp [mailto:gropp@mcs.anl.gov] wrote: >At 06:49 PM 4/4/2002 +0000, Jayne Heger wrote: > >>Hi, >> >>I know this is a beowulf list, but I could do with getting some info on any >>(if there are) commercial parallel libraries, the equivalent of pvm and mpi. >> >>Do any of you know the names of any? > >MPI is a standard for which there are both freely available and commercial >implementations. or do you mean something that is 'the equivalent of mpi and pvm' but which isn't pvm or mpi (like perhaps ARMCI)? Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From Kim.Branson at csiro.au Thu Apr 4 07:38:50 2002 From: Kim.Branson at csiro.au (Kim Branson) Date: Wed Nov 25 01:02:13 2009 Subject: node problems Message-ID: <1017934730.20621.38.camel@paracelsus> Hi all i have a 64node athlon cluster, at the moment i have about 19 nodes that are flaky, they stay up for a bit and then fall over. one can still ping them but not telnet or ftp. I'm trying to keep as many up as possible (more nodes means i can get the final calculations done for my phd thesis faster....) 
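One quick way to spot nodes in that state from the head node is to compare ping against an actual TCP connect. This is only a sketch: the node names, the netcat (nc) dependency and the telnet port are assumptions, so substitute whatever naming scheme and services the cluster really uses.

#!/bin/sh
# Sketch: report nodes that answer ping but no longer accept TCP
# connections (the symptom described above).  Node names, the netcat
# dependency and the telnet port are assumptions -- adjust to taste.
for n in node01 node02 node03; do
    if ! ping -c 1 $n > /dev/null 2>&1; then
        echo "$n: no ping at all"
    elif ! nc -z -w 5 $n 23 > /dev/null 2>&1; then
        echo "$n: pings, but telnet port is dead"
    else
        echo "$n: ok"
    fi
done

Run from cron, it at least tells you which nodes to power-cycle before the next batch of jobs goes out.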
this may be an unrelated problem but i see errors in the logs about telnet node01 telnetd[16941]: ttloop: peer died: EOF xinetd[17099]: warning: can't get client address: Connection reset by peer Apr 5 00:32:21 node01 rlogind[17099]: Can't get peer name of remote host: Transport endpoint is not connected Apr 5 00:32:21 node01 rshd[17098]: getpeername: Transport endpoint is not connected Apr 5 00:32:21 node01 ftpd[17097]: getpeername (in.ftpd): Transport endpoint is not connected Apr 5 00:32:31 node01 rlogind[17100]: Can't get peer name of remote host: Transport endpoint is not connected Apr 5 00:32:31 node01 xinetd[17101]: warning: can't get client address: Connection reset by peer Apr 5 00:32:31 node01 xinetd[17102]: warning: can't get client address: Connection reset by peer Apr 5 00:32:31 node01 xinetd[17103]: warning: can't get client address: Connection reset by peer Apr 5 00:32:31 node01 ftpd[17101]: getpeername (in.ftpd): Transport endpoint is not connected i am using enfuzion to do job dispatch and collect. by looking at the packets i see the enfuzion director on the head node attempts to send a UDP packet to the node. all udp ports on the nodes are blocked i checked this by scanning a node with nmap. older installs of redhat (i.e my workstation) seem to have udp ports enabled. regardless of the ttloop error the machine appears to work for a while. i.e enfuzion logs in jobs run etc, untill sudennly all stops. the machines remain up, and can be pinged. but no other services (rsh ssh etc run) If i connect a monitor and keyboard to the node it is also unresponive. this is a problem across many nodes. has anyone who uses enfuzion seen this error with nodes that are a rh7.1 install On one node i have seen on 2 occasions CPU 0: Machine Check Exception: 0000000000000004 Bank 2: d40040000000017a at 540040000000017a decoding this using a until i found on the net Status: (4) Machine Check in progress. Restart IP invalid. parsebank(2): f60020000000017a @ 760020000000017a External tag parity error Correctable ECC error MISC register information valid Memory heirarchy error Request: Generic error Transaction type : Generic Memory/IO : I/O can anyone tell me what the Restart IP invalid means. is this a dead cpu or a memory problem causing a mce? cheers Kim -- ______________________________________________________________________ Kim Branson Phd Student Structural Biology CSIRO Health Sciences and Nutrition Walter and Eliza Hall Institute Royal Parade, Parkville, Melbourne, Victoria Ph 61 03 9662 7136 Email kbranson@wehi.edu.au ______________________________________________________________________ From juari at provinet.com.br Thu Apr 4 14:20:47 2002 From: juari at provinet.com.br (JOELMIR RITTER MULLER ) Date: Wed Nov 25 01:02:13 2009 Subject: very high bandwidth, low latency manner? Message-ID: <200204041920.AA26476752@provinet.com.br> what the best mean of interconnecting several microcomputers in a very high bandwidth, low latency manner? does anyone have some ideas about this subject? Cheers, Juari R. M?ller From James.P.Lux at jpl.nasa.gov Thu Apr 4 16:05:30 2002 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:02:13 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <200204041920.AA26476752@provinet.com.br> Message-ID: <5.1.0.14.2.20020404160051.00aff7a0@mail1.jpl.nasa.gov> What's high bandwidth? What's low latency? How much money do you want to spend? 
Ethernet is cheap, $100-$200/node for 100 Mbps or GBE (by the time you get switches, cables, adapters, etc.) Latency is kind of slow (compared to dedicated point to point links) At 07:20 PM 4/4/2002 -0300, JOELMIR RITTER MULLER wrote: >what the best mean of interconnecting several microcomputers >in a very high bandwidth, low latency manner? >does anyone have some ideas about this subject? > >Cheers, >Juari R. M?ller > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory 4800 Oak Grove Road, Mail Stop 161-213 Pasadena CA 91109 818/354-2075, fax 818/393-6875 From aby_sinha at yahoo.com Thu Apr 4 16:43:01 2002 From: aby_sinha at yahoo.com (Abhishek sinha) Date: Wed Nov 25 01:02:13 2009 Subject: console redirect issue References: <3CA244E0.5060602@yahoo.com> Message-ID: <3CACF315.2080704@yahoo.com> Hi list Sometime back i posted this problem on the list and now that i solved it i wanted to share my experience with the list. I had console redirection enabled on a Tyan 2505 T bios and everytime i used to boot it used to go straight in the BIOS. On the other side i was using Hyperterminal(customer requirement). I checked the console redirection on an older version of hyperterminal(windows 2000) and found it to be working . I mean the system was not going into BIOS everytime. But when i used the Hyperterminal version 5 that comes with win2000 professional the system it was going into BIOS every time it booted without touching any key. So much for microsoft technology that the newer version doesnt work and the older version does . Finally we resorted to using CRT in windows to do console redirect and it worked fine. I was trying to convince the customer to use minicom since we were selling Linux based servers and knew it would work, but to no use. being a tech i was amazed at what we can do with linux since we have the code open. U dont realise it a lot of time until u get a Application that doesnt run and u cant do anythign abt it . Its ridiculous that the older version of Hyperterminal works and the newer one shows strange problems... Abhishek Sinha California Digital Abhishek sinha wrote: > hi list > > > This might be just out of the topic, but i couldnt find help anywhere. > I am using serial console redirect on the 2505 t Tyan board. now i am > getting strange things that i have never seen before. When i connect > the machines with the null modem cable , the machine (where the > console redirect is enabled ) goes into the BIOS. If u save and exit > again it goes into the BIOS without doing anything. When u disconnect > the cable then this does not happen . I tried using a cross over rj45 > cable. With this i cannot see the POST messages and i can only see the > messages when the kernel boots. Is this an issue with the BIOS or some > one has been in wonderland and seen this issue . > > Please advise > abhisek > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From raysonlogin at yahoo.com Thu Apr 4 17:43:34 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:13 2009 Subject: very high bandwidth, low latency manner? 
In-Reply-To: <5.1.0.14.2.20020404160051.00aff7a0@mail1.jpl.nasa.gov> Message-ID: <20020405014334.82902.qmail@web11402.mail.yahoo.com> You may consider Myrinet, VIA, SCI... (don't have the money to try each of those, so I can tell you which is the best ;-( ) http://grappew2k.imag.fr/evalRezo.html (just found this benchmark on the Net) Rayson --- Jim Lux wrote: > What's high bandwidth? > What's low latency? > How much money do you want to spend? > > Ethernet is cheap, $100-$200/node for 100 Mbps or GBE (by the time > you get > switches, cables, adapters, etc.) > Latency is kind of slow (compared to dedicated point to point links) > > > > > > > At 07:20 PM 4/4/2002 -0300, JOELMIR RITTER MULLER wrote: > > >what the best mean of interconnecting several microcomputers > >in a very high bandwidth, low latency manner? > >does anyone have some ideas about this subject? > > > >Cheers, > >Juari R. Müller > > > > > >_______________________________________________ > >Beowulf mailing list, Beowulf@beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > Jim Lux > Spacecraft Telecommunications Equipment Section > Jet Propulsion Laboratory > 4800 Oak Grove Road, Mail Stop 161-213 > Pasadena CA 91109 > > 818/354-2075, fax 818/393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From ron_chen_123 at yahoo.com Thu Apr 4 20:23:51 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:13 2009 Subject: FreeBSD port of SGE (Compute farm system) Message-ID: <20020405042351.86759.qmail@web14706.mail.yahoo.com> Hi, I compiled the source, changed a few parameters, and SGE finally runs on FreeBSD. It is running in single- user mode, with only 1 host. I am doing a little clean up, and then I will need to make sure my changes do not affect others (by "#ifdef BSD"). It still does not get the correct system information yet, but some of the job accounting info is there (at least run time is correct 8-) ). It is now running for several hours, it looks stable. It ran several tens of jobs. "qstat", "qhost", "qacct", "qconf", "qdel" look fine, output makes sense (but need to implement the resource info collecting routines). I will post the patches tomorrow, together with some output of the commands. (I will be busy today) Also, I will move the discussion from the hackers list to the cluster@freebsd list. -Ron __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From Karl.Bellve at umassmed.edu Fri Apr 5 07:08:53 2002 From: Karl.Bellve at umassmed.edu (Karl Bellve) Date: Wed Nov 25 01:02:13 2009 Subject: ISSPL Message-ID: <3CADBE05.1A608277@umassmed.edu> Is there an AMD or INTEL optimized version of the ISSPL libraries? We have an application that I ported from an array processor from CSPI to a Beowulf system and it uses ISSPL. Right now, the ISSPL library I am using is just straight C code and doesn't contain any optimization for the Intel/AMD platform. Or, is it better to switch to another library, like Intel Kernel Math Library, or perhaps just use FFTw. It would be simplier if I could just find a standard ISSPL library for Intel/AMD. 
-- Cheers, Karl Bellve, Ph.D. ICQ # 13956200 Biomedical Imaging Group TLCA# 7938 University of Massachusetts Email: Karl.Bellve@umassmed.edu Phone: (508) 856-6514 Fax: (508) 856-1840 PGP Public key: finger kdb@molmed.umassmed.edu From math at velocet.ca Fri Apr 5 09:59:56 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:13 2009 Subject: How do you keep clusters running.... In-Reply-To: <1017925975.30189.87.camel@linux60>; from leandro@ep.petrobras.com.br on Thu, Apr 04, 2002 at 10:12:54AM -0300 References: <200204032104.PAA23347@sijer.mayo.edu> <1017925975.30189.87.camel@linux60> Message-ID: <20020405125956.D69845@velocet.ca> On Thu, Apr 04, 2002 at 10:12:54AM -0300, Leandro Tavares Carneiro's all... > We have here an beowulf cluster with 64 production nodes and 128 > processors, and we have some problems like you, about fans. > Here, our cluster hardware is very cheap, using motherboards and cases > founds easily in the local market, and the problems is critical. > We have 5 spare nodes, and only 3 of that are ready to work. All our [..] > I think this kind of problem is inevitable with cheap PC parts, and can > be lower with high-quality (and price) parts. We are making an study to > by a new cluster, for another application and we call Compaq and IBM to > see what they have in hardware and software, with the hope of a future > with less problems... You can always employ the 'maximum tolerable failure rate' concept and buy for that rate. I find in terms of pricing equipment, there is a definite non linear (exponential?) relationship between MTBF and price. For a failure rate thats 3-5 times higher you can spend up to 40% less (or better) on equipment. This isnt a solid number, but feels within the ballpark to me based on what I've priced out before on clusters. Others may dispute this, but I am talking about buying Dell 2U rackmount servers pre-assembled vs a bunch of boards and CPUs and ram you slap together yourself. Using this concept, and setting your maximum tolerable failure rate at a specific level that suits your needs, for eg 1 node per month, coupled with an agreesive RMA schedule with a good vendor, you can get the best price performance out of a cluster. If you can withstand, using my example, 3-5 times higher failure rate which ends up being 1 node per month, you end up with 40% more gear. If you require 100% of all nodes present to be in one mesh involved in parallel calculations and a single node failure is catastrophic to the entire job running since startup, then its obviously not worth it if your jobs have a similar runtime as the failure rate (1 month). A failure rate of 1 node/5 months would work far better in that case, as the average failure would lose you only 10% of the work you do in 5 months, whereas with 40% more equipment and 5x the failure rate you may lose most of your work. (Note I am not considering that your jobs may run in [1 month / 1.4] instead due to the speedup from more gear - which will cause jobs to run in ~70% of the time (~3 weeks) - and therefore have a higher success rate in finishing in the 1 node/mo MTBF environment.) However, if your jobs run on all nodes for only a day, then a failure of a single node once per month nets you a loss of a half day per month lost work average. For this concession you get 40% more equipment (possibly meaning 40% more processing power, depending on your application). You also need to factor in how much personal time you have to deal with RMAing and swapping equipment. 
This may well make any efforts towards this kind of model impossible if extra time is not available. That notwithstanding, the cost of extra time can be easily factored into the equation (and knowledgeable work-study undergrads can be a REALLY cheap alternative here :) Of course with 40% more power, you may configure two sub-clusters of 70% power of the original HA design (HA = high availability ~ higher price). If this fits your needs, a failure of a single node once per month on average jobs of a day in length will net you the equivalent loss of a quarter-day total possible work. The more you isolate sections of the cluster from eachother, the less you will lose when a failure occurs. If you can manually segment your jobs to run one per node and still achieve near 100% (or more?) of possible capacity vs a more parallelized system, then a single node failure is inconsequential. Considering the amount and types of failures discussed here, there are obviously no guarantee that a certain type of cluster setup will save you from having massive problems. Being able to plan for downtime and manage the costs associated with it is also obviously part of the design and operation of the cluster. Its a seesaw-type of balance - if you want more nodes for less money, be prepared to spend more time fixing them. Of course with any cluster, more nodes of any type will logically translate into more down/service time - so there will probably be a non-linear translation of amount of work when comparing fewer HA nodes vs more cheaper nodes. Of course by this logic, buying fewer bigger nodes would also result in less work. At some point this becomes too expensive because you're buying big Suns that are very expensive per GFLOPS (unless of course, it suits your needs best...). Another problem with this whole situation that makes it even more complex is that many cluster installations are subject to strange pricing/operation cost models. Various parts may actually lie outside your budget responsability: One time costs: - design costs (on paper) - equipment purchase - equipment cosntruction/installation - equipment configuration - softwre installation & configuration Long term/ongoing: - software maintenance/reconfiguration - upkeep/repair - equipment upgrades - power costs - cooling costs There are probably sub categories these could be split into as well. The issue here is that, say in a university, power and cooling may be paid for by the university as well as manual labour for upkeep and repair. If that is the case, then getting very power-inefficient but fast CPUs may work well (AMD thunderbirds, for eg :). If you have to pay for your own power and cooling and manual labour, then you may well just opt for spending more on cheaper gear (Athlon XPs) - and at that point may as well go for HA gear as well (depending on the cost model) to save expensive manual labour (at commercial rates >$50/hr you can quickly rack up a node's cost in a day of work). We have successfully employed the non-HA equipment deisgn in building one of our clusters - and in fact there are added advantages. We have observed that most (for various values of 'most' - 50% to 80%?) failures occur within the first month of usage. Once you start swapping out bad nodes, you have a falling rate of failure (though the age of components slowly catches up over a long time period - things with moving parts, such as fans, especially). 
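As a back-of-envelope illustration of the tradeoff described above, the example figures from the post (roughly 40% more nodes for the cheaper gear, one failure a month versus one every five months, day-long jobs, a mid-job failure killing the whole run) can be plugged into a few lines of awk. The node counts of 100 and 140 are made up only to make the 40% concrete; nothing here is a measurement.

#!/bin/sh
# Illustrative only: expected productive node-days per month for the two
# scenarios sketched above.  100 vs 140 nodes stands in for "40% more
# gear"; failure rates and job length are the example figures from the
# post, and a node failure is assumed to kill the whole running job.
awk 'BEGIN {
    days = 30; job = 1                      # days per month, job length (days)

    cheap_nodes = 140; cheap_fail = 1.0     # ~1 node failure per month
    ha_nodes    = 100; ha_fail    = 0.2     # ~1 node failure per 5 months

    # on average a failure lands half-way through a job, so each failure
    # costs nodes * job/2 node-days of lost work
    cheap = cheap_nodes * days - cheap_nodes * cheap_fail * job / 2
    ha    = ha_nodes    * days - ha_nodes    * ha_fail    * job / 2

    printf "cheap gear: %.0f node-days/month\n", cheap
    printf "HA gear:    %.0f node-days/month\n", ha
}'

Even with five times the failure rate, the extra nodes dominate for short jobs; stretch the job length toward the failure interval and the balance flips, which is the point being made above.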
With all problems taken together (swapping over NFS included, as these are diskless nodes) we have about 1 node crash/fail in some way every 2 months. Of course, since jobs can be checkpointed, and a single node failing doesnt take down the whole cluster (as jobs are run on subsets of nodes) not much work is lost overall. For the increased throughput from more nodes for the money, and including about 15 minutes of work per month physically messing with the machines thats directly related to hardware problems and crashes (ie unrelated to the time spent maintaining the cluster as per normal operations), its been an overall win on that particular cluster. (We have not had to RMA any equipment since the start of the 2nd month of operation - under our current service agreement, RMA would take 1-3 days, and about 20-30 min of labour, and in the meantime not significantly impact the cluster's performance). As always, designing your cluster customized for your needs and limitations is always the biggest win on price/performance. Limitations to this are having very wide ranges of needs and not having any idea of what capabilities will be required in the future, along with expensive losses when there's downtime, and expensive manual labour to get things working again. Barring these kinds of considerations, commodity equipment with a failure rate that you can deal with can net noticeable gains - having a planned failure cost related to that rate will save you from suprises. No matter what kind of cluster you build you WILL have failures, and designing to be able to mitigate the impact from such to the highest possible extent is obviously good planning. /kc > Em Qua, 2002-04-03 ?s 18:04, Cris Rhea escreveu: > > > > What are folks doing about keeping hardware running on large clusters? > > > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... > > > > Sure seems like every week or two, I notice dead fans (each RS-1200 > > has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). > > > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > > > How are folks with significantly more nodes than mine dealing with constant > > maintenance on their nodes? Do you have whole spare nodes sitting around- > > ready to be installed if something fails, or do you have a pile of > > spare parts? Did you get the vendor (if you purchased prebuilt systems) > > to supply a stockpile of warranty parts? > > > > One of the problems I'm facing is that every time something croaks, > > Racksaver is very good about replacing it under warranty, but getting > > the new parts delivered usually takes several days. > > > > For some things like fans, they sent extras for me to keep on-hand. > > > > For my last fan/CPU/motherboard failure, the node pair will be > > down ~5 days waiting for parts. > > > > Comments? Thoughts? Ideas? > > > > Thanks- > > > > --- Cris > > > > > > > > ---- > > Cristopher J. 
Rhea Mayo Foundation > > Research Computing Facility Pavilion 2-25 > > crhea@Mayo.EDU Rochester, MN 55905 > > Fax: (507) 266-4486 (507) 284-0587 > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- > Leandro Tavares Carneiro > Analista de Suporte > EP-CORP/TIDT/INFI > Telefone: 2534-1427 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From cblack at EraGen.com Tue Apr 2 12:09:34 2002 From: cblack at EraGen.com (Chris Black) Date: Wed Nov 25 01:02:13 2009 Subject: Lost cycles due to PBS (was Re: Uptime data/studies/anecdotes) In-Reply-To: <"from roger"@ERC.MsState.Edu> References: <200204021824.g32IOMa14409@mycroft.ahpcrc.org> Message-ID: <20020402140934.A29446@getafix.EraGen.com> On Tue, Apr 02, 2002 at 12:46:07PM -0600, Roger L. Smith wrote: > On Tue, 2 Apr 2002, Richard Walsh wrote: [stuff deleted] > PBS is our leading cause of cycle loss. We now run a cron job on the > headnode that checks every 15 minutes to see if the PBS daemons have died, > and if so, it automatically restarts them. About 75% of the time that I > have a node fail to accept jobs, it is because its pbs_mom has died, not > because there is anything wrong with the node. > We used to have the same problem with PBS, especially when many jobs were in the queue. At that point sometimes the pbs master died as well. Since we've switched to SGE/GridEngine/CODINE I've been MUCH happier. Plus there are lots of nifty things you can do with the expandibility of writing your own load monitors via shell scripts and such. The whole point of this post is: GNQS < PBS < Sun Gridengine :) Chris (who tried two other batch schedulers until settling on SGE) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20020402/1433e290/attachment.bin From tekka99 at libero.it Tue Apr 2 13:29:55 2002 From: tekka99 at libero.it (Gianluca Cecchi) Date: Wed Nov 25 01:02:13 2009 Subject: Linux Software RAID5 Performance References: Message-ID: <006b01c1da8d$89365dd0$44e01d97@emea.cpqcorp.net> Which option did you use for ext3 journal mechanism? It makes difference expecially when using "writeback" vs the default "ordered" (see below part of the "Changes" file for ext3) Which I/O benchmark did you use? Thanks, Gianluca Cecchi New mount options: "mount -o journal=update" Mounts a filesystem with a Version 1 journal, upgrading the journal dynamically to Version 2. "mount -o data=journal" Journals all data and metadata, so data is written twice. This is the mode which all prior versions of ext3 used. "mount -o data=ordered" Only journals metadata changes, but data updates are flushed to disk before any transactions commit. Data writes are not atomic but this mode still guarantees that after a crash, files will never contain stale data blocks from old files. "mount -o data=writeback" Only journals metadata changes, and data updates are entirely left to the normal "sync" process. 
After a crash, files will may contain stale data blocks from old files: this mode is exactly equivalent to running ext2 with a very fast fsck on reboot. Ordered and Writeback data modes require a Version 2 journal: if you do not update the journal format then only the Journaled data will be allowed. The default data mode is Journaled for a V1 journal, and Ordered for V2. ----- Original Message ----- From: "Michael Prinkey" To: Sent: Sunday, March 31, 2002 9:33 PM Subject: Linux Software RAID5 Performance > Some time ago, a thread discussed the relative performance and stability > merits of different RAID solutions. At that time, I gave some results for > 640-GB arrays that I had build using EIDE drives and Software RAID5. I just > recently constructed and installed a 1.0-TB array and had some performance > numbers to share for it as well. They are interesting for two reasons: > First, the filesystem in use is ext3, rather than ext2. Second, the read > performance is significantly better (almost 2x) than that of the 640-GB > units. > > The system uses 11 120-GB Maxtor 5400-RPM drives, two Promise Ultra66 > controllers, a P4 1.6-GHz CPU, an Intel 850 motherboard, and 512 MB ECC > RDRAM. Drives are configured in RAID5 (9 data, 1 parity, 1 hot spare). > Four drives are on each Promise controller. Three are on the on-board EIDE > controller (UDMA100). A small boot drive is also on the on-board > controller. I had intended to use Ultra100 TX2 controllers, but the latest > EIDE driver updates with TX2 support are not making it into the latest > kernels (I'm using 2.4.18), so I opted for the older, slower controllers > rather than patching. So, I am both cautious and lazy. 8) > > Again, performance (see below) is remarkably good, especially considering > all of the strikes against this configuration: EIDE instead of SCSI, UDMA66 > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave drives on > each port instead of a single drive per port. With some hdparm tuning (-c 3 > -u 1), the read performance went from 83 MB/sec to 93 MB/sec. Write > performance remained essentially unchanged by tuning at 26 MB/sec. For > comparison, the 640-GB arrays gave read performance of about 56 MB/sec, > write performance of 28.5 MB/sec. > > Had I more time, I would have tested ext2 vs ext3 to ascertain how much that > change effected performance. Likewise, I was considering the use of a raid1 > array as the ext3 journal device to perhaps improve write performance. Any > thoughts? > > Regards, > > Mike Prinkey > Aeolus Research, Inc. 
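For reference, the "-c 3 -u 1" tuning mentioned above is easy to apply across the member drives at boot. A minimal sketch, assuming the drive names from the mdstat output quoted below (hdb through hdl as data drives, with hda left alone as the boot disk):

#!/bin/sh
# Minimal sketch: enable 32-bit I/O (-c 3) and interrupt unmasking (-u 1)
# on the RAID member drives.  Device names assume the configuration
# quoted below (hdb..hdl data drives, hda as the separate boot disk).
for d in /dev/hd[b-l]; do
    hdparm -c 3 -u 1 $d
done

Dropping it into rc.local keeps the setting across reboots.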
> > ---------------------- > > [root@tera /root]# df; mount; cat /proc/mdstat; cat bonnie10.log > Filesystem 1k-blocks Used Available Use% Mounted on > /dev/hda6 38764268 2601128 34193976 8% / > /dev/hda1 101089 4965 90905 6% /boot > /dev/md0 1063591944 58195936 1005396008 6% /raid > raid640:/raid/home 630296592 284066148 346230444 46% /mnt/tmp > /dev/hda6 on / type ext2 (rw) > none on /proc type proc (rw) > /dev/hda1 on /boot type ext2 (rw) > none on /dev/pts type devpts (rw,gid=5,mode=620) > /dev/md0 on /raid type ext3 (rw) > automount(pid580) on /misc type autofs > (rw,fd=5,pgrp=580,minproto=2,maxproto=3) > raid640:/raid/home on /mnt/tmp type nfs (rw,addr=192.168.0.123) > Personalities : [raid5] > read_ahead 1024 sectors > md0 : active raid5 hdl1[10] hdk1[9] hdj1[8] hdi1[7] hdh1[6] hdg1[5] hdf1[4] > hde1[3] hdd1[2] hdc1[1] hdb1[0] > 1080546624 blocks level 5, 32k chunk, algorithm 2 [10/10] [UUUUUUUUUU] > > unused devices: > Bonnie 1.2: File '/raid/Bonnie.1027', size: 1048576000, volumes: 10 > Writing with putc()... done: 14810 kB/s 88.9 %CPU > Rewriting... done: 22288 kB/s 13.4 %CPU > Writing intelligently... done: 26438 kB/s 21.7 %CPU > Reading with getc()... done: 17112 kB/s 97.9 %CPU > Reading intelligently... done: 93332 kB/s 32.2 %CPU > Seek numbers calculated on first volume only > Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done... > ---Sequential Output (nosync)--- ---Sequential Input-- --Rnd > Seek- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k > (03)- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec > %CPU > raid05 10*1000 14810 88.9 26438 21.7 22288 13.4 17112 97.9 93332 32.2 206.3 > 2.1 > > > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tekka99 at libero.it Tue Apr 2 13:44:28 2002 From: tekka99 at libero.it (Gianluca Cecchi) Date: Wed Nov 25 01:02:13 2009 Subject: Syntax for executing References: Message-ID: <00b201c1da8f$91945340$44e01d97@emea.cpqcorp.net> if using also pvm is not a problem you could use the pvm enabled version of povray 3d rendring engine: http://www.povray.org/ http://pvmpov.sourceforge.net/ Or, there are also MPI patches to povray (but I never used them): http://www.ce.unipr.it/pardis/parma2/povray/povray.html http://www.verrall.demon.co.uk/mpipov/ HIH. Bye, Gianluca Cecchi ----- Original Message ----- From: "Eric Miller" To: Sent: Tuesday, April 02, 2002 11:34 PM Subject: RE: Syntax for executing > disregard. SETI is not available in an MPI-enabled format. > > My apologies. Can anyone direct me to an URL that lists some available > programs that I can execute on the cluster? Preferably something with a > continuous (looping?) graphical output (e.g. SETI). This is a display for > students to visualize and promote educational programs for Linux, like a > museum peice. > > > >>>>>>>>>>>>>>>>>>>>>>>>>> > > Hey all, got a five-node cluster up running 27-z9, preparing for a 30 node > cluster. > > - What is the syntax to run an executable in the cluster environment? For > example, I run > > NP=5 mpi-mandel > > to run the test fractal program. How would I execute say, SETI, using the > cluster? Assume that the SETI executable is in the PATH. 
Also, the older > version of Scyld had some test code in /usr/mpi-beowulf/*. Is that gone? > > - What would cause all but one of the processors to show usage in > beostatus? The node shows "up" in every other way: hardware identical, > memory, swap, network, etc....just when I run something, only that one > processor on one node shows no % usage. > > -ETM > > .~. > /V\ > // \\ > /( )\ > ^'~'^ > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Wed Apr 3 13:23:44 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:02:13 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: I don't know how to say this without sounding condescending, but we resolved this problem by purchasing high quality machines. We currently use IBM x330s (although I also had good luck with our SGI 1100's before SGI discontinued them). We have enough nodes on hand, that IBM has stocked a couple of spare motherboards, power supplies, etc., but we don't need them that often. I've never had a fan failure. In general, hardware problems are a very minor part of the care and feeding of our cluster. On Wed, 3 Apr 2002, Cris Rhea wrote: > > What are folks doing about keeping hardware running on large clusters? > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power supply fans). > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > How are folks with significantly more nodes than mine dealing with constant > maintenance on their nodes? Do you have whole spare nodes sitting around- > ready to be installed if something fails, or do you have a pile of > spare parts? Did you get the vendor (if you purchased prebuilt systems) > to supply a stockpile of warranty parts? > > One of the problems I'm facing is that every time something croaks, > Racksaver is very good about replacing it under warranty, but getting > the new parts delivered usually takes several days. > > For some things like fans, they sent extras for me to keep on-hand. > > For my last fan/CPU/motherboard failure, the node pair will be > down ~5 days waiting for parts. > > Comments? Thoughts? Ideas? > > Thanks- > > --- Cris > > > > ---- > Cristopher J. Rhea Mayo Foundation > Research Computing Facility Pavilion 2-25 > crhea@Mayo.EDU Rochester, MN 55905 > Fax: (507) 266-4486 (507) 284-0587 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From SGaudet at turbotekcomputer.com Wed Apr 3 13:26:28 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:02:13 2009 Subject: How do you keep clusters running.... 
Message-ID: <3450CC8673CFD411A24700105A618BD61BF020@911TURBO> Hello Chris, > What are folks doing about keeping hardware running on large clusters? > > Right now, I'm running 10 Racksaver RS-1200's (for a total of > 20 nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power > supply fans). > > My last fan failure was a CPU fan that toasted the CPU and > motherboard. > > How are folks with significantly more nodes than mine dealing > with constant > maintenance on their nodes? Do you have whole spare nodes > sitting around- > ready to be installed if something fails, or do you have a pile of > spare parts? Did you get the vendor (if you purchased > prebuilt systems) > to supply a stockpile of warranty parts? > > One of the problems I'm facing is that every time something croaks, > Racksaver is very good about replacing it under warranty, but getting > the new parts delivered usually takes several days. > > For some things like fans, they sent extras for me to keep on-hand. > > For my last fan/CPU/motherboard failure, the node pair will be > down ~5 days waiting for parts. > > Comments? Thoughts? Ideas? ------------------------------------------ The vendor of choise should be using quality parts. We don't see these issues here. Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== From haohe at me1.eng.wayne.edu Wed Apr 3 14:48:14 2002 From: haohe at me1.eng.wayne.edu (Hao He) Date: Wed Nov 25 01:02:13 2009 Subject: GbE Channel Bonding Message-ID: <200204032258.RAA15974@me1.eng.wayne.edu> Any one who has experience in bonding Gigabit Ethernet cards? How about the performance? Thanks. -HH From mikeprinkey at hotmail.com Wed Apr 3 10:10:10 2002 From: mikeprinkey at hotmail.com (Michael Prinkey) Date: Wed Nov 25 01:02:13 2009 Subject: Hyperthreading in P4 Xeon (question) Message-ID: I can amplify that point. A commercial CFD application ran significantly slower using 4 threads vs 2 on a dual Prestonia system. Anything memory limited will probably behave the same way. Mike Prinkey Aeolus Research, Inc. >From: Mark Hahn >To: William Park >CC: >Subject: Re: Hyperthreading in P4 Xeon (question) >Date: Wed, 3 Apr 2002 10:50:06 -0500 (EST) > > > What is the realistic effect of "hyperthreading" in P4 Xeon? I'm not > > versed in the latest CPU trends. Does it mean that dual-P4Xeon will > > behave like 4-way SMP? > >for some value of "behave like" ;) >that is, it will definitely NOT get twice as fast. but it will appear >to have 4 CPUs, and can run 4 threads/procs at once (for values of >"once" > 1 clock cycle ;) > >we did a quick test on a dual-prestonia here, and saw a ~5% speedup >on a probably cache-friendly, compute-bound task. 
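A small wrapper makes the -j1/-j2/-j4 comparison William Park asked for earlier in the thread easy to repeat. It is only a sketch: it assumes an already-configured kernel source tree and that /bin/sh is bash (so the time keyword is available), as on a stock Red Hat box.

#!/bin/sh
# Sketch of the -j1/-j2/-j4 build comparison requested earlier in the
# thread.  Run from a configured kernel source tree; assumes /bin/sh is
# bash so that `time' is available as a keyword.
for j in 1 2 4; do
    make clean > /dev/null 2>&1
    ( time make -j$j bzImage modules ) > j$j.log 2>&1
    echo "=== make -j$j ==="
    grep -E '^(real|user|sys)' j$j.log
done

Comparing the -j2 and -j4 wall-clock times on a dual Prestonia gives a rough feel for how much the extra logical CPUs buy (or cost) on a given workload.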
> >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > _________________________________________________________________ Chat with friends online, try MSN Messenger: http://messenger.msn.com From alan at infogroup.it Wed Apr 3 15:13:39 2002 From: alan at infogroup.it (amedeo pimpini) Date: Wed Nov 25 01:02:13 2009 Subject: my first diskless beowulf cluster. Message-ID: <3CAB8CA3.9030809@infogroup.it> I've encountered a difficlult to launch init after mount root on nfs. Can somebody help me ? Follows details: I have compiled a 2.4.7-10 kernel whith autoconfiguration ip, with root on nfs and placed on /tftpboot the kernel mount /tftbboot but dont start init. On console of first ws: IP-Config: Got DHCPanswer from 10.1.1.1 my address is 10.0.0.2 ... VFS: Mounted root (nfs filesystem). Freeing unused kernel memory: 180k freed Kernel panik: No init found. Try passing init= option to kernel i have recompiled main.c with printk( %d ), errno end i obtined 8 with perror on have Error code 8: Exec format error If i mv /sbin/init /sbin/init.old then i obtine error 14. ON the server /var/log/messages i have: Apr 3 00:17:14 nut1 dhcpd: Both dynamic and static leases present for 10.1.1.2. Apr 3 00:17:14 nut1 dhcpd: Either remove host declaration nut2 or remove 10.1.1.2 Apr 3 00:17:14 nut1 dhcpd: from the dynamic address pool for 10.1.0.0 Apr 3 00:17:14 nut1 dhcpd: DHCPREQUEST for 10.1.1.2 from 00:e0:4c:20:6b:8f via eth0 Apr 3 00:17:14 nut1 dhcpd: DHCPACK on 10.1.1.2 to 00:e0:4c:20:6b:8f via eth0 Apr 3 00:17:14 nut1 mountd[1520]: mountproc_translate_mnt_1_svc(/tftpboot/10.1.1.2) Apr 3 00:17:14 nut1 mountd[1520]: NFS mount of /tftpboot/10.1.1.2 attempted from 10.1.1.2 Apr 3 00:17:14 nut1 mountd[1520]: /tftpboot/10.1.1.2 has been mounted by 10.1.1.2 and tcpdump: 00:21:06.149970 arp who-has nut1 tell nut2 00:21:06.149970 arp reply nut1 is-at 0:e0:4c:f0:6d:fb 00:21:06.149970 nut2.800 > nut1.sunrpc: udp 56 (DF) 00:21:06.149970 nut1.sunrpc > nut2.800: udp 28 (DF) 00:21:06.149970 nut2.800 > nut1.sunrpc: udp 56 (DF) 00:21:06.149970 nut1.sunrpc > nut2.800: udp 28 (DF) 00:21:06.149970 nut2.800 > nut1.849: udp 64 (DF) 00:21:06.149970 nut1.849 > nut2.800: udp 60 (DF) 00:21:06.149970 nut2.56685225 > nut1.nfs: 100 getattr [|nfs] (DF) 00:21:06.149970 nut1.nfs > nut2.56685225: reply ok 96 getattr DIR 47777 ids 0/0 sz 4096 (DF) 00:21:06.149970 nut2.73462441 > nut1.nfs: 100 fsstat [|nfs] (DF) 00:21:06.159970 nut1.nfs > nut2.73462441: reply ok 48 fsstat [|nfs] (DF) 00:21:06.159970 nut2.90239657 > nut1.nfs: 108 lookup [|nfs] (DF) 00:21:06.159970 nut1.nfs > nut2.90239657: reply ok 128 lookup [|nfs] (DF) 00:21:06.159970 nut2.107016873 > nut1.nfs: 112 lookup [|nfs] (DF) 00:21:06.159970 nut1.nfs > nut2.107016873: reply ok 128 lookup [|nfs] (DF) 00:21:06.159970 nut2.123794089 > nut1.nfs: 108 lookup [|nfs] (DF) 00:21:06.159970 nut1.nfs > nut2.123794089: reply ok 128 lookup [|nfs] (DF) 00:21:06.159970 nut2.140571305 > nut1.nfs: 108 lookup [|nfs] (DF) 00:21:06.159970 nut1.nfs > nut2.140571305: reply ok 128 lookup [|nfs] (DF) 00:21:06.159970 nut2.157348521 > nut1.nfs: 112 read [|nfs] (DF) 00:21:06.159970 nut1 > nut2: (frag 23518:1244@2960) 00:21:06.159970 nut1 > nut2: (frag 23518:1480@1480+) 00:21:06.159970 nut1.nfs > nut2.157348521: reply ok 1472 read (frag 23518:1480@0+) 00:21:11.149970 arp who-has nut2 tell nut1 00:21:11.149970 arp reply nut2 is-at 0:e0:4c:20:6b:8f i've tagged my 
kernel with mknbi-linux --output=/tftpboot/vmlinux.3com /usr/src/linux-2.4.7-10/arch/i386/boot/bzImage --ip=":10.1.1.1:10.1.1.1:255.255.0.0:" From rgb at phy.duke.edu Wed Apr 3 15:27:31 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:13 2009 Subject: How do you keep clusters running.... In-Reply-To: <200204032104.PAA23347@sijer.mayo.edu> Message-ID: On Wed, 3 Apr 2002, Cris Rhea wrote: > Comments? Thoughts? Ideas? a) Use onboard sensors (hoping your motherboards have them) to shut nodes down if the CPU temp exceeds an alarm threshold. That way future fan failures shouldn't cause system failure, just node shutdown. b) Use the largest cases you can manage given your space requirements. Larger cases have a bit more thermal ballast and can tolerate poor cooling for a bit longer before catastrophically failing. Gives you (or your monitor software) more time to react if nothing else. c) With only ten boxes, it sounds like you're having plain old bad luck, possibly caused by a bad batch of fans. Relax, perhaps your luck will improve;-) With all that said, it is still true that maintenance problems scale poorly with number of nodes. One reason (of many) that I prefer not to get nodes from vendors in another state that I never meet face to face. If your nodes are built by a local vendor (especially one with a decent local parts inventory and service department) then it is a bit easier to get good turnaround on node repairs and minimize downtime, especially since a local business rapidly learns that to make you happy is more important to their bottom line than making the next twenty or thirty customers that might walk through their door happy. There is also the usual tradeoff between buying "insurance" (e.g. onsite, 24 hour service contracts) on everything and number of nodes. There are plenty of companies that will sell you nodes and guarantee minimal downtime -- for a price. IBM and Dell come to mind, although there are many more. Only you can determine how mission critical it is to keep your nodes up and what the cost benefit tradeoffs are between buying fewer nodes (but getting better quality nodes and arranging guarantees of minimal downtime) or buying more nodes (but risking having a node or two down pending repairs from time to time). Cost-benefit analysis is at the heart of beowulf engineering, but you have to determine the "values" that enter into the analysis based on your local needs. rgb > > Thanks- > > --- Cris > > > > ---- > Cristopher J. Rhea Mayo Foundation > Research Computing Facility Pavilion 2-25 > crhea@Mayo.EDU Rochester, MN 55905 > Fax: (507) 266-4486 (507) 284-0587 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From tim at dolphinics.com Tue Apr 2 10:05:48 2002 From: tim at dolphinics.com (Tim Wilcox) Date: Wed Nov 25 01:02:13 2009 Subject: Call for Papers Message-ID: <3CA9F2FC.7F39E715@dolphinics.com> CALL FOR PAPERS Workshop on High-Speed Local Networks (HSLN) as part of the IEEE LCN conference http://www.hcs.ufl.edu/hsln http://www.ieeelcn.org November 6 - 8, 2002 Embassy Suites USF, Tampa, Florida Important dates and contact: ---------------------------- Paper submission: June 10, 2002 Notification of acceptance: July 15, 2002 Camera-ready copy due: August 16, 2002 General Chair: Alan D. George (george@hcs.ufl.edu) General Information: -------------------- The High-Speed Local Networks (HSLN) workshop, within the 27th IEEE Conference on Local Computer Networks (LCN), focuses on the design, analysis, implementation, and exploitation of new concepts, techno- logies, and applications related to high-performance networks on a local scale. This workshop will bring together networking researchers, engineers, and practitioners from across the spectrum of high-speed local networks, with participants from industry, academia, and government. Original papers that present research results, case studies, technology development or deployment experience, work in progress, etc. are solicited, as are survey articles. Specific areas of interest include (but are not limited to): - High-speed LANs (e.g. Gigabit Ethernet, 10 Gigabit Ethernet) - System-area networks (e.g. SCI, Myrinet, ServerNet) - Storage-area networks (e.g. Fibre Channel) and I/O interconnects - High-speed networks in embedded systems (e.g. avionics, space systems) - Protocols, services, and topologies for high-speed local networks - Routing and switch architectures for high-speed local networks - Quality of Service (QoS) in high-speed local networks - Performance analysis of high-speed local networks and systems - Modeling and simulation of high-speed local networks - Middleware for high-speed local network communication - Applications for high-speed local networks (e.g. video on demand) Paper Submission Instructions: ------------------------------ Authors are invited to submit papers of up to ten camera-ready pages, in PDF or Postscript format, for presentation at the workshop and publication in the conference proceedings. Papers should be submitted by email to the workshop at hsln@hcs.ufl.edu on or before June 10, 2002. Alternatively, send five hard copies via postal mail to: Dr. Alan D. George HSLN General Chair Department of Electrical and Computer Engineering University of Florida PO Box 116200, 327 Larsen Hall Gainesville, FL 32611-6200 HSLN Organizing Committee: -------------------------- Workshop Chair: Industry Chair: Program Chair: A.D. George J.L. Meier K.J. Christensen ECE Department Advanced Technology Center CSE Department Univ of Florida Rockwell Collins, Inc. Univ of South Florida george@hcs.ufl.edu jlmeier@rockwellcollins.com christen@csee.usf.edu HSLN Program Committee: ----------------------- Jay Bragg (awbragg@yahoo.com) Consultant Ron Brightwell (bright@sandia.gov) Sandia National Labs, New Mexico Wayne Chang (wchang@arl.army.mil) Army Research Laboratory Helen Chen (hycsw@california.sandia.gov) Sandia National Labs, California Patrick W. Dowd (dowd@lts.ncsc.mil) University of Maryland at College Park and U.S. Department of Defense College Park, MD Mike Foster (michael.s.foster@boeing.com) Boeing Corporation Michael A. 
Hoard (hoardm@us.ibm.com) IBM Beaverton, OR Cynthia S. Hood (hood@iit.edu) Illinois Institute of Technology Chicago, IL Anestis Karasaridis (karasaridis@att.com) Network Design and Performance Analysis Dept. AT&T Labs, Middletown, NJ Fred Kuhns (fredk@arl.wustl.edu) Washington University St. Louis, MI Michael McKee (mckee026@umn.edu) University of Minnesota, Rochester Rochester, MN Knut Omang (knuto@fast.no) University of Oslo Oslo, Norway Sarp Oral (oral@hcs.ufl.edu) University of Florida Gainesville, FL D. K. Panda (panda@cis.ohio-state.edu) Ohio State University Columbus, Ohio Anthony Skjellum (tony@MPI-SoftTech.Com) Mississippi State University Starkville, MS Norm Strole (ncstrole@us.ibm.com) IBM Research Triangle Park, NC Rollins Turner (rturner@paradyne.com) Paradyne Corporation Largo, FL William White (wwhite@siue.edu) Southern Illinois University Edwardsville, IL Tim Wilcox (tim.wilcox@dolphinics.com) Technical Director, Dolphin Interconnect --- -------------- next part -------------- A non-text attachment was scrubbed... Name: tim.vcf Type: text/x-vcard Size: 180 bytes Desc: Card for Tim Wilcox Url : http://www.scyld.com/pipermail/beowulf/attachments/20020402/3038c307/tim.vcf From mikeprinkey at hotmail.com Tue Apr 2 15:07:21 2002 From: mikeprinkey at hotmail.com (Michael Prinkey) Date: Wed Nov 25 01:02:13 2009 Subject: Linux Software RAID5 Performance Message-ID: Hi Gianluca, I used the default "ordered" journaling option. I haven't really looked into the different journaling options and there impact on performance. Does the ordered option require two writes? Also, any thoughts on performance tuning or using an external raid1 journal device? The benchmark application is Bonnie 1.2. Thanks, Mike >From: "Gianluca Cecchi" >To: , >Subject: Re: Linux Software RAID5 Performance >Date: Tue, 2 Apr 2002 23:29:55 +0200 > >Which option did you use for ext3 journal mechanism? It makes difference >expecially >when using "writeback" vs the default "ordered" (see below part of the >"Changes" file for ext3) >Which I/O benchmark did you use? >Thanks, >Gianluca Cecchi > >New mount options: > > "mount -o journal=update" > Mounts a filesystem with a Version 1 journal, upgrading the > journal dynamically to Version 2. > > "mount -o data=journal" > Journals all data and metadata, so data is written twice. This > is the mode which all prior versions of ext3 used. > > "mount -o data=ordered" > Only journals metadata changes, but data updates are flushed to > disk before any transactions commit. Data writes are not atomic > but this mode still guarantees that after a crash, files will > never contain stale data blocks from old files. > > "mount -o data=writeback" > Only journals metadata changes, and data updates are entirely > left to the normal "sync" process. After a crash, files will > may contain stale data blocks from old files: this mode is > exactly equivalent to running ext2 with a very fast fsck on >reboot. > >Ordered and Writeback data modes require a Version 2 journal: if you do >not update the journal format then only the Journaled data will be >allowed. > >The default data mode is Journaled for a V1 journal, and Ordered for V2. > > >----- Original Message ----- >From: "Michael Prinkey" >To: >Sent: Sunday, March 31, 2002 9:33 PM >Subject: Linux Software RAID5 Performance > > > > Some time ago, a thread discussed the relative performance and stability > > merits of different RAID solutions. 
At that time, I gave some results >for > > 640-GB arrays that I had build using EIDE drives and Software RAID5. I >just > > recently constructed and installed a 1.0-TB array and had some >performance > > numbers to share for it as well. They are interesting for two reasons: > > First, the filesystem in use is ext3, rather than ext2. Second, the >read > > performance is significantly better (almost 2x) than that of the 640-GB > > units. > > > > The system uses 11 120-GB Maxtor 5400-RPM drives, two Promise Ultra66 > > controllers, a P4 1.6-GHz CPU, an Intel 850 motherboard, and 512 MB ECC > > RDRAM. Drives are configured in RAID5 (9 data, 1 parity, 1 hot spare). > > Four drives are on each Promise controller. Three are on the on-board >EIDE > > controller (UDMA100). A small boot drive is also on the on-board > > controller. I had intended to use Ultra100 TX2 controllers, but the >latest > > EIDE driver updates with TX2 support are not making it into the latest > > kernels (I'm using 2.4.18), so I opted for the older, slower controllers > > rather than patching. So, I am both cautious and lazy. 8) > > > > Again, performance (see below) is remarkably good, especially >considering > > all of the strikes against this configuration: EIDE instead of SCSI, >UDMA66 > > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave >drives >on > > each port instead of a single drive per port. With some hdparm tuning >(-c >3 > > -u 1), the read performance went from 83 MB/sec to 93 MB/sec. Write > > performance remained essentially unchanged by tuning at 26 MB/sec. For > > comparison, the 640-GB arrays gave read performance of about 56 MB/sec, > > write performance of 28.5 MB/sec. > > > > Had I more time, I would have tested ext2 vs ext3 to ascertain how much >that > > change effected performance. Likewise, I was considering the use of a >raid1 > > array as the ext3 journal device to perhaps improve write performance. >Any > > thoughts? > > > > Regards, > > > > Mike Prinkey > > Aeolus Research, Inc. > > > > ---------------------- > > > > [root@tera /root]# df; mount; cat /proc/mdstat; cat bonnie10.log > > Filesystem 1k-blocks Used Available Use% Mounted on > > /dev/hda6 38764268 2601128 34193976 8% / > > /dev/hda1 101089 4965 90905 6% /boot > > /dev/md0 1063591944 58195936 1005396008 6% /raid > > raid640:/raid/home 630296592 284066148 346230444 46% /mnt/tmp > > /dev/hda6 on / type ext2 (rw) > > none on /proc type proc (rw) > > /dev/hda1 on /boot type ext2 (rw) > > none on /dev/pts type devpts (rw,gid=5,mode=620) > > /dev/md0 on /raid type ext3 (rw) > > automount(pid580) on /misc type autofs > > (rw,fd=5,pgrp=580,minproto=2,maxproto=3) > > raid640:/raid/home on /mnt/tmp type nfs (rw,addr=192.168.0.123) > > Personalities : [raid5] > > read_ahead 1024 sectors > > md0 : active raid5 hdl1[10] hdk1[9] hdj1[8] hdi1[7] hdh1[6] hdg1[5] >hdf1[4] > > hde1[3] hdd1[2] hdc1[1] hdb1[0] > > 1080546624 blocks level 5, 32k chunk, algorithm 2 [10/10] >[UUUUUUUUUU] > > > > unused devices: > > Bonnie 1.2: File '/raid/Bonnie.1027', size: 1048576000, volumes: 10 > > Writing with putc()... done: 14810 kB/s 88.9 %CPU > > Rewriting... done: 22288 kB/s 13.4 %CPU > > Writing intelligently... done: 26438 kB/s 21.7 %CPU > > Reading with getc()... done: 17112 kB/s 97.9 %CPU > > Reading intelligently... done: 93332 kB/s 32.2 %CPU > > Seek numbers calculated on first volume only > > Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done... 
> > ---Sequential Output (nosync)--- ---Sequential Input-- >--Rnd > > Seek- > > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- >--04k > > (03)- > > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU >/sec > > %CPU > > raid05 10*1000 14810 88.9 26438 21.7 22288 13.4 17112 97.9 93332 32.2 >206.3 > > 2.1 > > > > > > _________________________________________________________________ > > Get your FREE download of MSN Explorer at >http://explorer.msn.com/intl.asp. > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > > _________________________________________________________________ Chat with friends online, try MSN Messenger: http://messenger.msn.com From mikeprinkey at hotmail.com Wed Apr 3 11:49:00 2002 From: mikeprinkey at hotmail.com (Michael Prinkey) Date: Wed Nov 25 01:02:13 2009 Subject: Linux Software RAID5 Performance Message-ID: Indeed, the multiple processes accessing the device made significantly degrade performance. Fortunately for us, as well, access speed is limited by the NFS/SMB and the network, not by array performance. Unfortunately, the unit is online now and I can't fiddle around with the settings and test it further. WRT reliability, we have seen the array drop to degraded mode because of a single drive failure. We have also a single drive take down the entire IDE port. This results in the md device disappearing until you swap out the offending drive and restart the array. There is no data here. Usually one drive goes and the array goes into degraded mode and starts reconstructing on the spare. Then the second goes and the array disappears. It is a bit disconcerting to do ls /raid and get nothing back. Changing out the drive and restarting pulls everything back. I can honestly say that the only data loss that I have had on these arrays came when a maintenance person completely unplugged one of the arrays from the UPS. It caused low-level corruption on 5 of the 9 drives in the array. We ended up using a Windows 98 boot floppy with Maxtor's Powermax utility to patch them all back up. It took many hours. This is the WORST possible scenario, BTW. Even reseting the system gives the EIDE devices a chance to flush their caches and maintain low-level integrity. Cutting the power can leave the array/drives inconsistent on the filesystem, device (/dev/md0), and hardware-format datagram levels. So, lock your arrays in a cabinet! 8) Mike >From: Jurgen Botz >To: mprinkey@aeolusresearch.com (Michael Prinkey) >CC: beowulf@beowulf.org >Subject: Re: Linux Software RAID5 Performance >Date: Wed, 03 Apr 2002 10:25:31 -0800 > >Michael Prinkey wrote: > > Again, performance (see below) is remarkably good, especially >considering > > all of the strikes against this configuration: EIDE instead of SCSI, >UDMA66 > > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave >drives on > > each port instead of a single drive per port. > >With regard to the master/slave config... I note that your performance >test is a single reader/writer... in this config with RAID5 I would >expect the performance to be quite good even with 2 drives per IDE >controller. But if you have several processes doing disk I/O >simultaneously you should see a rather more precipitous drop in >performance than you would with a single drive per IDE controller. 
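(A quick way to reproduce the effect described here, as a rough sketch only: the mount point and file names below are made up, and it assumes large test files already exist on the array and that ordinary dd and time are available. Run one reader first for a baseline, then several at once and compare aggregate throughput.)

#!/bin/sh
# Rough concurrent-read test for an array mounted on /raid.
# bigfile1..bigfile4 are assumed pre-existing large files; adjust paths.
for f in /raid/bigfile1 /raid/bigfile2 /raid/bigfile3 /raid/bigfile4; do
    ( time dd if=$f of=/dev/null bs=1024k ) &
done
wait
# With two drives per IDE channel, the aggregate rate of the parallel
# readers typically falls well short of four times the single-reader rate.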
>I'm working on testing a very similar config right now and that's >one of my findings (which I had expected) but our application for this >is not very performance sensitive so it's not a big deal. > >A more important issue for me is reliability, and I'm somewhat >concerned about failure modes. For example, can an IDE drive fail >in such a way that if will disable the controller or the other >drive on the same controller? If so, that would seriously limit >the usefulness of RAID5 in this config. In general how good is >Linux software RAID's failure handling? Etc. > >:j > > >-- >Jürgen Botz | While differing widely in the various >jurgen@botz.org | little bits we know, in our infinite > | ignorance we are all equal. -Karl >Popper > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > _________________________________________________________________ MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx From emiller at techskills.com Wed Apr 3 17:06:32 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:13 2009 Subject: another syntax question In-Reply-To: <20020403153227.A15201@node0.opengeometry.ca> Message-ID: For non-parallel applications, is it possible to run individual instances on diskless nodes? For example, I want to execute a non-MPI program "A" that is located in the /bin directory of my master node, but I want to run one instance of "A" on each of my diskless nodes. What is the syntax that equates to: #NP=1 "A" on node0 only #NP=1 "A" on node1 only #.... #.... From alvin at Maggie.Linux-Consulting.com Wed Apr 10 20:55:45 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:02:13 2009 Subject: Call for Papers In-Reply-To: <3CA9F2FC.7F39E715@dolphinics.com> Message-ID: hi tim am fairly sure St Louis in Missouri is MO for state initials thanx alvin http://www.Linux-1U.net .... 8 Drives in 1U chassis ... On Tue, 2 Apr 2002, Tim Wilcox wrote: > CALL FOR PAPERS > Workshop on High-Speed Local Networks (HSLN) > as part of the IEEE LCN conference > http://www.hcs.ufl.edu/hsln > http://www.ieeelcn.org > > November 6 - 8, 2002 > Embassy Suites USF, Tampa, Florida > > Fred Kuhns (fredk@arl.wustl.edu) > Washington University .... [ snipped ] > St. Louis, MI ^^^^^^^^^^^^^^^^^^^^^ > Michael McKee (mckee026@umn.edu) > University of Minnesota, Rochester > Rochester, MN > > Knut Omang (knuto@fast.no) > University of Oslo > Oslo, Norway > > Sarp Oral (oral@hcs.ufl.edu) > University of Florida > Gainesville, FL > > D. K. Panda (panda@cis.ohio-state.edu) > Ohio State University > Columbus, Ohio > > Anthony Skjellum (tony@MPI-SoftTech.Com) > Mississippi State University > Starkville, MS > .... From garcia_garcia_adrian at hotmail.com Thu Apr 4 09:48:29 2002 From: garcia_garcia_adrian at hotmail.com (Adrian Garcia Garcia) Date: Wed Nov 25 01:02:13 2009 Subject: DHCP Help Message-ID: An HTML attachment was scrubbed... 
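(On Eric Miller's per-node question above: a minimal sketch, assuming the Scyld release in use ships the BProc tools so that bpsh is available, and that "A" stands in for whatever non-MPI binary is wanted; the node numbers are examples only.)

#!/bin/sh
# Launch one instance of a non-MPI program on each slave node via bpsh,
# then wait for all of the remote instances to finish.
NODES="0 1 2 3"          # replace with your node numbers
for n in $NODES; do
    bpsh $n /bin/A &     # bpsh runs the command on node $n
done
wait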
URL: http://www.scyld.com/pipermail/beowulf/attachments/20020404/ca30ab16/attachment.html From rocky at atipa.com Thu Apr 4 11:43:58 2002 From: rocky at atipa.com (Rocky McGaugh) Date: Wed Nov 25 01:02:13 2009 Subject: commercial parallel libraries In-Reply-To: Message-ID: On Thu, 4 Apr 2002, Jayne Heger wrote: > > Hi, > > I know this is a beowulf list, but I could do with getting some info on any > (if there are) commercial parallel libraries, the equivalent of pvm and mpi. > > Do any of you know the names of any? > > Thanks. > > Jayne MPIPro is a commercial implementation of MPI. I've heard alot of good about their Win/32 implementation, but not as much about their normal unix MPI. Linda is another commercial parallel API that provides very good support and services. -- Rocky McGaugh Atipa Technologies rocky@atipatechnologies.com rmcgaugh@atipa.com 1-785-841-9513 x3110 http://1087800222/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' From wewu at oscar.eecs.tufts.edu Thu Apr 4 12:56:20 2002 From: wewu at oscar.eecs.tufts.edu (wewu@oscar.eecs.tufts.edu) Date: Wed Nov 25 01:02:13 2009 Subject: restrict node access Message-ID: We want to restrict regular users to access the nodes using rlogin or rsh or ssh in cluster, but still let PBS run the job. Does anybady have a better suggestion? We install oscar 1.2.1 on our cluster. Thanks From s02.sbecker at wittenberg.edu Thu Apr 4 15:47:53 2002 From: s02.sbecker at wittenberg.edu (s02.sbecker) Date: Wed Nov 25 01:02:13 2009 Subject: Scyld node boot problem Message-ID: <3C435198@smtp.wittenberg.edu> I am using version 27bz-8 for the Scyld disk, with kernel version 2.2.19-12.beo. I have a 3c905b card in the slave and a 3c905 in the master. I am getting to the third phase of the boot for the slave to where it outputs the log file. Then the node hangs. Here is the log file for node.0... node_up: Setting system clock. node_up: TODO set interface netmask. node_up: Configuring loopback interface. node_up: Configuring PCI devices. setup_fs: Configuring node filesystems... setup_fs: Using /etc/beowulf/fstab setup_fs: Checking /dev/ram3 (type=ext2)... setup_fs: Hmmm...This appears to be a ramdisk. setup_fs: I'm going to try to try checking the filesystem (fsck) anyway. setup_fs: If it is a RAM disk the following will fail harmlessly. e2fsck 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09 Couldn't find ext2 superblock, trying backup blocks... e2fsck The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 : Bad magic number in super-block while trying to open /dev/ram3 setup_fs: FSCK failure. (OK for RAM disks) setup_fs: Creating ext2 on /dev/ram3... mke2fs 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09 setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults) setup_fs: Checking 192.168.1.1:/home (type=nfs)... setup_fs: Mounting 192.168.1.1:/home on /rootfs//home... (type=nfs; options=nolock) mount: 192.168.1.1:/home failed, reason given by server: Permission denied Failed to mount 192.168.1.1:/home on /home. Can someone help? Thanks. Shawn From sp at scali.com Thu Apr 4 18:08:32 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:13 2009 Subject: very high bandwidth, low latency manner? 
In-Reply-To: <5.1.0.14.2.20020404160051.00aff7a0@mail1.jpl.nasa.gov> Message-ID: On Thu, 4 Apr 2002, Jim Lux wrote: > What's high bandwidth? > What's low latency? > How much money do you want to spend? > > Ethernet is cheap, $100-$200/node for 100 Mbps or GBE (by the time you get > switches, cables, adapters, etc.) > Latency is kind of slow (compared to dedicated point to point links) > > Well this is a "touchy" topic since different people has different opinions. There is also different ways of measuring bandwidth mainly point to point (two machines talking together) and bisection (dividing your nework in half and let the one half talk to the other which kind of shows how the network scales with more nodes). Also some people like to talk about the hardware bandwidth and hardware latency, while the thing that really matters (IMHO) is application to application bandwidth and latency. I don't want to start a flamewar here, but I _think_ (not knowing real numbers for other high speed interconnects) that SCI has atleast the lowest latency and maybe also the highest point to point bandwidth : SCI application to application latency : 2.5 us SCI application to application bandwidth : 325 MByte/sec Note that these numbers are very chipset specific (as most high speed interconnect numbers are), these numbers are from IA64. Here are numbers from a popular IA32 platform, the AMD 760MPX : SCI application to application latency : 1.8 us SCI application to application bandwidth : 283 MByte/sec More "real" performance numbers using MPI over SCI (also collective and application benchmarks) can be located on Dolphin's homepage http://www.dolphinics.com Other popular high speed interconnects I know of is Myrinet (considered the main competitor to SCI for cluster interconnects) and Giganet. There are some performance numbers on Myricoms homepage (http://www.myricom.com) but I doubt if that is for their latest hardware generation (correct me if I'm wrong). Best regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From alvin at Maggie.Linux-Consulting.com Wed Apr 10 21:07:40 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:02:13 2009 Subject: Call for Papers - oops In-Reply-To: <20020411040416.0F7FE14093@smtp.x263.net> Message-ID: > hi tim > > am fairly sure St Louis in Missouri is MO for state initials - was hoping to help catch the typo before (??) it goes to the hard copy printers oopps... didnt mean for that to go the list... my apologies for bothering ya.... ( twice )... thanx alvin From suraj_peri at yahoo.com Sat Apr 6 03:35:45 2002 From: suraj_peri at yahoo.com (Suraj Peri) Date: Wed Nov 25 01:02:13 2009 Subject: What could be the performance of my cluster In-Reply-To: <20020405125956.D69845@velocet.ca> Message-ID: <20020406113545.91938.qmail@web10504.mail.yahoo.com> Hi group, I was calculating the performance of my cluster. The features are 1. 8 nodes 2. Processor: AMD Athlon XP 1800+ 3. 8 CPUs 4. 8*1.5 GB DDR RAM 5. 1 Server with 2 processorts with AMD MP 1800+ and 2GB DDR RAM I calculated this to be 48 Mflops . Is this correct ? if not, what is the correct performance of my cluster. 
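(A back-of-the-envelope peak estimate, as a sketch only: the 1533 MHz clock and 2 floating-point results per clock used below are nominal Athlon XP 1800+ figures assumed for illustration, not measurements.)

#!/bin/sh
# Theoretical peak ~= clock rate x flops per clock x number of CPUs.
CLOCK_MHZ=1533        # assumed core clock of an Athlon XP 1800+
FLOPS_PER_CLOCK=2     # assumed: one FP add plus one FP multiply per cycle
NCPUS=8
echo "$CLOCK_MHZ $FLOPS_PER_CLOCK $NCPUS" | awk '{ printf "peak ~= %.1f GFLOPS total, %.2f GFLOPS per CPU\n", $1*$2*$3/1000, $1*$2/1000 }'

By that yardstick the peak is on the order of tens of GFLOPS, so 48 Mflops is far too low even as a rough figure; sustained performance from a real benchmark such as Linpack will be a fraction of the peak, and only such a measurement gives a number worth quoting.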
I also comparatively calculated that my cluster would be 3 times faster than AlphaServer DS20E ( 833 MHz alpha 64 bit processor, 4 GB max memory) Is my calculation correct or wrong? please help me ASAP. thanks in advance. cheers suraj. ===== PIL/BMB/SDU/DK __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/
From tony at MPI-SoftTech.Com Sun Apr 7 09:36:53 2002 From: tony at MPI-SoftTech.Com (Tony Skjellum) Date: Wed Nov 25 01:02:13 2009 Subject: Experience with GigE Switches with Jumbo packet support Message-ID: Any Beowulf folks out there have specific experience with switches that allow Jumbo packets? It seems hard to tell from online specs on various company pages whether a switch does this or not? Adapters seem to be readily available... Any clusters doing this right now? Thanks, Tony
From s02.sbecker at wittenberg.edu Sun Apr 7 19:02:40 2002 From: s02.sbecker at wittenberg.edu (Shawn M Becker s02) Date: Wed Nov 25 01:02:13 2009 Subject: Scyld slave node boot problem Message-ID: <5.1.0.14.2.20020407220211.0196d080@mail.wittenberg.edu> I am using version 27bz-8 for the Scyld disk, with kernel version 2.2.19-12.beo. I have a 3c905b card in the slave and a 3c905 in the master. I am getting to the third phase of the boot for the slave to where it outputs the log file. Then the node hangs. Here is the log file for node.0... node_up: Setting system clock. node_up: TODO set interface netmask. node_up: Configuring loopback interface. node_up: Configuring PCI devices. setup_fs: Configuring node filesystems... setup_fs: Using /etc/beowulf/fstab setup_fs: Checking /dev/ram3 (type=ext2)... setup_fs: Hmmm...This appears to be a ramdisk. setup_fs: I'm going to try to try checking the filesystem (fsck) anyway. setup_fs: If it is a RAM disk the following will fail harmlessly. e2fsck 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09 Couldn't find ext2 superblock, trying backup blocks... e2fsck The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 : Bad magic number in super-block while trying to open /dev/ram3 setup_fs: FSCK failure. (OK for RAM disks) setup_fs: Creating ext2 on /dev/ram3... mke2fs 1.20, 25-May-2001 for EXT2 FS 0.5b, 95/08/09 setup_fs: Mounting /dev/ram3 on /rootfs//... (type=ext2; options=defaults) setup_fs: Checking 192.168.1.1:/home (type=nfs)... setup_fs: Mounting 192.168.1.1:/home on /rootfs//home... (type=nfs; options=nolock) mount: 192.168.1.1:/home failed, reason given by server: Permission denied Failed to mount 192.168.1.1:/home on /home. Can someone help? Thanks. Shawn ~~~~~~~~~~~~~~~~~~~ Shawn Becker Wittenberg University 930 N. Fountain Springfield, OH 45504 (937) 360-7562 ~~~~~~~~~~~~~~~~~~~
From wheeler.mark at ensco.com Mon Apr 8 04:56:07 2002 From: wheeler.mark at ensco.com (Wheeler.Mark) Date: Wed Nov 25 01:02:13 2009 Subject: PG Compilers Message-ID: <8986151694190742869D08450EE4DCDE0CF298@amu-exch.ensco.win> We are running pgf77 version 3.2-4 on a Linux cluster. For a standard FORTRAN WRITE statement with IOSTAT=IOS, I am getting a value of 5. In the section B.4 (runtime error messages) of the PGI User's Guide, I do not see values less than 201. Does anyone know what this error means? How can I determine what is causing this error? Mark -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20020408/13d77c93/attachment.html From ron_chen_123 at yahoo.com Mon Apr 8 08:01:27 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:13 2009 Subject: FreeBSD port of SGE (Compute farm system) In-Reply-To: <20020405042351.86759.qmail@web14706.mail.yahoo.com> Message-ID: <20020408150127.20897.qmail@web14708.mail.yahoo.com> Patch and output attached. Also, I already found 1 problem -- somewhere in execd. It affects the process' priority in SGEEE mode. However, I've not fixed it yet, I just want to release the current patch ASAP to let people try it out. -Ron --- Ron Chen wrote: > Hi, > > I compiled the source, changed a few parameters, and > SGE finally runs on FreeBSD. It is running in > single- > user mode, with only 1 host. I am doing a little > clean > up, and then I will need to make sure my changes do > not affect others (by "#ifdef BSD"). > > It still does not get the correct system information > yet, but some of the job accounting info is there > (at > least run time is correct 8-) ). > > It is now running for several hours, it looks > stable. It ran several tens of jobs. "qstat", > "qhost", "qacct", "qconf", "qdel" look fine, output > makes sense (but need to implement the resource info > collecting routines). > > I will post the patches tomorrow, together with some > output of the commands. (I will be busy today) > > Also, I will move the discussion from the hackers > list > to the cluster@freebsd list. __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ -------------- next part -------------- Index: aimk =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/aimk,v retrieving revision 1.38 diff -u -6 -r1.38 aimk --- aimk 2002/02/22 13:23:59 1.38 +++ aimk 2002/04/05 17:54:10 @@ -1,7 +1,7 @@ -#!/bin/csh -fb +#!/bin/csh # # aimk # #___INFO__MARK_BEGIN__ ########################################################################## # @@ -78,12 +78,18 @@ case "crayts": set BUILDARCH = UNICOS_TS breaksw case "craytsieee": set BUILDARCH = UNICOS_TS_IEEE breaksw +case "darwin": + set BUILDARCH = DARWIN + breaksw +case "freebsd" + set BUILDARCH = FREEBSD + breaksw case "glinux": set BUILDARCH = LINUX6 breaksw case "hp10": set BUILDARCH = HP10 breaksw @@ -872,12 +878,97 @@ set GCC_NOERR_CXXFLAGS = "$CXXFLAGS" endif set SGE_NPROCS_CFLAGS = "$CFLAGS" breaksw + +case DARWIN: + set COMPILE_DC = 1 + if ( $USE_QMAKE == 0 ) then + set MAKE = make + endif + set OFLAG = "-O" + if ( "$CC" != insure ) then + set CC = cc + set CXX = c++ + else + set CFLAGS = "-Wno-error $CFLAGS" + set CXXFLAGS = "-Wno-error $CXXFLAGS" + set LIBS = "$LIBS" + endif + set DEPEND_FLAGS = "$CFLAGS $XMTINCD" + + set LD_LIBRARY_PATH = "/usr/lib" + + if ( $SHAREDLIBS == 1 ) then + set LIBEXT = ".dylib" + else + set LIBEXT = ".a" + endif + + set PTHRDSFLAGS = "-D_REENTRANT -D__USE_REENTRANT" + + if ( $DEBUGGED == 1) then + set DEBUG_FLAG = "-ggdb $INSURE_FLAG" + endif + if ( $GPROFFED == 1) then + set DEBUG_FLAG = "$DEBUG_FLAG -pg" + endif + + set ARFLAGS = rcv + set CFLAGS = "$OFLAG -Wall -Werror -D$BUILDARCH $DEBUG_FLAG $CFLAGS" + set CXXFLAGS = "$OFLAG -Werror -Wstrict-prototypes -D$BUILDARCH $DEBUG_FLAG $CXXFLAGS" + set NOERR_CFLAG = "-Wno-error" + set GCC_NOERR_CFLAGS = "$CFLAGS -Wno-error" + set GCC_NOERR_CXXFLAGS = "$CXXFLAGS -Wno-error" + set LFLAGS = "$DEBUG_FLAG $LFLAGS" + set 
LIBS = "$LIBS" + set RANLIB = "ranlib" + set XMTDEF = "" + set XINCD = "$XMTINCD $XINCD -I/usr/X11R6/include" + set XCFLAGS = "-Wno-strict-prototypes -Wno-error $XMTDEF $XINCD" + set XLIBD = "-L/usr/X11R6/lib" + set XLFLAGS = "$XLIBD" + set XLIBS = "-lXm -lXpm -lXt -lXext -lX11 -lSM -lICE -lXp" + + set SGE_NPROCS_CFLAGS = "$CFLAGS" + + breaksw + +case FREEBSD: + set COMPILE_DC = 1 + set MAKE = make + set OFLAG = "-O" + set ARFLAGS = rcv + if ( "$CC" != insure ) then + set CC = gcc + set CXX = g++ + else + set CFLAGS = "-Wno-error $CFLAGS" + set CXXFLAGS = "-Wno-error $CXXFLAGS" + set LIBS = "$LIBS" + endif + set DEPEND_FLAGS = "$CFLAGS $XMTINCD" + set PTHRDSFLAGS = "-D_REENTRANT -D__USE_REENTRANT" + set CFLAGS = "$OFLAG -Wall -D$BUILDARCH $DEBUG_FLAG $CFLAGS -I/usr/X11R6/include" + set CXXFLAGS = "$OFLAG -Wstrict-prototypes -D$BUILDARCH $DEBUG_FLAG $CXXFLAGS" + set NOERR_CFLAG = "-Wno-error" + set GCC_NOERR_CFLAGS = "$CFLAGS -Wno-error" + set GCC_NOERR_CXXFLAGS = "$CXXFLAGS -Wno-error" + set LFLAGS = "$DEBUG_FLAG $LFLAGS" + set LIBS = "$LIBS" + set XMTDEF = "" + set XINCD = "$XMTINCD $XINCD -I/usr/X11/include" + set XCFLAGS = "-Wno-strict-prototypes -Wno-error $XMTDEF $XINCD" + set XLIBD = "-L/usr/X11R6/lib" + set XLFLAGS = "$XLIBD" + set XLIBS = "-Xlinker -Bstatic -lXm -Xlinker -Bdynamic -lXpm -lXt -lXext -lX11 -lSM -lICE -lXp" + + set SGE_NPROCS_CFLAGS = "$CFLAGS" + breaksw case IRIX6*: set COMPILE_DC = 1 set ARCH = $IRIX_ARCHDEF #if (`hostname` != DWAIN) then # set MAKE = make Index: 3rdparty/sge_depend/Makefile =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/3rdparty/sge_depend/Makefile,v retrieving revision 1.1.1.1 diff -u -6 -r1.1.1.1 Makefile --- 3rdparty/sge_depend/Makefile 2001/07/18 11:06:07 1.1.1.1 +++ 3rdparty/sge_depend/Makefile 2002/04/05 17:54:11 @@ -53,11 +53,14 @@ ifparser.o: $(DEP_DIR)/ifparser.c $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/ifparser.c cppsetup.o: $(DEP_DIR)/cppsetup.c $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/cppsetup.c -include.o: $(DEP_DIR)/include.c +include.o: $(DEP_DIR)/include.c + @echo "CFLAGS" : $(CFLAGS) + @echo "MAIN_DEFINES" : $(MAIN_DEFINES) + @echo "DEP_DIR" : $(DEP_DIR) $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/include.c pr.o: $(DEP_DIR)/pr.c $(CC) -c $(CFLAGS) $(MAIN_DEFINES) $(DEP_DIR)/pr.c Index: daemons/common/pdc.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/common/pdc.c,v retrieving revision 1.4 diff -u -6 -r1.4 pdc.c --- daemons/common/pdc.c 2002/02/24 13:41:30 1.4 +++ daemons/common/pdc.c 2002/04/05 17:54:11 @@ -114,13 +114,13 @@ #include #include #include #include "sge_stat.h" #endif -#if defined(LINUX) || defined(ALPHA) || defined(IRIX6) || defined(SOLARIS) +#if defined(LINUX) || defined(ALPHA) || defined(IRIX6) || defined(SOLARIS) || defined(FREEBSD) #include "sge_os.h" #endif #if defined(IRIX6) # define F64 "%lld" # define S64 "%lli" @@ -2041,13 +2041,13 @@ static time_t start_time; int psStartCollector(void) { static int initialized = 0; - int ncpus; + int ncpus = 0; #if defined(ALPHA) int start=0; #endif if (initialized) @@ -2069,13 +2069,13 @@ sysdata.sys_length = sizeof(sysdata); /* page size */ pagesize = getpagesize(); /* retrieve static parameters */ -#if defined(LINUX) || defined(ALINUX) || defined(IRIX6) || defined(SOLARIS) +#if defined(LINUX) || defined(ALINUX) || defined(IRIX6) || defined(SOLARIS) || defined(FREEBSD) 
ncpus = sge_nprocs(); #elif defined(ALPHA) { /* Number of CPUs */ ncpus = sge_nprocs(); #ifdef PDC_STANDALONE Index: daemons/common/procfs.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/common/procfs.c,v retrieving revision 1.3 diff -u -6 -r1.3 procfs.c --- daemons/common/procfs.c 2002/02/24 13:41:30 1.3 +++ daemons/common/procfs.c 2002/04/05 17:54:11 @@ -47,13 +47,15 @@ #include #endif #include #include #include -#include +#if 0 + #include +#endif #include #include #include #include #if defined(ALPHA) Index: daemons/execd/exec_job.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/execd/exec_job.c,v retrieving revision 1.20 diff -u -6 -r1.20 exec_job.c --- daemons/execd/exec_job.c 2002/02/24 13:41:34 1.20 +++ daemons/execd/exec_job.c 2002/04/05 17:54:12 @@ -408,13 +408,13 @@ static const char *get_sharedlib_path_name(void) { #if defined(AIX4) return "LIBPATH"; #elif defined(HP10) || defined(HP11) return "SHLIB_PATH"; -#elif defined(ALPHA) || defined(IRIX6) || defined(IRIX65) || defined(LINUX) || defined(SOLARIS) +#elif defined(ALPHA) || defined(IRIX6) || defined(IRIX65) || defined(LINUX) || defined(SOLARIS) ||defined(FREEBSD) return "LD_LIBRARY_PATH"; #else #error "don't know how to set shared lib path on this architecture" return NULL; /* never reached */ #endif } Index: daemons/execd/ptf.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/execd/ptf.c,v retrieving revision 1.15 diff -u -6 -r1.15 ptf.c --- daemons/execd/ptf.c 2002/02/24 13:41:35 1.15 +++ daemons/execd/ptf.c 2002/04/05 17:54:12 @@ -272,13 +272,13 @@ * static osjobid_t - os job id (job id / ash / supplementary gid) ******************************************************************************/ static osjobid_t ptf_get_osjobid(lListElem *osjob) { osjobid_t osjobid; -#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5) +#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5) && !defined(FREEBSD) osjobid = lGetUlong(osjob, JO_OS_job_ID2); osjobid = (osjobid << 32) + lGetUlong(osjob, JO_OS_job_ID); #else @@ -302,13 +302,13 @@ * INPUTS * lListElem *osjob - element of type JO_Type * osjobid_t osjobid - os job id (job id / ash / supplementary gid) ******************************************************************************/ static void ptf_set_osjobid(lListElem *osjob, osjobid_t osjobid) { -#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5) +#if !defined(LINUX) && !defined(SOLARIS) && !defined(ALPHA5) && !defined(NECSX4) && !defined(NECSX5) && !defined(FREEBSD) lSetUlong(osjob, JO_OS_job_ID2, ((u_osjobid_t) osjobid) >> 32); lSetUlong(osjob, JO_OS_job_ID, osjobid & 0xffffffff); #else @@ -907,13 +907,13 @@ { lListElem *job, *osjob = NULL; lCondition *where; DENTER(TOP_LAYER, "ptf_get_job_os"); -#if defined(LINUX) || defined(SOLARIS) || defined(ALPHA5) || defined(NECSX4) || defined(NECSX5) +#if defined(LINUX) || defined(SOLARIS) || defined(ALPHA5) || defined(NECSX4) || defined(NECSX5) || defined(FREEBSD) where = lWhere("%T(%I == %u)", JO_Type, JO_OS_job_ID, (u_long32) os_job_id); #else where = lWhere("%T(%I == %u && %I == %u)", JO_Type, JO_OS_job_ID, (u_long) (os_job_id & 0xffffffff), JO_OS_job_ID2, 
(u_long) (((u_osjobid_t) os_job_id) >> 32)); #endif Index: daemons/shepherd/setrlimits.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/daemons/shepherd/setrlimits.c,v retrieving revision 1.5 diff -u -6 -r1.5 setrlimits.c --- daemons/shepherd/setrlimits.c 2002/02/24 13:41:43 1.5 +++ daemons/shepherd/setrlimits.c 2002/04/05 17:54:12 @@ -45,14 +45,19 @@ #endif #if defined(HP10_01) || defined(HPCONVEX) # define _KERNEL #endif -#include +#if defined(FREEBSD) +#include +#endif +#if 0 +#include +#endif #if defined(HP10_01) || defined(HPCONVEX) # undef _KERNEL #endif #if defined(IRIX6) # define RLIMIT_STRUCT_TAG rlimit64 @@ -403,13 +408,13 @@ /* hard limit must be greater or equal to soft limit */ if (rlp->rlim_max < rlp->rlim_cur) rlp->rlim_cur = rlp->rlim_max; #if defined(LINUX) || ( defined(SOLARIS) && !defined(SOLARIS64) ) || defined(NECSX4) || defined(NECSX5) # define limit_fmt "%ld" -#elif defined(IRIX6) || defined(HP11) || defined(HP10) +#elif defined(IRIX6) || defined(HP11) || defined(HP10) || defined(FREEBSD) # define limit_fmt "%lld" #elif defined(ALPHA) || defined(SOLARIS64) # define limit_fmt "%lu" #else # define limit_fmt "%d" #endif Index: dist/util/arch =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/dist/util/arch,v retrieving revision 1.7 diff -u -6 -r1.7 arch --- dist/util/arch 2002/01/29 14:58:56 1.7 +++ dist/util/arch 2002/04/05 17:54:12 @@ -44,12 +44,32 @@ # PATH=/bin:/usr/bin:/usr/sbin ARCH=UNKNOWN +if [ -x /usr/bin/uname ]; then + os="`/usr/bin/uname -s`" + ht="`/usr/bin/uname -m`" + osht="$os,$ht" + case $osht in + Darwin,*) + ARCH=darwin + ;; + FreeBSD,*) + ARCH=freebsd + ;; + OpenBSD,*) + ARCH=freebsd + ;; + NetBSD,*) + ARCH=freebsd + ;; + esac +fi + if [ -x /bin/uname ]; then os="`/bin/uname -s`" ht="`/bin/uname -m`" osht="$os,$ht" case $osht in SUPER-UX,SX-4*) Index: libs/comm/commlib.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/comm/commlib.c,v retrieving revision 1.7 diff -u -6 -r1.7 commlib.c --- libs/comm/commlib.c 2002/02/27 08:14:45 1.7 +++ libs/comm/commlib.c 2002/04/05 17:54:13 @@ -2063,12 +2063,14 @@ sigdelset(&mask, SIGILL); sigdelset(&mask, SIGQUIT); sigdelset(&mask, SIGURG); sigdelset(&mask, SIGIO); sigdelset(&mask, SIGSEGV); sigdelset(&mask, SIGFPE); + +#define SIGCLD SIGCHLD /* Same as SIGCHLD (System V). */ sigaddset(&mask, SIGCLD); sigprocmask(SIG_SETMASK, &mask, NULL); return omask; } #endif Index: libs/rmon/rmon_semaph.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/rmon/rmon_semaph.c,v retrieving revision 1.2 diff -u -6 -r1.2 rmon_semaph.c --- libs/rmon/rmon_semaph.c 2001/07/20 08:21:38 1.2 +++ libs/rmon/rmon_semaph.c 2002/04/05 17:54:13 @@ -53,13 +53,13 @@ #include "msg_rmon.h" #define BIGCOUNT 10000 /* initial value of process counter */ /* * Define the semaphore operation arrays for the semop() calls. 
*/ -#if defined(bsd4_2) || defined(MACH) || defined(__hpux) || defined(_AIX) || defined(SOLARIS) || defined(SINIX) || (defined(LINUX) && defined(_SEM_SEMUN_UNDEFINED)) +#if defined(bsd4_2) || defined(MACH) || defined(__hpux) || defined(_AIX) || defined(SOLARIS) || defined(SINIX) || (defined(LINUX) && defined(_SEM_SEMUN_UNDEFINED)) union semun { int val; /* value for SETVAL */ struct semid_ds *buf; /* buffer for IPC_STAT & IPC_SET */ ushort *array; /* array for GETALL & SETALL */ }; Index: libs/sched/sort_hosts.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/sched/sort_hosts.c,v retrieving revision 1.6 diff -u -6 -r1.6 sort_hosts.c --- libs/sched/sort_hosts.c 2001/12/17 15:09:38 1.6 +++ libs/sched/sort_hosts.c 2002/04/05 17:54:13 @@ -31,16 +31,12 @@ /*___INFO__MARK_END__*/ #include #include #include #include -#ifndef WIN32 -# include -#endif - #include "sgermon.h" #include "sge.h" #include "sge_gdi_intern.h" #include "cull.h" #include "sge_all_listsL.h" #include "sge_parse_num_par.h" Index: libs/uti/sge_arch.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/uti/sge_arch.c,v retrieving revision 1.10 diff -u -6 -r1.10 sge_arch.c --- libs/uti/sge_arch.c 2002/02/24 13:41:51 1.10 +++ libs/uti/sge_arch.c 2002/04/05 17:54:13 @@ -85,20 +85,22 @@ #elif defined(ALINUX) # define ARCHBIN "alinux" #elif defined(LINUX5) # define ARCHBIN "linux" #elif defined(LINUX6) # define ARCHBIN "glinux" +#elif defined(FREEBSD) +# define ARCHBIN "freebsd" #elif defined(SLINUX) # define ARCHBIN "slinux" #elif defined(CRAY) # if defined(CRAYTSIEEE) # define ARCHBIN "craytsieee" -# elif defined(CRAYTS) +#elif defined(CRAYTS) # define ARCHBIN "crayts" -# else +#else # define ARCHBIN "cray" # endif #elif defined(NECSX4) # define ARCHBIN "necsx4" #elif defined(NECSX5) # define ARCHBIN "necsx5" Index: libs/uti/sge_getloadavg.c =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/uti/sge_getloadavg.c,v retrieving revision 1.6 diff -u -6 -r1.6 sge_getloadavg.c --- libs/uti/sge_getloadavg.c 2002/02/24 13:41:54 1.6 +++ libs/uti/sge_getloadavg.c 2002/04/05 17:54:13 @@ -600,12 +600,56 @@ } #endif DEXIT; return cpu_load; } + +#elif defined(FREEBSD) + +double get_cpu_load() +{ + return 0.0; +} + #elif defined(LINUX) static char* skip_token( char *p ) { while (isspace(*p)) { @@ -833,12 +877,38 @@ loadavg[2] /= cpus; return 3; } else { return -1; } } +#elif defined(FREEBSD) + +static int get_load_avg( +double loadavg[], +int nelem +) { + + return 0; + +} #elif defined(LINUX) static int get_load_avg( double loadv[], int nelem @@ -1075,13 +1145,13 @@ int nelem ) { int elem = 0; #if defined(SOLARIS64) elem = getloadavg(loadavg, nelem); /* <== library function */ -#elif (defined(SOLARIS) && !defined(SOLARIS64)) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) || defined(CRAY) || defined(NECSX4) || defined(NECSX5) || defined(LINUX) +#elif (defined(SOLARIS) && !defined(SOLARIS64)) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) || defined(CRAY) || defined(NECSX4) || defined(NECSX5) || defined(LINUX) ||defined(FREEBSD) elem = get_load_avg(loadavg, nelem); #else elem = -1; #endif if (elem != -1) { elem = nelem; Index: libs/uti/sge_getloadavg.h 
=================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/libs/uti/sge_getloadavg.h,v retrieving revision 1.3 diff -u -6 -r1.3 sge_getloadavg.h --- libs/uti/sge_getloadavg.h 2001/10/20 14:47:28 1.3 +++ libs/uti/sge_getloadavg.h 2002/04/05 17:54:13 @@ -29,17 +29,17 @@ * * All Rights Reserved. * ************************************************************************/ /*___INFO__MARK_END__*/ -#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(CRAY) || defined(NEXSX4) || defined(NECSX5) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) -# define SGE_LOADAVG +#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(CRAY) || defined(NEXSX4) || defined(NECSX5) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(FREEBSD) +#define SGE_LOADAVG #endif -#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) +#if defined(LINUX) || defined(SOLARIS) || defined(SOLARIS64) || defined(ALPHA4) || defined(ALPHA5) || defined(IRIX6) || defined(HP10) || defined(HP11) || defined(FREEBSD) # define SGE_LOADCPU #endif #ifdef SGE_LOADAVG int sge_getloadavg(double loadavg[], int nelem); Index: scripts/distinst =================================================================== RCS file: /usr/local/tigris/data/helm/cvs/repository/gridengine/source/scripts/distinst,v retrieving revision 1.21 diff -u -6 -r1.21 distinst --- scripts/distinst 2002/01/28 08:57:01 1.21 +++ scripts/distinst 2002/04/05 17:54:13 @@ -52,23 +52,23 @@ HASDIR="ckpt doc examples/jobs locale mpi pvm qmon/PIXMAPS/big qmon/locale" HASARCHDIR="bin lib examples/jobsbin utilbin" DEFAULTPROG="sge_qmaster sge_execd sge_shadowd sge_commd sge_schedd \ sge_shepherd sge_coshepherd qstat qsub qalter qconf qdel \ - qacct qmod qsh commdcntl utilbin jobs qmon qhold qrls qhost \ - qmake qtcsh" + qacct qmod qsh commdcntl utilbin jobs qhold qrls qhost \ + " UTILITYBINARIES="uidgid gethostname gethostbyname gethostbyaddr \ getservbyname filestat checkprog loadcheck now checkuser \ adminrun qrsh_starter testsuidroot openssl" REMOTEBINARIES="rsh rshd rlogin" SUPPORTEDARCHS="aix42 aix43 alinux cray crayts craytsieee glinux hp10 \ -hp11 irix6 necsx4 necsx5 slinux solaris solaris64 solaris86 osf4 tru64" +hp11 irix6 necsx4 necsx5 slinux solaris solaris64 solaris86 osf4 tru64 freebsd" #SGEEE_UTILITYBINARIES="sge_share_mon sge_host_mon" SGEEE_UTILITYBINARIES="sge_share_mon" JOBBINARIES="work" @@ -161,12 +161,14 @@ elif [ $i = hp11 ]; then ARCHBIN=HP11 elif [ $i = irix6 ]; then ARCHBIN=IRIX6 elif [ $i = glinux ]; then ARCHBIN=LINUX6 + elif [ $i = freebsd ]; then + ARCHBIN=FREEBSD elif [ $i = alinux ]; then ARCHBIN=ALINUX elif [ $i = slinux ]; then ARCHBIN=SLINUX elif [ $i = osf4 ]; then ARCHBIN=ALPHA4 @@ -655,143 +657,13 @@ if [ $instexamples = true ]; then echo Installing \"examples/jobs\" Execute rm -f $DEST_SGE_ROOT/examples/jobs/* Execute cp dist/examples/jobs/*.sh $DEST_SGE_ROOT/examples/jobs fi - if [ $instqmon = true ]; then - echo Copying Pixmaps and Qmon resource file - - Execute rm -f $DEST_SGE_ROOT/qmon/PIXMAPS/*.xpm - Execute rm -f $DEST_SGE_ROOT/qmon/PIXMAPS/big/*.xpm - Execute cp dist/qmon/PIXMAPS/small/*.xpm $DEST_SGE_ROOT/qmon/PIXMAPS - Execute cp dist/qmon/PIXMAPS/big/toolbar*.xpm $DEST_SGE_ROOT/qmon/PIXMAPS/big - - Execute chmod 644 $DEST_SGE_ROOT/qmon/PIXMAPS/*.xpm - Execute chmod 644 $DEST_SGE_ROOT/qmon/PIXMAPS/big/*.xpm - - 
Execute cp dist/qmon/Qmon $DEST_SGE_ROOT/qmon/Qmon - Execute chmod 644 $DEST_SGE_ROOT/qmon/Qmon - - Execute cp dist/qmon/qmon_help.ad $DEST_SGE_ROOT/qmon - Execute chmod 644 $DEST_SGE_ROOT/qmon/qmon_help.ad - - ( echo changing to $DEST_SGE_ROOT/qmon/PIXMAPS ; \ - cd $DEST_SGE_ROOT/qmon/PIXMAPS; \ - echo ln -s intro-sge.xpm intro.xpm; \ - ln -s intro-sge.xpm intro.xpm; \ - echo ln -s logo-sge.xpm logo.xpm; \ - ln -s logo-sge.xpm logo.xpm \ - ) - fi - - if [ $instpvm = true ]; then - echo Installing \"pvm\" - Execute rm -rf $DEST_SGE_ROOT/pvm/* - Execute mkdir $DEST_SGE_ROOT/pvm/src - - for f in $PVMSCRIPTS; do - Execute cp dist/pvm/$f $DEST_SGE_ROOT/pvm - done - chmod 755 $DEST_SGE_ROOT/pvm/*.sh - - for f in $PVMSOURCES; do - Execute cp dist/pvm/src/$f $DEST_SGE_ROOT/pvm/src - done - - for f in $PVMSRCSCRIPTS; do - Execute cp dist/pvm/src/$f $DEST_SGE_ROOT/pvm/src - chmod 755 $DEST_SGE_ROOT/pvm/src/$f - done - fi - - if [ $instmpi = true ]; then - echo Installing \"mpi/\" - rm -rf $DEST_SGE_ROOT/mpi/* - for f in $MPIFILES; do - Execute cp dist/mpi/$f $DEST_SGE_ROOT/mpi - done - chmod 755 $DEST_SGE_ROOT/mpi/*.sh $DEST_SGE_ROOT/mpi/hostname $DEST_SGE_ROOT/mpi/rsh - - HPCBASE=mpi/sunhpc/loose-integration - Execute mkdir -p $DEST_SGE_ROOT/$HPCBASE/accounting - - for f in $SUNHPC_FILES; do - Execute cp dist/$HPCBASE/$f $DEST_SGE_ROOT/$HPCBASE - Execute chmod 644 $DEST_SGE_ROOT/$HPCBASE/$f - done - - for f in $SUNHPC_SCRIPTS; do - Execute cp dist/$HPCBASE/$f $DEST_SGE_ROOT/$HPCBASE - Execute chmod 755 $DEST_SGE_ROOT/$HPCBASE/$f - done - - for f in $SUNHPCACCT_FILES; do - Execute cp dist/$HPCBASE/accounting/$f $DEST_SGE_ROOT/$HPCBASE/accounting - Execute chmod 644 $DEST_SGE_ROOT/$HPCBASE/accounting/$f - done - - for f in $SUNHPCACCT_SCRIPTS; do - Execute cp dist/$HPCBASE/accounting/$f $DEST_SGE_ROOT/$HPCBASE/accounting - Execute chmod 755 $DEST_SGE_ROOT/$HPCBASE/accounting/$f - done - fi - - if [ $instman = true ]; then - echo Installing \"man/\" and \"catman/\" - Execute rm -rf $DEST_SGE_ROOT/man $DEST_SGE_ROOT/catman - Execute cp -r MANSBUILD_$SGE_PRODUCT_MODE/SEDMAN/man $DEST_SGE_ROOT - Execute cp -r MANSBUILD_$SGE_PRODUCT_MODE/ASCMAN/catman $DEST_SGE_ROOT - fi - - if [ $instdoc = true ]; then - echo Installing \"doc/\" - echo " --> PS and PDF files" - Execute rm -rf $DEST_SGE_ROOT/doc - Execute mkdir $DEST_SGE_ROOT/doc - Execute cp $MANUALPDF $DEST_SGE_ROOT/doc/SGE53beta2_doc.pdf - fi - # this rule must come *after* the "instdoc" rule - # - if [ $insttxtdoc = true ]; then - echo "Installing README, INSTALL ... 
files" - Execute cp ../doc/*.asc $DEST_SGE_ROOT/doc - Execute cp ../doc/INSTALL $DEST_SGE_ROOT/doc - Execute cp ../doc/UPGRADE-2-53 $DEST_SGE_ROOT/doc/UPGRADE - Execute chmod 644 $DEST_SGE_ROOT/doc/* - fi - - if [ $instckpt = true ]; then - echo Installing \"ckpt/\" - Execute rm -rf $DEST_SGE_ROOT/ckpt/* - cp dist/ckpt/* $DEST_SGE_ROOT/ckpt - chmod 755 $DEST_SGE_ROOT/ckpt/*_command - fi - - if [ $instlocale = true ]; then - echo "Installing \"locale/\" and \"qmon/locale/\"" - Execute cp -r locale/* $DEST_SGE_ROOT/locale - Execute rm -rf $DEST_SGE_ROOT/qmon/locale/* - Execute cp -r dist/qmon/locale/* $DEST_SGE_ROOT/qmon/locale - fi - - if [ $instsec = true ]; then - echo Installing \"security\" modules - Execute mkdir -p $DEST_SGE_ROOT/security - for f in $SECFILES; do - Execute cp $f $DEST_SGE_ROOT/security - fb=`basename $f` - if [ -x $DEST_SGE_ROOT/security/$fb ]; then - chmod 755 $DEST_SGE_ROOT/security/$fb - else - chmod 644 $DEST_SGE_ROOT/security/$fb - fi - done - Execute ln -s gss_customer.html $DEST_SGE_ROOT/security/README.html - fi # Set file and directory permissions to 755/644 and owner to 0.0 if [ $setfileperm = true ]; then echo Setting file permissions SetFilePerm $DEST_SGE_ROOT fi @@ -820,13 +692,13 @@ echo "Installing binaries for $i from `pwd` -->" echo " --> $DEST_SGE_ROOT/bin/$i" echo ------------------------------------------------------------------------ for prog in $PROG; do case $prog in - jobs|ckpt|locale|doc|inst_sge|utiltree|examples|man|mpi|pvm|qmontree|common|distcommon|utilbin) + jobs|ckpt|locale|doc|inst_sge|utiltree|examples|man|mpi|pvm|common|distcommon|utilbin) : ;; qmake) echo Installing qmake Install 0.0 755 ../3rdparty/qmake/$ARCHBIN/make $DEST_SGE_ROOT/${UTILPREFIX}/$DSTARCH/qmake ;; -------------- next part -------------- > qstat > qhost HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS -------------------------------------------------------------------------------- global - - - - - - - host1 freebsd 1 0.00 - - - - > cat s #!/bin/sh sleep 10 echo "Hello" exit 2 > qsub s your job 11 ("s") has been submitted > qstat job-ID prior name user state submit/start at queue master ja-task-ID --------------------------------------------------------------------------------------------- 11 0 s ron qw 04/05/2002 12:04:07 > cat s.o11 Hello > qacct -j 11 ============================================================== qname host1.q hostname host1 group UNKNOWN owner ron jobname s jobnumber 11 taskid undefined account sge priority 0 qsub_time Fri Apr 5 12:04:07 2002 start_time Fri Apr 5 12:04:10 2002 end_time Fri Apr 5 12:04:20 2002 granted_pe none slots 1 failed 0 exit_status 2 ru_wallclock 10 ru_utime 0 ru_stime 0 ru_maxrss 916 ru_ixrss 808 ru_ismrss 0 ru_idrss 488 ru_isrss 256 ru_minflt 361 ru_majflt 0 ru_nswap 0 ru_inblock 0 ru_oublock 1 ru_msgsnd 17 ru_msgrcv 17 ru_nsignals 5 ru_nvcsw 29 ru_nivcsw 5 cpu 0 mem 0.000 io 0.000 iow 0.000 maxvmem 0.000000 From christon at pluto.dsu.edu Tue Apr 9 07:40:15 2002 From: christon at pluto.dsu.edu (Christoffersen, Neils) Date: Wed Nov 25 01:02:13 2009 Subject: Beowulf -- confirmation of subscription -- request 851822 Message-ID: <0718ABB23368D2119FC200008362AF6816BDD7@pluto.dsu.edu> -----Original Message----- From: beowulf-request@beowulf.org To: christon@pluto.dsu.edu Sent: 4/9/02 9:35 AM Subject: Beowulf -- confirmation of subscription -- request 851822 Beowulf -- confirmation of subscription -- request 851822 We have received a request from 138.247.172.98 for subscription of your email address, , to the 
beowulf@beowulf.org mailing list. To confirm the request, please send a message to beowulf-request@beowulf.org, and either: - maintain the subject line as is (the reply's additional "Re:" is ok), - or include the following line - and only the following line - in the message body: confirm 851822 (Simply sending a 'reply' to this message should work from most email interfaces, since that usually leaves the subject line in the right form.) If you do not wish to subscribe to this list, please simply disregard this message. Send questions to beowulf-admin@beowulf.org. From eugen at leitl.org Wed Apr 10 03:52:30 2002 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:02:13 2009 Subject: CCL:parallel quantum solutions (fwd) Message-ID: ---------- Forwarded message ---------- Date: Tue, 09 Apr 2002 10:07:54 -0400 From: David J Giesen To: Dr. Bill Davis , CHEMISTRY@ccl.net Subject: CCL:parallel quantum solutions Bill - This is a long post, but I think vendors that do excellent jobs should get pats on the back. I am not a PQS employee nor do I receive new-customer kick-backs. Those not interested in a review of PQS products can hit delete now. I've been a very satisfied PQS customer since Aug. 2000. We purchased an 8-processor Linux cluster at that point, and several months later we were so pleased we bought a second. At the end of last year, we contracted with them to build a 34-processor cluster for general computational (not just chemistry) use. Hardware performance : The setup they use works well for running serial or parallel codes, and I have PQS, Jaguar and Gaussian (using LINDA) running in parallel on them. Based on timings against other machines/platforms, the PQS machines perform as well as could be expected. Our 1.2 GHz athlon PQS machine runs G98 slightly (~10%) faster than the latest-and-greatest-just-off-the-design-sheet Sun hardware, and 2-3 times faster than our SGI 194 MHz R10000. It is ~10% slower than a 1.5GHz P4. Both the Athlon and P4 machine used PIII optimized blas libraries... Software performance : the PQS software 'is what it is'. If you are interested in mainly HF, MP2 and DFT computations, it is very good. You can see its capabilities on their website. Speed-wise, it runs faster than other codes I use, although it is not faster than Jaguar's pseudo-spectral methods. The geometry optimizer is rock solid dependable as one would expect from code by Pulay and Baker. PQS uses PVM for parallel execution. Without getting into a debate about parallel paradigms, I'll say simply this: in our hands, I have never had a PVM job die because of inter-process communication problems while MPICH/MPI is very flaky and tends to die on about 10-25% of chemistry jobs (even more for systems using automount) independent of linux, sun or SGI. Because PQS uses PVM to set up the parallel system only once per job, there is less parallel overhead using PQS than with other codes that set up LINDA parallel systems at every SCF and geometry optimization step - although for large jobs, these both essentially go to zero. Support : PQS is a small company, and the support shows it. They have absolutely bent over backwards each time we have had an issue, and dealing with them is always a pleasure. We have not had a hardware or PQS software issue that they haven't resolved to our satisfaction. In fairness, you can't expect a 24-hour help line or technicians in suits to fly in and fix problems. 
You should be aware that they are not in the business of selling/supporting Linux or Gnu software, so some problems you have on your machine if you veer off the PQS path might be technically out of their scope. However, in my experience, they make every honest effort to solve those as well (and usually do). Every machine they shipped us has been stress-tested by an expert for a number of days before they are delivered. Ease of use : The machines come setup to run the PQS chemistry code out of the box. If you are planning on running one PQS parallel job at a time across the whole cluster or multiple serial jobs, the included DQS (not associated with PQS) queuing system works OK. Running multiple parallel codes/jobs on the same cluster through the queue does not work well. Running other codes in parallel through the queue takes some hard work. Setting up other parallel codes also takes some work. This is not really a function of PQS, however, and you'll find this is true no matter what machine you get. Disclaimer : This e-mail does not in any way imply an 'official Kodak' stance, it is merely the personal opinion of a Kodak employee who uses PQS products at work. Dave Dr. Bill Davis wrote: > > Hi! > > Does anyone have any experience with the PQS hardware/software > combination, more specifically the QS4-1800S? Any comments on ease of > use, support and any other important points would be greatly > appreciated...Thanks! > > Bill > > > -- > ********************************** > Dr. William M. Davis > Assistant Professor of Chemistry/ > Phi Theta Kappa Advisor > Dept. of Chemistry and Environmental Science > University of Texas at Brownsville > 80 Fort Brown > Brownsville, TX 78520 > Phone: (956) 574-6646 > Fax: (956) 574-6692 > WWW: unix.utb.edu/~bdavis > ********************************** > > -- Dr. David J. Giesen Eastman Kodak Company david.giesen@kodak.com 2/83/RL MC 02216 (ph) 1-585-58(8-0480) Rochester, NY 14650 (fax)1-585-588-1839 -= This is automatically added to each message by mailing script =- CHEMISTRY@ccl.net -- To Everybody | CHEMISTRY-REQUEST@ccl.net -- To Admins MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH CHEMISTRY-SEARCH@ccl.net -- archive search | Gopher: gopher.ccl.net 70 Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl@osc.edu From fraser5 at cox.net Wed Apr 10 05:24:05 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:13 2009 Subject: MPICH: works for users not root? Message-ID: <003801c1e08a$9b6d8840$0300005a@papabear> I have had this pesky problem of running mpi using the bash shell and trying to figure out how to get it to work for root. It works fine for all the users but not root. As root I can rsh to any node ok but if I do a rsh node2 -n true then I get a permission denied. Again, it works for normal users. I have gutted the .bashrc and /etc/bashrc scripts and the .rhosts seem ok. what could be the problem? (Linux 7.2) thanks jim From rastapoppolous at yahoo.com Wed Apr 10 22:29:24 2002 From: rastapoppolous at yahoo.com (k r) Date: Wed Nov 25 01:02:13 2009 Subject: Scyld and mpi fasta Makefile Problems Message-ID: <20020411052924.33098.qmail@web9009.mail.yahoo.com> hello all, I can't seem to get the included Makefile (Makefile.mpi4) for FASTA to compile. When i compile i get the following error. mm_file.h:25: conflicting types for `int64_t' types.h:172: previous declaration of `int64_t' I did not make any changes to the Makefile. any help is appreciated. 
Thanks, Kart __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From siegert at sfu.ca Wed Apr 10 22:43:28 2002 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:02:13 2009 Subject: MPICH: works for users not root? In-Reply-To: <003801c1e08a$9b6d8840$0300005a@papabear>; from fraser5@cox.net on Wed, Apr 10, 2002 at 08:24:05AM -0400 References: <003801c1e08a$9b6d8840$0300005a@papabear> Message-ID: <20020410224328.A19551@stikine.ucs.sfu.ca> On Wed, Apr 10, 2002 at 08:24:05AM -0400, Jim Fraser wrote: > I have had this pesky problem of running mpi using the bash shell and trying > to figure out how to get it to work for root. It works fine for all the > users but not root. As root I can rsh to any node ok but if I do a rsh > node2 -n true then I get a permission denied. Again, it works for normal > users. I have gutted the .bashrc and /etc/bashrc scripts and the .rhosts > seem ok. what could be the problem? (Linux 7.2) > > thanks > jim The least that you need is a line "rsh" in /etc/securetty. Hope this helps. Cheers, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From rastapoppolous at yahoo.com Wed Apr 10 22:44:14 2002 From: rastapoppolous at yahoo.com (k r) Date: Wed Nov 25 01:02:13 2009 Subject: mpi fasta Makefile Problems Message-ID: <20020411054414.34011.qmail@web9009.mail.yahoo.com> hello all, I can't seem to get the included Makefile (Makefile.mpi4) for FASTA to compile on a beowulf cluster. When i compile i get the following error. mm_file.h:25: conflicting types for `int64_t' types.h:172: previous declaration of `int64_t' I did not make any changes to the Makefile. any help is appreciated. Thanks, Kart __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From math at velocet.ca Wed Apr 10 23:07:43 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:13 2009 Subject: Linux Software RAID5 Performance In-Reply-To: ; from mikeprinkey@hotmail.com on Wed, Apr 03, 2002 at 02:49:00PM -0500 References: Message-ID: <20020411020739.W19272@velocet.ca> On Wed, Apr 03, 2002 at 02:49:00PM -0500, Michael Prinkey's all... > Indeed, the multiple processes accessing the device made significantly > degrade performance. Fortunately for us, as well, access speed is limited > by the NFS/SMB and the network, not by array performance. Unfortunately, > the unit is online now and I can't fiddle around with the settings and test > it further. > > WRT reliability, we have seen the array drop to degraded mode because of a > single drive failure. We have also a single drive take down the entire IDE > port. This results in the md device disappearing until you swap out the > offending drive and restart the array. There is no data here. Usually one > drive goes and the array goes into degraded mode and starts reconstructing > on the spare. Then the second goes and the array disappears. It is a bit > disconcerting to do ls /raid and get nothing back. Changing out the drive > and restarting pulls everything back. 
> > I can honestly say that the only data loss that I have had on these arrays > came when a maintenance person completely unplugged one of the arrays from > the UPS. It caused low-level corruption on 5 of the 9 drives in the array. > We ended up using a Windows 98 boot floppy with Maxtor's Powermax utility to > patch them all back up. It took many hours. This is the WORST possible > scenario, BTW. Even reseting the system gives the EIDE devices a chance to > flush their caches and maintain low-level integrity. Cutting the power can > leave the array/drives inconsistent on the filesystem, device (/dev/md0), > and hardware-format datagram levels. So, lock your arrays in a cabinet! 8) ok get an EIDE RAID controller with battery backed-up ram onboard. We pulled a bunch of the SCSI equiv of such from a netfinity server a customer pawned off on us. Rather nice. (anyone want to buy? :) /kc > > Mike > > >From: Jurgen Botz > >To: mprinkey@aeolusresearch.com (Michael Prinkey) > >CC: beowulf@beowulf.org > >Subject: Re: Linux Software RAID5 Performance > >Date: Wed, 03 Apr 2002 10:25:31 -0800 > > > >Michael Prinkey wrote: > > > Again, performance (see below) is remarkably good, especially > >considering > > > all of the strikes against this configuration: EIDE instead of SCSI, > >UDMA66 > > > instead of 100/133, 5400-RPM instead of 7200-RPM, and master/slave > >drives on > > > each port instead of a single drive per port. > > > >With regard to the master/slave config... I note that your performance > >test is a single reader/writer... in this config with RAID5 I would > >expect the performance to be quite good even with 2 drives per IDE > >controller. But if you have several processes doing disk I/O > >simultaneously you should see a rather more precipitous drop in > >performance than you would with a single drive per IDE controller. > >I'm working on testing a very similar config right now and that's > >one of my findings (which I had expected) but our application for this > >is not very performance sensitive so it's not a big deal. > > > >A more important issue for me is reliability, and I'm somewhat > >concerned about failure modes. For example, can an IDE drive fail > >in such a way that if will disable the controller or the other > >drive on the same controller? If so, that would seriously limit > >the usefulness of RAID5 in this config. In general how good is > >Linux software RAID's failure handling? Etc. > > > >:j > > > > > >-- > >J?rgen Botz | While differing widely in the various > >jurgen@botz.org | little bits we know, in our infinite > > | ignorance we are all equal. -Karl > >Popper > > > > > >_______________________________________________ > >Beowulf mailing list, Beowulf@beowulf.org > >To change your subscription (digest mode or unsubscribe) visit > >http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _________________________________________________________________ > MSN Photos is the easiest way to share and print your photos: > http://photos.msn.com/support/worldwide.aspx > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. 
* Toronto, CANADA From ron_chen_123 at yahoo.com Thu Apr 11 00:53:52 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:13 2009 Subject: Lost cycles due to PBS (was Re: Uptime data/studies/anecdotes) In-Reply-To: <20020402140934.A29446@getafix.EraGen.com> Message-ID: <20020411075352.26837.qmail@web14703.mail.yahoo.com> --- Chris Black wrote: > On Tue, Apr 02, 2002 at 12:46:07PM -0600, Roger L. > Smith wrote: > > On Tue, 2 Apr 2002, Richard Walsh wrote: > [stuff deleted] > > PBS is our leading cause of cycle loss. We now > run a cron job on the > > headnode that checks every 15 minutes to see if > the PBS daemons have died, > > and if so, it automatically restarts them. About > 75% of the time that I > > have a node fail to accept jobs, it is because its > pbs_mom has died, not > > because there is anything wrong with the node. > > > > We used to have the same problem with PBS, > especially when many jobs were > in the queue. At that point sometimes the pbs master > died as well. > Since we've switched to SGE/GridEngine/CODINE I've > been MUCH happier. > Plus there are lots of nifty things you can do with > the expandibility of > writing your own load monitors via shell scripts and > such. > The whole point of this post is: > GNQS < PBS < Sun Gridengine :) > > Chris (who tried two other batch schedulers until > settling on SGE) > I also have similar experience -- I tried PBS, it is hard to install, and there are not much scheduling policies -- but it is hard to config. Then I read the news about SGE, and since it does not require root access to install/run, I gave it a try. I did an experience a few weeks ago -- submitting over 30,000 "sleep jobs" to SGE, and it did not die! If the master host is down, another machine takes over, so there is not lost of computing power. I think SGE 5.3 is better than anything available. I tried commerical DRM systems, other open source packages, but so far SGE is by far the best. BTW, Chris, how many nodes are there in your cluster? -Ron P.S. I'm doing a port of SGE to FreeBSD, hope people find it useful __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From keithu at parl.clemson.edu Thu Apr 11 05:11:23 2002 From: keithu at parl.clemson.edu (Keith Underwood) Date: Wed Nov 25 01:02:13 2009 Subject: Experience with GigE Switches with Jumbo packet support In-Reply-To: Message-ID: Most of the Extreme switches support Jumbo packets. The new line of products from Foundry Networks (JetCore is what I think they call it) is supposed to support Jumbo packets (even for Fast Ethernet from the way I read the spec, if you could find a card to do it). Keith On Sun, 7 Apr 2002, Tony Skjellum wrote: > Any Beowulf folks out there have specific experience with switches that allow > Jumbo packets? It seems hard to tell from online specs on various company > pages whether a switch does this or not? > > Adapters seem to be readily available... > > Any clusters doing this right now? 
> > Thanks, > Tony > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu@parl.clemson.edu Clemson University From wonglijie at yahoo.com Thu Apr 11 06:00:08 2002 From: wonglijie at yahoo.com (Li Jie) Date: Wed Nov 25 01:02:14 2009 Subject: (no subject) Message-ID: <20020411130008.63691.qmail@web9608.mail.yahoo.com> hi may i know if anyone here can provide detailed information on how to start a a beowulf? i have about 9 machinese with Pentium MMX 233 Mhz processors, 128 mb RAM, 2 x 1.99 Gb. HDD I am also considering various designs and this is a school project. Thanks for your help! lijie [2002] 7540832 [2 Cor. 5:7] We live by faith and not by sight. --------------------------------- Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20020411/fd1e3d99/attachment.html From rgb at phy.duke.edu Thu Apr 11 06:18:39 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help In-Reply-To: Message-ID: On Thu, 4 Apr 2002, Adrian Garcia Garcia wrote: > Ok I understand most of the tips, but I have some doubts about the domain > name, I used the domain name "cluster.org" because every documentation > about DHCP had a domain name in the configuration so ... > Is it necesary to have a domain name server (like BIND) working together > with the dhcp server?????? We're getting to where I don't know the answers -- just try it with and without. At a guess, the answer is no, you don't need a domain name and if you use one it can likely be made up -- mine always have been, and IIRC I've used names that didn't correspond to anything in hosts and didn't even have an approved ending. If you do make one up I'd suggest you stay away from any name (unfortunately like cluster.org or cluster.net) that MIGHT be registered in nameservice so you can avoid any possibility of name resolution confusion in the future. You definitely don't need a nameserver -- my hosts are all on a private internal network anyway and not in nameservice. If you want them to resolve by name you have to ensure that they are resolvable one of the ways given for hosts in /etc/nsswitch.conf and the library calls will take care of the rest. > One more thing... > ? > I don?t have Internet in my LAN and I don?t know if is it necesary the > domain name????? > > Thanks a lot. I'm newbe and my english is not good =) Probably not. It depends on what services you want to run elsewhere. Mail servers/clients will likely get unhappy without some sort of domain name defined, maybe a few other things like this. It is also possible some distribution-installed tools (assuming in their preconfiguration that they are on an open LAN) will bitch or break if no domain name is defined -- I've not tried it so can't tell you. /etc/hosts based name resolution per se couldn't care less. Domain names are used primarily for routing or domain administration. The correspondance between a domain name and a subnet block or union of subnet blocks is often useful for both. 
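For concreteness, plain /etc/hosts resolution on a private net like this needs nothing more than consistent files of roughly the following shape on every machine (the names and addresses below are made up for illustration, and the made-up domain deliberately avoids anything that might be registered):

   # /etc/hosts -- identical copy on the server and every node
   127.0.0.1      localhost
   192.168.1.1    server.cluster.private   server
   192.168.1.2    nodo1.cluster.private    nodo1

   # /etc/nsswitch.conf -- consult the hosts file before (or instead of) DNS
   hosts: files dns
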
If you have a private network, no routing except between hosts on the same wire/switch, and no need to differentiate subnet blocks for administrative purposes you can probably live without. If you think that there is any reasonable chance that your cluster might one day end up on a public network it is reasonable to define one anyway. If any installed tools complain because there isn't one it is certainly harmless enough to define one. I generally do out of sheer habit and inertia even within my private lan at home. rgb > > Adri?n . > > ________________________________________________________________________________ > Chat with friends online, try MSN Messenger: Click Here > _______________________________________________ Beowulf mailing list, > Beowulf@beowulf.org To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From emiller at techskills.com Thu Apr 11 07:08:16 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:14 2009 Subject: (no subject) In-Reply-To: <20020411130008.63691.qmail@web9608.mail.yahoo.com> Message-ID: >hi >may i know if anyone here can provide detailed information on how to start a a beowulf? >i have about 9 machinese with Pentium MMX 233 Mhz processors, 128 mb RAM, 2 x 1.99 Gb. HDD >I am also considering various designs and this is a school project. Thanks for your help! Li, this group is not very "newbie freindly" when you ask for detailed information, I am working on a similar project and have only gotten spotty assistance. I will tell you that the Scyld system is by far the easiset to set up, and it works well. If you are familiar with Linux, you should have no problems getting a Scyld beowulf up and running, just be sure to read the docs first, they explain the NIC requirements on the master, and other important physical setup issues. After you get the network built, it is really a well-built distribution, with a GUI and all. See www.scyld.com I have tried others, but Scyld is far and away the best, with the most community support. After you get the cluster up and running, that's where the help seems to drift off. Most of the people in this group are upper-level users who know how to get these MPI enabled programs to run on thier clusters. If you are like me, these topics are a little foreign. If you are looking for something to run continuously, like a display, they say the MandelBrot renderer has a loop function, but I can't get it to work. Someone suggested SETI many months ago, which would be perfect, but SETI does not offer an MPI enabled program. Maybe you and I can work together, Ill help you get your cluster up and running, then together we can rattle our swords for some detailed assistance with the MPI programs (and programming). Contact me emiller@techskills.com, good luck! From pzb at datastacks.com Wed Apr 10 20:56:53 2002 From: pzb at datastacks.com (Peter Bowen) Date: Wed Nov 25 01:02:14 2009 Subject: Newest RPM's? 
In-Reply-To: <1017637861.1772.39.camel@loiosh> References: <1017612125.19271.20.camel@vhwalke.mathsci.usna.edu> <002c01c1d933$ec02a3c0$c31fa6ac@xp> <1017637861.1772.39.camel@loiosh> Message-ID: <1018497414.18187.3.camel@gargleblaster.caffeinexchange.org> On Mon, 2002-04-01 at 00:11, Sean DIlda wrote: > On Sun, 2002-03-31 at 23:15, Eric Miller wrote: > I must note, my above answer was given as if you were installing over > RH6.2 I do *NOT* recommend installing the binary rpms from a > RHL6.2-based Scyld Beowulf over a RHL7.2 system. This is by no means a > supported method. I don't know if anything will or won't break in doing > it, but I would assume that something will considering how much has > changed between RHL6.2 and RHL7.2 If you really want to try, you're > free to try, but if you want something right now, I'd suggesting going > to the RHL6.2 based setup. Are there beowulf packages available for RHL7.2? Thanks. Peter From Todd_Henderson at Raytheon.com Tue Apr 9 07:16:44 2002 From: Todd_Henderson at Raytheon.com (Todd Henderson) Date: Wed Nov 25 01:02:14 2009 Subject: NASTRAN on cluster Message-ID: <3CB2F7CC.82F09FF9@raytheon.com> We're in the process of starting the search for a cluster to replace our 35 XP1000's that come off lease in Sept. We currently use the cluster for CFD only, but we have been instructed that it would be beneficial to all to ensure that NASTRAN can use the new cluster. Therefore, I was wondering if anyone out there is running NASTRAN on a cluster? If so, what OS and cpu's are you using, and do you have any suggestions. thanks, Todd Henderson From jayne at sphynx.clara.co.uk Tue Apr 9 09:28:09 2002 From: jayne at sphynx.clara.co.uk (Jayne Heger) Date: Wed Nov 25 01:02:14 2009 Subject: Parallel povraying baby!!!! Message-ID: Right, I've now ran an parallel application on my Beowulf Cluster, and its working well! ;) When runnig pvmpov which is a parallel rendering farm application. I get these results when I render skyvase.pov, (a picture of a vase) 1 host = 7 mins, 11 seconds 2 hosts = 3min, 30 seconds 3 hosts = 2min 18 seconds One other machine to add yet though! These are all 486's This is my final year project at university What do you think??? kw1el huh???? Jayne From rickey-co at mug.biglobe.ne.jp Wed Apr 10 22:15:40 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Wed Nov 25 01:02:14 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: References: Message-ID: I think ... Quadrics is another one. Here's quick figures I have on hand.... RH7.2, 2.4.9 kernel for i860 cluster. On their site, they claim; after protocol, of 340Mbytes/second in each direction. The process-to-process latency for remote write operations is2us, and 5us for MPI messages. But pricing is MUCH higher than SCI/Myrinet. Best regards, At 4:08 +0200 5.04.2002, Steffen Persvold wrote: >On Thu, 4 Apr 2002, Jim Lux wrote: > >> What's high bandwidth? >> What's low latency? > > How much money do you want to spend? >I don't want to start a flamewar here, but I _think_ (not knowing real >numbers for other high speed interconnects) that SCI has atleast the >lowest latency and maybe also the highest point to point bandwidth : > >SCI application to application latency : 2.5 us >SCI application to application bandwidth : 325 MByte/sec > >Note that these numbers are very chipset specific (as most high speed >interconnect numbers are), these numbers are from IA64. 
Here are numbers >from a popular IA32 platform, the AMD 760MPX : > >SCI application to application latency : 1.8 us >SCI application to application bandwidth : 283 MByte/sec -- Best regards, Iwao Makino Hard Data Ltd. Tokyo branch mailto:iwao@harddata.com http://www.harddata.com/ --> Now Shipping 1U Dual Athlon DDR <- --> Ask me about the new Alpha DDR UP1500 Systems <- From jobriant at MPI-SoftTech.Com Thu Apr 11 08:20:00 2002 From: jobriant at MPI-SoftTech.Com (Jennifer O'Briant) Date: Wed Nov 25 01:02:14 2009 Subject: cluster of IBM Netfinity's Message-ID: I have a cluster of 10 IBM Netfinity's that I am upgrading with a 2nd PIII 700Mhz,type slot 1, processor. I am having a hard time finding a fan and heatsink that will fit in these 1U size servers. Does anyone have any ideas where I can find a side mount fan or side flow fan that will work? Jennifer O'Briant Associate Systems Administrator MPI Software Technology, Inc. From rgb at phy.duke.edu Wed Apr 10 06:31:06 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: <1018442200.3cb431d875b7d@mail1.nada.kth.se> Message-ID: On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Quoting "Robert G. Brown" : > Is there a convenient way to obtain static ip-addresses using dhcp without > having to explicitly write down the mac-addresses in dhcpd.conf? > > Regards, > > /jon Static? As in each machine gets a single IP number that remains its own "forever" through all reboots and which can be identified by a fixed name in host tables? Following the time-honored tradition of actually reading the man pages for dhcpd, we see that the answer is "sort of". As in in principle yes, but only in a wierd way and would you really want to? First of all, let us consider, how COULD it do this? All dhcp knows of a host is its mac address. System needs IP number. System broadcasts a DHCP request. What can the daemon do? It can assign the address out of a range without looking at the MAC address (beyond ensuring that isn't one that it recognizes already) or it can look at the MAC address, do a table lookup and find it in the table, and assign an IP address based on the table that maps MAC->IP. This is pretty much what actually happens, and of course the lookup table CAN ensure a static MAC->IP matchup. The only question is how the lookup table is constructed. The obvious way is by making explict per-host entries in the dhcpd.conf file. dhcpd reads the file and builds the table from what it finds there. You make the dhcpd.conf entries by hand or automagically by means of a clever script. In general this isn't a real problem. You have to make a per-host entry into e.g. /etc/hosts as well, or you won't know the NAME the host is going to have to correspond to the IP number the daemon happened to give it the first time it saw it. The same script can do both, given e.g. the MAC address and hostname you wish to assign as arguments. Now there is nothing to PREVENT the daemon from assigning IP numbers out of the free range, creating a MAC->IP mapping, and saving the mapping itself so that it is automagically reloaded after, say, a crash (which tends to wipe out the table it builds in memory. By strange chance, this is pretty much exactly what dhcpd does. It views IP's assigned out of a given subnet range as "leases", to be given to hosts for a certain amount of time and then recovered for reuse. It saves its current lease table in /var/lib/dhcp/dhcpd.leases. 
Periodically it goes through this table and "grooms" it, cleaning out expired leases so the IP numbers are reused. In many/most cases where range addresses are used, this is just fine. Remember, dhcp was "invented" at least in part to simplify address assignment to rooms full of PC's running WinXX, a well-known stupid operating system that wouldn't know what to do with a remote login attempt if it saw one. Heck, it doesn't know what to do with a LOCAL login a lot of the time. The IP<->name map is pretty unimportant in this case, because you tend never to address the system by its internet name. So it's no big deal to let IP addresses for dumb WinXX clients recycle. Of course this isn't always true even for WinXX, especially if XX is 2K or XP or NT. Sometimes systems people really like to know that log traces by IP number can be mapped into specific machines just so they can go around with a sucker rod (see "man syslogd" and do a search on "sucker") to administer correction, for example, even if they cannot remotely login to the host in question. dhcpd allows you to pretty much totally control the lease time used for any given subnet or range. You can set it from very short to "very large", probably 4 billion or so seconds, which is (practically) "infinity". Infinity would be your coveted static IP address assignment. Once again I'd argue that although you CAN do this, you probably don't want to in just about any unixoid context including LAN management and cluster engineering. There is something so satisfying, so USEFUL, about the hostname<->IP map, and in order for this map to correspond to some SPECIFIC box, you really are building the hostname<->IP<->MAC map, piecewise. And of course you need to leave the NIC's in the boxes, since yes the map follows the NIC and not the actual box. Although it likely isn't the "only" way to control the complete chain, simultaneously and explicity building /etc/hosts (or the NIS, LDAP, rsync exported versions thereof), the various hostname-related permissions (e.g. netgroups) and /etc/dhpcd.conf static entries is arguably the best way. To emphasize this last point, note that there is additional information that can be specified in the dhcp static table entries, such as the name of a per-host kickstart file to be used in installing it and more. dhcp is at least an approximation to a centralized configuration data server and can perform lots of useful services in this arena, not just handing out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's options can only be passed from it's own internal list, so one can't QUITE use it as a way of globally synchronizing whole tables of important data (like /etc/hosts,netgroups,passwd) across a subnet as systems automatically and periodically renew their leases. The list of options it supports as it stands now is quite large, though. I also don't know how susceptible it is to spoofing -- one problem with daemon-based services like this is that if they aren't uniquely bound at both ends to an authorized server and somebody puts a faster server on the same physical network, one can sometimes do something like dynamically change a systems "identity" in real time and gain access privileges you otherwise might not have had. Obviously, sending files like /etc/passwd around in this way would be a very dangerous thing to do unless the daemon were re-engineered to use something like ssl to simultaneously certify the server and encrypt the traffic. Hope this helps. 
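To make the two routes discussed earlier concrete -- the explicit per-host entry versus the near-infinite lease on a dynamic range -- here is roughly what they look like in dhcpd.conf. The MAC address, hostname and lease values are placeholders, and the real upper limit on a lease time is version dependent, so check the man page rather than trusting the number below:

   # 1. explicit static entry: this MAC always gets this IP and name
   host node01 {
      hardware ethernet 00:11:22:33:44:55;
      fixed-address 192.168.1.101;
      option host-name "node01";
   }

   # 2. "effectively static" dynamic range: leases so long they never expire
   subnet 192.168.1.0 netmask 255.255.255.0 {
      range 192.168.1.192 192.168.1.224;
      default-lease-time 2000000000;   # illustrative only
      max-lease-time 2000000000;
   }
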
BTW, in addition to the always useful man pages for dhcpd and dhcpd.conf (e.g.) you can and should look at the linux documentation project site and the various RFCs that specify dhcp's behavior and option spread. rgb From rgb at phy.duke.edu Wed Apr 10 09:03:42 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Very helpful! Thanks! > > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) > > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? Not a whole lot better. Since our installations tend to be O(10) systems at a time (10-30, not hundreds) and since we've gotten our local vendor to label each node with the MAC address before delivery (they've gotta boot up and burn in each node anyway) we just pop the nodes in a rack and use a script to insert a static entry for each one in an order that corresponds to rack order. After all, even though yes we label the nodes, it would be a bit silly to have g01 next to g22 next to g13 in rack order, and since we use the same dhcp server for nodes that we use for the general department, we cannot guarantee that some other host won't request and be granted a floating IP number that breaks the ordered sequence. The alternative (which would work fine for a cluster with a dedicated, in-the-local-isolated-net, and hence predictable dhcpd server) is to write the scriptset you describe, which we've actually considered doing. Boot the nodes in rack order, with floating addresses hopefully assigned in strict order from the address range, let them install themselves, and in the meantime write a script that parses e.g. /var/log/messages for the DHCP request and offer messages or /var/lib/dhpc/dhcpd.leases for the MAC and IP mapping and creates the required host and dhcpd.conf tables. We haven't gone this way partly out of laziness -- with tens of systems at a time to install it will only save work (relative to the time required to write the scripts) after we've used the scriptset for years -- and partly because to our direct observation at least one node install in twenty or thirty will screw up and occur in the wrong order. This, of course, will screw up EVERYTHING -- either one physically rearranges the rack or hand edits the tables, either of which costs one far more than the labor saved in the first place. There may be a better solution (probably smarter, more complex scripts that can perform e.g. node insert and delete operations and hence manage a reordering of the tables without having to hand edit everything) but more complex scripts require a signficant investment in time and one needs a very clear conceptualization of the design to have a good chance at ending up with something really usable. This in turn requires experience with the simpler scripts and a time living with their frustrations. 
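As a sketch of the simple-building-block end of that spectrum, something like the script below pulls every MAC address the server has leased out of /var/lib/dhcp/dhcpd.leases (the path mentioned earlier) and prints candidate host stanzas. The hostname scheme and address block are invented, and the output still has to be reviewed and put into rack order by hand -- which is exactly the hard part described above:

   #!/bin/sh
   # Turn leased MAC addresses into draft dhcpd.conf host entries.
   # Assumes the stock ISC lease file location; adjust names/addresses to taste.
   i=1
   grep 'hardware ethernet' /var/lib/dhcp/dhcpd.leases | awk '{print $3}' |
       tr -d ';' | sort -u |
   while read mac; do
       printf 'host node%02d {\n  hardware ethernet %s;\n  fixed-address 192.168.1.%d;\n  option host-name "node%02d";\n}\n' \
           "$i" "$mac" $((100 + i)) "$i"
       i=$((i + 1))
   done
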
We just don't have enough nodes to do all this except for the fun of it -- maybe a really big DOE site does but we don't. So we'll likely continue to use simple-building block scripts that require the entry of the MAC address and desired hostname/IP mapping as parameters (possibly augmented by a script that extracts MAC addresses from the log files, since even with help for the vendor we often have nodes or workstations to install with unknown MAC addresses and have to boot once, get the MAC address, and boot again to do the install). Not to beat dead horses or anything, but (IMHO) a lot of this management scriptset development is retarded by the fact that every single system tool has a configuration file with its own unique format and structure. I am well on the way to becoming downright religious about using xml as THE basis for the formatting of this sort of thing, at least where one can choose to do so in future applications. If dhcpd.conf and dhcpd.leases were written in an xml-compliant way, it would both make much better logical sense and it would be easier to both parse and write tools to manipulate them. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Apr 11 08:54:51 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:14 2009 Subject: Newest RPM's? In-Reply-To: <1018497414.18187.3.camel@gargleblaster.caffeinexchange.org> Message-ID: On 10 Apr 2002, Peter Bowen wrote: > Are there beowulf packages available for RHL7.2? Depends on what you mean. Some of the fundamental tools (PVM, flavors of MPI, more) are already packaged in and in 7.2. Ditto the full range of GPL compilers and programming support tools. Even commercial beowulf packages like scyld or a turnkey vendor's arrangement often use RH as a base, although they aren't always current with the very latest release. Pretty much all the truly open source beowulf tools either are available in rpm form that will install under 7.2 (or source rpm that will rebuild and install under 7.2) or at the very least and worst in a tarball form that will build and install under any unixoid/posix environment including 7.2. In fact, it is almost tautological that this would be so -- beowulf tools were mostly developed on linux/gnu boxes and RH is at heart a generic linux/gnu distribution. Commercial packages (e.g. portland or absoft compilers, PBS-Pro) can almost always be obtained in a form that runs under 7.2. The only exceptions are likely to be ones with library issues that haven't yet been ported. libc has from time to time changed enough to break things, so tools developed on e.g. RH 5.2 don't always work on 7.2 without some porting effort (but I know of no mainstream tools in this category). So I think the answer would have to be "yes";-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From josip at icase.edu Thu Apr 11 09:00:04 2002 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:02:14 2009 Subject: Naming etc. (Was: DHCP Help) References: Message-ID: <3CB5B304.49BCC793@icase.edu> "Robert G. Brown" wrote: > > [...] my hosts are all on a private > internal network anyway and not in nameservice. Good policy! 
Private hostnames/addresses should remain private because they are not guaranteed to be unique across the entire Internet. The DNS server should contain only registered hostnames/addresses. The head node of a cluster is typically multi-homed and its public interface should be DNS registered, but the internal private interface (and the client nodes on the internal private network) are best resolved via /etc/hosts, where internal domain name is determined from the FQDN form of the name. If /etc/hosts on client1 contains: 192.168.1.1 client1.internal.domain client1 then 'dnsdomainname' on client1 returns 'internal.domain' (clearly not found in any Internet registry). This would work fine internally, but NOT outside the cluster (e.g. sendmail may have problems, etc.). The /etc/hosts tables should be consistent across the cluster, even if there are reasons to play tricks. For example, one typically has all machines on a fast ethernet (FE) subnet (say 192.168.1.x) but a few may also have gigabit ethernet (GE) interfaces (say 192.168.2.x). Using IP level routing can result in complicated routing tables, because only specific FE hosts can also be reached via the GE interface. What about name level "routing"? While /etc/hosts can be used to make hostnames of GE machines resolve to GE addresses on GE machines but to their FE addresses on the FE-only machines, this can lead to problems with software packages which assume globally consistent hostname/address mapping. For example, grid software (Globus) needs a globally consistent FQDN/IP mapping. The grid machine name is the fully-qualified domain name or Internet name of a grid machine. It should be the name returned by the "gethostbyname()" function (from libc) and the primary name retrieved from DNS via nslookup. The primary name should correspond to the host's primary interface (if there is more than one) and be fully accessible across the grid. The grid could involve private addresses, but those are visible only WITHIN an organization because private addresses must not be routable outside an organization. This is a serious limitation -- so it is probably best to limit grids to publicly registered hosts only. Proxy processes on the head nodes to access internal machines may be needed. Most clusters are built around a private subnet, sometimes with IP masquerading enabled on the head node so that the internal clients can 'call out'. This still means that internal clients are not visible externally, i.e. one cannot 'call in' from the outside. As a consequence, parallel jobs which assume global TCP connectivity of all participating machines (e.g. MPICH-G2) will have problems in using two clusters (each with its own private internal subnet). At the moment, every node (that you wish to use in a MPICH-G2 job) must have a public IP address and must be fully accessible. To run jobs across several clusters with internal private networks, the MPI programmer would need to provide a proxy process on the head node to overcome this difficulty. In summary, naming is a simple concept but just under the surface is a can of worms created by established programming practices based on diverse assumptions. Multiply connected machines and/or public/private network mixtures need to be set up with great care. Tricky setups are fragile; simplicity and transparency works better. Sincerely, Josip -- Dr. 
Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From tegner at nada.kth.se Wed Apr 10 05:36:40 2002 From: tegner at nada.kth.se (tegner@nada.kth.se) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: References: Message-ID: <1018442200.3cb431d875b7d@mail1.nada.kth.se> Quoting "Robert G. Brown" : Is there a convenient way to obtain static ip-addresses using dhcp without having to explicitly write down the mac-addresses in dhcpd.conf? Regards, /jon > On Wed, 3 Apr 2002, Adrian Garcia Garcia wrote: > > For one thing don't use the range statement -- it tells dhcpd the range > of IP numbers to assign UNKNOWN ethernet numbers. You are statically > assigning an IP number in your "free" range to a particular host with a > KNOWN ethernet number below. I don't know what dhcpd would do in that > case -- something sensible one would hope but then, maybe not. The > range statement is really there so you can dynamically allocate > addresses from the range to hosts you may never have seen before that > you don't care to ever address by name (as they might well get a > different IP number on the next boot). > > DHCP servers run by ISP's not infrequently use the range feature to > conserve IP numbers -- they only need enough to cover the greatest > number of connections they are likely to have at any one time, not one > IP number per host that might ever connect. Departments might use it to > give IP numbers to laptops brought in by visitors (with the extra > benefit that they can assign a subnet block that isn't "trusted" by the > usual department servers and/or is firewalled from the outside by an > ip-forwarding/masquerading host). > > You want "only" static IP's in your cluster, as you'd like nodo1 to be > the same machine and IP address every time. > > Be a bit careful about your use of domain names. As it happens, I don't > find cluster.org registered yet (amazingly enough!) but it is pretty > easy to pick one that does exist in nameservice in the outside world. > In that case you'll run a serious risk of routing or name resolution > problems depending on things like the search order you use in > /etc/nsswitch.conf. Even my previous example of rgb.private.net is a > bit risky. > > You should run a nameserver (cache only is fine) on your 192.168.1.1 > server, presuming it lives on an external network and you care to > resolve global names. > > Similarly you may want: > > option routers 192.168.1.1; > > if you want internal hosts to be able to get out through your (presumed > gateway) server. > > Finally, if you want nodo1 to come up knowing its own name without > hardwiring it in on the node itself, add > > option host-name nodo1; > > to its definition. > > I admit that I do tend to lay out my dhcpd.conf a bit differently than > you have it below but I don't think that the differences are > particularly significant, and you have a copy of the one I use anyway if > you want to play with the pieces. You should find a log trace of > dhcpd's activities in /var/log/messages, which should help with any > further debugging. 
> > On your nodo1 host, make sure that: > > cat /etc/sysconfig/network-scripts/ifcfg-eth0 > DEVICE=eth0 > BOOTPROTO=dhcp > ONBOOT=yes > > and > > cat /etc/sysconfig/network > NETWORKING=yes > HOSTNAME=nodo1 > > and that in /etc/modules.conf there is something like: > > cat /etc/modules.conf > alias parport_lowlevel parport_pc > alias eth0 tulip > > (or instead of tulip, whatever your network module is). > > If you then boot your e.g. RH client it SHOULD just come up, > automatically try to start the network on device eth0 using dhcp as its > protocol for obtaining and IP number, ask the dhcp server for an address > and a route, and just "work" when they come back. > > Hope this helps. > > rgb > > > server-name "server.cluster.org" > > > > subnet 192.168.1.0 netmask 255.255.255.0 > > { > > range 192.168.1.2 192.168.1.10 #my client has the ip > > 192.168.1.2 > > #and > my > > server the static ip 192.168.1.1 > > option subnet-mask 255.255.255.0; > > option broadcast-address 192.168.1.255; > > option domain-name-server 192.168.1.1; > > option domain-name "cluster.org"; > > > > host nodo1.cluster.org > > { > > hardware ethernet 00:60:97:a1:ef:e0; #here is the address of the > > client's card > > fixed-address 192.168.1.2; > > } > > } > > > > And finally some files on my server. > > > > NETWORK > > ------------------------------------------ > > networking = yes > > hostname =server.cluster.org > > gatewaydev = eth0 > > gatewaye= > > ------------------------------------------ > > > > HOSTS ( In my server and in the client I have the same on this file ) > > ------------------------------------------ > > 127.0.0.1 localhost > > 192.168.1.1 server.cluster.org > > 192.168.1.2 nodo1.cluster.org > > > > > > Ok thats the information, I am a little confuse, could you help me > please > > =). I can?t detect the mistake, I dont know if is the server or some > card > > =s. Thanks for all. > > > > > ________________________________________________________________________________ > > Get your FREE download of MSN Explorer at http://explorer.msn.com. > > _______________________________________________ Beowulf mailing list, > > Beowulf@beowulf.org To change your subscription (digest mode or > > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From tegner at nada.kth.se Wed Apr 10 07:25:17 2002 From: tegner at nada.kth.se (tegner@nada.kth.se) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: References: Message-ID: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Very helpful! Thanks! But I'm still curious about how you make - automagically - the hardware ethernet line in dhcpd.conf initially. Say you have 100 machines. One way I would think of would be to use kickstart and: Install the machines and boot them up in sequence and using the range statement in dhcpd.conf (so that the first machine gets 192.168.1.101, the second 192.168.1.102 ...) Once all nodes are up use some script to extract the mac addresses for all the nodes and either modify dhcpd.conf - or - discard of dhcp completely and hardwire the ip-addresses to each node. 
But I'm sure there are better ways to do this? Thanks again, /jon Quoting "Robert G. Brown" : > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > Quoting "Robert G. Brown" : > > Is there a convenient way to obtain static ip-addresses using dhcp > without > > having to explicitly write down the mac-addresses in dhcpd.conf? > > > > Regards, > > > > /jon > > Static? As in each machine gets a single IP number that remains its own > "forever" through all reboots and which can be identified by a fixed > name in host tables? > > Following the time-honored tradition of actually reading the man pages > for dhcpd, we see that the answer is "sort of". As in in principle yes, > but only in a wierd way and would you really want to? > > First of all, let us consider, how COULD it do this? All dhcp knows of > a host is its mac address. System needs IP number. System broadcasts a > DHCP request. What can the daemon do? > > It can assign the address out of a range without looking at the MAC > address (beyond ensuring that isn't one that it recognizes already) or > it can look at the MAC address, do a table lookup and find it in the > table, and assign an IP address based on the table that maps MAC->IP. > This is pretty much what actually happens, and of course the lookup > table CAN ensure a static MAC->IP matchup. > > The only question is how the lookup table is constructed. > > The obvious way is by making explict per-host entries in the dhcpd.conf > file. dhcpd reads the file and builds the table from what it finds > there. You make the dhcpd.conf entries by hand or automagically by > means of a clever script. In general this isn't a real problem. You > have to make a per-host entry into e.g. /etc/hosts as well, or you won't > know the NAME the host is going to have to correspond to the IP number > the daemon happened to give it the first time it saw it. The same > script can do both, given e.g. the MAC address and hostname you wish to > assign as arguments. > > Now there is nothing to PREVENT the daemon from assigning IP numbers out > of the free range, creating a MAC->IP mapping, and saving the mapping > itself so that it is automagically reloaded after, say, a crash (which > tends to wipe out the table it builds in memory. By strange chance, > this is pretty much exactly what dhcpd does. It views IP's assigned out > of a given subnet range as "leases", to be given to hosts for a certain > amount of time and then recovered for reuse. It saves its current lease > table in /var/lib/dhcp/dhcpd.leases. Periodically it goes through this > table and "grooms" it, cleaning out expired leases so the IP numbers are > reused. In many/most cases where range addresses are used, this is just > fine. Remember, dhcp was "invented" at least in part to simplify > address assignment to rooms full of PC's running WinXX, a well-known > stupid operating system that wouldn't know what to do with a remote > login attempt if it saw one. Heck, it doesn't know what to do with a > LOCAL login a lot of the time. The IP<->name map is pretty unimportant > in this case, because you tend never to address the system by its > internet name. So it's no big deal to let IP addresses for dumb WinXX > clients recycle. > > Of course this isn't always true even for WinXX, especially if XX is 2K > or XP or NT. 
Sometimes systems people really like to know that log > traces by IP number can be mapped into specific machines just so they > can go around with a sucker rod (see "man syslogd" and do a search on > "sucker") to administer correction, for example, even if they cannot > remotely login to the host in question. > > dhcpd allows you to pretty much totally control the lease time used for > any given subnet or range. You can set it from very short to "very > large", probably 4 billion or so seconds, which is (practically) > "infinity". Infinity would be your coveted static IP address > assignment. > > Once again I'd argue that although you CAN do this, you probably don't > want to in just about any unixoid context including LAN management and > cluster engineering. There is something so satisfying, so USEFUL, about > the hostname<->IP map, and in order for this map to correspond to some > SPECIFIC box, you really are building the hostname<->IP<->MAC map, > piecewise. And of course you need to leave the NIC's in the boxes, > since yes the map follows the NIC and not the actual box. Although it > likely isn't the "only" way to control the complete chain, > simultaneously and explicity building /etc/hosts (or the NIS, LDAP, > rsync exported versions thereof), the various hostname-related > permissions (e.g. netgroups) and /etc/dhpcd.conf static entries is > arguably the best way. > > To emphasize this last point, note that there is additional information > that can be specified in the dhcp static table entries, such as the name > of a per-host kickstart file to be used in installing it and more. dhcp > is at least an approximation to a centralized configuration data server > and can perform lots of useful services in this arena, not just handing > out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's > options can only be passed from it's own internal list, so one can't > QUITE use it as a way of globally synchronizing whole tables of > important data (like /etc/hosts,netgroups,passwd) across a subnet as > systems automatically and periodically renew their leases. The list of > options it supports as it stands now is quite large, though. > > I also don't know how susceptible it is to spoofing -- one problem with > daemon-based services like this is that if they aren't uniquely bound at > both ends to an authorized server and somebody puts a faster server on > the same physical network, one can sometimes do something like > dynamically change a systems "identity" in real time and gain access > privileges you otherwise might not have had. Obviously, sending files > like /etc/passwd around in this way would be a very dangerous thing to > do unless the daemon were re-engineered to use something like ssl to > simultaneously certify the server and encrypt the traffic. > > Hope this helps. BTW, in addition to the always useful man pages for > dhcpd and dhcpd.conf (e.g.) you can and should look at the linux > documentation project site and the various RFCs that specify dhcp's > behavior and option spread. > > rgb > > From joelja at darkwing.uoregon.edu Thu Apr 11 09:27:47 2002 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: <1018442200.3cb431d875b7d@mail1.nada.kth.se> Message-ID: You have to have a host-specific value to key on... that would be the mac address... 
you can approach the problem a different way (dynamic dns) so that the machine get the same hostname regardless of what ip they get but that's more trouble than it's worth for a cluster... On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Quoting "Robert G. Brown" : > Is there a convenient way to obtain static ip-addresses using dhcp without > having to explicitly write down the mac-addresses in dhcpd.conf? > > Regards, > > /jon > > > > > On Wed, 3 Apr 2002, Adrian Garcia Garcia wrote: > > > > For one thing don't use the range statement -- it tells dhcpd the range > > of IP numbers to assign UNKNOWN ethernet numbers. You are statically > > assigning an IP number in your "free" range to a particular host with a > > KNOWN ethernet number below. I don't know what dhcpd would do in that > > case -- something sensible one would hope but then, maybe not. The > > range statement is really there so you can dynamically allocate > > addresses from the range to hosts you may never have seen before that > > you don't care to ever address by name (as they might well get a > > different IP number on the next boot). > > > > DHCP servers run by ISP's not infrequently use the range feature to > > conserve IP numbers -- they only need enough to cover the greatest > > number of connections they are likely to have at any one time, not one > > IP number per host that might ever connect. Departments might use it to > > give IP numbers to laptops brought in by visitors (with the extra > > benefit that they can assign a subnet block that isn't "trusted" by the > > usual department servers and/or is firewalled from the outside by an > > ip-forwarding/masquerading host). > > > > You want "only" static IP's in your cluster, as you'd like nodo1 to be > > the same machine and IP address every time. > > > > Be a bit careful about your use of domain names. As it happens, I don't > > find cluster.org registered yet (amazingly enough!) but it is pretty > > easy to pick one that does exist in nameservice in the outside world. > > In that case you'll run a serious risk of routing or name resolution > > problems depending on things like the search order you use in > > /etc/nsswitch.conf. Even my previous example of rgb.private.net is a > > bit risky. > > > > You should run a nameserver (cache only is fine) on your 192.168.1.1 > > server, presuming it lives on an external network and you care to > > resolve global names. > > > > Similarly you may want: > > > > option routers 192.168.1.1; > > > > if you want internal hosts to be able to get out through your (presumed > > gateway) server. > > > > Finally, if you want nodo1 to come up knowing its own name without > > hardwiring it in on the node itself, add > > > > option host-name nodo1; > > > > to its definition. > > > > I admit that I do tend to lay out my dhcpd.conf a bit differently than > > you have it below but I don't think that the differences are > > particularly significant, and you have a copy of the one I use anyway if > > you want to play with the pieces. You should find a log trace of > > dhcpd's activities in /var/log/messages, which should help with any > > further debugging. 
> > > > On your nodo1 host, make sure that: > > > > cat /etc/sysconfig/network-scripts/ifcfg-eth0 > > DEVICE=eth0 > > BOOTPROTO=dhcp > > ONBOOT=yes > > > > and > > > > cat /etc/sysconfig/network > > NETWORKING=yes > > HOSTNAME=nodo1 > > > > and that in /etc/modules.conf there is something like: > > > > cat /etc/modules.conf > > alias parport_lowlevel parport_pc > > alias eth0 tulip > > > > (or instead of tulip, whatever your network module is). > > > > If you then boot your e.g. RH client it SHOULD just come up, > > automatically try to start the network on device eth0 using dhcp as its > > protocol for obtaining and IP number, ask the dhcp server for an address > > and a route, and just "work" when they come back. > > > > Hope this helps. > > > > rgb > > > > > server-name "server.cluster.org" > > > > > > subnet 192.168.1.0 netmask 255.255.255.0 > > > { > > > range 192.168.1.2 192.168.1.10 #my client has the ip > > > 192.168.1.2 > > > #and > > my > > > server the static ip 192.168.1.1 > > > option subnet-mask 255.255.255.0; > > > option broadcast-address 192.168.1.255; > > > option domain-name-server 192.168.1.1; > > > option domain-name "cluster.org"; > > > > > > host nodo1.cluster.org > > > { > > > hardware ethernet 00:60:97:a1:ef:e0; #here is the address of the > > > client's card > > > fixed-address 192.168.1.2; > > > } > > > } > > > > > > And finally some files on my server. > > > > > > NETWORK > > > ------------------------------------------ > > > networking = yes > > > hostname =server.cluster.org > > > gatewaydev = eth0 > > > gatewaye= > > > ------------------------------------------ > > > > > > HOSTS ( In my server and in the client I have the same on this file ) > > > ------------------------------------------ > > > 127.0.0.1 localhost > > > 192.168.1.1 server.cluster.org > > > 192.168.1.2 nodo1.cluster.org > > > > > > > > > Ok thats the information, I am a little confuse, could you help me > > please > > > =). I can?t detect the mistake, I dont know if is the server or some > > card > > > =s. Thanks for all. > > > > > > > > ________________________________________________________________________________ > > > Get your FREE download of MSN Explorer at http://explorer.msn.com. > > > _______________________________________________ Beowulf mailing list, > > > Beowulf@beowulf.org To change your subscription (digest mode or > > > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > -- > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > > Duke University Dept. of Physics, Box 90305 > > Durham, N.C. 27708-0305 > > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli Academic User Services joelja@darkwing.uoregon.edu -- PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -- The accumulation of all powers, legislative, executive, and judiciary, in the same hands, whether of one, a few, or many, and whether hereditary, selfappointed, or elective, may justly be pronounced the very definition of tyranny. 
- James Madison, Federalist Papers 47 - Feb 1, 1788 From rgb at phy.duke.edu Thu Apr 11 10:17:51 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Very helpful! Thanks! > > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) > > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? Not that I know of. Maybe somebody else knows of one. I'd just use perl or bash (either would probably work, although parsing is generally easier in perl), parse e.g. Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 from /var/log/messages on the dhcp server, and write an output routine to generate # golem (Linux/Windows laptop lilith, second/100BT interface) host golem { hardware ethernet 00:20:e0:6d:a0:05; fixed-address 192.168.1.140; next-server 192.168.1.131; option routers 192.168.1.1; option domain-name "rgb.private.net"; option host-name "golem"; } and 192.168.1.140 golem.rgb.private.net golem and append them to /etc/dhcpd.conf and /etc/hosts respectively, and then distribute copies of the resulting /etc/hosts -- as Josip made eloquently clear your private internal network should resolve consistently on all PIN hosts and probably should have SOME sort of domainname defined so that software the might include a getdomainbyname() call and might not include an adequate check and handle of a null value can cope. It's hard to know what assumptions were made by the designer of every single piece of network software you might want to run... Of coures you'll probably want to do the b01, b02, b03... hostname iteration -- I'm just pulling an example at random out of my own log tables. rgb > > Thanks again, > > /jon > > Quoting "Robert G. Brown" : > > > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > > > Quoting "Robert G. Brown" : > > > Is there a convenient way to obtain static ip-addresses using dhcp > > without > > > having to explicitly write down the mac-addresses in dhcpd.conf? > > > > > > Regards, > > > > > > /jon > > > > Static? As in each machine gets a single IP number that remains its own > > "forever" through all reboots and which can be identified by a fixed > > name in host tables? > > > > Following the time-honored tradition of actually reading the man pages > > for dhcpd, we see that the answer is "sort of". As in in principle yes, > > but only in a wierd way and would you really want to? > > > > First of all, let us consider, how COULD it do this? All dhcp knows of > > a host is its mac address. System needs IP number. System broadcasts a > > DHCP request. What can the daemon do? 
> > > > It can assign the address out of a range without looking at the MAC > > address (beyond ensuring that isn't one that it recognizes already) or > > it can look at the MAC address, do a table lookup and find it in the > > table, and assign an IP address based on the table that maps MAC->IP. > > This is pretty much what actually happens, and of course the lookup > > table CAN ensure a static MAC->IP matchup. > > > > The only question is how the lookup table is constructed. > > > > The obvious way is by making explict per-host entries in the dhcpd.conf > > file. dhcpd reads the file and builds the table from what it finds > > there. You make the dhcpd.conf entries by hand or automagically by > > means of a clever script. In general this isn't a real problem. You > > have to make a per-host entry into e.g. /etc/hosts as well, or you won't > > know the NAME the host is going to have to correspond to the IP number > > the daemon happened to give it the first time it saw it. The same > > script can do both, given e.g. the MAC address and hostname you wish to > > assign as arguments. > > > > Now there is nothing to PREVENT the daemon from assigning IP numbers out > > of the free range, creating a MAC->IP mapping, and saving the mapping > > itself so that it is automagically reloaded after, say, a crash (which > > tends to wipe out the table it builds in memory. By strange chance, > > this is pretty much exactly what dhcpd does. It views IP's assigned out > > of a given subnet range as "leases", to be given to hosts for a certain > > amount of time and then recovered for reuse. It saves its current lease > > table in /var/lib/dhcp/dhcpd.leases. Periodically it goes through this > > table and "grooms" it, cleaning out expired leases so the IP numbers are > > reused. In many/most cases where range addresses are used, this is just > > fine. Remember, dhcp was "invented" at least in part to simplify > > address assignment to rooms full of PC's running WinXX, a well-known > > stupid operating system that wouldn't know what to do with a remote > > login attempt if it saw one. Heck, it doesn't know what to do with a > > LOCAL login a lot of the time. The IP<->name map is pretty unimportant > > in this case, because you tend never to address the system by its > > internet name. So it's no big deal to let IP addresses for dumb WinXX > > clients recycle. > > > > Of course this isn't always true even for WinXX, especially if XX is 2K > > or XP or NT. Sometimes systems people really like to know that log > > traces by IP number can be mapped into specific machines just so they > > can go around with a sucker rod (see "man syslogd" and do a search on > > "sucker") to administer correction, for example, even if they cannot > > remotely login to the host in question. > > > > dhcpd allows you to pretty much totally control the lease time used for > > any given subnet or range. You can set it from very short to "very > > large", probably 4 billion or so seconds, which is (practically) > > "infinity". Infinity would be your coveted static IP address > > assignment. > > > > Once again I'd argue that although you CAN do this, you probably don't > > want to in just about any unixoid context including LAN management and > > cluster engineering. There is something so satisfying, so USEFUL, about > > the hostname<->IP map, and in order for this map to correspond to some > > SPECIFIC box, you really are building the hostname<->IP<->MAC map, > > piecewise. 
And of course you need to leave the NIC's in the boxes, > > since yes the map follows the NIC and not the actual box. Although it > > likely isn't the "only" way to control the complete chain, > > simultaneously and explicity building /etc/hosts (or the NIS, LDAP, > > rsync exported versions thereof), the various hostname-related > > permissions (e.g. netgroups) and /etc/dhpcd.conf static entries is > > arguably the best way. > > > > To emphasize this last point, note that there is additional information > > that can be specified in the dhcp static table entries, such as the name > > of a per-host kickstart file to be used in installing it and more. dhcp > > is at least an approximation to a centralized configuration data server > > and can perform lots of useful services in this arena, not just handing > > out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's > > options can only be passed from it's own internal list, so one can't > > QUITE use it as a way of globally synchronizing whole tables of > > important data (like /etc/hosts,netgroups,passwd) across a subnet as > > systems automatically and periodically renew their leases. The list of > > options it supports as it stands now is quite large, though. > > > > I also don't know how susceptible it is to spoofing -- one problem with > > daemon-based services like this is that if they aren't uniquely bound at > > both ends to an authorized server and somebody puts a faster server on > > the same physical network, one can sometimes do something like > > dynamically change a systems "identity" in real time and gain access > > privileges you otherwise might not have had. Obviously, sending files > > like /etc/passwd around in this way would be a very dangerous thing to > > do unless the daemon were re-engineered to use something like ssl to > > simultaneously certify the server and encrypt the traffic. > > > > Hope this helps. BTW, in addition to the always useful man pages for > > dhcpd and dhcpd.conf (e.g.) you can and should look at the linux > > documentation project site and the various RFCs that specify dhcp's > > behavior and option spread. > > > > rgb > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From shaeffer at neuralscape.com Thu Apr 11 03:01:55 2002 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: ; from rgb@phy.duke.edu on Wed, Apr 10, 2002 at 09:31:06AM -0400 References: <1018442200.3cb431d875b7d@mail1.nada.kth.se> Message-ID: <20020411030155.A30307@synapse.neuralscape.com> On Wed, Apr 10, 2002 at 09:31:06AM -0400, Robert G. Brown wrote: > > Hope this helps. BTW, in addition to the always useful man pages for > dhcpd and dhcpd.conf (e.g.) you can and should look at the linux > documentation project site and the various RFCs that specify dhcp's > behavior and option spread. http://www.amazon.com/exec/obidos/search-handle-form/ref=s_sf_b_as/002-5123550-8208810 Is a reasonably well done book that folks interested in DHCP might consider acquiring. It provides a comprehensive overview of the subject. cheers, Karen -- Karen Shaeffer Neuralscape; Santa Cruz, Ca. 
95060 shaeffer@neuralscape.com http://www.neuralscape.com From roger at ERC.MsState.Edu Thu Apr 11 11:00:04 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > Very helpful! Thanks! > > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) > > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? That's exactly how I do it. Then, in the Kickstart configuration script, I have the node configure itself not to use DHCP anymore. It is a bit cumbersome when new nodes are added, but since the nodes that I will be installing in two weeks are the product of a purchase cycle that started in February, I don't have to worry about doing it too often. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From roger at ERC.MsState.Edu Thu Apr 11 11:02:46 2002 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Robert G. Brown wrote: > > But I'm sure there are better ways to do this? > > Not that I know of. Maybe somebody else knows of one. I'd just use > perl or bash (either would probably work, although parsing is generally > easier in perl), parse e.g. > > Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 > Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 > > from /var/log/messages on the dhcp server, and write an output routine > to generate It's actually easier to grab them out of /var/lib/dhcpd.leases, since some of the information that you're looking for is already in the format that you need it. _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From christon at pluto.dsu.edu Thu Apr 11 11:09:59 2002 From: christon at pluto.dsu.edu (Christoffersen, Neils) Date: Wed Nov 25 01:02:14 2009 Subject: scyld slave node problems Message-ID: <0718ABB23368D2119FC200008362AF6816BDDA@pluto.dsu.edu> Hello all, I'm setting up a small cluster for my university using the Scyld distro. The master is up and running and now I'm trying to get the nodes to operate. However, the node I'm currently working on is having some difficulties. It seems to be communicating with the master just fine, but when copying the libraries from the master it starts spitting out "try_do_free_pages failed for init" and similar messages. 
It seems to me that maybe the hard drive is not being recognized and it's trying to run everything on ram and just running out of memory. Does anyone know what could be causing this? I have the node log which I can attach if you wish (I just don't have it with me at the moment). Thanks for any help you can lend. Sincerely Neils Christoffersen From canon at nersc.gov Thu Apr 11 11:16:25 2002 From: canon at nersc.gov (canon@nersc.gov) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: Message from tegner@nada.kth.se of "Wed, 10 Apr 2002 16:25:17 +0200." <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: <200204111816.g3BIGPw13292@pookie.nersc.gov> Jon, We install our machines in pretty much this fashion. I wrote a script that yanks out the mac address and builds a dhcp entry that I append to the dhcpd.conf file. Its not the most elegant solution but it works. Also, I think NPACI/ROCKS includes some utilities to stream-line this process. --Shane Canon From RSchilling at affiliatedhealth.org Thu Apr 11 10:59:55 2002 From: RSchilling at affiliatedhealth.org (Schilling, Richard) Date: Wed Nov 25 01:02:14 2009 Subject: Parallel povraying baby!!!! Message-ID: <51FCCCF0C130D211BE550008C724149E01165AF7@mail1.affiliatedhealth.org> Very nice results! Would you be willing to discuss or document the steps you took to get set up? Thanks! --Richard Schiling -----Original Message----- From: Jayne Heger To: Davis, Robin J.; Penfold, Brian; webmaster@wisewolf.com; Clever TW; Roy Gudz; Stephen.Cooke@severntrent.co.uk; Symon Cook; Tasneem Sharif; beowulf-newbie@fecundswamp.net; beowulf@beowulf.org; chris Sent: 9/04/02 17:28 Subject: Parallel povraying baby!!!! Right, I've now ran an parallel application on my Beowulf Cluster, and its working well! ;) When runnig pvmpov which is a parallel rendering farm application. I get these results when I render skyvase.pov, (a picture of a vase) 1 host = 7 mins, 11 seconds 2 hosts = 3min, 30 seconds 3 hosts = 2min 18 seconds One other machine to add yet though! These are all 486's This is my final year project at university What do you think??? kw1el huh???? Jayne _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20020411/787c81d7/attachment.html From siegert at sfu.ca Thu Apr 11 11:34:38 2002 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se>; from tegner@nada.kth.se on Wed, Apr 10, 2002 at 04:25:17PM +0200 References: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: <20020411113438.C20302@stikine.ucs.sfu.ca> On Wed, Apr 10, 2002 at 04:25:17PM +0200, tegner@nada.kth.se wrote: > Very helpful! Thanks! > > But I'm still curious about how you make - automagically - the hardware ethernet > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > of would be to use kickstart and: > > Install the machines and boot them up in sequence and using the range statement > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > 192.168.1.102 ...) 
> > Once all nodes are up use some script to extract the mac addresses for all the > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > hardwire the ip-addresses to each node. > > But I'm sure there are better ways to do this? If you want to use static ip addresses anyway (as I do), why do you use dhcp at all? I use a kickstart file with something like network --bootproto static --device eth3 --ip 172.17.254.1 --netmask 255.255.0.0 --gateway 172.17.0.1 --hostname ks1 --nameserver 172.17.0.1 and have on the master node a set of ip addresses reserved for kickstart installations: 172.17.254.1 ks1 172.17.254.2 ks2 172.17.254.3 ks3 172.17.254.4 ks4 172.17.254.5 ks5 In the %post section of the kickstart file I then run a script that increases a counter on the master node, returns that counter as the real ip address of the new node, and updates the /etc/hosts file on all other nodes. I have installed my cluster (96 nodes) that way all by myself without any (big) problems ... maybe I just was too lazy to learn how to deal with dhcp. Cheers, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From joachim at lfbs.RWTH-Aachen.DE Thu Apr 11 11:46:28 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Wed Nov 25 01:02:14 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> ...Iwao Makino wrote: > > I think ... Quadrics is another one. [...] > But pricing is MUCH higher than SCI/Myrinet. Do you have any pricing information at all? AFAIK, they are only distribute with Compaq clusters. Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From emiller at techskills.com Thu Apr 11 11:52:06 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:14 2009 Subject: scyld slave node problems In-Reply-To: <0718ABB23368D2119FC200008362AF6816BDDA@pluto.dsu.edu> Message-ID: >I'm setting up a small cluster for my university using the Scyld distro. >The master is up and running and now I'm trying to get the nodes to operate. >However, the node I'm currently working on is having some difficulties. It >seems to be communicating with the master just fine, but when copying the >libraries from the master it starts spitting out "try_do_free_pages failed >for init" and similar messages. It seems to me that maybe the hard drive is There might be a more technical solution, but I had the same problem and was able to solve it by booting that node diskless. Just disconnect the hard drive, and re-boot with the boot disk. Like I said, there may be a better way or technical solution, but ".....free_pages" led me to believe it was a HDD problem. I booted diskless, and had no problems. From rok at ucsd.edu Thu Apr 11 11:31:23 2002 From: rok at ucsd.edu (Robert Konecny) Date: Wed Nov 25 01:02:14 2009 Subject: DHCP Help Again In-Reply-To: ; from rgb@phy.duke.edu on Thu, Apr 11, 2002 at 01:17:51PM -0400 References: <1018448717.3cb44b4d7ee2a@mail1.nada.kth.se> Message-ID: <20020411113123.B26495@ucsd.edu> that's pretty much how insert-ethers from Rocks clustering software works (rocks.npaci.edu). 
You fire it up on frontend and it starts parsing /var/log/messages in real time. Then you kick start a node and when insert-ethers sees a request for a lease with unknown MAC it updates Rocks MySQL database, generates new dhcpd.conf and restarts dhcpd. Works like charm. robert On Thu, Apr 11, 2002 at 01:17:51PM -0400, Robert G. Brown wrote: > > Not that I know of. Maybe somebody else knows of one. I'd just use > perl or bash (either would probably work, although parsing is generally > easier in perl), parse e.g. > > Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 > Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 > > from /var/log/messages on the dhcp server, and write an output routine > to generate > > # golem (Linux/Windows laptop lilith, second/100BT interface) > host golem { > hardware ethernet 00:20:e0:6d:a0:05; > fixed-address 192.168.1.140; > next-server 192.168.1.131; > option routers 192.168.1.1; > option domain-name "rgb.private.net"; > option host-name "golem"; > } > > and > > 192.168.1.140 golem.rgb.private.net golem > > and append them to /etc/dhcpd.conf and /etc/hosts respectively, and then > distribute copies of the resulting /etc/hosts -- as Josip made > eloquently clear your private internal network should resolve > consistently on all PIN hosts and probably should have SOME sort of > domainname defined so that software the might include a > getdomainbyname() call and might not include an adequate check and > handle of a null value can cope. It's hard to know what assumptions > were made by the designer of every single piece of network software you > might want to run... > > Of coures you'll probably want to do the b01, b02, b03... hostname > iteration -- I'm just pulling an example at random out of my own log > tables. > > rgb > > > > > Thanks again, > > > > /jon > > > > Quoting "Robert G. Brown" : > > > > > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > > > > > Quoting "Robert G. Brown" : > > > > Is there a convenient way to obtain static ip-addresses using dhcp > > > without > > > > having to explicitly write down the mac-addresses in dhcpd.conf? > > > > > > > > Regards, > > > > > > > > /jon > > > > > > Static? As in each machine gets a single IP number that remains its own > > > "forever" through all reboots and which can be identified by a fixed > > > name in host tables? > > > > > > Following the time-honored tradition of actually reading the man pages > > > for dhcpd, we see that the answer is "sort of". As in in principle yes, > > > but only in a wierd way and would you really want to? > > > > > > First of all, let us consider, how COULD it do this? All dhcp knows of > > > a host is its mac address. System needs IP number. System broadcasts a > > > DHCP request. What can the daemon do? > > > > > > It can assign the address out of a range without looking at the MAC > > > address (beyond ensuring that isn't one that it recognizes already) or > > > it can look at the MAC address, do a table lookup and find it in the > > > table, and assign an IP address based on the table that maps MAC->IP. > > > This is pretty much what actually happens, and of course the lookup > > > table CAN ensure a static MAC->IP matchup. > > > > > > The only question is how the lookup table is constructed. > > > > > > The obvious way is by making explict per-host entries in the dhcpd.conf > > > file. dhcpd reads the file and builds the table from what it finds > > > there. 
You make the dhcpd.conf entries by hand or automagically by > > > means of a clever script. In general this isn't a real problem. You > > > have to make a per-host entry into e.g. /etc/hosts as well, or you won't > > > know the NAME the host is going to have to correspond to the IP number > > > the daemon happened to give it the first time it saw it. The same > > > script can do both, given e.g. the MAC address and hostname you wish to > > > assign as arguments. > > > > > > Now there is nothing to PREVENT the daemon from assigning IP numbers out > > > of the free range, creating a MAC->IP mapping, and saving the mapping > > > itself so that it is automagically reloaded after, say, a crash (which > > > tends to wipe out the table it builds in memory. By strange chance, > > > this is pretty much exactly what dhcpd does. It views IP's assigned out > > > of a given subnet range as "leases", to be given to hosts for a certain > > > amount of time and then recovered for reuse. It saves its current lease > > > table in /var/lib/dhcp/dhcpd.leases. Periodically it goes through this > > > table and "grooms" it, cleaning out expired leases so the IP numbers are > > > reused. In many/most cases where range addresses are used, this is just > > > fine. Remember, dhcp was "invented" at least in part to simplify > > > address assignment to rooms full of PC's running WinXX, a well-known > > > stupid operating system that wouldn't know what to do with a remote > > > login attempt if it saw one. Heck, it doesn't know what to do with a > > > LOCAL login a lot of the time. The IP<->name map is pretty unimportant > > > in this case, because you tend never to address the system by its > > > internet name. So it's no big deal to let IP addresses for dumb WinXX > > > clients recycle. > > > > > > Of course this isn't always true even for WinXX, especially if XX is 2K > > > or XP or NT. Sometimes systems people really like to know that log > > > traces by IP number can be mapped into specific machines just so they > > > can go around with a sucker rod (see "man syslogd" and do a search on > > > "sucker") to administer correction, for example, even if they cannot > > > remotely login to the host in question. > > > > > > dhcpd allows you to pretty much totally control the lease time used for > > > any given subnet or range. You can set it from very short to "very > > > large", probably 4 billion or so seconds, which is (practically) > > > "infinity". Infinity would be your coveted static IP address > > > assignment. > > > > > > Once again I'd argue that although you CAN do this, you probably don't > > > want to in just about any unixoid context including LAN management and > > > cluster engineering. There is something so satisfying, so USEFUL, about > > > the hostname<->IP map, and in order for this map to correspond to some > > > SPECIFIC box, you really are building the hostname<->IP<->MAC map, > > > piecewise. And of course you need to leave the NIC's in the boxes, > > > since yes the map follows the NIC and not the actual box. Although it > > > likely isn't the "only" way to control the complete chain, > > > simultaneously and explicity building /etc/hosts (or the NIS, LDAP, > > > rsync exported versions thereof), the various hostname-related > > > permissions (e.g. netgroups) and /etc/dhpcd.conf static entries is > > > arguably the best way. 
> > > > > > To emphasize this last point, note that there is additional information > > > that can be specified in the dhcp static table entries, such as the name > > > of a per-host kickstart file to be used in installing it and more. dhcp > > > is at least an approximation to a centralized configuration data server > > > and can perform lots of useful services in this arena, not just handing > > > out IP addresses. Unfortunately (perhaps? as far as I know?) dhcp's > > > options can only be passed from it's own internal list, so one can't > > > QUITE use it as a way of globally synchronizing whole tables of > > > important data (like /etc/hosts,netgroups,passwd) across a subnet as > > > systems automatically and periodically renew their leases. The list of > > > options it supports as it stands now is quite large, though. > > > > > > I also don't know how susceptible it is to spoofing -- one problem with > > > daemon-based services like this is that if they aren't uniquely bound at > > > both ends to an authorized server and somebody puts a faster server on > > > the same physical network, one can sometimes do something like > > > dynamically change a systems "identity" in real time and gain access > > > privileges you otherwise might not have had. Obviously, sending files > > > like /etc/passwd around in this way would be a very dangerous thing to > > > do unless the daemon were re-engineered to use something like ssl to > > > simultaneously certify the server and encrypt the traffic. > > > > > > Hope this helps. BTW, in addition to the always useful man pages for > > > dhcpd and dhcpd.conf (e.g.) you can and should look at the linux > > > documentation project site and the various RFCs that specify dhcp's > > > behavior and option spread. > > > > > > rgb > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lightdee at netscape.net Thu Apr 11 11:34:33 2002 From: lightdee at netscape.net (lightdee@netscape.net) Date: Wed Nov 25 01:02:14 2009 Subject: How do you keep clusters running.... Message-ID: <1D889E10.452B89F1.009FF3AE@netscape.net> Doug J Nordwall wrote: >On Wed, 2002-04-03 at 13:04, Cris Rhea wrote: > > What are folks doing about keeping hardware running on large clusters? > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 >nodes)... > > Sure seems like every week or two, I notice dead fans (each RS-1200 > has 6 case fans in addition to the 2 CPU fans and 2 power supply > fans). > > >You running lm_sensors on your nodes? That's a handy tool for paying >attention to things like that. We use ours in combination with ganglia >and pump it to a web page and to big brother to see when a cpu might be >getting hot, or a fan might be too slow. We actually saved a dozen >machines that way...we have 32 4 processor racksaver boxes in a rack, >and they rack was not designed to handle racksaver's fan system. That is >to say, there was a solid sidewall on the rack, and it kept in heat. 
I >set up lm_sensors on all the nodes (homogenous, so configured on one and >pushed it out to all), then pumped the data into ganglia >(ganglia.sourceforge.net) and then to a web page. I noticed that the >temp on a dozen of the machines was extremely high. So, I took off the >side panel of the rack. The temp dropped by 15 C on all the nodes, and >everything was within normal parameters again. > > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > >Ya, we would have seen this on ours earlier...excellent tool [snip] We use Clusterworx, which isn't open source (from Linux Networx), but it goes a step further than Ganglia. It uses lm_sensors and a power control box (again from linux networx) to actually shutdown a node if it is getting too hot, and the event parameters are all tweakable. It's always a good idea to have some kind of cluster monitoring software installed, but it's nice to be able to setup event triggers in your software in case something goes wrong and you're not around. ---- David Henry Synergy Software, Inc. lightdee@netscape.net __________________________________________________________________ Your favorite stores, helpful shopping tools and great gift ideas. Experience the convenience of buying online with Shop@Netscape! http://shopnow.netscape.com/ Get your own FREE, personal Netscape Mail account today at http://webmail.netscape.com/ From ctierney at hpti.com Thu Apr 11 11:59:20 2002 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:02:14 2009 Subject: What could be the performance of my cluster In-Reply-To: <20020406113545.91938.qmail@web10504.mail.yahoo.com>; from suraj_peri@yahoo.com on Sat, Apr 06, 2002 at 03:35:45AM -0800 References: <20020405125956.D69845@velocet.ca> <20020406113545.91938.qmail@web10504.mail.yahoo.com> Message-ID: <20020411125920.D32605@hpti.com> It depends on what you are trying to do (doesn't everyone love that answer). The number of flops your cluster can do should be equal to: flops = (no. of cpus) * (Mhz) * (flops per hz) So for your cluster flops = 8 * 1.53 Ghz * 2 I am assuming that with SSE you can get 2 flops per cycle. flops = 24.48 Gflops Now, there are some issues with this. First, you are never going to get 1.53*2 Gflops out of a single processor. Second, leveraging all 8 cpus to get their maximum is going to be difficult if there is any communication between the nodes. Compilers play a big role in extracting the best performance out of the system. If you don't have a commerical compiler from the likes of Intel or Portland Group, I highly recommend getting one. You only have to purchase the compiler for where you compile, and not where you run. You can get away with one copy of the compiler on your server. If you are trying to compare the AMD system to the DS20E system, it will depend on what you are actually trying to do. If you are running single precision floating point codes that do not require all the memory bandwidth a DS20E provides, I would think that within 10% that AMD processor will do the work of one 833 Mhz Alpha Cpu (You didn't say if you had 2 cpus in your DS20e). At least this is what I am seeing for my codes when comparing Dual Xeon's, Dual AMD's, and dual API 833 boxes. Craig On Sat, Apr 06, 2002 at 03:35:45AM -0800, Suraj Peri wrote: > Hi group, > I was calculating the performance of my cluster. The > features are > > 1. 8 nodes > 2. Processor: AMD Athlon XP 1800+ > 3. 8 CPUs > 4. 8*1.5 GB DDR RAM > 5. 
1 Server with 2 processorts with AMD MP 1800+ and > 2GB DDR RAM > > I calculated this to be 48 Mflops . Is this correct ? > if not, what is the correct performance of my cluster. > I also comparatively calculated that my cluster would > be 3 times faster than AlphaServer DS20E ( 833 MHz > alpha 64 bit processor, 4 GB max memory) > > Is my calculation correct or wrong? please help me > ASAP. thanks in advance. > > cheers > suraj. > > ===== > PIL/BMB/SDU/DK > > __________________________________________________ > Do You Yahoo!? > Yahoo! Tax Center - online filing with TurboTax > http://taxes.yahoo.com/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney@hpti.com) From becker at scyld.com Thu Apr 11 12:16:47 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:14 2009 Subject: scyld slave node problems In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Eric Miller wrote: > >I'm setting up a small cluster for my university using the Scyld distro. > >The master is up and running and now I'm trying to get the nodes to > operate. > >However, the node I'm currently working on is having some difficulties. It > >seems to be communicating with the master just fine, but when copying the > >libraries from the master it starts spitting out "try_do_free_pages failed > >for init" and similar messages. My first guess is that you don't have enough memory (64MB+) on the slave node. But this might also be a memory or disk problem. > There might be a more technical solution, but I had the same problem and was > able to solve it by booting that node diskless. Just disconnect the hard > drive, and re-boot with the boot disk. Like I said, there may be a better > way or technical solution, but ".....free_pages" led me to believe it was a > HDD problem. I booted diskless, and had no problems. You should not need to physically disconnect the hard disk. Just remove any references to /dev/hda that you added in /etc/beowulf/fstab. However if you do have a hardware problem, disconnecting the disk might avoid the symptoms. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From skruglik at gmu.edu Thu Apr 11 12:35:28 2002 From: skruglik at gmu.edu (Stepan Kruglikov) Date: Wed Nov 25 01:02:14 2009 Subject: scyld slave node problems References: <0718ABB23368D2119FC200008362AF6816BDDA@pluto.dsu.edu> Message-ID: <00a901c1e190$0924c730$c932ae81@lyapunov> Hello, I solved this problem by increasing memory on each node to 64MB. You can also get the node up and running with 64MB, setup node partitions, and then delete ram drive and run with 32mB. Although it works, I recommend doing it only in case if you are interested in proof of concept cluster. Stepan Kruglikov ----- Original Message ----- From: "Christoffersen, Neils" To: Sent: Thursday, April 11, 2002 2:09 PM Subject: scyld slave node problems > Hello all, > > I'm setting up a small cluster for my university using the Scyld distro. > The master is up and running and now I'm trying to get the nodes to operate. > However, the node I'm currently working on is having some difficulties. 
It > seems to be communicating with the master just fine, but when copying the > libraries from the master it starts spitting out "try_do_free_pages failed > for init" and similar messages. It seems to me that maybe the hard drive is > not being recognized and it's trying to run everything on ram and just > running out of memory. > > Does anyone know what could be causing this? I have the node log which I can > attach if you wish (I just don't have it with me at the moment). > > Thanks for any help you can lend. > > Sincerely > Neils Christoffersen > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ctierney at hpti.com Thu Apr 11 12:26:50 2002 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:02:14 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de>; from joachim@lfbs.RWTH-Aachen.DE on Thu, Apr 11, 2002 at 08:46:28PM +0200 References: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> Message-ID: <20020411132650.A32674@hpti.com> I talked to a guy at SC2002 from Quadrics and he said that list pricing on a Quadrics network was about $3500 per node when you are in the 100s of nodes and up. The price includes the cards, cables, switches, etc. This doesn't include any sort of discount that you might get. Myrinet is about $2000 for an equivelent network at list price. Dolphin/SCI falls around $2245 list per node (if the system is > 144 nodes and you have to get the 3d card). I heard that Quadrics had a customer that just had to have an Intel/Quadrics system so either they or he was working on porting the drivers. The web page says they support Linux and Tru64. You could probably get the hardware without going through Compaq, but Compaq is most likely buying up most of the supply. Craig -- Craig Tierney (ctierney@hpti.com) On Thu, Apr 11, 2002 at 08:46:28PM +0200, Joachim Worringen wrote: > ...Iwao Makino wrote: > > > > I think ... Quadrics is another one. > [...] > > But pricing is MUCH higher than SCI/Myrinet. > > Do you have any pricing information at all? AFAIK, they are only > distribute with Compaq clusters. > > Joachim > > -- > | _ RWTH| Joachim Worringen > |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen > | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim > |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From epaulson at cs.wisc.edu Thu Apr 11 12:32:02 2002 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Wed Nov 25 01:02:14 2009 Subject: (no subject) In-Reply-To: ; from emiller@techskills.com on Thu, Apr 11, 2002 at 10:08:16AM -0400 References: <20020411130008.63691.qmail@web9608.mail.yahoo.com> Message-ID: <20020411143202.C27111@perdita.cs.wisc.edu> On Thu, Apr 11, 2002 at 10:08:16AM -0400, Eric Miller wrote: > > After you get the cluster up and running, that's where the help seems to > drift off. Most of the people in this group are upper-level users who know > how to get these MPI enabled programs to run on thier clusters. If you are > like me, these topics are a little foreign. If you are looking for > something to run continuously, like a display, they say the MandelBrot > renderer has a loop function, but I can't get it to work. 
Someone suggested > SETI many months ago, which would be perfect, but SETI does not offer an MPI > enabled program. > What possible good would an MPI-enabled SETI@Home do? The whole point of SETI@Home is that it's already parallelized. If you've got N nodes, submit N copies of SETI@home to your queuing system, and your cluster will get an N times speedup over a single node. I don't see how you can hope to do better than that. -Erik From becker at scyld.com Thu Apr 11 13:03:31 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:14 2009 Subject: scyld slave node problems In-Reply-To: <00a901c1e190$0924c730$c932ae81@lyapunov> Message-ID: On Thu, 11 Apr 2002, Stepan Kruglikov wrote: > I solved this problem by increasing memory on each node to 64MB. You can > also get the node up and running with 64MB, setup node partitions, and then > delete ram drive and run with 32mB. Although it works, I recommend doing it > only in case if you are interested in proof of concept cluster. It's possible to trim the cached library list in /etc/beowulf/config and fit into 32MB. But only the most trivial application will run with 32MB and no local disk. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From fraser5 at cox.net Thu Apr 11 13:38:44 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:14 2009 Subject: Will the dual Tyan board boot without a graphics card installed? Message-ID: <006a01c1e198$dff52c70$0300005a@papabear> I have had this problem on a couple other boards and it can be annoying. Thanks, Jim From emiller at techskills.com Thu Apr 11 14:01:24 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:14 2009 Subject: (no subject) In-Reply-To: <20020411143202.C27111@perdita.cs.wisc.edu> Message-ID: >> >> Someone suggested >> SETI many months ago, which would be perfect, but SETI does not offer an MPI >> enabled program. > >What possible good would an MPI-enabled SETI@Home do? The whole point of >SETI@Home is that it's already parallelized. > My definition of parrellelized is MPI or PVM enabled code, not _distributed_ applications like SETI. When demonstrating to students the capabilities of Linux, its not nearly as convincing to just start N number of instances on N nodes. The magic stuff that we newbie cluster builders seek is not found in that. It is found in having a bona-fide cluster with master and slave nodes, and a single instance of a program being managed and executed by a group of machines. Am I alone in this opinion? >If you've got N nodes, submit N copies of SETI@home to your queuing system, >and your cluster will get an N times speedup over a single node. I don't see >how you can hope to do better than that. I was aware of this possibility, but do not have the skills to implement it. Please see my post from weeks ago, March 11th. It was SETI that I was referring to: --For non-parallel applications, is it possible to run individual instances on --diskless nodes? For example, I want to execute a non-MPI program "A" that --is located in the /bin directory of my master node, but I want to run one --instance of "A" on each of my diskless nodes. --What is the syntax that equates to: --#NP=1 "A" on node0 only --#NP=1 "A" on node1 only --#.... --#.... 
From math at velocet.ca Thu Apr 11 14:25:02 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:14 2009 Subject: Will the dual Tyan board boot without a graphics card installed? In-Reply-To: <006a01c1e198$dff52c70$0300005a@papabear>; from fraser5@cox.net on Thu, Apr 11, 2002 at 04:38:44PM -0400 References: <006a01c1e198$dff52c70$0300005a@papabear> Message-ID: <20020411172502.F19272@velocet.ca> On Thu, Apr 11, 2002 at 04:38:44PM -0400, Jim Fraser's all... > > I have had this problem on a couple other boards and it can be annoying. I have found that clearing the BIOS and setting everything back up can solve this problem sometimes. /kc > > Thanks, > > Jim > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From epaulson at cs.wisc.edu Thu Apr 11 14:26:24 2002 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Wed Nov 25 01:02:14 2009 Subject: (no subject) In-Reply-To: ; from emiller@techskills.com on Thu, Apr 11, 2002 at 05:01:24PM -0400 References: <20020411143202.C27111@perdita.cs.wisc.edu> Message-ID: <20020411162624.E27598@perdita.cs.wisc.edu> On Thu, Apr 11, 2002 at 05:01:24PM -0400, Eric Miller wrote: > >> > >> Someone suggested > >> SETI many months ago, which would be perfect, but SETI does not offer an > MPI > >> enabled program. > > > >What possible good would an MPI-enabled SETI@Home do? The whole point of > >SETI@Home is that it's already parallelized. > > > > My definition of parrellelized is MPI or PVM enabled code, not _distributed_ > applications like SETI. When demonstrating to students the capabilities of > Linux, its not nearly as convincing to just start N number of instances on N > nodes. The magic stuff that we newbie cluster builders seek is not found in > that. It is found in having a bona-fide cluster with master and slave > nodes, and a single instance of a program being managed and executed by a > group of machines. Am I alone in this opinion? > Yes. What you'll discover is that there is no magic to cluster building. If your problem can be solved in parallel just by running N unmodified copies of your code, then that's the way to do it. And there's tons of science to be done this way (in fact, I'd bet there's more to be done this way than with big MPI jobs) If your codes to solve your problem need to be parallelized with MPI or PVM for whatever reason (maybe you don't need to solve N instanances of your code, just one instance and minimize the time, or you need more resources than any one machine can handle - ie 32 gigs of RAM or some such) then you don't really have a choice and you have to break down and do it. But again, there's no magic here. There is not a single instance of you program on the cluster - if your code is using N nodes, then there are N copies of your program on the cluster. (Yes, maybe you're using some quasi-SSI thing like Scyld or MOSIX, but as far as I know both of them still transfer the entire memory image over to the machine, and don't page things over as needed) You can write a program that works exactly like an MPI program with 0 MPI calls - whereever you'd write MPI_Send, just use BSD sockets and send things that way. Tons more to do (you have to locate all the other processes in the computation, you have to worry about buffering, failures, etc) but none of it's unknown. 
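And to make the "submit N copies" route (quoted again just below) concrete: with a PBS-style batch system it is nothing more than this loop -- a sketch only, where run_seti.sh, the /scratch path and the count are invented, and a SETI@home-style client probably wants each copy started in its own working directory so state files don't collide:

#!/bin/sh
# Sketch (invented names): queue N independent copies of an unmodified
# serial code through a PBS-style batch system. run_seti.sh is assumed to
# "cd $WORKDIR" and then exec the client.
N=30
i=1
while [ $i -le $N ] ; do
    mkdir -p /scratch/seti/copy$i        # private work dir for this copy
    qsub -v WORKDIR=/scratch/seti/copy$i run_seti.sh
    i=`expr $i + 1`
done

The scheduler spreads the N jobs over however many nodes you have and backfills as they finish; nothing in the code itself had to change.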
> >If you've got N nodes, submit N copies of SETI@home to your queuing system, > >and your cluster will get an N times speedup over a single node. I don't > see > >how you can hope to do better than that. > > I was aware of this possibility, but do not have the skills to implement it. Yes you do. Download Condor, or PBS, or Sun Grid Engine, or buy Platform LSF, and: A. Install it on N nodes B. Submit N copies or, install Scyld or MOSIX. Type: my_program & N times. -Erik From laytonjb at bellsouth.net Thu Apr 11 13:33:58 2002 From: laytonjb at bellsouth.net (Jeff Layton) Date: Wed Nov 25 01:02:14 2009 Subject: How do you keep clusters running.... References: <1D889E10.452B89F1.009FF3AE@netscape.net> Message-ID: <3CB5F336.4AEFE8EA@bellsouth.net> lightdee@netscape.net wrote: > Doug J Nordwall wrote: > > >On Wed, 2002-04-03 at 13:04, Cris Rhea wrote: > > > > What are folks doing about keeping hardware running on large clusters? > > > > Right now, I'm running 10 Racksaver RS-1200's (for a total of 20 >nodes)... > > > > Sure seems like every week or two, I notice dead fans (each RS-1200 > > has 6 case fans in addition to the 2 CPU fans and 2 power supply > fans). > > > > > >You running lm_sensors on your nodes? That's a handy tool for paying > >attention to things like that. We use ours in combination with ganglia > >and pump it to a web page and to big brother to see when a cpu might be > >getting hot, or a fan might be too slow. We actually saved a dozen > >machines that way...we have 32 4 processor racksaver boxes in a rack, > >and they rack was not designed to handle racksaver's fan system. That is > >to say, there was a solid sidewall on the rack, and it kept in heat. I > >set up lm_sensors on all the nodes (homogenous, so configured on one and > >pushed it out to all), then pumped the data into ganglia > >(ganglia.sourceforge.net) and then to a web page. I noticed that the > >temp on a dozen of the machines was extremely high. So, I took off the > >side panel of the rack. The temp dropped by 15 C on all the nodes, and > >everything was within normal parameters again. > > > > > > My last fan failure was a CPU fan that toasted the CPU and motherboard. > > > > > >Ya, we would have seen this on ours earlier...excellent tool > > [snip] > > We use Clusterworx, which isn't open source (from Linux Networx), but it goes a step further than Ganglia. It uses lm_sensors and a power control > box (again from linux networx) to actually shutdown a node if it is getting > too hot, and the event parameters are all tweakable. It's always a good > idea to have some kind of cluster monitoring software installed, but it's > nice to be able to setup event triggers in your software in case something goes wrong and you're not around. You can set a shutdown temperature via the BIOS on most decent motherboards. You can also easily script this up if you have some power control unit connected to a node that you can talk to (e.g. APC's stuff). All of the stuff you need it available as Opensource. You can hook all of this together with Ganglia if you want. In fact, Matt has announced (or hinted) at the next version of Ganglia that will start to have a number of new features built in (but not nodal shutdown if I remember correctly). Jeff Layton > > > ---- > David Henry > Synergy Software, Inc. > lightdee@netscape.net > > __________________________________________________________________ > Your favorite stores, helpful shopping tools and great gift ideas. Experience the convenience of buying online with Shop@Netscape! 
http://shopnow.netscape.com/ > > Get your own FREE, personal Netscape Mail account today at http://webmail.netscape.com/ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Apr 11 15:58:18 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:14 2009 Subject: (no subject) In-Reply-To: <20020411162624.E27598@perdita.cs.wisc.edu> Message-ID: On Thu, 11 Apr 2002, Erik Paulson wrote: > > >If you've got N nodes, submit N copies of SETI@home to your queuing system, > > >and your cluster will get an N times speedup over a single node. I don't > > see > > >how you can hope to do better than that. > > > > I was aware of this possibility, but do not have the skills to implement it. > > Yes you do. Download Condor, or PBS, or Sun Grid Engine, or buy Platform LSF, > and: > A. Install it on N nodes > B. Submit N copies > > or, install Scyld or MOSIX. Type: > my_program & > > N times. And not even for SETI will you get an Nx speedup on N nodes. There is ALWAYS a serial fraction even for embarrassingly parallel applications, and the time required to send the jobs out to the nodes (relative to just looping N times on the node) is part of it. In Amdahl's Law N-fold speedup is the upper bound, not the general, practical limit. This is the basis of Eric's observation about embarassingly parallel jobs being ideal for clusters -- they're the ones that often get very close to N-fold speedup on N nodes for nearly arbitrary N. "Real" parallel jobs (ones with nontrivial communications built on MPI or PVM or raw sockets or even shared memory or some sort of specialized communications channel) almost never do this well, and more often than not will only speedup at all up to some maximum number of nodes and then actually run more slowly if further partitioned. It's also interesting that master-slave jobs were cited as being "real" parallel applications as in many cases the master is nothing more than an intelligent front end for an embarassingly parallel application core. What's the difference between using a script or Mosix or even a bunch of rsh's as the "master" that distributes the jobs and collects the results and using PVM to do exactly the same thing? Not much, really, but perhaps a small edge in network efficiency for that part of things. This may matter -- if the jobs run a short time and communicate with the master a long time it will matter -- but in cases where this paradigm makes sense at all (where the ratio of run to communication is the other way around -- lots of computation, a little communication) it won't matter much. Most of this is in any decent book on parallel computing, including at least one that is freely available on the web. Then there is my online book (which I make no claim for being "decent", but it is free:-). Lots of these resources are on or linked to various cluster sites, including: http://www.phy.duke.edu/brahma rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rickey-co at mug.biglobe.ne.jp Thu Apr 11 15:38:24 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Wed Nov 25 01:02:14 2009 Subject: very high bandwidth, low latency manner? 
In-Reply-To: <20020411132650.A32674@hpti.com>
References: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> <20020411132650.A32674@hpti.com>
Message-ID: 

AFAIK, it's the 'best buy' in per-node cost at 16 nodes. $3,500 is a good guess for 16 nodes including software licensing and hardware (switch, cards and cables). Actually, 16 nodes costs a little less; 64/128 is about that range. For larger systems (256/512/1024 and beyond), the general idea is $5,000/node. These are their offering prices, so I assume some volume discounts apply at larger scales. And yes, we should be able to purchase from someone other than Compaq, but I haven't got that detail answered yet.

At 13:26 -0600 11.04.2002, Craig Tierney wrote:
>I talked to a guy at SC2002 from Quadrics and he said
>that list pricing on a Quadrics network was about $3500
>per node when you are in the 100s of nodes and up.
>The price includes the cards, cables, switches,
>etc. This doesn't include any sort of discount that you
>might get. Myrinet is about $2000 for an equivelent
>network at list price. Dolphin/SCI falls around $2245 list
>per node (if the system is > 144 nodes and you have to get
>the 3d card).

Dolphin/SCI for smaller systems (<144 nodes?) is from $1,695 list, and larger systems with the 3D chain are from $2,245 list. I haven't tested this new 3D version yet.

>I heard that Quadrics had a customer that just had to have
>an Intel/Quadrics system so either they or he was working
>on porting the drivers. The web page says they support
>Linux and Tru64. You could probably get the hardware without
>going through Compaq, but Compaq is most likely buying up
>most of the supply.

I know they work on ServerWorks HE and i860 Xeon; they are also working on Plumas and GC-LE.

-- 
Best regards,
 Iwao Makino
Hard Data Ltd. Tokyo branch
mailto:iwao@harddata.com
http://www.harddata.com/
--> Now Shipping 1U Dual Athlon DDR <-
--> Ask me about the new Alpha DDR UP1500 Systems <-

From emiller at techskills.com Thu Apr 11 16:52:04 2002
From: emiller at techskills.com (Eric Miller)
Date: Wed Nov 25 01:02:14 2009
Subject: (no subject)
In-Reply-To: 
Message-ID: 

>And not even for SETI will you get an Nx speedup on N nodes. There is
>ALWAYS a serial fraction even for embarrassingly parallel applications,
>and the time required to send the jobs out to the nodes

I guess what I am fundamentally saying is, for a cluster to be "working its magic" in a student's eyes, consider two scenarios:

1- Running N iterations of a program, and seeing N times the work being done. It's like, um... well, yeah, if I run SETI on 8 systems, then I will crunch 8 times as many units, but I will NOT crunch 1 unit in 1/8th the time as perceived on the front-end node.

-OR-

2- Having an 8-node cluster running, say, a raytracer, and a solo machine running the same application. Actually seeing ONE instance render an image (roughly) 8 times faster than a single system (esp. when all of the systems were pulled out of the trash can!!), THAT is the magic that newbies and students want to see. That's the "cool" factor that brings annoying #^$%s like me to this forum to post questions that are outside the arena of analyzing proteins and DNA molecules on 256-node AthlonXP rackmounts with Myrinet. We are not experts, we have a lot of questions, and all we want to do is see Linux do something cool that we can show our friends/students/selves.

Robert, thank you for your positive and informative reply.
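A small worked example of the Amdahl's-law point quoted at the top of this message, with made-up serial-fraction and overhead numbers:

# Speedup on N nodes when a fraction s of the work is serial (plus a fixed
# per-job distribution overhead) never quite reaches N.
def speedup(n, serial_frac, overhead=0.0):
    # Time on 1 node = 1.0; the parallel part divides by n, the serial part
    # does not, and 'overhead' models the cost of farming the work out.
    t_parallel = serial_frac + (1.0 - serial_frac) / n + overhead
    return 1.0 / t_parallel

for n in (2, 4, 8, 16):
    print(n, round(speedup(n, serial_frac=0.01, overhead=0.005), 2))
# With 1% serial work and 0.5% overhead, 8 nodes give about 7.2x, not 8x.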
From sp at scali.com Thu Apr 11 20:29:07 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:14 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <20020411132650.A32674@hpti.com> Message-ID: On Thu, 11 Apr 2002, Craig Tierney wrote: > > I talked to a guy at SC2002 from Quadrics and he said > that list pricing on a Quadrics network was about $3500 > per node when you are in the 100s of nodes and up. > The price includes the cards, cables, switches, > etc. This doesn't include any sort of discount that you > might get. Myrinet is about $2000 for an equivelent > network at list price. Dolphin/SCI falls around $2245 list > per node (if the system is > 144 nodes and you have to get > the 3d card). > > This is list prices for the cards only, right ? What about the switches needed. AFAIK Quadrics and Myrinet both need switches, SCI don't (which makes the total system cost a bit lower doesn't it ?). Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From sp at scali.com Thu Apr 11 20:37:17 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:14 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Iwao Makino wrote: > I think ... Quadrics is another one. > Yep, sorry I forgot that one. > Here's quick figures I have on hand.... > > RH7.2, 2.4.9 kernel for i860 cluster. > On their site, they claim; > after protocol, of 340Mbytes/second in each direction. The > process-to-process latency for remote write operations is2us, and 5us for > MPI messages. > But this 340MBytes/second and 2us latency is also chipset dependent, as I mentioned for SCI (in my examples latency was lowest on 760MPX but bandwidth was highest on IA64 460GX...). I can't imagine that the i860 can actually perform as well as 340MByte/sec since the Hub-Link (between the MCH and the P64H) has a limit of 266MByte/sec (AFAIK) .... > But pricing is MUCH higher than SCI/Myrinet. > Certainly. > Best regards, > > At 4:08 +0200 5.04.2002, Steffen Persvold wrote: > >On Thu, 4 Apr 2002, Jim Lux wrote: > > > >> What's high bandwidth? > >> What's low latency? > > > How much money do you want to spend? > >I don't want to start a flamewar here, but I _think_ (not knowing real > >numbers for other high speed interconnects) that SCI has atleast the > >lowest latency and maybe also the highest point to point bandwidth : > > > >SCI application to application latency : 2.5 us > >SCI application to application bandwidth : 325 MByte/sec > > > >Note that these numbers are very chipset specific (as most high speed > >interconnect numbers are), these numbers are from IA64. 
Here are numbers > >from a popular IA32 platform, the AMD 760MPX : > > > >SCI application to application latency : 1.8 us > >SCI application to application bandwidth : 283 MByte/sec > > Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From justin at cs.duke.edu Thu Apr 11 20:54:16 2002 From: justin at cs.duke.edu (Justin Moore) Date: Wed Nov 25 01:02:15 2009 Subject: DHCP Help Again In-Reply-To: Message-ID: Hello all, Part of a project I've been working on deals some with boot management and DHCP specifically. I'm not familiar with the NPACI solution/codebase, but I hacked a version of proxydhcp to work with a MySQL backend. It has some nice hooks in it which let you know if the machine is booting PXE or booting from dhclient/pump/whatever. I think having the DB backend is a little nicer than having to worry about the leases file (XML or not) since it gives you more fine-grained control over who has access to the information and how that information gets parsed. Plus it can detect when a new host is coming up and add a mapping to the DB without requiring you to parse through /var/log/messages. :) Obviously my code has some parts which are somewhat project-specific for me (I don't think everyone wants to boot off the same ramdisk I do by default :)) but I could post the code in a few weeks (deadlines coming up) if anyone's interested in such a beast. Another nice part of the DB backend is that generating a future dhcpd.conf file is pretty easy: mysql_query("SELECT HWaddr,IPaddr FROM nics ORDER BY IPaddr"); and then spew the output to a file as desired. :) -jdm Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu On Thu, 11 Apr 2002, Robert G. Brown wrote: > On Wed, 10 Apr 2002 tegner@nada.kth.se wrote: > > > Very helpful! Thanks! > > > > But I'm still curious about how you make - automagically - the hardware ethernet > > line in dhcpd.conf initially. Say you have 100 machines. One way I would think > > of would be to use kickstart and: > > > > Install the machines and boot them up in sequence and using the range statement > > in dhcpd.conf (so that the first machine gets 192.168.1.101, the second > > 192.168.1.102 ...) > > > > Once all nodes are up use some script to extract the mac addresses for all the > > nodes and either modify dhcpd.conf - or - discard of dhcp completely and > > hardwire the ip-addresses to each node. > > > > But I'm sure there are better ways to do this? > > Not that I know of. Maybe somebody else knows of one. I'd just use > perl or bash (either would probably work, although parsing is generally > easier in perl), parse e.g. 
> > Apr 11 08:18:09 lucifer dhcpd: DHCPREQUEST for 192.168.1.140 from 00:20:e0:6d:a0:05 via eth0 > Apr 11 08:18:09 lucifer dhcpd: DHCPACK on 192.168.1.140 to 00:20:e0:6d:a0:05 via eth0 > > from /var/log/messages on the dhcp server, and write an output routine > to generate > > # golem (Linux/Windows laptop lilith, second/100BT interface) > host golem { > hardware ethernet 00:20:e0:6d:a0:05; > fixed-address 192.168.1.140; > next-server 192.168.1.131; > option routers 192.168.1.1; > option domain-name "rgb.private.net"; > option host-name "golem"; > } > > and > > 192.168.1.140 golem.rgb.private.net golem > > and append them to /etc/dhcpd.conf and /etc/hosts respectively, and then > distribute copies of the resulting /etc/hosts -- as Josip made > eloquently clear your private internal network should resolve > consistently on all PIN hosts and probably should have SOME sort of > domainname defined so that software the might include a > getdomainbyname() call and might not include an adequate check and > handle of a null value can cope. It's hard to know what assumptions > were made by the designer of every single piece of network software you > might want to run... > > Of coures you'll probably want to do the b01, b02, b03... hostname > iteration -- I'm just pulling an example at random out of my own log > tables. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From hartner at cs.utah.edu Thu Apr 11 20:55:00 2002 From: hartner at cs.utah.edu (Mark Hartner) Date: Wed Nov 25 01:02:15 2009 Subject: (no subject) In-Reply-To: Message-ID: > analyzing proteins and DNA molecules on 256 node AthlonXP rackmounts with > Myrinet. We are not experts, we have ALOT of questions, and all we want to > do is see Linux do something cool that we can show our > freinds/students/selves. How about encoding some mp3's www.osl.ui.edu/~jsquyres/bladeenc/ Mark From patrick at myri.com Thu Apr 11 22:40:08 2002 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB67338.8080906@myri.com> Steffen Persvold wrote: >>I talked to a guy at SC2002 from Quadrics and he said >>that list pricing on a Quadrics network was about $3500 >>per node when you are in the 100s of nodes and up. >>The price includes the cards, cables, switches, >>etc. This doesn't include any sort of discount that you >>might get. Myrinet is about $2000 for an equivelent >>network at list price. Dolphin/SCI falls around $2245 list >>per node (if the system is > 144 nodes and you have to get >>the 3d card). > > This is list prices for the cards only, right ? Not for Myrinet. Actually $2000 per node is the total cost (NIC/cable/port/software) for the high-end products (with L9/200 MHz), should be more like $1500 for low-end ones. Craig is spoiled, only buys the top stuff :-) > What about the switches > needed. AFAIK Quadrics and Myrinet both need switches, SCI don't (which > makes the total system cost a bit lower doesn't it ?). Dunno for QSW, but the NIC represent roughly 3/4 of the price per node for Myrinet. 
Sure, as the smallest switch has 8 ports (a 16-port chassis with one blade of 8 fibers), it is not interesting for very small configurations, i.e. fewer than 8 nodes, but I don't think that's Myricom's market. It's a common mistake to believe that switchless solutions are by definition cheaper.

Patrick

----------------------------------------------------------
|   Patrick Geoffray, Ph.D.     patrick@myri.com
|   Myricom, Inc.               http://www.myri.com
|   Cell: 865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978         Oak Ridge, TN 37830
----------------------------------------------------------

From manel at labtie.mmt.upc.es Fri Apr 12 01:53:09 2002
From: manel at labtie.mmt.upc.es (Manel Soria)
Date: Wed Nov 25 01:02:15 2009
Subject: power control
Message-ID: <3CB6A075.5AE6626@labtie.mmt.upc.es>

We need a power control unit for our 72-node cluster. My first idea was to do it ourselves with a digital I/O card and a set of relays, but I can't find such a card for Linux. Actually, I have an ISA card that is perfect for this application, but for some reason with the PCI bus it is more difficult. Also, it seems that the "normal" solution is to buy a commercial APC system.

Any experiences with in-house made power controls? Would you recommend us to buy the APC product?

--
===============================================
Dr. Manel Soria
ETSEIT - Centre Tecnologic de Transferencia de Calor
C/ Colom 11 08222 Terrassa (Barcelona) SPAIN
Tf: +34 93 739 8287 ; Fax: +34 93 739 8101
E-Mail: manel@labtie.mmt.upc.es

From suraj_peri at yahoo.com Fri Apr 12 03:16:15 2002
From: suraj_peri at yahoo.com (Suraj Peri)
Date: Wed Nov 25 01:02:15 2009
Subject: What could be the performance of my cluster
Message-ID: <20020412101615.77502.qmail@web10507.mail.yahoo.com>

Hi group,
I was calculating the performance of my cluster. The features are

1. 8 nodes
2. Processor: AMD Athlon XP 1800+
3. 8 CPUs
4. 8*1.5 GB DDR RAM
5. 1 server with 2 processors with AMD MP 1800+ and 2 GB DDR RAM

I calculated this to be 48 Mflops. Is this correct? If not, what is the correct performance of my cluster? I also comparatively calculated that my cluster would be 3 times faster than an AlphaServer DS20E (833 MHz Alpha 64-bit processor, 4 GB max memory).

Is my calculation correct or wrong? Please help me ASAP. Thanks in advance.

cheers
suraj.

=====
PIL/BMB/SDU/DK

__________________________________________________
Do You Yahoo!?
Yahoo!
Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From suraj_peri at yahoo.com Fri Apr 12 03:32:34 2002 From: suraj_peri at yahoo.com (Suraj Peri) Date: Wed Nov 25 01:02:15 2009 Subject: What could be the performance of my cluster In-Reply-To: <20020411125920.D32605@hpti.com> Message-ID: <20020412103234.7580.qmail@web10508.mail.yahoo.com> Hi Craig, Many thanks for your mail. please excuse me for asking a dumb question and I am novice in this area. I am interested in using this cluster for BLAST purposes. I want to store ESTs( Expressed Sequence Tags) and GenBank ( nucleotide sequence database) and GenPept ( Protein sequence database) and total predicted protein sets of Human genome. I will use BLAST ( basic local alignment search tool algorithm) on this cluster. As the computataions are intensive and time consuming. So I wanted to compare the AlphaServer DS20E and my cluster in their computing abilities. Because there are no one in my friend circles no about this. Please help me if you have used clusters for BLAST purpose. thanks Suraj. --- Craig Tierney wrote: > It depends on what you are trying to do (doesn't > everyone > love that answer). > > The number of flops your cluster can do should > be equal to: > > flops = (no. of cpus) * (Mhz) * (flops per hz) > > So for your cluster > > flops = 8 * 1.53 Ghz * 2 > > I am assuming that with SSE you can get 2 flops > per cycle. > > flops = 24.48 Gflops > > Now, there are some issues with this. First, you > are never > going to get 1.53*2 Gflops out of a single > processor. Second, > leveraging all 8 cpus to get their maximum is going > to be > difficult if there is any communication between the > nodes. > > Compilers play a big role in extracting the best > performance > out of the system. If you don't have a commerical > compiler > from the likes of Intel or Portland Group, I highly > recommend > getting one. You only have to purchase the compiler > for where > you compile, and not where you run. You can get > away with > one copy of the compiler on your server. > > If you are trying to compare the AMD system to the > DS20E system, > it will depend on what you are actually trying to > do. If > you are running single precision floating point > codes that do > not require all the memory bandwidth a DS20E > provides, I would > think that within 10% that AMD processor will do the > work > of one 833 Mhz Alpha Cpu (You didn't say if you had > 2 cpus > in your DS20e). At least this is what I am seeing > for my codes when comparing Dual Xeon's, Dual AMD's, > and > dual API 833 boxes. > > Craig > > > > > > On Sat, Apr 06, 2002 at 03:35:45AM -0800, Suraj Peri > wrote: > > Hi group, > > I was calculating the performance of my cluster. > The > > features are > > > > 1. 8 nodes > > 2. Processor: AMD Athlon XP 1800+ > > 3. 8 CPUs > > 4. 8*1.5 GB DDR RAM > > 5. 1 Server with 2 processorts with AMD MP 1800+ > and > > 2GB DDR RAM > > > > I calculated this to be 48 Mflops . Is this > correct ? > > if not, what is the correct performance of my > cluster. > > I also comparatively calculated that my cluster > would > > be 3 times faster than AlphaServer DS20E ( 833 MHz > > alpha 64 bit processor, 4 GB max memory) > > > > Is my calculation correct or wrong? please help me > > ASAP. thanks in advance. > > > > cheers > > suraj. > > > > ===== > > PIL/BMB/SDU/DK > > > > __________________________________________________ > > Do You Yahoo!? > > Yahoo! 
Tax Center - online filing with TurboTax > > http://taxes.yahoo.com/ > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Craig Tierney (ctierney@hpti.com) ===== PIL/BMB/SDU/DK __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From rickey-co at mug.biglobe.ne.jp Thu Apr 11 20:54:59 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: References: Message-ID: At 5:29 +0200 12.04.2002, Steffen Persvold wrote: >On Thu, 11 Apr 2002, Craig Tierney wrote: > >> >> I talked to a guy at SC2002 from Quadrics and he said >> that list pricing on a Quadrics network was about $3500 >> per node when you are in the 100s of nodes and up. >> The price includes the cards, cables, switches, >> etc. This doesn't include any sort of discount that you >> might get. Myrinet is about $2000 for an equivelent >> network at list price. Dolphin/SCI falls around $2245 list >> per node (if the system is > 144 nodes and you have to get >> the 3d card). >> >> > >This is list prices for the cards only, right ? What about the switches >needed. AFAIK Quadrics and Myrinet both need switches, SCI don't (which >makes the total system cost a bit lower doesn't it ?). As said on above, QsNet is per node price including all. Do does Myrinet, so SCI/Dolphin and Myrinet is about equal. SCI has good idea of not using switches, but on the other hand, it is little more complex to connect. -- Best regards, Iwao Makino Hard Data Ltd. Tokyo branch mailto:iwao@harddata.com http://www.harddata.com/ --> Now Shipping 1U Dual Athlon DDR <- --> Ask me about the new Alpha DDR UP1500 Systems <- From sp at scali.com Fri Apr 12 06:10:34 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: Message-ID: On Fri, 12 Apr 2002, Iwao Makino wrote: > At 5:29 +0200 12.04.2002, Steffen Persvold wrote: > >On Thu, 11 Apr 2002, Craig Tierney wrote: > > > >> > >> I talked to a guy at SC2002 from Quadrics and he said > >> that list pricing on a Quadrics network was about $3500 > >> per node when you are in the 100s of nodes and up. > >> The price includes the cards, cables, switches, > >> etc. This doesn't include any sort of discount that you > >> might get. Myrinet is about $2000 for an equivelent > >> network at list price. Dolphin/SCI falls around $2245 list > >> per node (if the system is > 144 nodes and you have to get > >> the 3d card). > >> > >> > > > >This is list prices for the cards only, right ? What about the switches > >needed. AFAIK Quadrics and Myrinet both need switches, SCI don't (which > >makes the total system cost a bit lower doesn't it ?). > > As said on above, QsNet is per node price including all. > Do does Myrinet, so SCI/Dolphin and Myrinet is about equal. > Yes, sorry I missed that statement :) > SCI has good idea of not using switches, but on the other hand, it is > little more complex to connect. > True, but if one of your Myrinet switches breaks down you loose 64 nodes in a 256 node system (standard "CLOS" configuration). I don't know the MBTF for Myrinet switches, but I would expect it to be rather high (redundant power supplies ?). 
Please don't misunderstand me, I find the Myrinet interconnect very interesting and also competitive with SCI both from a technological point of view and wrt. pricing. The only thing this list is lacking is some head to head performance comparisons of the different interconnects e.g some NAS benchmarks and maybe also PMB. Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From sp at scali.com Fri Apr 12 06:15:37 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:15 2009 Subject: power control In-Reply-To: <3CB6A075.5AE6626@labtie.mmt.upc.es> Message-ID: On Fri, 12 Apr 2002, Manel Soria wrote: > > We need a power control unit for our 72 nodes cluster. My first > idea was to do it ourselves with a digital i/o card and a set of relais, but > I can't find such a card for Linux. Actually, I have an ISA card that > is perfect for this application but for some reason with PCI bus it > is more difficult. Also, it seems that the "normal" solution is to buy a > comercial APC system. > > Any experiences with in-house made power controls ? Would you > recomend us to buy the APC product ? > We've had success with the Baytech (www.baytechdcd.com) RPC-3 units. The only disadvantage is that they only have 8 controllable ports (which means that you need 9 of them...). Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From rgb at phy.duke.edu Fri Apr 12 06:17:57 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:15 2009 Subject: (no subject) In-Reply-To: Message-ID: On Thu, 11 Apr 2002, Eric Miller wrote: > Myrinet. We are not experts, we have ALOT of questions, and all we want to > do is see Linux do something cool that we can show our > freinds/students/selves. > > Robert, thank you for your positive and informative reply. I appreciate your interest and understand your goals, actually; whenever I present beowulfery to a new group (which I seem to do three or four times a year) I do exactly the same thing -- a bit of a dog and pony show. In addition to the pvm povray, I like the pvm mandelbrot set demo (xep) which I've hacked so the colormap is effectively deeper and so that it doesn't run out of floating point room so rapidly. I've been using or playing with mandelbrot set demo programs long enough that I can remember when it would take a LONG time to update a single rubberbanded section. Nowadays one can quickly enough get to the bottom of double precision resolution even on a single CPU -- 13 digits isn't really all that many when you rubberband down close to an order of magnitude at a time. Still, with even a small cluster you can get nearly linear speedup and actually "see" the nodes returning their independent strips -- if you have mix of "slow" nodes and faster ones you can even learn some useful things about parallel programming just watching them come in and discussing what you see. 
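A toy sketch of that strip decomposition, with Python's multiprocessing standing in for the PVM/MPI workers; the image size, worker count and per-pixel "work" are placeholders, not anything taken from xep or pvmpov:

# A "master" hands each worker a band of image rows and stitches the
# results back together, which is what you watch happening in the demo.
from multiprocessing import Pool

WIDTH, HEIGHT, WORKERS = 320, 240, 4    # example image size and worker count

def render_strip(rows):
    y0, y1 = rows
    # Stand-in for the real per-pixel work (Mandelbrot iteration, ray tracing, ...)
    return [[(x * y) % 256 for x in range(WIDTH)] for y in range(y0, y1)]

if __name__ == "__main__":
    step = HEIGHT // WORKERS
    strips = [(i * step, (i + 1) * step) for i in range(WORKERS)]
    with Pool(WORKERS) as pool:
        image = [row for strip in pool.map(render_strip, strips) for row in strip]
    print(len(image), "rows rendered")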
The only point I was making is that your class should definitely take the time to go over at least Amdahl's law and one of the improved estimates that account for both the serial fraction and the communications time, and get some understanding of the embarassingly parallel (SETI, distributed monte carlo) -> coarse grained, non-synchronous (pvmpov, xep) -> coarse grained, synchronous (lattice partitioned monte carlo) -> medium-to-fine grained, (non-)synchronous (galactic evolution, weather models) sequencing where for each step up the chain one has to exercise additional care in engineering an effective cluster to deal with it. EP chores (as Eric pointed out) are "perfect" for a cluster because "any" cluster or parallel computer including the simplest SMP boxes will do. Coarse grained tasks will also generally run well on a "standard" linux cluster -- a bunch of boxes on a network, where the kind of network and whether the boxes are workstations, desktops in active use, or dedicated nodes doesn't much matter. When you hit synchronous tasks in general, but especially the finer grained synchronous tasks (tasks where all nodes have to complete a parallel computation sequence -- reach a "barrier" -- and then exchange information before beginning the next parallel computation sequence) then you really have to start paying attention to the network (latency and bandwidth both), it helps to have dedicated nodes that AREN'T doing double duty as workstations (since the rate of progress is determined by the slowest node), and most of these tasks have a strict upper bound on the number of nodes that one can assign to a task and still decrease the time of completion. This last point is a very important one. It is easy to see a coarse grained task speed up N-fold on N nodes and conclude that all problems can them be solved faster if we just add more nodes. Make sure that your students see that this is not so, so that if they ever DO engineer a compute cluster to accomplish some particular task, they don't just buy lots of nodes, but instead do the arithmetic first... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From bob at drzyzgula.org Fri Apr 12 06:43:31 2002 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:02:15 2009 Subject: power control In-Reply-To: <3CB6A075.5AE6626@labtie.mmt.upc.es> References: <3CB6A075.5AE6626@labtie.mmt.upc.es> Message-ID: <20020412094331.I20839@www2> We have good luck with the Pulizzi "Z-line" controllers: http://www.pulizzi.com/ Their high-end units can be networked into an RS-485 chain, so that dozens of units can be controled from a single serial interface. --BOb On Fri, Apr 12, 2002 at 10:53:09AM +0200, Manel Soria wrote: > > We need a power control unit for our 72 nodes cluster. My first > idea was to do it ourselves with a digital i/o card and a set of relais, but > I can't find such a card for Linux. Actually, I have an ISA card that > is perfect for this application but for some reason with PCI bus it > is more difficult. Also, it seems that the "normal" solution is to buy a > comercial APC system. > > Any experiences with in-house made power controls ? Would you > recomend us to buy the APC product ? > > -- > =============================================== > Dr. 
Manel Soria > ETSEIT - Centre Tecnologic de Transferencia de Calor > C/ Colom 11 08222 Terrassa (Barcelona) SPAIN > Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 > E-Mail: manel@labtie.mmt.upc.es > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From timm at fnal.gov Fri Apr 12 07:19:07 2002 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:02:15 2009 Subject: power control In-Reply-To: <3CB69701.B9AC38C5@labtie.mmt.upc.es> Message-ID: We run with the APC units on our cluster. We use the vertical-mount strips which are good for 20 amps apiece. They provide three major benefits. One is that you can sequence the power up to have a delay so that the power draw of all systems coming on at once doesn't saturate your circuit. Second is that you can remote reset a node that is hung and not responding without going into the computer room. Third is that you get a real-time monitor of how much current your systems are drawing. They are somewhat expensive but do have good discounts for educational institutions and non-profits. Steve Timm ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Operating Systems Support Scientific Computing Support Group--Computing Farms Operations On Fri, 12 Apr 2002, Manel Soria wrote: > We need a power control unit for our 72 nodes cluster. My first > idea was to do it ourselves with a digital i/o card and a set of relais, but > I can't find such a card for Linux. Actually, I have an ISA card that > is perfect for this application but for some reason with PCI bus it > is more difficult. Also, it seems that the "normal" solution is to buy a > comercial APC system. > > Any experiences with in-house made power controls ? Would you > recomend us to buy the APC product ? > > -- > =============================================== > Dr. Manel Soria > ETSEIT - Centre Tecnologic de Transferencia de Calor > C/ Colom 11 08222 Terrassa (Barcelona) SPAIN > Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 > E-Mail: manel@labtie.mmt.upc.es > > > From muno at aem.umn.edu Fri Apr 12 07:54:05 2002 From: muno at aem.umn.edu (Ray Muno) Date: Wed Nov 25 01:02:15 2009 Subject: power control In-Reply-To: References: <3CB6A075.5AE6626@labtie.mmt.upc.es> Message-ID: <20020412095405.A10200@aem.umn.edu> We are using a variety of Baytech power control units in our 2 clusters. We have 3 RPC28 units powering 48 1U Dual PIII machines in 2 racks. They have 30A inputs and 21 outlets (20 controlled, 1 always on). With the Dual PIII boxes, I can only run 16 from each strip. Each strip is divided in to a pair of 15A segments, 8 machines per segment. At full load, 10 machines was too much for a 15A segment. I was really suprised at the increase in power draw under load when we first started running these. In addition, there are 2 RPC4-20 running the disk arrays, server boxes and ethernet and Myrinet switches in the 2 racks. All told, 130A availble in the 2 racks, all pretty well utilized. We also have a pair of RPC3-20 powering 2 racks of Alpha machines. These have ethernet interfaces but we decided later it was not worth the added cost. I could not be happier with these units. They can be configured to stage the startup of the machine in sequence so you do not try and power up 48 machines all at one time. 
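A back-of-the-envelope version of that power budgeting, using assumed (not measured) per-node figures:

# Nodes per circuit segment, with a safety margin on the breaker.
WATTS_PER_NODE = 200.0      # assumed draw of a dual-CPU 1U node under load
LINE_VOLTS     = 120.0      # US line voltage
DERATE         = 0.80       # keep a 20% margin on the breaker rating

amps_per_node = WATTS_PER_NODE / LINE_VOLTS
for segment_amps in (15, 20, 30):
    usable = segment_amps * DERATE
    print(segment_amps, "A segment ->", int(usable // amps_per_node), "nodes")
# A 15 A segment derated to 12 A carries about 7 such nodes, which is in the
# same ballpark as the ~8 per segment reported above.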
On Fri, Apr 12, 2002 at 03:15:37PM +0200, Steffen Persvold wrote: > On Fri, 12 Apr 2002, Manel Soria wrote: > > > > > We need a power control unit for our 72 nodes cluster. My first > > idea was to do it ourselves with a digital i/o card and a set of relais, but > > I can't find such a card for Linux. Actually, I have an ISA card that > > is perfect for this application but for some reason with PCI bus it > > is more difficult. Also, it seems that the "normal" solution is to buy a > > comercial APC system. > > > > Any experiences with in-house made power controls ? Would you > > recomend us to buy the APC product ? > > > > We've had success with the Baytech (www.baytechdcd.com) RPC-3 units. The > only disadvantage is that they only have 8 controllable ports (which > means that you need 9 of them...). > > Regards, > -- > Steffen Persvold | Scalable Linux Systems | Try out the world's best > mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ============================================================================= Ray Muno http://www.aem.umn.edu/people/staff/muno University of Minnesota e-mail: muno@aem.umn.edu Aerospace Engineering and Mechanics Phone: (612) 625-9531 110 Union St. S.E. FAX: (612) 626-1558 Minneapolis, Mn 55455 ============================================================================= From patrick at myri.com Fri Apr 12 08:17:08 2002 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB6FA74.6040704@myri.com> Steffen Persvold wrote: > True, but if one of your Myrinet switches breaks down you loose 64 nodes > in a 256 node system (standard "CLOS" configuration). I don't know the > MBTF for Myrinet switches, but I would expect it to be rather high > (redundant power supplies ?). The calculated MTBF of the switches is +50 years. Actually, if all 6 fans go off, it will still work, then the switch will drop more and more packets, then the uC will shutdown the blades one by one if they reach the critical temperature limit. If there is a failure on a blade itself, it will affect only 8 ports. If there is a failure in a crossbar on the backplane, the mapper will use a redondant route (as many redondant routes as crossbars, so a failure in each 8 crossbars on the backplane is required to loose all ports). Chuck made a very nice talk at Cluster2001 about Clos topology. It presents thing very clearly, I like it a lot: http://www.cacr.caltech.edu/cluster2001/program/talks/seitz.pdf Regards. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. 
http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From eugen at leitl.org Fri Apr 12 08:55:02 2002 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:02:15 2009 Subject: [gamma_sw] New release of GAMMA available (fwd) Message-ID: ---------- Forwarded message ---------- Date: Fri, 12 Apr 2002 17:34:47 +0200 (MET DST) From: Giuseppe Ciaccio To: GAMMA mailing list Subject: [gamma_sw] New release of GAMMA available A wonderful, new release of GAMMA is available for download: http://www.disi.unige.it/project/gamma, section "How to install" Main features: 1) A driver for the Netgear GA621/GA622 Gigabit Ethernet adapter is now provided. The driver has been excellently implemented by Marco Ehlert (mehlert@cs.uni-potsdam.de), with support of prof. Bettina Schnor (schnor@cs.uni-potsdam.de) and in cooperation with myself (supported by prof. Schnor during a nice stage at Potsdam). I thank Marco and Bettina very much for this beautiful experience. The driver has been tested on a 16-nodes cluster using the GA621 adapters. We still miss tests on the GA622 (which should be backward-compatible). Performance numbers are impressive. The Potsdam testbed was a pair of back-to-back connected PCs, each with CPU Intel Pentium III 1 GHz motherboard: SuperMicro 370DE6 (chipset: ServerSet III HE-SL) 133 MHz FSB PCI bus 66 MHz, 64 bit Netgear GA621 adapter, dedicated to GAMMA Linux 2.4.16 + GAMMA On such a testbed, Marco got the following numbers: MTU size Latency (usec) Throughput (MByte/s) 1500 8.5 118.5 4116 8.5 122 2) Minor changes to the GAMMA user API: the family of set_port() routines has been slightly rearranged. This has implications on MPI/GAMMA, a new release of which is also available for download: http://www.disi.unige.it/project/gamma/mpigamma Older versions of MPI/GAMMA will no longer compile under the current version of GAMMA. 3) Documentation has been updated. The mysterious lock-up problems reported by someone on this mailing list might have been caused by the use of gcc 2.96. Still investigating...but I'm not yet able to reproduce the bug here (because I don't use gcc 2.96 ?). Enjoy! Giuseppe Ciaccio http://www.disi.unige.it/person/CiaccioG/ DISI - Universita' di Genova via Dodecaneso 35 16146 Genova, Italy phone +39 10 353 6638 fax +39 010 3536699 ciaccio@disi.unige.it ------------------------------------------------------------------------ _______________________________________________ gamma_sw mailing list gamma_sw@lists.dsi.uniroma1.it http://lists.dsi.uniroma1.it/mailman/listinfo/gamma_sw From hungjunglu at yahoo.com Fri Apr 12 08:56:35 2002 From: hungjunglu at yahoo.com (Hung Jung Lu) Date: Wed Nov 25 01:02:15 2009 Subject: BLAS-1, AMD, Pentium, gcc Message-ID: <20020412155635.92564.qmail@web12605.mail.yahoo.com> Hi, I am thinking in migrating some calculation programs from Windows to Linux, maybe eventually using a Beowulf cluster. However, I am kind of worried after I read in the mailing list archive about lack of CPU-optimized BLAS-1 code in Linux systems. Currently I run on a Wintel (Windows+Pentium) machine, and I know it's substantially faster than equivalent AMD machine, because I use the Intel's BLAS (MKL) library. (I apologize for any misapprehensions in what follows... I am only starting to explore in this arena.) (1) Does anyone know when gcc will have memory prefetching features? Any time frame? 
I can notice very significant performance improvement on my Wintel machine, and I think it's due to memory prefetching. (2) I am a bit confused on the following issue: Intel does release MKL for Linux. So, does this mean that if I use Pentium, I still get full benefit of the CPU-optimized features in BLAS-1, despite of gcc does not do memory prefetching? How is this possible? (3) Related to the above: for general linear algebra operations, is Pentium processor then better than AMD, since Intel has the machine-optimized BLAS library? I get contradictory information sometimes... I've seen somewhere that Pentium-4 compares unfavorably with AMD chips in calculation speed... Any opinions? thanks, Hung Jung Lu __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From rochus.schmid at ch.tum.de Fri Apr 12 09:33:18 2002 From: rochus.schmid at ch.tum.de (Dr. Rochus Schmid) Date: Wed Nov 25 01:02:15 2009 Subject: k7s5a mobo based cluster Message-ID: <3CB70C4E.9D1E720E@ch.tum.de> dear wulfers, i recently assembled a tiny and cheap (i know :-) cluster using the ecs k7s5a mobo (SIS 735 chipset). the board is very cheap (~75 ?) and comes with FE (SIS900) onboard. just want to tell about my experience (other hardware, netbooot + stream / netpipe results). i would be happy to know of others using this board. this mail also contains some questions ... maybe someone can answer or help me here? if this doesn't interst you please skip - apologies for the bandwidth. ############# HARDWARE (currently 4 nodes .. hope to get 4 more :-) mobo: k7s5a cpu: athlonxp 1,4 ghz (1600+) ram: 256 MB DDR (266MHz, CL2) graphics: various pci/agp graphics cards i could find floppy small tower with 250W PS nodes are diskless, master has an additional 40GB IDE disk. switch: D-Link DES 1008D 8 port switch. ############## POWER /GRAPHICS the cluster has continuously been up for about 3 weeks now with quite some load for most of the time. as far as i can tell, the 250W seems to be ok for the board and the 1,4 ghz athlonxp. the ami-bios does not allow booting without a graphics adapter. someone on the net (using a lot of the boards for a SETI@home "farm" told me that he did not get around it even with teaking tools for the ami-bios. i am happy to have a console for maintenance but one has to find a cheapo graphics card ... anyone out there managed to avoid this? ############## NETBOOT / OS i run a RH7.2 on it with a 2.4.17 kernel with NFS-root. the bios supports the RPL protocol for netbooting. i tried the rpld for linux and the board seems to communicate with the rpl-server and download something, but i didnt get it to boot. the rpld-developers sent me some patch to "switch off a DMA channel" of the onboard NIC but i have to admit that i didnt really understand what to do, nor did i try it. i currently boot from a syslinux floppy and use NFS-root. did soemone manage to netboot linux with this hardware? 
############### STREAM because of the comments on the gcc versions 2.96 versus 2.95 issue mentioned on the ATLAS webpages i reinstalled the gcc 2.95 and found differences also for stream results and therefore i will post both here (both compiled with -O2 for comparison) Array size = 2000000, Offset = 0 gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98) -O2 Function Rate (MB/s) RMS time Min time Max time Copy: 666.0980 0.0489 0.0480 0.0559 Scale: 585.3939 0.0547 0.0547 0.0549 Add: 726.1178 0.0662 0.0661 0.0663 Triad: 679.6655 0.0707 0.0706 0.0707 gcc version 2.95.3 20010315 (release) -O2 Function Rate (MB/s) RMS time Min time Max time Copy: 727.6031 0.0440 0.0440 0.0443 Scale: 627.5864 0.0526 0.0510 0.0649 Add: 798.1775 0.0602 0.0601 0.0603 Triad: 727.6691 0.0660 0.0660 0.0661 i guess it is obvious to reinstall gcc-2.95 when using RH7.1 or RH7.2. these results are not as good as reported recently for the nforce chipset. i tried to set the bios settings for the ddr-ram to optimal, but i didnt test/experiment. ########## NetPipe-2.4 The following results are NOT from a crossconnect cable but measured throug the D-Link switch!! kernel 2.4.17 / NIC-driver SIS900 for MPI: LAM-MPI 6.5.1 NPtcp: latency: 33 us bandwidth: ~89.7 MBit/s NPmpi: latency: 41 us bandwidth: ~82 MBit/s (maximux at about 85 MBit/s) the latency of around 40 microsec seems to be very low as far as i can tell from the information on the net (i am absolutly a beginner in this field). is there anything one can seriously do wrong? i tried it a couple of times. between different nodes always with basically the same result. ################### i hope i did not anoy the pros on this list too much, and this is helpfull for comparison. again: please contact me off list if you also use this type of hardware. thanks and best greetings from munich, rochus -- Dr. Rochus Schmid Technische Universit?t M?nchen Lehrstuhl f. Anorganische Chemie Lichtenbergstrasse 4, 85747 Garching Tel. ++49 89 2891 3174 Fax. ++49 89 2891 3473 Email rochus.schmid@ch.tum.de From ctierney at hpti.com Fri Apr 12 10:01:51 2002 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB67338.8080906@myri.com>; from patrick@myri.com on Fri, Apr 12, 2002 at 01:40:08AM -0400 References: <3CB67338.8080906@myri.com> Message-ID: <20020412110151.A1491@hpti.com> On Fri, Apr 12, 2002 at 01:40:08AM -0400, Patrick Geoffray wrote: > Steffen Persvold wrote: > > >>I talked to a guy at SC2002 from Quadrics and he said > >>that list pricing on a Quadrics network was about $3500 > >>per node when you are in the 100s of nodes and up. > >>The price includes the cards, cables, switches, > >>etc. This doesn't include any sort of discount that you > >>might get. Myrinet is about $2000 for an equivelent > >>network at list price. Dolphin/SCI falls around $2245 list > >>per node (if the system is > 144 nodes and you have to get > >>the 3d card). > > > > This is list prices for the cards only, right ? > > Not for Myrinet. Actually $2000 per node is the total cost > (NIC/cable/port/software) for the high-end products (with L9/200 MHz), > should be more like $1500 for low-end ones. Craig is spoiled, only buys > the top stuff :-) Sorry Patrick. The problem with trying to state numbers is that if you get it wrong, then the real knowledgeable ones can point it out. I figure out list cost on a 256 node system at about $2000 before for basic hardware. I as wrong. 
I reworked it and it is $1500 for 256 (and would be the same for 512 and 1024). I thought I had decent information. What I was trying to provide was info on the three options. The SCI price is per node (cards, cables, software). It is $2245 list. This would be appropriate for systems over 144 nodes. The Quadrics number I got from the rep might have been the sales number, and not a perfect comparsion to all the required hardware for a system of a few hundred nodes. Craig From ctierney at hpti.com Fri Apr 12 10:15:52 2002 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:02:15 2009 Subject: What could be the performance of my cluster In-Reply-To: <20020412103234.7580.qmail@web10508.mail.yahoo.com>; from suraj_peri@yahoo.com on Fri, Apr 12, 2002 at 03:32:34AM -0700 References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> Message-ID: <20020412111552.B1491@hpti.com> All my experience is with oceanography and atmospheric applications. Is the BLAST code something that spends lots of time trying doing lots of little calculations, or doing one big calculation? How important is the speed of access to the database? What is the memory footprint of the code when it runs on the DS20E? Craig On Fri, Apr 12, 2002 at 03:32:34AM -0700, Suraj Peri wrote: > Hi Craig, > Many thanks for your mail. please excuse me for asking > a dumb question and I am novice in this area. > I am interested in using this cluster for BLAST > purposes. I want to store ESTs( Expressed Sequence > Tags) and GenBank ( nucleotide sequence database) and > GenPept ( Protein sequence database) and total > predicted protein sets of Human genome. > I will use BLAST ( basic local alignment search tool > algorithm) on this cluster. As the computataions are > intensive and time consuming. > So I wanted to compare the AlphaServer DS20E and my > cluster in their computing abilities. > Because there are no one in my friend circles no about > this. Please help me if you have used clusters for > BLAST purpose. > thanks > Suraj. > > --- Craig Tierney wrote: > > It depends on what you are trying to do (doesn't > > everyone > > love that answer). > > > > The number of flops your cluster can do should > > be equal to: > > > > flops = (no. of cpus) * (Mhz) * (flops per hz) > > > > So for your cluster > > > > flops = 8 * 1.53 Ghz * 2 > > > > I am assuming that with SSE you can get 2 flops > > per cycle. > > > > flops = 24.48 Gflops > > > > Now, there are some issues with this. First, you > > are never > > going to get 1.53*2 Gflops out of a single > > processor. Second, > > leveraging all 8 cpus to get their maximum is going > > to be > > difficult if there is any communication between the > > nodes. > > > > Compilers play a big role in extracting the best > > performance > > out of the system. If you don't have a commerical > > compiler > > from the likes of Intel or Portland Group, I highly > > recommend > > getting one. You only have to purchase the compiler > > for where > > you compile, and not where you run. You can get > > away with > > one copy of the compiler on your server. > > > > If you are trying to compare the AMD system to the > > DS20E system, > > it will depend on what you are actually trying to > > do. 
If > > you are running single precision floating point > > codes that do > > not require all the memory bandwidth a DS20E > > provides, I would > > think that within 10% that AMD processor will do the > > work > > of one 833 Mhz Alpha Cpu (You didn't say if you had > > 2 cpus > > in your DS20e). At least this is what I am seeing > > for my codes when comparing Dual Xeon's, Dual AMD's, > > and > > dual API 833 boxes. > > > > Craig > > > > > > > > > > > > On Sat, Apr 06, 2002 at 03:35:45AM -0800, Suraj Peri > > wrote: > > > Hi group, > > > I was calculating the performance of my cluster. > > The > > > features are > > > > > > 1. 8 nodes > > > 2. Processor: AMD Athlon XP 1800+ > > > 3. 8 CPUs > > > 4. 8*1.5 GB DDR RAM > > > 5. 1 Server with 2 processorts with AMD MP 1800+ > > and > > > 2GB DDR RAM > > > > > > I calculated this to be 48 Mflops . Is this > > correct ? > > > if not, what is the correct performance of my > > cluster. > > > I also comparatively calculated that my cluster > > would > > > be 3 times faster than AlphaServer DS20E ( 833 MHz > > > alpha 64 bit processor, 4 GB max memory) > > > > > > Is my calculation correct or wrong? please help me > > > ASAP. thanks in advance. > > > > > > cheers > > > suraj. > > > > > > ===== > > > PIL/BMB/SDU/DK > > > > > > __________________________________________________ > > > Do You Yahoo!? > > > Yahoo! Tax Center - online filing with TurboTax > > > http://taxes.yahoo.com/ > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or > > unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > > Craig Tierney (ctierney@hpti.com) > > > ===== > PIL/BMB/SDU/DK > > __________________________________________________ > Do You Yahoo!? > Yahoo! Tax Center - online filing with TurboTax > http://taxes.yahoo.com/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney@hpti.com) From tim at dolphinics.com Fri Apr 12 10:02:04 2002 From: tim at dolphinics.com (Tim Wilcox) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? References: <3CB5DA04.AF2C609F@lfbs.rwth-aachen.de> <20020411132650.A32674@hpti.com> Message-ID: <3CB7130C.60107@dolphinics.com> Craig Tierney wrote: >I talked to a guy at SC2002 from Quadrics and he said >that list pricing on a Quadrics network was about $3500 >per node when you are in the 100s of nodes and up. >The price includes the cards, cables, switches, >etc. This doesn't include any sort of discount that you >might get. Myrinet is about $2000 for an equivelent >network at list price. Dolphin/SCI falls around $2245 list >per node (if the system is > 144 nodes and you have to get >the 3d card). > A couple of corrections to this, the 2D card lists at $1695 per node and is suitable for up to 256 nodes. No one has built one that size and it is correct that is it is recommended to use 3D (lists $2245/node) for larger than 144 nodes. This is due to a potential saturation of a ring for certain communication patterns, it is not always the case. By going to 3D you shorten the rings and avoid this up to 1728 nodes. Anyone interested in building one? > > >I heard that Quadrics had a customer that just had to have >an Intel/Quadrics system so either they or he was working >on porting the drivers. 
The web page says they support >Linux and Tru64. You could probably get the hardware without >going through Compaq, but Compaq is most likely buying up >most of the supply. > >Craig > The Quadrics looks interesting, but I haven't the resources to afford the pleasure of playing with it. The major issue with it is pricing and lack of nodes out there using it pricing. Myricom and Dolphin tend to come to about the same price per node, chalk it up to friendly competition. Regards, Tim Wilcox From djholm at fnal.gov Fri Apr 12 10:36:22 2002 From: djholm at fnal.gov (Don Holmgren) Date: Wed Nov 25 01:02:15 2009 Subject: BLAS-1, AMD, Pentium, gcc In-Reply-To: <20020412155635.92564.qmail@web12605.mail.yahoo.com> Message-ID: On Fri, 12 Apr 2002, Hung Jung Lu wrote: > Hi, > > I am thinking in migrating some calculation programs > from Windows to Linux, maybe eventually using a > Beowulf cluster. However, I am kind of worried after I > read in the mailing list archive about lack of > CPU-optimized BLAS-1 code in Linux systems. Currently > I run on a Wintel (Windows+Pentium) machine, and I > know it's substantially faster than equivalent AMD > machine, because I use the Intel's BLAS (MKL) library. > (I apologize for any misapprehensions in what > follows... I am only starting to explore in this > arena.) > > (1) Does anyone know when gcc will have memory > prefetching features? Any time frame? I can notice > very significant performance improvement on my Wintel > machine, and I think it's due to memory prefetching. If you mean, "when will gcc's optimizer do automatic prefetching?", I have no idea. But, many programmers have been doing manual prefetching with gcc for quite a while. If you don't mind defining and using assembler macros, gcc handles it just fine now. Here's an example: #define prefetch_loc(addr) \ __asm__ __volatile__ ("prefetchnta %0" \ : \ : \ "m" (*(((char*)(((unsigned int)(addr))&~0x7f))))) > (2) I am a bit confused on the following issue: Intel > does release MKL for Linux. So, does this mean that if > I use Pentium, I still get full benefit of the > CPU-optimized features in BLAS-1, despite of gcc does > not do memory prefetching? How is this possible? The Intel compiler produces object files compatible with gcc, and vice versa. I would assume they implemented the library with the Intel compiler, which has full SSE/SSE2 support (including prefetching). They list the MKL for Linux as compatible with both gnu and Intel compilers. > (3) Related to the above: for general linear algebra > operations, is Pentium processor then better than AMD, > since Intel has the machine-optimized BLAS library? I > get contradictory information sometimes... I've seen > somewhere that Pentium-4 compares unfavorably with AMD > chips in calculation speed... Any opinions? > > thanks, > > Hung Jung Lu For the very simple SU3 linear algebra (3X3 complex matrices and 3X1 complex vectors) used in our codes, the Pentium 4 outperforms the Athlon on most of our SSE-assisted routines. See the table near the bottom of http://qcdhome.fnal.gov/sse/inline.html for Mflops per gigahertz on various routines for P-III, P4, and Athlon. Perhaps re-coding in 3DNow! would give the Athlon a boost. For our codes, which are bound by memory bandwidth, P4's do significantly better than Athlons because of the faster front side bus (400 Mhz effective). 
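To show how a macro like the prefetch_loc above might actually be dropped into a loop, here is a small sketch (gcc on x86 only); the daxpy-style loop, the prefetch distance, and the switch to unsigned long in the pointer cast are illustrative choices of mine, not anything from the original post.

#include <stddef.h>

/* Same idea as the macro above; the mask is done through unsigned long here
   so the cast is also sane on 64-bit boxes (gcc inline asm, x86 only). */
#define prefetch_loc(addr) \
    __asm__ __volatile__ ("prefetchnta %0" \
                          : \
                          : \
                          "m" (*(((char*)(((unsigned long)(addr)) & ~0x7fUL)))))

/* y[i] += a * x[i], issuing prefetches a fixed distance ahead of the loop.
   PF_DIST (in elements) is a guess; the right distance is machine-dependent. */
#define PF_DIST 16

void daxpy_prefetch(size_t n, double a, const double *x, double *y)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            prefetch_loc(&x[i + PF_DIST]);
        y[i] += a * x[i];
    }
}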
See http://qcdhome.fnal.gov/qcdstream/compare.qcdstream for a table comparing memory bandwidth and SU3 linear algebra performance on a 1.2 GHz Athlon, 1.4 GHz P4, and 1.7 GHz P7 (see http://qcdhome.fnal.gov/qcdstream/ for information about this benchmark). Don Holmgren Fermilab From sp at scali.com Fri Apr 12 10:50:26 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <20020412110151.A1491@hpti.com> Message-ID: On Fri, 12 Apr 2002, Craig Tierney wrote: > On Fri, Apr 12, 2002 at 01:40:08AM -0400, Patrick Geoffray wrote: > > Steffen Persvold wrote: > > > > >>I talked to a guy at SC2002 from Quadrics and he said > > >>that list pricing on a Quadrics network was about $3500 > > >>per node when you are in the 100s of nodes and up. > > >>The price includes the cards, cables, switches, > > >>etc. This doesn't include any sort of discount that you > > >>might get. Myrinet is about $2000 for an equivelent > > >>network at list price. Dolphin/SCI falls around $2245 list > > >>per node (if the system is > 144 nodes and you have to get > > >>the 3d card). > > > > > > This is list prices for the cards only, right ? > > > > Not for Myrinet. Actually $2000 per node is the total cost > > (NIC/cable/port/software) for the high-end products (with L9/200 MHz), > > should be more like $1500 for low-end ones. Craig is spoiled, only buys > > the top stuff :-) > > Sorry Patrick. The problem with trying to state numbers is that > if you get it wrong, then the real knowledgeable ones can point it out. > > I figure out list cost on a 256 node system at about $2000 before for > basic hardware. I as wrong. I reworked it and it is $1500 for > 256 (and would be the same for 512 and 1024). So what is wron with my calculations : 256 node L9/2MB/133MHz config : M3F-PCI64B-2 NICs 256 * $1,195 = $305,920 M3-E128 Switch enclosures 6 * $12,800 = $76,800 M3-SW16-8F "Leaf" cards 32 * $2,400 = $76,800 M3-SPINE-8F "Spine" cards 64 * $1,600 = $102,400 ----------------------------------------------------- Total cost = $561,920 Node cost = $2,195 and for a L9/2MB/200MHz config : M3F-PCI64B-2 NICs 256 * $1,495 = $382,720 M3-E128 Switch enclosures 6 * $12,800 = $76,800 M3-SW16-8F "Leaf" cards 32 * $2,400 = $76,800 M3-SPINE-8F "Spine" cards 64 * $1,600 = $102,400 ----------------------------------------------------- Total cost = $638,720 Node cost = $2,495 And this is without cable cost (since I don't quite know the cable requirements for the total system, but atleast it is approx $100 per node). > > I thought I had decent information. What I was trying to provide was > info on the three options. The SCI price is per node (cards, cables, > software). It is $2245 list. This would be appropriate for systems over > 144 nodes. > Actually, Wulfkit3 comes in two flavors wether you want 1U or can manage with a 2U (or higher) solutuion. The 1U is $2,445 and the 2U solution is $2,245 per node. > The Quadrics number I got from the rep might have been the sales number, > and not a perfect comparsion to all the required hardware for a system of > a few hundred nodes. > Now we have price comparisons for the interconnects (SCI,Myrinet and Quadrics). What about performance ? Does anyone have NAS/PMB numbers for ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks HE-SL based cluster). 
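For anyone following the arithmetic, the per-node figures above come straight out of a simple sum; a small C sketch of the same calculation, using only the list prices quoted in this message and leaving out cables and discounts exactly as noted.

#include <stdio.h>

/* Per-node cost of the 256-node L9/2MB/133MHz Myrinet config, from the list
   prices quoted above (NICs + enclosures + leaf + spine; cables excluded). */
int main(void)
{
    const int nodes = 256;

    double nics       = 256 * 1195.0;   /* M3F-PCI64B-2 NICs         */
    double enclosures =   6 * 12800.0;  /* M3-E128 switch enclosures */
    double leaf       =  32 * 2400.0;   /* M3-SW16-8F "leaf" cards   */
    double spine      =  64 * 1600.0;   /* M3-SPINE-8F "spine" cards */

    double total = nics + enclosures + leaf + spine;
    printf("total = $%.0f, per node = $%.0f\n", total, total / nodes);
    /* prints: total = $561920, per node = $2195 */
    return 0;
}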
Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From lindahl at keyresearch.com Fri Apr 12 10:43:35 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:15 2009 Subject: What could be the performance of my cluster In-Reply-To: <20020412111552.B1491@hpti.com>; from ctierney@hpti.com on Fri, Apr 12, 2002 at 11:15:52AM -0600 References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> <20020412111552.B1491@hpti.com> Message-ID: <20020412134335.B1810@wumpus.skymv.com> On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote: > Is the BLAST code something that spends lots > of time trying doing lots of little calculations, > or doing one big calculation? How important is > the speed of access to the database? What is > the memory footprint of the code when it runs > on the DS20E? It depends. What BLAST does is compare a set of sequences against a big database of sequences. The databases come in small, medium, and large (bigger than 2 GByte) sizes; the sequences can either be a single sequence (imagine a researcher looking up a single protein using a web interface) or a large set of them. If it's a large set, the problem is embarrassingly parallel. The BLAST implementation used by most people isn't parallel. It can be fairly easily parallelized to divide the big database up into pieces. People build fairly different clusters to run BLAST depending on their details. The guys at Celera Geonmics didn't want to use a parallel version, and their database is bigger than 2 GBytes, so they bought Alphas. Most people have small enough databases to fit into 2 GBytes, but search against 1 sequence at a time, so they can't afford to read the entire database over NFS every time, and keep it on a local disk. greg From ctierney at hpti.com Fri Apr 12 11:19:13 2002 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: ; from sp@scali.com on Fri, Apr 12, 2002 at 07:50:26PM +0200 References: <20020412110151.A1491@hpti.com> Message-ID: <20020412121913.B1508@hpti.com> On Fri, Apr 12, 2002 at 07:50:26PM +0200, Steffen Persvold wrote: > On Fri, 12 Apr 2002, Craig Tierney wrote: > > > On Fri, Apr 12, 2002 at 01:40:08AM -0400, Patrick Geoffray wrote: > > > Steffen Persvold wrote: > > > > > > >>I talked to a guy at SC2002 from Quadrics and he said > > > >>that list pricing on a Quadrics network was about $3500 > > > >>per node when you are in the 100s of nodes and up. > > > >>The price includes the cards, cables, switches, > > > >>etc. This doesn't include any sort of discount that you > > > >>might get. Myrinet is about $2000 for an equivelent > > > >>network at list price. Dolphin/SCI falls around $2245 list > > > >>per node (if the system is > 144 nodes and you have to get > > > >>the 3d card). > > > > > > > > This is list prices for the cards only, right ? > > > > > > Not for Myrinet. Actually $2000 per node is the total cost > > > (NIC/cable/port/software) for the high-end products (with L9/200 MHz), > > > should be more like $1500 for low-end ones. Craig is spoiled, only buys > > > the top stuff :-) > > > > Sorry Patrick. 
The problem with trying to state numbers is that > > if you get it wrong, then the real knowledgeable ones can point it out. > > > > I figure out list cost on a 256 node system at about $2000 before for > > basic hardware. I as wrong. I reworked it and it is $1500 for > > 256 (and would be the same for 512 and 1024). Your calcuations are fine. I shouldn't be allowed to add and multiply numbers. When I redid the numbers I redid them incorrectly. From list prices, cables are about $100 each, and you need two per card. So add about $200 to your prices. > > So what is wron with my calculations : > > 256 node L9/2MB/133MHz config : > > M3F-PCI64B-2 NICs 256 * $1,195 = $305,920 > M3-E128 Switch enclosures 6 * $12,800 = $76,800 > M3-SW16-8F "Leaf" cards 32 * $2,400 = $76,800 > M3-SPINE-8F "Spine" cards 64 * $1,600 = $102,400 > ----------------------------------------------------- > Total cost = $561,920 > Node cost = $2,195 > > and for a L9/2MB/200MHz config : > > M3F-PCI64B-2 NICs 256 * $1,495 = $382,720 > M3-E128 Switch enclosures 6 * $12,800 = $76,800 > M3-SW16-8F "Leaf" cards 32 * $2,400 = $76,800 > M3-SPINE-8F "Spine" cards 64 * $1,600 = $102,400 > ----------------------------------------------------- > Total cost = $638,720 > Node cost = $2,495 > > And this is without cable cost (since I don't quite know the cable > requirements for the total system, but atleast it is approx $100 per > node). > > > > > I thought I had decent information. What I was trying to provide was > > info on the three options. The SCI price is per node (cards, cables, > > software). It is $2245 list. This would be appropriate for systems over > > 144 nodes. > > > > Actually, Wulfkit3 comes in two flavors wether you want 1U or can manage > with a 2U (or higher) solutuion. The 1U is $2,445 and the 2U solution is > $2,245 per node. > > > The Quadrics number I got from the rep might have been the sales number, > > and not a perfect comparsion to all the required hardware for a system of > > a few hundred nodes. > > > > Now we have price comparisons for the interconnects (SCI,Myrinet and > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks > HE-SL based cluster). I don't think that anyone is going to have numbers on the same hardware. Too bad. It would be interesting to see the differences. However, that may end all the discussions and that would be no fun. Craig From patrick at myri.com Fri Apr 12 12:48:00 2002 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CB739F0.6000109@myri.com> Steffen Persvold wrote: >>I figure out list cost on a 256 node system at about $2000 before for >>basic hardware. I as wrong. I reworked it and it is $1500 for >>256 (and would be the same for 512 and 1024). >> > So what is wron with my calculations : > > 256 node L9/2MB/133MHz config : > Node cost = $2,195 > and for a L9/2MB/200MHz config : > Node cost = $2,495 Nothing, it's right for 256 nodes. However: 128 nodes L9/133 MHz config: Node cost = $1,595 128 nodes L9/200 MHz config: Node cost = $1,895 For more than 128 ports, the number of switches increases to keep a guaranteed full-bissection, it adds about $500 per node. However, up to 128 nodes, you need only one switch. and the numbers I gave are correct. 
The switchless cost model makes sense for configs > than the biggest switch size for switched technologies, ie. 128 ports for Quadrics and Myrinet. Surprisingly, the largest SCI cluster is, AFAIK, 132 nodes ;-) > Now we have price comparisons for the interconnects (SCI,Myrinet and > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks > HE-SL based cluster). Ok, I will say again what I think about these comparaisons: it's already hard to compare dollars (what about discount, what about support, what about software, etc) despite that it the same dollars, it's wasting time to do that for micro-benchmarks. It's something you do when you want to publish something in a conference next to a beach. When a customer asks me about performance, I don't give him my NAS or PMB numbers, he doesn't care. He wants access to a XXX nodes machine to play with and run his set of applications, or he gives a list of codes to the vendors for the bid and the vendors guarantee the results because it's used officially in the bid process. If someone buys a machine because the NAS look pretty and his CFD code sucks, this guy will take his stuffs and look for a new job. Do you spend time to tune NAS ? I don't. People already told me that the NAS LU test sucks on MPICH-GM. Well, the LU algorithm in HPL is much better. How many application behaves like the NAS LU, how many like HPL ? If a customer comes to me because his code behaves like NAS LU, I will tell him what to tune in his code to be more efficient. The pitfall with benchmarks is that you want to tune your MPI implementation to looks good on them. In real world, you cannot expect to run efficiently a code on a machine without tuning it, specially with MPI. My 2 pennies Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From robert at bay13.de Fri Apr 12 12:25:06 2002 From: robert at bay13.de (Robert Depenbrock) Date: Wed Nov 25 01:02:15 2009 Subject: What could be the performance of my cluster References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> <20020412111552.B1491@hpti.com> <20020412134335.B1810@wumpus.skymv.com> Message-ID: <3CB73492.78900CF4@bay13.de> Greg Lindahl wrote: > Hi Greg, > On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig Tierney wrote: > > > Is the BLAST code something that spends lots > > of time trying doing lots of little calculations, > > or doing one big calculation? How important is > > the speed of access to the database? What is > > the memory footprint of the code when it runs > > on the DS20E? > > It depends. > > What BLAST does is compare a set of sequences against a big database of > sequences. The databases come in small, medium, and large (bigger than > 2 GByte) sizes; the sequences can either be a single sequence (imagine > a researcher looking up a single protein using a web interface) or a > large set of them. If it's a large set, the problem is embarrassingly > parallel. > > The BLAST implementation used by most people isn't parallel. It can be > fairly easily parallelized to divide the big database up into pieces. 
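A minimal MPI sketch of the database-splitting approach described above: each rank searches its own subset of pre-split database chunks independently. The chunk count and the search_chunk routine are placeholders for illustration, not part of any real BLAST distribution.

#include <mpi.h>
#include <stdio.h>

/* Placeholder for whatever per-chunk search is run (e.g. a serial BLAST
   over one pre-split piece of the database).  Hypothetical. */
static void search_chunk(int chunk_id)
{
    printf("searching database chunk %d\n", chunk_id);
}

int main(int argc, char **argv)
{
    int rank, size, c;
    const int nchunks = 64;   /* number of database pieces -- made up */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Embarrassingly parallel: rank r takes chunks r, r+size, r+2*size, ... */
    for (c = rank; c < nchunks; c += size)
        search_chunk(c);

    /* Results would be merged here; scores are only comparable across
       chunks if the statistics account for the full database size. */
    MPI_Finalize();
    return 0;
}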
> > People build fairly different clusters to run BLAST depending on their > details. The guys at Celera Geonmics didn't want to use a parallel > version, and their database is bigger than 2 GBytes, so they bought > Alphas. Most people have small enough databases to fit into 2 GBytes, > but search against 1 sequence at a time, so they can't afford to read > the entire database over NFS every time, and keep it on a local disk. Do you have some sample proteins and databases ? I would like to test some machines i have availble to mess around a little bit. (HP PA-Risc Series, SUN Sparc Fire, Itanium, Power PC). I would like to build a little benchmark around these datasets. regards Robert Depenbrock -- nic-hdl RD-RIPE http://www.bay13.de/ e-mail: robert@bay13.de Fingerprint: 1CEF 67DC 52D7 252A 3BCD 9BC4 2C0E AC87 6830 F5DD From sp at scali.com Fri Apr 12 13:41:23 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB739F0.6000109@myri.com> Message-ID: On Fri, 12 Apr 2002, Patrick Geoffray wrote: > Steffen Persvold wrote: > > >>I figure out list cost on a 256 node system at about $2000 before for > >>basic hardware. I as wrong. I reworked it and it is $1500 for > >>256 (and would be the same for 512 and 1024). > >> > > > So what is wron with my calculations : > > > > 256 node L9/2MB/133MHz config : > > Node cost = $2,195 > > and for a L9/2MB/200MHz config : > > Node cost = $2,495 > > Nothing, it's right for 256 nodes. However: > > 128 nodes L9/133 MHz config: > Node cost = $1,595 > 128 nodes L9/200 MHz config: > Node cost = $1,895 > > For more than 128 ports, the number of switches increases to keep a > guaranteed full-bissection, it adds about $500 per node. However, up to > 128 nodes, you need only one switch. and the numbers I gave are correct. > Yes, I was just questioning Craig's numbers. I was actually suprised that the Myrinet node cost didn't increase more when going from 128 to 256 nodes since it basically involves a lot more hardware (i.e 4 additional switch enclousures, and 64 additional "spine" cards). > The switchless cost model makes sense for configs > than the biggest > switch size for switched technologies, ie. 128 ports for Quadrics and > Myrinet. Surprisingly, the largest SCI cluster is, AFAIK, 132 nodes ;-) > The largest SCI cluster (atleast switchless) is indeed 132 nodes. > > Now we have price comparisons for the interconnects (SCI,Myrinet and > > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks > > HE-SL based cluster). > > Ok, I will say again what I think about these comparaisons: it's already > hard to compare dollars (what about discount, what about support, what > about software, etc) despite that it the same dollars, it's wasting time > to do that for micro-benchmarks. It's something you do when you want to > publish something in a conference next to a beach. > When a customer asks me about performance, I don't give him my NAS or > PMB numbers, he doesn't care. He wants access to a XXX nodes machine to > play with and run his set of applications, or he gives a list of codes > to the vendors for the bid and the vendors guarantee the results because > it's used officially in the bid process. 
If someone buys a machine > because the NAS look pretty and his CFD code sucks, this guy will take > his stuffs and look for a new job. > > Do you spend time to tune NAS ? I don't. People already told me that the > NAS LU test sucks on MPICH-GM. Well, the LU algorithm in HPL is much > better. How many application behaves like the NAS LU, how many like HPL > ? If a customer comes to me because his code behaves like NAS LU, I will > tell him what to tune in his code to be more efficient. > > The pitfall with benchmarks is that you want to tune your MPI > implementation to looks good on them. In real world, you cannot expect > to run efficiently a code on a machine without tuning it, specially with > MPI. > I think that most people on this list agrees that it is really the customers application that counts, not NAS nor PMB numbers (and no, I don't spend much time tuning NAS it was a bad example). I also agree with most of your other statements, however I still think that atleast a MPI specific benchmark such as PMB (don't know if it's available for PVM...) will give the customers an initial feeling on what interconnect they need (if they know how their application is architected). > My 2 pennies > Thanks, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From joachim at sonne.lfbs.rwth-aachen.de Fri Apr 12 14:02:58 2002 From: joachim at sonne.lfbs.rwth-aachen.de (joachim) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CB739F0.6000109@myri.com> from Patrick Geoffray at "Apr 12, 2002 03:48:00 pm" Message-ID: <200204122102.XAA02537@wikkit.lfbs.rwth-aachen.de> Fully d'accord, but: comparing applications like MM5, GROMACS, ... based on the interconnect and MPI library (on otherwise identical systems) *would* make sense. At least for interconnect and MPI designers, and also for marketing (after carefully chosing the right cases...). And maybe for some buying decisions for smaller "home built" systems. We can discuss this on CAC'02 on monday. ;) regards, Joachim From lindahl at keyresearch.com Fri Apr 12 15:23:46 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:15 2009 Subject: What could be the performance of my cluster In-Reply-To: <3CB73492.78900CF4@bay13.de>; from robert@bay13.de on Fri, Apr 12, 2002 at 09:25:06PM +0200 References: <20020411125920.D32605@hpti.com> <20020412103234.7580.qmail@web10508.mail.yahoo.com> <20020412111552.B1491@hpti.com> <20020412134335.B1810@wumpus.skymv.com> <3CB73492.78900CF4@bay13.de> Message-ID: <20020412182346.A1990@wumpus.skymv.com> On Fri, Apr 12, 2002 at 09:25:06PM +0200, Robert Depenbrock wrote: > Do you have some sample proteins and databases ? Robert, I almost got a benchmark suite for BLAST together, but got side-tracked before I had anything useful. Just like 10,000 other projects ;-) greg From mas at ucla.edu Fri Apr 12 15:37:50 2002 From: mas at ucla.edu (Michael Stein) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? (i860) In-Reply-To: ; from sp@scali.com on Fri, Apr 12, 2002 at 05:37:17AM +0200 References: Message-ID: <20020412153750.A5315@mas1.ats.ucla.edu> > bandwidth was highest on IA64 460GX...). 
I can't imagine that the i860 can > actually perform as well as 340MByte/sec since the Hub-Link (between > the MCH and the P64H) has a limit of 266MByte/sec (AFAIK) .... The data sheet for the i860 shows 3 separate Hub-links A, B and C. A is 266 MByte/sec (and typically runs the 33 Mhz 32 bit stuff). B and C are 533 MByte/sec each and drive the P64Hs. (16 bits * 66 Mhz * 4x data xfers). http://developer.intel.com/design/chipsets/datashts/290713.htm The pdf is about 1.1 MB. From djholm at fnal.gov Fri Apr 12 16:11:41 2002 From: djholm at fnal.gov (Don Holmgren) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? (i860) In-Reply-To: <20020412153750.A5315@mas1.ats.ucla.edu> Message-ID: Unfortunately the measured performance doesn't match the published specs. DMA rates reported by the Myrinet driver on 64/66 cards are about 315 MB/sec and 225 MB/sec, respectively, for bus writes and reads. See the reported measurements on a number of i860-based motherboards at Greg Lindahl's page, http://www.conservativecomputer.com/myrinet/perf.html This has been a sore point for lots of folks wanting to build clusters with i860-based machines. Don Holmgren Fermilab On Fri, 12 Apr 2002, Michael Stein wrote: > > bandwidth was highest on IA64 460GX...). I can't imagine that the i860 can > > actually perform as well as 340MByte/sec since the Hub-Link (between > > the MCH and the P64H) has a limit of 266MByte/sec (AFAIK) .... > > The data sheet for the i860 shows 3 separate Hub-links A, B and C. > > A is 266 MByte/sec (and typically runs the 33 Mhz 32 bit stuff). > > B and C are 533 MByte/sec each and drive the P64Hs. > (16 bits * 66 Mhz * 4x data xfers). > > http://developer.intel.com/design/chipsets/datashts/290713.htm > > The pdf is about 1.1 MB. > From fraser5 at cox.net Fri Apr 12 16:51:25 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:15 2009 Subject: BLAS-1, AMD, Pentium, gcc In-Reply-To: Message-ID: <001001c1e27c$f59f1470$0400005a@papabear> Sure, the optimized BLAS from Intel IS faster (on Intel). The data you present, while very impressive, are skewed towards Intel because the libs are optimized ONLY for SSE and Intel chips, while AMD does not fully implement SSE. But you should replace your stale BLAS code with an ATLAS build optimized for your AMD chips....it's a whole new world my friend! AMD really kicks some butt when the libs are tuned for its cache size. It blew me away. The libs are tuned for a specific chip's cache, detect SSE or 3DNow!, and really exploit it, and the performance is very impressive (as is the makefile, which runs for quite some time to produce the libs). Download the latest developers' version, compile, and sit back and smile. WELL WORTH THE EFFORT, no question. I got into this to port a CFD code over from intel/mkl/scalapack/mpi to amd/atlas/scalapack/mpi. The bang for the buck with AMD is no comparison after you run with this package. BTW, the ATLAS libs also run on Intel (they run on ANY chip, for that matter) and improved performance over the Intel MKL package as well (on some chips; equal on others). I don't have all the numbers off hand, but I would suggest you re-run your case with ATLAS; your conclusion may change. Try it. It's free.
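Since ATLAS builds standard BLAS/CBLAS libraries tuned to your cache sizes, existing BLAS calls do not change; only the link line does. A minimal CBLAS example, assuming an ATLAS install that provides cblas.h and the usual libcblas/libatlas libraries (the compile command in the comment is illustrative and may differ per install):

/* cc matmul.c -lcblas -latlas     (library names and order vary by install) */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* C = 1.0 * A * B + 0.0 * C for 2x2 row-major matrices */
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 19 22 / 43 50 */
    return 0;
}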
(PS get the developers source and compile instead of downloading the binary, the term) http://www.netlib.org/atlas/ Jim -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Don Holmgren Sent: Friday, April 12, 2002 1:36 PM To: Hung Jung Lu Cc: beowulf@beowulf.org Subject: Re: BLAS-1, AMD, Pentium, gcc On Fri, 12 Apr 2002, Hung Jung Lu wrote: > Hi, > > I am thinking in migrating some calculation programs > from Windows to Linux, maybe eventually using a > Beowulf cluster. However, I am kind of worried after I > read in the mailing list archive about lack of > CPU-optimized BLAS-1 code in Linux systems. Currently > I run on a Wintel (Windows+Pentium) machine, and I > know it's substantially faster than equivalent AMD > machine, because I use the Intel's BLAS (MKL) library. > (I apologize for any misapprehensions in what > follows... I am only starting to explore in this > arena.) > > (1) Does anyone know when gcc will have memory > prefetching features? Any time frame? I can notice > very significant performance improvement on my Wintel > machine, and I think it's due to memory prefetching. If you mean, "when will gcc's optimizer do automatic prefetching?", I have no idea. But, many programmers have been doing manual prefetching with gcc for quite a while. If you don't mind defining and using assembler macros, gcc handles it just fine now. Here's an example: #define prefetch_loc(addr) \ __asm__ __volatile__ ("prefetchnta %0" \ : \ : \ "m" (*(((char*)(((unsigned int)(addr))&~0x7f))))) > (2) I am a bit confused on the following issue: Intel > does release MKL for Linux. So, does this mean that if > I use Pentium, I still get full benefit of the > CPU-optimized features in BLAS-1, despite of gcc does > not do memory prefetching? How is this possible? The Intel compiler produces object files compatible with gcc, and vice versa. I would assume they implemented the library with the Intel compiler, which has full SSE/SSE2 support (including prefetching). They list the MKL for Linux as compatible with both gnu and Intel compilers. > (3) Related to the above: for general linear algebra > operations, is Pentium processor then better than AMD, > since Intel has the machine-optimized BLAS library? I > get contradictory information sometimes... I've seen > somewhere that Pentium-4 compares unfavorably with AMD > chips in calculation speed... Any opinions? > > thanks, > > Hung Jung Lu For the very simple SU3 linear algebra (3X3 complex matrices and 3X1 complex vectors) used in our codes, the Pentium 4 outperforms the Athlon on most of our SSE-assisted routines. See the table near the bottom of http://qcdhome.fnal.gov/sse/inline.html for Mflops per gigahertz on various routines for P-III, P4, and Athlon. Perhaps re-coding in 3DNow! would give the Athlon a boost. For our codes, which are bound by memory bandwidth, P4's do significantly better than Athlons because of the faster front side bus (400 Mhz effective). See http://qcdhome.fnal.gov/qcdstream/compare.qcdstream for a table comparing memory bandwidth and SU3 linear algebra performance on a 1.2 GHz Athlon, 1.4 GHz P4, and 1.7 GHz P7 (see http://qcdhome.fnal.gov/qcdstream/ for information about this benchmark). 
Don Holmgren Fermilab _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From suraj_peri at yahoo.com Sat Apr 13 03:21:52 2002 From: suraj_peri at yahoo.com (Suraj Peri) Date: Wed Nov 25 01:02:15 2009 Subject: What could be the performance of my cluster In-Reply-To: <3CB73492.78900CF4@bay13.de> Message-ID: <20020413102152.24030.qmail@web10506.mail.yahoo.com> BLAST ( Basic Local Alignment Search Tool ) takes the query sequence (either protein or DNA) and tries to match small patches of it (let's say it breaks your sequence into small pieces of 6 letters and then tries to match them against the database index file). Once the BLAST algorithm finds a small match, it tries to extend the match along your query sequence further into the database. If it finds more, it computes a score and reports that score. If it doesn't, it reports a low score, and hits with low scores are not considered. Thus, in my opinion, it does many calculations and finally shows the scores (P-value). Interestingly, BLAST is considered a local alignment search tool because it tries to match bits of your query sequence and then extends them for more matches. In contrast there is another algorithm called FASTA (fast alignment search tool); this is global (meaning it takes big chunks of sequences and then tries to thread them over the database). So Bill Pearson (the creator) made a PVM version of FASTA, and his students at Virginia are using it on a Beowulf cluster. ( You can access it at ftp://ftp.virginia.edu/pub/fasta/ ) In my case my database would be ~80 GB. ( I hope to use this much data over NFS ) I am planning to install this algorithm on every node and then, using MPICH, have each node access the whole database over NFS. I am new to this area, and I wonder whether these ideas are practical or not. We will start configuring our cluster some time in May. cheers suraj. --- Robert Depenbrock wrote: > Greg Lindahl wrote: > > > > Hi Greg, > > > On Fri, Apr 12, 2002 at 11:15:52AM -0600, Craig > Tierney wrote: > > > > > Is the BLAST code something that spends lots > > > of time trying doing lots of little > calculations, > > > or doing one big calculation? How important is > > > the speed of access to the database? What is > > > the memory footprint of the code when it runs > > > on the DS20E? > > > > It depends. > > > > What BLAST does is compare a set of sequences > against a big database of > > sequences. The databases come in small, medium, > and large (bigger than > > 2 GByte) sizes; the sequences can either be a > single sequence (imagine > > a researcher looking up a single protein using a > web interface) or a > > large set of them. If it's a large set, the > problem is embarrassingly > > parallel. > > > > The BLAST implementation used by most people isn't > parallel. It can be > > fairly easily parallelized to divide the big > database up into pieces. > > > > People build fairly different clusters to run > BLAST depending on their > > details. The guys at Celera Geonmics didn't want > to use a parallel > > version, and their database is bigger than 2 > GBytes, so they bought > > Alphas. Most people have small enough databases to > fit into 2 GBytes, > > but search against 1 sequence at a time, so they > can't afford to read > > the entire database over NFS every time, and keep > it on a local disk. > > Do you have some sample proteins and databases ?
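As a toy illustration of the word-matching step described above (and only that step; the extension and statistical scoring that real BLAST does are omitted), here is a small C sketch that reports exact 6-letter matches between a query and a database string; the sequences are made up.

#include <stdio.h>
#include <string.h>

#define W 6   /* word (seed) length, as in the 6-letter example above */

/* Report every position where a length-W word of the query occurs verbatim
   in the database string.  Brute force, for illustration only. */
static void find_seeds(const char *query, const char *db)
{
    size_t q, d;
    size_t qlen = strlen(query), dlen = strlen(db);

    if (qlen < W || dlen < W)
        return;
    for (q = 0; q + W <= qlen; q++)
        for (d = 0; d + W <= dlen; d++)
            if (memcmp(query + q, db + d, W) == 0)
                printf("seed hit: query %zu, db %zu: %.*s\n",
                       q, d, W, query + q);
}

int main(void)
{
    /* made-up sequences */
    find_seeds("MKTAYIAKQR", "GGMKTAYIAKQRLLMKTAYI");
    return 0;
}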
> > I would like to test some machines i have availble > to mess around a > little bit. > (HP PA-Risc Series, SUN Sparc Fire, Itanium, Power > PC). > > I would like to build a little benchmark around > these datasets. > > regards > Robert Depenbrock > > -- > nic-hdl RD-RIPE > http://www.bay13.de/ > e-mail: robert@bay13.de > Fingerprint: 1CEF 67DC 52D7 252A 3BCD 9BC4 2C0E > AC87 6830 F5DD > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== PIL/BMB/SDU/DK __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From hahn at physics.mcmaster.ca Sat Apr 13 11:18:39 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? Message-ID: I'm doing some benchmarks to evaluate whether current Macs would make suitable nodes for a serial farm (lots of nodes, preferably fast CPU and dram, but no serious interconnect.) I've tried a variety of real codes and benchmarks, but can't seem to get something like a Mac G4/800 with PC133 to perform anywhere close to even a P4/1.7/i845/PC133. I'm using either the gcc 2.95 that comes with OSX or a recent 3.1 snapshot (which is MUCH better, but still bad). is it just that the performance Apple brags about is strictly in-cache, and/or when doing something ah specialized like single-precision SIMD (altivec/velocity engine)? I haven't really pushed to track down an account on the very latest dual G4/1000, but afaikt it's got the same boring PC133. is anyone using Macs in clusters, and what kind of performance are you observing? thanks, mark hahn. From wrp at alpha0.bioch.virginia.edu Sat Apr 13 13:11:25 2002 From: wrp at alpha0.bioch.virginia.edu (William R. Pearson) Date: Wed Nov 25 01:02:15 2009 Subject: BLAST and FASTA benchmarks Message-ID: <200204132011.QAA22280@alpha0.bioch.virginia.edu> There was a bit of misinformation about the difference between the BLAST and FASTA programs for protein and DNA sequence comparison program. Both BLAST and FASTA search for local sequence similarity - indeed they have exactly the same goals, though they use somewhat different algorithms and statistical approaches. The advantage of an ES40 or other large shared memory machine for BLAST is that it has been optimized for searching databases that are large memory mapped files, and it runs multithreaded. PVM and MPI versions of BLAST are not available, but, it is important to remember that BLAST is extremely fast, and highly optimized to go through a large amount of memory very quickly; it would be difficult to provide an equally efficient distributed version - but, of course, a distributed memory machine would be much cheaper. PVM and MPI versions of FASTA are available. FASTA actually is a package of about a dozen programs that vary more than 100-fold in speed. It is easy to make efficient PVM/MPI versions of the slower algorithms (Smith-Waterman, TFASTY, TFASTX); parallel versions of the FASTA algorithm are less efficient. How to benchmark BLAST and FASTA - As Greg Lindahl pointed out, the appropriate platform for BLAST (less so for FASTA) depends on the size of the database. 
Very few databases are larger than 2 Gb (I think the person who said he had an 80 Gb database was mistaken - the largest publically available sequence database, Genbank, currently has 17Gb of sequence data). In contrast, protein sequence databases are much smaller, typically 50 - 500 Mb). If you would like to try searching some protein or DNA sequence databases, they are available from ftp.ncbi.nih.gov/blast/db. nr.Z and swissprot.Z are two representative protein sequence databases, nt.Z and est_mouse.Z are representative DNA databases. Simply select 10 - 100 sequences at random from these databases and run them against the full size databases. Bill Pearson From ron_chen_123 at yahoo.com Sat Apr 13 16:39:29 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: Message-ID: <20020413233929.65399.qmail@web14706.mail.yahoo.com> --- Mark Hahn wrote: > I'm doing some benchmarks to evaluate whether > current Macs would make suitable nodes for a serial > farm (lots of nodes, preferably fast CPU and dram, > but no serious interconnect.) Physics or bioscience code? > I've tried a variety of real codes and benchmarks, > but can't seem to get something like a Mac G4/800 > with PC133 to perform anywhere close to even a > P4/1.7/i845/PC133. > > I'm using either the gcc 2.95 that comes with OSX or > a recent 3.1 snapshot (which is MUCH better, but > still bad). What compiler are you using for the P4? > is it just that the performance Apple brags about is > strictly in-cache, and/or when doing something ah > specialized like single-precision SIMD >(altivec/velocity engine)? Apple has some libraries that take advantage of the Altivec instructions. http://www.apple.com/downloads/macosx/math_science/applegenentechblast.html > is anyone using Macs in clusters, and what kind of > performance > are you observing? AFAIK, there are several people using MacOS X in clusters, the SGE (Sun Grid Engine) project has a port for Mac OS X. May be you should ask for the experience in setting up Mac OS X compute farms. SGE is specifically written for that environment. SGE home: http://wwws.sun.com/software/gridware/ SGE Open source site: http://gridengine.sunsource.net Search for "Mac OS" in the mailing list Archive. http://gridengine.sunsource.net/servlets/SearchList?listName=dev&by=thread -Ron __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From steveb at aei-potsdam.mpg.de Sat Apr 13 17:37:42 2002 From: steveb at aei-potsdam.mpg.de (Steven Berukoff) Date: Wed Nov 25 01:02:15 2009 Subject: DMA difficulties Message-ID: Hi all, This question may be very slightly off-topic, so I apologize. I'm in the process of setting up a network installation procedure using PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, among other things. One particular note is that I don't need/want CDROMs in these systems. Now, a vendor provided me with a couple of test nodes basically to our specifications, except that they included CDROMs and floppies. To make a longish story shorter, I wanted to make sure that the nodes work fine without the CDROM. So, I first looked into the BIOS. I disabled (set to "None") Primary Slave, Secondary Master/Slave (since my HDD is Primary Master), removed the CDROM from the list of boot devices, and disabled the Secondary IDE channel. 
Then, I passed the kernel args "ide0=dma hdb=none" to try to enforce the HDD to use DMA during the Kickstart installation. Now, here is the kicker: regardless of the BIOS settings, if I have the CDROM plugged in (power+IDE, on the secondary channel) the installation takes ~ 5 times faster than if the thing isn't there. This installation includes installation of ~470 packages plus formatting the HDD. That's right, as long as the CDROM is plugged in, everything is peachy, but once gone, things slow down. I think this is a problem with the DMA settings, b/c when I pass "ide=nodma" to the kernel, WITH the CD attached, performance is slow. However, I can't even force DMA to be used. If anyone has any suggestions or similar experiences, please let me know. Thanks a bunch! Steve ===== Steve Berukoff tel: 49-331-5677233 Albert-Einstein-Institute fax: 49-331-5677298 Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de From alvin at Maggie.Linux-Consulting.com Sat Apr 13 18:09:18 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:02:15 2009 Subject: DMA difficulties In-Reply-To: Message-ID: hi ya i notice that when the cable is attached... things goes bonkers... even if no power ot the drive ( hd or cdrom ) remove the ide cable from the motherboard if its not used and tell the bios NOT to autodetect ide devices except those that is in fact present 150 nodes.... hummm .... one full cabinet..front and back.. :-) c ya alvin http://www.Linux-1U.net On Sun, 14 Apr 2002, Steven Berukoff wrote: > > Hi all, > > This question may be very slightly off-topic, so I apologize. > > I'm in the process of setting up a network installation procedure using > PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These > nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, > among other things. One particular note is that I don't need/want CDROMs > in these systems. > > Now, a vendor provided me with a couple of test nodes basically to our > specifications, except that they included CDROMs and floppies. To make a > longish story shorter, I wanted to make sure that the nodes work fine > without the CDROM. > > So, I first looked into the BIOS. I disabled (set to "None") Primary > Slave, Secondary Master/Slave (since my HDD is Primary Master), removed > the CDROM from the list of boot devices, and disabled the Secondary IDE > channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to > enforce the HDD to use DMA during the Kickstart installation. > > Now, here is the kicker: regardless of the BIOS settings, if I have the > CDROM plugged in (power+IDE, on the secondary channel) the installation > takes ~ 5 times faster than if the thing isn't there. This installation > includes installation of ~470 packages plus formatting the HDD. That's > right, as long as the CDROM is plugged in, everything is peachy, but once > gone, things slow down. > > I think this is a problem with the DMA settings, b/c when I pass > "ide=nodma" to the kernel, WITH the CD attached, performance is > slow. However, I can't even force DMA to be used. > > If anyone has any suggestions or similar experiences, please let me know. > > Thanks a bunch! 
> Steve > > > ===== > Steve Berukoff tel: 49-331-5677233 > Albert-Einstein-Institute fax: 49-331-5677298 > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From robl at mcs.anl.gov Sat Apr 13 18:29:56 2002 From: robl at mcs.anl.gov (Robert Latham) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: References: Message-ID: <20020414012956.GA20390@mcs.anl.gov> On Sat, Apr 13, 2002 at 02:18:39PM -0400, Mark Hahn wrote: > is it just that the performance Apple brags about is strictly > in-cache, and/or when doing something ah specialized like > single-precision SIMD (altivec/velocity engine)? it's the altivec unit that makes G4s at all interesting. if you aren't using the vector unit, yeah, you won't even come close to x86. gcc is multi-platform, sure, but it's optimizer for x86 has received a lot of attention, while the powerpc optimizer has not. your observation that gcc 3.1 performance is better shows that focus on powerpc optimizations has grown, but yeah, it's going to get less attention than x86. too bad, really. register pressure on a powerpc is much less than on x86 ( register pressure on just about any arch not stack-based is less than that on x86 :> ) you are running on mac os x, yes? is there any chance you could put linux on it? if your application is making a significant number of system calls ( file i/o, network traffic... you know, system calls ) os x will hurt you. I'd be curious to hear if your application performs better under linux on powerpc (debian, suse, mandrake, yellowdog; there are many options) than it does under os x on the same hardware. ( if you use linux, you'll have to hand-code some assembly to use the G4. samples abound on the web. but if you are compute-intensive anyway, you might not see gains running under linux) microbenchmarks don't always correlate well with application performance, but here are lmbench numbers. the hardware is constant while i varied the operating system: http://clustermonkey.org/~laz/pbook/lmbench.powerbook.txt (the numbers are nearly 8 months old, but the newer versions of os X do not show any remarkable improvement and in fact regress on some scores) rgb, do you know what the cputest curves look like for a G4 mac? also bear in mind that G4s run significantly cooler than their x86 counterparts, so you might still come out ahead on price/performance, where price takes into account initial purchase + cost of running the cluster. so there you go. there are lots of reasons why you'll have to actually spend a bit of effort to move to a new architecture. i hope no one on this list finds that idea surprising. ==rob -- Rob Latham A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B From echiu at imservice.com Sat Apr 13 18:57:36 2002 From: echiu at imservice.com (Eric Chiu) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" References: Message-ID: <00a101c1e357$ece44f90$e3c0fea9@squaw> Has anyone set up a cluster using the CD from Spector's book "Building Linux Clusters" (O'Reilly)? Eric Chiu, author/consultant Imservice, Inc. www.imservice.com From jsmith at structbio.vanderbilt.edu Sat Apr 13 19:45:41 2002 From: jsmith at structbio.vanderbilt.edu (Jarrod Smith) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? 
In-Reply-To: Message-ID: On Sat, 13 Apr 2002, Mark Hahn wrote: > is it just that the performance Apple brags about is strictly > in-cache, and/or when doing something ah specialized like > single-precision SIMD (altivec/velocity engine)? I've been making a foray into OS X on G4 hardware recently. After having compiled and benchmarked a couple of our compute-intensive codes, I have wondered the same thing... So far double-precision floating point has not impressed me in the least on the G4. Jarrod Smith From robl at mcs.anl.gov Sat Apr 13 19:58:49 2002 From: robl at mcs.anl.gov (Robert Latham) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <20020414025849.GB20390@mcs.anl.gov> On Sat, Apr 13, 2002 at 06:57:36PM -0700, Eric Chiu wrote: > Has anyone set up a cluster using the CD from Spector's book > "Building Linux Clusters" (O'Reilly)? http://www.oreilly.com/catalog/clusterlinux/ i'm guessing the answer is 'no' ==rob -- Rob Latham A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B From walke at usna.edu Sat Apr 13 19:58:54 2002 From: walke at usna.edu (Vann H. Walke) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <1018753136.25541.3.camel@walkeonline.com> I don't have the book, but suspect that the included software would be well out of date. If you're just getting into clustering, I would suggest trying the Scyld distribution. You can get it for $3 at linuxcentral.com. Good Luck, Vann On Sat, 2002-04-13 at 21:57, Eric Chiu wrote: > Has anyone set up a cluster using the CD from Spector's book > "Building Linux Clusters" (O'Reilly)? > > Eric Chiu, author/consultant > Imservice, Inc. > www.imservice.com > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From spoel at xray.bmc.uu.se Sun Apr 14 00:18:02 2002 From: spoel at xray.bmc.uu.se (David van der Spoel) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: Message-ID: On Sat, 13 Apr 2002, Jarrod Smith wrote: >> is it just that the performance Apple brags about is strictly >> in-cache, and/or when doing something ah specialized like >> single-precision SIMD (altivec/velocity engine)? > >I've been making a foray into OS X on G4 hardware recently. After having >compiled and benchmarked a couple of our compute-intensive codes, I have >wondered the same thing... > >So far double-precision floating point has not impressed me in the least >on the G4. We have done some single precision (gcc with altivec code) tests using our molecular dynamics code GROMACS. The results are on http://www.gromacs.org/benchmarks/single.php the numbers are simulation time/real time, i.e. higher is better. The G4 is slightly slower than an Athlon (w 3DNow)/P3 (w SSE) at the same clock. Havent't tested double precision yet. Groeten, David. ________________________________________________________________________ Dr. David van der Spoel, Biomedical center, Dept. 
of Biochemistry Husargatan 3, Box 576, 75123 Uppsala, Sweden phone: 46 18 471 4205 fax: 46 18 511 755 spoel@xray.bmc.uu.se spoel@gromacs.org http://zorn.bmc.uu.se/~spoel ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From echiu at imservice.com Sat Apr 13 23:22:01 2002 From: echiu at imservice.com (Eric Chiu) Date: Wed Nov 25 01:02:15 2009 Subject: BladeFrame vs Beowulf References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <012201c1e37c$f71fa2a0$e3c0fea9@squaw> Has anyone worked on one of these BladeFrame? http://www.egenera.com/prod_spec_overview.php I'm wondering how this compares to a custom-built Beowulf. I like how they have consolidated the networking and hardware in this proprietary architecture. One of the biggest problems in a Beowulf is keeping track of the boxes and ethernet connections. Eric Chiu, author/consultant Imservice, Inc. www.imservice.com From emiller at techskills.com Sun Apr 14 05:28:21 2002 From: emiller at techskills.com (Eric Miller) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: >Has anyone set up a cluster using the CD from Spector's book >"Building Linux Clusters" (O'Reilly)? Eric, I tried to order that book about a year ago, it was taken out of print (at least the edition then was). An email response from the publisher stated that the book was such low quality that they had to take it off the shelves, too many returns/reader complaints. You may have a newer edition. From opengeometry at yahoo.ca Sun Apr 14 08:59:12 2002 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" In-Reply-To: ; from emiller@techskills.com on Sun, Apr 14, 2002 at 08:28:21AM -0400 References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <20020414115912.A13058@node0.opengeometry.ca> On Sun, Apr 14, 2002 at 08:28:21AM -0400, Eric Miller wrote: > >Has anyone set up a cluster using the CD from Spector's book > >"Building Linux Clusters" (O'Reilly)? > > Eric, > > I tried to order that book about a year ago, it was taken out of print (at > least the edition then was). An email response from the publisher stated > that the book was such low quality that they had to take it off the shelves, > too many returns/reader complaints. > > You may have a newer edition. I have it, but it's so out-of-date now. Try Mosix or Beowulf. -- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin From hahn at physics.mcmaster.ca Sun Apr 14 09:24:54 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: <20020413233929.65399.qmail@web14706.mail.yahoo.com> Message-ID: > --- Mark Hahn wrote: > > I'm doing some benchmarks to evaluate whether > > current Macs would make suitable nodes for a serial > > farm (lots of nodes, preferably fast CPU and dram, > > but no serious interconnect.) > > Physics or bioscience code? why does it matter? we're not trying specifically to run BLAST, if that's what you're asking. I don't see any reason why the department would matter, but it's a mixture of math, chem, physics, astro, biologists, and perhaps a few psychologists. > > I've tried a variety of real codes and benchmarks, > > but can't seem to get something like a Mac G4/800 > > with PC133 to perform anywhere close to even a > > P4/1.7/i845/PC133. 
> > > > I'm using either the gcc 2.95 that comes with OSX or > > a recent 3.1 snapshot (which is MUCH better, but > > still bad). > > What compiler are you using for the P4? I'm pretty happy with recent snapshots of gcc 3.1 (pre-release). (still mystified why gnu fortran people are stuck at F77, but...) > > is it just that the performance Apple brags about is > > strictly in-cache, and/or when doing something ah > > specialized like single-precision SIMD > >(altivec/velocity engine)? > > Apple has some libraries that take advantage of the > Altivec instructions. linpack/lapack/atlas/fftw? > AFAIK, there are several people using MacOS X in > clusters, the SGE (Sun Grid Engine) project has a port > for Mac OS X. which doesn't give me ANY data on performance. From opengeometry at yahoo.ca Sun Apr 14 09:23:27 2002 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:02:15 2009 Subject: [MAILER-DAEMON@x263.net: Undelivered Mail Returned to Sender] Message-ID: <20020414122327.A13288@node0.opengeometry.ca> To list maintainer: Please unsubscribe . Everytime I post to the list, I get rejected notice from . It should go to you! -- William Park, Open Geometry Consulting, 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin -------------- next part -------------- An embedded message was scrubbed... From: MAILER-DAEMON@x263.net (Mail Delivery System) Subject: Undelivered Mail Returned to Sender Date: Mon, 15 Apr 2002 00:16:41 +0800 (CST) Size: 5390 Url: http://www.scyld.com/pipermail/beowulf/attachments/20020414/7bdf47bb/attachment.mht From heckendo at cs.uidaho.edu Sun Apr 14 10:05:39 2002 From: heckendo at cs.uidaho.edu (Robert B Heckendorn) Date: Wed Nov 25 01:02:15 2009 Subject: MPI/PVM for BLAST and FASTA In-Reply-To: <200204141601.g3EG14G09306@blueraja.scyld.com> Message-ID: <200204141705.KAA16409@brownlee.cs.uidaho.edu> Bill Pearson's paragraph introduces so many great questions that maybe Bill or others can answer. > The advantage of an ES40 or other large shared memory machine for > BLAST is that it has been optimized for searching databases that are > large memory mapped files, and it runs multithreaded. PVM and MPI > versions of BLAST are not available but, it is important to remember > that BLAST is extremely fast, and highly optimized to go through a > large amount of memory very quickly; it would be difficult to provide > an equally efficient distributed version - but, of course, a > distributed memory machine would be much cheaper. I think I could learn a lot by listening to the details of why this is not done. So here goes: Why is it that BLAST is not available for MPI/PVM? I would think clusters would be the prefect host for such an application. Is it there is no need because BLAST is already so fast and no one wants to break the database out onto node-resident disks? Or is it that BLAST is kept running on single processor or shared memory machines BLAST so that the DB is always in memory ready to roll without loading and doing the same for a cluster is not worth it because the same trick is difficult to do on a node given the current way clusters are built? I assume the same is true for FASTA? thanks for the clarification, -- | Robert Heckendorn | We may not be the only | heckendo@cs.uidaho.edu | species on the planet but | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. 
| CS Dept, University of Idaho | | Moscow, Idaho, USA 83844-1010 | From steveb at aei-potsdam.mpg.de Sun Apr 14 10:24:24 2002 From: steveb at aei-potsdam.mpg.de (Steven Berukoff) Date: Wed Nov 25 01:02:15 2009 Subject: DMA difficulties In-Reply-To: Message-ID: Hi, Sorry, I should have given a bit more info. If the IDE cable is attached, but the power cable is not, the machine will not complete POST; it will hang. If the power cable is attached, but the IDE cable is not, the machine completes POST, and goes forward with the install. However, performance is slow. Only when both cables are attached to the CDROM does the installation run quickly. To address Alvin's comments, all settings in the BIOS relevant to the CDROM are disabled: the CDROM is not listed as a boot device, it's not a Master or Slave on either IDE channel, and the Secondary IDE channel is disabled. Further, no IDE cables are attached where they shouldn't be, i.e., only the HDD cable is plugged in. Finally, there is no option in the BIOS for enabling/disabling autodetection of IDE devices. To address Mark's comments, the kernel that I'm using is the 2.4.7-10 kernel that comes with RH7.2. In particular, I'm using the kernel found in images/pxeboot, which includes support for the network loopback device, initial ramdisk, etc. Also, the boot messages say that the HDD is DMA enabled, although, as I've said, I'm a bit wary of that pronouncement. I thought about compiling my own kernel for this, instead of using the RH distro version. However, going through some of the permutations of kernel configurations didn't produce a useful product. Anyone have insights as to the kernel config that will work for this, or the options in the stock RH kernel, or how to extract such options? TIA again for your insights. Steve > hi ya > > i notice that when the cable is attached... things goes > bonkers... even if no power ot the drive ( hd or cdrom ) > > remove the ide cable from the motherboard if its not used > > and tell the bios NOT to autodetect ide devices > except those that is in fact present > > 150 nodes.... hummm .... one full cabinet..front and back.. :-) > > c ya > alvin > http://www.Linux-1U.net > > > On Sun, 14 Apr 2002, Steven Berukoff wrote: > > > > > Hi all, > > > > This question may be very slightly off-topic, so I apologize. > > > > I'm in the process of setting up a network installation procedure using > > PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These > > nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, > > among other things. One particular note is that I don't need/want CDROMs > > in these systems. > > > > Now, a vendor provided me with a couple of test nodes basically to our > > specifications, except that they included CDROMs and floppies. To make a > > longish story shorter, I wanted to make sure that the nodes work fine > > without the CDROM. > > > > So, I first looked into the BIOS. I disabled (set to "None") Primary > > Slave, Secondary Master/Slave (since my HDD is Primary Master), removed > > the CDROM from the list of boot devices, and disabled the Secondary IDE > > channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to > > enforce the HDD to use DMA during the Kickstart installation. > > > > Now, here is the kicker: regardless of the BIOS settings, if I have the > > CDROM plugged in (power+IDE, on the secondary channel) the installation > > takes ~ 5 times faster than if the thing isn't there. 
This installation > > includes installation of ~470 packages plus formatting the HDD. That's > > right, as long as the CDROM is plugged in, everything is peachy, but once > > gone, things slow down. > > > > I think this is a problem with the DMA settings, b/c when I pass > > "ide=nodma" to the kernel, WITH the CD attached, performance is > > slow. However, I can't even force DMA to be used. > > > > If anyone has any suggestions or similar experiences, please let me know. > > > > Thanks a bunch! > > Steve > > > > > > ===== > > Steve Berukoff tel: 49-331-5677233 > > Albert-Einstein-Institute fax: 49-331-5677298 > > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > ===== Steve Berukoff tel: 49-331-5677233 Albert-Einstein-Institute fax: 49-331-5677298 Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de From hahn at physics.mcmaster.ca Sun Apr 14 10:43:43 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: <20020414012956.GA20390@mcs.anl.gov> Message-ID: > On Sat, Apr 13, 2002 at 02:18:39PM -0400, Mark Hahn wrote: > > is it just that the performance Apple brags about is strictly > > in-cache, and/or when doing something ah specialized like > > single-precision SIMD (altivec/velocity engine)? > > it's the altivec unit that makes G4s at all interesting. if you > aren't using the vector unit, yeah, you won't even come close to x86. as far as I can tell, the requirement to think highly of G4 is: hand-tuned altivec and a tiny working set which pretty much excludes any general-purpose scientific computing. > gcc is multi-platform, sure, but it's optimizer for x86 has received a > lot of attention, while the powerpc optimizer has not. your I'm not sure that's true: I read the gcc developers list and see significant efforts from Apple people. and remember that lots of code is not inherently vectorizable, so would never win big on SIMD. > observation that gcc 3.1 performance is better shows that focus on > powerpc optimizations has grown, but yeah, it's going to get less afaikt, 3.1 improvements are from improved infrastructure, nothing powerpc-specific. > you are running on mac os x, yes? is there any chance you could put > linux on it? if your application is making a significant number of > system calls ( file i/o, network traffic... you know, system calls ) no, I'm really only interested in compute-bound performance. > also bear in mind that G4s run significantly cooler than their x86 > counterparts, so you might still come out ahead on price/performance, I've heard Apple/Moto's PR on that, too. but my recent benchmarking has made me "think different": the G4 appears to be about the same performance as current Intel notebook PIII's. which, of course, burn about the same power as G4's... > where price takes into account initial purchase + cost of running the > cluster. we're in the market for 1-200 CPUs. it's not obvious to me that it matters whether the CPU burns 20 or 50W, since we're already got 30 KW of Alphas in the room ;) G4e/1000 21 probably "design" power PIIIulv/700 8 "design" power PIIIt/1113 28 "design" power P4a/2200 55 "design" power athxp/1800 66 max power > so there you go. 
there are lots of reasons why you'll have to actually > spend a bit of effort to move to a new architecture. i hope no one on > this list finds that idea surprising. I certainly do. powerpc support in gcc is not immature, and the cpu is supposed to be a general-purpose one. if my observations are true, then it's the slowest shipping GP machine, and is only viable if you can afford to structure your program around its SIMD and cache. regards, mark hahn. From gotero at linuxprophet.com Sun Apr 14 16:54:59 2002 From: gotero at linuxprophet.com (Glen Otero) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" In-Reply-To: <1018753136.25541.3.camel@walkeonline.com> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> <1018753136.25541.3.camel@walkeonline.com> Message-ID: <1018828499.1838.175.camel@prophet> I tried to build a cluster with the CD when I reviewed that book for Linux Journal. Incredibly, the software was released unfinished, and so building a cluster with it wasn't possible. The book was pulled from circulation for this and other editorial reasons. I recommend Rocks, Scyld, and OSCAR for building clusters. Glen On Sat, 2002-04-13 at 19:58, Vann H. Walke wrote: > I don't have the book, but suspect that the included software would be > well out of date. If you're just getting into clustering, I would > suggest trying the Scyld distribution. You can get it for $3 at > linuxcentral.com. > > Good Luck, > Vann > > On Sat, 2002-04-13 at 21:57, Eric Chiu wrote: > > Has anyone set up a cluster using the CD from Spector's book > > "Building Linux Clusters" (O'Reilly)? > > > > Eric Chiu, author/consultant > > Imservice, Inc. > > www.imservice.com > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen Otero, Ph.D. Linux Prophet Office:858.792.5561 Mobile:619.917.1772 www.linuxprophet.com "The Beowulf is primarily a mental phenomenon" From alvin at Maggie.Linux-Consulting.com Sun Apr 14 17:37:32 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:02:15 2009 Subject: DMA difficulties In-Reply-To: Message-ID: hi ya steven if you have the hdd cable plugged in ( am assuming into the motherboard ) but no ide drive .... you will get whacky results ... ( whether secondary ide is disabled on the bios or not... ) remove cables that dont go nowhere ( is what am trying to say ) remove um if the devices are disabled .. most bios does allow you to autodetect or user define the devices... but donno about your motherboard... the default rh-7.2 kernel should work fine... ( doesnt cough up erroneous messages on boot that i know of.. c ya alvin On Sun, 14 Apr 2002, Steven Berukoff wrote: > > Hi, > > Sorry, I should have given a bit more info. > > If the IDE cable is attached, but the power cable is not, the machine will > not complete POST; it will hang. If the power cable is attached, but the > IDE cable is not, the machine completes POST, and goes forward with the > install. However, performance is slow. Only when both cables are > attached to the CDROM does the installation run quickly. 
> > To address Alvin's comments, all settings in the BIOS relevant to the > CDROM are disabled: the CDROM is not listed as a boot device, it's not a > Master or Slave on either IDE channel, and the Secondary IDE channel is > disabled. Further, no IDE cables are attached where they shouldn't be, > i.e., only the HDD cable is plugged in. Finally, there is no option in > the BIOS for enabling/disabling autodetection of IDE devices. > > To address Mark's comments, the kernel that I'm using is the 2.4.7-10 > kernel that comes with RH7.2. In particular, I'm using the kernel found > in images/pxeboot, which includes support for the network loopback device, > initial ramdisk, etc. Also, the boot messages say that the HDD is > DMA enabled, although, as I've said, I'm a bit wary of that pronouncement. > > I thought about compiling my own kernel for this, instead of using the RH > distro version. However, going through some of the permutations of kernel > configurations didn't produce a useful product. Anyone have insights as > to the kernel config that will work for this, or the options in the stock > RH kernel, or how to extract such options? > > TIA again for your insights. > > Steve > > > > > hi ya > > > > i notice that when the cable is attached... things goes > > bonkers... even if no power ot the drive ( hd or cdrom ) > > > > remove the ide cable from the motherboard if its not used > > > > and tell the bios NOT to autodetect ide devices > > except those that is in fact present > > > > 150 nodes.... hummm .... one full cabinet..front and back.. :-) > > > > c ya > > alvin > > http://www.Linux-1U.net > > > > > > On Sun, 14 Apr 2002, Steven Berukoff wrote: > > > > > > > > Hi all, > > > > > > This question may be very slightly off-topic, so I apologize. > > > > > > I'm in the process of setting up a network installation procedure using > > > PXE/DHCP/NFS/Kickstart w/ RH7.2 for about 150 dual Athlon nodes. These > > > nodes use a Maxtor 6L080J4 80.0GB HDD and an ASUS A7M266-D motherboard, > > > among other things. One particular note is that I don't need/want CDROMs > > > in these systems. > > > > > > Now, a vendor provided me with a couple of test nodes basically to our > > > specifications, except that they included CDROMs and floppies. To make a > > > longish story shorter, I wanted to make sure that the nodes work fine > > > without the CDROM. > > > > > > So, I first looked into the BIOS. I disabled (set to "None") Primary > > > Slave, Secondary Master/Slave (since my HDD is Primary Master), removed > > > the CDROM from the list of boot devices, and disabled the Secondary IDE > > > channel. Then, I passed the kernel args "ide0=dma hdb=none" to try to > > > enforce the HDD to use DMA during the Kickstart installation. > > > > > > Now, here is the kicker: regardless of the BIOS settings, if I have the > > > CDROM plugged in (power+IDE, on the secondary channel) the installation > > > takes ~ 5 times faster than if the thing isn't there. This installation > > > includes installation of ~470 packages plus formatting the HDD. That's > > > right, as long as the CDROM is plugged in, everything is peachy, but once > > > gone, things slow down. > > > > > > I think this is a problem with the DMA settings, b/c when I pass > > > "ide=nodma" to the kernel, WITH the CD attached, performance is > > > slow. However, I can't even force DMA to be used. > > > > > > If anyone has any suggestions or similar experiences, please let me know. > > > > > > Thanks a bunch! 
> > > Steve > > > > > > > > > ===== > > > Steve Berukoff tel: 49-331-5677233 > > > Albert-Einstein-Institute fax: 49-331-5677298 > > > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > ===== > Steve Berukoff tel: 49-331-5677233 > Albert-Einstein-Institute fax: 49-331-5677298 > Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de > > > From ron_chen_123 at yahoo.com Sun Apr 14 18:55:13 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: Message-ID: <20020415015513.55770.qmail@web14702.mail.yahoo.com> --- Mark Hahn wrote: > > --- Mark Hahn wrote: > > > I'm doing some benchmarks to evaluate whether > > > current Macs would make suitable nodes for a > serial > > > farm (lots of nodes, preferably fast CPU and > dram, > > > but no serious interconnect.) > > > > Physics or bioscience code? > > why does it matter? we're not trying specifically > to run BLAST, > if that's what you're asking. I don't see any > reason why the > department would matter, but it's a mixture of math, > chem, > physics, astro, biologists, and perhaps a few > psychologists. > > > > I've tried a variety of real codes and > benchmarks, > > > but can't seem to get something like a Mac > G4/800 > > > with PC133 to perform anywhere close to even a > > > P4/1.7/i845/PC133. > > > > > > I'm using either the gcc 2.95 that comes with > OSX or > > > a recent 3.1 snapshot (which is MUCH better, but > > > still bad). > > > > What compiler are you using for the P4? > > I'm pretty happy with recent snapshots of gcc 3.1 > (pre-release). > (still mystified why gnu fortran people are stuck at > F77, but...) > > > > is it just that the performance Apple brags > about is > > > strictly in-cache, and/or when doing something > ah > > > specialized like single-precision SIMD > > >(altivec/velocity engine)? > > > > Apple has some libraries that take advantage of > the > > Altivec instructions. > > linpack/lapack/atlas/fftw? > > > AFAIK, there are several people using MacOS X in > > clusters, the SGE (Sun Grid Engine) project has a > port > > for Mac OS X. > > which doesn't give me ANY data on performance. > You can use a better compiler for the PPC: http://www.absoft.com/newproductpage.html Also, I did not say that SGE would provide you ANY data on performance -- all I said was that you could find people using Mac OS X/G4 machines in the cluster world. (or if you don't like SGE, you can choose PBS, they also have a Mac OS X port) -Ron __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From wrp at alpha0.bioch.virginia.edu Sun Apr 14 18:57:56 2002 From: wrp at alpha0.bioch.virginia.edu (William R. Pearson) Date: Wed Nov 25 01:02:15 2009 Subject: G4's for scientific computing Message-ID: <200204150157.VAA22514@alpha0.bioch.virginia.edu> One of the advantages of the MacOSX gcc compiler is that in line Altivec instructions are available at a high level. One can define vector arrays, and do vector operations from 'C' code, e.g. while(vec_any_gt(T2, NAUGHT)) { T2 = vec_sub(LSHIFT(T2), RR); FF = vec_max(FF, T2); } We are testing an Altivec FASTA version; a Altivec BLAST was announced several months ago. 
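To make that fragment a little more concrete, here is a minimal, self-contained toy along the same lines. It is not taken from our FASTA code -- the array contents, the penalty value and the variable names are invented purely for illustration -- and it assumes a compiler with AltiVec C support (Apple's cc with -faltivec, or GNU gcc with -maltivec and altivec.h):

#include <altivec.h>   /* may be unnecessary with Apple's -faltivec */
#include <stdio.h>

int main(void)
{
    /* eight 16-bit scores in one 128-bit register; 16-byte aligned storage */
    short in[8]  __attribute__((aligned(16))) = { 3, -1, 7, 0, 5, -4, 2, 9 };
    short out[8] __attribute__((aligned(16)));
    int i;

    vector signed short scores  = vec_ld(0, in);      /* load 8 shorts at once */
    vector signed short zero    = vec_splat_s16(0);   /* broadcast constant 0  */
    vector signed short penalty = vec_splat_s16(2);   /* broadcast constant 2  */
    vector signed short best    = zero;

    /* same vec_any_gt / vec_max pattern as above: keep subtracting the
       penalty until no lane is still positive, tracking each lane's maximum */
    while (vec_any_gt(scores, zero)) {
        best   = vec_max(best, scores);
        scores = vec_subs(scores, penalty);           /* saturating subtract   */
    }

    vec_st(best, 0, out);
    for (i = 0; i < 8; i++)
        printf("%d ", out[i]);
    printf("\n");
    return 0;
}
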
We like Altivec because we can manipulate 8 16-bit integers or 16 8 bit integers at once - biological sequence comparison code is essentially all integer. We see a 6-fold speedups on when things are done 8-fold parallel. On our codes a dual 533 G4 and Altivec code is 6X-faster than a dual 1 GHz PIII (we don't have a GHz G4 yet). Because of the high level Altivec primitives in the Apple gcc compiler, vectorizing was very very easy; we would have to be much more sophisticated to do the same thing on the PIII (and the potential speed-up would be 1/2 as large, since the vector is 64, not 128 bits). I might have agreed with the statement that one must have hand-tuned Altivec code which pretty much excludes general purpose scientific computing 4 months ago, but our experience has been very positive - our programs are not specialized signal processing programs, but, in retrospect, it was easy to get very dramatic speed up. Bill Pearson From wrp at alpha0.bioch.virginia.edu Sun Apr 14 19:32:20 2002 From: wrp at alpha0.bioch.virginia.edu (William R. Pearson) Date: Wed Nov 25 01:02:15 2009 Subject: Parallel BLAST Message-ID: <200204150232.WAA22617@alpha0.bioch.virginia.edu> > Why is it that BLAST is not available for MPI/PVM? I would think > clusters would be the prefect host for such an application. > Is it there is no need because BLAST is already so fast and > no one wants to break the database out onto node-resident disks? > Or is it that BLAST is kept running on single processor or shared memory > machines BLAST so that the DB is always in memory ready to roll without > loading and doing the same for a cluster is not worth it > because the same trick is difficult to do on a node given the current > way clusters are built? I assume the same is true for FASTA? I suspect that BLAST is not available for MPI/PVM because (1) it is too fast, and (2) there is not much demand for it. 95% of the time, BLAST is almost an in-memory grep (the other 5% of the time it is working on the things it is looking for). Sequence comparison is embarrassingly parallel, and very easily threaded. Distributing the sequence databases and collecting results has more overhead (there probably aren't many distributed grep programs either). FASTA is 5 - 10X slower than BLAST, and Smith-Waterman is another 5-20X slower than FASTA. Here, the communications overhead is low, and distributed systems work OK for FASTA, and great for Smith-Waterman (where the overhead fraction is very small). Of course, it is a lot easier to compile a threaded program, and just run it, than it is to install and configure the MPI or PVM environment and the programs to run in it. Bioinformatics software is often run by computer savvy biologists, not high-performance computing folks, and not having to install and configure PVM/MPI is a big advantage. The NCBI probably does not make a PVM/MPI parallel BLAST because there is very little demand for it, and it does not meet their computational needs. Bill Pearson From lindahl at keyresearch.com Fri Apr 12 17:08:52 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:15 2009 Subject: very high bandwidth, low latency manner? (i860) In-Reply-To: ; from djholm@fnal.gov on Fri, Apr 12, 2002 at 06:11:41PM -0500 References: <20020412153750.A5315@mas1.ats.ucla.edu> Message-ID: <20020412200852.B2381@wumpus.skymv.com> On Fri, Apr 12, 2002 at 06:11:41PM -0500, Don Holmgren wrote: > Unfortunately the measured performance doesn't match the published > specs. 
In fact, this is *always* true for every PCI and memory system out there. Measure, measure, measure. The myrinet perftest and STREAM memory benchmark are your friends. -- greg From rickey-co at mug.biglobe.ne.jp Sat Apr 13 11:57:27 2002 From: rickey-co at mug.biglobe.ne.jp (Iwao Makino) Date: Wed Nov 25 01:02:15 2009 Subject: decent performance from G4 Macs? In-Reply-To: References: Message-ID: Mark, At 14:18 -0400 13.04.2002, Mark Hahn wrote: >I'm doing some benchmarks to evaluate whether current Macs >would make suitable nodes for a serial farm (lots of nodes, >preferably fast CPU and dram, but no serious interconnect.) I agree about first 2, but for Interconnect, there's Myrinet for MacOS X!!! Myricom now released MPICH-GM for MacOS X as well. I personally haven't purchased Mac to test, but having said by Apple that G4's with Blast is a lot faster than P4, I'm quite interested to evaluate them soon. >I've tried a variety of real codes and benchmarks, but can't >seem to get something like a Mac G4/800 with PC133 to perform >anywhere close to even a P4/1.7/i845/PC133. > >I'm using either the gcc 2.95 that comes with OSX or a >recent 3.1 snapshot (which is MUCH better, but still bad). I think you have to MODIFY code a bit to take advantage of velocity engine for MacOS X gcc. I thought there are interesting post along with other bluff. >is it just that the performance Apple brags about is strictly >in-cache, and/or when doing something ah specialized like >single-precision SIMD (altivec/velocity engine)? I too think that's big part of it... -- Best regards, Iwao Makino Hard Data Ltd. Tokyo branch mailto:iwao@harddata.com http://www.harddata.com/ -> HPC cluster specialist<- -> Scientific Imaging/Life Science/Physical Science/Parallel Computing <- From bgb at itcnv.com Sun Apr 14 07:35:00 2002 From: bgb at itcnv.com (bgb@itcnv.com) Date: Wed Nov 25 01:02:15 2009 Subject: BladeFrame vs Beowulf In-Reply-To: <012201c1e37c$f71fa2a0$e3c0fea9@squaw> References: <00a101c1e357$ece44f90$e3c0fea9@squaw> <012201c1e37c$f71fa2a0$e3c0fea9@squaw> Message-ID: <20020414143501.29671.qmail@smtp.itcnv.com> There is also: http://www.rlxtechnologies.com/about/pr_blast.php Eric Chiu writes: > Has anyone worked on one of these BladeFrame? > http://www.egenera.com/prod_spec_overview.php > > I'm wondering how this compares to a custom-built Beowulf. > I like how they have consolidated the networking and > hardware in this proprietary architecture. One of the biggest > problems in a Beowulf is keeping track of the boxes and ethernet > connections. > > Eric Chiu, author/consultant B.G. Bruce Networking Technologies N.V / Internet Technologies (Curacao) N.V. Phone: +599 9 563-1836 Fax: +599 9 465-3594 Alternate Email: bgbruce@it-curacao.com, ancu321@attglobal.net From lindahl at keyresearch.com Sun Apr 14 17:50:47 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:15 2009 Subject: CD from "Building Linux Cluster" In-Reply-To: <00a101c1e357$ece44f90$e3c0fea9@squaw>; from echiu@imservice.com on Sat, Apr 13, 2002 at 06:57:36PM -0700 References: <00a101c1e357$ece44f90$e3c0fea9@squaw> Message-ID: <20020414175047.A12785@wumpus.attbi.com> On Sat, Apr 13, 2002 at 06:57:36PM -0700, Eric Chiu wrote: > Has anyone set up a cluster using the CD from Spector's book > "Building Linux Clusters" (O'Reilly)? I sat in an airport line for an hour once with a woman who knew Spector. 
So she asked me what I thought of the book, and you guys know me well enough to know how good of a spin I put on my answer: "Well, he seemed to have a clue about high availability, but the Beowulf section was pretty crappy." It turns out that he's well aware of that, and was egged on to write a "complete" book by the editors. Ah well, it's a shame no matter how it happened. greg From erayo at cs.bilkent.edu.tr Sun Apr 14 21:08:39 2002 From: erayo at cs.bilkent.edu.tr (Eray Ozkural) Date: Wed Nov 25 01:02:16 2009 Subject: G4's for scientific computing In-Reply-To: <200204150157.VAA22514@alpha0.bioch.virginia.edu> References: <200204150157.VAA22514@alpha0.bioch.virginia.edu> Message-ID: <200204150708.40331.erayo@cs.bilkent.edu.tr> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Monday 15 April 2002 04:57, William R. Pearson wrote: > > I might have agreed with the statement that one must have hand-tuned > Altivec code which pretty much excludes general purpose scientific > computing 4 months ago, but our experience has been very positive - > our programs are not specialized signal processing programs, but, in > retrospect, it was easy to get very dramatic speed up. > I imagine fake vector processing would only work for certain type of problems. That's not SIMD by any measure. Don't you really need multiple data streams for general purpose HPC? Regards, - -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara www: http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://mp3.com/ariza GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE8ulJHfAeuFodNU5wRAn7oAJ9n7oJC3nfBv29EBYOpypOjBGLUmACcCmPO kY+ZBvrh1ev4iQnFMkQV4IA= =YeV6 -----END PGP SIGNATURE----- From ssy at prg.cpe.ku.ac.th Sun Apr 14 23:09:30 2002 From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul) Date: Wed Nov 25 01:02:16 2009 Subject: Need many C/C++ MPI programs Message-ID: <000d01c1e444$2177e860$0100a8c0@yggdrasil> Hello, I need to test my cluster by running many many MPI parallel program. Is there any MPI program archive or something which I could download the program source? It would be better if the program are written in C/C++ so I could tune its performance and see how is it going in my cluster. Thanks Somsak From markus at markus-fischer.de Mon Apr 15 02:40:49 2002 From: markus at markus-fischer.de (Markus Fischer) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CBAA021.DB753C6F@markus-fischer.de> Steffen Persvold wrote: > > Now we have price comparisons for the interconnects (SCI,Myrinet and > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII ServerWorks > HE-SL based cluster). yes, please. I would like to get/see some numbers. I have run tests with SCI for a non linear diffusion algorithm on a 96 node cluster with 32/33 interface. I thought that the poor scalability was due to the older interface, so I switched to a SCI system with 32 nodes and 64/66 interface. Still, the speedup values were behaving like a dog with more than 8 nodes. Especially, the startup time will reach minutes which is probably due to the exporting and mapping of memory. Yes, the MPI library used was Scampi. 
Thus, I think the (marketing) numbers you provide below are not relevant except for applying for more VC. Even worse, we noticed, that the SCI ring structure has an impact on the communication pattern/performance of other applications. This means we only got the same execution time if other nodes were I idle or did not have communication intensive applications. How will you determine the performance of the algorithm you just invented in such a case ? We then used a 512 node cluster with Myrinet2000. The algorithm scaled very fine up to 512 nodes. Markus > > Regards, > -- > Steffen Persvold | Scalable Linux Systems | Try out the world's best > mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jacobsgd21 at BrandonU.CA Mon Apr 15 09:04:42 2002 From: jacobsgd21 at BrandonU.CA (Geoffrey D. Jacobs) Date: Wed Nov 25 01:02:16 2009 Subject: David HM Spector, "Building Linux Clusters" Message-ID: <3CBAFA1A.9090802@brandonu.ca> A waste of ink and paper. This book has no depth, and the included software is incomplete. Look elsewhere for your reference needs. From tim at dolphinics.com Mon Apr 15 10:28:42 2002 From: tim at dolphinics.com (Tim Wilcox) Date: Wed Nov 25 01:02:16 2009 Subject: Need many C/C++ MPI programs References: <000d01c1e444$2177e860$0100a8c0@yggdrasil> Message-ID: <3CBB0DCA.3000908@dolphinics.com> Somsak Sriprayoonsakul wrote: >Hello, > I need to test my cluster by running many many MPI parallel >program. Is there any MPI program archive or something which I could >download the program source? It would be better if the program are >written in C/C++ so I could tune its performance and see how is it going >in my cluster. > There are several benchmarks available with source, I commonly use these for testing machines. Try Linpack at http://www.netlib.org/benchmark/hpl/ This is good for cpu performance. I also use PMB http://www.pallas.com/e/products/pmb/download.htm this is good for interconnect performance. Tim Wilcox > >Thanks >Somsak > > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > From SGaudet at turbotekcomputer.com Mon Apr 15 11:11:55 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:02:16 2009 Subject: Parallel BLAST Message-ID: <3450CC8673CFD411A24700105A618BD6267DC4@911TURBO> > -----Original Message----- > From: William R. Pearson [mailto:wrp@alpha0.bioch.virginia.edu] > Sent: Sunday, April 14, 2002 10:32 PM > To: beowulf@beowulf.org > Subject: Parallel BLAST > > > > > Why is it that BLAST is not available for MPI/PVM? I would think > > clusters would be the prefect host for such an application. > > Is it there is no need because BLAST is already so fast and > > no one wants to break the database out onto node-resident disks? 
> > Or is it that BLAST is kept running on single processor or > shared memory > > machines BLAST so that the DB is always in memory ready to > roll without > > loading and doing the same for a cluster is not worth it > > because the same trick is difficult to do on a node given > the current > > way clusters are built? I assume the same is true for FASTA? > > I suspect that BLAST is not available for MPI/PVM because (1) it is > too fast, and (2) there is not much demand for it. > > 95% of the time, BLAST is almost an in-memory grep (the other 5% of > the time it is working on the things it is looking for). Sequence > comparison is embarrassingly parallel, and very easily threaded. > Distributing the sequence databases and collecting results has more > overhead (there probably aren't many distributed grep programs > either). FASTA is 5 - 10X slower than BLAST, and Smith-Waterman is > another 5-20X slower than FASTA. Here, the communications overhead is > low, and distributed systems work OK for FASTA, and great for > Smith-Waterman (where the overhead fraction is very small). > > Of course, it is a lot easier to compile a threaded program, and just > run it, than it is to install and configure the MPI or PVM environment > and the programs to run in it. Bioinformatics software is often run > by computer savvy biologists, not high-performance computing folks, > and not having to install and configure PVM/MPI is a big advantage. > The NCBI probably does not make a PVM/MPI parallel BLAST because there > is very little demand for it, and it does not meet their computational > needs. -------------- There's also a commerical version from Turbogenomics. http://www.turbogenomics.com Offering: 1) Ready to go, plug-n-play solution for parallel BLAST 2) Expertise and 20+ years of experience in parallel computing 3) Dynamic database splitting feature to take advantage of computers that have less memory than the size of the database 4) Smart load balancing - achieve linear to superlinear speedup 5) No modification made to the NCBI BLAST algorithm to ensure identical results with the non-parallel version 6) Easy drop-in update whenever NCBI releases newer versions of their algorithm 7) Excellent support 8) 30-days money back guarantee Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== From Hakon.Bugge at scali.com Tue Apr 16 03:24:37 2002 From: Hakon.Bugge at scali.com (=?iso-8859-1?Q?H=E5kon?= Bugge) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CBAA021.DB753C6F@markus-fischer.de> References: Message-ID: <5.1.0.14.0.20020416122156.05491530@62.70.89.10> Hi, I am sorry to hear that you was unable to achieve expected performance on the mentioned SCI based systems. You raise a couple of issues, which I would like to address: 1) Performance. Performance transparency is always goal. Nevertheless, sometimes an implementation will have a performance bug. The two organizations owning the mentioned systems, have both support agreements with Scali. I have checked the support requests, but cannot find any request where your incidents were reported. 
We find this fact strange if you truly were aiming at achieving good performance. We are happy to look into your application and report findings back to this news group. 2) Startup time. You contribute the bad scalability to high startup time and mapping of memory. This is an interesting hypothesis; and can easily be verified by using a switch when you start the program, and measure the difference between the elapsed time of the application and the time it uses after MPI_Init() has been called. However, the startup time measured on 64-nodes, two processors per node, where all processes have set up mapping to all other processes, is nn second. If this contributes to bad scalability, your application has a very short runtime. 3) SCI ring structure You state that on a multi user, multi-process environment, it is hard to get deterministic performance numbers. Indeed, that is true. True sharing of resources implies that. Whether the resource is a file-server, a memory controller, or a network component, you will probably always be subject to performance differences. Also, lack of page coloring will contribute to different execution times, even for a sequential program. You further indicate that performance numbers reported f. ex. by Pallas PMB benchmark only can be used for applying for more VC. I disagree for two reasons; first, you imply that venture capitalists are naive (and to some extent stupid). That is not my impression, merely the opposite. Secondly, such numbers are a good example to verify/deny your hypothesis that the SCI ring structure is volatile to traffic generated by other applications. PMB's *multi* option is architected to investigate exactly the problem you mention; Run f. ex. MPI_Alltoall() on N/2 of the machine. Then measure how performance is affected when the other N/2 of the machine is also running Alltoall(). This is the reason we are interested in comparative performance numbers to SCI based systems. It is to me strange, that no Pallas PMB benchmark results ever has been published for a reasonable sized system based on alternative interconnect technologies. To quote Lord Kelvin: "If you haven't measured it, you don't know what you're talking about". As a bottom line, I would appreciate that initiatives to compare cluster interconnect performance should be appreciated, rather than be scrutinized and be phrased as "only usable to apply for more VC". H At 11:40 AM 4/15/02 +0200, Markus Fischer wrote: >Steffen Persvold wrote: > > > > Now we have price comparisons for the interconnects (SCI,Myrinet and > > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for > > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 > > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII > ServerWorks > > HE-SL based cluster). > >yes, please. > >I would like to get/see some numbers. >I have run tests with SCI for a non linear diffusion algorithm on a 96 node >cluster with 32/33 interface. I thought that the poor >scalability was due to the older interface, so I switched to >a SCI system with 32 nodes and 64/66 interface. > >Still, the speedup values were behaving like a dog with more than 8 nodes. > >Especially, the startup time will reach minutes which is probably due to >the exporting and mapping of memory. > >Yes, the MPI library used was Scampi. Thus, I think the >(marketing) numbers you provide >below are not relevant except for applying for more VC. 
> >Even worse, we noticed, that the SCI ring structure has an impact on the >communication pattern/performance of other applications. >This means we only got the same execution time if other nodes were >I idle or did not have communication intensive applications. >How will you determine the performance of the algorithm you just invented >in such a case ? > >We then used a 512 node cluster with Myrinet2000. The algorithm scaled >very fine up to 512 nodes. > >Markus > > > > > Regards, > > -- > > Steffen Persvold | Scalable Linux Systems | Try out the world's best > > mailto:sp@scali.com | http://www.scali.com | performing MPI > implementation: > > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - > > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS > latency > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf -- H?kon Bugge; VP Product Development; Scali AS; mailto:hob@scali.no; http://www.scali.com; fax: +47 22 62 89 51; Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514; Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway; Mail Addr: Scali AS, Postboks 150, Oppsal, N-0619 Oslo, Norway; From Hakon.Bugge at scali.com Tue Apr 16 03:33:55 2002 From: Hakon.Bugge at scali.com (=?iso-8859-1?Q?H=E5kon?= Bugge) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? Message-ID: <5.1.0.14.0.20020416123123.054aac70@62.70.89.10> I'm sorry. I forgot to fill in the startup time. Its 14.5 seconds for 128 processes on 64 nodes, when all processes have mapped remote memory of all other 127 processes. H From rauch at inf.ethz.ch Tue Apr 16 06:15:38 2002 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed Nov 25 01:02:16 2009 Subject: Memory benchmark (was Re: very high bandwidth, low latency manner? (i860)) In-Reply-To: <20020412200852.B2381@wumpus.skymv.com> Message-ID: On Fri, 12 Apr 2002, Greg Lindahl wrote: > The myrinet perftest and STREAM memory benchmark are your friends. If you need more detailed informations about the performance of your memory system than STREAM offers, then you might want to look at the ECT benchmarks (developed by colleagues of mine): Extended Copy Transfer Characterization http://www.cs.inf.ethz.ch/CoPs/ECT/ - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From jayne at sphynx.clara.co.uk Tue Apr 16 07:34:17 2002 From: jayne at sphynx.clara.co.uk (Jayne Heger) Date: Wed Nov 25 01:02:16 2009 Subject: what architecture was MPI and PVM 1st designed for? Message-ID: Hi, Coulld anyone tell me what computer architecture MPI and PVM were first designed for./written on. Thanks, Jayne Heger From eugen at leitl.org Tue Apr 16 06:40:49 2002 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:02:16 2009 Subject: OpenMosix Message-ID: http://newsvac.newsforge.com/article.pl?sid=02/04/13/055227 Saturday April 13, 2002 - [ 05:00 AM GMT ] Bruce Knox writes "Tel Aviv (April 11, 2002) - Dr. Moshe Bar recently announced the creation of openMosix, a new OpenSource project. 
The project has quickly attracted a team of volunteer developers from around the globe and is off to a very fast start. openMosix, is an extension of the Linux kernel. For thousands of users, MOSIX has been a reliable, fast and cost-efficient clustering platform with users in life sciences, finance, industry, high-tech, research and government environments. The goal of openMosix is to give to these users continued support and an up-to-date fully GPLv2 OpenSource platform. Moshe Bar openMosix began as the last verifiable GPL version of MOSIX. All openMosix extensions are under the full GPLv2 license, the GNU General Public License (GPL) Version 2. The openMosix Copyright is held by Moshe Bar. openMosix is a Linux kernel extension for single-system image clustering. openMosix is perfectly scalable and adaptive. Once you have installed openMosix, the nodes in the cluster start talking to one another and the cluster adapts itself to the workload. There is no need to program applications specifically for openMosix. Since all openMosix extensions are inside the kernel, every application automatically and transparently benefits from the distributed computing concept of openMosix. The cluster behaves much as does a SMP, but this solution scales to well over a thousand nodes which can themselves be SMPs. OpenSource is more than just free access to software source code. The basic idea behind open source is very simple: When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing. the Open Source Initiative Moshe Bar is an Operating Systems researcher, writer of Byte Magazine column Serving With Linux , author of numerous Linux books, and frequent contributor to the Linux tree. Moshe lectures for universities, corporations, and international organizations. He holds a Bachelor degree in mathematics, a M.S. and a Ph.D. in computer science. Moshe runs moshebar.com with a mailing list of over 20,000 members, is Chief Technical Officer of Qlusters, Inc., and is the Project Manager for openMosix. Moshe was born in Israel, grew up in a kibbutz, and now lives in Tel Aviv. The development team of volunteers is truly international. The early team members reside in Chile, Spain, Italy, Norway, Germany, Israel, France and the United States. Plus, other mailing list queries have come from Canada, Pakistan, Oman, Estonia, Finland, India, South Africa, Switzerland, Tonga, and Shanghai China. Projects using openMosix already include astrophysics, medical research, and university laboratories. The openMosix project is hosted on SourceForge.net which provides collaborative development web tools for the project. Downloads, documentation, and additional information are available from www.openmosix.org. MOSIX is a very highly regarded, high performance, low cost, flexible, and scaleable Cluster Computing System for Linux. MOSIX was a GPL OpenSource project until late 2001. MOSIX, operational since 1983, integrates independent computers into a cluster, providing the user with what appears to be a single-machine Linux environment. Both the MOSIX Copyright and the MOSIX Trademark are owned by Professor Amnon Barak. 
Amnon Barak is a Professor of Computer Science and the Director of the Distributed Computing Laboratory in the Institute of Computer Science at the Hebrew University of Jerusalem on sabbatical leave for one year. openMosix is Copyright ? 2002 by Moshe Bar. Linux is Copyright ? 2002 by Linus Torvalds. Mosix is Copyright ? 2002 by Amnon Barak. openMosix is licensed under the GNU General Public License (GPL) Version 2, June 1991 as published by the Free Software Foundation. All logos and trademarks are the property of their respective owners. Copyright ? 2002 by Moshe Bar" From eugen at leitl.org Tue Apr 16 06:45:27 2002 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:02:16 2009 Subject: GBit Ethernet over Cu evaluation Message-ID: http://www.cs.uni.edu/~gray/gig-over-copper/ Gigabit Over Copper Evaluation DRAFT Prepared by Anthony Betz and Paul Gray April 2, 2002 University of Northern Iowa Department of Computer Science Cedar Falls, IA 50614 Given the relatively low cost, backwards-compatibility, and widely-availability solutions for gigabit over copper network interfaces, the migration to commodity gigabit networks has begun. Copper-based gigabit solutions are now providing an alternative to the often more expensive fiber-based network solutions that are typically integrated in high performance environments such as today's tightly-coupled cluster systems. But how do these cards compare with their fiber based counterparts? Are the Linux-based drivers ready for prime-time? The intent of this paper is to provide an extensive comparison of the various Gigabit over copper network interface cards available. Since performance is based on numerous factors such as bus architecture and the network protocol being used, these are the two main subjects of our investigation. Our bandwidth benchmarks look at sustained throughput using TCP. While other communication protocols are available, indeed preferred, for high- performance computing, TCP-based benchmarks provide an immediate insight into the expected performance of the cards. With PCI-X coming into the marketplace in more and more motherboards as well as the multitude of systems with more traditional 32-bit PCI subsystems, numerous cards are available for today's 64bit and 32bit computer systems. The 64bit cards tested were as follows: Syskonnect SK9821, Syskonnect SK9D21, Asante Giganix, Ark Soho-GA2000T, 3Com 3c996BT and Intel's E1000 XT. The 32bit cards were Ark Soho-GA2500T, D-Link DGE500T. Comparisons for the various cards were made with respect to operation in alternate bus configurations and varied maximum transmission unit (MTU) sizes of TCP frames (jumbo frames). Results were gathered using Netpipe 2.4. By using Netpipe the peak sustained throughput would be provided as well as the transfer rate for varying packet sizes. Note: All cards were tested at 1500, 3000, 4000, and 6000 values for the TCP MTU size. The drivers for the cards were not modified. Cards based upon the dp83820 chipset were limited to 6000MTU due to driver defaults. All other cards were tested through 9000MTU. [results too voluminous to post] From rgb at phy.duke.edu Tue Apr 16 07:18:51 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:16 2009 Subject: what architecture was MPI and PVM 1st designed for? In-Reply-To: Message-ID: On Tue, 16 Apr 2002, Jayne Heger wrote: > > Hi, > > Coulld anyone tell me what computer architecture MPI and PVM were first > designed for./written on. 
See http://www.epm.ornl.gov/pvm/ and look under "documentation" for "PVM and MPI: A comparison of features". Read the "Background" section. Among many other sources, but this is terse and probably adequate, close to "horse's mouth" accurate for PVM (but "Project Overview" is also there and IS horse's mouth:-) and of course I'm sure that the primary MPI sites have similar historical stuff linked. In a very terse nutshell, PVM was written for the kitchen sink (whatever you happened to have handy and networked). MPI was written by a consortium of vendors and users to provide a common API for large, expensive massively parallel computers. As I understand it this wasn't really the vendors' idea -- they would've been happy to continue providing only their proprietary interfaces -- but the government finally put its foot down as it learned just how much money it was spending, first on the iron, then on porting code to run on the iron, and then on NEW iron and RE-porting their ported code to run on the NEW iron, etc. Moore's law demanding that they rebuy everything every few years or actually loose ground, of course... rgb > Thanks, > > Jayne Heger > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From Sebastien.Cabaniols at Compaq.com Tue Apr 16 03:57:18 2002 From: Sebastien.Cabaniols at Compaq.com (Cabaniols, Sebastien) Date: Wed Nov 25 01:02:16 2009 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? Message-ID: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Hi beowulfs! Would it be interesting to decrease the #define HZ in the linux kernel for CPU/Memory bound computationnal farms ? (I just posted the same question to lkml) I mean we very often have only one running process eating 99% of the CPU, but we (in fact I) don't know if we loose time doing context switches .... Did anyone experiment on that ? Thanks in advance From rgb at phy.duke.edu Tue Apr 16 08:29:03 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:16 2009 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? In-Reply-To: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Message-ID: On Tue, 16 Apr 2002, Cabaniols, Sebastien wrote: > Hi beowulfs! > > Would it be interesting to decrease the #define HZ in the linux kernel > for CPU/Memory bound computationnal farms ? > (I just posted the same question to lkml) > > I mean we very often have only one running process eating 99% of > the CPU, but we (in fact I) don't know if we loose time doing context > switches .... > > Did anyone experiment on that ? > > Thanks in advance This was discussed a long time ago on kernel lists. IIRC (and it was a LONG time ago -- years -- so don't shoot me if I don't) the consensus was that Linus was comfortable keeping HZ where it provided very good interactive response time FIRST (primary design criterion) and efficient for long running tasks SECOND (secondary design criterion) so no, they weren't considering retuning anything anytime soon. Altering HZ isn't by any means guaranteed to improve task granularity (the scheduler already does a damn good job there and is hard to improve). 
Also, because there are a LOT of things that use it, written by many people some of whom may well not have used it RIGHT, altering HZ may cause odd side effects or break things. I wouldn't recommend it unless you are willing to live without or work pretty hard to fix whatever breaks. The context switch part of the question is a bit easier. By strange chance, I'm at this moment running a copy of xmlsysd and wulfstat (my current-project cluster monitoring toolset) on my home cluster, where (to help Jayne this morning) I also cranked up pvm and the xep mandelbrot set application. So it is easy for me to test this. During a panel update (with all my nodes whonking away on doing mandelbrot set iterations) the context switch rate is negligible -- 12-16/second -- on true nodes (ones doing nothing but computing or twiddling their metaphorical thumbs). The rate hardly changes relative to the idle load when the system is doing a computation -- the scheduler is quite efficient. Interrupt rates on true nodes similarly remains very close to baseline of a bit more than 100/second even when doing the computations, which are of course quite coarse grained with only a bit of network traffic per updated strip per node and strip times on the order of seconds. So for a coarse grained, CPU intensive task running on dedicated nodes I doubt you'd see so much as 1% improvement monkeying with pretty much any "simple" kernel tuning parameter -- I think that single numerical jobs run at well OVER 99% efficiency as is. Note that on workstation-nodes (ones running a GUI and this and that) the story is quite different, although still good. For example, I'm running X, xmms (can't work without music, can we:-), the xep GUI, wulfstat (the monitoring client), galeon, and a dozen other xterms and small apps on my desktop; my sons are running X and screensavers on their systems downstairs (grrr, have to talk to them about that, or just plain disable that:-) and on THESE nodes the context switch rates range closer to 1300-1800/sec (the latter for those MP3's). Interrupt rates are still just over 100/sec -- this tends to vary only when doing some sort of very intensive I/O. Note that even mp3 decoding only takes a few percent of my desktop's CPU. However, beautieously enough, when I do an xep rubberband update, I still get SIMULTANEOUSLY flawlessly decoded mp3's (not so much as a bobble of the music stream) AND the maximum possible amount of CPU diverted to the mandelbrot strip computations and their display. I view this delightful responsiveness of linux as a very important feature. I've never hesitated to distribute CPU-intensive work around on linux workstation nodes with an adequate amount of memory because I'm totally confident that unless the application fills memory or involves a very latency-bounded (e.g. small packet network) I/O stream, the workstation user will notice, basically, "nothing" -- their interactive response will be changed below the 0.1 second threshold where they are likely to be ABLE to notice. The one place I can recall where altering system timings has made a noticeable difference in performance for certain classes of parallel tasks is Josip Loncaric's tcp retuning, and I believe that he worked quite hard at that for a long time to get good results. 
Even that has a price -- the tunings that he makes (again, IIRC, don't shoot me if I'm wrong Josip:-) aren't really appropriate for use on a WAN as some of the things that slow TCP down are the ones that make it robust and reliable across the routing perils of the open internet. rgb > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From leunen.d at fsagx.ac.be Tue Apr 16 08:39:25 2002 From: leunen.d at fsagx.ac.be (David Leunen) Date: Wed Nov 25 01:02:16 2009 Subject: Cannot find -lpvfs Message-ID: <3CBC45AD.4060600@fsagx.ac.be> Hi all, We've installed Scyld Beowulf 27bz-8 on our cluster. But we cannot make the mpich examples to link the .o files. Here is the error we get: /usr/bin/ld: cannot find -lpvfs This error is thrown on every try to link an mpi program... any idea? Have a good day. David From hahn at physics.mcmaster.ca Tue Apr 16 10:35:54 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:16 2009 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? In-Reply-To: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Message-ID: > Would it be interesting to decrease the #define HZ in the linux kernel > for CPU/Memory bound computationnal farms ? I'm guessing you're unaware that compute-bound processes actually get multiple 10ms slices (200ms or so, as I recall, but I'm remembering a discussion from 2.3.x days. Ingo's new scheduler probably preserves this limit.) > I mean we very often have only one running process eating 99% of > the CPU, but we (in fact I) don't know if we loose time doing context > switches .... think of the numbers a bit: it's basically impossible to buy a <1 GHz processor today, so you're getting at O(100M) instrs/HZ. if you're cache-friendly, you'll probably have >1 instr/cycle, so scale the number appropriately. perhaps you're worried about cache pollution? the kernel's footprint is fairly small, probably <4K or so for timer-irq-scheduler-nopreempt. since a null syscall is ~1 us or ~1000 instrs, and the work is about the same, I really don't think there's anything to worry about. there are people who run HZ=1024 or higher on ia32; I don't personally think they know what the heck they're doing, but they like it, and don't report any serious problems. From lindahl at keyresearch.com Tue Apr 16 08:31:46 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:16 2009 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? In-Reply-To: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net>; from Sebastien.Cabaniols@compaq.com on Tue, Apr 16, 2002 at 12:57:18PM +0200 References: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> Message-ID: <20020416083146.B2918@wumpus.attbi.com> On Tue, Apr 16, 2002 at 12:57:18PM +0200, Cabaniols, Sebastien wrote: > Would it be interesting to decrease the #define HZ in the linux kernel > for CPU/Memory bound computationnal farms ? > (I just posted the same question to lkml) All pre-compiled user programs would then have the wrong HZ. So /bin/time wouldn't work anymore. 
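To make the dependency concrete, here is a tiny illustration (not lifted from any real package): anything that converts the clock ticks reported by times() into seconds has to know the tick rate, and the portable way is to ask sysconf(_SC_CLK_TCK) at run time rather than wiring in 100 at compile time.

#include <stdio.h>
#include <unistd.h>
#include <sys/times.h>

int main(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);   /* ask, don't assume 100 */
    long i;
    double x = 0.0;

    for (i = 0; i < 50000000; i++)               /* burn a little CPU */
        x += (double)i * 1e-9;

    times(&t);
    printf("user time %.2f s at %ld ticks/s (junk=%g)\n",
           (double)t.tms_utime / ticks_per_sec, ticks_per_sec, x);
    return 0;
}
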
As for "what HZ would be a good value?", Alpha has always used 1000, and it isn't a significant performance hit. But x86 started life on much slower machines, and now we're stuck with 100, unless you want to rebuild ALL your packages. I suspect IA64 uses 100 for compatibility reasons. I wonder how the x86 emulator on AlphaLinux got around this... hm... greg From lindahl at keyresearch.com Tue Apr 16 08:39:27 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <5.1.0.14.0.20020416122156.05491530@62.70.89.10>; from Hakon.Bugge@scali.com on Tue, Apr 16, 2002 at 12:24:37PM +0200 References: <3CBAA021.DB753C6F@markus-fischer.de> <5.1.0.14.0.20020416122156.05491530@62.70.89.10> Message-ID: <20020416083927.C2918@wumpus.attbi.com> On Tue, Apr 16, 2002 at 12:24:37PM +0200, H?kon Bugge wrote: > Also, lack of page coloring will contribute to > different execution times, even for a sequential program. Andrea's kernel patches now have page coloring in them. The code has lived a tortured life, originally written by the Real World Computing guys, rewritten by me, rewritten by Jason Popadopoulos at UMd, and then by Andrea. 3 continents. > I disagree for two reasons; > first, you imply that venture capitalists are naive (and to some extent > stupid). That's what the local Silicon Valley VC tell me about VC. I guess non-Silicon Valley VC are smarter, then ;-) > It is to me strange, that no Pallas PMB > benchmark results ever has been published for a reasonable sized system > based on alternative interconnect technologies. To quote Lord Kelvin: "If > you haven't measured it, you don't know what you're talking about". Maybe that's because other people are measuring their applications, and not yet another synthetic benchmark? All-to-all isn't interesting to me. I have plenty of bisection measurements, though, as that's how I debug Myrinet. Typical variations are around 2%, by the way. Lord Kelvin engaged in a 10 year flamewar in the Letters of the RAS against people who thought the Sun was powered by nuclear fusion. He believed that it was only 10 million years old, and was powered by gravitational collapse. His mistake was ignoring geological evidence because he didn't understand it. He probably wrote that quote during that flamewar. It didn't make him right. greg From raysonlogin at yahoo.com Tue Apr 16 12:39:39 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:16 2009 Subject: again OpenPBS vs SGE In-Reply-To: <200204161902.XAA04166@nocserv.free.net> Message-ID: <20020416193939.95169.qmail@web11403.mail.yahoo.com> If you are looking for _free_ batch systems, you should choose SGE. --- Mikhail Kuzminsky wrote: > III. Some SGE minuses > 1) Do not support "multiclustering" I believe you can setup multiple "SGE_CELL"s to partition your cluster (I've never played with that before) Or you can use Globus, or other 3rd party scheduler on top of SGE. > 2) The schedule algorithms are restricted to only one > default (this is inconsistent w/Chris Black message, as > I understand) You talking about SGE 5.2.x? Chris Black must be talking about SGE 5.3, which has several advanced nice scheduler features: http://www.hardi.se/products/literature/sun_grid_engine.pdf > > IV. Some SGE pluses > > 1) Reliable work Ron Chen has been talking about the new shadow master on the SGE mailing list. which he said will improve fault tolerance, but I've never heard anything yet... > 2) Globus Grid is integrated (?? 
is it correct ?) correct. > 3) There is support of job migration > Also, you may want to look at job arrays, which is not available in PBS. (the other batch system which has job arrays is LSF) You can download the SGE source from: http://gridengine.sunsource.net Rayson __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From josip at icase.edu Tue Apr 16 12:58:40 2002 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:02:16 2009 Subject: decreasing #define HZ in the linux kernel for CPU/memory bound apps ? References: <11EB52F86530894F98FFB1E21F9972547EF918@aeoexc01.emea.cpqcorp.net> <20020416083146.B2918@wumpus.attbi.com> Message-ID: <3CBC8270.814F9387@icase.edu> Greg Lindahl wrote: > > As for "what HZ would be a good value?", Alpha has always used 1000, > and it isn't a significant performance hit. But x86 started life on > much slower machines, and now we're stuck with 100, unless you want to > rebuild ALL your packages. > > I suspect IA64 uses 100 for compatibility reasons. I wonder how the > x86 emulator on AlphaLinux got around this... hm... A minor correction: HZ=1024 on Alphas and on ia64 (elsewhere HZ=100). HZ=1024 helps, e.g. it prevents certain kinds of timer-resolved TCP stalls in kernel 2.2 on Alphas. However, recompiling user programs which were built with HZ=100 would be a pain... and one might uncover new problems with the i386 hardware which has not been tested much with HZ=1024. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From ting at fai.fujitsu.com Tue Apr 16 15:08:59 2002 From: ting at fai.fujitsu.com (Ting) Date: Wed Nov 25 01:02:16 2009 Subject: Parallel BLAST - help In-Reply-To: <3450CC8673CFD411A24700105A618BD6267DC4@911TURBO> Message-ID: Hello, All, I have three nodes Beowulf cluster MPI environment up and running now. And download the FASTA from NCBI on the master node. I successful wrote a code to break the data, but unfortunately I could not have the runable code to get the data back from the nodes to the host(master). :-( Can anyone give me some suggestion or web site that I can have the runable code to use? It would help me a lot. Thank you very much. Ting -----Original Message----- From: Steve Gaudet Sent: Monday, April 15, 2002 11:12 AM To: 'William R. Pearson'; beowulf@beowulf.org Subject: RE: Parallel BLAST > -----Original Message----- > From: William R. Pearson > Sent: Sunday, April 14, 2002 10:32 PM > To: beowulf@beowulf.org > Subject: Parallel BLAST > > > > > Why is it that BLAST is not available for MPI/PVM? I would think > > clusters would be the prefect host for such an application. > > Is it there is no need because BLAST is already so fast and > > no one wants to break the database out onto node-resident disks? > > Or is it that BLAST is kept running on single processor or > shared memory > > machines BLAST so that the DB is always in memory ready to > roll without > > loading and doing the same for a cluster is not worth it > > because the same trick is difficult to do on a node given > the current > > way clusters are built? I assume the same is true for FASTA? > > I suspect that BLAST is not available for MPI/PVM because (1) it is > too fast, and (2) there is not much demand for it. 
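Back to Ting's question above about getting the per-node results to the host: the collection step is only a few lines of MPI. A minimal sketch, with hypothetical result strings standing in for real search output (this is not NCBI or TurboBLAST code):

/* gather_results.c - minimal sketch of the "collect results on the
 * master" step: every rank searches its own database chunk (faked
 * here) and rank 0 receives each worker's result text. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, src, len;
    char result[256];
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* stand-in for "search my chunk of the FASTA database" */
    sprintf(result, "hits from database chunk %d", rank);

    if (rank != 0) {
        MPI_Send(result, (int)strlen(result) + 1, MPI_CHAR,
                 0, 0, MPI_COMM_WORLD);
    } else {
        printf("master: %s\n", result);
        for (src = 1; src < size; src++) {
            char *buf;
            MPI_Probe(src, 0, MPI_COMM_WORLD, &st);    /* learn the length */
            MPI_Get_count(&st, MPI_CHAR, &len);
            buf = malloc(len);
            MPI_Recv(buf, len, MPI_CHAR, src, 0, MPI_COMM_WORLD, &st);
            printf("rank %d: %s\n", src, buf);
            free(buf);
        }
    }
    MPI_Finalize();
    return 0;
}

Run with one rank per database chunk and rank 0 ends up holding all the results; writing them to a file or merging the hit lists happens there.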
> > 95% of the time, BLAST is almost an in-memory grep (the other 5% of > the time it is working on the things it is looking for). Sequence > comparison is embarrassingly parallel, and very easily threaded. > Distributing the sequence databases and collecting results has more > overhead (there probably aren't many distributed grep programs > either). FASTA is 5 - 10X slower than BLAST, and Smith-Waterman is > another 5-20X slower than FASTA. Here, the communications overhead is > low, and distributed systems work OK for FASTA, and great for > Smith-Waterman (where the overhead fraction is very small). > > Of course, it is a lot easier to compile a threaded program, and just > run it, than it is to install and configure the MPI or PVM environment > and the programs to run in it. Bioinformatics software is often run > by computer savvy biologists, not high-performance computing folks, > and not having to install and configure PVM/MPI is a big advantage. > The NCBI probably does not make a PVM/MPI parallel BLAST because there > is very little demand for it, and it does not meet their computational > needs. -------------- There's also a commerical version from Turbogenomics. http://www.turbogenomics.com Offering: 1) Ready to go, plug-n-play solution for parallel BLAST 2) Expertise and 20+ years of experience in parallel computing 3) Dynamic database splitting feature to take advantage of computers that have less memory than the size of the database 4) Smart load balancing - achieve linear to superlinear speedup 5) No modification made to the NCBI BLAST algorithm to ensure identical results with the non-parallel version 6) Easy drop-in update whenever NCBI releases newer versions of their algorithm 7) Excellent support 8) 30-days money back guarantee Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From aby_sinha at yahoo.com Tue Apr 16 18:18:16 2002 From: aby_sinha at yahoo.com (Abhishek sinha) Date: Wed Nov 25 01:02:16 2009 Subject: Dual Xeon Clusters Message-ID: <3CBCCD58.1080104@yahoo.com> Hi list members, I am building a dual Xeon 4-node cluster; My understanding of hyperthreading leaves me to conclusion that it depends largely on the code to benefit from it. Otherwise in many cases the performance can become worse than before using a hyperthreaded Xeon processors. My question is ; Are there any benchmarks available for the benchmarking of the Xeon processors in hyperthreaded mode ? Will the normal benchmarks that we use ..work on these systems and would it give a fair glance at the power of the Xeon .? If not what other way can i find the performance of Xeon processors in a clustered env. I am using 2.2 Ghz Xeon processors on an E7500 chipset. Thanks in advance to all Abhishek From ron_chen_123 at yahoo.com Tue Apr 16 20:58:31 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:16 2009 Subject: Data management on Beowulf Clusters? 
Message-ID: <20020417035831.90871.qmail@web14703.mail.yahoo.com> Hi, Is data management a real issue on Beowulf clusters? Does anyone have problems moving data from one node to another, or finds rcp not enough? I recently discovered that the Globus project has released the Globus ToolKit 2.0, which has some components for data grids. Here are some of their nice features that we may be able to take advantage of: 1) do data I/O accounting. 2) we don't depend on a shared filesystem anymore. 3) better security -- GridFTP is integrated with KRB5. 4) better performance in data transfer. I am wondering if anyone knows if we can take advantage of GridFTP and other components to solve data management problems on beowulf clusters? Any experience is welcome! And lastly, Globus is an opensource, non-profit project. -Ron __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From sp at scali.com Wed Apr 17 00:46:44 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:16 2009 Subject: Dual Xeon Clusters In-Reply-To: <3CBCCD58.1080104@yahoo.com> Message-ID: On Tue, 16 Apr 2002, Abhishek sinha wrote: > Hi list members, > > I am building a dual Xeon 4-node cluster; My understanding of > hyperthreading leaves me to conclusion that it depends largely on the > code to benefit from it. Otherwise in many cases the performance can > become worse than before using a hyperthreaded Xeon processors. My > question is ; Are there any benchmarks available for the benchmarking of > the Xeon processors in hyperthreaded mode ? Will the normal benchmarks > that we use ..work on these systems and would it give a fair glance at > the power of the Xeon .? If not what other way can i find the > performance of Xeon processors in a clustered env. > > I am using 2.2 Ghz Xeon processors on an E7500 chipset. > Hi, First of all you will have to use a 2.4.18 kernel with these E7500 motherboards. Second, if you take a look at the linux-kernel malinglist you will find a patch (originally developed bu Ingo Molnar, enhanced a bit by me) that will do some IRQ balancing on Xeon chipsets (with the stock 2.4.18 kernel i860 and E7500 chipsets are only able to handle interrupts with CPU0). I don't know if this patch has made it to the 2.4.19-pre kernels yet, but you can check them out too. Finally, I have some bad news about HT. I haven't been able to get it to work stable enough with 2.4.18 (haven't tested 2.4.19-pre). The thing is that in the beginning all works fine, but after a random amount of time things start to slow down. Suddenly you find yourself having 'top' using 50% system time which is not normal. Turning off HT in the BIOS solves this. As a side note I can tell you that the PCI architecture on this chipset is _much_ better than on i860 and you can expect it to perform well with high speed interconnects (Myrinet, SCI, GBE). 
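Whether interrupts are actually being spread shows up directly in /proc/interrupts (on the stock kernels described above, every device's count accumulates under CPU0). IRQs can also be steered by hand through /proc/irq/<n>/smp_affinity, assuming the running kernel honours the mask on this chipset; a small sketch:

/* irqpin.c - steer one IRQ to one logical CPU by writing a hex mask to
 * /proc/irq/<irq>/smp_affinity.  Sketch only; assumes the running kernel
 * (patched, per the discussion above) will actually honour the mask. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char path[128];
    unsigned mask;
    FILE *f;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <irq> <cpu>\n", argv[0]);
        return 1;
    }
    mask = 1u << atoi(argv[2]);                /* one bit per logical CPU */
    sprintf(path, "/proc/irq/%d/smp_affinity", atoi(argv[1]));
    f = fopen(path, "w");
    if (!f) {
        perror(path);
        return 1;
    }
    fprintf(f, "%x\n", mask);
    fclose(f);
    printf("IRQ %s -> CPU mask 0x%x\n", argv[1], mask);
    return 0;
}

Comparing /proc/interrupts before and after a run makes it easy to see where the NIC's interrupts are landing.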
Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From shewa at inel.gov Wed Apr 17 06:45:05 2002 From: shewa at inel.gov (Andrew Shewmaker) Date: Wed Nov 25 01:02:16 2009 Subject: again OpenPBS vs SGE References: <20020416193939.95169.qmail@web11403.mail.yahoo.com> Message-ID: <3CBD7C61.4010108@inel.gov> Rayson Ho wrote: >You can download the SGE source from: > >http://gridengine.sunsource.net > I believe you must join the product (at least as an observer) or else you won't see the source in the download section. Andrew From timm at fnal.gov Wed Apr 17 06:48:05 2002 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:02:16 2009 Subject: Dual Xeon Clusters In-Reply-To: <3CBCCD58.1080104@yahoo.com> Message-ID: > Hi list members, > > I am building a dual Xeon 4-node cluster; My understanding of > hyperthreading leaves me to conclusion that it depends largely on the > code to benefit from it. Otherwise in many cases the performance can > become worse than before using a hyperthreaded Xeon processors. My > question is ; Are there any benchmarks available for the benchmarking of > the Xeon processors in hyperthreaded mode ? Will the normal benchmarks > that we use ..work on these systems and would it give a fair glance at > the power of the Xeon .? If not what other way can i find the > performance of Xeon processors in a clustered env. > > I am using 2.2 Ghz Xeon processors on an E7500 chipset. > > Thanks in advance to all > > Abhishek The only way we found to do it in hyperthreading mode under Linux was just to keep on starting two instances of the process until we got one started on either 0 or 1 and the other on 2 or 3. It would be interesting to see a comparison of the SPEC rate benchmarks between the same machine with hyperthreading disabled and two processors, which is what we finally did, and hyperthreading enabled. Steve Timm > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From eugen at leitl.org Wed Apr 17 08:14:39 2002 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer Message-ID: http://slashdot.org/articles/02/04/17/1324227.shtml?tid=162 An anonymous reader wrote in to say "Pacific Northwest National Laboratory (US DOE) signed a $24.5 million dollar contract with HP for a Linux supercomputer. This will be one of the top ten fastest computers in the world. Some cool features: 8.3 Trillion Floating Point Operations per Second, 1.8 Terabytes of RAM, 170 Terabytes of disk, (including a 53 TB SAN), and 1400 Intel McKinley and Madison Processors. Nice quote: 'Today?s announcement shows how HP has worked to help accelerate the shift from proprietary platforms to open architectures, which provide increased scalability, speed and functionality at a lower cost,' said Rich DeMillo, vice president and chief technology officer at HP. Read Details of the announcement here or here." 
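The trial-and-error placement Steve Timm describes is what explicit CPU affinity makes deterministic. With a kernel that supports it (one of the 2.4/2.5 affinity patches mentioned just below, or the call as it later stabilized in glibc as sched_setaffinity()), a sketch of pinning a process to one logical CPU before it starts computing:

/* pin.c - pin the calling process to one logical CPU, then exec the job.
 * Sketch only: uses sched_setaffinity() as it later stabilized in glibc;
 * the 2.4-era affinity patches exposed the same idea with a slightly
 * different calling convention. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    cpu_set_t set;
    int cpu;

    if (argc < 3) {
        fprintf(stderr, "usage: %s <cpu> <command> [args...]\n", argv[0]);
        return 1;
    }
    cpu = atoi(argv[1]);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) { /* 0 = this process */
        perror("sched_setaffinity");
        return 1;
    }
    execvp(argv[2], &argv[2]);
    perror("execvp");
    return 1;
}

Pinning one compute process to each physical processor (0 and 2 in the sibling layout Steve describes) gives the placement he was getting by restarting jobs, and makes an HT-on versus HT-off SPEC-rate style comparison repeatable.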
From robl at mcs.anl.gov Wed Apr 17 08:16:05 2002 From: robl at mcs.anl.gov (Robert Latham) Date: Wed Nov 25 01:02:16 2009 Subject: Cannot find -lpvfs In-Reply-To: <3CBC45AD.4060600@fsagx.ac.be> References: <3CBC45AD.4060600@fsagx.ac.be> Message-ID: <20020417151605.GG5243@mcs.anl.gov> On Tue, Apr 16, 2002 at 05:39:25PM +0200, David Leunen wrote: > We've installed Scyld Beowulf 27bz-8 on our cluster. But we cannot make > the mpich examples to link the .o files. Here is the error we get: > > /usr/bin/ld: cannot find -lpvfs > > This error is thrown on every try to link an mpi program... > any idea? -lpvfs ... that's the PVFS library. check if you have the pvfs-devel rpm installed. If it *is* installed, your mpicc needs to specify where to find it in LDFLAGS. The scyld guys are quite good at integrating all the software pieces though, so i bet this is not the case :> ==rob -- Rob Latham A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B From raysonlogin at yahoo.com Wed Apr 17 08:24:48 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:16 2009 Subject: again OpenPBS vs SGE In-Reply-To: <3CBD7C61.4010108@inel.gov> Message-ID: <20020417152448.49905.qmail@web11406.mail.yahoo.com> I am an observer of the project, but I never need to logon to the server to download source via cvs. http://gridengine.sunsource.net/servlets/ProjectSource But I think you need to logon to download the source code archives. Rayson --- Andrew Shewmaker wrote: > Rayson Ho wrote: > > >You can download the SGE source from: > > > >http://gridengine.sunsource.net > > > I believe you must join the product (at least as an observer) or else > you won't see the source > in the download section. > > Andrew > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From hahn at physics.mcmaster.ca Wed Apr 17 10:14:15 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:16 2009 Subject: Dual Xeon Clusters In-Reply-To: Message-ID: > The only way we found to do it in hyperthreading mode under Linux was > just to keep on starting two instances of the process until we got one > started on either 0 or 1 and the other on 2 or 3. It would be there are a number of cpu-affinity patches for 2.4 and 2.5 (I doubt anyone has bothered with 2.2) From leandro at ep.petrobras.com.br Wed Apr 17 10:17:57 2002 From: leandro at ep.petrobras.com.br (Leandro Tavares Carneiro) Date: Wed Nov 25 01:02:16 2009 Subject: HIGH MEM suport for up to 64GB Message-ID: <1019063877.1795.13.camel@linux60> Hi everyone, I am writing to ask to you all if anyone have tesed or used an machine with more than 4GB of RAM or paging in virtual memory on intel machines. He have an linux beowulf cluster and one of ours developers have asked us for how much memory an process can allocate to use. In the tests we have made, we cannot allocate much more than 3GB, using an dual PIII with 1GB of ram and 12Gb of swap area for testing. We can use 2 process alocating more or less 3Gb, but one process alone canot pass this test. We have using redhat linux 7.2, kernel 2.4.9-21, recompiled with High Mem suport. I have tested the same test aplication on an Itanium machine, with 1GB of ram and 16Gb of swap area, and they passed. 
The aplication can alocate more than 5GB of memory, using swap. In this machine, we are using turbolinux 7, with kernel version 2.4.4-010508-18smp. Thanks in advance for the help, Best regards, -- Leandro Tavares Carneiro Analista de Suporte EP-CORP/TIDT/INFI Telefone: 2534-1427 From leandro at ep.petrobras.com.br Wed Apr 17 10:30:54 2002 From: leandro at ep.petrobras.com.br (Leandro Tavares Carneiro) Date: Wed Nov 25 01:02:16 2009 Subject: HIGH MEM suport for up to 64GB Message-ID: <1019064654.1795.18.camel@linux60> Hi, Anyone have tesed or used an machine with more than 4GB of RAM or paging in virtual memory on intel machines? He have an linux beowulf cluster and one of ours developers have asked us for how much memory an process can allocate to use. In the tests we have made, we cannot allocate much more than 3GB, using an dual PIII with 1GB of ram and 12Gb of swap area for testing. We can use 2 process alocating more or less 3Gb, but one process alone canot pass this test. We have using redhat linux 7.2, kernel 2.4.9-21, recompiled with High Mem suport. I have tested the same test aplication on an Itanium machine, with 1GB of ram and 16Gb of swap area, and they passed. The aplication can alocate more than 5GB of memory, using swap. In this machine, we are using turbolinux 7, with kernel version 2.4.4-010508-18smp. If this works, we can improve our applications. Thanks in advance for the help, and sorry about my bad english. Best regards, -- Leandro Tavares Carneiro Analista de Suporte EP-CORP/TIDT/INFI Telefone: 2534-1427 From sp at scali.com Wed Apr 17 11:44:10 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:16 2009 Subject: HIGH MEM suport for up to 64GB In-Reply-To: <1019064654.1795.18.camel@linux60> Message-ID: On 17 Apr 2002, Leandro Tavares Carneiro wrote: > Hi, > > Anyone have tesed or used an machine with more than 4GB of RAM or paging > in virtual memory on intel machines? > He have an linux beowulf cluster and one of ours developers have asked > us for how much memory an process can allocate to use. In the tests we > have made, we cannot allocate much more than 3GB, using an dual PIII > with 1GB of ram and 12Gb of swap area for testing. > We can use 2 process alocating more or less 3Gb, but one process alone > canot pass this test. > We have using redhat linux 7.2, kernel 2.4.9-21, recompiled with High > Mem suport. > I have tested the same test aplication on an Itanium machine, with 1GB > of ram and 16Gb of swap area, and they passed. The aplication can > alocate more than 5GB of memory, using swap. In this machine, we are > using turbolinux 7, with kernel version 2.4.4-010508-18smp. > If this works, we can improve our applications. > There is simply no way you can make a 32 bit machine address more than 4GB of memory in a single application simply because memory pointers are only 32 bit (2^32 = 4GB). The reason why you can only address 3GB on Linux is that normally 1GB of the virtual memory area is reserved for the kernel (can be trimmed down to 512MB, which gives you 3.5GB accessible from userspace). Sure, you can have several applications each using 3GB if you have the memory for it (either swap or real) providing that you use the 64GB option in Linux. Huge memory requirement from applications is one of the reasons people choose a 64bit platform (ppc, sparc, s390, alpha and ia64). 
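A sketch of the kind of allocation test being described (an assumption; this is not Leandro's actual program): on a 32-bit kernel with the usual 3G/1G user/kernel split it stops a little short of 3 GB no matter how much RAM and swap the box has, while the same source keeps going on the Itanium.

/* alloctest.c - how much memory can one process actually get?
 * Allocates 64 MB at a time until malloc() fails; on 32-bit Linux this
 * hits the ~3 GB per-process virtual address space limit regardless of
 * the amount of swap configured. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t chunk = 64UL * 1024 * 1024;   /* 64 MB per step */
    size_t total = 0;
    void *p;

    while ((p = malloc(chunk)) != NULL) {
        memset(p, 1, chunk);  /* touch the pages so RAM and swap are really used */
        total += chunk;
        printf("allocated %lu MB\n", (unsigned long)(total >> 20));
        fflush(stdout);
    }
    printf("stopped at %lu MB\n", (unsigned long)(total >> 20));
    return 0;
}

Leaving the memset out probes pure address space without dragging the machine through swap.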
Regards, -- Steffen Persvold | Scalable Linux Systems | Try out the world's best mailto:sp@scali.com | http://www.scali.com | performing MPI implementation: Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency From rbw at ahpcrc.org Wed Apr 17 12:31:31 2002 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer Message-ID: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org> Eugene Leitel wrote: >http://slashdot.org/articles/02/04/17/1324227.shtml?tid=162 >An anonymous reader wrote in to say "Pacific Northwest National Laboratory >(US DOE) signed a $24.5 million dollar contract with HP for a Linux >supercomputer. This will be one of the top ten fastest computers in the >world. Some cool features: 8.3 Trillion Floating Point Operations per >Second, 1.8 Terabytes of RAM, 170 Terabytes of disk, (including a 53 TB >SAN), and 1400 Intel McKinley and Madison Processors. Nice quote: 'Todays >announcement shows how HP has worked to help accelerate the shift from >proprietary platforms to open architectures, which provide increased >scalability, speed and functionality at a lower cost,' said Rich DeMillo, >vice president and chief technology officer at HP. Read Details of the >announcement here or here." Mmmm ... working through some numbers ... 8.3 TFLOPS (if they are quoting peak) with 1400 processors would mean they are getting chips with 1.5 GHz clocks (peak performance would be 6 GFLOPS per chip [4 ops per clock]). Stream numbers for this 1.5 GHz chip (estimated) would be around 250 MFLOPS for the triad. Using the triad as a baseline for performance for this and several others systems and relating it back to some estimated cost for several other systems (government purchase price only, no recurring costs) this is $70 per MFLOPS sustained for the Mckinley (again using triad) ... or more than the CRAY SV2 ($65), EV6($55), EV7 ($50), Pentium 4 ($30). Interesting number ... the high-end IA-64 stuff does not look cheap when stream triad defines sustained performance. Of course, blocking for cache will push the sustained number up (maybe alot and on all the systems), but you would think that QCHEM stuff they run at PNNL (G98) will be mostly memory bound and therefore the stream triad sustained performance is not too far off. I am not sure this looks like a very good deal. rbw #--------------------------------------------------- # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw@networkcs.com, richard.walsh@netaspx.com # #--------------------------------------------------- # "What you can do, or dream you can, begin it; # Boldness has genius, power, and magic in it." # -Goethe #--------------------------------------------------- # "Without mystery, there can be no authority." # -Charles DeGaulle #--------------------------------------------------- # "Why waste time learning when ignornace is # instantaneous?" -Thomas Hobbes #--------------------------------------------------- # "In the chaos of a river thrashing, all that water # still has to stand in line." 
-Dave Dobbyn #--------------------------------------------------- From mfischer at mufasa.informatik.uni-mannheim.de Wed Apr 17 10:33:43 2002 From: mfischer at mufasa.informatik.uni-mannheim.de (Markus Fischer) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <5.1.0.14.0.20020416122156.05491530@62.70.89.10> Message-ID: On Tue, 16 Apr 2002, [iso-8859-1] Håkon Bugge wrote: >1) Performance. > >Performance transparency is always goal. Nevertheless, sometimes an >implementation will have a performance bug. The two organizations owning >the mentioned systems, have both support agreements with Scali. I have >checked the support requests, but cannot find any request where your >incidents were reported. We find this fact strange if you truly were aiming >at achieving good performance. We are happy to look into your application >and report findings back to this news group. I don't think we have a performance bug. We have developed a real world application using frequent communication and have tested/run it on multiple systems. We do not intend to modify our algorithms to try to get better performance on a particular system. If people need help for gaining performance on a particular system, then this platform is not a target again if I can not do the tuning by myself, which we did. Not all codes are PD which makes the point before also important. >2) Startup time. > >You contribute the bad scalability to high startup time and mapping of >memory. This is an interesting hypothesis; and can easily be verified by No, I said that with larger numbers of nodes (I would like to talk about >100 , but here I mean more than 16) the scalability is limited (amount spent in communication increases significantly and speedup values decrease after a certain number of nodes) and yes the startup time also increases, which I thought to be caused by the SCI mechanisms of exporting/mapping mem). >using a switch when you start the program, and measure the difference >between the elapsed time of the application and the time it uses after >MPI_Init() has been called. However, the startup time measured on 64-nodes, >two processors per node, where all processes have set up mapping to all >other processes, is nn second. If this contributes to bad scalability, your >application has a very short runtime. I certainly think that scalability has nothing to do with startup time. And I just checked my earlier posting on this. > >3) SCI ring structure > >You state that on a multi user, multi-process environment, it is hard to >get deterministic performance numbers. Indeed, that is true. True sharing >of resources implies that. Whether the resource is a file-server, a memory >controller, or a network component, you will probably always be subject to >performance differences. Also, lack of page coloring will contribute to I think that when running on a dedicated partition of a cluster, I would not like to receive a significant impact from other applications because their communication increases nor would I like to influence my advisor's application. >different execution times, even for a sequential program. You further >indicate that performance numbers reported f. ex. by Pallas PMB benchmark >only can be used for applying for more VC. I disagree for two reasons; >first, you imply that venture capitalists are naive (and to some extent >stupid). That is not my impression, merely the opposite. 
Secondly, such >numbers are a good example to verify/deny your hypothesis that the SCI ring >structure is volatile to traffic generated by other applications. PMB's >*multi* option is architected to investigate exactly the problem you >mention; Run f. ex. MPI_Alltoall() on N/2 of the machine. Then measure how >performance is affected when the other N/2 of the machine is also running >Alltoall(). This is the reason we are interested in comparative performance >numbers to SCI based systems. It is to me strange, that no Pallas PMB >benchmark results ever has been published for a reasonable sized system >based on alternative interconnect technologies. To quote Lord Kelvin: "If >you haven't measured it, you don't know what you're talking about". > >As a bottom line, I would appreciate that initiatives to compare cluster >interconnect performance should be appreciated, rather than be scrutinized >and be phrased as "only usable to apply for more VC". > what's the goal then of having marketing statements which can not be applied in general in a .signature ? there is also PD SCI-MPICH which from reading papers applies for the same statement. Markus > >H >At 11:40 AM 4/15/02 +0200, Markus Fischer wrote: >>Steffen Persvold wrote: >> > >> > Now we have price comparisons for the interconnects (SCI,Myrinet and >> > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for >> > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132 >> > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII >> ServerWorks >> > HE-SL based cluster). >> >>yes, please. >> >>I would like to get/see some numbers. >>I have run tests with SCI for a non linear diffusion algorithm on a 96 node >>cluster with 32/33 interface. I thought that the poor >>scalability was due to the older interface, so I switched to >>a SCI system with 32 nodes and 64/66 interface. >> >>Still, the speedup values were behaving like a dog with more than 8 nodes. >> >>Especially, the startup time will reach minutes which is probably due to >>the exporting and mapping of memory. >> >>Yes, the MPI library used was Scampi. Thus, I think the >>(marketing) numbers you provide >>below are not relevant except for applying for more VC. >> >>Even worse, we noticed, that the SCI ring structure has an impact on the >>communication pattern/performance of other applications. >>This means we only got the same execution time if other nodes were >>I idle or did not have communication intensive applications. >>How will you determine the performance of the algorithm you just invented >>in such a case ? >> >>We then used a 512 node cluster with Myrinet2000. The algorithm scaled >>very fine up to 512 nodes. 
>> >>Markus >> >> > >> > Regards, >> > -- >> > Steffen Persvold | Scalable Linux Systems | Try out the world's best >> > mailto:sp@scali.com | http://www.scali.com | performing MPI >> implementation: >> > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 - >> > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS >> latency >> > >> > _______________________________________________ >> > Beowulf mailing list, Beowulf@beowulf.org >> > To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >>_______________________________________________ >>Beowulf mailing list, Beowulf@beowulf.org >>To change your subscription (digest mode or unsubscribe) visit >>http://www.beowulf.org/mailman/listinfo/beowulf > >-- >Håkon Bugge; VP Product Development; Scali AS; >mailto:hob@scali.no; http://www.scali.com; fax: +47 22 62 89 51; >Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514; >Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway; >Mail Addr: Scali AS, Postboks 150, Oppsal, N-0619 Oslo, Norway; > > From rocky at atipa.com Wed Apr 17 13:02:40 2002 From: rocky at atipa.com (Rocky McGaugh) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org> Message-ID: On Wed, 17 Apr 2002, Richard Walsh wrote: > > Eugene Leitel wrote: > > >http://slashdot.org/articles/02/04/17/1324227.shtml?tid=162 > > >An anonymous reader wrote in to say "Pacific Northwest National Laboratory > >(US DOE) signed a $24.5 million dollar contract with HP for a Linux > >supercomputer. This will be one of the top ten fastest computers in the > >world. Some cool features: 8.3 Trillion Floating Point Operations per > >Second, 1.8 Terabytes of RAM, 170 Terabytes of disk, (including a 53 TB > >SAN), and 1400 Intel McKinley and Madison Processors. Nice quote: 'Todays > >announcement shows how HP has worked to help accelerate the shift from > >proprietary platforms to open architectures, which provide increased > >scalability, speed and functionality at a lower cost,' said Rich DeMillo, > >vice president and chief technology officer at HP. Read Details of the > >announcement here or here." > > Mmmm ... working through some numbers ... > > 8.3 TFLOPS (if they are quoting peak) with 1400 processors > would mean they are getting chips with 1.5 GHz clocks (peak > performance would be 6 GFLOPS per chip [4 ops per clock]). > I'll give ya an even 9.0 Tflops (peak, of course) for $3million. Myrinet included. Storage not. Can't deliver till June though, as we're installing a 7.3 at the moment. -- Rocky McGaugh Atipa Technologies rocky@atipatechnologies.com rmcgaugh@atipa.com 1-785-841-9513 x3110 http://1087800222/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' From tim.carlson at pnl.gov Wed Apr 17 14:16:01 2002 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org> Message-ID: On Wed, 17 Apr 2002, Richard Walsh wrote: > Stream numbers for this 1.5 GHz chip (estimated) would be around > 250 MFLOPS for the triad. Using the triad as a baseline for performance > for this and several others systems and relating it back to > some estimated cost for several other systems (government purchase > price only, no recurring costs) this is $70 per MFLOPS sustained > for the Mckinley (again using triad) ... 
or more than the CRAY SV2 > ($65), EV6($55), EV7 ($50), Pentium 4 ($30). I don't think you can calculate the cost at $70 without subtracting out a few million dollars for various parts. Off the top of my head 1) 53 TB SAN 2) 1.8 TB RAM 3) Quadrics interconnect 4) 117TB local storage The list just had a discussion about the cost of Quadrics. People were guessing something like 3K per box? I haven't seen the actual breakdown of costs, and I'm sure it is under some NDA anyway. Having said that, I don't work for the MSCF folks, and I don't want to speak for them. :) Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson@pnl.gov EMSL UNIX System Support From raysonlogin at yahoo.com Wed Apr 17 15:38:22 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: Message-ID: <20020417223822.75469.qmail@web11407.mail.yahoo.com> What OS does the machines run?? If it is IA64 HP-UX, we need to subtract out a few more million dollars... Rayson --- Tim Carlson wrote: > On Wed, 17 Apr 2002, Richard Walsh wrote: > > > Stream numbers for this 1.5 GHz chip (estimated) would be around > > 250 MFLOPS for the triad. Using the triad as a baseline for > performance > > for this and several others systems and relating it back to > > some estimated cost for several other systems (government purchase > > price only, no recurring costs) this is $70 per MFLOPS sustained > > for the Mckinley (again using triad) ... or more than the CRAY SV2 > > ($65), EV6($55), EV7 ($50), Pentium 4 ($30). > > I don't think you can calculate the cost at $70 without subtracting > out a > few million dollars for various parts. Off the top of my head > > 1) 53 TB SAN > 2) 1.8 TB RAM > 3) Quadrics interconnect > 4) 117TB local storage > > The list just had a discussion about the cost of Quadrics. People > were > guessing something like 3K per box? > > I haven't seen the actual breakdown of costs, and I'm sure it is > under > some NDA anyway. > > Having said that, I don't work for the MSCF folks, and I don't want > to > speak for them. :) > > Tim Carlson > Voice: (509) 376 3423 > Email: Tim.Carlson@pnl.gov > EMSL UNIX System Support > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From lindahl at conservativecomputer.com Wed Apr 17 15:51:48 2002 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org>; from rbw@ahpcrc.org on Wed, Apr 17, 2002 at 02:31:31PM -0500 References: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org> Message-ID: <20020417155148.A2248@wumpus.skymv.com> On Wed, Apr 17, 2002 at 02:31:31PM -0500, Richard Walsh wrote: > 8.3 TFLOPS (if they are quoting peak) with 1400 processors > would mean they are getting chips with 1.5 GHz clocks (peak > performance would be 6 GFLOPS per chip [4 ops per clock]). This bid is for an install in the future, and it involves a combination of McKinley and Madison parts. I don't believe that Intel has made Madison's specs available, nor has HP made the specs of the chipset they'll be using available. 
It's likely that they aren't quoting peak; PNL prefers figures like the actual speed of matrix-matrix multiple (DGEMM). Now the Itanium is reasonably good at delivering a nice % of peak for DGEMM, but it's not the same as peak. It's a lot more fair number to use than peak, and gives you a good idea of what the Top500 Linpack score will be. > this is $70 per MFLOPS sustained > for the Mckinley (again using triad) ... or more than the CRAY SV2 > ($65), EV6($55), EV7 ($50), Pentium 4 ($30). And Cray SV2 figures aren't publically available either. Cough. > but you would think that QCHEM stuff > they run at PNNL (G98) will be mostly memory bound and therefore > the stream triad sustained performance is not too far off. As you might guess, the bid required that you benchmark PNL's actual codes at PNL's actual data sizes. I don't believe that your analysis is correct. Alas, I can no longer say that FSL was the only time a Linux cluster won a traditional supercomputing bid. greg From shaeffer at neuralscape.com Wed Apr 17 08:37:03 2002 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <20020417223822.75469.qmail@web11407.mail.yahoo.com>; from raysonlogin@yahoo.com on Wed, Apr 17, 2002 at 03:38:22PM -0700 References: <20020417223822.75469.qmail@web11407.mail.yahoo.com> Message-ID: <20020417083703.A30407@synapse.neuralscape.com> On Wed, Apr 17, 2002 at 03:38:22PM -0700, Rayson Ho wrote: > What OS does the machines run?? If it is IA64 HP-UX, we need to > subtract out a few more million dollars... > > Rayson > CNET says it will run Linux. http://news.com.com/2100-1001-884297.html cheers, Karen -- Karen Shaeffer Neuralscape; Santa Cruz, Ca. 95060 shaeffer@neuralscape.com http://www.neuralscape.com From tim.carlson at pnl.gov Wed Apr 17 18:54:11 2002 From: tim.carlson at pnl.gov (Tim Carlson) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <20020417223822.75469.qmail@web11407.mail.yahoo.com> Message-ID: On Wed, 17 Apr 2002, Rayson Ho wrote: > What OS does the machines run?? If it is IA64 HP-UX, we need to > subtract out a few more million dollars... If it ran HP-UX, I don't think they would have bought it :) It will run Linux. And as Greg pointed out, for some of our core applications (NWChem for one), the machine may be the best bang for the buck. Could you get more Tflops for less money? Of course you could. Factor in the fact that you need fast access to the SAN, sustained saturation of the interconnect, some horendous amount of memory bandwidth, etc, etc. The RFP had some pretty specific (and... err... odd) requirements. Again.. I don't speak for the MSCF, I know nothing of the other bids, disclaimer, disclaimer, discaimer :) One thing I will say is that from my office which is a good 50 yards from the big computer room, I should be able to feel the heat this thing is going to put out. Tim Carlson Voice: (509) 376 3423 Email: Tim.Carlson@pnl.gov EMSL UNIX System Support From ron_chen_123 at yahoo.com Wed Apr 17 20:58:37 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:16 2009 Subject: again OpenPBS vs SGE In-Reply-To: <200204170816.MAA08490@nocserv.free.net> Message-ID: <20020418035837.64134.qmail@web14702.mail.yahoo.com> In fact, SGE 5.3 is the newest production version. See the "announce" mailing-list for details. But Sun haven't update the official web site yet -- may be due to marketing strategy or something? 
-Ron --- Mikhail Kuzminsky wrote: > According to Rayson Ho > > From raysonlogin@yahoo.com Tue Apr 16 23:39:42 > 2002 > > Date: Tue, 16 Apr 2002 12:39:39 -0700 (PDT) > > From: Rayson Ho > > Subject: Re: again OpenPBS vs SGE > > To: Mikhail Kuzminsky , > beowulf@beowulf.org > > ... > > > > > 2) The schedule algorithms are restricted to > only one > > > default (this is inconsistent w/Chris Black > message, as > > > I understand) > > > > You talking about SGE 5.2.x? > Yes, I wrote about 5.2.3.1 which is last > "production" version > currently available. > > > Chris Black must be talking about SGE 5.3, which > has several advanced > > nice scheduler features: > > > > > http://www.hardi.se/products/literature/sun_grid_engine.pdf > > > > Mikhail Kuzminsky > Zelinsky Institute of Organic Chemistry > Moscow > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From joachim at lfbs.RWTH-Aachen.DE Thu Apr 18 01:56:59 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CBE8A5B.AADEC997@lfbs.rwth-aachen.de> Markus Fischer wrote: > I don't think we have a performance bug. We have developed > a real world application using frequent communication and > have tested/run it on multiple systems. I think Hakon was thinking of a performance bug in ScaMPI (the MPI library), not in your application. > No, I said that with larger numbers of nodes (I would like to talk > about >100 , but here I mean more than 16) the scalability is limited > (amount spent in communication increases significantly and speedup > values decrease after a certain number of nodes) and yes > the startup time also increases, which I thought to be caused > by the SCI mechanisms of exporting/mapping mem). If you could give some numbers, it would help very much. And which kind of communication pattern is used in this application? Which MPI communication calls, which message sizes? > there is also PD SCI-MPICH which from reading papers applies for > the same statement. I am the author of SCI-MPICH. I do not understand the meaning of this sentence of yours ("applies for the same statement"). What are you refering to? Anyway, I would be happy to test your application with SCI-MPICH on our cluster. You may just want to sent me an object file linked to dynamic MPICH libraries, if you can not publish the source code. My bottom line is: I do not consider it good style to publically blaim a product for bad performance without having checked back with the people behind this product, and being a consultant for another product at the same time. Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From alex at compusys.co.uk Thu Apr 18 01:58:16 2002 From: alex at compusys.co.uk (alex@compusys.co.uk) Date: Wed Nov 25 01:02:16 2009 Subject: Intial Pallas performance with Myrinet on a 860 & E7500 Message-ID: For your information, please look at the following performance measurements for the 'C' class Myrinet2000 cards. 
Details of the two machines (optimisation level: -fast & PGI): - 2.4.17 kernel - mpich-1.2.1..7b - gm-1.5.1 - measurement performed between machines 860 Supermicro DCE: - Dual P4 2 GHz - C class Myrinet2000 The new E75000 Supermicro DDR : - Dual P4 1.8GHz - C class Myrinet2000, using PCI-X slot Notice the results for E75000 Sendrecv (and Exchange): 4194304 --> 290.29Mbytes/s That is more than the serverworks LE chipset. Alex (shown results are limited due to mailing list size limit) ///////////////////////// E75000 ////////////////////////////////////// #--------------------------------------------------- # Benchmarking PingPong # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 8.79 0.00 1 1000 8.96 0.11 2 1000 8.94 0.21 4 1000 8.96 0.43 8 1000 8.96 0.85 16 1000 9.03 1.69 32 1000 9.32 3.27 64 1000 9.44 6.47 128 1000 12.14 10.06 2097152 20 8726.80 229.18 4194304 10 17300.95 231.20 #--------------------------------------------------- # Benchmarking PingPing # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 11.97 0.00 1 1000 12.64 0.08 2 1000 12.09 0.16 4 1000 12.87 0.30 8 1000 12.33 0.62 16 1000 12.36 1.23 32 1000 11.73 2.60 64 1000 11.90 5.13 128 1000 14.65 8.33 2097152 20 13792.25 145.01 4194304 10 27535.80 145.27 #----------------------------------------------------------------------------- # Benchmarking Sendrecv # ( #processes = 2 ) #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 12.60 12.60 12.60 0.00 1 1000 12.68 12.69 12.68 0.15 2 1000 12.76 12.76 12.76 0.30 4 1000 12.73 12.73 12.73 0.60 8 1000 12.52 12.53 12.53 1.22 16 1000 12.59 12.59 12.59 2.42 32 1000 11.74 11.74 11.74 5.20 64 1000 11.81 11.81 11.81 10.34 128 1000 14.41 14.42 14.42 16.93 2097152 20 13778.64 13778.80 13778.72 290.30 4194304 10 27558.40 27558.70 27558.55 290.29 #----------------------------------------------------------------------------- # Benchmarking Exchange # ( #processes = 2 ) #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 20.93 20.93 20.93 0.00 1 1000 20.96 20.97 20.97 0.18 2 1000 20.96 20.97 20.96 0.36 4 1000 20.96 20.97 20.97 0.73 8 1000 21.03 21.05 21.04 1.45 16 1000 21.01 21.02 21.02 2.90 32 1000 21.30 21.30 21.30 5.73 64 1000 21.33 21.34 21.33 11.44 128 1000 23.98 23.98 23.98 20.36 2097152 20 27563.40 27563.60 27563.50 290.24 4194304 10 55052.30 55053.10 55052.70 290.63 #---------------------------------------------------------------- # Benchmarking Allreduce # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.15 0.15 0.15 4 1000 19.22 19.23 19.23 8 1000 19.25 19.26 19.25 16 1000 19.31 19.33 19.32 32 1000 20.02 20.03 20.03 64 1000 20.39 20.40 20.40 128 1000 25.96 25.97 25.96 2097152 20 30421.15 30422.15 30421.65 4194304 10 67887.70 67889.70 67888.70 #---------------------------------------------------------------- # Benchmarking Reduce # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.08 0.08 0.08 4 1000 10.01 10.02 10.01 8 1000 10.01 10.02 10.01 16 1000 10.06 10.07 10.07 32 1000 10.40 10.41 10.40 64 1000 10.62 10.63 10.63 128 1000 14.25 14.26 14.25 2097152 20 
25153.70 25231.65 25192.67 4194304 10 49852.60 50856.40 50354.50 #---------------------------------------------------------------- # Benchmarking Reduce_scatter # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.53 0.55 0.54 4 1000 21.00 21.00 21.00 8 1000 21.39 21.40 21.39 16 1000 21.29 21.30 21.29 32 1000 21.70 21.71 21.70 64 1000 22.37 22.38 22.37 128 1000 25.39 25.40 25.40 2097152 20 41247.20 41436.80 41342.00 4194304 10 70550.10 70943.20 70746.65 #---------------------------------------------------------------- # Benchmarking Allgather # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 12.16 12.16 12.16 1 1000 12.61 12.61 12.61 2 1000 12.52 12.52 12.52 4 1000 12.41 12.41 12.41 8 1000 12.69 12.69 12.69 16 1000 12.71 12.71 12.71 32 1000 13.26 13.27 13.26 64 1000 13.06 13.06 13.06 128 1000 17.72 17.73 17.72 2097152 20 23349.75 23350.30 23350.03 4194304 10 38150.00 38151.60 38150.80 #---------------------------------------------------------------- # Benchmarking Allgatherv # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 12.48 12.48 12.48 1 1000 12.41 12.42 12.42 2 1000 12.16 12.17 12.16 4 1000 12.21 12.21 12.21 8 1000 12.52 12.52 12.52 16 1000 12.35 12.35 12.35 32 1000 13.09 13.09 13.09 64 1000 12.80 12.80 12.80 128 1000 17.19 17.19 17.19 2097152 20 19057.95 19058.55 19058.25 4194304 10 37964.00 37965.39 37964.70 #---------------------------------------------------------------- # Benchmarking Alltoall # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 12.44 12.44 12.44 1 1000 12.72 12.72 12.72 2 1000 12.50 12.50 12.50 4 1000 12.56 12.56 12.56 8 1000 12.56 12.56 12.56 16 1000 12.75 12.75 12.75 32 1000 12.86 12.86 12.86 64 1000 13.73 13.73 13.73 128 1000 17.87 17.87 17.87 2097152 20 19927.10 19927.60 19927.35 4194304 10 39608.10 39609.79 39608.94 #---------------------------------------------------------------- # Benchmarking Bcast # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.09 0.09 0.09 1 1000 9.09 9.09 9.09 2 1000 9.07 9.08 9.07 4 1000 9.06 9.07 9.07 8 1000 9.09 9.10 9.09 16 1000 9.15 9.16 9.16 32 1000 9.44 9.44 9.44 64 1000 9.56 9.57 9.56 128 1000 11.63 11.64 11.63 2097152 20 8740.05 8740.20 8740.12 4194304 10 17313.69 17313.90 17313.80 #--------------------------------------------------- # Benchmarking Barrier # ( #processes = 2 ) #--------------------------------------------------- #repetitions t_min[usec] t_max[usec] t_avg[usec] 1000 12.45 12.45 12.45 /////////////////////////// 860 ////////////////////////////////// #--------------------------------------------------- # Benchmarking PingPong # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 8.93 0.00 1 1000 9.14 0.10 2 1000 9.17 0.21 4 1000 9.14 0.42 8 1000 9.41 0.81 16 1000 9.54 1.60 32 1000 9.85 3.10 64 1000 10.06 6.06 128 1000 12.77 9.56 2097152 20 11924.45 167.72 4194304 10 23752.65 168.40 #--------------------------------------------------- # Benchmarking PingPing # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions 
t[usec] Mbytes/sec 0 1000 11.94 0.00 1 1000 12.47 0.08 2 1000 12.83 0.15 4 1000 13.02 0.29 8 1000 12.41 0.61 16 1000 12.82 1.19 32 1000 12.07 2.53 64 1000 12.27 4.98 128 1000 14.50 8.42 2097152 20 21075.20 94.90 4194304 10 42104.29 95.00 #----------------------------------------------------------------------------- # Benchmarking Sendrecv # ( #processes = 2 ) #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 12.20 12.21 12.20 0.00 1 1000 12.83 12.84 12.83 0.15 2 1000 12.85 12.85 12.85 0.30 4 1000 12.61 12.62 12.62 0.60 8 1000 12.63 12.63 12.63 1.21 16 1000 12.55 12.55 12.55 2.43 32 1000 12.40 12.40 12.40 4.92 64 1000 12.70 12.70 12.70 9.61 128 1000 14.54 14.55 14.55 16.78 2097152 20 21075.50 21075.85 21075.67 189.79 4194304 10 42110.90 42112.10 42111.50 189.97 #----------------------------------------------------------------------------- # Benchmarking Exchange # ( #processes = 2 ) #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 21.23 21.23 21.23 0.00 1 1000 21.56 21.63 21.59 0.18 2 1000 21.56 21.57 21.56 0.35 4 1000 21.49 21.49 21.49 0.71 8 1000 21.63 21.63 21.63 1.41 16 1000 21.68 21.68 21.68 2.81 32 1000 21.87 21.88 21.88 5.58 64 1000 22.16 22.16 22.16 11.02 128 1000 24.66 24.66 24.66 19.80 2097152 20 42154.20 42155.05 42154.62 189.78 4194304 10 84224.61 84225.20 84224.90 189.97 #---------------------------------------------------------------- # Benchmarking Allreduce # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.15 0.16 0.16 4 1000 19.93 19.94 19.94 8 1000 20.31 20.33 20.32 16 1000 20.77 20.78 20.78 32 1000 21.54 21.55 21.55 64 1000 21.96 21.97 21.96 128 1000 26.05 26.06 26.05 2097152 20 36295.15 36300.15 36297.65 4194304 10 72057.59 72060.60 72059.09 #---------------------------------------------------------------- # Benchmarking Reduce # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.09 0.09 0.09 4 1000 10.48 10.49 10.49 8 1000 10.92 10.93 10.92 16 1000 11.03 11.04 11.03 32 1000 11.40 11.41 11.40 64 1000 11.65 11.66 11.65 128 1000 13.85 13.86 13.86 2097152 20 24145.65 24442.65 24294.15 4194304 10 47357.39 48542.51 47949.95 #---------------------------------------------------------------- # Benchmarking Reduce_scatter # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.65 0.66 0.66 4 1000 22.04 22.05 22.05 8 1000 22.72 22.73 22.73 16 1000 23.31 23.32 23.31 32 1000 23.89 23.90 23.90 64 1000 24.45 24.46 24.45 128 1000 26.94 26.95 26.94 2097152 20 33828.60 33844.10 33836.35 4194304 10 67314.90 67377.90 67346.40 #---------------------------------------------------------------- # Benchmarking Allgather # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 12.16 12.16 12.16 1 1000 12.32 12.32 12.32 2 1000 12.33 12.33 12.33 4 1000 12.36 12.36 12.36 8 1000 12.61 12.61 12.61 16 1000 12.71 12.71 12.71 32 1000 13.04 13.04 13.04 64 1000 13.86 13.86 13.86 128 1000 17.59 17.59 17.59 2097152 20 26607.50 26608.05 26607.78 4194304 10 53338.10 53338.91 53338.50 
#---------------------------------------------------------------- # Benchmarking Allgatherv # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 12.18 12.18 12.18 1 1000 12.35 12.35 12.35 2 1000 12.30 12.30 12.30 4 1000 12.32 12.32 12.32 8 1000 12.52 12.53 12.53 16 1000 12.91 12.91 12.91 32 1000 13.11 13.11 13.11 64 1000 13.73 13.73 13.73 128 1000 17.58 17.58 17.58 2097152 20 26836.70 26838.00 26837.35 4194304 10 53090.61 53091.80 53091.20 #---------------------------------------------------------------- # Benchmarking Alltoall # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 12.86 12.86 12.86 1 1000 12.85 12.85 12.85 2 1000 13.19 13.19 13.19 4 1000 13.01 13.01 13.01 8 1000 13.23 13.23 13.23 16 1000 13.43 13.44 13.44 32 1000 13.78 13.78 13.78 64 1000 14.41 14.41 14.41 128 1000 18.18 18.18 18.18 2097152 20 27169.85 27170.25 27170.05 4194304 10 54303.90 54304.40 54304.15 #---------------------------------------------------------------- # Benchmarking Bcast # ( #processes = 2 ) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.07 0.10 0.08 1 1000 9.29 9.30 9.29 2 1000 9.23 9.24 9.24 4 1000 9.25 9.26 9.26 8 1000 9.54 9.55 9.55 16 1000 9.69 9.69 9.69 32 1000 9.96 9.98 9.97 64 1000 10.14 10.15 10.15 128 1000 12.21 12.22 12.21 2097152 20 11937.15 11937.50 11937.32 4194304 10 23764.00 23764.71 23764.35 #--------------------------------------------------- # Benchmarking Barrier # ( #processes = 2 ) #--------------------------------------------------- #repetitions t_min[usec] t_max[usec] t_avg[usec] 1000 12.90 12.90 12.90 From alex at compusys.co.uk Thu Apr 18 02:58:03 2002 From: alex at compusys.co.uk (alex@compusys.co.uk) Date: Wed Nov 25 01:02:16 2009 Subject: Intial Pallas performance with Myrinet on a 860 & E7500 Message-ID: For your information, please look at the following performance measurements for the 'C' class Myrinet2000 cards. Details of the two machines (optimisation level: -fast & PGI): - 2.4.17 kernel - mpich-1.2.1..7b - gm-1.5.1 - measurement performed between machines 860 Supermicro DCE: - Dual P4 2 GHz - C class Myrinet2000 The new E75000 Supermicro DDR : - Dual P4 1.8GHz - C class Myrinet2000, using PCI-X slot Notice the results for E75000 Sendrecv: 4194304 --> 290.29Mbytes/s That is more than the serverworks LE chipset. 
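A quick sanity check on how the Mbytes/sec column in these tables relates to the timings: for Sendrecv the benchmark evidently counts traffic in both directions, with 1 Mbyte taken as 2^20 bytes. A minimal sketch of that arithmetic (the factor of two and the byte convention are inferred from the numbers themselves, not taken from the Pallas documentation):

#include <stdio.h>

int main(void)
{
    double bytes  = 4194304.0;  /* message size from the 4 MB Sendrecv row */
    double t_usec = 27558.55;   /* t_avg for that row on the E7500/PCI-X machine below */

    /* Sendrecv moves one message in each direction per repetition */
    double mb_per_s = 2.0 * bytes / (1024.0 * 1024.0) / (t_usec * 1.0e-6);

    printf("%.2f Mbytes/sec\n", mb_per_s);  /* prints roughly 290.29 */
    return 0;
}

The same arithmetic without the factor of two reproduces the unidirectional PingPong column (4194304 bytes in 17300.95 usec is about 231.2 Mbytes/sec).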
Alex (shown results are limited due to mailing list size limit) ///////////////////////// E75000 ////////////////////////////////////// #--------------------------------------------------- # Benchmarking PingPong # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 8.79 0.00 1 1000 8.96 0.11 2 1000 8.94 0.21 4 1000 8.96 0.43 8 1000 8.96 0.85 16 1000 9.03 1.69 32 1000 9.32 3.27 64 1000 9.44 6.47 128 1000 12.14 10.06 2097152 20 8726.80 229.18 4194304 10 17300.95 231.20 #----------------------------------------------------------------------------- # Benchmarking Sendrecv # ( #processes = 2 ) #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 12.60 12.60 12.60 0.00 1 1000 12.68 12.69 12.68 0.15 2 1000 12.76 12.76 12.76 0.30 4 1000 12.73 12.73 12.73 0.60 8 1000 12.52 12.53 12.53 1.22 16 1000 12.59 12.59 12.59 2.42 32 1000 11.74 11.74 11.74 5.20 64 1000 11.81 11.81 11.81 10.34 128 1000 14.41 14.42 14.42 16.93 2097152 20 13778.64 13778.80 13778.72 290.30 4194304 10 27558.40 27558.70 27558.55 290.29 /////////////////////////// 860 ////////////////////////////////// #--------------------------------------------------- # Benchmarking PingPong # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 8.93 0.00 1 1000 9.14 0.10 2 1000 9.17 0.21 4 1000 9.14 0.42 8 1000 9.41 0.81 16 1000 9.54 1.60 32 1000 9.85 3.10 64 1000 10.06 6.06 128 1000 12.77 9.56 2097152 20 11924.45 167.72 4194304 10 23752.65 168.40 #--------------------------------------------------- # Benchmarking PingPing # ( #processes = 2 ) #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 11.94 0.00 1 1000 12.47 0.08 2 1000 12.83 0.15 4 1000 13.02 0.29 8 1000 12.41 0.61 16 1000 12.82 1.19 32 1000 12.07 2.53 64 1000 12.27 4.98 128 1000 14.50 8.42 2097152 20 21075.20 94.90 4194304 10 42104.29 95.00 #----------------------------------------------------------------------------- # Benchmarking Sendrecv # ( #processes = 2 ) #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 12.20 12.21 12.20 0.00 1 1000 12.83 12.84 12.83 0.15 2 1000 12.85 12.85 12.85 0.30 4 1000 12.61 12.62 12.62 0.60 8 1000 12.63 12.63 12.63 1.21 16 1000 12.55 12.55 12.55 2.43 32 1000 12.40 12.40 12.40 4.92 64 1000 12.70 12.70 12.70 9.61 128 1000 14.54 14.55 14.55 16.78 2097152 20 21075.50 21075.85 21075.67 189.79 4194304 10 42110.90 42112.10 42111.50 189.97 From joachim at lfbs.RWTH-Aachen.DE Thu Apr 18 03:36:29 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Wed Nov 25 01:02:16 2009 Subject: Intial Pallas performance with Myrinet on a 860 & E7500 References: Message-ID: <3CBEA1AD.46C2A773@lfbs.rwth-aachen.de> Thanks - is it possible to post related numbers for - intra-node communication (shared memory) - mixed inter- and intra-node communication - maybe more than 2 nodes (if available) Is there a reason for the "initial" in your subject - do you expect these numbers to change? 
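For the intra-node versus inter-node comparison asked about here, the same kind of number can be produced with a few lines of MPI. Below is a minimal ping-pong in the spirit of the PingPong tables above; it is only a sketch, and whether the two-ranks-on-one-node case actually goes through shared memory or through the loopback/NIC path depends on how the MPI library was built (e.g. which MPICH device is in use).

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REPS  1000
#define BYTES (4 * 1024 * 1024)

int main(int argc, char **argv)
{
    int rank, i;
    char *buf;
    MPI_Status status;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(BYTES);
    memset(buf, 0, BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        /* half the round-trip time is the one-way time, as in the tables */
        double one_way = (t1 - t0) / (2.0 * REPS);
        printf("%d bytes: %.2f usec, %.2f Mbytes/sec\n",
               BYTES, one_way * 1.0e6,
               BYTES / (1024.0 * 1024.0) / one_way);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Run it with both ranks placed on the same node for the intra-node figure and on two different nodes for the inter-node figure; with plain MPICH that is just two entries in the machinefile passed to mpirun, though the exact launch mechanics of an mpich-gm build may differ.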
Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From joachim at lfbs.RWTH-Aachen.DE Thu Apr 18 05:29:34 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? References: Message-ID: <3CBEBC2E.B2F0BEDE@lfbs.rwth-aachen.de> Markus Fischer wrote: > > On Thu, 18 Apr 2002, Joachim Worringen wrote: > > >Markus Fischer wrote: > >If you could give some numbers, it would help very much. And which kind > >of communication pattern is used in this application? Which MPI > >communication calls, which message sizes? > > > >I am the author of SCI-MPICH. I do not understand the meaning of this > >sentence of yours ("applies for the same statement"). What are you > >refering to? > > > to have "the best performing" Please indicate where you read this statement for SCI-MPICH. If at all, it says "best performing of the evaluated implementations" in a certain context. I do not (contrary to Scali) say that SCI-MPICH is the solution to all your problems. Don't quote me wrong, please. > >MPICH libraries, if you can not publish the source code. > > > >My bottom line is: I do not consider it good style to publically blaim a > >product for bad performance without having checked back with the people > >behind this product, and being a consultant for another product at the > >same time. > > the first message was a follow up to other messages as a real > world application performance issue. As I stated > earlier, I did not focus on bringing this application to the max > on every system, but to use an existing system and see how it goes. This is perfectly understood. But if you experience strange results which do not relate to what you would expect, wouldn't it be a good idea to ask (in this case the Scali guys): "Hey, what is the reason for this ugly numbers?" before publically announing "This technology is not able to scale beyond 8 nodes for a simple application!". You won't deny that there are numerous counter-examples. > I also can read between the lines of some postings, too. That doesn't help the reader of your postings. BTW, it's enough to post to the mailing list, CC is not required. regards, Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From shin at guss.org.uk Thu Apr 18 06:27:01 2002 From: shin at guss.org.uk (shin@guss.org.uk) Date: Wed Nov 25 01:02:16 2009 Subject: Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster? Message-ID: <20020418142700.A680@gre.ac.uk> Hi, We have an old cluster setup that has 3 Alpha 4100 nodes (each node has 4x466 processors) connected with memory channel (first version), 1Gb Ram per node. The cluster is used to run internal code which is mostly CFD (fine grain synchronous) problems. The code is parallized and currently uses dec's mpi implementation. We now need to replicate this system at a remote site, and with an eye on keeping the cost down, so the idea is to go with a bunch of dual processor P4 (2GHz xeon?) systems with 2Gb ram each and myrinet interconnect. We expect to want to scale up to at least 8 of these dual nodes initially. I need to look into the performance of various aspects of the proposed system as we have no experience in this type of setup. 
Disclaimer: I dont' necessarily know what I'm talking about - I'm the hardware/admin guy; the parallel guys do all the coding! Sorry. I'd appreciate any answers anyone could offer on: 1. In terms of the floating point performance, looking at CFP2000 on www.spec.org and the Xeon should offer much better FP performace that the older alphas we have. I could only find results for a 4100 5/533 (which is the closest to our current setup) and these were much lower than the results from Dell Precision Workstation 530 with 2.0Ghz proc. So I assume this won't be an issue - we'll get fast processors. Is there a mboard that really sticks out here for offering best support to these processors - or should we even be looking at AMD MP systems now. I'm not sure I have the timescale to get in test systems and test anything out. 2. Quad systems seem to be way more expensive than duals and I could only find quad systems running at 900Mhz per proc instead of 2GHz in the duals - so I assume the quads are out on cost and proc. speed alone. 3. One of my concerns was the use of mpi across 8xdual Xeon nodes versus 3xquad alpha nodes. I'm assuming that mpi(ch) will look after all the necessary for us in terms of communication between processors within a node and communication across nodes - but is the speed of memory, throughput etc a limiting factor on this type of PC architecture? Will we hit latency issues within a node that we're not currently hitting? What sort of memory is recommended? DDR/SDRAM/other? However having ruled out the quads above - will they offer better memory performance than the duals - on a par with the quad alpha nodes? (I appreciate it's not a like for like comparison). 3. I think an entry level myrinet switch will enable me to connect 8 nodes - at a cost of approx 2400 USD for a switch and 1700 USD per myrinet card per node? And it will offer better performance than our MC - so I'm assuming that the choice of myrinet is ok. 4. In terms of cache - we believe that the large cache on the alpha's helps our performance quite significantly - as far as I can determine the cache on the xeons is still 256/512K? Presumably this won't make that much of a difference as we're scaling out across 8 nodes instead of 3? Many thanks in advance, Rgds Shin From rbw at ahpcrc.org Thu Apr 18 07:37:07 2002 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer Message-ID: <200204181437.g3IEb7w27647@mycroft.ahpcrc.org> Greg Lindahl wrote: >This bid is for an install in the future, and it involves a >combination of McKinley and Madison parts. I don't believe that Intel >has made Madison's specs available, nor has HP made the specs of the >chipset they'll be using available. True, but the dual-floating point units in the core are not likely to be added to ... so its a question of what the clock is going to be (is my estimate of 1.5 GHz unreasonable?), what the impact of the chipset/system-bus on memory bandwidth is going to, and cache sizes. > >It's likely that they aren't quoting peak; PNL prefers figures like >the actual speed of matrix-matrix multiple (DGEMM). Now the Itanium is >reasonably good at delivering a nice % of peak for DGEMM, but it's not >the same as peak. It's a lot more fair number to use than peak, and >gives you a good idea of what the Top500 Linpack score will be. True, the source is not official, I guess, but when no qualifying information is given the numbers presented are usually peak. 
If they aren't then my numbers would need to be reworked on a different envelope ;-) ...

... but there is another issue: if we assume that the 8.3 TFLOPS is DGEMM performance at, say, 50% of peak (doing this on a large matrix (G98) would require very good bandwidth to memory), then these 1400 processors must have a system peak of around 17 TFLOPS. What does this mean for clock period ... ?? Assuming the same number (4) of FMA's per core on the Madison, then each processor is capable of 12 GFLOPS peak. This would mean that the processors would have to be running at 3 GHz (when are they taking delivery?). This seems a bit high to me seeing as the Itanium is sitting at 800 MHz and does not have a 20 stage pipeline like the Pentium 4 ... but if the delivery is far enough into the future, who knows. The idea that the Madison will have more floating point cores seems unlikely (how are you going to feed them without real vector memory loads?).

My $$ per MFLOPS estimates are ballpark numbers, but did include the cost of interconnect (Myrinet or better), and a large chunk of disk and memory. But I won't claim they are perfectly apples-to-apples. They were based on estimated purchase price only ... they do not include total cost of ownership effects or factor in expected utilization over the term of ownership (an important consideration).

When I saw the posting, I was surprised how few IA-64 processors (even with the extras) were to be had for ~$25,000,000. The Pentium 4 looks a better deal at this level of analysis.

Cheers, rbw

From pblaise at cea.fr Thu Apr 18 08:21:18 2002 From: pblaise at cea.fr (Philippe Blaise - GRENOBLE) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer References: Message-ID: <3CBEE46E.D0B8158D@cea.fr>

Oh la la, what a beautiful machine!

Any idea about QSNet2/elan4 quick specs? And what file system(s) will be used to achieve 200 MB/s bandwidth in parallel - a super new HP/Linux one?

Philippe Blaise

Eugen Leitl wrote: > http://slashdot.org/articles/02/04/17/1324227.shtml?tid=162 > > An anonymous reader wrote in to say "Pacific Northwest National Laboratory > (US DOE) signed a $24.5 million dollar contract with HP for a Linux > supercomputer. This will be one of the top ten fastest computers in the > world. Some cool features: 8.3 Trillion Floating Point Operations per > Second, 1.8 Terabytes of RAM, 170 Terabytes of disk, (including a 53 TB > SAN), and 1400 Intel McKinley and Madison Processors. Nice quote: 'Today's > announcement shows how HP has worked to help accelerate the shift from > proprietary platforms to open architectures, which provide increased > scalability, speed and functionality at a lower cost,' said Rich DeMillo, > vice president and chief technology officer at HP. Read Details of the > announcement here or here." > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From mfischer at mufasa.informatik.uni-mannheim.de Thu Apr 18 04:54:51 2002 From: mfischer at mufasa.informatik.uni-mannheim.de (Markus Fischer) Date: Wed Nov 25 01:02:16 2009 Subject: very high bandwidth, low latency manner? In-Reply-To: <3CBE8A5B.AADEC997@lfbs.rwth-aachen.de> Message-ID:

On Thu, 18 Apr 2002, Joachim Worringen wrote: >Markus Fischer wrote: >If you could give some numbers, it would help very much. And which kind >of communication pattern is used in this application? Which MPI >communication calls, which message sizes? > >I am the author of SCI-MPICH. I do not understand the meaning of this >sentence of yours ("applies for the same statement"). What are you >refering to? > to have "the best performing" >MPICH libraries, if you can not publish the source code. > >My bottom line is: I do not consider it good style to publically blaim a >product for bad performance without having checked back with the people >behind this product, and being a consultant for another product at the >same time.

the first message was a follow-up to other messages as a real world application performance issue. As I stated earlier, I did not focus on bringing this application to the max on every system, but to use an existing system and see how it goes.

I know this general process of 'yes this is the current status but we have a bunch of fixes which will help you' very well.

I can also read between the lines of some postings.

Markus

> Joachim > >-- >| _ RWTH| Joachim Worringen >|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen > | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim > |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >

From raysonlogin at yahoo.com Thu Apr 18 10:54:06 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:16 2009 Subject: Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster? In-Reply-To: <20020418142700.A680@gre.ac.uk> Message-ID: <20020418175406.91343.qmail@web11408.mail.yahoo.com>

--- shin@guss.org.uk wrote: > 1. In terms of the floating point performance, looking at CFP2000 on > www.spec.org and the Xeon should offer much better FP performace > that the older alphas we have. I could only find results for a 4100 > 5/533 (which is the closest to our current setup) and these were > much lower than the results from Dell Precision Workstation 530 with > 2.0Ghz proc.

Since you are using the processors in SMP configurations, you should be looking at SPECfp_rate2000. SPECfp2000 tells you how fast your code runs on a single CPU. But with SPECfp_rate2000, you can see how well a processor scales in an SMP configuration.

One thing I found last year was that when the processors share the memory bandwidth on an SMP machine, the performance is really bad. My configuration was dual P3s with Myrinet. I measured the performance of the cluster running MPI programs using 8 machines with 1 process on each machine, and 2 processes on 4 machines. To my surprise, 1 process on 8 machines had better performance.

My prof. then gave us his results on an IBM server machine (not a PC server); his results were the opposite. His conclusion was that the memory bandwidth of PCs does not scale with the number of processors.

(BTW, it was an assignment -- I wasn't the only one who found similar results -- there were around 20 people in that class)

> 2. Quad systems seem to be way more expensive than duals and I could > only find quad systems running at 900Mhz per proc instead of 2GHz in > the duals - so I assume the quads are out on cost and proc. speed > alone.

I believe the performance of quad systems will not give you double that of the duals, even if you use the 2GHz CPUs. The PC (or should I say Intel??) architecture has the shared memory bus, which does not scale with the #CPUs.

BTW, is AMD MP better??
I've heard that each Althon MP CPU talks to its own system cpuset. > > 3. One of my concerns was the use of mpi across 8xdual Xeon nodes > versus 3xquad alpha nodes. I'm assuming that mpi(ch) will look after > all the necessary for us in terms of communication between > processors within a node and communication across nodes - but is the > speed of memory, throughput etc a limiting factor on this type of PC > architecture? Will we hit latency issues within a node that we're > not currently hitting? See above. You can actually use OpenMP within the nodes and MPI between the nodes. However, MPICH and LAM MPI are not thread safe... Rayson __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From hahn at physics.mcmaster.ca Thu Apr 18 11:07:55 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:16 2009 Subject: Dual Xeon Clusters In-Reply-To: <3CBCCD58.1080104@yahoo.com> Message-ID: > I am building a dual Xeon 4-node cluster; My understanding of > hyperthreading leaves me to conclusion that it depends largely on the > code to benefit from it. I think that's an ambiguous way to put it. yes, the benefit depends very much on what your code does. but no, you code doesn't have to do anything special: HT simply turns one cpu into a pair of "virtual" cpus which compete for the same fixed set of resources. so you don't necessarily have to use threads to see HT benefit, since the virtualization is not visible at the programmer level. but you will certainly not see any HT benefit if your program somehow manages to keep every functional unit (including cache and ram) busy all the time. > Otherwise in many cases the performance can > become worse than before using a hyperthreaded Xeon processors. My well, first, you don't have to turn on HT at all, so a prestonia can be expected to behave like a northwood. but I would expect most system resources to deteriorate linearly, except for cache, which is highly nonlinear when working set size is near cache size. so if you run two threads/procs on an HT chip, and a small working set (both seeing high cache hit rates) they should get along just fine, with half the speed each. or very large working sets (where cache is basically irrelevant). hmm, maybe that's too verbose to be clear. in short: 1. if you normally have some idle resources, HT may let you achieve higher efficiency through interleaving. 2. some physical resources will deteriorate fairly linearly, so two procs see half the throughput. 3. several resources can behave nonlinearly, so two procs could interfere badly when interleaved. 4. most of these issues are the same for traditional timesliced multiprocessing, except that HT interleaves much finer. > If not what other way can i find the > performance of Xeon processors in a clustered env. I don't know why clustering would change anything. From edwardsa at plk.af.mil Thu Apr 18 10:21:20 2002 From: edwardsa at plk.af.mil (Arthur H. Edwards) Date: Wed Nov 25 01:02:16 2009 Subject: /. 
US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org>; from rbw@ahpcrc.org on Wed, Apr 17, 2002 at 02:31:31PM -0500 References: <200204171931.g3HJVVs22371@mycroft.ahpcrc.org> Message-ID: <20020418112120.B12634@plk.af.mil> On Wed, Apr 17, 2002 at 02:31:31PM -0500, Richard Walsh wrote: > > Eugene Leitel wrote: > > >http://slashdot.org/articles/02/04/17/1324227.shtml?tid=162 > > >An anonymous reader wrote in to say "Pacific Northwest National Laboratory > >(US DOE) signed a $24.5 million dollar contract with HP for a Linux > >supercomputer. This will be one of the top ten fastest computers in the > >world. Some cool features: 8.3 Trillion Floating Point Operations per > >Second, 1.8 Terabytes of RAM, 170 Terabytes of disk, (including a 53 TB > >SAN), and 1400 Intel McKinley and Madison Processors. Nice quote: 'Todays > >announcement shows how HP has worked to help accelerate the shift from > >proprietary platforms to open architectures, which provide increased > >scalability, speed and functionality at a lower cost,' said Rich DeMillo, > >vice president and chief technology officer at HP. Read Details of the > >announcement here or here." > > Mmmm ... working through some numbers ... > > 8.3 TFLOPS (if they are quoting peak) with 1400 processors > would mean they are getting chips with 1.5 GHz clocks (peak > performance would be 6 GFLOPS per chip [4 ops per clock]). > > Stream numbers for this 1.5 GHz chip (estimated) would be around > 250 MFLOPS for the triad. Using the triad as a baseline for performance > for this and several others systems and relating it back to > some estimated cost for several other systems (government purchase > price only, no recurring costs) this is $70 per MFLOPS sustained > for the Mckinley (again using triad) ... or more than the CRAY SV2 > ($65), EV6($55), EV7 ($50), Pentium 4 ($30). > > Interesting number ... the high-end IA-64 stuff does not look > cheap when stream triad defines sustained performance. Of course, > blocking for cache will push the sustained number up (maybe alot > and on all the systems), but you would think that QCHEM stuff > they run at PNNL (G98) will be mostly memory bound and therefore I think they will be using NWChem- an intrinsically parallel code. It has some really bad numbers for serial but apparently scales fairly well. > the stream triad sustained performance is not too far off. > > I am not sure this looks like a very good deal. > > rbw > > > #--------------------------------------------------- > # Richard Walsh > # Project Manager, Cluster Computing, Computational > # Chemistry and Finance > # netASPx, Inc. > # 1200 Washington Ave. So. > # Minneapolis, MN 55415 > # VOX: 612-337-3467 > # FAX: 612-337-3400 > # EMAIL: rbw@networkcs.com, richard.walsh@netaspx.com > # > #--------------------------------------------------- > # "What you can do, or dream you can, begin it; > # Boldness has genius, power, and magic in it." > # -Goethe > #--------------------------------------------------- > # "Without mystery, there can be no authority." > # -Charles DeGaulle > #--------------------------------------------------- > # "Why waste time learning when ignornace is > # instantaneous?" -Thomas Hobbes > #--------------------------------------------------- > # "In the chaos of a river thrashing, all that water > # still has to stand in line." 
-Dave Dobbyn > #--------------------------------------------------- > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Arthur H. Edwards AFRL/VSSE Bldg. 914 3550 Aberdeen Ave SE KAFB, NM 87117-5776 From shewa at inel.gov Thu Apr 18 12:08:30 2002 From: shewa at inel.gov (Andrew Shewmaker) Date: Wed Nov 25 01:02:16 2009 Subject: Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster? References: <20020418175406.91343.qmail@web11408.mail.yahoo.com> Message-ID: <3CBF19AE.1050105@inel.gov> Rayson Ho wrote: >--- shin@guss.org.uk wrote: > > >One thing I found last yr was that when the processors share the memory >bandwidth on an SMP machine, the performance is really bad. My >configuration was daul-P3s with Myirnet. I measured the performance of >the cluster running MPI programs using 8 machines with 1 process on >each machine, and 2 processes on 4 machines. To my suprise, 1 process >on 8 machines had a better performance. > For the 1 process on 8 machines case, were those 8 machines also duals? If they were, did you notice if one cpu was taking care of networking overhead while the other was doing work? If these were duals did you also run the same tests on similar speed uniprocessor system? Just wondering. > > >My prof. then gave us his results on an IBM server machine (not PC >server), his results were the opposite. His conclusion was that the >memory bandwidth of PCs does not scale with the number of processors. > >(BTW, it was an assignment -- I wasn't the only one who found similar >results -- there were around 20 people in that class) > >>2. Quad systems seem to be way more expensive than duals and I could >>only find quad systems running at 900Mhz per proc instead of 2GHz in >>the duals - so I assume the quads are out on cost and proc. speed >>alone. >> > >I believe the performance of qurd systems will not give you double of >the duals, even if you use the 2Ghz CPUs. The PC (or should I say >Intel??) architecture has the shared memory bus, which does not scale >with the #CPUs. > >BTW, is AMD MP better?? I've heard that each Althon MP CPU talks to its >own system cpuset. > I have mostly used dual AMDs in a high throughput rather than high performance setting. Our CFD codes would take 99% of each processor and 500-800 MB each of RAM and would not interfere with each other. Each would complete in the same amount of time as we saw in the 1:1 case. We also are using a monte carlo based code over PVM and there is almost no difference between 1*8 and 2*4. I don't remember how much memory each process uses though (less than the above). We have been pleased so far. Andrew From lindahl at keyresearch.com Thu Apr 18 11:12:52 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:16 2009 Subject: /. US DOE gets a $24.5 Million Linux Supercomputer In-Reply-To: <200204181437.g3IEb7w27647@mycroft.ahpcrc.org>; from rbw@ahpcrc.org on Thu, Apr 18, 2002 at 09:37:07AM -0500 References: <200204181437.g3IEb7w27647@mycroft.ahpcrc.org> Message-ID: <20020418111252.A2016@wumpus.skymv.com> On Thu, Apr 18, 2002 at 09:37:07AM -0500, Richard Walsh wrote: > True, but the dual-floating point units in the core are not > likely to be added to ... I don't think that's a good guess. 
Not only are conventional cpus getting wider over time, but the whole point of EPIC and VLIW in general is that they potentially go _really_ wide. And most scientific codes can use lots of functional units. > When I saw the posting, I was surprise how few IA-64 processors > (even with the extras) were to be had for ~$25,000,000. The Pentium > 4 looks a better deal at this level of analysis. PNL's codes require 64-bit addressing. I don't think anyone was willing to bid AMD's Hammer, although it was a possibility. The problem with pricing this bid are that it's all forward-priced. HP is committing to delivering Madison processors when McKinley isn't even formally released. That means risk, and that means you need to subtract off $$ to cover that risk. That skews all of your analysis. BTW, the interconnect is next generation Quadrics. I've never seen any specs for it, nor pricing. greg From alex at compusys.co.uk Thu Apr 18 13:57:49 2002 From: alex at compusys.co.uk (alex@compusys.co.uk) Date: Wed Nov 25 01:02:16 2009 Subject: Intial Pallas performance with Myrinet on a 860 & E7500 In-Reply-To: <200204181601.g3IG1jG09278@blueraja.scyld.com> Message-ID: I can show PALLAS -multi numbers up to 94 CPUs orso. It would be interesting to see whether Dolphin can keep up with that, Myrinet so far looks pretty solid. I will make the results public as soon as we have confirmation that the results are within the boundaries of what is expected. 'Initial' means that with pushing and tweaking we probably win that last 10% performance increase. Alex From france at handhelds.org Thu Apr 18 10:11:28 2002 From: france at handhelds.org (George France) Date: Wed Nov 25 01:02:17 2009 Subject: Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster? In-Reply-To: <20020418142700.A680@gre.ac.uk> References: <20020418142700.A680@gre.ac.uk> Message-ID: <02041813112800.01959@shadowfax.middleearth> Have you considered a quad processor EV68 (EV7 when they become available) Alpha system? --George On Thursday 18 April 2002 09:27, shin@guss.org.uk wrote: > Hi, > > We have an old cluster setup that has 3 Alpha 4100 nodes (each node > has 4x466 processors) connected with memory channel (first version), > 1Gb Ram per node. The cluster is used to run internal code which is > mostly CFD (fine grain synchronous) problems. The code is parallized > and currently uses dec's mpi implementation. > > We now need to replicate this system at a remote site, and with an > eye on keeping the cost down, so the idea is to go with a bunch of > dual processor P4 (2GHz xeon?) systems with 2Gb ram each and myrinet > interconnect. > > We expect to want to scale up to at least 8 of these dual nodes > initially. > > I need to look into the performance of various aspects of the > proposed system as we have no experience in this type of setup. > > Disclaimer: I dont' necessarily know what I'm talking about - I'm > the hardware/admin guy; the parallel guys do all the coding! Sorry. > > I'd appreciate any answers anyone could offer on: > > 1. In terms of the floating point performance, looking at CFP2000 on > www.spec.org and the Xeon should offer much better FP performace > that the older alphas we have. I could only find results for a 4100 > 5/533 (which is the closest to our current setup) and these were > much lower than the results from Dell Precision Workstation 530 with > 2.0Ghz proc. > > So I assume this won't be an issue - we'll get fast processors. 
Is > there a mboard that really sticks out here for offering best support > to these processors - or should we even be looking at AMD MP systems > now. I'm not sure I have the timescale to get in test systems and > test anything out. > > 2. Quad systems seem to be way more expensive than duals and I could > only find quad systems running at 900Mhz per proc instead of 2GHz in > the duals - so I assume the quads are out on cost and proc. speed > alone. > > 3. One of my concerns was the use of mpi across 8xdual Xeon nodes > versus 3xquad alpha nodes. I'm assuming that mpi(ch) will look after > all the necessary for us in terms of communication between > processors within a node and communication across nodes - but is the > speed of memory, throughput etc a limiting factor on this type of PC > architecture? Will we hit latency issues within a node that we're > not currently hitting? > > What sort of memory is recommended? DDR/SDRAM/other? > > However having ruled out the quads above - will they offer better > memory performance than the duals - on a par with the quad alpha > nodes? (I appreciate it's not a like for like comparison). > > 3. I think an entry level myrinet switch will enable me to connect 8 > nodes - at a cost of approx 2400 USD for a switch and 1700 USD per > myrinet card per node? And it will offer better performance than our > MC - so I'm assuming that the choice of myrinet is ok. > > 4. In terms of cache - we believe that the large cache on the > alpha's helps our performance quite significantly - as far as I can > determine the cache on the xeons is still 256/512K? Presumably this > won't make that much of a difference as we're scaling out across 8 > nodes instead of 3? > > Many thanks in advance, > Rgds > Shin > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From shin at guss.org.uk Fri Apr 19 02:47:17 2002 From: shin at guss.org.uk (shin@guss.org.uk) Date: Wed Nov 25 01:02:17 2009 Subject: Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster? In-Reply-To: <02041813112800.01959@shadowfax.middleearth>; from france@handhelds.org on Thu, Apr 18, 2002 at 01:11:28PM -0400 References: <20020418142700.A680@gre.ac.uk> <02041813112800.01959@shadowfax.middleearth> Message-ID: <20020419104717.B680@gre.ac.uk> Hi, On Thu, Apr 18, 2002 at 01:11:28PM -0400, George France wrote: > Have you considered a quad processor EV68 (EV7 when they become available) > Alpha system? I ruled this out as being more costly than going down the PC route - I might be mistaken on that - but the last time I looked (about a yr ago for another project) it was quite costly. I'm not even sure what the alpha future is now - and we have had good performance from our previous alpha setups - but cost is a real issue in this particular case and x86 of some variety looked the cheapest way forward. Shin From john.hearns at cern.ch Fri Apr 19 03:07:15 2002 From: john.hearns at cern.ch (John Hearns) Date: Wed Nov 25 01:02:17 2009 Subject: what architecture was MPI and PVM 1st designed for? In-Reply-To: References: Message-ID: <1019210836.14791.8.camel@ues4> On Tue, 2002-04-16 at 16:34, Jayne Heger wrote: > > Hi, > > Coulld anyone tell me what computer architecture MPI and PVM were first > designed for./written on. 
> Thanks, > Jayne, there is a nice discussion on PVM and MPI in chapter 11 of Dowd and Severances Oreilly bok on High Performance Computing. http://www.oreilly.com/catalog/hpc2/ By the way, it was nice to hear that your cluster is flying. (Hmmm.... I suppose Beowulves howl really). From Daniel.Kidger at quadrics.com Fri Apr 19 06:06:00 2002 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:02:17 2009 Subject: very high bandwidth, low latency manner? Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2D51@stegosaurus.bristol.quadrics.com> Craig Tierney wrote: >I talked to a guy at SC2002 from Quadrics and he said >that list pricing on a Quadrics network was about $3500 >per node when you are in the 100s of nodes and up. >The price includes the cards, cables, switches, >etc. This doesn't include any sort of discount that you >might get. Myrinet is about $2000 for an equivelent >network at list price. Dolphin/SCI falls around $2245 list >per node (if the system is > 144 nodes and you have to get >the 3d card). I guess I should jump in here to give the Quadrics perspective... I have spent 2 weeks in the USA doing some benchmarking on clusters of McKinleys under Linux and I get home find lots of e-mails talking about the Quadrics stuff, but no e-mails were from people either at Quadrics or from a customer site. I have only been with Quadrics for six months or so, and (fortunately) it is not me but the marketing people that decide the pricing scheme. Quadrics have sold most systems to date as part of Compaq's Alphaserver SC range. However we also do sell via other vendors, particularly as linux clusters. Our model though is not to sell direct to end-users but via systems integrators. ?3500 per node is maybe about right - pricing would always include all cards, cables, switches and software. The cost is admittedly high, after all as well as having the fastest line-speed, the Quadrics interconnect sends all data as virtual addresses (the NIC has its own MMU and TLB). That way any process can read and write the memory of any other node without any CPU overhead. The cost also tries to cover the high R+D; with volume sales the price may end up significantly lower. Any discussion on costing would be better taken up with our marketing types - but I am happy to share my knowledge on performance issues. :-) Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From David_Walters at sra.com Fri Apr 19 07:48:16 2002 From: David_Walters at sra.com (Walters, David) Date: Wed Nov 25 01:02:17 2009 Subject: very high bandwidth, low latency manner? Message-ID: >I talked to a guy at SC2002 from Quadrics and he said >that list pricing on a Quadrics network was about $3500 >per node when you are in the 100s of nodes and up. To heck with the Quadrics, I want a ride on that Time Machine!! SC2002 is (was?) in November 2002, IIRC... Man, what I could accomplish by seeing SC2002 presentations 6 months in advance... Dave From france at handhelds.org Fri Apr 19 05:10:03 2002 From: france at handhelds.org (George France) Date: Wed Nov 25 01:02:17 2009 Subject: Replacing Quad proc SMP multi node DEC Alpha Cluster with Linux Dual P4 cluster? 
In-Reply-To: <20020419104717.B680@gre.ac.uk> References: <20020418142700.A680@gre.ac.uk> <02041813112800.01959@shadowfax.middleearth> <20020419104717.B680@gre.ac.uk> Message-ID: <02041908100300.05034@shadowfax.middleearth>

Hello Shin,

On Friday 19 April 2002 05:47, shin@guss.org.uk wrote: > Hi, > > On Thu, Apr 18, 2002 at 01:11:28PM -0400, George France wrote: > > Have you considered a quad processor EV68 (EV7 when they become > > available) Alpha system? > > I ruled this out as being more costly than going down the PC route - > I might be mistaken on that - but the last time I looked (about a yr > ago for another project) it was quite costly.

If you need a 64 bit architecture, then I prefer the Alpha architecture to other 64 bit systems. The EV67, EV68 and EV7 systems' price vs. performance appears reasonable to me, assuming that you really need a 64 bit architecture. If you do not need a 64 bit system aimed at High Performance Technical Computing, then a 32 bit system will probably provide a less expensive solution.

> > I'm not even sure what the alpha future is now - and we have had > good performance from our previous alpha setups - but cost is a real > issue in this particular case and x86 of some variety looked the > cheapest way forward.

The Alpha EV7 CPU will probably be the last Alpha chip produced as we know it. I believe the EV7 systems should be out late summer or in the fall. I suspect that these systems will be available until at least 2005. When the Hammer chips/systems or the next release of Itanium come out, we will have to wait and see how they compare to Alpha.

Best Regards,

--George

From Todd_Henderson at raytheon.com Fri Apr 19 12:20:51 2002 From: Todd_Henderson at raytheon.com (Todd Henderson) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? References: Message-ID: <3CC06E13.E1D4FBFA@raytheon.com>

We are currently specing out a new cluster and since we have a corp. agreement with one of the big pc vendors, I thought I'd contact them. They claim that the P4 Itanium is 64 bit, only in the 733 and 800 mhz speeds. Is Linux on Intel 64 bit for these processors?

Thanks, Todd

From rfoster at lnxi.com Sat Apr 20 07:22:15 2002 From: rfoster at lnxi.com (William Harman) Date: Wed Nov 25 01:02:17 2009 Subject: HIGH MEM support for up to 64GB Message-ID: <20020420.Oah.69718200@bart>

Sounds right. __PAGE_OFFSET is set to 0xC0000000 by default, so you get 3GB of address space for user processes. And about 2GB max for the data segment.

You can change the kernel to give you more than 3GB of address space and greater than 2GB of data segment for heap allocations. It's three lines to change: two lines in header files and one line in one of the assembly init files.

Leandro Tavares Carneiro (leandro@ep.petrobras.com.br) wrote*: > >Hi everyone, > > I am writing to ask to you all if anyone have tesed or used an machine >with more than 4GB of RAM or paging in virtual memory on intel machines. > He have an linux beowulf cluster and one of ours developers have asked >us for how much memory an process can allocate to use. In the tests we >have made, we cannot allocate much more than 3GB, using an dual PIII >with 1GB of ram and 12Gb of swap area for testing. > We can use 2 process alocating more or less 3Gb, but one process alone >canot pass this test. > We have using redhat linux 7.2, kernel 2.4.9-21, recompiled with High >Mem suport. > I have tested the same test aplication on an Itanium machine, with 1GB >of ram and 16Gb of swap area, and they passed. The aplication can >alocate more than 5GB of memory, using swap. In this machine, we are >using turbolinux 7, with kernel version 2.4.4-010508-18smp. > >Thanks in advance for the help, > >Best regards, > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > >

From bari at onelabs.com Sun Apr 21 19:24:40 2002 From: bari at onelabs.com (Bari Ari) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? References: <3CC06E13.E1D4FBFA@raytheon.com> Message-ID: <3CC37468.9090902@onelabs.com>

P4 and Itanium are two different Intel processors. Itanium is 64 bit and is currently available in 733 and 800 MHz speed grades. P4 is only 32 bit.

http://developer.intel.com/design/itanium/downloads/249634.htm vs http://developer.intel.com/design/Pentium4/datashts/

For info on Linux on IA-64, see: http://www.linuxia64.org/

Bari

Todd Henderson wrote: >We are currently specing out a new cluster and since we have a corp. agreement with one of the big pc vendors, I >thought I'd contact them. They claim that the P4 Itanium is 64 bit, only in the 733 and 800 mhz speeds. Is >Linux on Intel 64 bit for these processors? > >Thanks, >Todd > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > >

From troy at osc.edu Mon Apr 22 06:32:41 2002 From: troy at osc.edu (Troy Baer) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? In-Reply-To: <3CC06E13.E1D4FBFA@raytheon.com> Message-ID:

On Fri, 19 Apr 2002, Todd Henderson wrote: > We are currently specing out a new cluster and since we have a corp. agreement with one of the big pc vendors, I > thought I'd contact them. They claim that the P4 Itanium is 64 bit, only in the 733 and 800 mhz speeds. Is > Linux on Intel 64 bit for these processors?

The Itanium isn't a P4 derivative; it's a totally different architecture that is in fact 64-bit. Linux on them is 64-bit clean AFAIK.

--Troy

-- Troy Baer email: troy@osc.edu Science & Technology Support phone: 614-292-9701 Ohio Supercomputer Center web: http://oscinfo.osc.edu

From joe.griffin at mscsoftware.com Mon Apr 22 06:52:02 2002 From: joe.griffin at mscsoftware.com (Joe Griffin) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? References: Message-ID: <3CC41582.7060706@mscsoftware.com>

The Itanium is NOT 64 bit like a CRAY is 64 bit. It is an LP64 model (longs and pointers).

In FORTRAN: INTEGERs and REALs are still 32 bits. In C, ints are still 32 bits. You are allowed larger addressing because longs and pointers are 64 bits.

You may get 64 bit numerical accuracy in FORTRAN by use of "DOUBLE PRECISION", but this capability is the same as on IA32 systems. The item you gain is that you have a bigger address space.

Regards, Joe

Troy Baer wrote: > On Fri, 19 Apr 2002, Todd Henderson wrote: > >>We are currently specing out a new cluster and since we have a corp. agreement with one of the big pc vendors, I >>thought I'd contact them. They claim that the P4 Itanium is 64 bit, only in the 733 and 800 mhz speeds. Is >>Linux on Intel 64 bit for these processors? > > > The Itanium isn't a P4 derivative; it's a totally different architecture > > that is in fact 64-bit. Linux on them is 64-bit clean AFAIK.
> > --Troy From manel at labtie.mmt.upc.es Mon Apr 22 07:37:29 2002 From: manel at labtie.mmt.upc.es (Manel Soria) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature References: <200204201603.g3KG3iF08234@blueraja.scyld.com> Message-ID: <3CC42029.5B484258@labtie.mmt.upc.es> I'm wondering what is the maximum reasonable ambient temperature to have in a cluster room. In our room with 72 nodes we have about 29-30 oC (84-86 oF). Is this too high ? Can this be the cause of hardware failures ? Thanks. -- =============================================== Dr. Manel Soria ETSEIT - Centre Tecnologic de Transferencia de Calor C/ Colom 11 08222 Terrassa (Barcelona) SPAIN Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 E-Mail: manel@labtie.mmt.upc.es -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20020422/c06c190b/attachment.html From raysonlogin at yahoo.com Mon Apr 22 09:44:01 2002 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? In-Reply-To: <3CC41582.7060706@mscsoftware.com> Message-ID: <20020422164401.5570.qmail@web11407.mail.yahoo.com> --- Joe Griffin wrote: > The Itanium is NOT 64 bit like a CRAY is 64 bit. > It is an LP64 (longs and pointers). > > In FORTRAN: INTEGERs and REALs are still 32 bits. > > In C, int are still 32 bits. Those are software issues, you can always define ints as 64-bit on IA64 if you know where to hack the gcc source. Rayson > You are allowed larger addressing because > longs and pointers are 64 bits. > > You may get 64 bit numberical accuracy > in FORTRAN by use of "DOUBLE PRECISSION" but > this capability is the same as on IA32 systems. > The item you gain is that you have a > bigger address space. > > Regards, > Joe > > > Troy Baer wrote: > > On Fri, 19 Apr 2002, Todd Henderson wrote: > > > >>We are currently specing out a new cluster and since we have a > corp. agreement with one of the big pc vendors, I > >>thought I'd contact them. They claim that the P4 Itanium is 64 > bit, only in the 733 and 800 mhz speeds. Is > >>Linux on Intel 64 bit for these processors? > > > > > > The Itanium isn't a P4 derivative; it's a totally different > architecture > > that is in fact 64-bit. Linux on them is 64-bit clean AFAIK. > > > > --Troy > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Yahoo! Games - play chess, backgammon, pool and more http://games.yahoo.com/ From aby_sinha at yahoo.com Sun Apr 21 18:08:10 2002 From: aby_sinha at yahoo.com (Abhishek sinha) Date: Wed Nov 25 01:02:17 2009 Subject: HIGH MEM suport for up to 64GB References: <20020420.Oah.69718200@bart> Message-ID: <3CC3627A.1010301@yahoo.com> hi all This is what i did to allocate more than 3 Gb out of 4 Gb of RAM to the userspace . In the file /usr/src/linux2.4.*/include/asm-i386/page_offset.h Under the line #ifdef CONFIG_1GB Changed #define PAGE_OFFSET_RAW 0xC00000000 to PAGE_OFFSET_RAW 0xE00000000 then in processor.h changed #define TASK_UNMAPPED_BASE (TASK_SIZE/3) to (TASK_SIZE/16) and then recomplied the kernel with high mem support. this was on 2.4.7-10 , in 2.4.18 there are minor changes I am not a big kernel hacker and still learning so please use it with backups. comments invited abhishek William Harman wrote: >Sounds right. 
__PAGE_OFFSET is set to 0xC0000000 by default, so >you get 3GB of address space for user processes. And about 2GB >max for the data segment. > >You can change the kernel to give you more than 3GB of address >space and greater than 2GB of data segment for heap allocations. >It's three lines to change. Two lines in header files and one >line in the one of the assembly init files. > > > >Leandro Tavares Carneiro (leandro@ep.petrobras.com.br) wrote*: > >>Hi everyone, >> >> I am writing to ask to you all if anyone have tesed or used an machine >>with more than 4GB of RAM or paging in virtual memory on intel machines. >> He have an linux beowulf cluster and one of ours developers have asked >>us for how much memory an process can allocate to use. In the tests we >>have made, we cannot allocate much more than 3GB, using an dual PIII >>with 1GB of ram and 12Gb of swap area for testing. >> We can use 2 process alocating more or less 3Gb, but one process alone >>canot pass this test. >> We have using redhat linux 7.2, kernel 2.4.9-21, recompiled with High >>Mem suport. >> I have tested the same test aplication on an Itanium machine, with 1GB >>of ram and 16Gb of swap area, and they passed. The aplication can >>alocate more than 5GB of memory, using swap. In this machine, we are >>using turbolinux 7, with kernel version 2.4.4-010508-18smp. >> >>Thanks in advance for the help, >> >>Best regards, >> >>_______________________________________________ >>Beowulf mailing list, Beowulf@beowulf.org >>To change your subscription (digest mode or unsubscribe) visit >> >http://www.beowulf.org/mailman/listinfo/beowulf > >> >> Part 1.1 >> >> Content-Type: >> >> text/plain >> Content-Encoding: >> >> 8bit >> >> From john at computation.com Mon Apr 22 10:12:48 2002 From: john at computation.com (John Nelson) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? In-Reply-To: <3CC41582.7060706@mscsoftware.com> Message-ID: Have the number of bits per machine instruction also increased to 64 bits? This would imply that all of your compiled executables have now doubled in size (although I don't know why you would need 2**32 additional instructions). Are all pointers consistantly using 64 bits? If so, there will be a proportional growth in the size of your executable. The larger architecture also impacts your data formats. If your data sets are in binary format, and depending on the language you are using, there may be incompatibilities as well as new demands on storage. Stating the obvious I guess, but there are considerations when going to larger architectures. -- John On Mon, 22 Apr 2002, Joe Griffin wrote: > The Itanium is NOT 64 bit like a CRAY is 64 bit. > It is an LP64 (longs and pointers). > > In FORTRAN: INTEGERs and REALs are still 32 bits. > > In C, int are still 32 bits. > > You are allowed larger addressing because > longs and pointers are 64 bits. -- _____________________________________________________ John T. Nelson President | Computation.com Inc mail: | john@computation.com company: | http://www.computation.com/ journal of computation: | http://www.computation.org/ _____________________________________________________ "Providing quality IT consulting services since 1992" From rgb at phy.duke.edu Mon Apr 22 11:58:44 2002 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature In-Reply-To: <3CC42029.5B484258@labtie.mmt.upc.es> Message-ID: On Mon, 22 Apr 2002, Manel Soria wrote: > I'm wondering what is the maximum reasonable ambient > temperature to have in a cluster room. In our room > with 72 nodes we have about 29-30 oC (84-86 oF). > Is this too high ? Can this be the cause of hardware > failures ? Yes, it can. This is pretty high for a server room. The best way to think of temperature and heat disposal in a cluster is to think in layers. Heat generally flows from hot to cold, at a rate proportional to the difference in temperture in degrees Kelvin. More specifically, the rate of flow is influenced by things like conductivities, convective flow, and radiative trapping. The CPU core generates heat at some roughly constant rate under load. Current/modern CPU's "can" operate at very high temperatures, order of 100C, although they will almost certainly operate more reliably and longer at considerably cooler core temperatures. This heat generally flows from the CPU into the attached heat sink/fan at a rate determined by the temperature DIFFERENCE between the heatsink and the CPU. If the conductivity of the heatsink is high, and the conductivity of the interface is also high, a small temperature difference will cause a lot of heat to flow from the hotter to the cooler. The CPU is thus cooled until it isn't too much warmer than the operating temperature of the heatsink. The heatsink then has to be cooled so that IT is cooler than the desired operating temperature of the CPU. The hotter it is, the faster it loses heat to the ambient air. The cooler the ambient air, the faster it loses heat. Here things get a bit arcane. Air is not all that great a conductor of heat. It does have some heat capacity and will warm up when in contact with a warmer surface. Heat sinks therefore generally have lots of surface area and fans in the case and heatsink itself move (hopefully cooler) air rapidly across this surface. All things being equal, though, when the CPU produces heat at a constant rate the heatsink/fan/air arrangement can remove heat at that rate only when the air and the heatsink have a given, approximately constant, temperature difference. This warmed air has to then be removed from the case and replaced with cooler ambient air from the server room, and the warmed air eventually has to be circulated over actively cooled (refrigerated) coils to remove it from the room altogether and eventually dump it, plus all the energy required to do the cooling, into the outside air. The cooler the room air, the cooler all the components inside your system, especially the CPU. Cooling down the room air temperature 10C should reduce the operating temperature of your CPU by very close to 10C. Most systems are probably engineered with the assumption that they will operate in air in the 68-75F temperature range (20-23C), and can probably tolerate ambient air up to 80F or 26C without much risk. If the ambient temperatures get much higher than this, though, your risk of catastrophic heat-induced failure starts creeping up. At around 100F/38C they become very high indeed -- close to "certain" if you try operating a system 24 hours under a high load at or above this ambient air temperature. If a system is ever operated for an extended period over 30C (in the 90s F) it may not fail, but even if you cool it back down you may have marginally damaged components that will fail later. 
An additional risk for even fairly short periods of high temperature operation is that hard disks are made of metal that expands when heated. If a disk expands too much, the write head can actually become misaligned with the tracks and your disk can be instantly and irrecoverably trashed. This can also happen if the disk is COOLED too much -- it is a bad idea to crank up a laptop after it has sat all night in a sub-zero car without letting it come to a "normal" operating temperature first... If I were you I'd engineer enough cooling to drop the ambient air in your cluster space by at least 5C, if not 10C, and make sure that there is enough air circulation and mixing that no systems are in local "hot spots" (where air exhausted from one system is sucked into another system, for example). A really happy server room is one you need to wear a jacket or sweater in to be comfortable, not one that makes you want to take clothes off...;-) rgb > > Thanks. > > -- > =============================================== > Dr. Manel Soria > ETSEIT - Centre Tecnologic de Transferencia de Calor > C/ Colom 11 08222 Terrassa (Barcelona) SPAIN > Tf: +34 93 739 8287 ; Fax: +34 93 739 8101 > E-Mail: manel@labtie.mmt.upc.es > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From alvin at Maggie.Linux-Consulting.com Mon Apr 22 13:33:54 2002 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature In-Reply-To: <3CC42029.5B484258@labtie.mmt.upc.es> Message-ID: hi ya manel if your systems have "health monitoring"... check your bios to see what it thinks is your current system temp and cpu temp... you're cpu reliability/performance goes down by 1/2 for every 10 degree C - ie if it would have lasted 5 more years... ( starting temp is "normal/nominal temp" as provided ( by intel or amd that they provide a cpu warranty for 5 years - if temp went up another 10deg C... its now 2.5 yrs - it temp went up 20 degree.... its now 1.25yrs... - or some silly guidelines like that.. to test if the ambient temperature is too high... - add a regular fan and blow air on it... - if the cpu temp drops significantly... than its too hot in the room max cpu temp... http://users.erols.com/chare/elec.htm http://www.heatsink-guide.com/maxtemp.htm -- add lm_sensors to your "distro" to read the cpu temp and if it gets too high... shutdown the server or at least dont do heavy computations on it - add more fans ... and better air flow... c ya alvin http://www.Linux-1U.net/CPU .. more specs ... On Mon, 22 Apr 2002, Manel Soria wrote: > I'm wondering what is the maximum reasonable ambient > temperature to have in a cluster room. In our room > with 72 nodes we have about 29-30 oC (84-86 oF). > Is this too high ? Can this be the cause of hardware > failures ? > From richard_fryer at charter.net Mon Apr 22 10:01:08 2002 From: richard_fryer at charter.net (Richard Fryer) Date: Wed Nov 25 01:02:17 2009 Subject: Kidger's comments on Quadric's design and performance Message-ID: <003601c1ea1f$4c3a75f0$6601a8c0@charterpipeline.com> On Fri, 19 Apr 2002 14:06:00 +0100 Daniel Kidger wrote: > after all as well as having the fastest line-speed, the Quadrics > interconnect sends all data as virtual addresses (the NIC has its > own MMU and TLB). That way any process can read and write > the memory of any other node without any CPU overhead. 
I appreciate getting a bit of technical detail on Quadrics interfaces. Is there a web location that might provide more information - comparative benchmarks or protocol information or ??? This message also reminded me to ask if a long-held opinion is valid - and that opinion is "that a cache coherent interconnect would offer performance enhancement when applications are at the 'more tightly coupled' end of the spectrum." I know that present PCI based interfaces can't do that without invoking software overhead and latencies. Anyone have data - or an argument for invalidating this opinion? I did recently read that the AMD 'HyperTransport' interfaces ARE capable of cache coherent transactions. This would appear to allow protocols (such as SCI) that support cache coherence to operate in that mode. But I wonder if it matters to the MPI world. Seems to me that it would be a factor in improving scalability (providing that other interconnect issues such as bandwidth bottlenecks) don't prevent it. My recollection is that the SCI simulations I saw required very little added traffic to maintain coherency. Also a brief note about the Dolphin product line, since the issue of link saturation has come up: - they DO also sell switches - or at least offer them. And if you check the SCI specification, you'll see that there are some elaborate discussions of fabric architectures that the protocol supports and switches enable. What I DO NOT know is if the SCALI software supports switch-based operation, and also don't know what the impact is on the system cost per node. My 'inexperienced' assessment of the appeal in the Dolphin family is that you can start without the switch and later add it if the performance benefit warrents. That's what I'd say if I were selling them anyway - and didn't know otherwise. :-) Richard Fryer rfryer@beomax.com From hahn at physics.mcmaster.ca Mon Apr 22 14:37:31 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:17 2009 Subject: 64 bit Intels? In-Reply-To: Message-ID: >> Have the number of bits per machine instruction also increased to 64 bits? not exactly. ia64 has "bundles" as its atomic instruction-stream format; a 128b bundle contains three 41b instruction fields as well as a template field. the legal combinations of instruction fields are fairly constrained, which means that the compiler is somtimes (often?) forced to put nops into bundles. > instructions). Are all pointers consistantly using 64 bits? If so, there > will be a proportional growth in the size of your executable. how often are pointers encoded in your executables? not often, I think. > The larger architecture also impacts your data formats. If your data sets > are in binary format, and depending on the language you are using, there > may be incompatibilities as well as new demands on storage. it's easy to say that ia64 is/was a pretty crazy thing to do, but Intel isn't quite *that* far gone that they'd define wholly new data formats. modulo the usual endian considerations, they're using familiar 2's complement integers and IEEE FP. for PR-level slides: http://developer.intel.com/design/itanium/idfisa/index.htm for programmer-level intro: http://developer.intel.com/design/itanium/downloads/24531703s.htm From hahn at physics.mcmaster.ca Mon Apr 22 14:51:49 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature In-Reply-To: Message-ID: > - or some silly guidelines like that.. uh, yeah, that's the word that came to my mind too. 
I get quite upset when my 33KW machineroom is over 23C or so. when it is at 20C, various bits of hardware report that they're at 30-35C inside their case. if you assume a fairly safe .5 C/W thermal resistance for heatsink/fan combination, that means you can technically have, a CPU dissipating 140W. this is why dual-athlon machines (two 60+W CPUs) are a bit tricky to cool. especially since although current CPUs are spec'ed at 90-100C, you REALLY DO NOT WANT TO DO SO. I consider 50C a fairly hot CPU. From timm at fnal.gov Mon Apr 22 14:58:21 2002 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature In-Reply-To: Message-ID: On Mon, 22 Apr 2002 alvin@Maggie.Linux-Consulting.com wrote: > > hi ya manel > > if your systems have "health monitoring"... > check your bios to see what it thinks is your > current system temp and cpu temp... > Am I missing something here? The BIOS sensors can only tell you what the temperature is when the machine is effectively idle. A good CPU load can be good for an increase of 7-10 degrees C. > > you're cpu reliability/performance goes down > by 1/2 for every 10 degree C > - ie if it would have lasted 5 more years... > ( starting temp is "normal/nominal temp" as provided > ( by intel or amd that they provide a cpu warranty for 5 years > - if temp went up another 10deg C... its now 2.5 yrs > - it temp went up 20 degree.... its now 1.25yrs... > - or some silly guidelines like that.. > > to test if the ambient temperature is too high... > - add a regular fan and blow air on it... > > - if the cpu temp drops significantly... > than its too hot in the room > > max cpu temp... > http://users.erols.com/chare/elec.htm > http://www.heatsink-guide.com/maxtemp.htm > > -- add lm_sensors to your "distro" to read the cpu temp Does anyone have success yet with making lm_sensors work on the Tyan 246x series of dual AMD motherboards? > and if it gets too high... shutdown the server > or at least dont do heavy computations on it > - add more fans ... and better air flow... > > c ya > alvin > http://www.Linux-1U.net/CPU .. more specs ... > > > On Mon, 22 Apr 2002, Manel Soria wrote: > > > I'm wondering what is the maximum reasonable ambient > > temperature to have in a cluster room. In our room > > with 72 nodes we have about 29-30 oC (84-86 oF). > > Is this too high ? Can this be the cause of hardware > > failures ? > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From walke at usna.edu Mon Apr 22 15:10:21 2002 From: walke at usna.edu (LT V. H. Walke) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature (Tyan S246x) In-Reply-To: References: Message-ID: <1019513422.21658.38.camel@vhwalke.mathsci.usna.edu> Temperature monitoring for the dual processor Tyan board is mostly working in lm_sensors 2.6.3 (March 22, 2002). See the tickets referenced on the lm_sensors web page: http://www.netroedge.com/~lm78/ Unfortunately, you still have to go into bios on every boot to initialize the monitoring chips, but everything seems to work. Fortunately it seems progress is being made - a solution was posted on April 11th (again, see the web page). Good luck, Vann On Mon, 2002-04-22 at 17:58, Steven Timm wrote: > On Mon, 22 Apr 2002 alvin@Maggie.Linux-Consulting.com wrote: > > > > > hi ya manel > > > > if your systems have "health monitoring"... 
> > check your bios to see what it thinks is your > > current system temp and cpu temp... > > > > Am I missing something here? The BIOS sensors can only > tell you what the temperature is when the machine is > effectively idle. A good CPU load can be good for an > increase of 7-10 degrees C. > > > > > you're cpu reliability/performance goes down > > by 1/2 for every 10 degree C > > - ie if it would have lasted 5 more years... > > ( starting temp is "normal/nominal temp" as provided > > ( by intel or amd that they provide a cpu warranty for 5 years > > - if temp went up another 10deg C... its now 2.5 yrs > > - it temp went up 20 degree.... its now 1.25yrs... > > - or some silly guidelines like that.. > > > > to test if the ambient temperature is too high... > > - add a regular fan and blow air on it... > > > > - if the cpu temp drops significantly... > > than its too hot in the room > > > > max cpu temp... > > http://users.erols.com/chare/elec.htm > > http://www.heatsink-guide.com/maxtemp.htm > > > > -- add lm_sensors to your "distro" to read the cpu temp > > Does anyone have success yet with making > lm_sensors work on the Tyan 246x series of dual AMD motherboards? > > > and if it gets too high... shutdown the server > > or at least dont do heavy computations on it > > - add more fans ... and better air flow... > > > > c ya > > alvin > > http://www.Linux-1U.net/CPU .. more specs ... > > > > > > On Mon, 22 Apr 2002, Manel Soria wrote: > > > > > I'm wondering what is the maximum reasonable ambient > > > temperature to have in a cluster room. In our room > > > with 72 nodes we have about 29-30 oC (84-86 oF). > > > Is this too high ? Can this be the cause of hardware > > > failures ? > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- ---------------------------------------------------------------------- Vann H. Walke Office: Chauvenet 341 Computer Science Dept. Ph: 410-293-6811 572 Holloway Road, Stop 9F Fax: 410-293-2686 United States Naval Academy email: walke@usna.edu Annapolis, MD 21402-5002 http://www.cs.usna.edu/~walke ---------------------------------------------------------------------- From xyzzy at speakeasy.org Mon Apr 22 15:46:52 2002 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature (Tyan S246x) In-Reply-To: <1019513422.21658.38.camel@vhwalke.mathsci.usna.edu> Message-ID: On 22 Apr 2002, LT V. H. Walke wrote: > Temperature monitoring for the dual processor Tyan board is mostly > working in lm_sensors 2.6.3 (March 22, 2002). See the tickets > referenced on the lm_sensors web page: http://www.netroedge.com/~lm78/ > > Unfortunately, you still have to go into bios on every boot to > initialize the monitoring chips, but everything seems to work. > Fortunately it seems progress is being made - a solution was posted on > April 11th (again, see the web page). I've created a patch for the the lm_sensors w83781d driver that lets it properly initialize and detect the chips in the Tyan dual-amd boards. This lets me get temperature monitoring without going into the bios first. 
Temperature and fan speed monitoring works, but voltage doesn't work correctly for some inputs (like +12V) because Tyan used non-standard resistor values and I don't know what they are. I'm planning to give it to the lm sensors people soon. In the past, they've been very receptive to my fixes for the supermicro 370DE6 and vid settings for socketA, P4, and P3-S boards. From moor007 at bellsouth.net Mon Apr 22 17:01:22 2002 From: moor007 at bellsouth.net (Timothy W. Moore) Date: Wed Nov 25 01:02:17 2009 Subject: Ethernet Channel Bonding (ECB) Message-ID: <3CC4A452.9080403@bellsouth.net> I am new to the Beowulf cluster computing and am awaiting UPS to deliver my systems. I have been researching ECB and it seems to have mixed reviews regarding performance enhancement. Would/Could someone shed some light on this topic to the following effect: [1] Is it truly necessary? [2] If using RedHat 7.2, should I re-compile the kernel? Any and all assistance is truly appreciated! Tim From siegert at sfu.ca Mon Apr 22 17:45:41 2002 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:02:17 2009 Subject: Ethernet Channel Bonding (ECB) In-Reply-To: <3CC4A452.9080403@bellsouth.net>; from moor007@bellsouth.net on Mon, Apr 22, 2002 at 06:01:22PM -0600 References: <3CC4A452.9080403@bellsouth.net> Message-ID: <20020422174541.B24777@stikine.ucs.sfu.ca> On Mon, Apr 22, 2002 at 06:01:22PM -0600, Timothy W. Moore wrote: > I am new to the Beowulf cluster computing and am awaiting UPS to deliver > my systems. I have been researching ECB and it seems to have mixed > reviews regarding performance enhancement. Would/Could someone shed > some light on this topic to the following effect: > > [1] Is it truly necessary? That depends on the programs you are planning to run your cluster. With respect to performance: I am getting 269Mbit/s bandwith with 3-way channel bonded fast ethernet (using 3Com NICs). This is almost exactly three times as much as I get from a single NIC. The latency is not quite as good as with a single NIC: 55us vs. 43us (all numbers measured with netpipe). Thus if your program doesn't need the bandwith or if your program is extremely sensitive to latencies then you don't need channel bonding. > [2] If using RedHat 7.2, should I re-compile the kernel? Not necessarily. The bonding.o module is part of all RedHat kernels. However, I strongly recommend upgrading the kernel to 2.4.18 - I had problems with all earlier 2.4.x versions. Regards, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From ron_chen_123 at yahoo.com Mon Apr 22 23:58:36 2002 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:02:17 2009 Subject: FYI : Out of the box clustering in SuSE 8.0 Message-ID: <20020423065836.27793.qmail@web14706.mail.yahoo.com> Just read the news today about SGE 5.3 being included in SuSE 8.0. From now on, building compute farms is easier, saving the time on downloading and compiling the batch systems. You can read the news here at -- http://zdnet.com.com/2110-1104-888742.html And also, SGE is mainly used in compute farms, but some beowulfers use batch systems (like SGE, PBS) to improve the throughput of their clusters. Thanks, -Ron __________________________________________________ Do You Yahoo!? Yahoo! 
Games - play chess, backgammon, pool and more http://games.yahoo.com/ From eugen at leitl.org Tue Apr 23 07:29:44 2002 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:02:17 2009 Subject: Intel releases C++/Fortran suite V 6.0 for Linux Message-ID: Intel announces the release of Version 6 of the Intel(R) C++ and Fortran Compilers for Windows and Linux. Take advantage of performance for your software. Please visit our Compilers Home Page today. http://www.intel.com/software/products/compilers/ From josip at icase.edu Tue Apr 23 09:33:05 2002 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:02:17 2009 Subject: Maximum room temperature References: <200204201603.g3KG3iF08234@blueraja.scyld.com> <3CC42029.5B484258@labtie.mmt.upc.es> Message-ID: <3CC58CC1.41C0EBDC@icase.edu> Manel Soria wrote: > > I'm wondering what is the maximum reasonable ambient > temperature to have in a cluster room. In our room > with 72 nodes we have about 29-30 oC (84-86 oF). > Is this too high ? Can this be the cause of hardware > failures ? Yes it can. We start to lose hardware (disks, etc.) whenever temperature climbs to 85 deg. F (30 deg. C). Our computer room AC is set to maintain about 70 deg. F (21 deg. C), and we turn on spare AC units if this reaches 75 deg. F (about 24 deg. C). By 80 deg. F (27 deg. C), we start shutting down machines. BTW, hardware temperature monitoring measures temperatures inside the boxes, which are higher. CPU temperatures vary a lot and can easily reach 55 deg. C when loaded; motherboard temperatures are more stable (typically about 29-30 deg. C). We also wrote some periodic scripts which can e-mail root or even trigger automatic cluster shutdown when the average motherboard temperatures exceed reasonable limits (e.g. 35-40 deg. C). Unfortunately, dual CPU machines do not poweroff (Red Hat's Linux kernel 2.4.9-31smp considers "poweroff" unsafe on SMP machines) but at least they produce less heat when halted. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From timm at fnal.gov Tue Apr 23 10:31:28 2002 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:02:17 2009 Subject: Packet Engines "Hamachi" gigabit ethernet card Message-ID: Does anyone have a Hamachi gigabit ethernet card that is working under any kind of a 2.4 kernel at all? If so, what did it take to make it work? With all the various drivers we have tried including the latest available, we get the error message "too much work at interrupt" and no traffic through the card. Thanks Steve Timm ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Operating Systems Support Scientific Computing Support Group--Computing Farms Operations From keithu at parl.clemson.edu Tue Apr 23 13:21:01 2002 From: keithu at parl.clemson.edu (Keith Underwood) Date: Wed Nov 25 01:02:17 2009 Subject: Packet Engines "Hamachi" gigabit ethernet card In-Reply-To: Message-ID: It should be working "soon" in newer 2.4 kernels. There was a bug introduced by someone doing some "clean-ups" early in 2.4. I have attached a patch against the 2.4.18 version of the driver that should let you get traffic through the card. I don't know if anything else was broken along the way or not. 
Keith On Tue, 23 Apr 2002, Steven Timm wrote: > > Does anyone have a Hamachi gigabit ethernet card that is > working under any kind of a 2.4 kernel at all? If so, what did > it take to make it work? With all the various drivers we have > tried including the latest available, we get the error > message "too much work at interrupt" and no traffic through the card. > > Thanks > > Steve Timm > > ------------------------------------------------------------------ > Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/ > Fermilab Computing Division/Operating Systems Support > Scientific Computing Support Group--Computing Farms Operations > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > --------------------------------------------------------------------------- Keith Underwood Parallel Architecture Research Lab (PARL) keithu@parl.clemson.edu Clemson University -------------- next part -------------- --- drivers/net/hamachi.c Mon Feb 25 14:37:59 2002 +++ drivers/net/hamachi.patched.c Tue Apr 2 15:32:45 2002 @@ -210,8 +210,10 @@ /* Condensed bus+endian portability operations. */ #if ADDRLEN == 64 #define cpu_to_leXX(addr) cpu_to_le64(addr) +#define desc_to_virt(addr) bus_to_virt(le64_to_cpu(addr)) #else #define cpu_to_leXX(addr) cpu_to_le32(addr) +#define desc_to_virt(addr) bus_to_virt(le32_to_cpu(addr)) #endif @@ -1544,7 +1546,8 @@ break; pci_dma_sync_single(hmp->pci_dev, desc->addr, hmp->rx_buf_sz, PCI_DMA_FROMDEVICE); - buf_addr = (u8 *)hmp->rx_ring + entry*sizeof(*desc); + //buf_addr = (u8 *)hmp->rx_ring + entry*sizeof(*desc); + buf_addr = desc_to_virt(desc->addr); frame_status = le32_to_cpu(get_unaligned((s32*)&(buf_addr[data_size - 12]))); if (hamachi_debug > 4) printk(KERN_DEBUG " hamachi_rx() status was %8.8x.\n", From joachim at lfbs.RWTH-Aachen.DE Tue Apr 23 01:01:18 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Wed Nov 25 01:02:17 2009 Subject: Kidger's comments on Quadric's design and performance References: <003601c1ea1f$4c3a75f0$6601a8c0@charterpipeline.com> Message-ID: <3CC514CE.8D08E38@lfbs.rwth-aachen.de> Richard Fryer wrote: > > On Fri, 19 Apr 2002 14:06:00 +0100 > Daniel Kidger wrote: > > > after all as well as having the fastest line-speed, the Quadrics > > interconnect sends all data as virtual addresses (the NIC has its > > own MMU and TLB). That way any process can read and write > > the memory of any other node without any CPU overhead. > > I appreciate getting a bit of technical detail on Quadrics interfaces. Is > there a web location that might provide more information - comparative > benchmarks or protocol information or ??? Of course www.quadrics.com, and Fabrizio Petrini is doing a lot of evaluation work (http://www.c3.lanl.gov/~fabrizio, esp. http://www.c3.lanl.gov/~fabrizio/quadrics.html). > This message also reminded me to ask if a long-held opinion is valid - and > that opinion is "that a cache coherent interconnect would offer performance > enhancement when applications are at the 'more tightly coupled' end of the > spectrum." I know that present PCI based interfaces can't do that without > invoking software overhead and latencies. Anyone have data - or an argument > for invalidating this opinion? You would need another programming model than MPI for that (see below), maybe OpenMP as you basically have the characteristics of a SMP system with cc-NUMA architecture. 
> I did recently read that the AMD 'HyperTransport' interfaces ARE capable of > cache coherent transactions. This would appear to allow protocols (such as > SCI) that support cache coherence to operate in that mode. But I wonder if > it matters to the MPI world. Seems to me that it would be a factor in > improving scalability (providing that other interconnect issues such as > bandwidth bottlenecks) don't prevent it. My recollection is that the SCI > simulations I saw required very little added traffic to maintain coherency. This is true (for an introduction, see http://www.SCIzzL.com/HowSCIcohWorks.html). However, for MPI, cache coherence would not really add a performance benefit. MPI is designed to be efficient with "write-only" protocols. One-sided communication may benefit from it, but other techniques like Cray SHMEM do the same without cache coherence. And I do not expect anybody except AMD or chipset designers to design network adapters / bus bridges for something proprietary like HyperTransport... > Also a brief note about the Dolphin product line, since the issue of link > saturation has come up: - they DO also sell switches - or at least offer > them. And if you check the SCI specification, you'll see that there are > some elaborate discussions of fabric architectures that the protocol > supports and switches enable. What I DO NOT know is if the SCALI software > supports switch-based operation, and also don't know what the impact is on > the system cost per node. My 'inexperienced' assessment of the appeal in > the Dolphin family is that you can start without the switch and later add it > if the performance benefit warrents. That's what I'd say if I were selling > them anyway - and didn't know otherwise. :-) The "external" switches are not designed for large-scale HPC applications (although they scale quite well inside the range of their supported number of nodes), but for high-performance, high-availability small-scale cluster or embedded applications, such as Sun sells. With external switches, you don't have to do anything to keep the network up if a node fails (and also nothing when it comes back, as SCI is not source-routed). In torus topologies, re-routing needs to be applied to bypass bad nodes (Scali does this on-the-fly). Scali does not support external switches AFAIK (at least it doesn't sell such systems any longer), which is less a technical issue than a design issue, as the topology is fully transparent to the nodes accessing the network (they did use switches in the past, see http://www.scali.com/whitepaper/ehpc97/slide_9.html). For large-scale applications, distributed switches as in torus topologies scale better and are more cost-efficient (see http://www.scali.com/whitepaper/scieurope98/scale_paper.pdf and other resources). With switches, you need *a lot* of cables and switches (which doesn't stop Quadrics from doing so - resulting in an impressive 14 miles of cables for a recent system (IIRC), with single cables being up to 25m in length). It would need to be verified whether such a system built with a Quadrics-like fat-tree topology using Dolphin's 8-port switches would scale better than the equivalent torus topology for different communication patterns. I doubt it. At least, the interconnect would cost a lot more (at least twice as much, or even more depending on the dimension of the tree). SCI-MPICH can be used with arbitrary SCI topologies (because it uses the SISCI interface and thus runs with Scali or Dolphin SCI drivers).
It is not that closely coupled to the SCI drivers as ScaMPI is. Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From cpignol at seismiccity.com Tue Apr 23 15:32:00 2002 From: cpignol at seismiccity.com (Claude Pignol) Date: Wed Nov 25 01:02:17 2009 Subject: Ethernet Channel Bonding (ECB) References: <3CC4A452.9080403@bellsouth.net> <20020422174541.B24777@stikine.ucs.sfu.ca> Message-ID: <3CC5E0E0.3000706@seismiccity.com> Martin, Have you generalized channel bonding to all the nodes of your cluster? Which switch do you use? Thanks Claude Martin Siegert wrote: >On Mon, Apr 22, 2002 at 06:01:22PM -0600, Timothy W. Moore wrote: > >>I am new to the Beowulf cluster computing and am awaiting UPS to deliver >>my systems. I have been researching ECB and it seems to have mixed >>reviews regarding performance enhancement. Would/Could someone shed >>some light on this topic to the following effect: >> >>[1] Is it truly necessary? >> > >That depends on the programs you are planning to run your cluster. With >respect to performance: I am getting 269Mbit/s bandwith with 3-way >channel bonded fast ethernet (using 3Com NICs). This is almost exactly >three times as much as I get from a single NIC. The latency is not quite >as good as with a single NIC: 55us vs. 43us (all numbers measured with >netpipe). Thus if your program doesn't need the bandwith or if your >program is extremely sensitive to latencies then you don't need channel >bonding. > >>[2] If using RedHat 7.2, should I re-compile the kernel? >> > >Not necessarily. The bonding.o module is part of all RedHat kernels. >However, I strongly recommend upgrading the kernel to 2.4.18 - I had >problems with all earlier 2.4.x versions. > >Regards, >Martin > >======================================================================== >Martin Siegert >Academic Computing Services phone: (604) 291-4691 >Simon Fraser University fax: (604) 291-4242 >Burnaby, British Columbia email: siegert@sfu.ca >Canada V5A 1S6 >======================================================================== >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ------------------------------------------------------------------------ Claude Pignol SeismicCity, Inc. 2900 Wilcrest Dr. Suite 470 Houston TX 77042 Phone:832 251 1471 Mob:281 703 2933 Fax:832 251 0586 From heckendo at cs.uidaho.edu Tue Apr 23 17:41:35 2002 From: heckendo at cs.uidaho.edu (Robert B Heckendorn) Date: Wed Nov 25 01:02:17 2009 Subject: cooling In-Reply-To: <200204231601.g3NG13b05509@blueraja.scyld.com> Message-ID: <200204240041.RAA21877@brownlee.cs.uidaho.edu> We are looking at the facilities issues in installing a beowulf on the order of 500 nodes. What facilities is telling us is that it is going to almost cost us more to buy the cooling for the machine than to buy machine itself. How are people making the air conditioning for their machines affordable? Have we miscalculated the HVAC loads? Are we being over charged? thanks for any guidance. -- | Robert Heckendorn | We may not be the only | heckendo@cs.uidaho.edu | species on the planet but | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. 
| CS Dept, University of Idaho | | Moscow, Idaho, USA 83844-1010 | From bob at drzyzgula.org Tue Apr 23 18:54:37 2002 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:02:17 2009 Subject: cooling In-Reply-To: <200204240041.RAA21877@brownlee.cs.uidaho.edu> References: <200204231601.g3NG13b05509@blueraja.scyld.com> <200204240041.RAA21877@brownlee.cs.uidaho.edu> Message-ID: <20020423215437.A3370@www2> If the new load requires the installation of new chillers, it could indeed cost a pile-o'-money. Even if each node burned electricity at 100 Watts, you are looking at 50 kW of power consumption, or about 170,000 BTU/hr, requiring about 14 tons of cooling to remove -- your facilities folks may well be looking at installing something like one or more Liebert chillers such as these: http://www.liebert.com/dynamic/displayproduct.asp?id=545&cycles=60Hz There could well be additional shortfalls in external heat exchanger capacity, pipe capacity out to the heat exchangers, electric power for the computers and for the chillers, etc. If you don't already have the raised floor space, that could also add quite a bit to the cost to cool all those nodes. As to how we are making the A/C for our systems "affordable", we do it by virtue of the HVAC budget belonging to a different division, :-) although that also means that we don't have *control* over that budget, and when we hit the ceiling on cooling we kind of have to just stop installing new equipment until the whining and begging and pleading might eventually get us a new chiller -- and even then we might have to give up some rack space so there'd be a place to put it. :-( --Bob On Tue, Apr 23, 2002 at 05:41:35PM -0700, Robert B Heckendorn wrote: > > We are looking at the facilities issues in installing a beowulf on the > order of 500 nodes. What facilities is telling us is that it is going > to almost cost us more to buy the cooling for the machine than to buy > machine itself. How are people making the air conditioning for their > machines affordable? Have we miscalculated the HVAC loads? Are we > being over charged? > > thanks for any guidance. > > -- > | Robert Heckendorn | We may not be the only > | heckendo@cs.uidaho.edu | species on the planet but > | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. > | CS Dept, University of Idaho | > | Moscow, Idaho, USA 83844-1010 | > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jon at minotaur.com Tue Apr 23 19:32:47 2002 From: jon at minotaur.com (Jon Mitchiner) Date: Wed Nov 25 01:02:17 2009 Subject: cooling References: <200204231601.g3NG13b05509@blueraja.scyld.com> <200204240041.RAA21877@brownlee.cs.uidaho.edu> <20020423215437.A3370@www2> Message-ID: <024901c1eb38$52a8ba90$0d01a8c0@jonxp> The other consideration to have is some kind of monitoring/alerting system for the room. A client has a dedicated cooling equipment for a beowulf cluster for 52 machines. Recently the A/C broke one morning and they did not find out till the afternoon when someone walked into the small network room and found the room was in excess of 100 degrees. I dont want to think about what could have happened if it happened on a friday evening and nobody found about it until Monday. 
:) Jon Mitchiner ----- Original Message ----- From: "Bob Drzyzgula" To: "Robert B Heckendorn" Cc: Sent: Tuesday, April 23, 2002 9:54 PM Subject: Re: cooling > If the new load requires the installation of new > chillers, it could indeed cost a pile-o'-money. Even > if each node burned electricity at 100 Watts, you > are looking at 50 kW of power consumption, or about > 170,000 BTU/hr, requiring about 14 tons of cooling to > remove -- your facilities folks may well be looking > at installing something like one or more Liebert > chillers such as these: > http://www.liebert.com/dynamic/displayproduct.asp?id=545&cycles=60Hz > > There could well be additional shortfalls in external > heat exchanger capacity, pipe capacity out to the > heat exchangers, electric power for the computers > and for the chillers, etc. If you don't already > have the raised floor space, that could also add > quite a bit to the cost to cool all those nodes. > > As to how we are making the A/C for our systems "affordable", > we do it by virtue of the HVAC budget belonging to > a different division, :-) although that also means > that we don't have *control* over that budget, and > when we hit the ceiling on cooling we kind of have > to just stop installing new equipment until the whining > and begging and pleading might eventually get us > a new chiller -- and even then we might have to give > up some rack space so there'd be a place to put it. :-( > > --Bob > > On Tue, Apr 23, 2002 at 05:41:35PM -0700, Robert B Heckendorn wrote: > > > > We are looking at the facilities issues in installing a beowulf on the > > order of 500 nodes. What facilities is telling us is that it is going > > to almost cost us more to buy the cooling for the machine than to buy > > machine itself. How are people making the air conditioning for their > > machines affordable? Have we miscalculated the HVAC loads? Are we > > being over charged? > > > > thanks for any guidance. > > > > -- > > | Robert Heckendorn | We may not be the only > > | heckendo@cs.uidaho.edu | species on the planet but > > | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. > > | CS Dept, University of Idaho | > > | Moscow, Idaho, USA 83844-1010 | > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From scott.delinger at ualberta.ca Tue Apr 23 19:45:36 2002 From: scott.delinger at ualberta.ca (Scott Delinger) Date: Wed Nov 25 01:02:17 2009 Subject: cooling Message-ID: Hmm. I just bought 126 dual AthlonMP boxes, and needed to renovate the lab (electricity and AC). I've now got 7 tons of AC, and a whole panel devoted to the clusters in this facility. The reno was about CDN$35K (US$1.50), and the machines cost maybe eight times that? -- Scott L. Delinger, Ph.D. 
IT Administrator Department of Chemistry University of Alberta Edmonton, Alberta, Canada T6G 2G2 scott.delinger@ualberta.ca From scott.delinger at ualberta.ca Tue Apr 23 19:56:53 2002 From: scott.delinger at ualberta.ca (Scott Delinger) Date: Wed Nov 25 01:02:17 2009 Subject: cooling In-Reply-To: <024901c1eb38$52a8ba90$0d01a8c0@jonxp> References: <200204231601.g3NG13b05509@blueraja.scyld.com> <200204240041.RAA21877@brownlee.cs.uidaho.edu> <20020423215437.A3370@www2> <024901c1eb38$52a8ba90$0d01a8c0@jonxp> Message-ID: >The other consideration to have is some kind of monitoring/alerting system >for the room. A client has a dedicated cooling equipment for a beowulf >cluster for 52 machines. Recently the A/C broke one morning and they did >not find out till the afternoon when someone walked into the small network >room and found the room was in excess of 100 degrees. > >I dont want to think about what could have happened if it happened on a >friday evening and nobody found about it until Monday. :) Ah, and on that: http://www.netbotz.com/ (rackmountable and wall- or camera-mountable Temp, RH, air speed, door contact, camera, and external sensors: all web/SNMP addressable w/email alerts available). I've got four WallBotz 310 units, in server rooms, wiring closets, and cluster rooms. And the cluster room has a thermostat monitored 24x7 by our Physical Plant. Braces and belt, when that much hardware is on the line. -- Scott L. Delinger, Ph.D. IT Administrator Department of Chemistry University of Alberta Edmonton, Alberta, Canada T6G 2G2 scott.delinger@ualberta.ca From steveb at aei-potsdam.mpg.de Tue Apr 23 21:01:02 2002 From: steveb at aei-potsdam.mpg.de (Steven Berukoff) Date: Wed Nov 25 01:02:17 2009 Subject: cooling In-Reply-To: <200204240041.RAA21877@brownlee.cs.uidaho.edu> Message-ID: Hi, We just purchased ~150 dual AMDs, and are cooling them with 4 Fujitsu ceiling-mounted air-conditioners: about 50kW of AC cost us about $25k, which is about 10% of the cost of the machines. So yeah, you're getting screwed. It might be that your facilites people are thinking that you need one huge monolithic cooling system, and you may; however, cooling in a piecemeal way ends up costing a lot less. Steve > We are looking at the facilities issues in installing a beowulf on the > order of 500 nodes. What facilities is telling us is that it is going > to almost cost us more to buy the cooling for the machine than to buy > machine itself. How are people making the air conditioning for their > machines affordable? Have we miscalculated the HVAC loads? Are we > being over charged? > > thanks for any guidance. > > -- > | Robert Heckendorn | We may not be the only > | heckendo@cs.uidaho.edu | species on the planet but > | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. 
> | CS Dept, University of Idaho | > | Moscow, Idaho, USA 83844-1010 | > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > ===== Steve Berukoff tel: 49-331-5677233 Albert-Einstein-Institute fax: 49-331-5677298 Am Muehlenberg 1, D14477 Golm, Germany email:steveb@aei.mpg.de From heckendo at cs.uidaho.edu Tue Apr 23 22:00:16 2002 From: heckendo at cs.uidaho.edu (Robert B Heckendorn) Date: Wed Nov 25 01:02:17 2009 Subject: COTS cooling In-Reply-To: <200204240407.g3O47Jb22438@blueraja.scyld.com> Message-ID: <200204240500.WAA23212@brownlee.cs.uidaho.edu> We don't have to pay for the cooling but the cost of the installation of cooling is being used as an argument to cut corners on the machine itself. :-( So I would love to get the cost of the installation of cooling down. One of the responses to my mail said: "We just purchased ~150 dual AMDs, and are cooling them with 4 Fujitsu ceiling-mounted air-conditioners: about 50kW of AC cost us about $25k, which is about 10% of the cost of the machines." This sounds like COTS cooling to go with our COTS machines. :-) It has the nice feature that if one AC goes out the others keep running. It is also nice in that half a dozen 125KBTU/hr units in the ceiling would seem to handle a fairly large load and all machines for the next 4 years of expansion. 450W/dualnode * 3.4BTU/hr/W * 400 nodes = 612K BTU/hr Does anyone else comments on this scheme (pros or con)? Is anyone doing anything like this? -- | Robert Heckendorn | We may not be the only | heckendo@cs.uidaho.edu | species on the planet but | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. | CS Dept, University of Idaho | | Moscow, Idaho, USA 83844-1010 | From rauch at inf.ethz.ch Wed Apr 24 00:46:38 2002 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed Nov 25 01:02:17 2009 Subject: Packet Engines "Hamachi" gigabit ethernet card In-Reply-To: Message-ID: On Tue, 23 Apr 2002, Steven Timm wrote: > Does anyone have a Hamachi gigabit ethernet card that is > working under any kind of a 2.4 kernel at all? If so, what did > it take to make it work? With all the various drivers we have > tried including the latest available, we get the error > message "too much work at interrupt" and no traffic through the card. We use the Hamachi GNIC-II cards on 2.4.3 kernels without any problems. We currently use the driver compiled as a module. Here's some more information which might be relevant for you: Apr 6 13:36:39 c1 kernel: hamachi.c:v1.01 5/16/2000 Written by Donald Becker [...] Apr 6 13:36:39 c1 kernel: eth1: Hamachi GNIC-II type 10911 at 0xe081d000, 00:e0:b1:04:16:cb, IRQ 20. Apr 6 13:36:39 c1 kernel: eth1: 64-bit 33 Mhz PCI bus (60), Virtual Jumpers 30, LPA 0000. 
Regards, Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From bjornts at mi.uib.no Wed Apr 24 00:58:06 2002 From: bjornts at mi.uib.no (Bjorn Tore Sund) Date: Wed Nov 25 01:02:17 2009 Subject: Intel releases C++/Fortran suite V 6.0 for Linux In-Reply-To: <200204231601.g3NG1Wb05553@blueraja.scyld.com> Message-ID: On Tue, 23 Apr 2002 beowulf-request@beowulf.org wrote: > Date: Tue, 23 Apr 2002 16:29:44 +0200 (CEST) > From: Eugen Leitl > To: > Subject: Intel releases C++/Fortran suite V 6.0 for Linux > > > Intel announces the release of Version 6 of the Intel(R) C++ and Fortran > Compilers for Windows and Linux. Take advantage of performance for your > software. Please visit our Compilers Home Page today. > > http://www.intel.com/software/products/compilers/ I've been wanting to test these out, both in the previous versions and this, but as long as Intel are only releasing them as RedHat rpms, they are fundamentally useless on a SuSE system. Or at least a lot of hassle to install. Anyone know if Intel are going to come to their senses and start releasing tarballs, or am I going to have to go through that hassle? Bjørn -- Bjørn Tore Sund Phone: (+47) 555-84894 Stupidity is like a System administrator Fax: (+47) 555-89672 fractal; universal and Math. Department Mobile: (+47) 918 68075 infinitely repetitive. University of Bergen VIP: 81724 teknisk@mi.uib.no Email: bjornts@mi.uib.no http://www.mi.uib.no/ From jcownie at etnus.com Wed Apr 24 02:09:17 2002 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:02:17 2009 Subject: A better "Titanium" reference Message-ID: <170Im5-0I7-00@etnus.com> This is a better reference site for Titanium than the one I gave in my previous mail: http://www.cs.berkeley.edu/Research/Projects/titanium/ -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From javier.iglesias at freesurf.ch Wed Apr 24 03:18:46 2002 From: javier.iglesias at freesurf.ch (javier.iglesias@freesurf.ch) Date: Wed Nov 25 01:02:17 2009 Subject: Suggestions on fiber Gigabit NICs Message-ID: <1019639926.webexpressdV3.1.f@smtp.freesurf.ch> Hi all, To cope with some network bottleneck problems leading to calculation crashes, we envisage migrating the master of our 18-node cluster (bi-AMD 1600+/Tyan Tiger MP/FastEthernet/Scyld 27-b8) to Gigabit. I would like to get your feelings/experiences on two fiber Gigabit NICs: 1) Netgear GA-621 -> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=106 2) 3Com 3C996-SX -> http://www.3com.com/products/en_US/detail.jsp?tab=features&pathtype=purchase&sku=3C996-SX We have a really nice Extreme Networks Summit 48 ethernet switch -> http://www.extremenetworks.com/products/datasheets/summit24.asp that offers 2 fiber Gigabit ports we want to put to work :) Any other suggestions? Want to share comments? Thanks in advance for your help! --javier From daniel.pfenniger at obs.unige.ch Wed Apr 24 02:51:28 2002 From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger) Date: Wed Nov 25 01:02:17 2009 Subject: Intel releases C++/Fortran suite V 6.0 for Linux References: Message-ID: <3CC68020.2080609@obs.unige.ch> Bjorn Tore Sund wrote: ... > > I've been wanting to test these out, both in the previous versions > and this, but as long as Intel are only releasing them as RedHat > rpms, they are fundamentally useless on a SuSE system. Or at least > a lot of hassle to install. Anyone know if Intel are going to come > to their senses and start releasing tarballs, or am I going to have > to go through that hassle? > > Bjørn At least on Mandrake 8.2 no particular problem occurred. The installer just detects which kernel and glibc versions are there and installs the files using rpm (by default, but modifiable, in /opt/intel/). The installation script warns that on a non-RedHat distribution the compilers have not been tested, etc. There is also an uninstall script that uses rpm too. On Intel-based computers the compilers are sufficiently interesting in boosting performance (sometimes by over 100% w.r.t. gcc/g77) to be worth perhaps 30 minutes for their installation and preliminary testing. The remaining "hassle" is perhaps to extend the PATH and LD_LIBRARY_PATH variables, or add a line in /etc/ld.so.conf and run ldconfig. Dan From rgb at phy.duke.edu Wed Apr 24 06:07:51 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:17 2009 Subject: cooling In-Reply-To: <200204240041.RAA21877@brownlee.cs.uidaho.edu> Message-ID: On Tue, 23 Apr 2002, Robert B Heckendorn wrote: > We are looking at the facilities issues in installing a beowulf on the > order of 500 nodes. What facilities is telling us is that it is going > to almost cost us more to buy the cooling for the machine than to buy > machine itself. How are people making the air conditioning for their > machines affordable? Have we miscalculated the HVAC loads? Are we > being over charged? No, this is one of the miracles of modern beowulfery. Our new facility in the physics department here is a modest-sized room, perhaps 5m x 13m. It has 75 KW of power in umpty 20A and 15A (120VAC) circuits. It has a heat exchange unit in one end of the room (unfortunately we were unable to commandeer the small room next door, which would have put it and its noise out of the space itself) that is about 3m x 3m x 3m (to the ceiling, anyway) and that eats an extra half-meter or more on the sides in wasted space (across from the door, fortunately), making the first 3m+ of the room unusable for anything but entrance and AC. The room did require a certain amount of prep -- old floor out, new floor in, asbestos removal, paint. It did require fairly extensive wiring for all of the nodes -- a couple of large power distribution panels, power poles every couple of meters where they can service clusters of racks, a nifty thermal kill for the room power (room temp hits a preset of say, 30-35C and bammo, all nodes are just shut down the hard way). It did require a certain number of overhead cable trays and so forth. Still, I believe that the AC alone (one capable of removing 75 KW continuously) dominated the cost of the $150K renovation. It was so expensive that we had to really work to convince the University to do it at all, and share the space with another department to ensure that it is filled as much as possible.
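For anyone who wants to repeat the back-of-the-envelope numbers in this discussion, here is a minimal C sketch using the same conversion factors quoted earlier (roughly 3.4 BTU/hr per watt, and 12,000 BTU/hr per ton of cooling); the 450W-per-dual-node figure comes from the 400-node estimate posted earlier in the thread and is an assumption, not a measurement.

/* convert an electrical load into the cooling capacity needed to remove it;
 * conversion factors: 1 W = 3.412 BTU/hr, 1 ton of cooling = 12,000 BTU/hr */
#include <stdio.h>

int main(void)
{
    double watts_per_node = 450.0;   /* loaded dual node (assumed) */
    int    nodes          = 400;
    double load_w = watts_per_node * nodes;
    double btu_hr = load_w * 3.412;
    double tons   = btu_hr / 12000.0;

    printf("%.0f kW of nodes -> %.0f BTU/hr -> about %.0f tons of AC\n",
           load_w / 1000.0, btu_hr, tons);
    return 0;
}

Run as-is it essentially reproduces the ~612K BTU/hr figure quoted above (around 50 tons of cooling for 400 dual nodes); our 75 KW room works out to roughly 256K BTU/hr, or a bit over 21 tons.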
Right now we are probably balancing along at the point where the number of nodes in the room equals the cost of renovation -- we probably have on the order of $150K worth of systems racked up and shelved. However, we are also ordering new nodes and upgrades pretty steadily as grants and so forth come in, and will likely have well over $250K worth of hardware in the room by the end of the year (which will translate into order 250 CPUs -- even buying duals, our nodes (without myrinet and with only some nodes on gigabit ethernet) are costing roughly $1K/cpu in a 2U dual athlon rackmount configuration. By the time the room is FULL (or as full as we can get it), probably in a couple of years, it should have order of 500 cpus (we're highball estimating 150W per CPU, although we're hoping for an average that is more like 100W -- high end Athlons draw about 70W loaded all by themselves, and then there is the rest of the system). At that point our node investment will likely exceed our renovation expense by 3 to 1 or better, and of course the value to the University in grant-funded research enabled by all of those nodes will be higher still -- every postdoc or faculty person grant-supported by research done with the cluster will probably net the university $30K or more in indirect costs. Overall, I therefore think that this is a solid win for the University and an investment essential to keeping the University current and competitive in its theoretical physics (and statistics, the group with whom we share) research. The University has at this point some two or three similar facilities in several buildings on campus. Computer science has an even (much) larger cluster/server facility that it shares with e.g. math (which has at least one large cluster doing imaging research supported by petrochemical companies). I believe that they are considering the construction of an even larger centralized facility to put genomic research and some biomed engineering clusters in. In a way it this is wistfully interesting. Old Guys (tm) will remember well the days of totally centralized compute resources, where huge, expensive facilities housed rows of e.g. IBM 370s. There were high priests who cared for and fed these beasts, acolytes who scurried in and out, and one prayed to them in the form of Fortran IV card decks with HASP job control prologue/epilogues and awaited the granting of your prayers in the form of a green-barred lineprinter output (charged per page including the bloody header page) placed into the box labelled with your last name initial. It was all very solemn, expensive, and ritualized. Then first the minicomputer, then the PC, liberated us from all of that. An IBM PC didn't run as fast as a 370, but time on the 370 was billed at something like $1/minute of CPU and time on the PC, even at a capital cost of $5K for the PC itself (yes, they were expensive little pups) was amortized out over YEARS (at 1440 minutes/day). Even using the PC as a terminal to the 370 allowed one to edit remotely instead of on a timeshare basis (billed at $1/minute, damn it!) and saved one loads of connect time (hence money). And then came Sun workstations, faster PCs, linux and somewhere in there computing became almost completely decentralized with a client/server paradigm -- yes, there were a few centralized servers, but most actual computation and presentation was done at the desktop. Even early beowulfs were largely spread out and not horribly centralized. 
An 8 node or 16 node system could fit in an office, a 32 node or even 64 node shelved beowulf could fit in a small server room. The beauty of them was that you bought one for YOUR research, you didn't share it (time or otherwise), and once you figured out how to put it all together it didn't require much care and feeding, certainly not at the high priest/acolyte stage (although cooling even 32 nodes starts to be serious business). Alas, we now seem to have come full circle. Beowulfs are indeed COTS supercomputers, but high density beowulfs are rackmounted and put in centralized, expensive, often shared server rooms and strongly resemble those centralized computers from which we once were freed. I exaggerate the woe, of course. The whole cluster NOW is transparently accessible at gigabit speeds from your desktop across campus (and wouldn't be any MORE accessible if you were sitting at a workstation in the room with it listening to 80db worth of AC roar in your ear), linux is excruciatingly stable (when it isn't unstable as hell, of course:-), and once you get the nodes installed and burned in a human needs to actually visit the cluster room only once in a long while. We've replaced the high priests and acolytes with sysadmin wizards and application/programming gurus but this is a welcome change, actually (they may appear similar but philosophically they are very different indeed:-). Still, the centralization threatens to a greater or lesser extent the freedom -- it puts control much more into the hands of administrators, costs more, involves more people in decisions. Not much to do with the original question, sure, but I needed a little philosophical ramble to start my day. Now I have to write an hour exam for my kiddies, which is less fun. Last day of class, though, which IS fun! Hooray! (It isn't only the students that anticipate summer...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Wed Apr 24 06:16:17 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:17 2009 Subject: cooling In-Reply-To: <024901c1eb38$52a8ba90$0d01a8c0@jonxp> Message-ID: On Tue, 23 Apr 2002, Jon Mitchiner wrote: > The other consideration to have is some kind of monitoring/alerting system > for the room. A client has a dedicated cooling equipment for a beowulf > cluster for 52 machines. Recently the A/C broke one morning and they did > not find out till the afternoon when someone walked into the small network > room and found the room was in excess of 100 degrees. > > I dont want to think about what could have happened if it happened on a > friday evening and nobody found about it until Monday. :) A very good idea is a thermal kill switch on the master power panels, mentioned in my previous reply. If room temperature hits a preset FOR ANY REASON, all nodes go down, period. Strategically, one would still want to monitor node and room temperature and install alarms and automated node shutdown scripts as previously mentioned in many discussions, but set THOSE alarms and shutdowns to go off at (say) 25C and 30C and set the room kill switch at (say) 35C. 
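A minimal C sketch of the node-side piece of that scheme, using the 25C/30C thresholds above; the sensor file path is a placeholder (where lm_sensors actually puts the reading depends on your chip and driver), and you would run something like this from cron rather than trust it as written.

/* toy node-temperature watchdog: warn at 25C, shut the node down at 30C.
 * TEMP_FILE is a placeholder -- point it at wherever your sensors
 * actually report a temperature in degrees C. */
#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>

#define TEMP_FILE  "/tmp/node_temp"   /* placeholder path, one ASCII number */
#define WARN_C     25.0
#define SHUTDOWN_C 30.0

int main(void)
{
    FILE  *f = fopen(TEMP_FILE, "r");
    double t;

    openlog("tempwatch", LOG_PID, LOG_DAEMON);
    if (!f || fscanf(f, "%lf", &t) != 1) {
        syslog(LOG_WARNING, "could not read %s", TEMP_FILE);
        return 1;
    }
    fclose(f);

    if (t >= SHUTDOWN_C) {
        syslog(LOG_CRIT, "temperature %.1f C -- shutting node down", t);
        return system("/sbin/shutdown -h now");
    }
    if (t >= WARN_C)
        syslog(LOG_WARNING, "temperature %.1f C -- check the AC", t);
    return 0;
}

The room kill switch at 35C stays as the dumb hardware backstop; scripts like this only cover the polite layers below it.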
That way if cooling fails, first you get mail/pages/human alarms (at 25C, assuming room temperature is set to 20C and ordinarily stays there +/- 2C), then at 30C (or when CPU temps pass a preset alarm that goes off when ambient room temperature gets about there) nodes start shutting themselves down, and only if the humans and shutdowns fail to control the temperature or get the AC running again and the temperature keeps climbing does the room kill go off. This protects the systems against ANY POSSIBILITY that they will operate for an extended time at a "dangerous" temperature. Conservative people, or people with evidence that the properly functioning room has a very stable temperature profile, could reduce the alarm margins even further -- 35C is already quite a bit hotter than one wants to let the ambient air reach. 30C would be better, but it doesn't leave much room for less intrusive alarms and scripts to take effect. rgb > > Jon Mitchiner > > ----- Original Message ----- > From: "Bob Drzyzgula" > To: "Robert B Heckendorn" > Cc: > Sent: Tuesday, April 23, 2002 9:54 PM > Subject: Re: cooling > > > > If the new load requires the installation of new > > chillers, it could indeed cost a pile-o'-money. Even > > if each node burned electricity at 100 Watts, you > > are looking at 50 kW of power consumption, or about > > 170,000 BTU/hr, requiring about 14 tons of cooling to > > remove -- your facilities folks may well be looking > > at installing something like one or more Liebert > > chillers such as these: > > http://www.liebert.com/dynamic/displayproduct.asp?id=545&cycles=60Hz > > > > There could well be additional shortfalls in external > > heat exchanger capacity, pipe capacity out to the > > heat exchangers, electric power for the computers > > and for the chillers, etc. If you don't already > > have the raised floor space, that could also add > > quite a bit to the cost to cool all those nodes. > > > > As to how we are making the A/C for our systems "affordable", > > we do it by virtue of the HVAC budget belonging to > > a different division, :-) although that also means > > that we don't have *control* over that budget, and > > when we hit the ceiling on cooling we kind of have > > to just stop installing new equipment until the whining > > and begging and pleading might eventually get us > > a new chiller -- and even then we might have to give > > up some rack space so there'd be a place to put it. :-( > > > > --Bob > > > > On Tue, Apr 23, 2002 at 05:41:35PM -0700, Robert B Heckendorn wrote: > > > > > > We are looking at the facilities issues in installing a beowulf on the > > > order of 500 nodes. What facilities is telling us is that it is going > > > to almost cost us more to buy the cooling for the machine than to buy > > > machine itself. How are people making the air conditioning for their > > > machines affordable? Have we miscalculated the HVAC loads? Are we > > > being over charged? > > > > > > thanks for any guidance. > > > > > > -- > > > | Robert Heckendorn | We may not be the only > > > | heckendo@cs.uidaho.edu | species on the planet but > > > | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. 
> > > | CS Dept, University of Idaho | > > > | Moscow, Idaho, USA 83844-1010 | > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Wed Apr 24 06:50:38 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:17 2009 Subject: COTS cooling In-Reply-To: <200204240500.WAA23212@brownlee.cs.uidaho.edu> Message-ID: On Tue, 23 Apr 2002, Robert B Heckendorn wrote: > We don't have to pay for the cooling but the cost of the installation > of cooling is being used as an argument to cut corners on the machine > itself. :-( So I would love to get the cost of the installation of > cooling down. > > One of the responses to my mail said: > > "We just purchased ~150 dual AMDs, and are cooling them with 4 Fujitsu > ceiling-mounted air-conditioners: about 50kW of AC cost us about $25k, > which is about 10% of the cost of the machines." > > This sounds like COTS cooling to go with our COTS machines. :-) It > has the nice feature that if one AC goes out the others keep running. > It is also nice in that half a dozen 125KBTU/hr units in the ceiling > would seem to handle a fairly large load and all machines for the next > 4 years of expansion. > > 450W/dualnode * 3.4BTU/hr/W * 400 nodes = 612K BTU/hr > > Does anyone else comments on this scheme (pros or con)? > Is anyone doing anything like this? An AC consists of two separate components. One is the heat exchanger/blower/ductwork, which generally lives "in" the space being cooled. To remove a lot of heat, one has to move a lot of air over a lot of cold surface, so whether you use one large blower or several smaller ones, you have to move the same volume of air over the same area cooled the same amount, and one ALSO has to locate the ducting so that cold air flows out, gets pulled through your systems, and then goes into the return efficiently. Otherwise your room will have hot spots and cool spots, and hot-spot nodes will fail. You can feel the heat just walking past banks of nodes in our room, but there is always a feel of cold air going past you towards the heat as well. The other is the chiller, the part that actually takes the heat from the room, squeezes it out (literally) into the outside air, and returns cold-something (water, coolant, whatever) to the in-room heat exchanger. Window unit ACs combine the two into a single package -- in central AC and building AC they almost always are distinct. In a centralized operation, the chillers might be located far from the room. They may have constraints on WHERE they can be located (typically on the roof, for example, but likely only on certain parts of the scarce roof real estate). 
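As an aside, the arithmetic quoted above is easy to sanity-check -- a few lines of Python (purely illustrative; the per-node wattage is the assumption that matters, not the script) reproduce both Bob's roughly 14 tons for 500 nodes at 100 W and the 612 KBTU/hr figure for 400 duals at 450 W:

# Back-of-the-envelope cooling load check (illustrative only).
W_TO_BTU_HR    = 3.412      # 1 W of continuous load ~ 3.412 BTU/hr
BTU_HR_PER_TON = 12000.0    # 1 ton of cooling = 12,000 BTU/hr

def cooling_load(nodes, watts_per_node):
    watts  = nodes * watts_per_node
    btu_hr = watts * W_TO_BTU_HR
    return watts / 1000.0, btu_hr, btu_hr / BTU_HR_PER_TON

for label, nodes, w in [("500 nodes @ 100 W", 500, 100),
                        ("400 duals @ 450 W", 400, 450)]:
    kw, btu, tons = cooling_load(nodes, w)
    print("%-18s %6.1f kW %9.0f BTU/hr %5.1f tons" % (label, kw, btu, tons))

# Roughly 50 kW / 170,600 BTU/hr / 14.2 tons for the first case, and
# 180 kW / about 614,000 BTU/hr / 51.2 tons for the second.

The only point of writing it down is that the wattage assumption dominates everything else: 400 duals at 450 W apiece need well over three times the tonnage that 500 nodes at 100 W would.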
Getting unplanned insulated high volume pipes through a big building made out of steel reinforced concrete is nontrivial, getting power to the roof for new chillers is nontrivial, putting the (heavy) chillers on the roof where they won't just fall through onto the people working below is nontrivial. Nontrivial = expensive. Then there is a wide range of ways that people/organizations "bill" for this sort of construction -- beware creative accounting. So sure, it might cost a relatively small amount to install a relative large number of relatively cheap chillers and heat exchangers IF your room e.g. has an outside wall with a preexisting window or ductwork to the outside, there is plenty of room outside for a concrete pad and wiring to support the chillers, and so forth. OTOH, if you are in the basement (we are) of a large building with an existing chiller delivery system and "have" to add capacity or upgrade capacity to our existing chiller farm (as does Bob, clearly, and likely many others) and get "billed" according to how they account for the cost (which may add in all sorts of administrative, architectural, engineering expenses and not just the cost of the hardware) and if your room is fairly small and has relatively low ceilings (precluding lots of ceiling mounted heat exchangers) this might not work. YMMV. Some sites might be able to do things more cheaply than others, and it wouldn't surprise me at all to see a factor of 4 difference in cost from one end to the other. Ours was quite expensive, but it gives us a pretty good node density in a relatively small space, and SPACE is very, very "expensive" in our building (which we "share" with math while both departments try to grow). rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From maurice at harddata.com Wed Apr 24 07:26:51 2002 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:02:17 2009 Subject: Beowulf digest, Vol 1 #841 - 9 msgs In-Reply-To: <200204241357.g3ODvUb00963@blueraja.scyld.com> Message-ID: <5.1.0.14.2.20020424082446.03967e10@mail.harddata.com> With regards to your message at 07:57 AM 4/24/02, beowulf-request@beowulf.org. Where you stated: >Message: 5 >Date: Wed, 24 Apr 2002 11:18:46 +0100 >From: javier.iglesias@freesurf.ch >To: beowulf@beowulf.org >Subject: Suggestions on fiber Gigabit NICs > >Hi all, > >To cope with some network bottleneck problems leading to calculation >crashes, we envisage to migrate our 18-nodes' (bi-AMD 1600+/Tyan Tiger >MP/FastEthernet/Scyld 27-b8) master to Gigabit. > >I would like to get your feelings/experiences on two fiber Gigabit >NICs : You probably should consider the SysKonnect and Intel offerings too.. Also, if in a cluster that all machines are close together why fibre? Copper is a lot less expensive. Also why Scyld? More modern bproc and other parts are available as downloadable GPL materials.. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 Ask me about the UP1500 Alpha - Full systems from $3,500! 
From sp at scali.com Tue Apr 23 16:45:24 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:17 2009 Subject: Kidger's comments on Quadric's design and performance In-Reply-To: <3CC514CE.8D08E38@lfbs.rwth-aachen.de> Message-ID: On Tue, 23 Apr 2002, Joachim Worringen wrote: [snip] > Richard Fryer wrote: > > Also a brief note about the Dolphin product line, since the issue of link > > saturation has come up: - they DO also sell switches - or at least offer > > them. And if you check the SCI specification, you'll see that there are > > some elaborate discussions of fabric architectures that the protocol > > supports and switches enable. What I DO NOT know is if the SCALI software > > supports switch-based operation, and also don't know what the impact is on > > the system cost per node. My 'inexperienced' assessment of the appeal in > > the Dolphin family is that you can start without the switch and later add it > > if the performance benefit warrents. That's what I'd say if I were selling > > them anyway - and didn't know otherwise. :-) > > The "external" switches are not designed for large-scale HPC > applications (although they scale quite well inside the range of their > supported number of nodes), but for high-performance, high-availabitlity > small-scale cluster or embedded applications, as i.e. Sun sells. With > ext. switches, you don't have to do anything to keep the network up if a > node fails (and also nothing if it comes back as SCI is not > source-routed). In torus topologies, re-routing needs to be applied to > bypass bad nodes (Scali does this on-the-fly). > > Scali does not support external switches AFAIK (at least doesn't sell > such systems any longer), which is less a technical issue but more a > design-issue as the topology is fully transparent for the nodes > accessing the network (they did use switches in the past, see > http://www.scali.com/whitepaper/ehpc97/slide_9.html). > In theory there is no problem using the Scali SCI driver in a SCI switched environment, we just haven't got the software to manage the switch (i.e set up the routing tables). You could use Dolphin SW to manage the switch though... We used to use switches (and even the Dolphin driver) back in the SPARC SBus days because (IIRC) the Dolphin SBus cards didn't have separate out/in connectors (necessary to build ringlets and toruses). > For large scale applications, distributed switches as in torus > topologies scale better and more cost-efficient (see > http://www.scali.com/whitepaper/scieurope98/scale_paper.pdf and other > resources). With switches, you need *a lot* of cables and switches > (which doesn't hinder Quadrics to do so - resulting in an impressive 14 > miles of cables for a recent system (IIRC) with single cables being up > to 25m in length). It would need to be verified if such a system build > with a Quadrics-like fat-tree topologie using Dolphins 8-port switches > would scale better than the equivalent torus topologie for different > communication patterns. I doubt it. At least, the interconect would cost > a lot more (at least twice, or even more depending on the dimension of > the tree). > > SCI-MPICH, can be used with arbitraries SCI topologies (because it uses > the SISCI interface and thus runs with Scali or Dolphin SCI drivers). It > is not that closely coupled to the SCI drivers as ScaMPI is. 
It is true that ScaMPI uses a (proprietary) interface between userspace (MPI library) and kernel space (SCI driver), but the SCI topology is still transparent to the userspace layer. Best regards, Steffen
From jcownie at etnus.com Wed Apr 24 01:42:03 2002 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:02:17 2009 Subject: Kidger's comments on Quadric's design and performance In-Reply-To: Message from Joachim Worringen of "Tue, 23 Apr 2002 10:01:18 +0200." <3CC514CE.8D08E38@lfbs.rwth-aachen.de> Message-ID: <170ILj-0HT-00@etnus.com>

> > This message also reminded me to ask if a long-held opinion is valid - and > > that opinion is "that a cache coherent interconnect would offer performance > > enhancement when applications are at the 'more tightly coupled' end of the > > spectrum." I know that present PCI based interfaces can't do that without > > invoking software overhead and latencies. Anyone have data - or an argument > > for invalidating this opinion? > > You would need another programming model than MPI for that (see below), > maybe OpenMP as you basically have the characteristics of a SMP system > with cc-NUMA architecture.

No, you don't have an SMP model. You need to distinguish between a system which has a single address space and one with multiple address spaces accessed explicitly. You can have a cache coherent interface in the second, but that doesn't make it into the first.

What you have in the Quadrics (assuming it's still like the Meiko in this respect) is an explicit cache coherent remote store access model. You can access remote store without the active collaboration of the owner of the remote store (so it's not message passing), but you have to _know_ that you're accessing remotely and generate different code (maybe execute a channel program) to do it. You can't just indirect a random int * and fetch from remote store. In the OpenMP model you generally don't know which accesses are remote; all of the OpenMP threads live in the same address space and can pass pointers around at will. The compiler does not know which references will be to non-local store.

Languages for the explicit remote store access model include UPC http://hpc.gwu.edu/~upc/ Co-array Fortran http://www.co-array.org/ Titanium http://www.cs.berkeley.edu/~liblit/titanium/

Of course these languages can also run on SMP machines (and indeed one might hope that they can achieve better performance than something like OpenMP, because the compiler can better lay out shared areas to avoid false sharing effects and has better knowledge about which accesses are to shared variables).

Enjoy -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com

From michael.worsham at intermedia.com Wed Apr 24 06:10:31 2002 From: michael.worsham at intermedia.com (Worsham, Michael A.) Date: Wed Nov 25 01:02:17 2009 Subject: Liquid cooling? Message-ID: Has anyone attempted to create a beowulf cluster using extreme methods of cooling, such as liquid cooling? Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & http://www.overclockershideout.com/ -- M -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20020424/05bfc96f/attachment.html From jcownie at etnus.com Wed Apr 24 06:26:53 2002 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:02:17 2009 Subject: Kidger's comments on Quadric's design and performance Message-ID: <170MnN-0Lm-00@etnus.com> Sorry if you get something like this message twice, I submitted it once and nothing has come back, although my correction to one of the www addresses went through :-( Joachim Worringen wrote > > This message also reminded me to ask if a long-held opinion is valid - and > > that opinion is "that a cache coherent interconnect would offer performance > > enhancement when applications are at the 'more tightly coupled' end of the > > spectrum." I know that present PCI based interfaces can't do that without > > invoking software overhead and latencies. Anyone have data - or an argument > > for invalidating this opinion? > > You would need another programming model than MPI for that (see below), > maybe OpenMP as you basically have the characteristics of a SMP system > with cc-NUMA architecture. No, you are confusing two completely different issues. To support OpenMP you need a single address space which spans the processors. You can have cache coherent communication interfaces which do not implement such a thing. (If it's still the same as it was at Meiko, the Quadrics is an example of such an interface). What Quadrics provides is an explicit remote store access model. You can perform reads or writes cache coherently to a remote process' address space, but you have to know that you're doing a remote access and do something different to achieve it. You can't just indirect through some random pointer and have that fetch data. OpenMP assumes a single address space within which pointers can be passed around freely, so will not implement easily on top of an interface like Quadrics, even though that is (I believe) cache coherent at both ends. Languages which are built on an explicit remote store access model include Co-Array Fortran http://www.co-array.org UPC http://hpc.gwu.edu/~upc Titanium http://www.cs.berkeley.edu/Research/Projects/titanium/ in these languages the compiler always knows which accesses may be remote. Of course such languages can also run on SMP boxes and use a "genuinely" shared memory (and, indeed one might hope that the extra information available in such languages allows the compiler to generate better code for such a machine than one can generate from OpenMP, since it should be able to avoid much false sharing). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From jnemmers at helix.nih.gov Wed Apr 24 07:45:19 2002 From: jnemmers at helix.nih.gov (Justin Nemmers) Date: Wed Nov 25 01:02:17 2009 Subject: Burn-in Utilities Message-ID: All: I am in search of a utility that will allow me to burn-in a new PC. Ideally, it would peg the procs at 100% as well as exercise the memory (as much as 2Gb/Node. I know there is a Sun provided utility to do this on Sparc systems, but does anyone have a suggestion for a linux-based (perl would work, too) that will do the same thing? 
Cheers, Justin -- System Administrator National Institutes of Health Center for Information Technology 9000 Rockville PK Building 12B 2N/207 Bethesda, MD 20892-5680 301.496.0396 http://biowulf.nih.gov From math at velocet.ca Wed Apr 24 08:30:08 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:17 2009 Subject: Burn-in Utilities In-Reply-To: ; from jnemmers@helix.nih.gov on Wed, Apr 24, 2002 at 10:45:19AM -0400 References: Message-ID: <20020424113008.E56252@velocet.ca> On Wed, Apr 24, 2002 at 10:45:19AM -0400, Justin Nemmers's all... > All: > I am in search of a utility that will allow me to burn-in a > new PC. Ideally, it would peg the procs at 100% as well as exercise > the memory (as much as 2Gb/Node. I know there is a Sun provided > utility to do this on Sparc systems, but does anyone have a > suggestion for a linux-based (perl would work, too) that will do the > same thing? The packages (in debian and redhat AFAIK) cpuburn and memtest will do you nicely. We run 5 odd of each of burnMMX burnK7 and memtest on our athlon machines for 2-3 days and see if even one crashes. We've had a crash on machines tested AFTER being in service with no problems for 3-4 months. So its definitely a hardcore excercise. Oh we also stick dnetc on them on top of all that just to make sure its hurting. I think they're set to generate the most heat possible in the CPU during operation. They definitely draw the most current - when we were first setting up our cluster and werent sure of power draw, 8 dual 1.333Ghz athlon boards (no drives) would run G98 fine on a 15 amp circuit - as soon as we ran burnMMX/k7 we'd blow breakers. We run 5-10 to get a nice high context switch going and excercise the OS as well ;) We (through trial and error) found that running only 1 each of burnMMX/burnK7 at a time will often not crash for days, whereas running 5-10 will. (In fact, we only consider a crash within 12 hours to be a reason to RMA it if its slated for a workstation running windows. 12 hours of that test is almost equivalent to a crash every 3-6 months of regular LINUX desktop use (and with windows how can you tell? :)) Its actually suprising how well you can measure the quality of boards that way. Out of 40 246x Tyan boards we found one bad stick of ram and 0 cpus and boards bad using this method. However with ECS K75As we found 1/10 boards as shipped to us would die in 1-6 hours under this load, and another 1/10 will die within the 2-3 days. while ! burnMMX; do RMA_via_VAR; done Nonetheless we've never seen every unit of a certain brand always crash within that time - eventually we get good boards - so using proper sorting after testing in this manner you can always end up with a set of good boards (at least as far as these tests are concerned). So far with any board that makes it past 2-3 days of this we've never seen a problem with Gaussian98, Gromacs or distributed-net afterwards (at least until we hit long term electron migration path problems due to regular CPU heat wear and tear...) but none of our boards/CPUs (the PcChips M817 LMRs are hitting 16 months of continuous operation) are there yet. 
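For what it's worth, the "run several copies of each and watch for casualties" recipe is easy to wrap in a few lines of Python. This is only a sketch: it assumes the cpuburn binaries (burnK7, burnMMX) are installed and on the PATH, and the copy count and runtime are exactly the knobs you'd tune per the numbers above.

#!/usr/bin/env python
# Crude burn-in wrapper (sketch): spawn several copies of the cpuburn
# programs, let them run for a fixed period, and report anything that
# exits early. A hard machine crash obviously takes this script with it.
import subprocess, time

PROGRAMS = ["burnK7", "burnMMX"]   # add a userspace memory tester if you like
COPIES   = 5                       # per program, as in the 5-10 above
HOURS    = 48

procs = []
for prog in PROGRAMS:
    for i in range(COPIES):
        procs.append((prog, subprocess.Popen([prog])))

deadline = time.time() + HOURS * 3600
failed = []
while time.time() < deadline and not failed:
    for prog, p in procs:
        if p.poll() is not None:            # exited early: bad sign
            failed.append((prog, p.returncode))
    time.sleep(60)

for prog, p in procs:                        # clean up whatever is left
    if p.poll() is None:
        p.terminate()

if failed:
    print("burn-in FAILED: %r" % failed)
else:
    print("burn-in survived %d hours" % HOURS)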
/kc > > Cheers, > Justin > -- > > System Administrator > National Institutes of Health > Center for Information Technology > 9000 Rockville PK > Building 12B 2N/207 > Bethesda, MD 20892-5680 > 301.496.0396 > http://biowulf.nih.gov > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From walke at usna.edu Wed Apr 24 08:32:56 2002 From: walke at usna.edu (LT V. H. Walke) Date: Wed Nov 25 01:02:17 2009 Subject: Burn-in Utilities In-Reply-To: References: Message-ID: <1019662376.28412.11.camel@vhwalke.mathsci.usna.edu> Try cpu-burn http://users.ev1.net/~redelm/ or MemTest86 http://www.teresaudio.com/memtest86/ Good luck, Vann On Wed, 2002-04-24 at 10:45, Justin Nemmers wrote: > All: > I am in search of a utility that will allow me to burn-in a > new PC. Ideally, it would peg the procs at 100% as well as exercise > the memory (as much as 2Gb/Node. I know there is a Sun provided > utility to do this on Sparc systems, but does anyone have a > suggestion for a linux-based (perl would work, too) that will do the > same thing? > > Cheers, > Justin > -- > > System Administrator > National Institutes of Health > Center for Information Technology > 9000 Rockville PK > Building 12B 2N/207 > Bethesda, MD 20892-5680 > 301.496.0396 > http://biowulf.nih.gov > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- ---------------------------------------------------------------------- Vann H. Walke Office: Chauvenet 341 Computer Science Dept. Ph: 410-293-6811 572 Holloway Road, Stop 9F Fax: 410-293-2686 United States Naval Academy email: walke@usna.edu Annapolis, MD 21402-5002 http://www.cs.usna.edu/~walke ---------------------------------------------------------------------- From cozzi at nd.edu Wed Apr 24 08:44:04 2002 From: cozzi at nd.edu (Marc Cozzi) Date: Wed Nov 25 01:02:17 2009 Subject: Liquid cooling? Message-ID: I have a difficult time understanding all this overclocking/cooling stuff. If you go to some of the web sites listed below you get a real sense of all show no go. Cases carved out on the side, plastic windows installed, neon lights installed inside. extra fans on the top, bottom, sides, front and back. The original stuff is engineered for gods sake! Although in some cases poorly. Liquid cooling an overclocked AMD chip could cost between $50 and $400!!! Wouldn't that money be better spent toward a a faster chip? Perhaps as much as %250 faster compared to what little you may get with overclocking? Not to mention the warranty problems with damn near every thing in the box. The potential problems with broken cooling lines inside and out of the boxes. I would think for most of us time is money and the maintenance of such systems (we are talking of clusters and not single systems) would be prohibitive. Unless the boss is easily fooled. Reminds me of the 1966 Chevrolet Impalas with hydraulics, dingo balls, neon license plate frames.. Ok, maybe makes limited sense for 1U type systems.... --marc -----Original Message----- From: Worsham, Michael A. [mailto:michael.worsham@intermedia.com] Sent: April 24, 2002 8:11 AM To: 'beowulf@beowulf.org' Subject: Liquid cooling? 
Has anyone attempting to create a beowulf cluster using extreme methods of cooling, such as the liquid cooling? Example sites: http://www.koolance.com/ , http://www.senfu.com.tw/ , & http://www.overclockershideout.com/ -- M -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20020424/a958bccb/attachment.html From josip at icase.edu Wed Apr 24 08:56:53 2002 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:02:17 2009 Subject: cooling References: Message-ID: <3CC6D5C5.A83691AA@icase.edu> Steven Berukoff wrote: > > We just purchased ~150 dual AMDs, and are cooling them with 4 Fujitsu > ceiling-mounted air-conditioners: about 50kW of AC cost us about $25k, > which is about 10% of the cost of the machines. Your setup also has the advantage of redundancy. We've got a large AC (a 5-ton unit) plus two smaller/self-contained AC units for those inevitable times when the large unit is not performing properly. AC typically breaks down when you need it the most (i.e. on the hottest day when it has to work the hardest). We try to have some redundancy and the ability to stage 1-2-3 AC units as needed. Oversized AC to handle event the hottest days is not as efficient, and when it breaks down, most of the computers have to be powered off... Sincerely, Josip P.S. Temperature alarms are a highly recommended. One can even wire the temperature sensor to kill the computer power at the circuit breaker box when the temperature exceeds dangerous levels. Automatic handling of overheating is needed because when AC fails, the temperature in a small room (heated by 5-10KW dissipated by the computers) can go above 100 deg F within ~30 minutes. This may be faster than a system manager can drive in on a weekend -- so automated shutdown may be needed even if the temperature alarm pages someone. -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From john.hearns at cern.ch Wed Apr 24 08:57:13 2002 From: john.hearns at cern.ch (John Hearns) Date: Wed Nov 25 01:02:17 2009 Subject: Liquid cooling? In-Reply-To: References: Message-ID: <1019663834.6257.10.camel@ues4> On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > Has anyone attempting to create a beowulf cluster using extreme methods of > cooling, such as the liquid cooling? > > Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & > http://www.overclockershideout.com/ > Well, I think Robert Brown has FINALLY been beaten here. You're not going to install Freon tanks, complete with plastic fish are you Bob? I just have this bizarre vision of Bob in an aqualung visiting a Freon-flooded machine room... From becker at scyld.com Wed Apr 24 09:34:43 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:17 2009 Subject: List moderation and related info In-Reply-To: <170MnN-0Lm-00@etnus.com> Message-ID: On Wed, 24 Apr 2002, James Cownie wrote: > Sorry if you get something like this message twice, I submitted it > once and nothing has come back, although my correction to one of the > www addresses went through :-( Your messages to the list are held for moderation due to the header contents. Your messages don't appear to be coming from your subscribed address. I've since added an exception that should allow your posts to be automatically approved. 
The message filters on the Beowulf lists are frequently updated, and for good reason. There are many attempts to post spam and virus emails. Only a tiny fraction manages to slip through, and I add new rules to attempt to catch future copies. But the spammers learn new tricks. Readers should expect occasional bogus messages. On the topic of moderation-holds: I try to do the approval moderation each day. I'm sometime on travel and don't have the opportunity to moderate. (In the past delegating this work has not been reliable.) To avoid moderation holds: Have your post appear to come from your subscribed email address Use a non-trivial subject line Post from an IP address address that can be reverse-resolved Don't post from any machine in ".kr", ".cn" or ".pt". Avoid any mention of printer supplies or "enlargement" drugs (unless the latter is directly related to your cluster use ;->) -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From joachim at lfbs.RWTH-Aachen.DE Wed Apr 24 09:16:01 2002 From: joachim at lfbs.RWTH-Aachen.DE (Joachim Worringen) Date: Wed Nov 25 01:02:17 2009 Subject: Kidger's comments on Quadric's design and performance References: <170MnN-0Lm-00@etnus.com> Message-ID: <3CC6DA41.367BEB55@lfbs.rwth-aachen.de> James Cownie wrote: > > Sorry if you get something like this message twice, I submitted it > once and nothing has come back, although my correction to one of the > www addresses went through :-( > > Joachim Worringen wrote > > > > This message also reminded me to ask if a long-held opinion is valid - and > > > that opinion is "that a cache coherent interconnect would offer performance > > > enhancement when applications are at the 'more tightly coupled' end of the > > > spectrum." I know that present PCI based interfaces can't do that without > > > invoking software overhead and latencies. Anyone have data - or an argument > > > for invalidating this opinion? > > > > You would need another programming model than MPI for that (see below), > > maybe OpenMP as you basically have the characteristics of a SMP system > > with cc-NUMA architecture. > > No, you are confusing two completely different issues. To support > OpenMP you need a single address space which spans the processors. You are right, this is completely different. However, I did not mean that connecting nodes of a cluster with a cache-coherent interface "gives you an SMP", but more precisely "gives the shared parts of the distributed distinct address spaces nearly SMP-like access characteristics", with respect to a suitable programming model. This would enable a matching OpenMP-Compiler/run-time-lib to generate and run code with (more or less) SMP-like performance as does the OMNI OpenMP-Compiler (currently on top of a software DSM library SCASH on top of SCore, see http://www.hpcc.jp/Omni - this is all software which is much more perfomance-sensitive to bad data-placement and has generally a much higher overhead than such a hw-based solution would have). There is something similar on top of SCI, namely the HAMSTER project (http://hamster.informatik.tu-muenchen.de/), but w/o OpenMP, IIRC, and still some software-overhead to "simulate" cachable remote memory on top of SCI-connected PCs. With Quadrics, this should be possible in an even more efficient manner due to the hardware-MMU and -TLB on the adapter. 
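To make Jim's distinction a little more tangible: the "explicit remote store access" style he describes is essentially what MPI-2 one-sided operations expose. The toy below uses mpi4py purely as an illustration -- it is not what Quadrics, Scali or any of the toolchains discussed here shipped -- and the only point is that the writer has to name the target rank and bracket the access with synchronization; nothing like dereferencing an ordinary pointer.

# Run with something like: mpiexec -n 2 python remote_put.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank exposes one integer as a "window": memory that other ranks
# may read or write, but only via explicit Put/Get calls.
local = np.zeros(1, dtype='i')
win = MPI.Win.Create(local, comm=comm)

win.Fence()                      # open an access epoch on all ranks
if rank == 0:
    payload = np.array([42], dtype='i')
    win.Put(payload, 1)          # explicit remote write into rank 1's window
win.Fence()                      # close the epoch; the write is now visible

if rank == 1:
    print("rank 1 sees %d" % local[0])   # 42

win.Free()

OpenMP, by contrast, assumes remote and local accesses are indistinguishable to the programmer, which is Jim's point about why it does not map easily onto this kind of interface.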
To have a real cc-NUMA-SMP, the integration needs to be higher (HP X-Class, DG/IBM NUMA-Q, ...), this is for sure. The question is: are large-scale SMPs as sold by IBM, Sun, ... not the better solution for such tasks? Quadrics is expensive, and you still have to manage a bunch of PCs instead a nice, single SMP. Joachim -- | _ RWTH| Joachim Worringen |_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen | |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim |_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339 From trey at ERC.MsState.Edu Wed Apr 24 09:43:23 2002 From: trey at ERC.MsState.Edu (Trey Breckenridge) Date: Wed Nov 25 01:02:17 2009 Subject: cooling Message-ID: <200204241643.g3OGhNGU010172@ERC.MsState.Edu> >Steven Berukoff wrote: >> >> We just purchased ~150 dual AMDs, and are cooling them with 4 Fujitsu >> ceiling-mounted air-conditioners: about 50kW of AC cost us about $25k, >> which is about 10% of the cost of the machines. One of the major disadvantages of overhead A/C units is that when the overflow drain clogs (and it will), the excess water will spill into your machines. Another point to consider is access for maintenance of the A/C. In some cases, it may require you to shut down and move your racks for the maintenance personnel to have full access the A/C units. A second disadvantage with ceiling mounted units (with the supply-side vent in the ceiling) is that your cooling efficiency will be lower than with a under-floor based system. Basically, from a ceiling mounted unit, the cool air from the supply will have to travel the entire height of the room in order to "reach" the machine that is the farthest away (your bottommost machine in a rack) which may be 10 feet. When cooling from under the floor in a raised floor scenario, the furthest machine from the supply is maybe 6-8 feet away (the topmost machine in your rack). The difference in the cold air velocity at 6 feet versus 10 feet may be significant. My experience is that it does make a difference. In our data center with ceiling mounted A/C's, the machines at the bottom of our racks run considerable hotter than the top machines. However, in our second data center, we have under-floor A/C. The machine temperatures in the racks there are much more consistent (and cooler on average) from top to bottom. Of course, all of this is just my opinion. __________________________________________________________________________ Trey Breckenridge - Computing Systems Manager - trey@ERC.MsState.Edu Mississippi State University Engineering Research Center From pzb at datastacks.com Wed Apr 24 09:59:34 2002 From: pzb at datastacks.com (Peter Bowen) Date: Wed Nov 25 01:02:17 2009 Subject: Burn-in Utilities In-Reply-To: References: Message-ID: <1019667574.6829.3.camel@gargleblaster.caffeinexchange.org> On Wed, 2002-04-24 at 10:45, Justin Nemmers wrote: > All: > I am in search of a utility that will allow me to burn-in a > new PC. Ideally, it would peg the procs at 100% as well as exercise > the memory (as much as 2Gb/Node. I know there is a Sun provided > utility to do this on Sparc systems, but does anyone have a > suggestion for a linux-based (perl would work, too) that will do the > same thing? The best answer is cerberus. VA Linux Systems wrote it for burn-in testing their machines, and open sourced it for others to use. Red Hat is maintaining a version of it that probably will do many of the things you want. See http://people.redhat.com/bmatthews/cerberus and http://sourceforge.net/projects/va-ctcs for more info. Thanks. 
Peter From maurice at harddata.com Wed Apr 24 10:41:59 2002 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:02:17 2009 Subject: Beowulf digest, Vol 1 #843 - 2 msgs In-Reply-To: <200204241601.g3OG17b06015@blueraja.scyld.com> Message-ID: <5.1.0.14.2.20020424114047.07630ba0@mail.harddata.com> With regards to your message at 10:01 AM 4/24/02, beowulf-request@beowulf.org. Where you stated: >On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > > Has anyone attempting to create a beowulf cluster using extreme methods of > > cooling, such as the liquid cooling? > > > > Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & > > http://www.overclockershideout.com/ > > > >Well, I think Robert Brown has FINALLY been beaten here. >You're not going to install Freon tanks, complete with plastic >fish are you Bob? >I just have this bizarre vision of Bob in an aqualung visiting >a Freon-flooded machine room... Interesting you should ask, as we are about a month away from shipping our clusters (dual athlon and dual XEON) with liquid cooling using a centralised radiator and pump unit per rack of machines.. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 Ask me about the UP1500 Alpha - Full systems from $3,500! From rgb at phy.duke.edu Wed Apr 24 10:49:03 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: Liquid cooling? In-Reply-To: <1019663834.6257.10.camel@ues4> Message-ID: On 24 Apr 2002, John Hearns wrote: > On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > > Has anyone attempting to create a beowulf cluster using extreme methods of > > cooling, such as the liquid cooling? > > > > Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & > > http://www.overclockershideout.com/ > > > > Well, I think Robert Brown has FINALLY been beaten here. > You're not going to install Freon tanks, complete with plastic > fish are you Bob? > I just have this bizarre vision of Bob in an aqualung visiting > a Freon-flooded machine room... Oh no, this has all been discussed before on the list before (many times, actually -- look back at the archives with google to find some of them) and MY favorite solution is to build a really large computer room in, say, Antarctica and just put fans in the windows. Liquid solutions (no pun intended:-) tend to be expensive, messy, environmentally nasty (if you don't use water), risky (water and electricity don't mix well) and, as you note, servicing the machines in a full immersion rack can be, well, "involved". ;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Wed Apr 24 10:51:22 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: List moderation and related info In-Reply-To: Message-ID: On Wed, 24 Apr 2002, Donald Becker wrote: > Don't post from any machine in ".kr", ".cn" or ".pt". Good choices:-) > Avoid any mention of printer supplies or "enlargement" drugs > (unless the latter is directly related to your cluster use ;->) But wait, then how did this originally message get in? How did this reply get in? Self-referential systems are all too confusing...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. 
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From fraser5 at cox.net Wed Apr 24 11:14:50 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:18 2009 Subject: Liquid cooling? In-Reply-To: <1019663834.6257.10.camel@ues4> Message-ID: <000001c1ebbb$ed5e3370$0800005a@papabear> While I think overclocking is kinda silly in extreme cases (like hot-rods) and nearly pointless for most real serious computing applications, I think the water-cooling has real merits and could be considered for dense clusters. 1) Water cools orders-of-magnitudes better then air 2) It is far quieter 3) CPU temps hardly vary as compared to air (even under load) (better stability) 4) It does not have to be that much more expensive then high-end air cooling (there is a real price to cool dual cpu's in a 1U steel box.) 5) leaks are almost unheard-of, and are not as catastrophic as they sound (distilled water is generally not a problem, but a leak is a possible mode of failure) 6) Dense water cooled systems could be easily be engineered to remove bulk heat far better then the rows of tiny little cheap jap fans whirring at 7000 rpm...talk about failure rates!?! Fans are most prone to failure that result in hardware breakdowns. Don't discard water cooling. Also for the non-budget minded, there is the complete board submergence route with hydrofluoroether (HFW) 3M makes it http://products.3m.com/usenglish/mfg_industrial/elec_materials.jhtml?powurl= SKKXCT77P5be2FCSL3BCQXgeGST1T4S9TCgv5NGBVHDQ19gl ...this is expensive (~250 bucks+/gal but exceptionally effective. I think CRAY made a machine that used a similar fluid once. A couple of weeks ago I saw the guys on TECHTV demonstrate this stuff in a fish-tank like set-up: http://www.techtv.com/screensavers/supergeek/story/0,24330,3380128,00.html jim -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of John Hearns Sent: Wednesday, April 24, 2002 11:57 AM To: beowulf@beowulf.org Subject: Re: Liquid cooling? On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > Has anyone attempting to create a beowulf cluster using extreme methods of > cooling, such as the liquid cooling? > > Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & > http://www.overclockershideout.com/ > Well, I think Robert Brown has FINALLY been beaten here. You're not going to install Freon tanks, complete with plastic fish are you Bob? I just have this bizarre vision of Bob in an aqualung visiting a Freon-flooded machine room... _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From James.P.Lux at jpl.nasa.gov Wed Apr 24 12:47:07 2002 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:02:18 2009 Subject: Liquid cooling? In-Reply-To: <000001c1ebbb$ed5e3370$0800005a@papabear> References: <1019663834.6257.10.camel@ues4> Message-ID: <5.1.0.14.2.20020424123459.00b1e010@mail1.jpl.nasa.gov> Jim makes a number of good points, some of which would also apply to conduction cooled equipment (as used in space applications and some avionics, where you can't depend on the air density). In my efforts to design a "field usable Beowulf" which is entirely sealed (mud/dust proof, etc.), I had come to similar conclusions about the viability of liquid cooling (mass gets to be a bit of a problem, though). 
However, a basic philosophical issue arises... The "beauty" of a Beowulf is that it uses "commodity" computers, and so, can capitalize on enormous efficiencies of scale for the huge consumer market. As you start to go towards less consumer configurations, you're straying, to a certain extent from the "pile of cheap PC's" paradigm and back towards the "behemoth in the machine room" model. Certainly, the 500+ node computers that folks are putting together with 1U cases, dozens of machines in dozens of racks, with dedicated cooling, etc., is straying pretty far from the original Beowulf concept. It still is cluster computing, and maybe what makes it Beowulf'ish is not the hardware, per se, but the fact that you are using "off the shelf" cheap/free software (Linux, e.g.), and "off the shelf" interconnects?? By the way, I wouldn't fool with DI water in a totally immersed system.. too corrosive. If you don't want to pop for the Fluorinerts, then various silicone and mineral oils would work well, are non-toxic, and inexpensive. These have been used for decades for immersed cooling of all sorts of stuff (transformers, for instance). You'd want to assess compatibility with adhesives and existing coatings. And, you better hope that it really does improve reliability... it's going to be a mess to service. You might want to do some hard core burn in first and get past the infant mortalities, before you "literally take the plunge". And, while plunging the Mobo in oil wouldn't bother it, I wonder if the same is true of things like the hard disk drive? They're probably vented, and/or have moving parts that are outside the sealed area. At 02:14 PM 4/24/2002 -0400, Jim Fraser wrote: >While I think overclocking is kinda silly in extreme cases (like hot-rods) >and nearly pointless for most real serious computing applications, I think >the water-cooling has real merits and could be considered for dense >clusters. >1) Water cools orders-of-magnitudes better then air >2) It is far quieter >3) CPU temps hardly vary as compared to air (even under load) (better >stability) >4) It does not have to be that much more expensive then high-end air cooling >(there is a real price to cool dual cpu's in a 1U steel box.) >5) leaks are almost unheard-of, and are not as catastrophic as they sound >(distilled water is generally not a problem, but a leak is a possible mode >of failure) >6) Dense water cooled systems could be easily be engineered to remove bulk >heat far better then the rows of tiny little cheap jap fans whirring at 7000 >rpm...talk about failure rates!?! Fans are most prone to failure that result >in hardware breakdowns. Don't discard water cooling. > >jim > > >--- >On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > > Has anyone attempting to create a beowulf cluster using extreme methods of > > cooling, such as the liquid cooling? > > > > Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & > > http://www.overclockershideout.com/ > > > >Well, I think Robert Brown has FINALLY been beaten here. >You're not going to install Freon tanks, complete with plastic >fish are you Bob? >I just have this bizarre vision of Bob in an aqualung visiting >a Freon-flooded machine room... > Jim Lux Spacecraft Telecommunications Equipment Section Jet Propulsion Laboratory 4800 Oak Grove Road, Mail Stop 161-213 Pasadena CA 91109 818/354-2075, fax 818/393-6875 From math at velocet.ca Wed Apr 24 14:59:16 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:18 2009 Subject: Liquid cooling? 
In-Reply-To: ; from rgb@phy.duke.edu on Wed, Apr 24, 2002 at 01:49:03PM -0400 References: <1019663834.6257.10.camel@ues4> Message-ID: <20020424175916.D12933@velocet.ca> On Wed, Apr 24, 2002 at 01:49:03PM -0400, Robert G. Brown's all... > On 24 Apr 2002, John Hearns wrote: > > > On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > > > Has anyone attempting to create a beowulf cluster using extreme methods of > > > cooling, such as the liquid cooling? > > > > > > Example sites: http://www.koolance.com/, http://www.senfu.com.tw/, & > > > http://www.overclockershideout.com/ > > > > > > > Well, I think Robert Brown has FINALLY been beaten here. > > You're not going to install Freon tanks, complete with plastic > > fish are you Bob? > > I just have this bizarre vision of Bob in an aqualung visiting > > a Freon-flooded machine room... > > Oh no, this has all been discussed before on the list before (many > times, actually -- look back at the archives with google to find some of > them) and MY favorite solution is to build a really large computer room > in, say, Antarctica and just put fans in the windows. > > Liquid solutions (no pun intended:-) tend to be expensive, messy, > environmentally nasty (if you don't use water), risky (water and > electricity don't mix well) and, as you note, servicing the machines in > a full immersion rack can be, well, "involved". Wasnt someone suggesting putting a huge machine room in alaska for this reason? Right near 'pacific rim fabric' and right near some huge power plants in alaska or what not? Environmental damage notwithstanding. Anyone ever sell the heat generated from the clusters to someone else? :) /kc > > ;-) > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From gerry at cs.tamu.edu Wed Apr 24 15:48:58 2002 From: gerry at cs.tamu.edu (Gerry Creager N5JXS) Date: Wed Nov 25 01:02:18 2009 Subject: List moderation and related info References: Message-ID: <3CC7365A.7030105@cs.tamu.edu> Robert G. Brown wrote: > On Wed, 24 Apr 2002, Donald Becker wrote: > > >> Don't post from any machine in ".kr", ".cn" or ".pt". >> > > Good choices:-) Some of my more entertaining non-technical reading, recently, has come from sites in those countries... >> Avoid any mention of printer supplies or "enlargement" drugs >> (unless the latter is directly related to your cluster use ;->) >> > > But wait, then how did this originally message get in? How did this > reply get in? Error in the human-scanning function? > Self-referential systems are all too confusing...;-) And to think that I learned to call 'em "self-eating watermelons gerry -- Gerry Creager -- gerry@cs.tamu.edu Network Engineering Academy for Advanced Telecommunications and Learning Technologies Texas A&M University 979.458.4020 (Phone) -- 979.847.8578 (Fax) From canon at nersc.gov Wed Apr 24 16:01:48 2002 From: canon at nersc.gov (canon@nersc.gov) Date: Wed Nov 25 01:02:18 2009 Subject: Power Strips and cables Message-ID: <200204242301.g3ON1m930090@pookie.nersc.gov> Greetings, I was curious what novel ways everyone is powering rack systems. 
I'm mainly curious about high density (1U) systems. We currently are racking 32 nodes in a rack and using 4 20A, vertically mounted power strips. I'm concerned that next wave of machines will be too deep to accommodate the four power strips. What is everyone doing for these types of scenarios? I don't need remote power management at this point. I'm more concerned with just fitting everything in neatly. Also, I've seen custom power cords that clean things up. Does anyone know a vendor or supplier for these types of things? Thanks in advance, --Shane Canon From Chester.Fitch at mdx.com Wed Apr 24 16:14:59 2002 From: Chester.Fitch at mdx.com (Fitch, Chester) Date: Wed Nov 25 01:02:18 2009 Subject: Liquid cooling? Message-ID: <19E8BE159FECD4118FE700508BEE12D2012FF7EE@mdx-email1.den.mdx.com> Yes, there was some talk on /. a while back about putting a server farm up on the North slope of Alaska... here's the link: http://slashdot.org/article.pl?sid=01/05/14/159258&mode=thread Idea was lots of cooling capacity (especially in winter) and lots of low-cost natural gas to power the thing.. Problems, however, included staffing and getting the data traffic to/from the lower 48 states.. (not to mention the time required for a service call!) Interesting idea -- I actually used the idea as an exercise in class last semester - as a (obviously extreme) exercise in facilities management. Point was to get them to think about all the infrastructure we often take for granted.. As far as selling the heat generated by our systems... A co-generation facility off of the computer room? (Hmm.. Maybe, for some of our bigger beowulfs..) But if your campus buildings have steam heating, well, there might be something to it.. ;-) Chet > -----Original Message----- > From: Velocet [mailto:math@velocet.ca] > Sent: Wednesday, April 24, 2002 3:59 PM > To: beowulf@beowulf.org > Subject: Re: Liquid cooling? > > > On Wed, Apr 24, 2002 at 01:49:03PM -0400, Robert G. Brown's all... > > On 24 Apr 2002, John Hearns wrote: > > > > > On Wed, 2002-04-24 at 15:10, Worsham, Michael A. wrote: > > > > Has anyone attempting to create a beowulf cluster using > extreme methods of > > > > cooling, such as the liquid cooling? > > > > > > > > Example sites: http://www.koolance.com/, > http://www.senfu.com.tw/, & > > > > http://www.overclockershideout.com/ > > > > > > > > > > Well, I think Robert Brown has FINALLY been beaten here. > > > You're not going to install Freon tanks, complete with plastic > > > fish are you Bob? > > > I just have this bizarre vision of Bob in an aqualung visiting > > > a Freon-flooded machine room... > > > > Oh no, this has all been discussed before on the list before (many > > times, actually -- look back at the archives with google to > find some of > > them) and MY favorite solution is to build a really large > computer room > > in, say, Antarctica and just put fans in the windows. > > > > Liquid solutions (no pun intended:-) tend to be expensive, messy, > > environmentally nasty (if you don't use water), risky (water and > > electricity don't mix well) and, as you note, servicing the > machines in > > a full immersion rack can be, well, "involved". > > Wasnt someone suggesting putting a huge machine room in > alaska for this > reason? Right near 'pacific rim fabric' and right near some huge > power plants in alaska or what not? > > Environmental damage notwithstanding. > > Anyone ever sell the heat generated from the clusters to > someone else? :) > > /kc > > > > > ;-) > > > > rgb > > > > -- > > Robert G. 
Brown > http://www.phy.duke.edu/~rgb/ > > Duke University Dept. of Physics, Box 90305 > > Durham, N.C. 27708-0305 > > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Ken Chase, math@velocet.ca * Velocet Communications Inc. * > Toronto, CANADA > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > From serguei.patchkovskii at sympatico.ca Wed Apr 24 16:15:13 2002 From: serguei.patchkovskii at sympatico.ca (Serguei Patchkovskii) Date: Wed Nov 25 01:02:18 2009 Subject: Intel releases C++/Fortran suite V 6.0 for Linux References: Message-ID: <004e01c1ebe5$e37f6660$6401a8c0@sympatico.ca> > From: "Bjorn Tore Sund" > I've been wanting to test these out, both in the previous versions > and this, but as long as Intel are only releasing them as RedHat > rpms, they are fundamentally useless on a SuSE system. Or at least > a lot of hassle to install. No they are not - the installation script works without any changes, at least on Suse 7.1 and Suse 7.3. It may whine a little about the kernel and glibc versions, but that's it. I've been running both 5.x and 6.x (beta) versions Intel's compiler under Suse for the last six month, and haven't seen -any- Suse-specific problems. Serguei From purp at wildbrain.com Wed Apr 24 21:49:09 2002 From: purp at wildbrain.com (Jim Meyer) Date: Wed Nov 25 01:02:18 2009 Subject: COTS cooling In-Reply-To: <200204240500.WAA23212@brownlee.cs.uidaho.edu> References: <200204240500.WAA23212@brownlee.cs.uidaho.edu> Message-ID: <1019710150.3596.37.camel@milagro.wildbrain.com> On Tue, 2002-04-23 at 22:00, Robert B Heckendorn wrote: > We don't have to pay for the cooling but the cost of the installation > of cooling is being used as an argument to cut corners on the machine > itself. :-( So I would love to get the cost of the installation of > cooling down. I just faced a similar circumstance; we're building a new facility and our CFO originally nixed raised floors and serious cooling because the general contractor showed him a big pricetag with no context. I didn't end up involved in the project until six months later. I was lucky enough to get three magic formulas and an excellent bit of advice. The formulas: Formulas: KVA @ 3 Phase = ((I*E)*1.73)/1000 BTU/hr = (((I*E)*0.8)/1000)*3413 AC Tonnage = BTU/12000 The advice: Create a spreadsheet. Show these formulas. Show replacement cost of your equipment, both current and future if you plan to expand in that room. Total it all up. Then show the costs of installation against that. Context. For us, it showed that the total buildout of a real computer room would cost less than 10% of the cost of the machines. That and a discussion of raised failure rates due to heat turned the corner on that one. Good luck! --j -- Jim Meyer, Geek At Large purp@wildbrain.com From rgb at phy.duke.edu Wed Apr 24 22:01:52 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: Liquid cooling? In-Reply-To: <20020424175916.D12933@velocet.ca> Message-ID: On Wed, 24 Apr 2002, Velocet wrote: > Anyone ever sell the heat generated from the clusters to someone else? 
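(As a quick aside on Jim Meyer's formulas above: here they are as a small Python sketch, using his constants exactly as posted -- 1.73 is sqrt(3), 0.8 the assumed power factor, 3413 BTU per kWh, 12000 BTU/hr per ton of cooling. The amperage and voltage in the example are invented purely for illustration.)

import math

def three_phase_kva(amps, volts):
    # KVA @ 3 Phase = ((I*E)*1.73)/1000, 1.73 being sqrt(3)
    return (amps * volts * math.sqrt(3)) / 1000.0

def btu_per_hour(amps, volts, power_factor=0.8):
    # BTU/hr = (((I*E)*0.8)/1000)*3413, i.e. kW drawn times 3413 BTU per kWh
    return ((amps * volts * power_factor) / 1000.0) * 3413.0

def ac_tonnage(btu_hr):
    # AC Tonnage = BTU/12000
    return btu_hr / 12000.0

if __name__ == "__main__":
    # Hypothetical rack: 32 nodes at roughly 1.5 A each on a 208 V feed.
    amps, volts = 32 * 1.5, 208.0
    btu = btu_per_hour(amps, volts)
    print("%.1f kVA, %.0f BTU/hr, %.1f tons of cooling"
          % (three_phase_kva(amps, volts), btu, ac_tonnage(btu)))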
:) Step right up, folks, I gotcher heat right here, yes ma'am, packaged to go. How about you, sir, couldja use some heat? Whatzat? It's hot enough outside already today? Snark snark snark -- every crowd's got one, folks, a JOKER! THIS heat is special, Geuuwiiiine Beowulf Cluster heat, imported from one of the O-riginal beowulf clusters, you can't find heat like this any more, it's practically antique heat. No, now stop that. Quit walking away. And don't shake your head like that, sir, why, one day you'll be LACKING some heat and NEEDING some heat and then you'll be sorry indeed you didn't take advantage of this special offer, never mind that it is midsummer and hotter than H*** out here... well, OK then. I guess I'll just keep my heat. Have to dump it outside again, or worse, pay to have it hauled away. Don't nobody seem to WANT heat anymore, and a few months ago everybody was begging me for heat. Sigh. That didn't go too well. The problem with heat is one or t'other of those darned laws of thermodynamics -- always making more of it when it is waste, can't get enough of it when it is an energy source, can't (generally speaking, although there are specific exceptions) take waste heat and use it to make more organized energy without an ever-cooler reservoir to ultimately dump it in. Otherwise, by the time you run your cluster hot enough for the heat to be "useful" (which requires a signficant temperature differential relative to ambient) you're frying the cluster's innards. On a modest scale, of course, sure. In the winter, I recycle my home cluster's heat. In the summer, I probably pay more than I gained in the winter to remove it and dump it outdoors, but at least I break (more) APPROXIMATELY even. The same cycle probably holds elsewhere -- where it is already cold, you can probably reuse the heat IF you can get it from here to there (heat actually being pretty difficult and expensive to pack up and ship from where you make it to where you MIGHT want it). Where it is already hot, folks just look at you funny when you try to sell them your garbage. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Apr 25 00:19:01 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 Message-ID: Dear List, We've had problems (as have others on this list) getting our 2U rackmount Tyan Tiger 2460 motherboards to boot/install/run reliably and stably. Seth (our systems guy) and I worked on a couple of the boxes today armed with a 32 bit riser, a 64 bit riser, and an ATI rage video card and a 3c905m NIC. We took the PCI cards off of their frames so we could mount them vertically directly in the slots for testing. We also dismounted the risers so we could try them in different slots as well. The following is a summary of our findings. a) Only the video card would work in slot 1. Period. If we put the 3c905 in slot one all by itself (using the BIOS console), the system would behave erratically, actually mistaking the number and speed of processors during boot and crashing under heavy network loads if and when it booted. b) If slot one had video or was empty, the system would work fine for all other vertical configurations. That is, video in 1, net in 6, video in 2, net in 3 or vice versa, video in 5, NIC in 2, etc. 
I don't know that we tested every combination but we didn't find another that failed in all our tests. Slot 1 alone seems to be the ringer. It is not a 64 vs 32 bit slot question or a power question per se, as far as we can tell. Slots 1-4 are all apparently identical 32 bit, five volt slots, slots 5+ are 32 bit five volt slots, and both the 3c905 and ATI are slotted for 3.3/32 bit slots with the extra notch near the back. There is no reason that we can see for the 3c905 to work in slot 2, 3, 4, 5, 6, 7 but not in slot 1. This is further verified by the fact that we had a 2566 to play with as well, which has two 64/66 3.3 volt slots, and the cards worked perfectly in them in any order. c) Our real torment comes from the riser. Most riser cards are designed so they HAVE to plug into slot 1 so that their physical framework can hold the cards sideways in the remaining room over the PCI bus. Plugged into slot 2, there isn't generally room to fit a full height card (or the support frame) into the remaining space to the side. With the riser in slot 1, no combination of cards in the riser that included the NIC would work, and even the video alone in the slot that should have been a "straight through" connection appeared to have problems, although a system without a NIC is useless to us so the issue is moot. Again, the most common symptom was that the system wouldn't even get the CPU info correct at the bios level before any boot is even initiated, and if the boot/install succeeded at all the system was highly unstable under any kind of load. The problem persisted, identically, when we put the 64 bit riser (which we were really counting on to fix things) into slot 1 and plugged the NIC and video into it, in either order. We had hoped that the problem was just the 32 bit riser not correctly connecting lines needed for the power/clock to automatically set to the needs of the card and that the 64 bit card would "fix" this. As noted above, the problem is all slot 1, though, in any card orientation even without the riser at all. HOWEVER, being clever little beasties, we put the dismounted (32 bit) riser in slot 2 with the extra cabled keys in slots 3 and 4, added the dismounted PCI cards to any slots we felt like and voila! The system, she work perfectly. Right number of CPUs, flawless boot/install, still running under heavy load for ten hours or so now. Since the 3c905 is a highly reliable NIC (and the ATI rage is ditto a reliable video card and for that matter we also saw the problem earlier with other NICs, e.g. tulipsj) that work perfectly in many, many systems, one has to be at least tempted to conclude that this is a reproducible BUG in the 2460 Tiger motherboard, either in the BIOS or (worse) in the physical wiring of slot 1. We are reporting it to Tyan as such to see if they are aware of it (couldn't find it on their website if they are) and if they know of any fix. In the meantime, we are testing a workaround consisting of a riser with a flexible ribbon connecting the primary slot, so that it can be installed offset from where it is plugged into the PCI bus. We hypothesize that if we mount this riser in the framework (so it sits physically above slot 1 and can take full height cards) but plug it into slots 2-4, it will work fine and the systems will stabilize. Of course the RIGHT solution would be to keep our perfectly good cards and risers and get Tyan to replace the 2460's (if there isn't a bios upgrade that fixes the ones we have). 
Given the frustration and downtime and lost productivity we have suffered, giving us 2466 replacements seems reasonable to me:-). Anyway, this explains to at least some extent why such a wide range of experiences has been reported for these motherboards on the list. People who rackmounted them probably had problems, although I'm willing to believe that there are riser cards out there or particular card combinations that would "fix" the problem, possibly without the owner ever knowing it existed. People who tower mounted them probably did not have problems, especially if they used an AGP video card or put their video and NIC into the regular 32 bit slots (or in any event "accidentally" avoided putting something into slot 1 that wouldn't work there). The discussion above may help anybody out there who is still having problems -- rearrange your cards as described above and all SHOULD be well and/or replace your riser and/or get Tyan to make it right. BTW, so far the 2466 runs fine, as noted by many listvolken. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From john.hearns at cern.ch Thu Apr 25 01:42:37 2002 From: john.hearns at cern.ch (John Hearns) Date: Wed Nov 25 01:02:18 2009 Subject: IBM Lego brick storage Message-ID: <1019724158.7365.24.camel@ues4> Perhaps not relevent to Beowulfery, but there is discussion of cooling :-) IBM makes Lego-brick like storage cube: http://www.eetimes.com/at/news/OEG20020423S0091 From math at velocet.ca Thu Apr 25 09:18:04 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 In-Reply-To: ; from rgb@phy.duke.edu on Thu, Apr 25, 2002 at 03:19:01AM -0400 References: Message-ID: <20020425121804.U12933@velocet.ca> On Thu, Apr 25, 2002 at 03:19:01AM -0400, Robert G. Brown's all... > Dear List, > > We've had problems (as have others on this list) getting our 2U > rackmount Tyan Tiger 2460 motherboards to boot/install/run reliably and > stably. Seth (our systems guy) and I worked on a couple of the boxes > today armed with a 32 bit riser, a 64 bit riser, and an ATI rage video > card and a 3c905m NIC. re Tigers.... We got back a 2466 from RMA that was somehow fried. New replacement board came back. The new bios reports "V4.0 rel 6" and also "Phoenix 4.01". I saw this change from previous versions and decided to try our Tbirds in it that we had tried before under previous BIOS versions (and I cant remember the version #s from before and I cant reboot any nodes to find out :) Well something has changed because it warns that the processors are non MP and so it will operate uniprocessor as SMP is unsupported with non MPs. Cant flash back to a previous bios version either. So Tyan musta struck some deal with AMD on this. :) Im wondering why they bothered, really, since Tbirds are almost out of production anyway. We still have a few test boards running happily with dual Tbird 1.33Ghz on both 2460s and 2466s, I assume on the older bios. No major problems with either type of board, except those wierd Addtron GBE cards which y'all should stay away from. :) /kc > > We took the PCI cards off of their frames so we could mount them > vertically directly in the slots for testing. We also dismounted the > risers so we could try them in different slots as well. The following is > a summary of our findings. > > a) Only the video card would work in slot 1. Period. 
If we put the > 3c905 in slot one all by itself (using the BIOS console), the system > would behave erratically, actually mistaking the number and speed of > processors during boot and crashing under heavy network loads if and > when it booted. > > b) If slot one had video or was empty, the system would work fine for > all other vertical configurations. That is, video in 1, net in 6, video > in 2, net in 3 or vice versa, video in 5, NIC in 2, etc. I don't know > that we tested every combination but we didn't find another that failed > in all our tests. Slot 1 alone seems to be the ringer. > > It is not a 64 vs 32 bit slot question or a power question per se, as > far as we can tell. Slots 1-4 are all apparently identical 32 bit, five > volt slots, slots 5+ are 32 bit five volt slots, and both the 3c905 and > ATI are slotted for 3.3/32 bit slots with the extra notch near the > back. There is no reason that we can see for the 3c905 to work in slot > 2, 3, 4, 5, 6, 7 but not in slot 1. > > This is further verified by the fact that we had a 2566 to play with as > well, which has two 64/66 3.3 volt slots, and the cards worked perfectly > in them in any order. > > c) Our real torment comes from the riser. Most riser cards are > designed so they HAVE to plug into slot 1 so that their physical > framework can hold the cards sideways in the remaining room over the PCI > bus. Plugged into slot 2, there isn't generally room to fit a full > height card (or the support frame) into the remaining space to the side. > With the riser in slot 1, no combination of cards in the riser that > included the NIC would work, and even the video alone in the slot that > should have been a "straight through" connection appeared to have > problems, although a system without a NIC is useless to us so the issue > is moot. Again, the most common symptom was that the system wouldn't > even get the CPU info correct at the bios level before any boot is even > initiated, and if the boot/install succeeded at all the system was > highly unstable under any kind of load. > > The problem persisted, identically, when we put the 64 bit riser (which > we were really counting on to fix things) into slot 1 and plugged the > NIC and video into it, in either order. We had hoped that the problem > was just the 32 bit riser not correctly connecting lines needed for the > power/clock to automatically set to the needs of the card and that the > 64 bit card would "fix" this. As noted above, the problem is all slot > 1, though, in any card orientation even without the riser at all. > > HOWEVER, being clever little beasties, we put the dismounted (32 bit) > riser in slot 2 with the extra cabled keys in slots 3 and 4, added the > dismounted PCI cards to any slots we felt like and voila! The system, > she work perfectly. Right number of CPUs, flawless boot/install, still > running under heavy load for ten hours or so now. > > Since the 3c905 is a highly reliable NIC (and the ATI rage is ditto a > reliable video card and for that matter we also saw the problem earlier > with other NICs, e.g. tulipsj) that work perfectly in many, many > systems, one has to be at least tempted to conclude that this is a > reproducible BUG in the 2460 Tiger motherboard, either in the BIOS or > (worse) in the physical wiring of slot 1. We are reporting it to Tyan as > such to see if they are aware of it (couldn't find it on their website > if they are) and if they know of any fix. 
In the meantime, we are > testing a workaround consisting of a riser with a flexible ribbon > connecting the primary slot, so that it can be installed offset from > where it is plugged into the PCI bus. We hypothesize that if we mount > this riser in the framework (so it sits physically above slot 1 and can > take full height cards) but plug it into slots 2-4, it will work fine > and the systems will stabilize. > > Of course the RIGHT solution would be to keep our perfectly good cards > and risers and get Tyan to replace the 2460's (if there isn't a bios > upgrade that fixes the ones we have). Given the frustration and > downtime and lost productivity we have suffered, giving us 2466 > replacements seems reasonable to me:-). > > Anyway, this explains to at least some extent why such a wide range of > experiences has been reported for these motherboards on the list. > People who rackmounted them probably had problems, although I'm willing > to believe that there are riser cards out there or particular card > combinations that would "fix" the problem, possibly without the owner > ever knowing it existed. People who tower mounted them probably did not > have problems, especially if they used an AGP video card or put their > video and NIC into the regular 32 bit slots (or in any event > "accidentally" avoided putting something into slot 1 that wouldn't work > there). The discussion above may help anybody out there who is still > having problems -- rearrange your cards as described above and all > SHOULD be well and/or replace your riser and/or get Tyan to make it > right. > > BTW, so far the 2466 runs fine, as noted by many listvolken. > > rgb > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From gabriel.weinstock at dnamerican.com Thu Apr 25 09:58:22 2002 From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock) Date: Wed Nov 25 01:02:18 2009 Subject: network card problems Message-ID: <17193765606726@DNAMERICAN.COM> One of my co-workers is having a problem with his CNet SinglePoint 10/100 CardBus PC card on his laptop, and I thought I would ask the gurus here since so many of you have done kernel/network driver development. Essentially, what is happening is that the driver is dumping huge amount of messages to the syslog facility, to the point of filling up his root partition. The messages are as follows: ----- kernel: rtl8139_rx_interrupt: eth0: In rtl8139_rx(), current ef74 BufAddr efd8, free to ef64, Cmd 0c. kernel: rtl8139_rx_interrupt: eth0: rtl8139_rx() status 602001, size 0060, cur ef74. kernel: rtl8139_rx_interrupt: eth0: Done rtl8139_rx(), current efd8 BufAddr efd8, free to efc8, Cmd 0d. kernel: rtl8139_interrupt: eth0: interrupt status=0x0000 ackstat=0x0000 new intstat=0x0000. kernel: rtl8139_interrupt: eth0: exiting interrupt, intr_status=0x0000. kernel: rtl8139_interrupt: eth0: interrupt status=0x0000 ackstat=0x0001 new intstat=0x0001. ----- I have no clue as to why this is happening and what to do about it. If anyone has any suggestions, I would greatly appreciate it. 
Thanks, Gabriel From jlb17 at duke.edu Thu Apr 25 10:47:58 2002 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 In-Reply-To: <20020425121804.U12933@velocet.ca> Message-ID: On Thu, 25 Apr 2002 at 12:18pm, Velocet wrote > We got back a 2466 from RMA that was somehow fried. New replacement board came > back. The new bios reports "V4.0 rel 6" and also "Phoenix 4.01". > > I saw this change from previous versions and decided to try our Tbirds in it > that we had tried before under previous BIOS versions (and I cant remember the > version #s from before and I cant reboot any nodes to find out :) > > Well something has changed because it warns that the processors are non > MP and so it will operate uniprocessor as SMP is unsupported with non MPs. > Cant flash back to a previous bios version either. So Tyan musta struck some > deal with AMD on this. :) Im wondering why they bothered, really, since > Tbirds are almost out of production anyway. Tyan's products page reports that there are two versions of the S2466. The new ones are the S2466N-4M. The main difference listed is that the 4Ms have functional onboard USB (v1.1), whereas the original S2466Ns were shipping with those addon 4port PCI USB cards. That's why there are now two BIOSes for the S2466, and they warn that flashing one type of board with the BIOS for the other is Bad. They must have taken the opportunity in fixing the USB problem to also "fix" the non-SMP chip in SMP config "problem". -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From gary at umsl.edu Thu Apr 25 11:48:24 2002 From: gary at umsl.edu (Gary Stiehr) Date: Wed Nov 25 01:02:18 2009 Subject: preemption in PBSPro with MPICH on Linux Message-ID: <3CC84F78.4010106@umsl.edu> Hi, Does anyone have experience using the preemption feature of PBSPro with MPICH jobs on Linux clusters? I believe in the release notes for PBSPro 5.2, it says that it will only send SIGSTOP to the process that PBS started (i.e, the mpirun process). Therefore, if that process started other processes (as is the case with my MPICH job), the other processes will continue to run. Does anyone know of a way to suspend all processes started from an MPICH job? I need to do this because some MPICH jobs last several weeks and other smaller jobs submitted would have to wait if I do not use preemption. I suppose another method would be to make sure that the long MPICH job checkpoints and then just have PBS kill the job after a certain amount of time. Any experiences and/or suggestions would be appreciated. Thanks, Gary Stiehr Information Technology Services University of Missouri - St. Louis gary@umsl.edu From becker at scyld.com Thu Apr 25 18:52:00 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:18 2009 Subject: network card problems In-Reply-To: <17193765606726@DNAMERICAN.COM> Message-ID: On Thu, 25 Apr 2002, Gabriel J. Weinstock wrote: > One of my co-workers is having a problem with his CNet SinglePoint 10/100 > CardBus PC card on his laptop, and I thought I would ask the gurus here since > so many of you have done kernel/network driver development. This is the wrong list to ask this question. The appropriate place is the realtek@scyld.com mailing list. See http://www.scyld.com/mailman/listinfo/ > Essentially, what is happening is that the driver is dumping huge amount of > messages to the syslog facility, to the point of filling up his root > partition. 
The messages are as follows: He turned on debugging to the highest level. Solution: Don't do that. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From jcownie at etnus.com Thu Apr 25 01:44:59 2002 From: jcownie at etnus.com (James Cownie) Date: Wed Nov 25 01:02:18 2009 Subject: Kidger's comments on Quadric's design and performance In-Reply-To: Message from Joachim Worringen of "Wed, 24 Apr 2002 18:16:01 +0200." <3CC6DA41.367BEB55@lfbs.rwth-aachen.de> Message-ID: <170es7-0HS-00@etnus.com> Joachim Worringen wrote :- > > No, you are confusing two completely different issues. To support > > OpenMP you need a single address space which spans the processors. > > You are right, this is completely different. However, I did not mean > that connecting nodes of a cluster with a cache-coherent interface > "gives you an SMP", but more precisely "gives the shared parts of the > distributed distinct address spaces nearly SMP-like access > characteristics", with respect to a suitable programming model. ... > With Quadrics, this should be possible in an even more efficient manner > due to the hardware-MMU and -TLB on the adapter. (One caveat, I'm assuming here that the Quadrics' model remains the same as it was when we were all back at Meiko). I think you still do not understand Quadrics' model. There is _no_ shared part of the address space. Access to remote address spaces is never achieved directly by an arbitrary load/store from any CPU, it always requires an access to the communication processor. As I said before the model is of _explicit_ remote store access. You have to generate different instructions to perform a remote access. On the other hand, all remote accesses are fully cache coherent both locally and remotely. The issue of cache coherence of the interface is unrelated to the issue of how you cause a remote store access. > To have a real cc-NUMA-SMP, the integration needs to be higher (HP > X-Class, DG/IBM NUMA-Q, ...), this is for sure. The question is: are > large-scale SMPs as sold by IBM, Sun, ... not the better solution for > such tasks? Quadrics is expensive, and you still have to manage a bunch > of PCs instead a nice, single SMP. But, as I said, Quadrics' doesn't pretend to be a cc-NUMA-SMP at all. Their technology is used to build _big_ clusters which may contain the SMPs as nodes, but certainly scale above the range where the SMPs run out. (See the recently announced HP/Quadrics PNL cluster, or the Compaq SCs at LANL and CEA, for instance). -- Jim James Cownie Etnus, LLC. +44 117 9071438 http://www.etnus.com From SGaudet at turbotekcomputer.com Thu Apr 25 10:09:56 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 (Re) Message-ID: <3450CC8673CFD411A24700105A618BD6267E5E@911TURBO> Hello > According to Robert G. Brown > > From beowulf-admin@beowulf.org Thu Apr 25 11:50:34 2002 > > From: "Robert G. Brown" > > To: Beowulf Mailing List > > Subject: Tyan Tiger 2460 > > > > We've had problems (as have others on this list) getting our 2U > > rackmount Tyan Tiger 2460 motherboards to boot/install/run > reliably and > > stably. > > > > ... to conclude that this is a > > reproducible BUG in the 2460 Tiger motherboard, either in > the BIOS or > > (worse) in the physical wiring of slot 1... > > > BTW, so far the 2466 runs fine, as noted by many listvolken. 
> > > It's not only problem w/Tyan dual motherboards. The problem > exist also w/correct work of Hardware Monitor chips (for work of > lm_sensors it's necessary to do (at the boot) some trick w/BIOS), > for both 2460 and 2466. Moreover, for Thunder w/Tualatin > chips lm_sensors > can't work. May be Supermicro boards are more stable ... Has anyone reported these issues to Tyan's technical support? Tyan knows that their products sell very well in the Linux cluster market. So not addressing these issues would cause customers to look elsewhere. Regards, Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== From sp at scali.com Thu Apr 25 12:02:44 2002 From: sp at scali.com (Steffen Persvold) Date: Wed Nov 25 01:02:18 2009 Subject: [NFS] NFS clients behind a masqueraded gateway In-Reply-To: Message-ID: Hi Wulfers, I'm taking the liberty to post my question here as well since it might be relevant for some of you and you might have some experience with this. I've also posted the mail to the NFS mailing list, but I haven't gotten any answers (yet). Any pointers to what the problem might be are higly appreciated. On Thu, 18 Apr 2002, Steffen Persvold wrote: > Hi all, > > I'm experiencing some problems with a cluster setup. The cluster is set up > in a way that you have a frontend machine configured as a masquerading > gateway and all the compute nodes behind it on a private network (i.e the > frontend has two network interfaces). User home directories and also other > data directories which should be available to the cluster (i.e statically > mounted in the same location on both frontend and nodes) are located on > external NFS servers (IRIX and Linux servers). This seems to work fine > when the cluster is in use, but if the cluster is idle for some time (e.g > over night), the NFS directories has become unavailable and trying to > reboot the frontend results in a complete hang when it tries to unmount > the NFS directories (it hangs in a fuser command). The frontend and all > the nodes are running RedHat 7.2, but with a stock 2.4.18 kernel (plus > Trond's seekdir patch, thanks for the help BTW). > > Ideas anyone ? > > Thanks in advance, > Best regards, Steffen From rgb at phy.duke.edu Thu Apr 25 20:21:21 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 (Re) In-Reply-To: <3450CC8673CFD411A24700105A618BD6267E5E@911TURBO> Message-ID: On Thu, 25 Apr 2002, Steve Gaudet wrote: > Has anyone reported these issues to Tyan's technical support? Tyan knows > that their products sell very well in the Linux cluster market. So not > addressing these issues would cause customers to look elsewhere. We filed a ticket yesterday, but AFAIK no response yet. I've been snooping their website (and 3com's, and google, and anything else I can think of) trying to find some hint that this problem has been reported before. It is very strange that such a complete screw up could still exist -- how can anybody be using Tigers in 1U or 2U cases if this problem is universal? I posted here half hoping to hear somebody say "Ah, you need to change XXX in the bios, you idiot". 
I'd even welcome the idiot part, if only I could get things to work. Still, it is completely reproducible on the Tigers we have, it has hit other Tiger users at Duke, some of whom posted here a month or two ago and finally gave up and replaced their 2460's with 2466's in sheer frustration (or so we suspect -- we only figured out the details I posted yesterday and haven't had time to verify that the problems are indeed the same) and it is a REAL problem. The 3c905 unfortunately is about 1 cm too tall to fit in a 2U case straight up if the metal backplate is taken off, although the ATI Rage actually will fit. I just registered (shudder, but what's a bit more SPAM in my already hi-cal email diet:-) at amdmb.com, which looks like a support forum for amd motherboard and will try a post there. It looks like it gets the attention of both AMD and Tyan engineers and may be faster than Tyan's support page to return something useful. People are (purportedly) SELLING cluster nodes with Tiger 2460's in a 1U or 2U chassis with a riser; it MUST work somehow, for some bios flash or configuration. If not and the Tiger 2460 has a dark secret (it doesn't, uhh, actually work in a 1-2U rackmount configuration, oops) well, I'm preparing a few flaming torches and oiling my pitchfork...unless of course Tyan makes it right. Very quickly -- this has cost us at least a month of potential productivity at this point, and looks to cost us at least a few days more. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From heckendo at cs.uidaho.edu Fri Apr 26 00:04:12 2002 From: heckendo at cs.uidaho.edu (Robert B Heckendorn) Date: Wed Nov 25 01:02:18 2009 Subject: liquid cooling In-Reply-To: <200204251601.g3PG1Bb04333@blueraja.scyld.com> Message-ID: <200204260704.AAA12953@brownlee.cs.uidaho.edu> > Interesting idea -- I actually used the idea as an exercise in class last > semester - as a (obviously extreme) exercise in facilities management. Point > was to get them to think about all the infrastructure we often take for > granted.. > > As far as selling the heat generated by our systems... A co-generation > facility off of the computer room? (Hmm.. Maybe, for some of our bigger > beowulfs..) But if your campus buildings have steam heating, well, there > might be something to it.. The best idea we heard was to use the waste heat from the beowulf to heat the university swimming pool. :-) -- | Robert Heckendorn | We may not be the only | heckendo@cs.uidaho.edu | species on the planet but | http://www.cs.uidaho.edu/~heckendo | we sure do act like it. | CS Dept, University of Idaho | | Moscow, Idaho, USA 83844-1010 | From mack.joseph at epa.gov Fri Apr 26 06:43:34 2002 From: mack.joseph at epa.gov (Joseph Mack) Date: Wed Nov 25 01:02:18 2009 Subject: howto increase MTU size on 100Mbps FE Message-ID: <3CC95986.EB9B5B9F@epa.gov> I know that jumbo frames increase throughput rate on GigE and was wondering if a similar thing is possible with regular FE. according to http://sd.wareonearth.com/~phil/jumbo.html the MTU of 1500 was chosen for 10Mbps ethernet and was kept for 100Mbps and 1Gbps ethernet for backwards compatibility on mixed networks. However MTU=1500 is too small for 100Mbps and 1Gbps ethernet. In Gbps ethernet jumbo frames (ie bigger MTU) is used to increase throughput. With netpipe I found that throughput on FE was approx linear with increasing MTU upto the max=1500bytes. 
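A rough way to see where the MTU gain comes from is to count the fixed per-frame cost: 8 bytes of preamble, 14 of Ethernet header, 4 of FCS and a 12-byte inter-frame gap on the wire, plus 20+20 bytes of IP/TCP headers inside the MTU (no TCP options assumed). The sketch below only accounts for wire overhead; the per-packet interrupt and protocol-stack cost, which larger frames also amortize, is often the bigger effect in practice.

# Wire efficiency of TCP over Ethernet as a function of MTU (sketch).
PER_FRAME = 8 + 14 + 4 + 12   # preamble + header + FCS + inter-frame gap
IP_TCP = 20 + 20              # IP and TCP headers live inside the MTU

def efficiency(mtu):
    payload = mtu - IP_TCP            # TCP payload carried per frame
    on_wire = mtu + PER_FRAME         # bytes of wire time per frame
    return payload / float(on_wire)

for mtu in (576, 1500, 4000, 9000):
    e = efficiency(mtu)
    print("MTU %5d: %5.1f%% efficient, ~%5.1f Mbit/s payload on Fast Ethernet"
          % (mtu, 100 * e, 100 * e))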
I assume that there is no sharp corner at 1500 and if in principle larger frames could be sent, then throughput should also increase for FE. (Let's assume that the larger packets will never get off the LAN and will never need to be fragmented). I couldn't increase the MTU above 1500 with ifconfig or ip link. I found that the MTU seemed to be defined in linux/include/if_ether.h as ETH_DATA_LEN and ETH_FRAME_LEN and increased these by 1500, recompiled the kernel and net-tools and rebooted. I still can't install a device with MTU>1500 VLAN sends a packet larger than the standard MTU, having an extra 4 bytes of out of band data. The VLAN people have problems with larger MTUs. Here's their mailing list http://www.WANfear.com/pipermail/vlan/ where I found the following e-mails http://www.WANfear.com/pipermail/vlan/2002q2/002385.html http://www.WANfear.com/pipermail/vlan/2002q2/002399.html http://www.WANfear.com/pipermail/vlan/2002q2/002401.html which indicate that the MTU is set in the NIC driver and that in some cases the MTU=1500 is coded into the hardware or is at least hard to change. I don't know whether regular commodity switches (eg Netgear FS series) care about packet size, but I was going to try to send packets over a cross-over cable initially. Am I barking up sensible trees here? Thanks Joe -- Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin contractor to the National Environmental Supercomputer Center, mailto:mack.joseph@epa.gov ph# 919-541-0007, RTP, NC, USA From maurice at harddata.com Fri Apr 26 08:25:51 2002 From: maurice at harddata.com (Maurice Hilarius) Date: Wed Nov 25 01:02:18 2009 Subject: [Fwd: Tyan Tiger 2460] In-Reply-To: <3CC94369.4DBF1BAF@scyld.com> Message-ID: <5.1.0.14.2.20020426090916.072721e0@mail.harddata.com> With regards to your message at 06:09 AM 4/26/02, Karen Keadle-Calvert. Where you stated: >Daniel, > >Thought this might be of interest. Didn't know if it would apply to >your situation or not. > >Karen > >-------- Original Message -------- >Subject: Tyan Tiger 2460 >Date: Thu, 25 Apr 2002 03:19:01 -0400 (EDT) >From: "Robert G. Brown" >To: Beowulf Mailing List >CC: Matthew Durbin > >Dear List, > >We've had problems (as have others on this list) getting our 2U >rackmount Tyan Tiger 2460 motherboards to boot/install/run reliably and >stably. Seth (our systems guy) and I worked on a couple of the boxes >today armed with a 32 bit riser, a 64 bit riser, and an ATI rage video >card and a 3c905m NIC. > >We took the PCI cards off of their frames so we could mount them >vertically directly in the slots for testing. We also dismounted the >risers so we could try them in different slots as well. The following is >a summary of our findings. > > a) Only the video card would work in slot 1. Period. If we put the >3c905 in slot one all by itself (using the BIOS console), the system >would behave erratically, actually mistaking the number and speed of >processors during boot and crashing under heavy network loads if and >when it booted. That is basically correct, with SOME video cards. In general the BIOS and bus setup seem to prefer the first slot be used by video, but it really seems to matter what card it is more than anything else. In general the ATI RageXL cards are not happy, but the RAGE Pro are, and many TNT2 cards work well over all slots. > b) If slot one had video or was empty, the system would work fine for >all other vertical configurations. That is, video in 1, net in 6, video >in 2, net in 3 or vice versa, video in 5, NIC in 2, etc. 
I don't know >that we tested every combination but we didn't find another that failed >in all our tests. Slot 1 alone seems to be the ringer. If you are using a riser the other slots are mainly irrelevant. In some risers they use extension boards to derive addressing from the next two slots, and in others they use some logic on the riser. It is advisable to use the Tyan M2039 riser as it seems to behave well with this, although, depending on cards used sometimes we see the ability to only support two out of three cards on the riser. >It is not a 64 vs 32 bit slot question or a power question per se, as >far as we can tell. Slots 1-4 are all apparently identical 32 bit, five >volt slots, slots 5+ are 32 bit five volt slots, and both the 3c905 and >ATI are slotted for 3.3/32 bit slots with the extra notch near the >back. There is no reason that we can see for the 3c905 to work in slot >2, 3, 4, 5, 6, 7 but not in slot 1. > >This is further verified by the fact that we had a 2566 to play with as >well, which has two 64/66 3.3 volt slots, and the cards worked perfectly >in them in any order. In the case of the 2466 the only drawback with what you describe is that generally to get 33MHz cards running off a riser in slot1 or 2 usually requires the motherboard to be jumpered to 33MHz on the 64 bit PCI. There ARE however NICs and video cards that will run on a 66MHz bus successfully, but it does require some testing to find the right choices.. > c) Our real torment comes from the riser. Most riser cards are >designed so they HAVE to plug into slot 1 so that their physical >framework can hold the cards sideways in the remaining room over the PCI >bus. Plugged into slot 2, there isn't generally room to fit a full >height card (or the support frame) into the remaining space to the side. >With the riser in slot 1, no combination of cards in the riser that >included the NIC would work, and even the video alone in the slot that >should have been a "straight through" connection appeared to have >problems, although a system without a NIC is useless to us so the issue >is moot. Again, the most common symptom was that the system wouldn't >even get the CPU info correct at the bios level before any boot is even >initiated, and if the boot/install succeeded at all the system was >highly unstable under any kind of load. Again, I think you are mostly seeing a riser card issue. We have used different risers with 3COM, Intel, and DLink NICs successfully, with the riser plugged into slot 1. These have included some 32 bit, and a few 64 bit risers. In general we have the best results, supporting 64 bit, on the Tyan riser. But with 32 bit only cards we are successful with more generic models. Of course the RIGHT solution would be to keep our perfectly good cards >and risers and get Tyan to replace the 2460's (if there isn't a bios >upgrade that fixes the ones we have). Given the frustration and >downtime and lost productivity we have suffered, giving us 2466 >replacements seems reasonable to me:-). While I am sure that this would be a possible solution, I feel that the right solution is to use a different (better) riser card. >Anyway, this explains to at least some extent why such a wide range of >experiences has been reported for these motherboards on the list. Most of the problems I see are caused by: 1) Obsolete BIOS versions 2) Poor RAM 3) problems with cooling 4) In appropriate BIOS setup choices 5) Riser cards with issues >BTW, so far the 2466 runs fine, as noted by many listvolken. 
2466 is actually MUCH more difficult to deal with, especially if you want to use a 64 bit/66MHz card, as the bus is very particular about what cards you use. 5 volt cards are definitely going to make problems on most risers, in our testing. Still as you mention, people have had success, but you can not just throw ANY riser or NIC or (especially) video card in and have it work.. With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue mailto:maurice@harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 Ask me about the UP1500 Alpha - Full systems from $3,500! From rgb at phy.duke.edu Fri Apr 26 08:55:56 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: [Fwd: Tyan Tiger 2460] In-Reply-To: <5.1.0.14.2.20020426090916.072721e0@mail.harddata.com> Message-ID: On Fri, 26 Apr 2002, Maurice Hilarius wrote: > > a) Only the video card would work in slot 1. Period. If we put the > >3c905 in slot one all by itself (using the BIOS console), the system > >would behave erratically, actually mistaking the number and speed of > >processors during boot and crashing under heavy network loads if and > >when it booted. > > That is basically correct, with SOME video cards. > In general the BIOS and bus setup seem to prefer the first slot be used by > video, but it really seems to matter what card it is more than anything > else. In general the ATI RageXL cards are not happy, but the RAGE Pro are, > and many TNT2 cards work well over all slots. You misunderstand. The video card works fine in all slots. The system locks with a 3c905C-TX-M in slot 1 even with the system stripped so that 2 processors, a stick of certified registered ECC DDR, and the 3C905C are it (NOTHING else plugged into the mobo). Tyan is now refusing to own the problem, and we're on the phone with 3Com to see if we can get some help at that end. They, at least, are being constructive. I was mistaken that only video works in slot 1. A Netgear in slot 1 still appears to work, but it doesn't support PXE or WOL and we need PXE for the nodes. Or maybe I misunderstand, if you are pointing out that the 2460 does have a history of problems with certain video cards in slot 1 AS WELL AS the 3c905. > > b) If slot one had video or was empty, the system would work fine for > >all other vertical configurations. That is, video in 1, net in 6, video > >in 2, net in 3 or vice versa, video in 5, NIC in 2, etc. I don't know > >that we tested every combination but we didn't find another that failed > >in all our tests. Slot 1 alone seems to be the ringer. > > If you are using a riser the other slots are mainly irrelevant. > In some risers they use extension boards to derive addressing from the next > two slots, and in others they use some logic on the riser. It is advisable > to use the Tyan M2039 riser as it seems to behave well with this, although, > depending on cards used sometimes we see the ability to only support two > out of three cards on the riser. We've just verified that the system works fine with a riser that plugs into the data lines in slot 2, not slot 1 (via a ribbon cable). It really is a slot 1/3C905 issue. Interestingly, the system does work if the FSB is set back to 200 MHz. I've now gone through dozens of threads on the motherboard on amdmb.com, and the 246X motherboards appear to be extremely "finicky", often requiring immense amounts of energy to find a companion configuration that will work. 
Tyan refused to acknowledge that there were any problems with the motherboard at all when we were on the phone with them, which is odd given the dozens of threads reporting them in the forum. I consider it a "problem" when a motherboard is so bleeding edge sensitive to timing/configuration issues that moving a well-known stable PCI card from a major manufacturer over a slot makes the system break. Obviously Tyan doesn't;-) > >It is not a 64 vs 32 bit slot question or a power question per se, as > >far as we can tell. Slots 1-4 are all apparently identical 32 bit, five > >volt slots, slots 5+ are 32 bit five volt slots, and both the 3c905 and > >ATI are slotted for 3.3/32 bit slots with the extra notch near the > >back. There is no reason that we can see for the 3c905 to work in slot > >2, 3, 4, 5, 6, 7 but not in slot 1. > > > >This is further verified by the fact that we had a 2566 to play with as > >well, which has two 64/66 3.3 volt slots, and the cards worked perfectly > >in them in any order. > > In the case of the 2466 the only drawback with what you describe is that > generally to get 33MHz cards running off a riser in slot1 or 2 usually > requires the motherboard to be jumpered to 33MHz on the 64 bit PCI. There > ARE however NICs and video cards that will run on a 66MHz bus successfully, > but it does require some testing to find the right choices.. > > > c) Our real torment comes from the riser. Most riser cards are > >designed so they HAVE to plug into slot 1 so that their physical > >framework can hold the cards sideways in the remaining room over the PCI > >bus. Plugged into slot 2, there isn't generally room to fit a full > >height card (or the support frame) into the remaining space to the side. > >With the riser in slot 1, no combination of cards in the riser that > >included the NIC would work, and even the video alone in the slot that > >should have been a "straight through" connection appeared to have > >problems, although a system without a NIC is useless to us so the issue > >is moot. Again, the most common symptom was that the system wouldn't > >even get the CPU info correct at the bios level before any boot is even > >initiated, and if the boot/install succeeded at all the system was > >highly unstable under any kind of load. > > Again, I think you are mostly seeing a riser card issue. We have used > different risers with 3COM, Intel, and DLink NICs successfully, with the > riser plugged into slot 1. > These have included some 32 bit, and a few 64 bit risers. In general we > have the best results, supporting 64 bit, on the Tyan riser. But with 32 > bit only cards we are successful with more generic models. It's not a riser issue. The system locks, as noted above, if the 3c905 is the ONLY card in the system and is plugged vertically into slot 1 (no riser in the system at all). The only riser-related issue is that it does seem to be related to the use of the slot 1 data lines and not the power rails, since the other riser slots draw power from other PCI slots with little extension cables, and a 3c905 in any slot-1 mounted riser then causes the lockup. > > Of course the RIGHT solution would be to keep our perfectly good cards > >and risers and get Tyan to replace the 2460's (if there isn't a bios > >upgrade that fixes the ones we have). Given the frustration and > >downtime and lost productivity we have suffered, giving us 2466 > >replacements seems reasonable to me:-). 
> While I am sure that this would be a possible solution, I feel that the > right solution is to use a different (better) riser card. > > >Anyway, this explains to at least some extent why such a wide range of > >experiences has been reported for these motherboards on the list. > Most of the problems I see are caused by: > 1) Obsolete BIOS versions > 2) Poor RAM > 3) problems with cooling > 4) In appropriate BIOS setup choices > 5) Riser cards with issues > > >BTW, so far the 2466 runs fine, as noted by many listvolken. > > > 2466 is actually MUCH more difficult to deal with, especially if you want > to use a 64 bit/66MHz card, as the bus is very particular about what cards > you use. 5 volt cards are definitely going to make problems on most risers, > in our testing. The good thing about the 2466 is that it has onboard 100BT in addition to the serial console, so that one doesn't necessarily need any cards at all to run as a simple node. If one wants to use it as a gigabit-linked node, then one probably wants a 64/66 card anyway. We've only been playing with one since yesterday, but it does seem a bit better (with what we've tested) than the 2460, but then, our 2460's do not work at all in the configuration we're trying to run. > > Still as you mention, people have had success, but you can not just throw > ANY riser or NIC or (especially) video card in and have it work.. Overall, the Tyans seem a bit on the maddening side. Marginal hardware is Evil. I'm sure we'll eventually get things worked out (we're trying to microconfigure the 3c905 in ITS bios on the phone with 3com now) but it costs a lot in time, energy, and lost productivity. (Well, looks like configuring the 3c905 bios by hand didn't do it). So far, the only solutions we've found appear to be displaced risers or (possibly) different NICs. Someone suggested that EEpro's work in a slot 1 riser, and they do PXE and perform well. Setting the FSB back isn't an option. I do appreciate your help and the remarks/suggestions above. If I sound abrupt, it is due to two nights running up til 3 websurfing on this issue, and a pending meeting on why our cluster nodes still aren't in production this afternoon. rgb > > > > With our best regards, > > Maurice W. Hilarius Telephone: 01-780-456-9771 > Hard Data Ltd. FAX: 01-780-456-9772 > 11060 - 166 Avenue mailto:maurice@harddata.com > Edmonton, AB, Canada http://www.harddata.com/ > T5X 1Y3 > > Ask me about the UP1500 Alpha - Full systems from $3,500! > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From SGaudet at turbotekcomputer.com Fri Apr 26 05:34:36 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 (Re) Message-ID: <3450CC8673CFD411A24700105A618BD6267E6F@911TURBO> Hello, > On Thu, 25 Apr 2002, Steve Gaudet wrote: > > > Has anyone reported these issues to Tyan's technical > support? Tyan knows > > that their products sell very well in the Linux cluster > market. So not > > addressing these issues would cause customers to look elsewhere. > > We filed a ticket yesterday, but AFAIK no response yet. I've been > snooping their website (and 3com's, and google, and anything > else I can > think of) trying to find some hint that this problem has been reported > before. 
It is very strange that such a complete screw up could still > exist -- how can anybody be using Tigers in 1U or 2U cases if this > problem is universal? I posted here half hoping to hear somebody say > "Ah, you need to change XXX in the bios, you idiot". I'd even welcome > the idiot part, if only I could get things to work. I also submitted a ticket on this and made them aware of the list. I'll follow up with a call next week if I don't hear something soon. We're also going to do some testing here in our lab. Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== From rok at ucsd.edu Fri Apr 26 09:36:15 2002 From: rok at ucsd.edu (Robert Konecny) Date: Wed Nov 25 01:02:18 2009 Subject: [NFS] NFS clients behind a masqueraded gateway In-Reply-To: ; from sp@scali.com on Thu, Apr 25, 2002 at 09:02:44PM +0200 References: Message-ID: <20020426093615.A3960@ucsd.edu> Hi Steffen, you're probably hitting NAT timeouts and NFS clients don't handle this well. We have the same setup on our cluster but we are using automountd on our NFS mounted directories - which works nicely around this problem. cheers, robert On Thu, Apr 25, 2002 at 09:02:44PM +0200, Steffen Persvold wrote: > Hi Wulfers, > > I'm taking the liberty to post my question here as well since it might be > relevant for some of you and you might have some experience with this. > > I've also posted the mail to the NFS mailing list, but I haven't gotten > any answers (yet). > > Any pointers to what the problem might be are higly appreciated. > > On Thu, 18 Apr 2002, Steffen Persvold wrote: > > > Hi all, > > > > I'm experiencing some problems with a cluster setup. The cluster is set up > > in a way that you have a frontend machine configured as a masquerading > > gateway and all the compute nodes behind it on a private network (i.e the > > frontend has two network interfaces). User home directories and also other > > data directories which should be available to the cluster (i.e statically > > mounted in the same location on both frontend and nodes) are located on > > external NFS servers (IRIX and Linux servers). This seems to work fine > > when the cluster is in use, but if the cluster is idle for some time (e.g > > over night), the NFS directories has become unavailable and trying to > > reboot the frontend results in a complete hang when it tries to unmount > > the NFS directories (it hangs in a fuser command). The frontend and all > > the nodes are running RedHat 7.2, but with a stock 2.4.18 kernel (plus > > Trond's seekdir patch, thanks for the help BTW). > > > > Ideas anyone ? 
> > > > Thanks in advance, > > > > > Best regards, > Steffen > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From bogdan.costescu at iwr.uni-heidelberg.de Fri Apr 26 14:08:32 2002 From: bogdan.costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:02:18 2009 Subject: howto increase MTU size on 100Mbps FE In-Reply-To: <3CC95986.EB9B5B9F@epa.gov> Message-ID: On Fri, 26 Apr 2002, Joseph Mack wrote: > I know that jumbo frames increase throughput rate on GigE and was > wondering if a similar thing is possible with regular FE. I was just about to ask the same thing... but to be more precise I was looking for first-hand experience with MPI/PVM over oversized Ethernet frames. As you say, the current situation doesn't really encourages experiments with Fast Ethernet, but I was thinking especially to people using Gigabit Ethernet. > I couldn't increase the MTU above 1500 with ifconfig or ip link. There are also limitations in the drivers. The upper network layers will refuse to set a value that is not supported by the driver. > VLAN sends a packet larger than the standard MTU, having an > extra 4 bytes of out of band data. The VLAN people have > problems with larger MTUs. I think that most of the problems come from the fact that most cards do not have support for VLAN - allow the extra 4 bytes _only_ if the VLAN tag is present. Most of the cards allow oversized frames but there is no control over size and/or VLAN tag presence. > which indicate that the MTU is set in the NIC driver and > that in some cases the MTU=1500 is coded into the hardware > or is at least hard to change. There are actually some hardware limitations: cards have FIFO buffers which are designed based on normal Ethernet frame size; while VLAN's 4 extra bytes usually fit, a 4-8 KiByte packet usually doesn't. Some drivers also take active measures to prevent Tx underruns which are probably disturbed by oversized frames. > I don't know whether regular commodity switches (eg Netgear > FS series) care about packet size, but I was going to > try to send packets over a cross-over cable initially. That's another question. Store-and-forward switches probably need to store the whole packet before transmitting it further... A week ago, I released a patch to add large MTU/VLAN support to the 3c59x driver in 2.4.18. So far I haven't receive any feedback about it... It still needs some work, but I first wanted to test functional support. http://www.iwr.uni-heidelberg.de/groups/biocomp/bogdan/tornado/index.html -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From becker at scyld.com Fri Apr 26 15:27:21 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:18 2009 Subject: howto increase MTU size on 100Mbps FE In-Reply-To: <3CC95986.EB9B5B9F@epa.gov> Message-ID: On Fri, 26 Apr 2002, Joseph Mack wrote: > I know that jumbo frames increase throughput rate on GigE and was > wondering if a similar thing is possible with regular FE. I used to track which FE NICs support oversized frames. Jumbo frames turned out to be so problematic that I've stopped maintaining the table. 
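As Bogdan says above, the refusal Joseph saw comes from the driver rejecting the requested size rather than from the tools themselves: ifconfig's request boils down to the SIOCSIFMTU ioctl, and ip ends up at the same change_mtu hook in the driver. Below is a minimal sketch of that request, which simply reports whether the driver accepted the value; the interface name and MTU on the command line are examples, and setting the MTU needs root.

import fcntl, socket, struct, sys

SIOCGIFMTU = 0x8921   # from <linux/sockios.h>
SIOCSIFMTU = 0x8922

def set_mtu(ifname, mtu):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # struct ifreq: 16-byte interface name followed by an int holding the MTU
    fcntl.ioctl(s.fileno(), SIOCSIFMTU,
                struct.pack("16si", ifname.encode()[:15], mtu))
    ifr = fcntl.ioctl(s.fileno(), SIOCGIFMTU,
                      struct.pack("16si", ifname.encode()[:15], 0))
    s.close()
    return struct.unpack("16si", ifr)[1]

if __name__ == "__main__":
    ifname, mtu = sys.argv[1], int(sys.argv[2])   # e.g. eth0 1504
    try:
        print("driver accepted, %s MTU is now %d" % (ifname, set_mtu(ifname, mtu)))
    except IOError as e:
        print("driver rejected MTU %d on %s: %s" % (mtu, ifname, e))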
> the MTU of 1500 was chosen for 10Mbps ethernet and was kept > for 100Mbps and 1Gbps ethernet for backwards compatibility Yup, 1500 bytes was chosen for interactive response on original Ethernet. (Note: originally Ethernet was 3Mbps, but commercial equipment started at 10Mbps.) The backwards compatibility issue is severe. The only way to automatically support jumbo frames is using the paged autonegotiation information, and there is no standard established for this. Jumbo frame *will* break equipment that isn't expecting oversized packets. If you detect a receive jabber (which is what a jumbo frame looks like), you are allowed (and _should_) disable your receiver for a period of time. The rationale is that a network with an on-going problem is likely to be generating flawed packets that shouldn't be interpreted as valid. > VLAN sends a packet larger than the standard MTU, having an > extra 4 bytes of out of band data. The VLAN people have > problems with larger MTUs. Here's their mailing list > http://www.WANfear.com/pipermail/vlan/ Most of the vLAN people don't initially understand the capability of the NICs, or why disabling Rx length checks is a Very Bad Idea. There are many modern NIC types that have explicit VLAN support, and VLAN should only be used with those NICs. (Generic clients do not require VLAN support. > which indicate that the MTU is set in the NIC driver and > that in some cases the MTU=1500 is coded into the hardware > or is at least hard to change. Hardware that isn't expecting to handle oversized frames might break in unexpected ways when Rx frame size checking is disabled. Breaking for every packet is fine. Occasionally corrupting packets as a counter rolls over might never be pinned on the NIC. The driver also comes into play. Most drivers are designed to receive packets into a single skbuff, assigned to a single descriptor. With jumbo frames the driver might need to be redesigned with multiple descriptors per packet. This adds complexity and might introduce new race conditions. Another aspect is that dynamic Tx FIFO threshold code is likely to be broken when the threshold size exceeds 2KB. This is a lurking failure -- it will not reveal itself until the PCI is very busy, then Boom... > I don't know whether regular commodity switches (eg Netgear > FS series) care about packet size, but I was going to > try to send packets over a cross-over cable initially. Most switches very much care about packet size. Consider what happens in store-and-forward mode. All of these issues can be fixed or addressed on a case-by-case basis. If you know the hardware you are using, and the symptoms of the potential problems, it's fine to use jumbo frames. But I would never ship a turn-key product or preconfigured software that used jumbo frames by default. It should always require expertise and explicit action for the end user to turn it on. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From atctam at csis.hku.hk Fri Apr 26 21:33:17 2002 From: atctam at csis.hku.hk (Anthony Tam) Date: Wed Nov 25 01:02:18 2009 Subject: Large chassis switch Message-ID: <5.1.0.14.0.20020427121505.03813570@staff.csis.hku.hk> Hi, We are going to construct a PC cluster with more than 200 nodes. Due to the limited budget, we decided to use Fast Ethernet as the interconnect. 
We are thinking of using either Alpine 3808 from Extreme or FastIron 1500 from Foundry, could anybody comment on their performances? Or, if you have any web links that point to some performance/evaluations of switches of similar type, this would be appreciated. Thanks. Cheers Anthony e Y8 d8 88 d8b Y8 88*8e d8888 88*e 88 88 88*8e Y8b Y888 d888b Y8 88 88b 88 88 88 88 88 88 88b Y8b Y8 d888888888 88 888 88 88 88 88 88 88 888 Y8b d888 b Y8 88 888 888 88 88 88 88 88 888 88 88 88 From chaimj at singnet.com.sg Fri Apr 26 21:29:21 2002 From: chaimj at singnet.com.sg (Chai Mee Joon) Date: Wed Nov 25 01:02:18 2009 Subject: Commercial: Cluster Service (request for comments) Message-ID: <1306.192.168.1.88.1019881761.squirrel@cdr.shogi.com.sg> Hi everyone, We provide a cluster computing service accessible online from anywhere. Our OpenMosix Clusters are available now, and Scyld Beowulf Clusters will be available in June 2002. Please take a look at: http://cluster.homelinux.net Comments please ! Best regards, Chai Mee Joon From fraser5 at cox.net Sat Apr 27 06:12:47 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:18 2009 Subject: LAN question... Message-ID: <000801c1eded$3aa77690$0800005a@papabear> I am sorry this is not a direct beowulf question but I am trying to understand the whole Wake-on-LAN feature and Resume on PME# features on my motherboards that I am using in a cluster. I figured someone here might know more about this them me. (because I know very little on this subject and I am having difficulty finding out from the hardware folks info as well) My question are: 1) What exactly is PME#? My motherboard can "Resume on PME#" and in the bios I have it active and it says that if you can resume if there is traffic on the onboard network adapter or PCI LAN card and that I need an ATX power supply (which I understand and have) is there a special number or code that I must send the NIC to turn on the system? If I try and telnet to the system (while it is off) I get no response. 2) Is "resume on PME#" different then WOL? 3) what does PME# stand for? I am guessing but Power Management...something number? 4) Some of the newer external NIC's clam to support WOL and don't have a WOL cable that goes to the motherboard WOL connector (Linksys NIC's). In the manual it says that the cable is not needed anymore as this can be activated thru the PCI bus if your mother board supports PXE (which it does). Again, is there a magic code to have the NIC send a signal to the motherboard to have it turn itself on? Also, does it have to go in a special PCI slot? Any help you can shed would be appreciated. Thanks, Jim From fraser5 at cox.net Sat Apr 27 09:42:53 2002 From: fraser5 at cox.net (Jim Fraser) Date: Wed Nov 25 01:02:18 2009 Subject: LAN question... eureka! In-Reply-To: <000801c1eded$3aa77690$0800005a@papabear> Message-ID: <000b01c1ee0a$9573c340$0800005a@papabear> I managed to find a description of a "magic packet" on the AMD site and with a little more sniffing around found a perl script someone had wrote that sends magic packet (some stuff then MAC address of the card 16 times) to the card and the machine turned on. For some reason, it would only work if I broadcasted the packet to 255.255.255.255 along with the MAC address but it works. (If I sent it to the exact address it didn't) If anyone stills has anything to add feel free. I still don't fully understand this but I got it working. 
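For anyone else chasing the same thing: the "magic packet" Jim found on the AMD site is simply six 0xFF bytes followed by the target NIC's MAC address repeated sixteen times, carried in any convenient UDP datagram (port 7 or 9 by convention, though the NIC only pattern-matches the payload). A minimal sketch with a placeholder MAC follows; broadcasting is the normal case for exactly the reason Jim ran into -- a powered-off machine cannot answer ARP for its own IP address, so a unicast to its exact address has nowhere to go.

#!/usr/bin/env python
# Sketch of a Wake-on-LAN "magic packet" sender: 6 x 0xFF followed by the
# target MAC repeated 16 times, sent as a UDP broadcast.  The MAC below is
# a placeholder, not a real address from this thread.
import socket

def wake(mac, broadcast="255.255.255.255", port=9):
    mac_bytes = bytes(int(octet, 16) for octet in mac.split(":"))
    if len(mac_bytes) != 6:
        raise ValueError("expected a MAC like aa:bb:cc:dd:ee:ff")
    payload = b"\xff" * 6 + mac_bytes * 16

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(payload, (broadcast, port))
    s.close()

if __name__ == "__main__":
    wake("00:11:22:33:44:55")   # hypothetical MAC of the node to power on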
Jim -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Jim Fraser Sent: Saturday, April 27, 2002 9:13 AM To: beowulf@beowulf.org Subject: LAN question... I am sorry this is not a direct beowulf question but I am trying to understand the whole Wake-on-LAN feature and Resume on PME# features on my motherboards that I am using in a cluster. I figured someone here might know more about this them me. (because I know very little on this subject and I am having difficulty finding out from the hardware folks info as well) My question are: 1) What exactly is PME#? My motherboard can "Resume on PME#" and in the bios I have it active and it says that if you can resume if there is traffic on the onboard network adapter or PCI LAN card and that I need an ATX power supply (which I understand and have) is there a special number or code that I must send the NIC to turn on the system? If I try and telnet to the system (while it is off) I get no response. 2) Is "resume on PME#" different then WOL? 3) what does PME# stand for? I am guessing but Power Management...something number? 4) Some of the newer external NIC's clam to support WOL and don't have a WOL cable that goes to the motherboard WOL connector (Linksys NIC's). In the manual it says that the cable is not needed anymore as this can be activated thru the PCI bus if your mother board supports PXE (which it does). Again, is there a magic code to have the NIC send a signal to the motherboard to have it turn itself on? Also, does it have to go in a special PCI slot? Any help you can shed would be appreciated. Thanks, Jim _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From ajiang at mail.eecis.udel.edu Sat Apr 27 18:24:29 2002 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Wed Nov 25 01:02:18 2009 Subject: Help! Scyld Beowuld Installation: In-Reply-To: <000b01c1ee0a$9573c340$0800005a@papabear> Message-ID: Hi, I am a beginner of Beowulf user. I tried to install it usingscyld beowulf release 2.0 Preview. I installed the master node successfully. But when I installed the slave nodes, I met a problem. According to Installation guide, when I boot the slave node from CD-Rom, the slave node will be listed in beosetup by MAC address. It happened! When I dragged it to center column, clicked 'apply'. The node is assigned to node 0. But the slave node is always 'down', even if I reboot the master node. The status slave node still is 'down'. By the way, the slave node always send singal "Sending RARP request..." If you can give me some suggestion, I'll appreciate it! Tom From agrajag at scyld.com Sat Apr 27 19:33:27 2002 From: agrajag at scyld.com (Sean DIlda) Date: Wed Nov 25 01:02:18 2009 Subject: Help! Scyld Beowuld Installation: In-Reply-To: References: Message-ID: <1019961207.1762.4.camel@loiosh> On Sat, 2002-04-27 at 21:24, Ao Jiang wrote: > Hi, > I am a beginner of Beowulf user. I tried to install it usingscyld > beowulf release 2.0 Preview. I installed the master node successfully. But Ack! That version's over a year and a half old.. If you are really interested in our software, I strongly recommend getting a newer version. > when I installed the slave nodes, I met a problem. According to > Installation guide, when I boot the slave node from CD-Rom, the slave > node will be listed in beosetup by MAC address. It happened! 
When I > dragged it to center column, clicked 'apply'. The node is assigned > to node 0. But the slave node is always 'down', even if I reboot the > master node. The status slave node still is 'down'. > By the way, the slave node always send singal "Sending RARP request..." Based on the message you see on the slave node, I'd say it sounds like either the file isn't getting written during the apply, or the daemons aren't getting sighup'ed like they should. My guess is its the daemons not getting sighuped? Where there any problems during boot? Instead of the green 'OK' message it would have had a red 'FAILED' message. You might want to try forcing the beowulf daemons to restart. As root, do: /sbin/service beowulf restart If you see failed messages when its shutting stuff down, don't worry about it, however if everything isn't 'OK' when they try to start again, then there is a problem there. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: This is a digitally signed message part Url : http://www.scyld.com/pipermail/beowulf/attachments/20020427/14ddec72/attachment.bin From ajiang at mail.eecis.udel.edu Sun Apr 28 14:20:49 2002 From: ajiang at mail.eecis.udel.edu (Ao Jiang) Date: Wed Nov 25 01:02:18 2009 Subject: Help! Scyld Beowuld Installation: In-Reply-To: <1019961207.1762.4.camel@loiosh> Message-ID: Hi, Thanks a lot for your suggestion. When I booted the system, the only 'Fail' item is: Bring up interface eth1, Determining IP address of eth1... I guess the reason is that the procotol of eth1 is set as DHCP and active on boot. I tried /sbin/service beowulf restart... everything is OK! But my problem is still exits. When I shut down the system, the only 'Fail' item is Starting kill all /etc/rc.d/rc6.d/S00killall:etc/init.d/beoserv No such file or directory. BTW, how much of the new version of scyld beowulf? Is it possible to get a low price version? Thanks agian. Tom On 27 Apr 2002, Sean DIlda wrote: > On Sat, 2002-04-27 at 21:24, Ao Jiang wrote: > > Hi, > > I am a beginner of Beowulf user. I tried to install it usingscyld > > beowulf release 2.0 Preview. I installed the master node successfully. But > > Ack! That version's over a year and a half old.. If you are really > interested in our software, I strongly recommend getting a newer > version. > > > when I installed the slave nodes, I met a problem. According to > > Installation guide, when I boot the slave node from CD-Rom, the slave > > node will be listed in beosetup by MAC address. It happened! When I > > dragged it to center column, clicked 'apply'. The node is assigned > > to node 0. But the slave node is always 'down', even if I reboot the > > master node. The status slave node still is 'down'. > > By the way, the slave node always send singal "Sending RARP request..." > > Based on the message you see on the slave node, I'd say it sounds like > either the file isn't getting written during the apply, or the daemons > aren't getting sighup'ed like they should. > > My guess is its the daemons not getting sighuped? Where there any > problems during boot? Instead of the green 'OK' message it would have > had a red 'FAILED' message. > > You might want to try forcing the beowulf daemons to restart. As root, > do: /sbin/service beowulf restart > If you see failed messages when its shutting stuff down, don't worry > about it, however if everything isn't 'OK' when they try to start again, > then there is a problem there. 
> From lindahl at keyresearch.com Sun Apr 28 12:16:16 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:18 2009 Subject: COTS cooling In-Reply-To: <200204240500.WAA23212@brownlee.cs.uidaho.edu>; from heckendo@cs.uidaho.edu on Tue, Apr 23, 2002 at 10:00:16PM -0700 References: <200204240407.g3O47Jb22438@blueraja.scyld.com> <200204240500.WAA23212@brownlee.cs.uidaho.edu> Message-ID: <20020428121616.C11625@wumpus.attbi.com> On Tue, Apr 23, 2002 at 10:00:16PM -0700, Robert B Heckendorn wrote: > 450W/dualnode * 3.4BTU/hr/W * 400 nodes = 612K BTU/hr One thing that I haven't seen anyone point out is that these nodes won't actually pull 450W in actual operation. However, you also need to spec enough additional cooling to (1) survive the failure of one of the AC units, and (2) account for the fact that the efficiency of the AC units falls over time, by as much as 30%. greg From ole at scali.no Sun Apr 28 06:53:27 2002 From: ole at scali.no (Ole W. Saastad) Date: Wed Nov 25 01:02:18 2009 Subject: Hyperthreading in P4 Message-ID: <3CCBFED7.FCF1DA94@scali.no> New Pentium 4 processors has hyper threading capabilities and when setting this the linux sees 4 cpus on each dual node. I have done some testing with OpenMP programs and found that for OpenMP threaded programs there is no performance gain in using the hypertheading. Using a number of threads that equal the number of real processors seems to be optimal. However, this is the results from just a few OpenMP programs and might not tell a full story. I would like comments from others who has played with threads and hyperthreading in a Pentium 4 processor environment. Hyperthreading is claimed to perform better when running a large number of processes and a high number of threads. But this is most probably different applications or different requests lets say to a web or db. server. -- Ole W. Saastad, Dr.Scient. Scali AS P.O.Box 150 Oppsal 0619 Oslo NORWAY Tel:+47 22 62 89 68(dir) mailto:ole@scali.no http://www.scali.com Are you meeting Petaflop requirements with Gigaflops performance ? - Scali Terarack bringing Teraflops to the masses. From suzen at theochem.tu-muenchen.de Sun Apr 28 12:30:35 2002 From: suzen at theochem.tu-muenchen.de (Mehmet Ali Suzen) Date: Wed Nov 25 01:02:18 2009 Subject: MPICH p4_error No space left on device. Message-ID: <3CCC4DDB.8030901@theochem.tu-muenchen.de> Hello, I'm experiencing a problem with MPICH 1.2.2.1 in SMP linux cluster, just after sending my program with mpirun I'm receiving this error. p4_error : Last Message No space left on device. What may cause this p4_error? Or Where could I find the detailed description of p4_error? I've checked user manual of MPICH, but I didn't gain much help. Thanks for any comment. Cheers, Mehmet From lindahl at keyresearch.com Sun Apr 28 16:00:41 2002 From: lindahl at keyresearch.com (Greg Lindahl) Date: Wed Nov 25 01:02:18 2009 Subject: COTS cooling In-Reply-To: <1019710150.3596.37.camel@milagro.wildbrain.com>; from purp@wildbrain.com on Wed, Apr 24, 2002 at 09:49:09PM -0700 References: <200204240500.WAA23212@brownlee.cs.uidaho.edu> <1019710150.3596.37.camel@milagro.wildbrain.com> Message-ID: <20020428160041.A12006@wumpus.attbi.com> On Wed, Apr 24, 2002 at 09:49:09PM -0700, Jim Meyer wrote: > For us, it showed that the total buildout of a real computer room would > cost less than 10% of the cost of the machines. That and a discussion of > raised failure rates due to heat turned the corner on that one. 
That seems to be a rule of thumb, that's the number I've seen for some very large machine rooms. Of course it's often more expensive to change an existing room than add stuff to one you're building. greg p.s. your company website isn't very linux or mozilla friendly, ah well. From dan at systemsfirm.net Sun Apr 28 23:06:27 2002 From: dan at systemsfirm.net (Daniel R. Philpott) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 Message-ID: > From: Robert G. Brown [mailto:rgb@phy.duke.edu] > > a couple of the boxes today armed with a 32 bit riser, a 64 > bit riser, and an ATI rage video card and a 3c905m NIC. > > a) Only the video card would work in slot 1. Period. If > we put the 3c905 in slot one all by itself (using the BIOS > console), the system would behave erratically, actually > > b) If slot one had video or was empty, the system would > work fine for all other vertical configurations. That is, > > The problem persisted, identically, when we put the 64 bit > riser (which we were really counting on to fix things) into > slot 1 and plugged the NIC and video into it, in either > order. > > HOWEVER, being clever little beasties, we put the dismounted > (32 bit) riser in slot 2 with the extra cabled keys in slots > 3 and 4, added the dismounted PCI cards to any slots we felt > like and voila! The system, she work perfectly. Just a quick observation, looking at the different configurations you tried makes me think that the working configuration may have modified (slightly) the latency of the PCI bus. Have you tried modifying the PCI latency in BIOS to work without the extra hardware? I don't use the Tiger 2460 boards (I was a sucker for the S2462) so I can't test this hypothesis but maybe someone else can test it out. Dan From rgb at phy.duke.edu Mon Apr 29 08:44:19 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:18 2009 Subject: Tyan Tiger 2460 In-Reply-To: Message-ID: On Mon, 29 Apr 2002, Daniel R. Philpott wrote: > Just a quick observation, looking at the different configurations you > tried makes me think that the working configuration may have modified > (slightly) the latency of the PCI bus. Have you tried modifying the PCI > latency in BIOS to work without the extra hardware? > > I don't use the Tiger 2460 boards (I was a sucker for the S2462) so I > can't test this hypothesis but maybe someone else can test it out. We did play with some of the slot settings in the bios to no avail. We're in mid-test of the slot 2 riser solution (which has worked fine over the whole weekend at a load average between 3 and 6 and with the disk and network being heavily banged on all the while, so it looks like it works). As soon as it finishes (or I feel like grabbing another of the idle chassis to test with and open it up) I'll give this a more focused try. It does have the feel of a latency issue, but why slot 1. Why slot 1 when the system is (otherwise) entirely empty? I'm not asking you, understand, just looking to heaven for Enlightenment... rgb > > Dan > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jlb17 at duke.edu Mon Apr 29 09:05:47 2002 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:02:18 2009 Subject: Processor contention(?) and network bandwidth on AMD Message-ID: This is probably in the category of "Yup, that's the way it is, deal with it", but, just in case anyone has any ideas, I'm throwing it out there. In the course of testing the gigabit connection in a new server, I noticed that overloaded dual AMD systems take a big hit in network bandwidth. I'm testing with ttcp, and all connections were made over the same switch (HP Procurve 2324). As an example, the results for a Tiger MPX (S2466) based node with dual 1900+s and using the integrated 3Com are: unloaded: 11486.6 KB/real sec 2 matlab simulations: 10637.8 KB/real sec 2 matlab simulations and 2 SETI@homes (nice -19): 6645.4 KB/real sec Ouch. This is on RedHat 7.2 with kernel 2.4.9-31. I eliminated every variable I could think of -- I tried this on an S2462 (Thunder MP) based system, I used a PCI Intel eepro100 card rather than the built-in 3Com, I upgraded to an almost vanilla (French Vanilla?) 2.4.18 kernel (the one from SGI's 1.1 XFS release). All showed the same results (well, 2.4.18 didn't show much of a drop with just the two matlabs, but still crashed with matlab+SETI). The one Intel system I tested (dual PIII 933 on an i860) showed very little bandwidth drop with load, and no extra drop for an overload. Any ideas? Is there any way to fix this? Or is the answer just to not run background nice jobs on cluster nodes? Thanks. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From aby_sinha at yahoo.com Mon Apr 29 09:56:06 2002 From: aby_sinha at yahoo.com (abhishek Sinha) Date: Wed Nov 25 01:02:18 2009 Subject: Hyperthreading in P4 In-Reply-To: <3CCBFED7.FCF1DA94@scali.no> Message-ID: <20020429165606.5649.qmail@web20710.mail.yahoo.com> Hi I am also using a dual Xeon 2.2 Ghz box and it seems that the box is slower than my normal pentium 3 also. The reason i guess is the kernel. If i watch my /proc/interrupts i see all of them on the single CPU . Upon research on the net i found that it required some kind of IRQ routing patch , (ingo's i guess) so that the CPU's perform better. I havent had much exposure to writing Open MP programs and then testing the power of the Xeon. comments?? abhishek --- "Ole W. Saastad" wrote: > > New Pentium 4 processors has hyper threading > capabilities > and when setting this the linux sees 4 cpus on each > dual node. > > I have done some testing with OpenMP programs and > found that > for OpenMP threaded programs there is no performance > gain in using > the hypertheading. Using a number of threads that > equal the number > of real processors seems to be optimal. > However, this is the results from just a few OpenMP > programs and > might not tell a full story. > I would like comments from others who has played > with threads > and hyperthreading in a Pentium 4 processor > environment. > > > Hyperthreading is claimed to perform better when > running a large > number of processes and a high number of threads. > But this is most > probably different applications or different > requests lets say to a > web or db. server. > > > -- > Ole W. Saastad, Dr.Scient. Scali AS P.O.Box 150 > Oppsal 0619 Oslo NORWAY > Tel:+47 22 62 89 68(dir) mailto:ole@scali.no > http://www.scali.com > Are you meeting Petaflop requirements with Gigaflops > performance ? 
> - Scali Terarack bringing Teraflops to the > masses. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf ===== And I dont want the world to see me Coz i know that they won't understand When Everything else is meant to be broken I just want u to know who i m .... __________________________________________________ Do You Yahoo!? Yahoo! Health - your guide to health and wellness http://health.yahoo.com From josip at icase.edu Mon Apr 29 10:54:34 2002 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:02:18 2009 Subject: howto increase MTU size on 100Mbps FE References: Message-ID: <3CCD88DA.67FDCB8D@icase.edu> Donald Becker wrote: > > On Fri, 26 Apr 2002, Joseph Mack wrote: > > > I know that jumbo frames increase throughput rate on GigE and was > > wondering if a similar thing is possible with regular FE. > > I used to track which FE NICs support oversized frames. Jumbo frames > turned out to be so problematic that I've stopped maintaining the table. > [...] > The backwards compatibility issue is severe. Jumbo frames are great to reduce host frame procesing overhead, but, unfortunately, we arrived at the same conclusion: jumbo frames and normal equipment do not mix well. If you have a separate network where all participants use jumbo frames, fine; otherwise, things get messy. Alteon (a key proponent of jumbo frames) has some suggestions: define a normal frame VLAN including everybody and a (smaller) jumbo frame VLAN; then use their ACEswitch 180 to automatically fragment UDP datagrams when routing from a jumbo frame VLAN to a non-jumbo frame VLAN (TCP is supposed to negotiate MTU for each connection, so it should not need this help). This sounds simple, but it requires support for 802.1Q VLAN tagging in Linux kernel if a machine is to participate in both jumbo frame and in non-jumbo frame VLAN. Moreover, in practice this mix is fragile for many reasons, as Donald Becker has explained... One of the problems I've seen involves UDP packets generated by NFS. When a large UDP packet (jumbo frame MTU=9000) is fragmented into 6 standard (MTU=1500) UDP packets, the receiver is likely to drop some of these 6 fragments because they are arriving too closely spaced in time. If even one fragment is dropped, the NFS has to resend that jumbo UDP packet, and the process can repeat. This results in a drastic NFS performance drop (almost 100:1 in our experience). To restore performance, you need significant interrupt mitigation on the receiver's NIC (e.g. receive all 6 packets before interrupting), but this can hurt MPI application performance. NFS-over-TCP may be another good solution (untested!). We got good gigabit ethernet bandwidth using jumbo frames (about 2-3 times better than normal frames using NICs with Alteon chipsets and the acenic driver), but in the end full compatibility with existing non-jumbo equipment won the argument: we went back to normal frames. The frame processing overhead does not seem as bad now that CPUs are so much faster (2GHz+), even with our gigabit ethernet, and particularly not with fast ethernet. However, if we had a separate jumbo-frame-only gigabit ethernet network, we'd stick to jumbo frames. Jumbo frames are simply a better solution for bulk data transfer, even with fast CPUs. Sincerely, Josip -- Dr. 
Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From Daniel.Kidger at quadrics.com Mon Apr 29 10:54:35 2002 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:02:18 2009 Subject: Hyperthreading in P4 Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA74D2D63@stegosaurus.bristol.quadrics.com> Ole W. Saastad [mailto:ole@scali.no] wrote: >New Pentium 4 processors has hyper threading capabilities >and when setting this the linux sees 4 cpus on each dual node. >I have done some testing with OpenMP programs and found that >for OpenMP threaded programs there is no performance gain in using >the hypertheading. Using a number of threads that equal the number >of real processors seems to be optimal. Having a multi-threaded processor should help codes which are limited by memory *latency*. I doubt if memory-bandwidth limited codes would benefit much, since memory bandwidth is a limited resource which is already oversubscribed on many dual-P4 nodes. Also there are no more floating-point units that a standard P4, so CPU limited codes wont see any improvement either. Perhaps the interesting area though is where the CPU can issue instructions to the FPU *AND* to the integer execution units concurrently but for different threads. This would perhaps allow general Linux system services to not impact the performance on application codes? Yours, Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 915 5505 ----------------------- www.quadrics.com -------------------- From hahn at physics.mcmaster.ca Mon Apr 29 12:40:30 2002 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:02:18 2009 Subject: Processor contention(?) and network bandwidth on AMD In-Reply-To: Message-ID: > This is probably in the category of "Yup, that's the way it is, deal with > it", but, just in case anyone has any ideas, I'm throwing it out there. well, there are several contributing factors, which are probably mitigated by running a decently modern (ie 2.4.18) kernel. for instance, at gigabit speeds, you're almost certainly generating nontrivial MM load. there's been a huge amount of improvement in 2.4's in how they handle ram. 2.4 (vs 2.2) has some fairly profound changes to the structure of the network stack, including efforts to make zero-copy possible (which effects alignment of packets, I think, even if you're not sendfile'ing.) I've also heard mutterings from Alan Cox-ish people that the scheme for waking up user-space is a bit too timid (resulting in the stack punting, and relying on ksoftirqd to eventually do the deed.) and of course, there's a big update to the scheduler impending or already merged ("Ingo's O(1) scheduler"). it seems to be a lot smarter about issues like migrating procs, affinity, waking up the right tasks, etc. > unloaded: 11486.6 KB/real sec > 2 matlab simulations: 10637.8 KB/real sec > 2 matlab simulations and 2 SETI@homes (nice -19): 6645.4 KB/real sec SETI@home is obviously in the "so don't do that" category. I expect your matlab was decelerated by a similar amount. though I think that another of Ingo's goals with the new scheduler was to give more intuitive nice -19 behavior. that is, most people think of nice -19 as a way to spend otherwise idle CPU on something. 
Linux (and at least some other Unixes) have *NEVER* done this - figure around 5% CPU, and that's ignoring cache/etc effects. anyway, I think Ingo tries to keep -19 pretty close to idle-only. (there are fundamental issues with really implementing an idle-only form of scheduling, since you wind up with "priority inversion", where the idle-only task holds a lock when high-pri jobs want to do something...) > Ouch. This is on RedHat 7.2 with kernel 2.4.9-31. ugh. NOTE TO ALL BEOWULF USERS: seriously consider running 2.4.18 or better, and *definitely* try out gcc 3.1 ASAP. these updates have changed my life > 2.4.18 kernel (the one from SGI's 1.1 XFS release). All showed the same > results (well, 2.4.18 didn't show much of a drop with just the two > matlabs, but still crashed with matlab+SETI). The one Intel system I hmm, come to think of it, I think Ingo's scheduler isn't merged in 2.4.19-pre yet. > tested (dual PIII 933 on an i860) showed very little bandwidth drop with i860 is a P4 chipset afaik... > load, and no extra drop for an overload. then again, it would not be astonishing if PIII and P4 showed dramatically different effects of cache pollution. remember that the P4 depends rather strongly on seeing a decent hit rate in its many caches (trace cache, normal I+D caches, TLB, prediction tables, etc) From jlb17 at duke.edu Mon Apr 29 12:49:32 2002 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:02:18 2009 Subject: Processor contention(?) and network bandwidth on AMD In-Reply-To: Message-ID: On Mon, 29 Apr 2002 at 3:40pm, Mark Hahn wrote > well, there are several contributing factors, which are probably > mitigated by running a decently modern (ie 2.4.18) kernel. > > for instance, at gigabit speeds, you're almost certainly generating > nontrivial MM load. there's been a huge amount of improvement in 2.4's > in how they handle ram. These tests (obviously) were only at FE speed, though. The receiving end was gigabit, but the sending end (the dual AMD nodes) was FE. > > unloaded: 11486.6 KB/real sec > > 2 matlab simulations: 10637.8 KB/real sec > > 2 matlab simulations and 2 SETI@homes (nice -19): 6645.4 KB/real sec > > SETI@home is obviously in the "so don't do that" category. I expect your > matlab was decelerated by a similar amount. Sure, but it was just an example of a niced background load, which "shouldn't" interfere with anything. It certainly shouldn't crash bandwidth like that. > > tested (dual PIII 933 on an i860) showed very little bandwidth drop with > > i860 is a P4 chipset afaik... Oops -- not enough coffee. I meant i840. Dual PIII with RDRAM. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From siegert at sfu.ca Mon Apr 29 13:13:30 2002 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:02:18 2009 Subject: Processor contention(?) and network bandwidth on AMD In-Reply-To: ; from hahn@physics.mcmaster.ca on Mon, Apr 29, 2002 at 03:40:30PM -0400 References: Message-ID: <20020429131330.A10649@stikine.ucs.sfu.ca> On Mon, Apr 29, 2002 at 03:40:30PM -0400, Mark Hahn wrote: > NOTE TO ALL BEOWULF USERS: seriously consider running 2.4.18 > or better, and *definitely* try out gcc 3.1 ASAP. > > these updates have changed my life I can only confirm that 2.4.18 improves "life" dramatically. Before that certain MPI jobs (using LAM) would just hang. 
Now with respect to gcc-3.x: It has been pointed out that gcc-3.0 is extremely bad for Athlons and (only) bad for PIIIs: http://math-atlas.sourceforge.net/errata.html#gcc3.0 My recent exercise in compiling atlas only confirms this. Have these issues been resolved in gcc-3.1? Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From SGaudet at turbotekcomputer.com Mon Apr 29 11:33:18 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:02:19 2009 Subject: Hyperthreading in P4 Message-ID: <3450CC8673CFD411A24700105A618BD6267E92@911TURBO> > I am also using a dual Xeon 2.2 Ghz box and it seems > that the box is slower than my normal pentium 3 also. > The reason i guess is the kernel. If i watch my > /proc/interrupts i see all of them on the single CPU . > Upon research on the net i found that it required some > kind of IRQ routing patch , (ingo's i guess) so that > the CPU's perform better. > > I havent had much exposure to writing Open MP programs > and then testing the power of the Xeon. > > comments?? Key issues are: 1. Code must be threaded. 2. BIOS and O.S. must be enabled. - RH has a patch available on their site. As for the performance vs PIII, I strongly recommend that application developers use our C and Fortran compilers for Linux. GCC is not well-optimized for Netburst architecture, PGI is o.k., and our compilers really fly! In addition, our compilers generate the best code for PIII, AMD, and P4P/Xeon based systems so you really can't lose. Pls see the following Hyper-Threading whitepaper (preliminary) for all the gory details about req'ts.> Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)> =================================================================== | Turbotek Computer Corp. tel:603-666-3062 ext. 21 | | 8025 South Willow St. fax:603-666-4519 | | Building 2, Unit 105 toll free:800-573-5393 | | Manchester, NH 03103 e-mail:sgaudet@turbotekcomputer.com | | web: http://www.turbotekcomputer.com | =================================================================== -------------- next part -------------- A non-text attachment was scrubbed... Name: HyperthreadingOSenabling.doc Type: application/msword Size: 1554944 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20020429/7efbb061/HyperthreadingOSenabling.doc From math at velocet.ca Mon Apr 29 13:56:52 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:19 2009 Subject: Processor contention(?) and network bandwidth on AMD In-Reply-To: ; from hahn@physics.mcmaster.ca on Mon, Apr 29, 2002 at 03:40:30PM -0400 References: Message-ID: <20020429165652.U8530@velocet.ca> On Mon, Apr 29, 2002 at 03:40:30PM -0400, Mark Hahn's all... > and of course, there's a big update to the scheduler impending or > already merged ("Ingo's O(1) scheduler"). it seems to be a lot smarter > about issues like migrating procs, affinity, waking up the right tasks, etc. > > > unloaded: 11486.6 KB/real sec > > 2 matlab simulations: 10637.8 KB/real sec > > 2 matlab simulations and 2 SETI@homes (nice -19): 6645.4 KB/real sec > > SETI@home is obviously in the "so don't do that" category. I expect your > matlab was decelerated by a similar amount. 
> > though I think that another of Ingo's goals with the new scheduler > was to give more intuitive nice -19 behavior. that is, most people > think of nice -19 as a way to spend otherwise idle CPU on something. > Linux (and at least some other Unixes) have *NEVER* done this - > figure around 5% CPU, and that's ignoring cache/etc effects. > anyway, I think Ingo tries to keep -19 pretty close to idle-only. > (there are fundamental issues with really implementing an idle-only > form of scheduling, since you wind up with "priority inversion", > where the idle-only task holds a lock when high-pri jobs want to do > something...) How does freebsd do this then? They've had idle (and realtime) priority in the kernels for a couple years. And there are no problems with priority inversion (which was Mike Shaver's answer to me for linux's lack of idle time priority 2 years ago when I asked him if Linux was going to be incorporating it) - rather, they have lock breaking code working nicely in freebsd. Freebsd's idle priority gives 30 levels of idle priority to play with. Anything at a lower level of idle priority gets NO time on the cpu at all until there is some available. This is quite nice when things like Gaussian98 is running and you want to put a higher priority g98 job on without having a nice level 19 in linux G98 fighting and thrashing the cache vs the nice level 0 g98. I notice a total speedup of 1-2% at least in the difference between running the two jobs sequentially vs putting 1 at 19 and 1 at 0. I find this system VERY useful for scheduling jobs on various machines shared by different groups. It really guarantees that the CPU will be used primarily for one type of job and not another and avoids cache thrashing quite nicely - until something goes to disk, and then the idle job wakes up, etc... is it worth using the extra cpu and possibly thrashing the cache, or is it more efficient to wait for a bigger chunk of free CPU? /kc > > > Ouch. This is on RedHat 7.2 with kernel 2.4.9-31. > > ugh. > > NOTE TO ALL BEOWULF USERS: seriously consider running 2.4.18 > or better, and *definitely* try out gcc 3.1 ASAP. > > these updates have changed my life > > > > 2.4.18 kernel (the one from SGI's 1.1 XFS release). All showed the same > > results (well, 2.4.18 didn't show much of a drop with just the two > > matlabs, but still crashed with matlab+SETI). The one Intel system I > > hmm, come to think of it, I think Ingo's scheduler isn't merged > in 2.4.19-pre yet. > > > tested (dual PIII 933 on an i860) showed very little bandwidth drop with > > i860 is a P4 chipset afaik... > > > load, and no extra drop for an overload. > > then again, it would not be astonishing if PIII and P4 showed > dramatically different effects of cache pollution. remember that > the P4 depends rather strongly on seeing a decent hit rate in > its many caches (trace cache, normal I+D caches, TLB, prediction tables, etc) > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From rgb at phy.duke.edu Mon Apr 29 14:07:24 2002 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:02:19 2009 Subject: Processor contention(?) 
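A crude ttcp-style pair is enough to fill in the kind of loaded/unloaded matrix suggested a little further down if netpipe is not handy. The sketch below is exactly that crude: an arbitrary port, an arbitrary transfer size, and nothing finer-grained than bulk TCP streaming, so treat the numbers as relative rather than absolute.

#!/usr/bin/env python
# Crude ttcp-style bulk TCP throughput test (a sketch, not a netpipe
# replacement).  Run "python tput.py recv" on one host and
# "python tput.py send <host>" on another, with each end idle or loaded
# as desired.  Port and transfer size are arbitrary choices.
import socket
import sys
import time

PORT = 5001                 # arbitrary
TOTAL = 256 * 1024 * 1024   # bytes to stream
CHUNK = 64 * 1024

def recv():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, peer = srv.accept()
    got = 0
    start = time.time()
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        got += len(data)
    elapsed = time.time() - start
    print("%d bytes from %s in %.2f s = %.1f KB/s"
          % (got, peer[0], elapsed, got / 1024.0 / elapsed))

def send(host):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, PORT))
    buf = b"x" * CHUNK
    sent = 0
    start = time.time()
    while sent < TOTAL:
        s.sendall(buf)
        sent += len(buf)
    s.close()
    print("sent %d bytes in %.2f s" % (sent, time.time() - start))

if __name__ == "__main__":
    if sys.argv[1] == "recv":
        recv()
    else:
        send(sys.argv[2])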
and network bandwidth on AMD In-Reply-To: Message-ID: On Mon, 29 Apr 2002, Joshua Baker-LePain wrote: > > > unloaded: 11486.6 KB/real sec > > > 2 matlab simulations: 10637.8 KB/real sec > > > 2 matlab simulations and 2 SETI@homes (nice -19): 6645.4 KB/real sec > > > > SETI@home is obviously in the "so don't do that" category. I expect your > > matlab was decelerated by a similar amount. > > Sure, but it was just an example of a niced background load, which > "shouldn't" interfere with anything. It certainly shouldn't crash > bandwidth like that. Joshua, Actually, running a heavy background load can (as you have observed) significantly affect network times, especially if it is the receiver that is loaded. As to whether or not it "should", I cannot say (kind of a value judgement there:-), but one can try to understand it. There are deliberate tradeoffs made in the tuning of the kernel and for better or worse the linux tradeoffs optimize "user response time" at the expense of a variety of things that might improve throughput on a purely computational load or throughput on the network or pretty much anything else. Sometimes one can retune -- Josip Loncaric's TCP patch is one such retuning, but one can also envision changing timeslice granularity and other things to optimize one thing at the expense of others. Generally such a retuning is a Bad Idea. Right now the kernel is pretty damn good, overall, and all components are delicately balanced. As Mark's previous reply made clear, some naive retunings would just lock up the system (or really make performance go to hell) as important components starve. It isn't too hard to see why loading the receiver might decrease the efficiency of the network. Imagine the network component of the kernel from the point of view of the stream receiver (not the transmitter). It never knows when the next packet/message will come through. The kernel does its best to do OTHER work in the gaps between packets by installing top half and bottom half handlers and the like (so it does no more work then absolutely necessary when the asynchronous interrupt is first received, postponing what it can until later) to provide the illusion of seamless access to the CPU and other resources for running processes. One side effect of this is that there are times when the delivery of packets is delayed so that a background application can complete a timeslice it was given "in between" packets when the system was momentarily idle. What this ends up meaning is that when the system is BUSY, it de facto delays the delivery of packets that it has buffered for fractions of the many timeslices of CPU the system is allocating to the competing tasks when the network process is momentarily idle (blocked, waiting for the next packet). If it didn't do this a high speed packet stream could (for example) starve running processes for CPU by forcing them to wait for the whole stream to complete. Processing the text of TCP packets (not to mention the interrupts and context switches themselves) is a nontrivial load on the CPU in its own right, so much so that people try NOT to run high-performance network connections for fine-grained code over TCP if they can avoid it. The network stack ends up contending for CPU with everything else that is running, and it makes no sense to retune things so that this is never true as the cure will likely be worse than the disease for most usage patterns. 
Curiously, transmitting works more efficiently than receiving, probably because the transmitter is in charge of the scheduling. In very crude terms the transmitter is never interrupted or delayed by other processes -- it just gets its timeslice, executes a send or stream of sends, eventually blocks (moving up in priority while blocked) or finishes its timeslice, and then moves on. No delays to speak of. Try this: Do your netpipe transmitter on an unloaded host, a host at load 1 and at load 2. Do your netpipe receiver on an unloaded host, a host at load 1 and one at load 2. Fill in the matrix -- load 0 to load 0, load 0 to load 1, etc. I found (in similar tests done years ago) that a TRANSMITTER could be loaded to 2 (per cpu) with only a small degradation of throughput, but loading a RECEIVER would drop throughput dramatically, by as much as 50%. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jlb17 at duke.edu Mon Apr 29 14:22:41 2002 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:02:19 2009 Subject: Processor contention(?) and network bandwidth on AMD In-Reply-To: Message-ID: On Mon, 29 Apr 2002 at 5:07pm, Robert G. Brown wrote > Try this: > > Do your netpipe transmitter on an unloaded host, a host at load 1 and > at load 2. > Do your netpipe receiver on an unloaded host, a host at load 1 and one > at load 2. > > Fill in the matrix -- load 0 to load 0, load 0 to load 1, etc. > > I found (in similar tests done years ago) that a TRANSMITTER could be > loaded to 2 (per cpu) with only a small degradation of throughput, but > loading a RECEIVER would drop throughput dramatically, by as much as > 50%. I will indeed do these tests and post a followup. One thing, though, that I don't know that I made clear. I understand and accept that a higher load is going to affect bandwidth. What I'm pointing out (and wondering why) is that the effect is *far* greater on dual AMD based systems than it is on, e.g., the dual PIII systems I have. As an aside, At Mark's suggestion, I reniced the ksoftirqds on a S2466 based system, and saw vast improvement. For no load, 2 matlabs, and 2 matlabs+2SETIs I saw (with the cursed 2.4.9-31 RH kernel): ksoftirqds reniced to 0: 11463.5 KB/real sec 10637 KB/real sec 9585.39 KB/real sec And reniced to -19: 11481.8 KB/real sec 10632.7 KB/real sec 9347.31 KB/real sec FWIW. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From rocky at atipa.com Mon Apr 29 16:09:08 2002 From: rocky at atipa.com (Rocky McGaugh) Date: Wed Nov 25 01:02:19 2009 Subject: Hyperthreading in P4 In-Reply-To: <3450CC8673CFD411A24700105A618BD6267E92@911TURBO> Message-ID: On Mon, 29 Apr 2002, Steve Gaudet wrote: > > Key issues are: > 1. Code must be threaded. > 2. BIOS and O.S. must be enabled. > - RH has a patch available on their site. > > As for the performance vs PIII, I strongly recommend that application > developers use our C and Fortran compilers for Linux. GCC is not > well-optimized for Netburst architecture, PGI is o.k., and our compilers > really fly! In addition, our compilers generate the best code for PIII, > AMD, and P4P/Xeon based systems so you really can't lose. > > Pls see the following Hyper-Threading whitepaper (preliminary) for all the > gory details about req'ts.> > > Fine and dandy. 
The problem is with no way to bind processes to processors, it's quite easy for 2 heavy processes (or threads) to migrate to the same physical CPU, leaving 2 smaller threads (or processes) on the other physical CPU. For CPU bound apps, its too unpredictable without the processor affinity stuff. -- Rocky McGaugh Atipa Technologies rocky@atipatechnologies.com rmcgaugh@atipa.com 1-785-841-9513 x3110 http://1087800222/ perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");' From fruechtl at fecit.co.uk Tue Apr 30 02:20:36 2002 From: fruechtl at fecit.co.uk (Herbert Fruchtl) Date: Wed Nov 25 01:02:19 2009 Subject: Attachmants (was: Hyperthreading in P4) References: <200204292059.g3TKxcD06021@blueraja.scyld.com> Message-ID: <3CCE61E4.20F3D7B6@fecit.co.uk> Would you PLEEEEEASE not send huge binary attachments to the list! Put them on the web, send them on demand, whatever. Hundreds of elm users hate you and will never buy from your company again :-) I apologise if others already complained. I only get the daily digest (where attachments are definitely unreadable). Herbert From ole at scali.com Tue Apr 30 05:00:03 2002 From: ole at scali.com (Ole W. Saastad) Date: Wed Nov 25 01:02:19 2009 Subject: OpenMP and P4 hyperthreading Message-ID: <3CCE8743.6ED1D681@scali.no> Will hyper-threading make sense when using OpenMP and more threads than physical processors? I have run the NPB2.3 benchmark in C/OpenMP version on a dual Pentium Xeon system and found some interesting results. For most of the benchmarks there is no gain in using hyperthreading, as expected, but the for the ep benchmark there is a significant speed up. This benchmark contain a loop with a trancendentals like ln, exp and pow (pow is a combination of ln and exp). The ep benchmark is supposed to scale almost perfect as it is embarrassingly parallel (hence the name ep), but it was somewhat unexpected that the speedup using four threads were so significantly. For all the others there is a slowdown from 0 to 11%, but for the ep there is a speedup of 34%. The results can be viewed at : http://computational-battery.org/ I have received a lot of comments about the hyperthreading due to my former posting, but little actual benchmark results. It would be interesting to see if there are other programs or problems that can benefit from the hyperthreading. -- Ole W. Saastad, Dr.Scient. Scali AS P.O.Box 150 Oppsal 0619 Oslo NORWAY Tel:+47 22 62 89 68(dir) mailto:ole@scali.no http://www.scali.com Are you meeting Petaflop requirements with Gigaflops performance ? - Scali Terarack bringing Teraflops to the masses. From raju at linux-delhi.org Tue Apr 30 06:38:32 2002 From: raju at linux-delhi.org (Raju Mathur) Date: Wed Nov 25 01:02:19 2009 Subject: Attachmants (was: Hyperthreading in P4) In-Reply-To: <3CCE61E4.20F3D7B6@fecit.co.uk> References: <200204292059.g3TKxcD06021@blueraja.scyld.com> <3CCE61E4.20F3D7B6@fecit.co.uk> Message-ID: <15566.40536.998049.36069@mail.linux-delhi.org> This is a Mailman-administered list, and Mailman has pretty decent options to filter out messages containing all sorts of unnecessary content. For instance, on the Linux-India-* lists which I manage we quarantine all messages with any MIME content (including HTML) for administrator action. While it's a bit of a PITA for the list administrator, it definitely keeps the list coherent and easy on both the high-bandwidth and us III-world 28.8-dialup types :-) The new version of Mailman (which was still Beta, last I checked) had command-line tools for doing regular list maintenance. 
Quite an improvement over that sucky web interface (IMNSHO). Vive l'keyboard! BTW, most (all?) mailers have options for exploding digests into individual messages, after which you'll be able to read the attachments just fine. Regards, -- Raju >>>>> "Herbert" == Herbert Fruchtl writes: Herbert> Would you PLEEEEEASE not send huge binary attachments to Herbert> the list! Put them on the web, send them on demand, Herbert> whatever. Hundreds of elm users hate you and will never Herbert> buy from your company again :-) Herbert> I apologise if others already complained. I only get the Herbert> daily digest (where attachments are definitely Herbert> unreadable). -- Raju Mathur raju@kandalaya.org http://kandalaya.org/ It is the mind that moves From becker at scyld.com Tue Apr 30 08:45:27 2002 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:02:19 2009 Subject: List Attachmants (was: Hyperthreading in P4) In-Reply-To: <15566.40536.998049.36069@mail.linux-delhi.org> Message-ID: > >>>>> "Herbert" == Herbert Fruchtl writes: > > Herbert> Would you PLEEEEEASE not send huge binary attachments to > Herbert> the list! Put them on the web, send them on demand, I apologize for allowing this very large message to get through. Some subscribers were lucky, and didn't see the message. Sending a 2.4MB message to several thousand subscribers saturated our link. Once I figured out what was happening, I shut down the mailer and manually deleted the large messages from the queue. (That's why the mailer was down overnight.) The Klez virus is part of the problem here. There have been so many Klez messages that I assumed the initial complaints were mistaken about the message source. On Tue, 30 Apr 2002, Raju Mathur wrote: > This is a Mailman-administered list, and Mailman has pretty decent > options to filter out messages containing all sorts of unnecessary > content. For instance, on the Linux-India-* lists which I manage we > quarantine all messages with any MIME content (including HTML) for > administrator action. We are running Mailman 2.0.6. I don't see a moderation option for MIME, although I might have missed it. > The new version of Mailman (which was still Beta, last I checked) had > command-line tools for doing regular list maintenance. Quite an > improvement over that sucky web interface (IMNSHO). Vive l'keyboard! OOoooh... I'm updating when it comes out of beta. It's very time consuming to use the web interface to delete the same spam from two dozen lists. I'm hoping the new version has "discard" patterns as well as the current "hold for moderation" pattens. -- Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From SGaudet at turbotekcomputer.com Tue Apr 30 10:19:42 2002 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:02:19 2009 Subject: List Attachmants (was: Hyperthreading in P4) Message-ID: <3450CC8673CFD411A24700105A618BD6267EA5@911TURBO> Don, > > I apologize for allowing this very large message to get through. > > Some subscribers were lucky, and didn't see the message. Sending a > 2.4MB message to several thousand subscribers saturated our > link. Once > I figured out what was happening, I shut down the mailer and manually > deleted the large messages from the queue. (That's why the mailer was > down overnight.) Very sorry about this. I didn't realize I put together such a big file. It won't happen again. 
I'll put all this up on our web site this week. Again I apologize for this screw up. Steve Gaudet From math at velocet.ca Tue Apr 30 10:25:12 2002 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:02:19 2009 Subject: List Attachmants (was: Hyperthreading in P4) In-Reply-To: ; from becker@scyld.com on Tue, Apr 30, 2002 at 11:45:27AM -0400 References: <15566.40536.998049.36069@mail.linux-delhi.org> Message-ID: <20020430132512.R8530@velocet.ca> On Tue, Apr 30, 2002 at 11:45:27AM -0400, Donald Becker's all... > > >>>>> "Herbert" == Herbert Fruchtl writes: > > > > Herbert> Would you PLEEEEEASE not send huge binary attachments to > > Herbert> the list! Put them on the web, send them on demand, > > I apologize for allowing this very large message to get through. > > Some subscribers were lucky, and didn't see the message. Sending a > 2.4MB message to several thousand subscribers saturated our link. Once > I figured out what was happening, I shut down the mailer and manually > deleted the large messages from the queue. (That's why the mailer was > down overnight.) > > The Klez virus is part of the problem here. There have been so many > Klez messages that I assumed the initial complaints were mistaken about > the message source. > > On Tue, 30 Apr 2002, Raju Mathur wrote: > > > This is a Mailman-administered list, and Mailman has pretty decent > > options to filter out messages containing all sorts of unnecessary > > content. For instance, on the Linux-India-* lists which I manage we > > quarantine all messages with any MIME content (including HTML) for > > administrator action. > > We are running Mailman 2.0.6. > I don't see a moderation option for MIME, although I might have missed it. In the new mailman (whcih we also use) there should be header filtering. It should be possible to filter based on Content-type: however that doesnt help with content-length. > > The new version of Mailman (which was still Beta, last I checked) had > > command-line tools for doing regular list maintenance. Quite an > > improvement over that sucky web interface (IMNSHO). Vive l'keyboard! > > OOoooh... I'm updating when it comes out of beta. It's very time > consuming to use the web interface to delete the same spam from two > dozen lists. I'm hoping the new version has "discard" patterns as well > as the current "hold for moderation" pattens. The new one can auto discard spam, andyou can even do it sans notification. (I do it with, so I get the spam, but just to see what's being discarded in case some lost sheep subscriber is posting incorrectly to my relatively private lists.) Its not bad. Now if I could only figure out why it uses a cluster worth of CPU to deliver messages, I'd be happy with mailman. :) (1.2Ghz CPU doing about 20-30% cpu 24hrs a day to send 2000 posts-recipient :( ) /kc > > > -- > Donald Becker becker@scyld.com > Scyld Computing Corporation http://www.scyld.com > 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters > Annapolis MD 21403 410-990-9993 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From jlb17 at duke.edu Tue Apr 30 12:50:58 2002 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:02:19 2009 Subject: Processor contention(?) and network bandwidth on AMD In-Reply-To: Message-ID: On Mon, 29 Apr 2002 at 5:07pm, Robert G. 
From jlb17 at duke.edu  Tue Apr 30 12:50:58 2002
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Wed Nov 25 01:02:19 2009
Subject: Processor contention(?) and network bandwidth on AMD
In-Reply-To: 
Message-ID: 

On Mon, 29 Apr 2002 at 5:07pm, Robert G. Brown wrote

> Try this:
>
> Do your netpipe transmitter on an unloaded host, a host at load 1 and
> at load 2.
> Do your netpipe receiver on an unloaded host, a host at load 1 and one
> at load 2.
>
> Fill in the matrix -- load 0 to load 0, load 0 to load 1, etc.
>
> I found (in similar tests done years ago) that a TRANSMITTER could be
> loaded to 2 (per cpu) with only a small degradation of throughput, but
> loading a RECEIVER would drop throughput dramatically, by as much as
> 50%.

Done -- the results are at . Keep in mind these were pretty quick and
dirty tests using the systems I have on hand.

Between Athlon systems, it seems the transmitter vs. receiver loading
doesn't make much of a difference. A newer Intel-based system (dual PIIIs
on a Serverworks HE-SL chipset) shows the same bandwidth hit with overload
as the Athlon systems. But older systems (well, a couple of years anyway)
don't show this hit, which is what set me off on all this. Maybe it's an
issue of chipset support?

Anyways, thanks for listening to me babble on about all this.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
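A sketch of one way to fill in the load matrix described above, assuming
the NPtcp driver from the NetPIPE distribution. The exact flags vary
between NetPIPE versions (-r/-t select receiver/transmitter in the older
2.x releases, while newer ones only need -h on the transmitting side), and
the hostnames and output file name are placeholders.

  # On the receiving node (nodeB): add artificial load, then listen.
  yes > /dev/null &          # one CPU-bound loop adds roughly 1.0 to the load average
  yes > /dev/null &          # second loop for load ~2
  NPtcp -r                   # or just `NPtcp` with newer NetPIPE releases

  # On the transmitting node (nodeA): set its load, then connect and run.
  yes > /dev/null &                    # transmitter at load ~1
  NPtcp -t -h nodeB -o tx1_rx2.out     # or `NPtcp -h nodeB -o tx1_rx2.out`

  # Kill the load generators on both nodes when the run finishes.
  killall yes

Repeating the run for each (transmitter load, receiver load) pair gives the
full matrix rgb describes.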
From vanw at tticluster.com  Tue Apr 30 15:11:59 2002
From: vanw at tticluster.com (Kevin Van Workum)
Date: Wed Nov 25 01:02:19 2009
Subject: Dolphin Wulfkit
Message-ID: 

I know this has been discussed before, but I'd like to know the "current"
opinion. What are your experiences with the Dolphin Wulfkit interconnect?
Any major issues (compatibility / Linux 7.2 / MPI / etc.)? General comments.

-- 
Kevin Van Workum
www.tsunamictechnologies.com
ONLINE COMPUTER CLUSTERS
__/__ __/__ * / / / / / /

From rgb at phy.duke.edu  Tue Apr 30 16:03:57 2002
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:02:19 2009
Subject: xmlsysd, wulfstat (cluster monitor apps, beta)
Message-ID: 

Dearest DBUG (and beowulf list) persons,

Announcing xmlsysd and its companion application, wulfstat.

xmlsysd is a lightweight, throttleable daemon that runs either as a forking
daemon or out of xinetd (the latter by default). When one connects to it,
it accepts a very simple command language that basically a) configures it
to deliver certain kinds of /proc and system-call-derived information,
generally throttling it so it doesn't return anything you aren't interested
in; and b) causes it to wrap up that information in an xml-formatted
message and return it to the caller. Security is managed in any of several
ways -- by ipchains or iptables, using tcp wrappers, or using xinetd's
internal ip-level security features (or using ssl or ssh tunnels, for the
truly paranoid or those who want to monitor across a WAN).

wulfstat is a companion client application that uses the xmlsysd daemons
running on a collection of cluster nodes or LAN workstation hosts to gather
information about the nodes or hosts and present it in a simple
tty-accessible (e.g. xterm, konsole) tabular form, updating the table every
N seconds (default 5). Think of it as vmstat, procinfo, ifconfig, uptime,
free, date, the upper part of the top command, and a bit more all rolled
into a single application, so that you can monitor whole connected sets of
this information across an entire cluster with some reasonable granularity.

Such a tool has obvious uses -- for cluster users, it allows them to
monitor host load averages, look for idle resources, monitor memory usage,
obtain information at a glance about remote cpu type and clock, cache size,
monitor network loads, and even see what fraction of a cluster node's up
time has been spent "doing work" instead of idle. Most of this is equally
useful to systems administrators seeking to monitor LAN host activity --
crashing systems are often signalled by anomalous consumption of memory or
a steady rise in cpu usage, for example.

The toolset has now been in use for some time and has been reasonably
stable for several weeks (in spite of my constant poking at it to add new
features or fix tiny problems). I am therefore releasing it as version
0.1.0 BETA for wider testing, although at the moment it seems to be doing
fine in production.

It is expected that wulfstat is just the first of a number of monitoring
applications that will be developed to use the daemon. The daemon, for
example, can also be used to monitor tasks on remote nodes by username
and/or taskname and/or run status, although the application that actually
permits name and task lists to be managed on the user side and the returned
results properly displayed has yet to be written. Full GUI and/or web
applications should also be straightforward to build, although this time I
learned my lesson and built the tty application FIRST (for xmlsysd's
predecessor, procstatd, I built a GUI application and have regretted it
ever after). It is also expected that at least a few more features will be
added to the daemon (it lacks lm-sensors support at this point, for
example).

The daemon >>should<< have just enough power to form the basis for a
load-balancing or job distribution system -- it can certainly efficiently
provide realtime monitoring of many of the components upon which a queuing
decision might be based, including load, memory and network utilization,
non-root tasks running or waiting to run, and even CPU type, clock, and
cache. It does not run as a privileged user, however, and is not designed
to manage the actual distribution or control of jobs.

Still, I expect and hope that wulfstat and xmlsysd together will be
immediately useful to cluster people who install them. The included
documentation should be adequate although not overwhelming -- there are man
pages for both xmlsysd and wulfstat that are very nearly up to date -- and
I'm available to help with installations that don't seem to work correctly.
The one "gotcha" of wulfstat is that it does require libxml2 (and hence
probably RH 7.2 or better) to run -- you will need to ensure that this RPM
is installed on the hosts where wulfstat is to run. xmlsysd similarly
requires libxml to run on the cluster nodes.

I would greatly appreciate feedback and bug reports, if any, from anybody
who chooses to install it and give it a try. To retrieve it in RPM form,
you can use the URLs below:

http://www.phy.duke.edu/brahma/xmlsysd-0.1.0-beta.i386.rpm
http://www.phy.duke.edu/brahma/xmlsysd-0.1.0-beta.src.rpm
http://www.phy.duke.edu/brahma/wulfstat-0.1.0-beta.i386.rpm
http://www.phy.duke.edu/brahma/wulfstat-0.1.0-beta.src.rpm

If anybody needs it in tarball form (not in source or binary rpm form) they
should contact me directly. I can easily generate one (or it can be
extracted from the source rpm), but I can't guarantee the instructions for
installation or configuration -- they are already encapsulated in the RPMs.

rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu
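For anyone trying the packages above, the general shape of a node-side
setup might look like the following. The rpm command is standard; the
xinetd entry is only an illustration of what an entry for a daemon like
xmlsysd looks like -- the service name, port number and server path shown
here are guesses, not taken from the package, and the RPM presumably ships
its own file, so check the installed files and man page for the real
values.

  # Install the daemon on each node:
  rpm -Uvh xmlsysd-0.1.0-beta.i386.rpm

  # Hypothetical /etc/xinetd.d/xmlsysd entry (illustrative values only):
  service xmlsysd
  {
          type        = UNLISTED
          port        = 7887
          socket_type = stream
          protocol    = tcp
          wait        = no
          user        = nobody
          server      = /usr/sbin/xmlsysd
          only_from   = 192.168.1.0/24
          disable     = no
  }

  # Then have xinetd reread its configuration:
  /etc/rc.d/init.d/xinetd reload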
From ajiang at mail.eecis.udel.edu  Tue Apr 30 20:19:05 2002
From: ajiang at mail.eecis.udel.edu (Ao Jiang)
Date: Wed Nov 25 01:02:19 2009
Subject: Screen dump analysis:
In-Reply-To: 
Message-ID: 

Hi,
I have some questions about installing Scyld Beowulf (version 2.0 preview).
I am hoping someone can give me some direction. Thanks a lot!

After booting up the Beowulf system, the slave node shows:
"Boot: System boot phase 1 in progress...
...
Sending RARP request..."
But this request seems to be sent forever, and the status of the node is
always 'down' in Beosetup on the master node.

I checked the website and found it may be caused by beoserv not seeing the
RARP requests. Unfortunately I couldn't find an effective way to solve it.

Would you mind giving me some suggestions on how to fix the beoserv
problem, or how to send messages to the slave nodes manually?

The interesting thing is: when I reboot the master node, the slave nodes
seem to receive messages from the master node, reboot too, and enter phase
2 or 3. But some slave nodes then show errors. I don't know what they mean
or what the reasons are. If it is a hardware problem, which device is it?

The following is the screen dump:

Slave node 1:
"
EXT2-fs error (device ramdisk(1,3))
Ext2_add_entry: bad entry in directory #8193; directory entry across
blocks-offset=13680, inode=9096, rec_len=2064, name_len=5.
"
The status of node 1 is error.

Slave node 2:
"
Boot: System boot phase 2 in progress...
"
autorun...Done
VFS: Cannot open root device 03:01
Kernel panic: VFS unable to mount root fs on 03:01
"
The status of node 2 is error.

Tom

From alvin at Maggie.Linux-Consulting.com  Tue Apr 30 20:39:55 2002
From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com)
Date: Wed Nov 25 01:02:19 2009
Subject: Screen dump analysis:
In-Reply-To: 
Message-ID: 

hiya

> autorun...Done
> VFS: Cannot open root device 03:01
> Kernel panic: VFS unable to mount root fs on 03:01
> "
> The status of node 2 is error.

The system/kernel you are booting is looking for / on /dev/hda1 but can't
find it -- you probably copied kernels from different machines onto this
one.

At the boot prompt, try:

    lilo: vmlinuz root=/dev/hda3

if your / is located on /dev/hda3. Once it comes up, fix /etc/lilo.conf,
re-run lilo, and then reboot (a minimal lilo.conf sketch follows the quoted
message below).

c ya
alvin

On Tue, 30 Apr 2002, Ao Jiang wrote:

> Hi,
> I have some questions about installing Scyld Beowulf (version 2.0 preview).
> I am hoping someone can give me some direction. Thanks a lot!
>
> After booting up the Beowulf system, the slave node shows:
> "Boot: System boot phase 1 in progress...
> ...
> Sending RARP request..."
> But this request seems to be sent forever, and the status of the node is
> always 'down' in Beosetup on the master node.
>
> I checked the website and found it may be caused by beoserv not seeing the
> RARP requests. Unfortunately I couldn't find an effective way to solve it.
>
> Would you mind giving me some suggestions on how to fix the beoserv
> problem, or how to send messages to the slave nodes manually?
>
> The interesting thing is: when I reboot the master node, the slave nodes
> seem to receive messages from the master node, reboot too, and enter phase
> 2 or 3. But some slave nodes then show errors. I don't know what they mean
> or what the reasons are. If it is a hardware problem, which device is it?
>
> The following is the screen dump:
>
> Slave node 1:
> "
> EXT2-fs error (device ramdisk(1,3))
> Ext2_add_entry: bad entry in directory #8193; directory entry across
> blocks-offset=13680, inode=9096, rec_len=2064, name_len=5.
> "
> The status of node 1 is error.
>
> Slave node 2:
> "
> Boot: System boot phase 2 in progress...
> "
> autorun...Done
> VFS: Cannot open root device 03:01
> Kernel panic: VFS unable to mount root fs on 03:01
> "
> The status of node 2 is error.
>
> Tom
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
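The lilo.conf sketch alvin refers to, for fixing the "VFS unable to mount
root fs" panic permanently: a minimal example only. The kernel image name
and the /dev/hda3 root partition are taken from his guess above, not from
the actual node -- use whatever `df /` and the contents of /boot really
show.

  # /etc/lilo.conf -- minimal sketch; adjust device and image names.
  boot=/dev/hda          # where the boot loader is installed
  prompt
  timeout=50
  default=linux

  image=/boot/vmlinuz    # the kernel actually present on this node
          label=linux
          root=/dev/hda3 # the partition that really holds /
          read-only

  # After editing, reinstall the boot loader and reboot:
  #   /sbin/lilo
  #   reboot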