From rajkumar at csse.monash.edu.au Sat Sep 1 04:03:05 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Wed Nov 25 01:01:40 2009 Subject: NSF/TFCC Teaching Cluster Computing Workshop Material Online! Message-ID: <3B90C069.D686BAFE@csse.monash.edu.au> Dear All, FYI: ---------------------------------------------------------------------- Professor Barry Wilkinson (TFCC Education Coordinator) has successfully concluded a 3-day intensive workshop, funded by the "National Science Foundation" and sponsored by the IEEE Task Force on Cluster Computing, which provided educators with materials and formal instruction to enable them to teach cluster computing at the undergraduate and graduate level. Participants received formal lectures and guided hands-on experience using a dedicated cluster of computers. The workshop lasted three days and took place in the Department of Computer Science at the University of North Carolina at Charlotte. There were no fees for this workshop. Accommodation and meals were provided at the Hilton Hotel, University Place, Charlotte, at no charge to the participants. The full course presentation material is now available online from: http://www.cs.uncc.edu/%7eabw/CCworkshop2001/ Along with the presentation material, participants have received free copies of the following books (courtesy: NSF and Prentice Hall): ---------------- * Parallel Programming Techniques and Applications Using Networked Workstations and Parallel Computers, Barry Wilkinson and Michael Allen, Prentice Hall, 1999. http://vig.prenhall.com/catalog/academic/product?ISBN=0136717101 * High Performance Cluster Computing, Rajkumar Buyya (ed.), Prentice Hall, 1999. Vol. 1: Architectures and Systems: http://www.phptr.com/ptrbooks/ptr_0130137847.html Vol. 2: Programming and Applications: http://www.phptr.com/ptrbooks/ptr_0130137855.html ---------------------------------------------------------------------- For other information on IEEE TFCC activities, browse: http://www.ieeetfcc.org/ For TFCC Membership Info see: http://www.ieeetfcc.org/membership.html Please note that TFCC membership is FREE and is open to all, irrespective of whether you are an IEEE/Computer Society member or not. Best regards, Raj ------------------------------------------------------------------------ Rajkumar Buyya School of Computer Science and Software Engineering Monash University, C5.41, Caulfield Campus Melbourne, VIC 3145, Australia Phone: +61-3-9903 1969 (office); +61-3-9571 3629 (home) Fax: +61-3-9903 2863; eFax: +1-801-720-9272 Email: rajkumar@buyya.com | rajkumar@csse.monash.edu.au URL: http://www.buyya.com | http://www.csse.monash.edu.au/~rajkumar ------------------------------------------------------------------------ From opengeometry at yahoo.ca Sat Sep 1 10:55:52 2001 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:01:40 2009 Subject: Anyone using PC-Chips motherboard? Message-ID: <20010901175552.77163.qmail@web13705.mail.yahoo.com> I would be grateful for any feedback (good or bad) on PC-Chips motherboards. I'm looking at using them for cheap diskless nodes. --William _______________________________________________________ Do You Yahoo!? 
Get your free @yahoo.ca address at http://mail.yahoo.ca From dwight at supercomputer.org Sat Sep 1 14:23:28 2001 From: dwight at supercomputer.org (dwight@supercomputer.org) Date: Wed Nov 25 01:01:40 2009 Subject: Searchable Archives for this list Message-ID: <200109012123.OAA22788@localhost.localdomain> The search engine here has been down for a while, due to a combination of a security issue, and my not having the time to fix it properly. I've got quite a few things to do right now. But I'll take the time next weekend and look at bringing it back up. -dwight- > Hi, > > Is there any working searchable archive for this list? > Both links at beowulf.org do not work. The scyld link points to an > unavailable page and the link to supercomputer.org allows you to enter your > query, but only gives you invalid responses. > > Does anybody know other places to search the list or can fix the ones > supplied? > This would be greatly appreciated and might help reducing the number of > duplicate questions. > > Bye, Thommy > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From svel_n at yahoo.com Sun Sep 2 21:20:12 2001 From: svel_n at yahoo.com (Sakthivel Narayanan) Date: Wed Nov 25 01:01:40 2009 Subject: Help needed regarding Ethernet Switch In-Reply-To: <200109021600.MAA29188@blueraja.scyld.com> Message-ID: <20010903042012.10767.qmail@web20604.mail.yahoo.com> Dear Lists, We are having a 20 node linux cluster with 24 port 10/100 Mbps Intel EtherExpress 510T (manageable & stackable) switch. We would like to add some more machines with the existing cluster. I want to buy one more Ethernet Switch and stack with the existing Intel 510T switch. I need your suggestion for the following. 1. Can i go for one more Intel 510T E-switch with stackable interface module and stack with the existing switch. 2. I have doubt, is it possible to stack two different brands of switches.(Like Cisco & intel) 3. What are the pros and cons of stacking with the same brand or different brands. 4. What are the important technical specs, i should look before buying and stacking a Ethernet switch with an existing one. Please through some light on this, from your experiences. Thanking you in Advance N. Sakthivel Institute for Plasma Research Bhat, Gandhinagar - 382 428 India. __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger http://im.yahoo.com From jakob at unthought.net Sun Sep 2 22:50:17 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:40 2009 Subject: Help needed regarding Ethernet Switch In-Reply-To: <20010903042012.10767.qmail@web20604.mail.yahoo.com>; from svel_n@yahoo.com on Sun, Sep 02, 2001 at 09:20:12PM -0700 References: <200109021600.MAA29188@blueraja.scyld.com> <20010903042012.10767.qmail@web20604.mail.yahoo.com> Message-ID: <20010903075017.A12364@unthought.net> On Sun, Sep 02, 2001 at 09:20:12PM -0700, Sakthivel Narayanan wrote: > Dear Lists, > > We are having a 20 node linux cluster with > 24 port 10/100 Mbps Intel EtherExpress > 510T (manageable & stackable) switch. We would like > to add some more machines with the existing cluster. > > I want to buy one more Ethernet Switch and stack with > the existing Intel 510T switch. > > I need your suggestion for the following. > > 1. 
Can i go for one more Intel 510T E-switch with > stackable interface module and stack with the > existing switch. We have that here: The stacking module is advertised as 1Gbit, but when you receive it the docs will tell you it's actually 400 Mbit full duplex. However, that may be enough (?) Our stacking module had a PCB that was too short, meaning it wouldn't work when inserted. Opening the switch and applying sufficient amounts of physical violence fixed the situation. > > 2. I have doubt, is it possible to stack two different > brands of switches.(Like Cisco & intel) Not with the Intel stacking module - but you could do this with gigabit ethernet ports (available for the 510T and many other switches). > > 3. What are the pros and cons of stacking with the > same brand or different brands. > > 4. What are the important technical specs, i should > look before buying and stacking a Ethernet switch > with an existing one. > > > Please through some light on this, from your > experiences. Aside from the issues mentioned, the stacking module has worked perfectly well ever since (400 Mbit instead of 1G, and nobody dares to touch the switches out of fear of shaking the module out of its socket again). The PCB issue may well have been fixed since; I don't know what speed they market the stacking module at these days. We use the switches on a regular LAN (not for a cluster), so the 400 Mbit is acceptable. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From atctam at csis.hku.hk Sun Sep 2 23:18:36 2001 From: atctam at csis.hku.hk (Anthony Tam) Date: Wed Nov 25 01:01:40 2009 Subject: Help needed regarding Ethernet Switch In-Reply-To: <20010903075017.A12364@unthought.net> References: <20010903042012.10767.qmail@web20604.mail.yahoo.com> <200109021600.MAA29188@blueraja.scyld.com> <20010903042012.10767.qmail@web20604.mail.yahoo.com> Message-ID: <4.3.2.7.0.20010903140536.00b4b398@study.csis.hku.hk> At 07:50 AM 9/3/2001 +0200, Jakob Østergaard wrote: >On Sun, Sep 02, 2001 at 09:20:12PM -0700, Sakthivel Narayanan wrote: > > Dear Lists, > > > > We are having a 20 node linux cluster with > > 24 port 10/100 Mbps Intel EtherExpress > > 510T (manageable & stackable) switch. We would like > > to add some more machines with the existing cluster. > > > > I want to buy one more Ethernet Switch and stack with > > the existing Intel 510T switch. > > > > I need your suggestion for the following. > > > > 1. Can i go for one more Intel 510T E-switch with > > stackable interface module and stack with the > > existing switch. > >We have that here: > >The stacking module is advertised as 1Gbit, but when you receive it >the docs will tell you it's actually 400 Mbit full duplex. However, >that may be enough (?) This likely comes from the 800 Mbps aggregated bandwidth limitation of the 510T switch. Don't trust the advertised info: they claim this switch has a 2.2 Gb backplane; however, somewhere in their datasheet the 800 Mbps limitation is mentioned. BTW, if you want to have a medium-sized cluster with good network performance, I would suggest using the Cisco Catalyst 2948G or 2980G, which have 48 or 80 FE ports respectively. 
I have tested the 2980G; its performance looks promising. However, I have only tested it with 32 ports, which is the largest size of our cluster :( Cheers Anthony From kjyoun at netstech.com Mon Sep 3 05:22:41 2001 From: kjyoun at netstech.com (=?ks_c_5601-1987?B?v6yx1MGk?=) Date: Wed Nov 25 01:01:40 2009 Subject: HPL residual check failure Message-ID: Hi, when I was doing HPL benchmark tests using a big matrix (bigger than 20,000) with many Linux servers (more than 20), I sometimes got a residual check error, as attached. When I got the residual check error, I turned off my Linux servers for several hours and then tried again, and usually it worked - I don't know the reason. Heat is suspicious, but is it really a heat problem? Is there anybody who has experienced a similar problem or knows the reason? Please help me. Thanks in advance! Keaton HPL result files ------------------------------------------------------------
============================================================================
T/V                N    NB     P     Q          Time            Gflops
----------------------------------------------------------------------------
W11R2C4        21000   200     6     6        702.80         8.786e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =    0.0272768 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =    0.0140749 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =    0.0026585 ...... PASSED
============================================================================
T/V                N    NB     P     Q          Time            Gflops
----------------------------------------------------------------------------
W11R2C4        23000   200     6     6        866.35         9.364e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 3255.3898794 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 7833.1904572 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1364.3123654 ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . =    0.000049
||A||_oo . . . . . . . . . . . . . . . . . . . = 5827.145943
||A||_1  . . . . . . . . . . . . . . . . . . . = 5836.795619
||x||_oo . . . . . . . . . . . . . . . . . . . =    2.390054
From piramid_ps at yahoo.com Mon Sep 3 09:02:59 2001 From: piramid_ps at yahoo.com (pramod ps) Date: Wed Nov 25 01:01:40 2009 Subject: client working.( urgent , pls help me). In-Reply-To: Message-ID: <20010903160259.24236.qmail@web14104.mail.yahoo.com> I am doing a project on "BEOWULF CLUSTERS". The clients are remote-booted using floppy disks. All arrangements were made according to the instructions given on the CD-ROM. I partitioned the client hard disks remotely using the instructions (beofdisk -d & beofdisk -w), then restarted the machines after storing the partition tables on the clients. But when a program is given to execute in this environment, the client shows the message "vmadump: mmap failed" and the server shows something like "connection reset by peer", etc. The clients are booted and can be made ready to execute (up & available are true). Does this have anything to do with the fstab? Please help me to solve this problem as early as possible. Thank you, pramod. __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! 
Messenger http://im.yahoo.com From ron_chen_123 at yahoo.com Mon Sep 3 09:56:34 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:40 2009 Subject: Fwd: Commerical Sun Grid Engine Support Available Message-ID: <20010903165634.40949.qmail@web14704.mail.yahoo.com> Many companies cannot use software without support, that limits the reach of many great open source products, like SGE, Linux, FreeBSD, and Apache to name a few. I just discovered that commerical Sun Grid Engine (SGE) support will be available in Q1 2002 from Blackstone. Does anyone know where I can get commerical support for Beowulf? -Ron Chen Open Source Consultant --- "Elliott N. Berger" wrote: > Reply-to: users@gridengine.sunsource.net > Date: Thu, 30 Aug 2001 16:50:06 -0400 > > (http://www.BlackstoneComputing.com) > To: users@gridengine.sunsource.net > You asked: > "The question is, is Blackstone offering support for > SGE for OSes other than Solaris?" > > The answer is yes! > > We have entered into an arrangement with Sun whereby > Sun Grid Engine users can purchase a first line > support agreement > directly from Blackstone to access our expertise for > installation > and configuration support. Initially, this service > is available for > Solaris and Linux. Support for IBM, and Compaq > platforms is > planned for 1Q2002. Additional platforms may be > added in the > future based on customer demand. In addition, all of > the SGE > platform binaries are available for download from > Blackstone's > web site. > > Please visit our web site at BlackstoneComputing.com > or > send email to "Sales@BlackstoneComputing.com". > > Best regards, > Ron Ranauro __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger http://im.yahoo.com From ron_chen_123 at yahoo.com Mon Sep 3 10:27:09 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:40 2009 Subject: Fwd: SGE support from Cards too!! Message-ID: <20010903172709.4355.qmail@web14701.mail.yahoo.com> I just found that another company is offering support for SGE too! It's great that Open Source Software is getting more and more support! Users can choose where they want to get support from :-) -Ron --- Martin Kloock wrote: > Reply-to: users@gridengine.sunsource.net > Date: Fri, 31 Aug 2001 15:26:44 +0200 (MEST) > From: Martin Kloock > To: users@gridengine.sunsource.net > Subject: [GE users] SGE support (was: output to > screen in interactive mode) > > Hi all, > > I would like to add that cards Engineering based in > Cologne/Germany is > also doing SGE and SGEEE support mainly in Germany, > but with the will to > expand this to whole Europe and USA. We are working > mainly in the > automotive, finance and banking area, but we also > are aquiring SGE > customers in other areas like the chemical industry. > We are supporting SGE on any platform SGE is > available in partnership to > SUN Microsystems. In these months I am developing an > SGE environment on > HP-UX. > > So I think with Blackstone and cards there are > allready two powerfull > companies supporting SGE so any company that needs > technical SGE support > and someone to count on can contact us. > > You can get more information under > http://www.cardse.com and/or > gridengine@cardse.com. > > > > Regards, > Martin > > > > The answer is yes! 
> > > > We have entered into an arrangement with Sun > whereby > > Sun Grid Engine users can purchase a first line > support agreement > > directly from Blackstone to access our expertise > for installation > > and configuration support. Initially, this service > is available for > > Solaris and Linux. Support for IBM, and Compaq > platforms is > > planned for 1Q2002. Additional platforms may be > added in the > > future based on customer demand. In addition, all > of the SGE > > platform binaries are available for download from > Blackstone's > > web site. > > > > Please visit our web site at > BlackstoneComputing.com or > > send email to "Sales@BlackstoneComputing.com". > > > > Best regards, > > Ron Ranauro __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger http://im.yahoo.com From joelja at darkwing.uoregon.edu Mon Sep 3 10:52:13 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:40 2009 Subject: Fwd: Commerical Sun Grid Engine Support Available In-Reply-To: <20010903165634.40949.qmail@web14704.mail.yahoo.com> Message-ID: On Mon, 3 Sep 2001, Ron Chen wrote: > Many companies cannot use software without support, > that limits the reach of many great open source > products, like SGE, Linux, FreeBSD, and Apache to name > a few. > > I just discovered that commerical Sun Grid Engine > (SGE) support will be available in Q1 2002 from > Blackstone. > > Does anyone know where I can get commerical support > for Beowulf? IBM is more than happy to sell you pc cluster, with a support contract for both the hardware and software... > -Ron Chen > Open Source Consultant > > --- "Elliott N. Berger" wrote: > > Reply-to: users@gridengine.sunsource.net > > Date: Thu, 30 Aug 2001 16:50:06 -0400 > > > > (http://www.BlackstoneComputing.com) > > To: users@gridengine.sunsource.net > > > You asked: > > "The question is, is Blackstone offering support for > > SGE for OSes other than Solaris?" > > > > The answer is yes! > > > > We have entered into an arrangement with Sun whereby > > Sun Grid Engine users can purchase a first line > > support agreement > > directly from Blackstone to access our expertise for > > installation > > and configuration support. Initially, this service > > is available for > > Solaris and Linux. Support for IBM, and Compaq > > platforms is > > planned for 1Q2002. Additional platforms may be > > added in the > > future based on customer demand. In addition, all of > > the SGE > > platform binaries are available for download from > > Blackstone's > > web site. > > > > Please visit our web site at BlackstoneComputing.com > > or > > send email to "Sales@BlackstoneComputing.com". > > > > Best regards, > > Ron Ranauro > > > > > > > > > __________________________________________________ > Do You Yahoo!? > Get email alerts & NEW webcam video instant messaging with Yahoo! 
Messenger > http://im.yahoo.com > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From zouguangxian at hotmail.com Tue Sep 4 00:54:25 2001 From: zouguangxian at hotmail.com (=?gb2312?B?194gueLPyA==?=) Date: Wed Nov 25 01:01:40 2009 Subject: Linpack Result! Message-ID: Hi :) I got a Linpack result with a PIII 933 and 256M of memory: it can reach 0.518 Gflops, but I have doubts about that performance (N=4500, which should be enough). Can somebody give me a reference? Thanks. weck _________________________________________________________________ You can send and receive email for free at the MSN Hotmail site: http://www.hotmail.com/cn From indraneel at indialine.org Tue Sep 4 08:20:26 2001 From: indraneel at indialine.org (Indraneel Majumdar) Date: Wed Nov 25 01:01:40 2009 Subject: PC configuration for cluster Message-ID: <20010904102025.A4111@indialine.org> Hi, My department is planning to build a cluster and I have some questions regarding some of the components. If anyone is using these I'll be grateful to hear of their experiences. 1. Does the Asus A7M266 boot without a graphics card? 2. Does the Asus A7M266 BIOS support network boot? 3. Does the Tyan S2462UNG (dual AMD motherboard) require paired processors (e.g. like in SGI and Sun machines)? 4. Can the Tyan S2462UNG boot off a SCSI disk (the motherboard has an onboard SCSI controller)? 5. How is the performance of the Netgear GA622T copper gigabit network card in relation to the fiber optic card (GA620)? (This is my first plunge into gigabit networking.) 6. Is it possible to boot a machine off a DVD drive with a bootable CD? TIA, Indraneel -- http://www.indialine.org/indraneel/ From Dean.Carpenter at pharma.com Tue Sep 4 10:00:53 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:40 2009 Subject: meshing together multiple nodes without a switch Message-ID: <759FC8B57540D311B14E00902727A0C002EC4A8B@a1mbx01.pharma.com> Are you thinking of the Flat Neighborhood Network idea ? http://aggregate.org/FNN/ -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) -----Original Message----- From: Velocet [mailto:math@velocet.ca] Sent: Wednesday, August 29, 2001 5:48 PM To: beowulf@beowulf.org Subject: meshing together multiple nodes without a switch Someone posted a URL for the solution to put N nodes no more than M hops away from each other. I've been scouring the list archives, but I cannot for the life of me find it. If it could be reposted that would be much appreciated. Thx. From Dean.Carpenter at pharma.com Tue Sep 4 12:26:40 2001 From: Dean.Carpenter at pharma.com (Carpenter, Dean) Date: Wed Nov 25 01:01:40 2009 Subject: Reading list Message-ID: <759FC8B57540D311B14E00902727A0C002EC4A8D@a1mbx01.pharma.com> Hey All - Can anyone provide a good recommended reading list for cluster/high performance/high avail/beowulf computing ? 
Which books have you read and found to be good, and more importantly, which were bad and to be avoided ? -- Dean Carpenter Principal Architect Purdue Pharma dean.carpenter@pharma.com deano@areyes.com 94TT :) From eric at fnordsystems.com Tue Sep 4 17:27:36 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:40 2009 Subject: PC configuration for cluster In-Reply-To: <20010904102025.A4111@indialine.org> Message-ID: 1) No, but you could install something like the PC Weasel or AMI MegaRAC, which emulates a video card and provides serial console. However these are much more expensive than a cheap video card, if you're trying to keep costs down... 4mb PCI cards with the SiS 6326 chipset can be had for around $20 each, and work nicely in every OS. 2) Yes, with the proper NIC. I'd suggest Intel Pro/100+ 82559 chipsets for inexpensive 100base-T, they cost about $25 each. Intel model PILA8460. 3) Not required, but it's adviseable to get the same stepping of AthlonMP. Currently 1.0 and 1.2GHz are available, within the next several weeks the 1.3 and 1.4GHz will be released. 4) Yes. 5) Roughly the same, in my experience, and going copper will save greatly on cabling costs too. The switch will still cost you an arm and a leg. 6) 99% of DVD-ROMs made in the last few years function as regular ATAPI CD-ROM devices, and should boot from a burned CD-R or CD-RW just fine. A final note, you may want to consider Tyan's S2460 TigerMP board, which is roughly half the cost of the Thunder MP (S2462). It has no SCSI, NIC, or video onboard, and the DIMM slots aren't angled for 1U... but it will fit nicely in most mid-tower ATX cases and works with regular 300W power supplies. No need for the special NMB or Delta power supply. Here's a review: http://accelenation.com/?doc=56 Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 fax: +1-360-647-0752 -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Indraneel Majumdar Sent: Tuesday, September 04, 2001 8:20 AM To: Beowulf Subject: PC configuration for cluster Hi, My department is planning to build a cluster and I've some questions regarding some of the components. If anyone is using these I'll be grateful to hear of their experiences. 1. Does the Asus A7M266 boot without a graphics card? 2. Does the Asus A7M266 bios support network boot? 3. Does the Tyan S2462UNG (dual amd motherboard) require paired processors (eg like in SGI and Sun machines)? 4. Can the Tyan S2462UNG boot off a SCSI disk (the motherboard has an onboard SCSI controller)? 5. How is the performance of the Netgear GA622T copper gigabit network card in relation to the fiber optic card (GA620) (This is my first plunge into gigabit networking)? 6. Is it possible to boot a machine off a DVD Drive with a bootable CD? TIA, Indraneel -- http://www.indialine.org/indraneel/ _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From al303917 at vigia.ens.uabc.mx Mon Sep 3 12:40:55 2001 From: al303917 at vigia.ens.uabc.mx (< Dragon >) Date: Wed Nov 25 01:01:40 2009 Subject: channel bonding rtl8139 works?? Message-ID: Hi, does any body know if the RTL-8139 kernel module for the KNE120TX works with channel bonding?? 
I channel bonded 3 KNE120tx cards on 2 computers (3 cards per system) and the communication between them is very slow. I don't know if the problem is in the rtl8139 kernel module (Red Hat 6.2). Does anybody know which kernel module I can use for the KNE120TX that will support channel bonding?? I have a cluster bonded with KNE100TX cards using the Tulip kernel module and they work fine. Any help is appreciated. Thanks. Raul A. Gonzalez Olimon al303917@vigia.ens.uabc.mx Universidad Autonoma de Baja California Mexico. From jgreen at thunderlizards.net Tue Sep 4 14:06:31 2001 From: jgreen at thunderlizards.net (Joe Greenseid) Date: Wed Nov 25 01:01:40 2009 Subject: Reading list In-Reply-To: <759FC8B57540D311B14E00902727A0C002EC4A8D@a1mbx01.pharma.com> Message-ID: On Tue, 4 Sep 2001, Carpenter, Dean wrote: > Hey All - > > Can anyone provide a good recommended reading list for cluster/high > performance/high avail/beowulf computing ? Which books have you read and > found to be good, and more importantly, which were bad and to be avoided ? The great thing about this type of question is that it can easily be the start of a nice little flame war. :) But, for a partial answer to your question, I have only read one book on the subject of Linux clustering -- "Building Linux Clusters," which was published by O'Reilly. This book was very much not good. It was also pulled off the production line by O'Reilly only a few months after it was released, so they recognized that it needed some help. I have found, however, that books are sometimes not a very helpful source of information in this particular field. The technology is evolving so rapidly that in many cases, if a book is about specific technologies, it can be out of date by the time it is released. There are a few websites out there that are actually helpful in providing information and links about various aspects of Linux clustering. A few of these you may wish to check out are: http://lcic.org http://linux-ha.org http://www.csse.monash.edu.au/~rajkumar/cluster/index.html I know that the first and third of these links have pages or sections about books related to Linux and Linux clustering. Good Luck, --Joe ************************************* * Joe Greenseid * * e-mail: jgreen@thunderlizards.net * * http://www.thunderlizards.net * * http://lcic.org * ************************************* From venk11 at yahoo.com Tue Sep 4 00:23:14 2001 From: venk11 at yahoo.com (k. vengadesan) Date: Wed Nov 25 01:01:40 2009 Subject: help Message-ID: <20010904072314.63315.qmail@web14206.mail.yahoo.com> Dear Sir, We came to know of you through the web. We are currently working on structure prediction. We wrote our own Fortran program to do this computation, and it takes a long CPU time to complete on a Pentium III 650 MHz processor running the Linux operating system with a 10 GB hard disk. Each time we have to generate 1000 structures, and this can be parallelized easily, so we bought another Pentium III machine (exactly the same specification as the first PC). We then clustered the two PCs using PVM as well as MPI over an ordinary external network, and parallelized our program by sending 500 structure calculations to the first PC and another 500 to the second PC. The computation time was then reduced to approximately half, and we are happy about this. Here the input data is stored on both PCs, and we distribute only the structure numbers (e.g. 501 to 1000) when starting the computation and again only at the end of the computation, so very little communication is used during the computation. 
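As an illustration of the static split described above: the poster's actual program is in Fortran and is not shown here, but a minimal, hypothetical MPI sketch in C of dividing the 1000 independent structure calculations across however many processes are available might look like the following (compute_structure() is a made-up placeholder for the real per-structure work):

#include <mpi.h>
#include <stdio.h>

#define NSTRUCT 1000            /* total number of structures to generate */

/* Hypothetical placeholder for the real structure-prediction kernel. */
static void compute_structure(int index)
{
    /* ... generate structure 'index' and write the result to local disk ... */
    (void)index;
}

int main(int argc, char **argv)
{
    int rank, size, first, last, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block partition: with 2 ranks this reproduces the 1-500 / 501-1000 split. */
    first = rank * NSTRUCT / size + 1;
    last  = (rank + 1) * NSTRUCT / size;

    for (i = first; i <= last; i++)
        compute_structure(i);

    /* Only a tiny amount of communication: wait for everyone at the end. */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("all %d structures done on %d processes\n", NSTRUCT, size);

    MPI_Finalize();
    return 0;
}

With two processes this matches the 500/500 split described above; adding more nodes changes nothing in the code, which is what makes this kind of embarrassingly parallel job so attractive on a Beowulf.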
Only the results needed for further computation are communicated. We have to do a lot of research in this same area, so we have planned to build a large Beowulf computer (maybe with more than 16 processors). However, we have only a little experience, as stated above, and we don't know exactly what hardware and software to buy to build such a computer. Unfortunately, there is nobody around our place to contact about this. So, could you please guide us on what we have to buy to build such a Beowulf computer, one which is applicable to our work (as stated above)? We have funds for this of around $63,000 from our Government. I think this information is enough for you to guide us; if you have any doubt regarding the above, please feel free to contact us. So, could you please give us a complete list of the hardware and software needed to build a Beowulf computer (maybe with costs)? If you can't give this, please point us to a person whom we can contact regarding this. Non-commercial help or a prebuilt Beowulf is most preferred. Waiting for your reply. Yours sincerely, K. Vengadesan ===== K. Vengadesan Research Fellow, Department of Crystallography and Biophysics University of Madras Guindy Campus, Chennai-600 025, INDIA e-mail:venk11@yahoo.com phone: 91 44 2351367 __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger http://im.yahoo.com From petitet at cs.utk.edu Mon Sep 3 09:06:44 2001 From: petitet at cs.utk.edu (Antoine Petitet) Date: Wed Nov 25 01:01:40 2009 Subject: HPL residual check failure Message-ID: Hi, > ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 3255.3898794 ...... FAILED means that the last 4 digits of the vector solution are incorrect (3255.389... = O(10^3)). This also means that the first 12 digits are correct ... This residual number should be O(1), and because it is found to be more than the threshold value given in HPL.dat (16.0 is the default), the test is flagged as failed. Those residuals are computed because, if you were to report the performance achieved by your system to, say, the Top 500 list, you would be asked for those residuals. Such a failure may happen for one of the two following reasons: 1) The matrix random generator may produce a poorly conditioned matrix, such that a more accurate result cannot be produced with the algorithm used in HPL. One way to check for this would be to estimate the condition number of the randomly generated matrix. HPL does not do it, because such an operation is time-consuming, and also because a large number of those randomly generated matrices have been shown to be sufficiently well-conditioned. In short: on one hand, I cannot prove that this generator produces well-conditioned matrices, and on the other hand, not a single case of failure due to this generator has been reported so far. 2) For some reason, a bit or a byte is being corrupted during the computations / communications. Such a problem may be caused by the hardware or the software. Ex: a memory bank corrupts data, a network transmission fails, or a computation gets the wrong result, say because of a data alignment issue. Software problems are relatively easy to track down: multiple implementations of MPI or the BLAS are available. The problem could be in HPL as well, but with the source available, one can potentially investigate. Hardware failures are more problematic. They are often not repeatable, and they rarely occur during a short run. They are also rare. 
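For readers wondering what those numbers actually are: the three scaled residuals can be recomputed outside HPL from A, x and b. The sketch below is purely illustrative (it is not HPL's source code, and it assumes a dense row-major matrix held in memory); it just applies the formulas shown in the output above, with eps taken here as DBL_EPSILON. HPL determines its own machine epsilon (roughly half of DBL_EPSILON), so values computed this way can differ from HPL's printed ones by about a factor of two.

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Scaled residuals as printed by HPL, for a computed solution x of A x = b.
 * A is n x n, stored row-major in a[i*n + j].  All three ratios should be
 * O(1); HPL flags the run as FAILED when one exceeds the HPL.dat threshold. */
static void hpl_residuals(int n, const double *a, const double *x, const double *b)
{
    double r_oo = 0.0, a_1 = 0.0, a_oo = 0.0, x_1 = 0.0, x_oo = 0.0;
    double eps = DBL_EPSILON;                   /* see caveat above        */
    int i, j;

    for (i = 0; i < n; i++) {
        double ri = b[i], rowsum = 0.0;
        for (j = 0; j < n; j++) {
            ri -= a[i*n + j] * x[j];            /* r = b - A x             */
            rowsum += fabs(a[i*n + j]);         /* row sums for ||A||_oo   */
        }
        if (fabs(ri) > r_oo) r_oo = fabs(ri);   /* ||Ax-b||_oo             */
        if (rowsum > a_oo)   a_oo = rowsum;
        x_1 += fabs(x[i]);                      /* ||x||_1                 */
        if (fabs(x[i]) > x_oo) x_oo = fabs(x[i]);
    }
    for (j = 0; j < n; j++) {                   /* ||A||_1 = max column sum */
        double colsum = 0.0;
        for (i = 0; i < n; i++) colsum += fabs(a[i*n + j]);
        if (colsum > a_1) a_1 = colsum;
    }

    printf("||Ax-b||_oo / ( eps * ||A||_1  * N        ) = %f\n", r_oo / (eps * a_1 * n));
    printf("||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = %f\n", r_oo / (eps * a_1 * x_1));
    printf("||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = %f\n", r_oo / (eps * a_oo * x_oo));
}

int main(void)
{
    /* Tiny 2x2 example: A = [[2,1],[1,3]], exact x = [1,1], so b = [3,4]. */
    double a[4] = { 2.0, 1.0, 1.0, 3.0 };
    double x[2] = { 1.0, 1.0 };
    double b[2] = { 3.0, 4.0 };
    hpl_residuals(2, a, x, b);
    return 0;
}

Plugging in the numbers printed for the failed run above (||Ax-b||_oo ~ 4.9e-5, ||A||_1 ~ 5836.8, N = 23000, eps ~ 1.1e-16) indeed lands in the low thousands, which is where the 3255.39 figure comes from.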
Cheers, Antoine From agrajag at scyld.com Tue Sep 4 17:24:20 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:40 2009 Subject: distributed file systems In-Reply-To: <3B8FBAD5.834ABF32@wiglaf.se>; from Jon.Tegner@wiglaf.se on Fri, Aug 31, 2001 at 06:27:01PM +0200 References: <3B8FBAD5.834ABF32@wiglaf.se> Message-ID: <20010904202420.A13931@blueraja.scyld.com> On Fri, 31 Aug 2001, Jon Tegner wrote: > Haven't tried, but it seems that afs should be slightly faster than > nfs, see > > http://www.ait.iastate.edu/olc/storage/afs/nfs2afs.txt.html As someone who has experience with afs as a user and an administrator, I can tell you it is definitely not what you want for a cluster. AFS is very nice if you have a very large deployment of machines (like a campus with over 50,000 users); however, for a self-contained cluster it has way too much overhead and adds unnecessary complications to administration. As far as speed goes, I think nfs is actually faster than afs. On top of this, there are also stability issues: there are serious problems with every implementation of AFS I know of for Linux, and this includes Transarc's closed source implementation, the OpenAFS project which is based on Transarc's code, and the code for arla, an implementation of AFS from scratch. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010904/eb01391f/attachment.bin From rgb at phy.duke.edu Wed Sep 5 03:38:38 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:40 2009 Subject: channel bonding rtl8139 works?? In-Reply-To: Message-ID: On Mon, 3 Sep 2001, << Dragon >> wrote: > > Hi, does any body know if the RTL-8139 kernel module for the KNE120TX > works with channel bonding?? > > I channel bonded 3 KNE120tx cards on 2 computers (3 cards per system) and > the comunication between them is very slow, I don't know if the problem is > in the rtl8139 kernel module (Red Hat 6.2) I don't know about channel bonding per se, but the RTL8139 NIC is Dark Evil. In my opinion, of course: YMMV, caveat emptor, I may be crazy (some would say there is little doubt:-), standard disclaimers etc. Still, my own experience: It is one of the few cards with which I can still consistently crash a linux box currently in my possession when I whack it with a very heavy packet stream. I should note in saying this that this is with the current RH 7.1 kernel with its "stolen" 8139too driver, not Don's, although many kernel revisions ago I managed to crash boxes with two RTL8139's with Don's driver as well. You should connect to the scyld website and visit the rtl8139 page and read the notes there. I'd have to say that even if your particular implementation of the 8139 is more stable than my own (maybe Kingston did a better job of engineering the NICs than my no-name mfr, or maybe you're using Don's driver already and it actually works stably where the 8139too dies) it is a relatively poor choice for a beowulf NIC. Even when I run just one card at a time (and cannot crash the system) the card seems to choke up under a heavy packet load and actually slow down to a crawl. Crush. Kill. Destroy. Choose another card. They are more expensive, but 3c905's do PXE and WOL and can save you the cost of a floppy or CD-ROM (per node) for the original install. 
They have MUCH better latency and bw numbers in my tests -- better than even midrange cards that DON'T really suck (e.g. PNIC based cards). rgb > > Does any body know which kernel module can I use for the KNE120TX that > will support channel bonding?? > > I have a cluster bonded with KNE100TX cards using the Tulip kernel module > and they work fine. > > Any help is appreciated. > > > Thanks. > Raul A. Gonzalez Olimon > > al303917@vigia.ens.uabc.mx > > Universidad Autonoma de Baja California > > Mexico. > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From eric at fnordsystems.com Wed Sep 5 04:34:38 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:40 2009 Subject: channel bonding rtl8139 works?? In-Reply-To: Message-ID: In my experience, the Realtek 8139 is the worst modern network controller on the planet. It's the cheapest available, and commonly found on $65 generic (ie: PC Chips) celeron motherboards. Sometimes it completely refuses to autodetect speed and duplex of a connection. You really do get what you pay for... I second Robert's comment on the 3C905C-TX NIC, which is an excellent card. They can be had for under $30 if you look around. The Intel 82559 is also good, as evidenced by its use in high end Supermicro Serverworks III-HE chipset motherboards. Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 fax: +1-360-647-0752 -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Robert G. Brown Sent: Wednesday, September 05, 2001 3:39 AM To: << Dragon >> Cc: beowulf@beowulf.org Subject: Re: channel bonding rtl8139 works?? On Mon, 3 Sep 2001, << Dragon >> wrote: > > Hi, does any body know if the RTL-8139 kernel module for the KNE120TX > works with channel bonding?? > > I channel bonded 3 KNE120tx cards on 2 computers (3 cards per system) and > the comunication between them is very slow, I don't know if the problem is > in the rtl8139 kernel module (Red Hat 6.2) I don't know about channel bonding per se, but the RTL8139 NIC is Dark Evil. In my opinion, of course: YMMV, caveat emptor, I may be crazy (some would say there is little doubt:-), standard disclaimers etc. Still, my own experience: It is one of the few cards with which I can still consistently crash a linux box currently in my possesion when I whack it with a very heavy packet stream. I should note in saying this that this is with the current RH 7.1 kernel with its "stolen" 8139too driver, not Don's, although many kernel revisions ago I managed to crash boxes with two RTL8139's with Don's driver as well. You should connect to the scyld website and visit the rtl8139 page and read the notes there. I'd have to say that even if your particular implementation of the 8139 is more stable than my own (maybe Kingston did a better job of engineering the NICs than my no-name mfr, or maybe you're using Don's driver already and it actually works stably where the 8139too dies) it is a relatively poor choice for a beowulf NIC. 
Even when I run just one card at a time (and cannot crash the system) the card seems to choke up under a heavy packet load and actually slow down to a crawl. Crush. Kill. Destroy. Choose another card. They are more expensive, but 3c905's do PXE and WOL and can save you the cost of a floppy or CD-ROM (per node) for the original install. They have MUCH better latency and bw numbers in my tests -- better than even midrange cards that DON'T really suck (e.g. PNIC based cards). rgb > > Does any body know which kernel module can I use for the KNE120TX that > will support channel bonding?? > > I have a cluster bonded with KNE100TX cards using the Tulip kernel module > and they work fine. > > Any help is appreciated. > > > Thanks. > Raul A. Gonzalez Olimon > > al303917@vigia.ens.uabc.mx > > Universidad Autonoma de Baja California > > Mexico. > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rajkumar at csse.monash.edu.au Wed Sep 5 04:52:51 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Wed Nov 25 01:01:40 2009 Subject: Good conferences... Message-ID: <3B961213.7F851DCD@csse.monash.edu.au> Hi Jyrki, Also Check out: IEEE/ACM CCGrid 2002: www.ccgrid.org | http://ccgrid2002.zib.de/ The upcoming meeting is in Germany/Europe. Raj Message: 6 Date: Tue, 28 Aug 2001 13:40:45 +0300 To: beowulf@beowulf.org From: Jyrki Huusko Subject: Good conferences... Good day, Are there any good annual conferences (worth of attending) on distributed computing and cluster computing ... exept IEEE Cluster xxxx and ACM SIGCOMM'xx ? Sincerely Yours, Jyrki "I think there's a world market for about five computers." -Thomas Watson (IBM)- -- Jyrki Huusko, jyrki.huusko@vtt.fi Kaitov?yl? 1 P.O.BOX 1100, FIN-90571 OULU, FINLAND Tel. +358 8 551 2111, Fax +358 8 551 2320 http://www.vtt.fi http://www.willab.fi/telaketju From pdiaz88 at terra.es Wed Sep 5 09:11:12 2001 From: pdiaz88 at terra.es (Pedro =?iso-8859-1?q?D=EDaz=20Jim=E9nez?=) Date: Wed Nov 25 01:01:40 2009 Subject: channel bonding rtl8139 works?? In-Reply-To: References: Message-ID: <01090516111205.06347@duero> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've been running my tests/personal cluster with rtl8139's more than a year without problems. But recently I have been having a lot of troubles with the nics: locks, bad network response, etc.. Let's see, two things have changed: I've upgraded to 2.4.x series of kernels (2.4.9, previously running 2.2.17) and I've been working in different fields: original use was crypto related stuff, very little network activity. Now I also perform some video proccessing, wich is somewhat the opposite: large ammounts of time (days) with almost 100% network activity, a lot of nfs reads&writes, etc... It might be a kernel issue, didn't have time to investigate yet Cheers Pedro On Wednesday 05 September 2001 11:34, Eric Kuhnke wrote: > In my experience, the Realtek 8139 is the worst modern network controller > on the planet. 
It's the cheapest available, and commonly found on $65 > generic (ie: PC Chips) celeron motherboards. Sometimes it completely > refuses to autodetect speed and duplex of a connection. You really do get > what you pay for... > > I second Robert's comment on the 3C905C-TX NIC, which is an excellent > card. They can be had for under $30 if you look around. The Intel 82559 > is also good, as evidenced by its use in high end Supermicro Serverworks > III-HE chipset motherboards. > > > Eric Kuhnke > Lead Engineer / Operations Manager > Fnord Datacenter Systems Inc. > eric@fnordsystems.com > www.fnordsystems.com > voice: +1-360-527-3301 fax: +1-360-647-0752 > > -----Original Message----- > From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On > Behalf Of Robert G. Brown > Sent: Wednesday, September 05, 2001 3:39 AM > To: << Dragon >> > Cc: beowulf@beowulf.org > Subject: Re: channel bonding rtl8139 works?? > > On Mon, 3 Sep 2001, << Dragon >> wrote: > > Hi, does any body know if the RTL-8139 kernel module for the KNE120TX > > works with channel bonding?? > > > > I channel bonded 3 KNE120tx cards on 2 computers (3 cards per system) > > and > > > the comunication between them is very slow, I don't know if the problem > > is > > > in the rtl8139 kernel module (Red Hat 6.2) > > I don't know about channel bonding per se, but the RTL8139 NIC is Dark > Evil. In my opinion, of course: YMMV, caveat emptor, I may be crazy > (some would say there is little doubt:-), standard disclaimers etc. > > Still, my own experience: It is one of the few cards with which I can > still consistently crash a linux box currently in my possesion when I > whack it with a very heavy packet stream. I should note in saying this > that this is with the current RH 7.1 kernel with its "stolen" 8139too > driver, not Don's, although many kernel revisions ago I managed to crash > boxes with two RTL8139's with Don's driver as well. > > You should connect to the scyld website and visit the rtl8139 page and > read the notes there. I'd have to say that even if your particular > implementation of the 8139 is more stable than my own (maybe Kingston > did a better job of engineering the NICs than my no-name mfr, or maybe > you're using Don's driver already and it actually works stably where the > 8139too dies) it is a relatively poor choice for a beowulf NIC. Even > when I run just one card at a time (and cannot crash the system) the > card seems to choke up under a heavy packet load and actually slow down > to a crawl. > > Crush. Kill. Destroy. Choose another card. > > They are more expensive, but 3c905's do PXE and WOL and can save you the > cost of a floppy or CD-ROM (per node) for the original install. They > have MUCH better latency and bw numbers in my tests -- better than even > midrange cards that DON'T really suck (e.g. PNIC based cards). > > rgb > > > Does any body know which kernel module can I use for the KNE120TX that > > will support channel bonding?? > > > > I have a cluster bonded with KNE100TX cards using the Tulip kernel > > module > > > and they work fine. > > > > Any help is appreciated. > > > > > > Thanks. > > Raul A. Gonzalez Olimon > > > > al303917@vigia.ens.uabc.mx > > > > Universidad Autonoma de Baja California > > > > Mexico. > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > -- > Robert G. 
Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf - -- /* * Pedro Diaz Jimenez: pdiaz88@terra.es, pdiaz@acm.asoc.fi.upm.es * * GPG KeyID: E118C651 * Fingerprint: 1FD9 163B 649C DDDC 422D 5E82 9EEE 777D E118 C65 * * http://planetcluster.org * Clustering & H.P.C. news and documentation * * "La sabiduria me persigue, pero yo soy mas rapido" */ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.4 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7lk6gnu53feEYxlERAoU5AJ0ZLN/JEIu7KTqEqKM429oEWW5bbgCdGqcs E3JxXvF/ZkZ7ny5GQ8zRoBM= =zXYO -----END PGP SIGNATURE----- From gkogan at students.uiuc.edu Wed Sep 5 08:41:42 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:40 2009 Subject: security on Scyld Message-ID: Hi. What kind of security, firewall, does Scyld have and how do I implement it? Thanks From jacsib at lutecium.org Wed Sep 5 09:24:33 2001 From: jacsib at lutecium.org (Jacques B. Siboni) Date: Wed Nov 25 01:01:40 2009 Subject: Scyld Beowulf and Red Hat 7.0 Message-ID: <3B9651C1.132C42C2@lutecium.org> Hello, I have just received the Scyld Beowulf cdrom from Linux Central. Unfortunetly I see it is built with Linux 6.2. The Linux box I use runs RH 7.0. Is there a way to install the Beowulf package on this version of Linux? Did some of you solve this problem? Thanks in advance Jacques -- Dr. Jacques B. Siboni mailto:jacsib@Lutecium.org 8 pass. Charles Albert, F75018 Paris, France Tel. & Fax: 33 (0) 1 42 28 76 78 Home Page: http://www.lutecium.org/jacsib/ From wsb at paralleldata.com Wed Sep 5 10:27:33 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:40 2009 Subject: channel bonding rtl8139 works?? References: Message-ID: <3B966085.1932F920@paralleldata.com> Eric Kuhnke wrote: > > In my experience, the Realtek 8139 is the worst modern network controller > on the planet. It's the cheapest available, and commonly found on $65 > generic (ie: PC Chips) celeron motherboards. Sometimes it completely > refuses to autodetect speed and duplex of a connection. You really do get > what you pay for... > I've used 8139's without trouble on both Alphas and x86. YMMV. Most likely a driver issue if it's not running correctly. They do run a little slow, 8-9 MB/sec vs 10-11MB/sec on a Netgear FA310tx. Wes From jared_hodge at iat.utexas.edu Wed Sep 5 10:21:51 2001 From: jared_hodge at iat.utexas.edu (Jared Hodge) Date: Wed Nov 25 01:01:40 2009 Subject: [Fwd: Re: Reading list] Message-ID: <3B965F2F.E84E6E34@iat.utexas.edu> "How to build a beowulf" was good. "Carpenter, Dean" wrote: > > Hey All - > > Can anyone provide a good recommended reading list for cluster/high > performance/high avail/beowulf computing ? Which books have you read and > found to be good, and more importantly, which were bad and to be avoided ? 
> > -- > Dean Carpenter > Principal Architect > Purdue Pharma > dean.carpenter@pharma.com > deano@areyes.com > 94TT :) > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Jared Hodge Institute for Advanced Technology The University of Texas at Austin 3925 W. Braker Lane, Suite 400 Austin, Texas 78759 Phone: 512-232-4460 Fax: 512-471-9096 Email: Jared_Hodge@iat.utexas.edu From siegert at sfu.ca Wed Sep 5 12:14:17 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:01:40 2009 Subject: channel bonding rtl8139 works?? In-Reply-To: ; from rgb@phy.duke.edu on Wed, Sep 05, 2001 at 06:38:38AM -0400 References: Message-ID: <20010905121417.A27603@stikine.ucs.sfu.ca> On Wed, Sep 05, 2001 at 06:38:38AM -0400, Robert G. Brown wrote: > On Mon, 3 Sep 2001, << Dragon >> wrote: > > > > Hi, does any body know if the RTL-8139 kernel module for the KNE120TX > > works with channel bonding?? > > > > I channel bonded 3 KNE120tx cards on 2 computers (3 cards per system) and > > the comunication between them is very slow, I don't know if the problem is > > in the rtl8139 kernel module (Red Hat 6.2) > > I don't know about channel bonding per se, but the RTL8139 NIC is Dark > Evil. In my opinion, of course: YMMV, caveat emptor, I may be crazy > (some would say there is little doubt:-), standard disclaimers etc. I used RealTek cards with the rtl8139 kernel module a while back (kernel 2.2.16, 2.2.17 when the root holes in those kernels were still unknown) for channel bonding in a test configuration. There was a problem with those cards as the setting of the MAC addresses to the MAC address of eth0 did not work correctly. Hence, channel bonding failed miserably (basically 2 out of 3 packets were dropped). A workaround was published on this list http://www.beowulf.org/pipermail/beowulf/2000-October/010236.html You had to copy the MAC addresses manually from eth0 to eth1, eth2, etc. After that channel bonding worked. I do not know whether with newer versions of the driver this is still necessary. > Still, my own experience: It is one of the few cards with which I can > still consistently crash a linux box currently in my possesion when I > whack it with a very heavy packet stream. I agree. When I said "channel bonding worked" that really meant it worked for certain applications. When you run embarrassingly parallel jobs almost any NIC will work, but then you wouldn't setup channel bonding in the first place. For other applications that require higher bandwidth the rtl8139 did succeed in hanging up the network once in a while. And if you run NFS (udp) traffic over the channel bonded connection - you better don't do that: Under high load the packet loss was so high that it could hang up the whole box. > I should note in saying this > that this is with the current RH 7.1 kernel with its "stolen" 8139too > driver, not Don's, although many kernel revisions ago I managed to crash > boxes with two RTL8139's with Don's driver as well. > > You should connect to the scyld website and visit the rtl8139 page and > read the notes there. I'd have to say that even if your particular > implementation of the 8139 is more stable than my own (maybe Kingston > did a better job of engineering the NICs than my no-name mfr, or maybe > you're using Don's driver already and it actually works stably where the > 8139too dies) it is a relatively poor choice for a beowulf NIC. 
Even > when I run just one card at a time (and cannot crash the system) the > card seems to choke up under a heavy packet load and actually slow down > to a crawl. > > Crush. Kill. Destroy. Choose another card. Agreed. I don't let those cards come close to my cluster anymore. If you want to do channel bonding, you do it for performance. 3 Realtek cards under some circumstances only had a slightly better performance as a single 3c905b in my tests - it just doesn't make sense. > They are more expensive, but 3c905's do PXE and WOL and can save you the > cost of a floppy or CD-ROM (per node) for the original install. They > have MUCH better latency and bw numbers in my tests -- better than even > midrange cards that DON'T really suck (e.g. PNIC based cards). Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From rajkumar at csse.monash.edu.au Thu Sep 6 02:44:12 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Wed Nov 25 01:01:40 2009 Subject: TFCC Membership Registration with IEEE CS! Message-ID: <3B97456C.766BE275@csse.monash.edu.au> Dear Colleagues, This is regarding membership for the: IEEE Computer Society Task Force Cluster Computing (TFCC) http://www.ieeetfcc.org/ IEEE TFCC is an international community driven forum for promoting cluster computing research, education, industry, and business. It provides a venue for sharing and exchanging ideas and encourages collaboration among members and participants. TFCC's activities include, * Organising and Sponsoring conferences such as: IEEE Cluster: http://www.clustercomp.org/ IEEE/ACM CCGrid: http://www.ccgrid.org/ * TFCC electronic Open Discussion Forum (has more than 500 members as part of this forum) * Education Promotion and Book Donation Program * Publication of Quarterly Newsletters * Publishes CC White papers authored by leaders in the field * TFCC/ACM/CoRR Cluster Computing Archive: http://www.ieeetfcc.org/ClusterArchive.html Please see TFCC web site for further information on our activities. IEEE TFCC Membership: FREE and Open to all: ****************************************** We would like to invite and encourage you to formally join TFCC as members. The TFCC Membership is FREE and open to both IEEE/CS members and non-members. There are many benefits by becoming official members. Benefits include: TFCC will be able to post you printed material by mail; You will "voting" rights while resolving important issues including electing leaders, assuming office upto the level of "TFCC Chair", benefit from book donation program, etc. To become member, all you need to do is one of the following: ------------------------------------------------------------- 1. Fill ASCII version of membership form and email to: mail.list@computer.org http://www.ieeetfcc.org/TFCCmembership.txt OR 2. Fill PDF version of membership form and mail/fax to IEEE CS (as indicated in the form): http://www.ieeetfcc.org/tfcc-mem-form.pdf OR 3. Fill Online Form and hit Submit! http://hopper.computer.org/correspo.nsf/signup -------- If you are not sure about becoming TFCC member in the past, we encourage to register again as indicated in the above. 
Special Request: If you would like to volunteer and actively contribute to any of our existing programs or have ideas for new activities and programs, please let us. TFCC will be glad to support and promote any activity that benefits its members and the community. We appreciate if you can share this email with your friends and colleagues working in cluster computing. Thanks for taking time to become TFCC member! Best wishes Raj and Mark Co-Chairs, IEEE Task Force on Cluster Computing (TFCC) ------------------------------------------------------------------------------------ Rajkumar Buyya School of Computer Science and Software Engineering Monash University, C5.41, Caulfield Campus Melbourne, VIC 3145, Australia Phone: +61-3-9903 1969 (office); +61-3-9571 3629 (home) Fax: +61-3-9903 2863; eFax: +1-801-720-9272 Email: rajkumar@buyya.com | rajkumar@csse.monash.edu.au URL: http://www.buyya.com | http://www.csse.monash.edu.au/~rajkumar ------------------------------------------------------------------------ From jacsib at lutecium.org Thu Sep 6 08:52:40 2001 From: jacsib at lutecium.org (Jacques B. Siboni) Date: Wed Nov 25 01:01:40 2009 Subject: Working boot diskette? Message-ID: <3B979BC8.DCB35EEB@lutecium.org> Hello, I can't so far build a working cluster node. There are many parameters to take into account. Therefore I am looking for a boot diskette in use somewhere. I'd like to explore it and to modify it for my needs. So far I spend a lot of time in building the nfs boot and there is always something missing Thanks in advance, Cheers Jacques -- Dr. Jacques B. Siboni mailto:jacsib@Lutecium.org 8 pass. Charles Albert, F75018 Paris, France Tel. & Fax: 33 (0) 1 42 28 76 78 Home Page: http://www.lutecium.org/jacsib/ From rross at mcs.anl.gov Thu Sep 6 10:51:38 2001 From: rross at mcs.anl.gov (Robert Ross) Date: Wed Nov 25 01:01:40 2009 Subject: distributed file systems In-Reply-To: <3B8F8118.E2FCBFCD@nada.kth.se> Message-ID: Jon, There is no "best method" IMHO. PVFS is probably your best bet for a scratch space for applications to store large data sets in, especially for MPI-IO applications. It isn't good for home directories; it doesn't cache, and all the metadata lookups make for very slow accesses in typical user interaction. NFS is probably your best bet for home directories, but it isn't good for large data sets and parallel access both for performance (single I/O node, limited protocol) and correctness (the cache isn't consistent and is difficult to disable) reasons. So I would use PVFS for scratch space. I would then ask myself if I really NEED /home on all the nodes. It's only nine nodes...you can copy executables out quickly. If you can stand the inconvenience, performance will be better if you just run applications off local disks. Regards, Rob On Fri, 31 Aug 2001, Jon Tegner wrote: > We have a small cluster consisting of nine nodes, and we are currently > exporting /home from the master to every node using nfs. > > We have also tried using pvfs, using partitions from all nodes in one > "parallel partition"- something which was slower than only using > nfs (since we don't have access to the source of the codes we use, we > cannot write in parallel). Maybe it would be better to only use two > nodes for every parallel partition, i.e., n1 and n2 builds home1, n3 > and n4 builds home2 ... ? 
> > Haven't tried, but it seems that afs should be slightly faster than > nfs, see > > http://www.ait.iastate.edu/olc/storage/afs/nfs2afs.txt.html > > My question now is if you have any suggestion of a "best method" for > a distributed file system to use in a cluster environment. > > Regards, > > Jon Tegner From shaman at vawis.net Wed Sep 5 01:33:49 2001 From: shaman at vawis.net (Thierry Mallard) Date: Wed Nov 25 01:01:40 2009 Subject: distributed file systems In-Reply-To: <3B8F8118.E2FCBFCD@nada.kth.se> References: <3B8F8118.E2FCBFCD@nada.kth.se> Message-ID: <20010905103349.A798@mallard.com> On Fri, Aug 31, 2001 at 02:20:40PM +0200, Jon Tegner wrote: > [...] > My question now is if you have any suggestion of a "best method" for > a distributed file system to use in a cluster environment. Maybe GFS could be a possible answer ? http://www.sistina.com/products_gfs.htm -- Thierry Mallard http://www.vawis.net http://www.goonix-studio.com (new) http://www.worldforge.org http://www.erlang-fr.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 249 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010905/40d0a5a7/attachment.bin From alvin at iplink.net Wed Sep 5 07:28:24 2001 From: alvin at iplink.net (alvin) Date: Wed Nov 25 01:01:40 2009 Subject: Time Syncronization. Message-ID: <3B963688.3547C39F@iplink.net> I have a hand full of machines that I am using rdate to synchronize. rdate seems to work reasonably well except for one macine that seems to gain about 30 seconds a day. The problem I find is that when rdate is run the date is just set back 30 seconds. Most times this is not a problem but is looks like it has broken inetd and then until I restart inetd nobody can use any services from there. I am sure someone on this list has solved this problem. I have used NTP in the past but have had mixed luck with various implemntations. I have a feeling that some combinaton of rdate and adjtimex may work. Any pointers or help greatfuly appreciated. -- Alvin Starr || voice: (416)785-4051 Interlink Connectivity || fax: (416)785-3668 alvin@iplink.net || From lindahl at conservativecomputer.com Thu Sep 6 17:52:34 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:40 2009 Subject: Time Syncronization. In-Reply-To: <3B963688.3547C39F@iplink.net>; from alvin@iplink.net on Wed, Sep 05, 2001 at 10:28:24AM -0400 References: <3B963688.3547C39F@iplink.net> Message-ID: <20010906205234.A14656@wumpus.foo> On Wed, Sep 05, 2001 at 10:28:24AM -0400, alvin wrote: > I have a hand full of machines that I am using rdate to synchronize. I've always used NTP, and it worked fine. I've used the NTP that redhat ships as RPMs, for example, on 500+ machines... g From award at andorra.ad Fri Sep 7 07:02:23 2001 From: award at andorra.ad (Alan Ward) Date: Wed Nov 25 01:01:40 2009 Subject: Time Syncronization. References: <3B963688.3547C39F@iplink.net> Message-ID: <3B98D36F.42EDF585@andorra.ad> I've been using timed from both RedHat and Mandrake with excellent results. On a small park, admittedly. Best, Alan alvin ha escrit: > > I have a hand full of machines that I am using rdate to synchronize. > > rdate seems to work reasonably well except for one macine that seems to > gain about 30 seconds a day. The problem I find is that when rdate is > run the date is just set back 30 seconds. 
Most times this is not a > problem but is looks like it has broken inetd and then until I restart > inetd nobody can use any services from there. > > I am sure someone on this list has solved this problem. I have used NTP > in the past but have had mixed luck with various implemntations. I have > a feeling that some combinaton of rdate and adjtimex may work. > > Any pointers or help greatfuly appreciated. > > -- > Alvin Starr || voice: (416)785-4051 > Interlink Connectivity || fax: (416)785-3668 > alvin@iplink.net || > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jtracy at ist.ucf.edu Fri Sep 7 07:58:14 2001 From: jtracy at ist.ucf.edu (Judd Tracy) Date: Wed Nov 25 01:01:40 2009 Subject: Time Syncronization. In-Reply-To: <3B963688.3547C39F@iplink.net> Message-ID: On Wed, 5 Sep 2001, alvin wrote: > I have a hand full of machines that I am using rdate to synchronize. > > rdate seems to work reasonably well except for one macine that seems to > gain about 30 seconds a day. The problem I find is that when rdate is > run the date is just set back 30 seconds. Most times this is not a > problem but is looks like it has broken inetd and then until I restart > inetd nobody can use any services from there. > > I am sure someone on this list has solved this problem. I have used NTP > in the past but have had mixed luck with various implemntations. I have > a feeling that some combinaton of rdate and adjtimex may work. > > Any pointers or help greatfuly appreciated. NTP works great for syncing the time and correcting the clock rate. I would also suggest that you use the broadcast mode of NTP if all of your machines are on the same subnet. Otherwise you end up with a lot of traffic on the network just for keeping the time. -- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu From Eugene.Leitl at lrz.uni-muenchen.de Fri Sep 7 09:37:34 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:40 2009 Subject: ALINKA Linux Clustering Letter, September 5th. 2001 (fwd) Message-ID: I think Antoine makes a so fine job out of his summaries, he should just post them to Beowulf, which does not see much traffic these days, anyway. Unless you people have objections? ---------- Forwarded message ---------- Date: Fri, 07 Sep 2001 18:33:01 +0200 From: Antoine Brenner To: clustering@alinka.com Subject: ALINKA Linux Clustering Letter, September 5th. 2001 Resent-Date: Fri, 7 Sep 2001 18:33:05 +0200 (CEST) Resent-From: clustering@alinka.com The ALINKA Linux Clustering Letter, Wednesday, September the 5th. 2001 Dear readers, I am happy to send you this week's edition of clustering@alinka.com clustering@alinka.com is a free weekly e-mail newsletter on linux clustering. It provides a summary of the weekly activity in mailing-lists relative to linux clustering (such as beowulf, linux virtual server or linux-ha) and general clustering news. For more information about ALINKA, see: http://www.alinka.com News from the High Performance world, by Dr Laurent Gatineau (lgatineau@alinka.com) ====================================================================== Tips and tricks from the Beowulf mailing list ======== * About switches, Martin Wheeler [m1] is looking for cheap switches, Chris Black [m2] recommanded D-link switches, and Dan Philpott [m3] NetGear switches. 
Sakthivel Narayanan [m4] is wondering if he could upgrade its cluster with Intel EtherExpress 510T switche, Jakob ?stergaard [m5] gives its experience explaining bandwidth problems; Anthony Tam [m6] suggests Cisco switches for good bandwidth on medium cluster. Dean Carpenter [m7] posted a link [1] to a web which help user create their own FNN (Flat Neighborhood Network) designs. [1] http://aggregate.org/FNN/ [m1] http://www.beowulf.org/pipermail/beowulf/2001-August/001157.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-August/001158.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-August/001164.html [m4] http://www.beowulf.org/pipermail/beowulf/2001-September/001175.html [m5] http://www.beowulf.org/pipermail/beowulf/2001-September/001176.html [m6] http://www.beowulf.org/pipermail/beowulf/2001-September/001177.html * Rajkumar Buyya [m1] posted a link [1] to the online material of the NSF/TFCC Workshop on Teaching Cluster Computing. [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001172.html [1] http://www.cs.uncc.edu/%7eabw/CCworkshop2001/ News from MOSIX mailing list by Benoit des Ligneris ===================================================================== * Linux Terminal Server Project AND Mosix by Wiliam Danau [m1] [1] [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0043.html [1] http://www.lpmo.edu/~daniau/ltsp-mosix/ * Howto install SlackWare 8.0 and mosix by Thomas A Web [m1] [1] [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Aug/0279.html [1] http://wordwonder.com/slackmosix.shtml * Release of Mosix 1.3.0 for kernel 2.4.9 is avalaible [m1]. You can download it [1] or view the changelog [2]. [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Aug/0282.html [1] http://www.mosix.cs.huji.ac.il/ftps/MOSIX-1.3.0-pre7.tar.gz [2] http://www.mosix.org/txt_changelog.html * Release of Mosixview 0.9 [1] [1] http://mosixview.sourceforge.net/indexen.html News from the High Availability world ====================================================================== DRBD devel by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * David Krovich posted [m1] a link to the docbook [1] version of the drbd howto [2] [1] http://www.slackworks.com/~dkrovich/DRBD/DRBD-HOWTO.sgml [2] http://www.slackworks.com/~dkrovich/DRBD [m1] http://www.geocrawler.com/archives/3/3756/2001/8/0/6529406/ * Philipp Reisner explained [m3] to Paul Botelho [m2] how to bypass a transfer log size issue. [m2] http://www.geocrawler.com/archives/3/3756/2001/8/0/6521622/ [m3] http://www.geocrawler.com/archives/3/3756/2001/8/0/6524266/ Linux-HA by Rached Ben Mustapha (rached@alinka.com) ======== * Alan Cox replied [m1] to a post [m2] announcing GFS license change, that the OpenGFS project has been launched, and that an action has been taken because GFS is blatantly violating the GPL license, which protects the linux kernel. [m1] http://marc.theaimsgroup.com/?l=linux-ha&m=99920680132473&w=2 [m2] http://marc.theaimsgroup.com/?l=linux-ha&m=99920545725731&w=2 LVS by Rached Ben Mustapha (rached@alinka.com) ======== * Matthew S. Crocker asked [m1] if LVS was receiving connections after any ipchains/iptables rules, in order to block a class C adress range. Julian Anastasov replied [m2] that LVS receives connections after the firewall rules. 
[m1] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=99928646602911&w=2 [m2] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=99928759706937&w=2 News on the Filesystems front ====================================================================== reiserfs by Guillaume Gimenez (ggimenez@alinka.com) ======== * Harmon Seaver asks [m4] if anyone is using encrypted file system with ReiserFS. (the thread [3]) [3] http://marc.theaimsgroup.com/?t=99935493100001&w=2&r=1 [m4] http://marc.theaimsgroup.com/?l=reiserfs&m=99935464825974&w=2 * Vladimir V. Saveliev posted [m5] a link to patch [4] that fixes a large files issue (>2GB). (the thread [5]) [4] ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.7.pending/2.4.7-plug-hole-and-pap-5660-pathrelse-fixes.dif [5] http://marc.theaimsgroup.com/?t=99925216000003&w=2&r=1 [m5] http://marc.theaimsgroup.com/?l=reiserfs&m=99925496024061&w=2 GFS by Ludovic Ishiomin (lishiomin@alinka.com) ======== * There was a lot of discussion about the new licensing scheme of GFS. Here is the explanations of Sistina and the reaction [1m]. Please also see our Linux-HA section above. * The GFS code has forked [1]. Some people thought the relicensing of GFS violates the GPL (since some external people from Sistina contributed to the code with patches). They spent their time on retrieve the last available code from Sistina (based on GFS 4.1.1 plus some glitches) [2m]. [1m] http://lists.sistina.com/pipermail/gfs-devel/2001-September/002127.html and the followings. [2m] http://www.geocrawler.com/lists/3/SourceForge/15276/0/6542974/ and the followings. [1] http://www.opengfs.org Intermezzo by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Peter Braam announced Intermezzo 1.0.5.1 [1m] [1m] http://www.geocrawler.com/lists/3/SourceForge/8078/0/6540002/ JFS by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Steve Best announced JFS 1.0.4 [m1] [m1] http://oss.software.ibm.com/pipermail/jfs-discussion/2001-August/000544.html News on other cluster related topics ====================================================================== LVM by Bruno Muller (bmuller@alinka.com) ===== * AJ Lewis announced LVM 1.0.1-rc2 available at www.sistina.com[m1]. [m1] http://lists.sistina.com/pipermail/linux-lvm/2001-September/008647.html ====================================================================== To subscribe to the list, send e-mail to clustering@alinka.com from the address you wish to subscribe, with the word "subscribe" in the subject. To unsubscribe from the list, send e-mail to clustering@alinka.com from the address you wish to unsubscribe from, with the word "unsubscribe" in the subject. Alinka is the editor of the ALINKA ORANGES and ALINKA RAISIN administration software for Linux clusters. (Web site: http://www.alinka.com ) From siegert at sfu.ca Fri Sep 7 12:20:48 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:01:40 2009 Subject: WOL: how does it work? Message-ID: <20010907122048.A31459@stikine.ucs.sfu.ca> I am trying to get wake-on-lan (WOL) to wirk on my Beowulf cluster and until now I failed. Admittedly I don't know much about WOL, thus this failure may just be due to some stupid mistake on my part. Here is the problem: Each node draws a current of about 1.5A (I measured that a few days ago). Since I have about 70 of those, booting all nodes at once will draw all of the sudden a current of more than 100A. The people who run our machine room don't allow me to do that (probably for good reason). 
Thus I decided on the following approach: The bios for the motherboard that I'm using (Tyan Thunder K7) allows two setting for what to do after a power failure when the power comes back on: a) stay off or b) power on. Instead of choosing b) for all nodes (wich would cause the aforementioned problem) I want to choose b) only for the master node and a) for all slaves. Then use WOL from the master node to wake up the slave sequentially using a script and the ether-wake program from http://www.scyld.com/expert/wake-on-lan.html. Unfortunately, I have been unable to wake up a node. Here is what I do: "halt" a node. Detach the power cable. Reattach the power cable. At this point the lights on the two onboard NICs (the Tyan web site and the printing on the chips say that those are 3c920, the 3c59x driver identifies them as 3c980; I don't know whether that is relevant; the NICs work fine) come on. A Tyan technician told me that WOL on the Thunder K7 is always on, no special BIOS setup would be needed. They also told me that I have to use a 2.4.x kernel because only those would support APCI. I don't understand why the kernel is important here: when the node is halted what difference does the kernel make for the receiving of the magic WOL packet that is supposed to wake up the box? Anyway, I compiled a 2.4.9-ac8 kernel with APIC enabled, which I use with the "noapic" kernel option in /etc/lilo.conf. I have also tried the stock RH 7.1 2.4.3-12smp kernel without any difference with respect to WOL (i.e., no success). After reattaching the power cable I then send the magic packet from the master node: ./ether-wake -i eth4 00:E0:81:03:21:DD where 00:E0:81:03:21:DD is the MAC address of one of the onboard NICs on the node. tcpdump shows that the packet actually is sent. Also the lights in the NICs on the sending and receiving end flash, but otherwise nothing happens. What's wrong? Any suggestions are most appreciated. Thanks! Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From lindahl at conservativecomputer.com Fri Sep 7 13:14:49 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:41 2009 Subject: WOL: how does it work? In-Reply-To: <20010907122048.A31459@stikine.ucs.sfu.ca>; from siegert@sfu.ca on Fri, Sep 07, 2001 at 12:20:48PM -0700 References: <20010907122048.A31459@stikine.ucs.sfu.ca> Message-ID: <20010907161449.A16583@wumpus.foo> On Fri, Sep 07, 2001 at 12:20:48PM -0700, Martin Siegert wrote: > Here is the problem: Each node draws a current of about 1.5A (I measured > that a few days ago). Since I have about 70 of those, booting all nodes > at once will draw all of the sudden a current of more than 100A. The > people who run our machine room don't allow me to do that (probably for > good reason). I can't really answer your question, but there's an alternate solution. You can use a device which delays the booting of some nodes. For example, the APC MasterSwitch has the ability to let you power cycle nodes by attaching to a web browser, but another feature is that it can power up the nodes with a delay after a power failure. It's a bit expensive for this purpose ($354 for 8 plugs @ 120V, 12A total), but maybe you can find something cheaper, such as an X10 based controller. 
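For the wake-on-LAN route itself, here is a minimal sketch of the sequential wake-up script Martin describes, reusing the ether-wake invocation shown above; the MAC list file (node-macs.txt), the interface name and the 30-second delay are assumptions, not anything from the original posts.

  #!/bin/sh
  # Sketch only: wake the slaves one at a time so they never draw their
  # start-up current all at once. node-macs.txt holds one MAC per line (assumed).
  IFACE=eth4     # interface facing the slave nodes, as in the example above
  DELAY=30       # seconds between wake-ups (assumed)
  while read mac; do
      ./ether-wake -i "$IFACE" "$mac"   # send the WOL magic packet to one node
      sleep "$DELAY"                    # let it power up before waking the next
  done < node-macs.txt

The same staggering could also be had from the power-up delay feature of the APC MasterSwitch mentioned above; the script is simply the zero-cost version, and it still relies on the NIC/BIOS being left in a wakeable state as discussed in the rest of this thread.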
greg From cpignol at seismiccity.com Fri Sep 7 14:23:27 2001 From: cpignol at seismiccity.com (Claude Pignol) Date: Wed Nov 25 01:01:41 2009 Subject: WOL: how does it work? References: <20010907122048.A31459@stikine.ucs.sfu.ca> Message-ID: <3B993ACF.40001@seismiccity.com> I think the wake up depends on how the node enter the sleeping state. May be you could try this with the original kernel of Beowulf 27bz-7 power down remotely a node: bpctl -S Node_Number -s pwroff The node shutdown and the power off but the NIC is alive Don't remove the power plug. (Important) send the magic packet ether-wake -i eth4 00:E0:81:03:21:DD It should wake up the node If not add the kernel parameter apm=power-off to the slave node kernel And restart the whole process. This works fine for me (but It's not the same motherboard) I hope this could help Claude Martin Siegert wrote: >I am trying to get wake-on-lan (WOL) to wirk on my Beowulf cluster >and until now I failed. Admittedly I don't know much about WOL, thus >this failure may just be due to some stupid mistake on my part. > >Here is the problem: Each node draws a current of about 1.5A (I measured >that a few days ago). Since I have about 70 of those, booting all nodes >at once will draw all of the sudden a current of more than 100A. The >people who run our machine room don't allow me to do that (probably for >good reason). Thus I decided on the following approach: > >The bios for the motherboard that I'm using (Tyan Thunder K7) allows >two setting for what to do after a power failure when the power comes >back on: a) stay off or b) power on. >Instead of choosing b) for all nodes (wich would cause the aforementioned >problem) I want to choose b) only for the master node and a) for all slaves. >Then use WOL from the master node to wake up the slave sequentially >using a script and the ether-wake program from >http://www.scyld.com/expert/wake-on-lan.html. > >Unfortunately, I have been unable to wake up a node. Here is what I do: >"halt" a node. Detach the power cable. Reattach the power cable. >At this point the lights on the two onboard NICs (the Tyan web site >and the printing on the chips say that those are 3c920, the 3c59x driver >identifies them as 3c980; I don't know whether that is relevant; the NICs >work fine) come on. A Tyan technician told me that WOL on the Thunder K7 is >always on, no special BIOS setup would be needed. They also told me that I >have to use a 2.4.x kernel because only those would support APCI. I don't >understand why the kernel is important here: when the node is halted >what difference does the kernel make for the receiving of the magic WOL >packet that is supposed to wake up the box? Anyway, I compiled a >2.4.9-ac8 kernel with APIC enabled, which I use with the "noapic" >kernel option in /etc/lilo.conf. I have also tried the stock RH 7.1 >2.4.3-12smp kernel without any difference with respect to WOL (i.e., >no success). > >After reattaching the power cable I then send the magic packet from >the master node: > >./ether-wake -i eth4 00:E0:81:03:21:DD > >where 00:E0:81:03:21:DD is the MAC address of one of the onboard NICs >on the node. tcpdump shows that the packet actually is sent. Also the >lights in the NICs on the sending and receiving end flash, but otherwise >nothing happens. > >What's wrong? Any suggestions are most appreciated. > >Thanks! 
> >Martin > >======================================================================== >Martin Siegert >Academic Computing Services phone: (604) 291-4691 >Simon Fraser University fax: (604) 291-4242 >Burnaby, British Columbia email: siegert@sfu.ca >Canada V5A 1S6 >======================================================================== > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- ------------------------------------------------------------------------ Claude Pignol SeismicCity, Inc. 2900 Wilcrest Dr. Suite 470 Houston TX 77042 Phone:832 251 1471 Mob:281 703 2933 Fax:832 251 0586 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20010907/ba5e5f93/attachment.html From rgb at phy.duke.edu Fri Sep 7 18:39:08 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:41 2009 Subject: WOL: how does it work? In-Reply-To: <20010907161449.A16583@wumpus.foo> Message-ID: On Fri, 7 Sep 2001, Greg Lindahl wrote: > On Fri, Sep 07, 2001 at 12:20:48PM -0700, Martin Siegert wrote: > > > Here is the problem: Each node draws a current of about 1.5A (I measured > > that a few days ago). Since I have about 70 of those, booting all nodes > > at once will draw all of the sudden a current of more than 100A. The > > people who run our machine room don't allow me to do that (probably for > > good reason). > > I can't really answer your question, but there's an alternate > solution. You can use a device which delays the booting of some > nodes. For example, the APC MasterSwitch has the ability to let you > power cycle nodes by attaching to a web browser, but another feature > is that it can power up the nodes with a delay after a power failure. > It's a bit expensive for this purpose ($354 for 8 plugs @ 120V, 12A > total), but maybe you can find something cheaper, such as an X10 based > controller. I haven't yet tried it, but a lot of ethernet cards now support Wake On Lan, and ATX power supplies can boot in software once power is delivered to the switching supply. They are usually the better ethernet cards anyway, the sort one would probably prefer to use in a cluster. We were hoping/planning to arrange it so that a relative few master nodes controlled when the slave nodes start up (and shut down in the event of a loss of AC). Is this not possible? rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From becker at scyld.com Fri Sep 7 22:31:00 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:41 2009 Subject: WOL: how does it work? In-Reply-To: <20010907122048.A31459@stikine.ucs.sfu.ca> Message-ID: On Fri, 7 Sep 2001, Martin Siegert wrote: > I am trying to get wake-on-lan (WOL) to wirk on my Beowulf cluster ... > Unfortunately, I have been unable to wake up a node. Here is what I do: > "halt" a node. Detach the power cable. Reattach the power cable. > At this point the lights on the two onboard NICs (the Tyan web site > and the printing on the chips say that those are 3c920, the 3c59x driver > identifies them as 3c980; I don't know whether that is relevant; the NICs > work fine) come on. This is likely a ethercard-specific problem. Pre-CX 3Com cards don't automatically go into wake-on-LAN mode. 
The driver must first be loaded, and the card left in the correct state (TotalReset + ACPI-D3). My 3c59x.c driver takes care to do this properly. I believe that the new 3c905CX cards do have a setting for automatically configuring the card for WOL with just stand-by power. Most other Ethernet adapters enable wake-on-magic-packet when stand-by power is first applied. > A Tyan technician told me that WOL on the Thunder K7 is > always on, no special BIOS setup would be needed. Likely true. If you use a WOL cable, the ethernet adapter almost literally pushes the power butter. If you rely on standby power from the PCI slot, the chipset must default to treating the PME signal as a power-on signal. > They also told me that I have to use a 2.4.x kernel because only those > would support APCI. That's false, and mostly not relevant. My pci-scan code adds PCI power management state control to the 2.2 kernel, which is part of the ACPI spec. The only aspect which is relavent is the ability to soft power down the system. That might require an ACPI Control Language interpreter if your motherboard does not have APM functions. > I don't understand why the kernel is important here: when the node is > halted what difference does the kernel make for the receiving of the > magic WOL packet that is supposed to wake up the box? Yup. After restoring power, the OS has never had a chance to run. Only the power-down procedure depends on the kernel support. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From becker at scyld.com Fri Sep 7 22:31:58 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:41 2009 Subject: WOL: how does it work? In-Reply-To: Message-ID: On Fri, 7 Sep 2001, Robert G. Brown wrote: > We were hoping/planning to arrange it so that a relative few master > nodes controlled when the slave nodes start up (and shut down in the > event of a loss of AC). Is this not possible? The Scyld Beowulf system has wake-up and sleep support built in. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From LuismarP at cheque.uq.edu.au Sat Sep 8 01:08:29 2001 From: LuismarP at cheque.uq.edu.au (Luismar Marques Porto) Date: Wed Nov 25 01:01:41 2009 Subject: Price of a 24 cluster Message-ID: Dear Beowulf Angels, Is there anyone out there willing to help me estimating the price of setting up a 24 PC beowulf cluster with Intel P4, 2.0GHz, dual CPU, 1 GB RAM, 4x40 HD/node, including switches and a bridge for a ATM conection? I just need an estimate at this point, but any particular configuration with respective amount spent would be of a really big help, since I am in a hurry to submit a resarch project. If you prefer, you could just tell me about your experience sending a particular message to luismarp@cheque.uq.edu.au. I really appreciate any comments and help. Regards, Luismar Luismar Marques Porto Laboratory for Biological Engineering Department of Chemical Engineering The University of Queensland AUSTRALIA From mwheeler at startext.co.uk Sat Sep 8 10:54:10 2001 From: mwheeler at startext.co.uk (Martin WHEELER) Date: Wed Nov 25 01:01:41 2009 Subject: D-Link switch for b@h In-Reply-To: Message-ID: Thanks to all who responded to my query, both off-list and on. 
Surprisingly (to me), my best-value-for-money turned out to be a Netgear *sixteen* port switch, rather than any of the 8-port switches I had been looking at. Once again, thanks for pointing me in the right direction. msw -- *** Free Speech *** Free Dmitry Sklyarov *** Sell your shares in Adobe. Boycott ALL American non-free software. see: http://uk.eurorights.org/ http://uk.freesklyarov.org/
From laurence at elinux.com.sg Sat Sep 8 16:34:06 2001 From: laurence at elinux.com.sg (Laurence Liew) Date: Wed Nov 25 01:01:41 2009 Subject: Price of a 24 cluster In-Reply-To: References: Message-ID: <999992046.3b9aaaee5eb8d@sun.elinux.com.sg> Dear Luismar, If you are building the cluster yourself.. then you only need to consider the hardware you need. You need to include in racks to house the systems and network cablings. ... based on current prices of hardware.. I expect the cluster to cost you around A$150,000 for a normal tower casing system.. 
if you are going for a branded hardware like IBM's x330 1U systems.. (PIII's only no P4s).. you are probably looking at A$220,000. Add an additional 10 - 15% markup for integration services if you are using an integrator. Personally I would recommend going for a server class systems like the IBM x330 or the new compaq 1U systems (Please note that we are an IBM business partner). We have had 70 nodes of such systems running for over 3 months without any hardware failures.... this is important especially if your simulations are long and run over many days.... checkpointing is still not generally available on linux (unless you code for them or patch your system with some of the checkpointing patches available).. to have system restart is not nice :-) And as a system-integrator... it is a lose making proposition to propose low end machines... the higher chances of machine failures make the frequent trips to customer site very expensive. The above pricing is based on list prices.. as you are an edu, you should get generous discounts... Hope this helps.. Cheers! Laurence Quoting Luismar Marques Porto : > Dear Beowulf Angels, > > Is there anyone out there willing to help me estimating the > price of setting up a 24 PC beowulf cluster with Intel P4, 2.0GHz, > dual CPU, 1 GB RAM, 4x40 HD/node, including switches and a bridge > for a ATM conection? > > I just need an estimate at this point, but any particular > configuration with respective amount spent would be of a really > big help, since I am in a hurry to submit a resarch project. > > If you prefer, you could just tell me about your experience sending > a particular message to luismarp@cheque.uq.edu.au. > > I really appreciate any comments and help. > > Regards, > > Luismar > > Luismar Marques Porto > Laboratory for Biological Engineering > Department of Chemical Engineering > The University of Queensland > AUSTRALIA > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > --- Laurence eLinux Pte Ltd From jim_windle at eudoramail.com Sun Sep 9 09:17:05 2001 From: jim_windle at eudoramail.com (Jim Windle) Date: Wed Nov 25 01:01:41 2009 Subject: Power Consumption Message-ID: While it is not directly relevant to Beowulfs, there is an article regarding power consumption in the current issue of "American Scientist: http://www.americanscientist.org/issues/Comsci01/Compsci2001-09.html The author is rebutting assertions that internet based computers account for roughly 10% of electrical power used in the US. In addition to reviewing different sources of power consumption numbers he hooked up a digital power meter to his system. In normal usage he figured it came to 150-170 watts of which 50 watts was for the CPU versus a rating of 400 watts according to the manufacturer and about 100 watts for the monitor. Jim Windle Join 18 million Eudora users by signing up for a free Eudora Web-Mail account at http://www.eudoramail.com From anuradha at gnu.org Sun Sep 9 22:58:03 2001 From: anuradha at gnu.org (Anuradha Ratnaweera) Date: Wed Nov 25 01:01:41 2009 Subject: "Small" motherboards for diskless nodes Message-ID: <20010910115803.A429@bee.lk> Hi, I have gone through the archives to find an answer to this question, but didn't come across something very relevent. We are looking for "mini" motherboards to build a 32 node cluster. Ideally, they should not have PCI/AGP expansion slots and onboard sound. 
But they should have onboard NIC (100 Mbps) and video. Also we will not be using the FDD and IDE controllers. Netwoking booting is necessary. We would probably go for Pentium II or III chips. Is there a significant difference between them and Celeron (and AMD perhaps)? Thanks in advance. Regards and greetings, Anuradha -- Debian GNU/Linux (kernel 2.4.9) A light wife doth make a heavy husband. -- Wm. Shakespeare, "The Merchant of Venice" From timm at fnal.gov Mon Sep 10 08:18:34 2001 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:01:41 2009 Subject: IDE problems Message-ID: Hi everyone... We have a system with Supermicro 370LE motherboard that has 20Gb IDE system disk (primary IDE master) and CD-rom (primary IDE slave). (2xPentium III 1 GHZ). On about six of our 136 nodes we have seen errors like the following: wait_on_bh, CPU 1: irq: 0 [0 0] bh: 1 [1 0] <[c010c289]> <[c0179d91]> <[c017edfb]> <[c01533d4]> <[c0138148]> hda: status timeout: status=0x90 { Busy } hda: drive not ready for command ide0: reset timed-out, status=0x90 hda: status timeout: status=0x90 { Busy } hda: drive not ready for command ide0: reset timed-out, status=0x90 hda: status timeout: status=0x90 { Busy } end_request: I/O error, dev 03:01 (hda), sector 3678384 hda: drive not ready for command EXT2-fs error (device ide0(3,1)): ext2_write_inode: unable to read inode block - inode=304663, block=622597 We are currently running 2.2.19-6.2.1 kernel as it came from Red Hat. ---------------------------------------------------- Now, whenever I have seen errors like this before, it has meant a hardware fault with the disk. But with any of these, we just reboot the system, it does a fsck of the system disk, and everything is fine again. Can anyone give me a clue as to 1) How errors that include an I/O error could mean anything else than a hardware error on the disk? 2) What may be causing these errors? 3) What resources are out there on the net for IDE faq's on Linux 4) If we go to 2.4 kernels is it likely to get better? Thanks Steve Timm ------------------------------------------------------------------ Steven C. Timm (630) 840-8525 timm@fnal.gov http://home.fnal.gov/~timm/ Fermilab Computing Division/Operating Systems Support Scientific Computing Support Group--Computing Farms Operations From justin at cs.duke.edu Mon Sep 10 09:14:07 2001 From: justin at cs.duke.edu (Justin Moore) Date: Wed Nov 25 01:01:41 2009 Subject: IDE problems In-Reply-To: Message-ID: > We have a system with Supermicro 370LE motherboard that > has 20Gb IDE system disk (primary IDE master) and CD-rom (primary > IDE slave). (2xPentium III 1 GHZ). > > [snip ...] > > We are currently running 2.2.19-6.2.1 kernel as it came from Red Hat. There are some known problems with the ServerWorks LE chipset that cause filesystem corruption on some recent kernels. The bottom line is that it appears to be a BIOS bug, and that disabling DMA on your IDE chains will prevent that. A chain of BIOS updates may or may not solve your problem, but I haven't seen anything too encouraging yet. A detailed bug report can be found at http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=38429 Good luck. 
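To make the suggested workaround concrete, a small sketch of checking and disabling DMA with hdparm follows; the device name is assumed, and as the follow-ups below point out, running without DMA may not be acceptable for I/O-heavy jobs.

  # Query the current DMA setting on the system disk (device name assumed)
  hdparm -d /dev/hda
  # Disable DMA as a workaround for the ServerWorks LE corruption
  hdparm -d0 /dev/hda
  # Measure what the change costs in raw read throughput
  hdparm -tT /dev/hda

The setting does not survive a reboot, so it would have to be rerun from an init script on each node.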
-jdm Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu From lindahl at conservativecomputer.com Mon Sep 10 09:28:33 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:41 2009 Subject: i845 stream results Message-ID: <20010910122833.A9196@wumpus.foo> Today companies started releasing Pentium4 systems with the i845 chipset. Anyone have a stream number? Does this chipset do SMP? g From timm at fnal.gov Mon Sep 10 09:38:28 2001 From: timm at fnal.gov (Steven Timm) Date: Wed Nov 25 01:01:41 2009 Subject: IDE problems In-Reply-To: Message-ID: > > We are currently running 2.2.19-6.2.1 kernel as it came from Red Hat. > > There are some known problems with the ServerWorks LE chipset that > cause filesystem corruption on some recent kernels. The bottom line is > that it appears to be a BIOS bug, and that disabling DMA on your IDE > chains will prevent that. A chain of BIOS updates may or may not solve > your problem, but I haven't seen anything too encouraging yet. A detailed > bug report can be found at > > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=38429 > > Good luck. > -jdm > The bug report in question is describing bugs in the 2.4 kernel. Has anyone seen the problems like that in 2.2 kernels besides me?-- Also, turning DMA off is a non-starter for us...we depend on DMA I/O for the application we are running. The vendor has tried three types of hard disk on the nodes so far... Maxtor, Seagate (which didn't show hda errors but did show massive filesystem corruption as described in the bug report above) and now Western Digital, where there are filesystem errors that hang the machine but the system usually comes back fine after reboot and fsck. They are now trying to go to IBM drives (which are working fine as data drives on these systems, albeit on the secondary IDE bus.) Steve Timm From joelja at darkwing.uoregon.edu Mon Sep 10 09:53:19 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:41 2009 Subject: i845 stream results In-Reply-To: <20010910122833.A9196@wumpus.foo> Message-ID: On Mon, 10 Sep 2001, Greg Lindahl wrote: > Today companies started releasing Pentium4 systems with the i845 > chipset. Anyone have a stream number? I'm prepared to be underwhelmed given the pc133 dram support... > Does this chipset do SMP? thing of the 845 as the 815 for p4's. the 860 is still your only choice for dual p4's > g > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From justin at cs.duke.edu Mon Sep 10 10:07:55 2001 From: justin at cs.duke.edu (Justin Moore) Date: Wed Nov 25 01:01:41 2009 Subject: IDE problems In-Reply-To: Message-ID: > > > We are currently running 2.2.19-6.2.1 kernel as it came from Red Hat. > > > > There are some known problems with the ServerWorks LE chipset that > > cause filesystem corruption on some recent kernels. 
The bottom line is > > that it appears to be a BIOS bug, and that disabling DMA on your IDE > > chains will prevent that. A chain of BIOS updates may or may not solve > > your problem, but I haven't seen anything too encouraging yet. A detailed > > bug report can be found at > > > > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=38429 > > > > Good luck. > > -jdm > > > > The bug report in question is describing bugs in the 2.4 kernel. The next-to-last comment describes your problem with a Tyan board that uses the same chipset, running the 2.2.19 kernel from RedHat. > Has anyone seen the problems like that in 2.2 kernels besides me?-- > Also, turning DMA off is a non-starter for us...we depend on DMA I/O > for the application we are running. > > The vendor has tried three types of hard disk on the nodes so far... > Maxtor, Seagate (which didn't show hda errors but did > show massive filesystem corruption as described in the bug report above) > and now Western Digital, where there are filesystem > errors that hang the machine but the system usually comes back fine > after reboot and fsck. > They are now trying to go to IBM drives (which are working fine > as data drives on these systems, albeit on the secondary IDE bus.) Short of getting PCI IDE controllers I'm not quite sure what to suggest. I'm almost positive[1] that it really is a BIOS problem you're running into, and that's why your problem is hard-drive independent. -jdm [1] Positive (adj): Mistaken at the top of one's voice. - "The Devil's Dictionary", Ambrose Bierce Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu From joelja at darkwing.uoregon.edu Mon Sep 10 10:11:15 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:41 2009 Subject: IDE problems In-Reply-To: Message-ID: On Mon, 10 Sep 2001, Steven Timm wrote: > > > We are currently running 2.2.19-6.2.1 kernel as it came from Red Hat. > > > > There are some known problems with the ServerWorks LE chipset that > > cause filesystem corruption on some recent kernels. The bottom line is > > that it appears to be a BIOS bug, and that disabling DMA on your IDE > > chains will prevent that. A chain of BIOS updates may or may not solve > > your problem, but I haven't seen anything too encouraging yet. A detailed > > bug report can be found at > > > > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=38429 > > > > Good luck. > > -jdm > > > > The bug report in question is describing bugs in the 2.4 kernel. > Has anyone seen the problems like that in 2.2 kernels besides me?-- > Also, turning DMA off is a non-starter for us...we depend on DMA I/O > for the application we are running. the distro kernel from kernel for redhat, especially the smp and enterprise ones are pretty scary in terms of how they diverge from 2.2.19... I'd encourage you run the same boxes on 2.4.8 and/or 2.2.20pre9, all my serverworks boxes with ide in use (we have some with no ide at all) are currently running 2.4.8 or later... > The vendor has tried three types of hard disk on the nodes so far... > Maxtor, Seagate (which didn't show hda errors but did > show massive filesystem corruption as described in the bug report above) > and now Western Digital, where there are filesystem > errors that hang the machine but the system usually comes back fine > after reboot and fsck. > They are now trying to go to IBM drives (which are working fine > as data drives on these systems, albeit on the secondary IDE bus.) 
> > Steve Timm > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From mlrecv at yahoo.com Mon Sep 10 11:25:17 2001 From: mlrecv at yahoo.com (Zhifeng F. Chen) Date: Wed Nov 25 01:01:41 2009 Subject: Promise IDE controller and SBT2 Motherboard Message-ID: Hi, I've problems with Promise IDE controller and SBT2 motherboard. Does anyone know what else IDE controller I can use with SBT2? ~~~~~~~~~~~~~~~~~~~~~~~~~ Zhifeng F. Chen _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com From alvin at Maggie.Linux-Consulting.com Mon Sep 10 11:36:10 2001 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:01:41 2009 Subject: Promise IDE controller and SBT2 Motherboard In-Reply-To: Message-ID: hi ya zhi what kind of problems ??? i had problems with the promise Tx2..but the ultra100 works fine ( needed to set the DMA bit /usr/sbin/hdparm -d1 /dev/hdXX other potential options.... hdparm -m16 -c1 -d1 -a8 /dev/hdXX otherwise... your other ide controllers... - adpatec, 3ware, etc have fun alvin On Mon, 10 Sep 2001, Zhifeng F. Chen wrote: > Hi, > I've problems with Promise IDE controller and SBT2 motherboard. Does > anyone know what else IDE controller I can use with SBT2? > From math at velocet.ca Mon Sep 10 11:51:26 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:41 2009 Subject: Anyone using PC-Chips motherboard? In-Reply-To: <20010901175552.77163.qmail@web13705.mail.yahoo.com>; from opengeometry@yahoo.ca on Sat, Sep 01, 2001 at 01:55:52PM -0400 References: <20010901175552.77163.qmail@web13705.mail.yahoo.com> Message-ID: <20010910145125.E45388@velocet.ca> On Sat, Sep 01, 2001 at 01:55:52PM -0400, William Park's all... > I would be grateful if I can get a feedback (good or bad) on PC-Chips > motherboards? I'm looking at using them for cheap diskless nodes. (These are all Athlon boards, note. We're using 1.333Ghz Tbirds) I had a bunch of M812 boards which have onboard AGP and a PcChips ethernet adapter. However, the M817 LMR board came out which has an onboard RTL 8139. Unfortunately it has no onboard video which is great for diskless nodes, allowing them to boot without an extra videocard and reducing the vertical profile. However, it has DDR SDRAM DIMM as well as regular 133Mhz DIMM slots, though you can only use one at a time. This is good for our expansion path actually, and besides, I havent seen a board that has DDR and onboard NIC elsewere. The great thing about the first 12 I got of these boards was that they boot without a video card (freeBSD in serial console mode works great). For diskless the M817 LMR is ideal due to the onboard RTL8139 and PXE. 
It also has RPL booting (like the M812, but the M812 has no PXE), but FreeBSD doesn't have an RPL daemon, and because of the nature of the packets it has to generate, the Linux RPL daemon doesn't work under FreeBSD. Besides, RPL is a nasty hacque. PXE is much more standard and works great. I can boot up new nodes almost as fast as I plug them in (we've played around with 'overlaying' directories of unique files per node on the filesystem image - kinda neat stuff). So I modify 3-4 files (with a bash script, actually :) and the node's up and ready to go.

No major problems really; a couple of boards were page faulting immediately after boot, and they were returned and the new ones work great. My only complaint is that after node 12 we need a video card to boot these things now :( I don't know what they did to the BIOS, but I wish they'd put it back. I'm going to try and contact their engineers to find out if I can flash back down to an older rev or something to try and get it to work...

If anyone has any experience with the latter (the M817), let me know.

/kc

>
> --William
>
> _______________________________________________________
> Do You Yahoo!?
> Get your free @yahoo.ca address at http://mail.yahoo.ca
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Ken Chase, math@velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA

From hahn at physics.mcmaster.ca  Mon Sep 10 12:15:03 2001
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:01:41 2009
Subject: i845 stream results
In-Reply-To: <20010910122833.A9196@wumpus.foo>
Message-ID: 

> Today companies started releasing Pentium4 systems with the i845
> chipset. Anyone have a stream number? Does this chipset do SMP?

http://www.tomshardware.com/mainboard/01q3/010702/images/book1_26516_image001.gif

650 MB/s (Sandra pseudo-stream): good for PC133, but hardly impressive. (Sandra stream numbers seem to be consistently lower than Unix ones.)

From anuradha at gnu.org  Wed Sep 12 00:15:25 2001
From: anuradha at gnu.org (Anuradha Ratnaweera)
Date: Wed Nov 25 01:01:41 2009
Subject: Good 100 Mbps switches
Message-ID: <20010912131525.A745@bee.lk>

We are going to set up a Beowulf cluster with 32 nodes using 100 Mbps Ethernet. Please recommend good (and of course cheap) switches that can be used for this purpose (we need more than 32 ports).

Regards,

Anuradha

-- 
Debian GNU/Linux (kernel 2.4.9)

Q:	Would you like to see the WINE list?
A:	What's on it, anything expensive?
Q:	No, just Solitaire and MineSweeper for now, but the WINE is free.
		-- Kevin M. Bealer, about the WINdows Emulator

From carlos at megatonmonkey.net  Wed Sep 12 06:04:21 2001
From: carlos at megatonmonkey.net (Carlos O'Donell Jr.)
Date: Wed Nov 25 01:01:41 2009
Subject: Mulling over MTBF.
Message-ID: <20010912090421.A7255@megatonmonkey.net>

Beowulf,

Following a recent trend, I have been discussing with various colleagues the question of power usage and power saving in clusters. At first, power saving through node sleeping or drive spin-down seemed like a good idea. Though, I am wary about the following effects:

- Does spindown/spinup on common IDE drives affect MTBF? (Any drive, for that matter.)

- Do node sleeping/wakeup cycles affect MTBF for voltage supplies on the motherboard? (Or other components, through relaxation and transients.)

I haven't seen any deep discussion about this.
Though I may want to turn my eye towards a few electrical/computer engineering papers on the subject. Cheers, Carlos O'Donell Jr. ----------------------------- Computer(Engineering/Science) University of Western Ontario http://www.uwo.ca ----------------------------- From becker at scyld.com Wed Sep 12 08:38:25 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:41 2009 Subject: Mulling over MTBF. In-Reply-To: <20010912090421.A7255@megatonmonkey.net> Message-ID: On Wed, 12 Sep 2001, Carlos O'Donell Jr. wrote: > On a recent trend, I have been discussing with various > colleagues the aspect of power usage and power saving > in clusters. I'm guessing that you have already looked at the Scyld features for soft-power-down and Wake-On-LAN (WOL) wake-up. Our system has the advantage that we identify nodes by the station address, and thus already have the information needed for WOL. > At first, power saving, through node sleeping or drive > spin down, seemed like a good idea. > > Though, I am wary about the following effects: > > - Does spindown/spinup on common IDE drives effect MTBF? Yes. Typical disk drive ratings put the spin-up count equivalent to about 9 hours of the MTBF. Those numbers are not directly comparable, but it's a useful number to look at. Laptop drives are typically set to spin down after a few minutes of idle time, both because power savings are much more important and because a stopped drive is more resistant to shocks. > - Does node sleeping/wakeup cycles effect MTBF for voltage > supplies on the motherboard? (Or other componenets, through > relaxation and transients). Not obviously: the HV side of most ATX power supplies is continuously powered, so there is no inrush current shock coming out of stand-by mode. The thermal stress from the varying load is likely the dominant effect. We have one batch of ATX power supplies that are very likely to fail in the brown-out conditions around a power failure. Those same supplies have not failed when the machine is in stand-by mode during power failures. Yes, the real solution is to get different power supplies, however this is an example of soft-power-off increasing the MTBF of the system. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From eric at fnordsystems.com Wed Sep 12 11:24:41 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:41 2009 Subject: Good 100 Mbps switches In-Reply-To: <20010912131525.A745@bee.lk> Message-ID: I've had very good results with the HP Procurve 4000M series... http://www.hp.com/rnd/products/switches/switch4000/overview.htm http://www.hp.com/rnd/products/switches/switch4000/large.htm It comes with 40 10/100Base-T ports for $1499 from many online stores, and is expandable to 80 ports total. The only downside is the 8-port modules are rather expensive ($350 each) if you need to expand beyond 40 ports. Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 fax: +1-360-647-0752 -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Anuradha Ratnaweera Sent: Wednesday, September 12, 2001 12:15 AM To: beowulf@beowulf.org Subject: Good 100 Mbps switches We are going to setup a beowulf cluster with 32 nodes using 100 Mbps ethernet. 
Please recommand good (and of course cheap) switches that can be used for this purpose (we need more than 32 ports). Regards, Anuradha -- Debian GNU/Linux (kernel 2.4.9) Q: Would you like to see the WINE list? A: What's on it, anything expensive? Q: No, just Solitaire and MineSweeper for now, but the WINE is free. -- Kevin M. Bealer, about the WINdows Emulator _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Wed Sep 12 11:29:20 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:41 2009 Subject: Mulling over MTBF. In-Reply-To: <20010912090421.A7255@megatonmonkey.net> Message-ID: My thoughts on this: It's always better to leave a hard drive spinning, rather than power it up/down... it puts much more stress on the mechanism to spin up and down constantly. This could be compared to the way airliners are rated in "cycles", ie: takeoffs/landings, where all the mechanical stress occurs. Unlike hard drives, airplanes can't remain in the air all the time :) Node waking/sleeping doesn't seem to affect much, but it can be somewhat unreliable waking up again unless you have quality power supplies. For regular ATX desktop/tower supplies I prefer Enermax, Herolchi, and Enlight. They're all Taiwanese brands. Eric Kuhnke eric@fnordsystems.com www.fnordsystems.com -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Carlos O'Donell Jr. Sent: Wednesday, September 12, 2001 6:04 AM To: beowulf@beowulf.org Subject: Mulling over MTBF. Beowulf, On a recent trend, I have been discussing with various colleagues the aspect of power usage and power saving in clusters. At first, power saving, through node sleeping or drive spin down, seemed like a good idea. Though, I am wary about the following effects: - Does spindown/spinup on common IDE drives effect MTBF? (Any drive for the matter) - Does node sleeping/wakeup cycles effect MTBF for voltage supplies on the motherboard? (Or other componenets, through relaxation and transients). I haven't seen any deep discussion about this. Though I may want to turn my eye towards a few electrical/computer engineering papers on the subject. Cheers, Carlos O'Donell Jr. ----------------------------- Computer(Engineering/Science) University of Western Ontario http://www.uwo.ca ----------------------------- _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at conservativecomputer.com Wed Sep 12 11:42:20 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:41 2009 Subject: Mulling over MTBF. In-Reply-To: <20010912090421.A7255@megatonmonkey.net>; from carlos@megatonmonkey.net on Wed, Sep 12, 2001 at 09:04:21AM -0400 References: <20010912090421.A7255@megatonmonkey.net> Message-ID: <20010912144220.B12517@wumpus.foo> On Wed, Sep 12, 2001 at 09:04:21AM -0400, Carlos O'Donell Jr. wrote: > - Does spindown/spinup on common IDE drives effect MTBF? > (Any drive for the matter) A typical drive is rated for 30,000 to 50,000 start/stop cycles. But that's the lifetime; I don't think there are any published numbers about MTBF related to start/stop cycles. 
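If you do want to experiment with idle spin-down anyway, the timer can be set per drive from userspace with hdparm rather than in the BIOS. The -S encoding is a bit odd and can vary with drive firmware (check the man page for your version), but roughly:

    /sbin/hdparm -S 0 /dev/hda      # disable the standby (spin-down) timer
    /sbin/hdparm -S 240 /dev/hda    # spin down after ~20 minutes idle (units of 5 s up to 240)
    /sbin/hdparm -S 242 /dev/hda    # ~1 hour (values above 240 count in 30-minute steps)

and hdparm -y will put a drive into standby immediately if you just want to measure the power difference. Device names here are only examples.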
greg From lindahl at conservativecomputer.com Wed Sep 12 12:50:43 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:41 2009 Subject: Good 100 Mbps switches In-Reply-To: ; from eric@fnordsystems.com on Wed, Sep 12, 2001 at 11:24:41AM -0700 References: <20010912131525.A745@bee.lk> Message-ID: <20010912155043.A12686@wumpus.foo> On Wed, Sep 12, 2001 at 11:24:41AM -0700, Eric Kuhnke wrote: > It comes with 40 10/100Base-T ports for $1499 from many online stores, and > is expandable to 80 ports total. The only downside is the 8-port modules > are rather expensive ($350 each) if you need to expand beyond 40 ports. It's usually cheaper to buy cards by buying another half-populated 4000M. Then you get a spare chassis and power supply for free! greg From Karl.Bellve at umassmed.edu Wed Sep 12 13:17:59 2001 From: Karl.Bellve at umassmed.edu (Karl Bellve) Date: Wed Nov 25 01:01:41 2009 Subject: Copper Gigabit ethernet and Alpha support Message-ID: <3B9FC2F7.FF4C7607@umassmed.edu> What is the best copper based Gigabit card which also has driver support for Alpha based systems? -- Cheers, Karl Bellve, Ph.D. ICQ # 13956200 Biomedical Imaging Group TLCA# 7938 University of Massachusetts Email: Karl.Bellve@umassmed.edu Phone: (508) 856-6514 Fax: (508) 856-1840 PGP Public key: finger kdb@molmed.umassmed.edu From mail at thomas-boehme.de Wed Sep 12 13:22:34 2001 From: mail at thomas-boehme.de (Thomas R Boehme) Date: Wed Nov 25 01:01:41 2009 Subject: Good 100 Mbps switches Message-ID: Just get a second switch instead of 5 modules. It's cheaper and has the added benefit that you can put both power supplies (and of course the modules) into one case to have redundant power. This gives you an 80-port switch for ~3000$. We also have two Procurve 4000M and had pretty good results as well. Bye, Thommy -----Original Message----- From: Eric Kuhnke [mailto:eric@fnordsystems.com] Sent: Wednesday, September 12, 2001 1:25 PM To: beowulf@beowulf.org Subject: RE: Good 100 Mbps switches I've had very good results with the HP Procurve 4000M series... http://www.hp.com/rnd/products/switches/switch4000/overview.htm http://www.hp.com/rnd/products/switches/switch4000/large.htm It comes with 40 10/100Base-T ports for $1499 from many online stores, and is expandable to 80 ports total. The only downside is the 8-port modules are rather expensive ($350 each) if you need to expand beyond 40 ports. Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 fax: +1-360-647-0752 -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Anuradha Ratnaweera Sent: Wednesday, September 12, 2001 12:15 AM To: beowulf@beowulf.org Subject: Good 100 Mbps switches We are going to setup a beowulf cluster with 32 nodes using 100 Mbps ethernet. Please recommand good (and of course cheap) switches that can be used for this purpose (we need more than 32 ports). Regards, Anuradha -- Debian GNU/Linux (kernel 2.4.9) Q: Would you like to see the WINE list? A: What's on it, anything expensive? Q: No, just Solitaire and MineSweeper for now, but the WINE is free. -- Kevin M. 
Bealer, about the WINdows Emulator _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Wed Sep 12 15:22:36 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:41 2009 Subject: Copper Gigabit ethernet and Alpha support In-Reply-To: <3B9FC2F7.FF4C7607@umassmed.edu> Message-ID: On Wed, 12 Sep 2001, Karl Bellve wrote: > What is the best copper based Gigabit card which also has driver support > for Alpha based systems? Our 'ns820.c' and 'intel-gige.c' driver are tested with our Alpha product. The Syskonnect copper board reportedly works with the Alpha. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From Eugene.Leitl at lrz.uni-muenchen.de Thu Sep 13 08:58:56 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:41 2009 Subject: Booth 615 at ALS (fwd) Message-ID: -- Eugen* Leitl leitl ______________________________________________________________ ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Thu, 13 Sep 2001 09:55:44 -0600 (MDT) From: Ronald G Minnich To: linuxbios@lanl.gov Subject: Booth 615 at ALS For all of you going to the Atlanta Linux Showcase in November, we will have a booth (#615) with a linuxbios cluster. We invite all of you to stop by and say hello. ron From Eugene.Leitl at lrz.uni-muenchen.de Thu Sep 13 08:59:42 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:41 2009 Subject: CCL:Cluster Boards (fwd) Message-ID: -- Eugen* Leitl leitl ______________________________________________________________ ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Thu, 13 Sep 2001 08:44:48 -0500 From: Mauricio Esguerra N. To: chemistry@ccl.net Subject: CCL:Cluster Boards Hello again, This time I would like to ask if any of you with experience building clusters could tell me if it's better to build a cluster using (for say a 6 node cluster) 3 dual motherboards or 6 single motherboards. I am specifically interested in knowing the price-performance relation between these two cases. 
Thanking you for your kind help,

################################################################
Mauricio Esguerra Neira
Chemist
Grupo de Química Teórica
Universidad Nacional de Colombia
email: esguerra@mentecolectiva.com
tel: 57-1-3165000 ext18323
################################################################

-= This is automatically added to each message by mailing script =-
CHEMISTRY@ccl.net -- To Everybody  |  CHEMISTRY-REQUEST@ccl.net -- To Admins
MAILSERV@ccl.net -- HELP CHEMISTRY or HELP SEARCH
CHEMISTRY-SEARCH@ccl.net -- archive search  |  Gopher: gopher.ccl.net 70
Ftp: ftp.ccl.net  |  WWW: http://www.ccl.net/chemistry/  |  Jan: jkl@osc.edu

From bruno_richard at hp.com  Mon Sep 10 00:45:01 2001
From: bruno_richard at hp.com (RICHARD,BRUNO (HP-France,ex1))
Date: Wed Nov 25 01:01:41 2009
Subject: Paper showing Linpack scalability of mainstream clusters
Message-ID: 

Available from :

I-Cluster: Reaching TOP500 performance using mainstream hardware
By B. Richard (hp Laboratories Grenoble), P. Augerat, N. Maillard, S. Derr, S. Martin, C. Robert (ID Laboratory)

Abstract: A common topic for PC clusters is the use of mainstream instead of dedicated hardware, i.e., using standard desktop PCs and standard network connectivity, with technology to organize them so that they can be used as a single computing entity. Current work in this "off-the-shelf cluster" domain usually focuses on how to reach a high-availability infrastructure, on how to efficiently balance the work between nodes of such clusters, or on how to get the most computing power for loosely-coupled (large-grained) problems. hp Labs Grenoble teamed up with INRIA Rhône-Alpes to build a cluster out of 225 standard hp e-PCs interconnected by standard Ethernet, with the objective of getting the highest computational performance and scaling from the simplest desktop PC to the most powerful computers in the world. As an additional constraint, we decided to use a cluster that models a modern enterprise network, using standard machines interconnected through standard Ethernet connectivity. This paper describes the issues and challenges we had to overcome in order to reach the 385th rank in the TOP500 list of the most powerful supercomputers in the world on June 21st, 2001, being the first mainstream cluster ever to enter the TOP500. We also provide some details about the software and middleware tuning we have done, as well as the impact on performance of different factors such as the network topology and infrastructure hardware.

From j.a.white at larc.nasa.gov  Thu Sep 13 10:46:56 2001
From: j.a.white at larc.nasa.gov (Jeffery A. White)
Date: Wed Nov 25 01:01:41 2009
Subject: mpich question
Message-ID: <3BA0F110.3932DD17@larc.nasa.gov>

Dear group,

I am trying to figure out how to use the -p4pg option in mpirun and I am experiencing some difficulties. My cluster configuration is as follows:

node0       : Dual-processor Supermicro Super 370DLE, 1 GHz Pentium III, Red Hat Linux 7.1, kernel 2.4.2-2smp, mpich 1.2.1
nodes 1-18  : Compaq XP1000, 667 MHz DEC Alpha 21264, Red Hat Linux 7.0, kernel 2.4.2, mpich 1.2.1
nodes 19-34 : Microway Screamer, 667 MHz DEC Alpha 21164, Red Hat Linux 7.0, kernel 2.4.2, mpich 1.2.1

The heterogeneous nature of the machine has made me migrate from using the -machinefile option to the -p4pg option.
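As far as I can tell from the mpich documentation, each line of a ch_p4 procgroup file has the form

    <hostname> <nprocs> <full path to executable>

where the first line describes the host that runs the rank-0 (master) process and its count is the number of *additional* processes to start there (hence the 0), and each following line adds processes on other hosts; the files in the runs below follow that pattern. Note also that mpirun does not seem to count the entries in a -p4pg file, which appears to be why the -v banner in those runs reports "1 ... processors" even when two processes are started.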
I have been trying to get a 2 processor job to run while submitting the mpirun command from node0 (-nolocal is specified) and using either nodes 1 and 2 or nodes 2 and 3. If I use the -machinefile approach I am able to run on any homogeneous combination of nodes. However, if I use the -p4pg approach I have not been able to run unless my mpi master node is node1. As long as node1 is the mpi master node then I can use any one of nodes 2 through 18 as the 2nd processor. THe following 4 runs illustrates what I have gotten to work as well as what doesn't work (and the subsequent error message). Runs 1, 2 and 3 worked and run 4 failed. 1) When submitting from node0 using the -machinefile option to run on nodes 1 and 2 using mpirun configured as: mpirun -v -keep_pg -nolocal -np 2 -machinefile vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the machinefile file vulcan.hosts contains: node1 node2 the PIXXXX file created contains: node1 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node2 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v option reports running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver on 2 LINUX ch_p4 processors Created /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI10802 and the program executes successfully 2) When submitting from node0 using the -p4pg option to run on nodes 1 and 2 using mpirun configured as: mpirun -v -nolocal -p4pg vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the p4pg file vulcan.hosts contains: node1 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node2 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver and the -v options reports running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors and the program executes successfully 3) When submitting from node0 using the -machinefile option to run on nodes 2 and 3 using mpirun configured as: mpirun -v -keep_pg -nolocal -np 2 -machinefile vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the machinefile file vulcan.hosts contains: node2 node3 the PIXXXX file created contains: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v options report running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver on 2 LINUX ch_p4 processors Created /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI11592 and the program executes successfully 4) When submitting from node0 using the -p4pg option to run on nodes 2 and 3 using mpirun configured as: mpirun -v -nolocal -p4pg vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the p4pg file vulcan.hosts contains: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v options report running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors and the following error message is genreated rm_10957: p4_error: rm_start: net_conn_to_listener failed: 34133 Thanks for your help, Jeffery A. White email : j.a.white@larc.nasa.gov Phone : (757) 864-6882 ; Fax : (757) 864-6243 URL : http://hapb-www.larc.nasa.gov/~jawhite/ From cfernandes at dee.cefet-ma.br Mon Sep 10 13:20:48 2001 From: cfernandes at dee.cefet-ma.br (Claudio A. C. 
Fernandes)
Date: Wed Nov 25 01:01:41 2009
Subject: beostatus
Message-ID: <01091023204800.00756@localhost.localdomain>

Dear lists,

I'm a Brazilian student and I have been working with a Beowulf cluster for about six months. I have a cluster with 16 slaves (PIII 800 MHz, 128 MB, 20 GB HD, 2 tulip Ethernet cards) and 1 master (PIII 800 MHz, 256 MB, 120 GB HD, 3 tulip Ethernet cards). I'm using channel bonding and Scyld Linux 27bz-7 on the master and the slaves, but I installed Scyld manually on my system (made the partitions, installed the packages, NFS, bproc, MPI, ...), and there is some problem in my cluster. beostatus and beostat aren't working well: both only show me the node status (up, down, halt, ...) but don't show the CPU, network, disk or swap statistics. I think it is something to do with bproc, but I have used several bproc examples and they seem to work well. Can anybody help me solve this question?

Another question: is it possible to have two masters in a cluster working with bproc?

Thanks in advance

Claudio Fernandes
Universidade Federal do Rio Grande do Norte - Brazil
mail : cfernandes@elo.com.br , cfernandes@dee.cefet-ma.br and ccosta@leca.ufrn.br

From msuarez at zeus.ccu.umich.mx  Tue Sep 11 20:44:03 2001
From: msuarez at zeus.ccu.umich.mx (Mario-César Suárez Arriaga)
Date: Wed Nov 25 01:01:41 2009
Subject: About Berlin 2002
Message-ID: <005401c13b3d$443125c0$ac01d894@msuarez>

Dear Sir or Madam,

Do the papers for the next congress on Clusters & Grids, to be held in Berlin in 2002, have to be ready by the first of November, or do you need just an abstract? I will appreciate your prompt response.

Sincerely,

Mario Cesar Suarez Arriaga
Facultad de Ciencias-UMSNH
Universidad Michoacana

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20010911/bc7fa2fc/attachment.html

From carlos at megatonmonkey.net  Sun Sep 9 13:12:13 2001
From: carlos at megatonmonkey.net (Carlos O'Donell Jr.)
Date: Wed Nov 25 01:01:41 2009
Subject: Power Consumption
In-Reply-To: ; from jim_windle@eudoramail.com on Sun, Sep 09, 2001 at 12:17:05PM -0400
References: 
Message-ID: <20010909161213.F9613@megatonmonkey.net>

> http://www.americanscientist.org/issues/Comsci01/Compsci2001-09.html

While we are on power issues, I was just mulling over the following questions:

Should one set an idle power-down timeout for all the IDE drives in the cluster? Does spinning down an IDE drive reduce MTBF? How much stress is really incurred during spinup and/or spindown?

The logistics of this become important, especially when you have hundreds of IDE drives deployed in a large cluster :)

You save power if the drive is spun down, but does the spindown/spinup affect MTBF? If it impacts MTBF negatively, does the power saving outweigh the cost?

Cheers,
Carlos.

From chris at chris-wilson.co.uk  Thu Sep 13 14:14:00 2001
From: chris at chris-wilson.co.uk (Chris Wilson)
Date: Wed Nov 25 01:01:41 2009
Subject: HOWTO discriminate switches?
Message-ID: <20010913221400.A1825@florence.intimate.mysticnet.org.uk>

The situation is that we are trying to decide on the right switch for use in an 80-node cluster. The first option is to use a single switch with 160+ Fast Ethernet ports. The second is to use eight 10-port gigabit switches plus the current gigabit switch for routing.

The codes to be run on the cluster are a mixture of the embarrassingly parallel, the bandwidth-hungry and the latency-sensitive -- which complicates matters. :)

What I need is guidance on how to measure (and indeed what to be measuring!) the switches to discriminate between the options. Presumably netperf is a good starting point, but pointers to resources on switch comparisons would be useful.

TIA.

-- 
Chris Wilson {^_^} spam to bit.bucket@dev.null
Anything that, in happening, causes itself to happen again, happens again.
		-- THHGTTG

From lindahl at conservativecomputer.com  Thu Sep 13 14:40:54 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:41 2009
Subject: Paper showing Linpack scalability of mainstream clusters
In-Reply-To: ; from bruno_richard@hp.com on Mon, Sep 10, 2001 at 09:45:01AM +0200
References: 
Message-ID: <20010913174054.A14364@wumpus.foo>

On Mon, Sep 10, 2001 at 09:45:01AM +0200, RICHARD,BRUNO (HP-France,ex1) wrote:

> This paper describes
> the issues and challenges we had to overcome in order to reach the 385th
> rank in the TOP500 list of most powerful supercomputers in the world on June
> 21st, 2001, being the first mainstream cluster to enter TOP500 ever.

You mean all those other commodity clusters on the Top500 for years don't count?

g

From gkogan at students.uiuc.edu  Fri Sep 14 09:15:25 2001
From: gkogan at students.uiuc.edu (german kogan)
Date: Wed Nov 25 01:01:41 2009
Subject: Problems running MPI
Message-ID: 

Hi.

I installed Scyld on our master node. So far I have got one node up, to test it. I am trying to run a simple MPI program where each process just says hi and gives its rank. It works when I give mpirun -np 1 "program name", but it stops working when I try running more than one process. It gives me the following error:

p0_12411: p4_error: net_create_slave: host not a bproc node: -3
p4_error: latest msg from perror: Success

I tried inserting the off node as the first node in BeoSetup, but it does not work. Any ideas?

Thank you

From zouguangxian at hotmail.com  Fri Sep 14 18:25:13 2001
From: zouguangxian at hotmail.com (Zou Guangxian)
Date: Wed Nov 25 01:01:41 2009
Subject: has scyld been shut down?
Message-ID: 

Hi,

Why can I not connect to www.scyld.com? Do you have the same problem? Thank you. :)

regards
weck

_________________________________________________________________
Send and receive e-mail for free at MSN Hotmail: http://www.hotmail.com/cn

From qiang at tammy.harvard.edu  Sat Sep 15 10:44:33 2001
From: qiang at tammy.harvard.edu (Qiang Cui)
Date: Wed Nov 25 01:01:41 2009
Subject: autorestart
Message-ID: 

Hi, folks - sorry if you received multiple copies of this message; we are having some problems with the mail server and I want to make sure this has been posted:

______________

I have been encountering a strange situation - my new Linux box has been rebooting itself spontaneously, rather randomly. I thought it was related to some strange cron jobs, but it still happens after I disabled cron. Sometimes it happens during a Netscape session, sometimes it just happens without any CPU-intensive jobs running....

Any suggestions? Could this be related to a hardware problem, or some bug/security feature related to Red Hat 7.1? Where do I even begin to solve this problem?

Thanks!

___________________________________________________________________________
Qiang Cui Dept. of Chem. _ __..-;''`--/'/ /.',-`-. Harvard Univ.
(`/' ` | \ \ \\ / / / / .-'/`,_ 12 Oxford St., /'`\ \ | \ | \| // // / -.,/_,'-, Cambridge, MA 02138 /<7' ; \ \ | ; ||/ /| | \/ |`-/,/-.,_,/') (617)-495-8997 / _.-, `,-\,__| _-| / \ \/|_/ | '-/.;.\' (617)-495-1775 `-` f/ ; / __/ \__ `/ |__/ | Fax: (617)-496-3204 `-' | -| =|\_ \ |-' | (617)-496-4793 __/ /_..-' ` ),' // ((__.-'((___..-'' \__.' http://yuri.harvard.edu/~qiang/wisc/research.html http://yuri.harvard.edu/~qiang 32 Whites Ave. #4408 Watertown, MA 02472 (617)-926-6027 __________________________________________________________________________ From pdiaz88 at terra.es Sat Sep 15 12:02:51 2001 From: pdiaz88 at terra.es (Pedro =?iso-8859-1?q?D=EDaz=20Jim=E9nez?=) Date: Wed Nov 25 01:01:41 2009 Subject: Cluster FAQ Message-ID: <01091519025104.03127@duero> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello All, Rajkumar Buyya and I have been preparing for a while a small FAQ about clusters. It can be found at http://planetcluster.org/clusterfaq.html Feel free to suggest new questions, or comment some of the existing ones. Feel free also to replicate this FAQ on your sites (a more suitable for replication version will appear soon) as long as you add a note refering to the original URL (for sync. reasons) Best Regards Pedro P.D.: In other news: IBM have announced the Linux Cluster Starter Kit; actually based on the CSM software with some GUI enhancements. More info here: http://planetcluster.org/article.php?sid=54&mode=thread&order=0 Has anyone had experiences with this software?. IBM has an evaluation version for clusters with up to 6 nodes - -- /* * Pedro Diaz Jimenez: pdiaz88@terra.es, pdiaz@acm.asoc.fi.upm.es * * GPG KeyID: E118C651 * Fingerprint: 1FD9 163B 649C DDDC 422D 5E82 9EEE 777D E118 C65 * * http://planetcluster.org * Clustering & H.P.C. news and documentation * */ - -- Atlee is a very modest man. And with reason. -- Winston Churchill -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.4 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7o6Xcnu53feEYxlERAqaIAJ4hm/aLKPlyL1XyuUVtIxTtgureVgCgrB41 1eKgI9eSz/FmtbGE8iJBN7A= =V0JE -----END PGP SIGNATURE----- From hahn at physics.mcmaster.ca Fri Sep 14 08:14:25 2001 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:41 2009 Subject: Power Consumption In-Reply-To: <20010909161213.F9613@megatonmonkey.net> Message-ID: > Should, one set an idle power down timeout for all your IDE > drives in the cluster? I don't think so, though it might make sense in certain special cases. > Does spinning down an IDE drive reduce MTBF? MTBF is usually quoted for power-on hours; merely spinning down the platters probably doesn't effect this at all, since all the electronics are powerd-on. obviously, MTBF really should be given for the whole range of activities: you'd certainly expect a disk to fail sooner if you keep seeking between the same two tracks, for instance. > How much stress is really incured during spinup and/or spindown? wherever I've seen start/stop cycles rated, the specs offer O(50K) cycles presumably during the warranty period (3-5 years). that's only 45 per day for a 3 year lifespan! > The logistics of this becomes important, especially when you have > hundreds of IDE drives deployed in a large cluster :) most ide drives idle at around 5W; I'm not sure power is a serious argument here. in short, I'd say that for reliability reasons, you wouldn't want to cycle drives more than a few times a day. saving ~4W doesn't seem like a big deal to me, even if you have hundreds of drives. 
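to put a rough number on it: 200 drives x 4W is 800W around the clock, which is roughly 0.8 kW x 8760 h = ~7000 kWh/year; at typical utility rates that's on the order of a few hundred dollars a year, plus whatever it costs to pump the same heat back out of the room - real money, but small next to replacing drives that die early from being cycled.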
otoh, there are lots of windows machines out there that default to spinning down after an hour or so... regards, mark hahn. From cozzi at hertz.rad.nd.edu Fri Sep 14 13:33:09 2001 From: cozzi at hertz.rad.nd.edu (Marc Cozzi) Date: Wed Nov 25 01:01:41 2009 Subject: PGI and HPL benchmarking Message-ID: I am trying to compile the High Performance Computing Linpack Benchmark (HPL) software on a cluster that has Portland (PGI) 3.2 compiler. The PGI compiler installs it's own BLAS routines so I would like to use those. I modify one of the make files, such as, Make.Linux_PII_FBLAS but still get errors during the compile. Has anyone successfully built and run this benchmark with PGI compilers. If so, could you please send me one of the make files you used? Many thanks Marc Univ. of Notre Dame From lange at informatik.uni-koeln.de Sun Sep 16 04:30:09 2001 From: lange at informatik.uni-koeln.de (Thomas Lange) Date: Wed Nov 25 01:01:41 2009 Subject: FAI (fully automatic installation) 2.2 released, new Beowulf support Message-ID: <15268.36161.865232.109957@informatik.uni-koeln.de> After several weeks of hacking and tests for a new Beowulf cluster, here comes the next FAI release. Major improvements are: - a new chapter and many examples and tools for Beowulf clusters - a script for software package updates after initial installation, a first hack is .../utils/softupdate - reorganisation of the source code, so it's more modular and readable - hooks can skip the default task The package is available at the FAI homepage http://www.informatik.uni-koeln.de/fai/ and also on the Debian mirrors in a few hours/days. Many thanks to all, who gave bug reports, suggestions how to enhance FAI or other feedback. FAI is a non interactive system to install a Debian GNU/Linux operating system on a PC cluster. You can take one or more virgin PCs, turn on the power and after a few minutes Linux is installed, configured and running on the whole cluster, without any interaction necessary. Thus it's a scalable method for installing and updating a Beowulf cluster or a network of workstations unattended with little effort involved. FAI uses the Debian distribution and a collection of shell and Perl scripts for the installation process. Changes to the configuration files of the operating system are made by cfengine, shell and Perl scripts. -- Thomas ---------------------------------------------------------------------- Thomas Lange Institut fuer Informatik mailto:lange@informatik.Uni-Koeln.DE Universitaet zu Koeln Pohligstr. 1 Telefon: +49 221 470 5303 50969 Koeln Fax : +49 221 470 5317 1024D/AB9B66FD AEA6 A8C1 BD8E 67C4 8EF6 8BCA DC13 E54E AB9B 66FD ---------------------------------------------------------------------- From Eugene.Leitl at lrz.uni-muenchen.de Sun Sep 16 06:20:39 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:41 2009 Subject: ALINKA Linux Clustering Letter, September 12th. 2001 (fwd) Message-ID: -- Eugen* Leitl leitl ______________________________________________________________ ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Fri, 14 Sep 2001 19:16:59 +0200 From: Antoine Brenner To: clustering@alinka.com Subject: ALINKA Linux Clustering Letter, September 12th. 2001 Resent-Date: Fri, 14 Sep 2001 19:17:03 +0200 (CEST) Resent-From: clustering@alinka.com The ALINKA Linux Clustering Letter, Wednesday, September the 12th. 
2001 Dear readers, I am happy to send you this week's edition of clustering@alinka.com clustering@alinka.com is a free weekly e-mail newsletter on linux clustering. It provides a summary of the weekly activity in mailing-lists relative to linux clustering (such as beowulf, linux virtual server or linux-ha) and general clustering news. For more information about ALINKA, see: http://www.alinka.com News from the High Performance world, by Dr Laurent Gatineau (lgatineau@alinka.com) ====================================================================== Tips and tricks from the Beowulf mailing list ======== * Jon Tegner [m1] is looking for a distributed file system and asking if AFS good be the choice. Sean Dilda [m2] explained why AFS is not suited for cluster. Thierry Mallard [m3] pointed him to GFS [1] and Robert Ross [m4] when NFS or PVFS could be needed. [1] http://www.sistina.com/products_gfs.htm [m1] http://www.beowulf.org/pipermail/beowulf/2001-August/001192.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001201.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001214.html [m4] http://www.beowulf.org/pipermail/beowulf/2001-September/001213.html * Raul A. Gonzalez Olimon [m1] experienced problem with channel bonding on RTL8139 NIC. Except W Bauske [m2] who reported no problem, all other beowulfers do not recommend this NIC [m3,m4,m5,m6] [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001190.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001208.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001202.html [m4] http://www.beowulf.org/pipermail/beowulf/2001-September/001203.html [m5] http://www.beowulf.org/pipermail/beowulf/2001-September/001205.html [m6] http://www.beowulf.org/pipermail/beowulf/2001-September/001210.html * Alvin Starr [m1] is looking for something to synhronize its nodes. NTP seems to be the best choice [m2,m3]. [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001215.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001216.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001218.html * Martin Siegert [m1] wants to use Wake On Lan on its nodes but failed. Donald Becker [m2] gave technical answers/explanation. Claude Pignol [m3] gave a procedure which works on its nodes. [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001220.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001224.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001222.html * Greg Lindahl [m1] is looking for information about the i845 chipset. Joel Jaeggli [m2] wrote this chipset doesn't do SMP. Mark Hahn [m3] posted a link [1] to few similar streams benchmark. [1] http://www.tomshardware.com/mainboard/01q3/010702/images/book1_26516_image001.gif [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001234.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001236.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001242.html News from MOSIX mailing list by Benoit des Ligneris ===================================================================== * Where is the process ask Amit Shah [m1]. Use the utility 'mtop' et 'mps' in the contrib section of mosix.org answer Mathias Rechenburg. 
[m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0055.html [m2] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0057.html [1] http://www.mosix.org/ftps/contrib/ * Question about the possibility of migrating any process between two computers connected over internet by Jason Boudreault [m1], some technical answer (latency time) by Chris Buron [m2] and security concern answer by Giacomo Mulas [m3]. [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0059.html [m2] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0061.html [m3] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0064.html News from the High Availability world ====================================================================== DRBD devel by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * Jean-Yves Bouet asked [m1] interesting question about drbd internals. Philipp Reisner answered [m2] him with a lot of details. [m1] http://www.geocrawler.com/archives/3/3756/2001/9/0/6606388/ [m2] http://www.geocrawler.com/archives/3/3756/2001/9/0/6611586/ * Philipp Reisner [m3] announced drbd 0.6.1 prerelease 1 for linux kernel 2.4.9. Philipp invite you to test this release. [m3] http://www.geocrawler.com/archives/3/3756/2001/9/0/6612971/ * Philipp Reisner advised [m4] us that /var/lib/nfs directory must be on the drbd device when building an highly available nfs server. [m4] http://www.geocrawler.com/lists/3/SourceForge/3756/0/6613103/ News on the Filesystems front ====================================================================== Coda by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Shafeq Shimmanoid explains the Coda behaviour when handling large files [1m]. [1m] http://www.coda.cs.cmu.edu/maillists/codalist-2001/0769.html GFS by Ludovic Ishiomin (lishiomin@alinka.com) ======== * The OpenGFS project team has released a roadmap that was discussed [1m]. They were also be questions about interoperability [2m]. [1m] http://www.geocrawler.com/lists/3/SourceForge/15276/0/6602817/ and the followings [2m] http://www.geocrawler.com/lists/3/SourceForge/15276/0/6595706/ and the followings. Intermezzo by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Erik Heinz related its experience with Intermezzo [1m]. [1m] http://www.geocrawler.com/lists/3/SourceForge/8077/0/6590949/ and the followings News on other cluster related topics ====================================================================== linux-ia64 by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * David Mosberger posted [m5] a patch to cleanup the ia32 subsystem that fixes problems when running some programs. [m5] https://external-lists.valinux.com/archives/linux-ia64/2001-September/002110.html LVM by Bruno Muller (bmuller@alinka.com) ===== * AJ Lewis posted a patch to run LVM 1.0.1rc2 on 2.4.8 kernel[m1]. * Taher H. Haveliwala had installed lvm on a stock RedHat 7.1 without rebuilding the kernel and without rebooting[m2]. [m1] http://lists.sistina.com/pipermail/linux-lvm/2001-September/008697.html [m2] http://lists.sistina.com/pipermail/linux-lvm/2001-September/008749.html ====================================================================== To subscribe to the list, send e-mail to clustering@alinka.com from the address you wish to subscribe, with the word "subscribe" in the subject. To unsubscribe from the list, send e-mail to clustering@alinka.com from the address you wish to unsubscribe from, with the word "unsubscribe" in the subject. Alinka is the editor of the ALINKA ORANGES and ALINKA RAISIN administration software for Linux clusters. 
(Web site: http://www.alinka.com ) From gkogan at students.uiuc.edu Sun Sep 16 13:28:59 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:41 2009 Subject: Problems running MPI Message-ID: Hi. I installed Scyld on our master node. So far I got one node up, to test it. I am trying to run a simple MPI program where each procces just says hi and its rank. It works when I give mpirun -np 1 "program name". But it stops working when try running more than one procces. It gives me the following error p0_12411: p4_error: net_create_slave: host not a bproc node: -3 p4_error: latest msg from perror: Success I tried inserting the off node as a first node in the BeoSetup but it does not work. Any ideas? Thank you From amber_palekar at yahoo.com Mon Sep 17 03:01:23 2001 From: amber_palekar at yahoo.com (Amber Palekar) Date: Wed Nov 25 01:01:41 2009 Subject: Network RAM : Comm. issues Message-ID: <20010917100123.5884.qmail@web20310.mail.yahoo.com> Hi, We are planning to implement Network RAM as our syllabus project . Could someone suggest some communication mechanisms for passing messages over the ethernet ( which is what we are restricting ourselves to) . We are initially restricting to using RAW sockets only but are in a fix about what to use in the subsequent prototypes. Should MPIs and VIAs be looked at or could we develop our own protocol at the device driver level ? ( as we're restricitng ourselves to ethernet only .) Any other pointers for Network RAM implemntation would be of great help ! Amber __________________________________________________ Terrorist Attacks on U.S. - How can you help? Donate cash, emergency relief information http://dailynews.yahoo.com/fc/US/Emergency_Information/ From rbw at networkcs.com Mon Sep 17 07:02:51 2001 From: rbw at networkcs.com (Richard Walsh) Date: Wed Nov 25 01:01:41 2009 Subject: HOWTO discriminate switches? In-Reply-To: <20010913221400.A1825@florence.intimate.mysticnet.org.uk> Message-ID: <200109171402.JAA32341@us.msp.networkcs.com> Chris Wilson wrote: >The situation is that we are trying to decide on the right switch for >use in a 80 node cluster. The first option is to use a single switch with >160+ fast ethernet ports. The second is to use eight 10 port gigabit >switches plus the current gigabit switch for routing. > >The codes to be run on the cluster are a mixture of the embarassing >parallel, bandwidth hog and latency sensitive -- which complicates >matters. :) > >What I need is guidance on how to measure (and indeed what to be >measuring!) the switches to discriminate between the options. Presumably >netperf is a good starting point, but pointers to resources on switch >comparisions would be useful. Another alternative which starts to make sense as the switching cost of your system rises as a percentage of total cost is a switchless system in the form of a 1D or 2D torus using SCI cards which have extremely low latency and high bandwdth. Take a look at the cluster at the University of Delaware based on this design. If you do the math, I think you will find that for larger clusters which would require multiple layers of switches or a "mega-switch" to build in reasonable bandwidth, the SCI torus design is no more costly and has better performance features (latency and bandwidth). SCI ratings are equal to or better than some custom engineered networks from vendors of SMP and MPP systems (Cray, Sun, SGI). 
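Whichever direction you go, it is worth putting baseline numbers on each candidate before buying. netperf gives you both a bulk-throughput figure and a request/response (latency-like) figure per node pair; the hostnames below are placeholders, and netserver must already be running on the far end:

    # on the target node
    netserver

    # from the test node
    netperf -H node02 -t TCP_STREAM -l 30            # sustained TCP throughput
    netperf -H node02 -t TCP_RR -l 30 -- -r 1,1      # small-message transactions/sec

The inverse of the TCP_RR transaction rate approximates the round-trip time, and running several such pairs across the switch at once is the quickest way to see whether the backplane (or the links between stacked switches) saturates.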
Regards, rbw #--------------------------------------------------- # # Richard Walsh # Project Manager, Cluster Computing, Computational # Chemistry and Finance # netASPx, Inc. # 1200 Washington Ave. So. # Minneapolis, MN 55415 # VOX: 612-337-3467 # FAX: 612-337-3400 # EMAIL: rbw@networkcs.com, richard.walsh@netaspx.com # #--------------------------------------------------- # "What you can do, or dream you can, begin it; # Boldness has genius, power, and magic in it." # -Goethe #--------------------------------------------------- # "Without mystery, there can be no authority." # -Charles DeGaulle #--------------------------------------------------- # "Why waste time learning when ignornace is # instantaneous?" -Thomas Hobbes #--------------------------------------------------- From tekka99 at libero.it Mon Sep 17 07:45:42 2001 From: tekka99 at libero.it (tekka99@libero.it) Date: Wed Nov 25 01:01:41 2009 Subject: PVM: problem distributing on smp nodes with different number of processors Message-ID: Hello, I have several smp machines with pvm installed (3.4.3 on all). They are Linux/x86 machines and Tru64/Alpha machines. I read about "sp=" parameter to set in the pvm.hosts file, but it semmes not to work in distributing slaves based on the number of processor of the machines. For example I have this pvm.hosts linux8000 dx=/usr/share/pvm3/lib/pvmd sp=8000 fire2 dx=/usr/users/gcecchi/pvm3/lib/pvmd sp=4000 fire1 dx=/usr/users/gcecchi/pvm3/lib/pvmd sp=4000 aquila dx=/usr/users/gcecchi/pvm3/lib/pvmd sp=4000 falco dx=/usr/users/gcecchi/pvm3/lib/pvmd sp=2000 sds20a dx=/usr/users/gcecchi/pvm3/lib/pvmd sp=2000 ses40a dx=/usr/users/gcecchi/pvm3/lib/pvmd sp=4000 the sp for each machine is the number of processors multiplied by 1000. When I run for example pvmpov with nt28 (or also other values) it seems that pvm does (number of hosts)/28 and then assigns equally to the slaves. So the 2 processors machines gets 4 slave processes and also the 8 way machine gets 4 slave processes. Anyone knows how to manage smp machines with different numbers of proc each? Thank in advance. Bye, Gianluca Cecchi From eswardev at yahoo.com Mon Sep 17 08:51:30 2001 From: eswardev at yahoo.com (Eswar Dev) Date: Wed Nov 25 01:01:41 2009 Subject: PGI and HPL benchmarking Message-ID: <20010917155130.18074.qmail@web14304.mail.yahoo.com> Hi cozzi! I have one built using Athlon based cluster. 
This is the make file you requested for # ---------------------------------------------------------------------- # - shell -------------------------------------------------------------- # ---------------------------------------------------------------------- # SHELL = /bin/sh # CD = cd CP = cp LN_S = ln -s MKDIR = mkdir RM = /bin/rm -f TOUCH = touch # # ---------------------------------------------------------------------- # - Platform identifier ------------------------------------------------ # ---------------------------------------------------------------------- # ARCH = Linux_ATHLON_CBLAS # # ---------------------------------------------------------------------- # - HPL Directory Structure / HPL library ------------------------------ # ---------------------------------------------------------------------- # TOPdir = $(HOME)/hpl INCdir = $(TOPdir)/include BINdir = $(TOPdir)/bin/$(ARCH) LIBdir = $(TOPdir)/lib/$(ARCH) # HPLlib = $(LIBdir)/libhpl.a # # ---------------------------------------------------------------------- # - Compilers / linkers - Optimization flags --------------------------- # ---------------------------------------------------------------------- # CC = /usr/local/mpich-1.2.1/bin/mpicc NOOPT = CCFLAGS = -fomit-frame-pointer -O3 -funroll-loops -W -Wall # LINKER = /usr/local/mpich-1.2.1/bin/mpicc LINKFLAGS = $(CCFLAGS) # ARCHIVER = ar ARFLAGS = r RANLIB = echo # # ---------------------------------------------------------------------- # - MPI directories - library ------------------------------------------ # ---------------------------------------------------------------------- # MPinc tells the C compiler where to find the Message Passing library # header files, MPlib is defined to be the name of the library to be # used. The variable MPdir is only used for defining MPinc and MPlib. # MPdir = /usr/local/mpich-1.2.1 MPinc = -I$(MPdir)/include MPlib = $(MPdir)/lib/libmpich.a # # ---------------------------------------------------------------------- # - F77 / C interface -------------------------------------------------- # ---------------------------------------------------------------------- # You can skip this section if and only if you are not planning to use # a BLAS library featuring a Fortran 77 interface. Otherwise, it is # necessary to fill out the F2CDEFS variable with the appropriate # options. **One and only one** option should be chosen in **each** of # the 3 following categories: # # 1) name space (How C calls a Fortran 77 routine) # # -DAdd_ : all lower case and a suffixed underscore (Suns, # Intel, ...), # -DNoChange : all lower case (IBM RS6000), # -DUpCase : all upper case (Cray), # -Df77IsF2C : the FORTRAN compiler in use is f2c. # # 2) C and Fortran 77 integer mapping # # -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, # -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. 
# # 3) Fortran 77 string handling # # -DStringSunStyle : The string address is passed at the string loca- # tion on the stack, and the string length is then # passed as an F77_INTEGER after all explicit # stack arguments, # -DStringStructPtr : The address of a structure is passed by a # Fortran 77 string, and the structure is of the # form: struct {char *cp; F77_INTEGER len;}, # -DStringStructVal : A structure is passed by value for each Fortran # 77 string, and the structure is of the form: # struct {char *cp; F77_INTEGER len;}, # -DCrayStyle : Special option for Cray machines, which uses # Cray fcd (fortran character descriptor) for # interoperation. # F2CDEFS = NOOPT = F77 =mpif77 F77LOADER =mpif77 F77FLAGS = -O $(NOOPT) # # ---------------------------------------------------------------------- # - Linear Algebra library (BLAS or VSIPL) ----------------------------- # LAinc tells the C compiler where to find the Linear Algebra library # header files, LAlib is defined to be the name of the library to be # used. The variable LAdir is only used for defining LAinc and LAlib. # LAdir = /home/mpiuser/LAPACK LAinc = LAlib = $(HOME)/ATLAS/lib/Linux_ATHLON/libcblas.a $(HOME)/ATLAS/lib/Linux_ATHLON/libatlas.a # # ---------------------------------------------------------------------- # - HPL includes / libraries / specifics ------------------------------- # ---------------------------------------------------------------------- # HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) # # - Compile time options ----------------------------------------------- # # -DHPL_COPY_L force the copy of the panel L before bcast; # -DHPL_CALL_CBLAS call the cblas interface; # -DHPL_CALL_VSIPL call the vsip library; # -DHPL_DETAILED_TIMING enable detailed timers; # # By default HPL will: # *) not copy L before broadcast, # *) call the Fortran 77 BLAS interface # *) not display detailed timing information. # HPL_OPTS = -DHPL_CALL_CBLAS # # ---------------------------------------------------------------------- # HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) # # ---------------------------------------------------------------------- __________________________________________________ I am trying to compile the High Performance Computing Linpack Benchmark (HPL) software on a cluster that has the Portland (PGI) 3.2 compiler. The PGI compiler installs its own BLAS routines, so I would like to use those. I modified one of the make files, such as Make.Linux_PII_FBLAS, but still get errors during the compile. Has anyone successfully built and run this benchmark with PGI compilers? If so, could you please send me one of the make files you used? Many thanks Marc Univ. of Notre Dame __________________________________________________ Terrorist Attacks on U.S. - How can you help? Donate cash, emergency relief information http://dailynews.yahoo.com/fc/US/Emergency_Information/ From patrick at myri.com Mon Sep 17 09:06:39 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:41 2009 Subject: HOWTO discriminate switches? In-Reply-To: <3BA62056.345A0DD6@myri.com> Message-ID: Hi Richard, On Mon, 17 Sep 2001, Richard Walsh wrote: > > Another alternative which starts to make sense as the switching cost > > of your system rises as a percentage of total cost is a switchless > > system in the form of a 1D or 2D torus using SCI cards which have > > extremely low latency and high bandwidth. Take a look at the cluster > > at the University of Delaware based on this design.
If you do the > > math, I think you will find that for larger clusters which would > > require multiple layers of switches or a "mega-switch" to build > > in reasonable bandwidth, the SCI torus design is no more costly and 80 nodes would fit easily on only one Myrinet switch, based on a 128-port enclosure and 10 line cards (with room to expand to 128 ports by adding line cards). This solution provides full bandwidth bisection at 2 Gb/s full-duplex (Clos topology). Hard to do with a torus design. Large switched clusters are not a problem if you know how to build an inexpensive large switch. The real math is that it costs the same as a switchless topology like SCI. FYI, the total price of the interconnect would be $140K with Myrinet, including NICs, switch, cables and software (public price from the web). > > are equal to or better than some custom engineered networks from > > vendors of SMP and MPP systems (Cray, Sun, SGI). What are these "custom engineered networks"? Regards Patrick --------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | | Myricom, Inc. http://www.myri.com | | Cell: 865-389-8852 325 N. Santa Anita Ave. | | Fax: 865-974-1950 Arcadia, CA 91006 | --------------------------------------------------------- From Daniel.Kidger at quadrics.com Mon Sep 17 09:46:34 2001 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:01:41 2009 Subject: PGI and HPL benchmarking Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D20E@stegosaurus.bristol.quadrics.com> >-----Original Message----- >From: Eswar Dev [mailto:eswardev@yahoo.com] >Sent: 17 September 2001 16:52 >To: cozzi@hertz.rad.nd.edu >Cc: beowulf@beowulf.org >Subject: Re:PGI and HPL benchmarking > > >Hi cozzi! >I have one built for an Athlon-based cluster. This is >the make file you requested for >< makefile omitted > >$(HOME)/ATLAS/lib/Linux_ATHLON/libcblas.a >$(HOME)/ATLAS/lib/Linux_ATHLON/libatlas.a >< makefile omitted > Your makefile appears to use ATLAS. I thought that the user wanted to use PGI's own BLAS routines? I have also used the ATLAS BLAS for HPL on various systems (P3, P4, Itanium). What I would be interested in is anyone who has got HPL running using Intel's Maths Kernel Library: MKL. This should be able to use SSE2 to the fullest and so beat the PGI implementation? Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 907 5375 ----------------------- www.quadrics.com -------------------- From Daniel.Kidger at quadrics.com Mon Sep 17 10:18:40 2001 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:01:41 2009 Subject: PGI and HPL benchmarking Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D210@stegosaurus.bristol.quadrics.com> Marc wrote: >I am trying to compile the High Performance Computing Linpack Benchmark (HPL) >software on a cluster that has the Portland (PGI) 3.2 >compiler. The PGI compiler installs its own BLAS routines, so I would like to use >those. I modified one of the make files, such as Make.Linux_PII_FBLAS, but still >get errors during the compile. Has anyone successfully built and run this >benchmark with PGI >compilers? You do not say why you want to use PGI's BLAS? Check out http://computational-battery.org/Programvare/blas-lib-comparison.html Their results show that Portland's BLAS implementation trails a long way behind ATLAS and Intel's MKL. Daniel.
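For anyone who does want HPL linked against PGI's bundled BLAS rather than ATLAS, the change is usually confined to the Linear Algebra and F77/C-interface sections of the Make.<arch> file. A minimal sketch follows, assuming a PGI 3.2 installation under /usr/pgi (both that path and the -lblas library name are assumptions to check against your own installation); the F2CDEFS values are the usual ones for pgf77-style name mangling on Linux:

# - Linear Algebra library (BLAS) ---------------------------------------
# link PGI's precompiled Fortran 77 BLAS instead of ATLAS' CBLAS
LAdir        = /usr/pgi/linux86/lib
LAinc        =
LAlib        = -L$(LAdir) -lblas
#
# - F77 / C interface ----------------------------------------------------
# lower-case symbols with a trailing underscore, 32-bit INTEGERs
F2CDEFS      = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
#
# leave HPL_OPTS empty so HPL uses the Fortran 77 BLAS interface
# rather than -DHPL_CALL_CBLAS
HPL_OPTS     =

Judging by the comparison page Daniel links, ATLAS or MKL will very likely still come out ahead; this is mainly useful for reproducing the PGI numbers.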
-------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd. daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 907 5375 ----------------------- www.quadrics.com -------------------- From rbw at networkcs.com Mon Sep 17 12:12:02 2001 From: rbw at networkcs.com (Richard Walsh) Date: Wed Nov 25 01:01:41 2009 Subject: HOWTO discriminate switches? In-Reply-To: Message-ID: <200109171912.OAA37610@us.msp.networkcs.com> Hey Patrick, here's what I was thinking: >Hi Richard, > >On Mon, 17 Sep 2001, Richard Walsh wrote: > >> > Another alternative which starts to make sense as the switching cost >> > of your system rises as a percentage of total cost is a switchless >> > system in the form of a 1D or 2D torus using SCI cards which have >> > extremely low latency and high bandwidth. Take a look at the cluster >> > at the University of Delaware based on this design. If you do the >> > math, I think you will find that for larger clusters which would >> > require multiple layers of switches or a "mega-switch" to build >> > in reasonable bandwidth, the SCI torus design is no more costly and > >80 nodes would fit easily on only one Myrinet switch, based on a >128-port enclosure and 10 line cards (with room to expand to 128 >ports by adding line cards). This solution provides full >bandwidth bisection at 2 Gb/s full-duplex (Clos topology). >Hard to do with a torus design. Without trying to be a salesman for SCI, they claim their 66/64 PCI cards to be 1.33 G_BYTES (not bits) per second bi-directionally with a 1.46 microsecond latency. This is point-to-point and is reduced as the distance to travel through the torus increases, but it is potentially faster than Myrinet. Those with actual SCI experience are free to add or subtract value from this point. > >Large switched clusters are not a problem if you know how to build >an inexpensive large switch. The real math is that it costs the same >as a switchless topology like SCI. >FYI, the total price of the interconnect would be $140K with Myrinet, >including NICs, switch, cables and software (public price from the web). > I do not know about the _REAL_ math, but an analysis that I did for a 512 node system a year ago comparing switched versus non-switched topologies based on SCI cards showed the SCI (cards only) configured system to be cheaper than Myrinet or Gigabit switched alternatives (cards and switches). Baseline performance of the network had to be close to that of the Cray T3E. >> > are equal to or better than some custom engineered networks from >> > vendors of SMP and MPP systems (Cray, Sun, SGI). > >What are these "custom engineered networks"? I was referring to the interconnect on the Cray T3E. Thanks for the Myrinet information, rbw From lindahl at conservativecomputer.com Mon Sep 17 14:14:01 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:41 2009 Subject: HOWTO discriminate switches? In-Reply-To: <200109171912.OAA37610@us.msp.networkcs.com>; from rbw@networkcs.com on Mon, Sep 17, 2001 at 02:12:02PM -0500 References: <200109171912.OAA37610@us.msp.networkcs.com> Message-ID: <20010917171401.A25511@wumpus.foo> On Mon, Sep 17, 2001 at 02:12:02PM -0500, Richard Walsh wrote: > Without trying to be a salesman for SCI, they claim their 66/64 PCI > cards to be 1.33 G_BYTES (not bits) per second bi-directionally with a > 1.46 microsecond latency. Right.
This is not a measured value for user process to user process communication through the PCI bus, which is what users care about. > I do not know about the _REAL_ math, but an analysis that I did for > a 512 node system a year ago comparing switched versus non-switched > topologies based on SCI cards showed the SCI (cards only) configured > system to be cheaper than Myrinet You should work out the bisection bandwidth. It is substantially lower for SCI for large systems, while Myrinet's bisection scales linearly. This is why all large clusters which need low latency and high bandwidth are Myrinet and not SCI. > Baseline performance of the network had to be > close to that of the Cray T3E. And if you work out the numbers, neither SCI nor Myrinet can match a T3E, especially on latency. greg From tekka99 at libero.it Tue Sep 18 04:49:33 2001 From: tekka99 at libero.it (tekka99@libero.it) Date: Wed Nov 25 01:01:41 2009 Subject: PVM: problem distributing on smp nodes with different number of processors Message-ID: Excuse me all, I found the answer in the PVM-HOWTO: I have to give the option pvm_hosts to the pvmpov command line: ... pvm_hosts=aquila,aquila,fire2,fire2,fire1,fire1,linux8000,linux8000, linux8000,linux8000,sds20a,ses40a,ses40 +nt28 so as to have 4 tasks on aquila ( 28/14 + 28/14 = 4) 4 tasks on fire2 4 tasks on fire1 8 tasks on linux8000 (28/14 + 28/14 + 28/14 + 28/14 = 8) 2 tasks on sds20a 4 tasks on ses40a Thanks anyway. Bye, Gianluca From sshealy at asgnet.psc.sc.edu Tue Sep 18 08:39:36 2001 From: sshealy at asgnet.psc.sc.edu (Scott Shealy) Date: Wed Nov 25 01:01:41 2009 Subject: Paper showing Linpack scalability of mainstream clusters References: <200109131600.MAA27970@blueraja.scyld.com> Message-ID: <003101c14058$2f6bacd0$3a5d893f@machavelli> Hey Richard, is that the right link? Whenever I try to hit it I always get "a page cannot be displayed" error. In fact it appears as if the whole site is inaccessible. I have been trying to access it now for several days. Is this just me? I would be really interested in reading the paper... Thanks for any help Scott Shealy Message: 6 From: "RICHARD,BRUNO (HP-France,ex1)" To: "'beowulf@beowulf.org'" Subject: Paper showing Linpack scalability of mainstream clusters Date: Mon, 10 Sep 2001 09:45:01 +0200 charset="iso-8859-1" Available from : I-Cluster: Reaching TOP500 performance using mainstream hardware By B. Richard (hp Laboratories Grenoble), P. Augerat, N. Maillard, S. Derr, S. Martin, C. Robert (ID Laboratory) Abstract: A common topic for PC clusters is the use of mainstream instead of dedicated hardware, i.e., using standard desktop PCs and standard network connectivity, with technology to organize them so that they can be used as a single computing entity. Current work in this "off-the-shelf cluster" domain usually focuses on how to reach a high availability infrastructure, on how to efficiently balance the work between nodes of such clusters, or on how to get the most computing power for loosely-coupled (large grained) problems. hp Labs Grenoble, teaming up with INRIA Rhône-Alpes, built a cluster out of 225 standard hp e-PCs interconnected by standard Ethernet, with the objective of getting the highest computational performance and scaling from the simplest desktop PC to the most powerful computers in the world. As an additional constraint, we decided to use a cluster that models a modern enterprise network, using standard machines interconnected through standard Ethernet connectivity.
This paper describes the issues and challenges we had to overcome in order to reach the 385th rank in the TOP500 list of most powerful supercomputers in the world on June 21st, 2001, being the first mainstream cluster to enter TOP500 ever. Also we provide hereafter some details about the software and middleware tuning we have done, as well as the impact of different factors on performance such as the network topology and infrastructure hardware. From Nicholas.Nevin at east.sun.com Wed Sep 19 07:12:47 2001 From: Nicholas.Nevin at east.sun.com (Nicholas Nevin - Sun HPC High Performance Computing) Date: Wed Nov 25 01:01:41 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: <003101c14058$2f6bacd0$3a5d893f@machavelli> Message-ID: <20010919101247.A2621@caja> On Tue, Sep 18, 2001 at 11:39:36AM -0400, Scott Shealy wrote: > Hey Richard is that the right link? Whenever I try to hit it I always get "a > page cannot be displayed error" . In fact it appears as if the whole site > is inaccesible. I have been trying to access it now for several days. Is > this just me? I would be really interested in reading the paper... > > Thanks for any help > Scott Shealy > try http://hpl.hp.com/techreports/2001/HPL-2001-206.html -nick From Eugene.Leitl at lrz.uni-muenchen.de Wed Sep 19 09:56:20 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:41 2009 Subject: ALINKA Linux Clustering Letter, September 19th. 2001 (fwd) Message-ID: -- Eugen* Leitl leitl ______________________________________________________________ ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Wed, 19 Sep 2001 18:53:47 +0200 From: Antoine Brenner To: clustering@alinka.com Subject: ALINKA Linux Clustering Letter, September 19th. 2001 Resent-Date: Wed, 19 Sep 2001 18:53:50 +0200 (CEST) Resent-From: clustering@alinka.com The ALINKA Linux Clustering Letter, Wednesday, September the 19th. 2001 Dear readers, I am happy to send you this week's edition of clustering@alinka.com clustering@alinka.com is a free weekly e-mail newsletter on linux clustering. It provides a summary of the weekly activity in mailing-lists relative to linux clustering (such as beowulf, linux virtual server or linux-ha) and general clustering news. For more information about ALINKA, see: http://www.alinka.com News from the High Performance world, by Dr Laurent Gatineau (lgatineau@alinka.com) ====================================================================== Software for Beowulf cluster ======== FAI (fully automatic installation) 2.2 released [1] FAI is a non interactive system to install a Debian GNU/Linux operating system on a PC cluster. You can take one or more virgin PCs, turn on the power and after a few minutes Linux is installed, configured and running on the whole cluster, without any interaction necessary. Thus it's a scalable method for installing and updating a Beowulf cluster or a network of workstations unattended with little effort involved. FAI uses the Debian distribution and a collection of shell and Perl scripts for the installation process. Changes to the configuration files of the operating system are made by cfengine, shell and Perl scripts. 
[1] http://www.informatik.uni-koeln.de/fai/ From http://www.beowulf.org/pipermail/beowulf/2001-September/001268.html Tips and tricks from the Beowulf mailing list ======== * Pedro Díaz Jiménez [m1] posted a link [1] to a small FAQ about clusters. [1] http://planetcluster.org/clusterfaq.html [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001265.html * Marc Cozzi [m1] is looking at using the PGI BLAS routines with the Linpack benchmark. Eswar Dev [m2] gave an example using the ATLAS BLAS library. Daniel Kidger [m3] pointed him to a document showing poor performance with the PGI BLAS routines. [1] http://computational-battery.org/Programvare/blas-lib-comparison.html [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001267.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001274.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001277.html * About switches, Eric Kuhnke [m1] wrote that he is happy with the HP Procurve 4000M, but modules are expensive (look how to get them cheaper [m2,m3]). There was also a discussion about SCI networks [m4,m5] and Myrinet ones [m6,m7]. [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001246.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001249.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001251.html [m4] http://www.beowulf.org/pipermail/beowulf/2001-September/001272.html [m5] http://www.beowulf.org/pipermail/beowulf/2001-September/001278.html [m6] http://www.beowulf.org/pipermail/beowulf/2001-September/001275.html [m7] http://www.beowulf.org/pipermail/beowulf/2001-September/001279.html * Karl Bellve [m1] is looking for a copper-based Gigabit card for Alpha systems. Donald Becker [m2] answered that Syskonnect, Intel and NatSemi DP83820 network cards have drivers and work with Alpha systems. [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001250.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001252.html News from MOSIX mailing list by Benoit des Ligneris ===================================================================== No news from Mosix this week; expect a double Letter next week. News from the High Availability world ====================================================================== DRBD devel by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * David Krovich announced [m1] that the DRBD HOWTO release 0.5 is available [1] [1] http://www.slackworks.com/~dkrovich/DRBD/ [m1] http://www.geocrawler.com/lists/3/SourceForge/3756/0/6626476/ Failsafe by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * Joachim Gleissner announced [m2] that a new release of failsafe for SuSE is available [2]. It includes two patches for filesystem and nfs resources [m3] [2] ftp://ftp.suse.com/pub/projects/failsafe [m2] http://community.tummy.com/pipermail/linuxfailsafe/2001-September/001219.html [m3] http://community.tummy.com/pipermail/linuxfailsafe/2001-September/001218.html Linux-HA dev by Rached Ben Mustapha (rached@alinka.com) ======== * Alan Robertson posted [m1] a request for comments on a paper [l1] that he wrote about STONITH. [m1] http://marc.theaimsgroup.com/?l=linux-ha-dev&m=100034813201067&w=2 [l1] http://linux-ha.org/heartbeat/ LVS by Rached Ben Mustapha (rached@alinka.com) ======== * Wensong Zhang announced [m1] the availability of LVS 0.9.4, which is available on the LVS website [l1].
[l1] http://linux-vs.org/ [l2] http://linux-vs.org/software/kernel/linux-2.4.9-ipvs-0.9.4.patch.gz [l3] http://linux-vs.org/software/kernel/ipvsadm-1.20-1.src.rpm [m1] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=100082842828624&w=2 [m2] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=100090377721877&w=2 News on the Filesystems front ====================================================================== Coda by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Matthias Teege asked for compatibility between Coda and NIS and replied himself giving the answer [1m]. * Steffen Neumann gave comments about the situation where Coda is not useful [2m]. [1m] http://www.coda.cs.cmu.edu/maillists/codalist-2001/0780.html [2m] http://www.coda.cs.cmu.edu/maillists/codalist-2001/0786.html Intermezzo by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Peter Braam announced Intermezzo 1.0.5.2 [1m]. * Shirish Phatak forwarded the announcement of librsync 0.9.5 [2m]. [1m] http://www.geocrawler.com/lists/3/SourceForge/8078/0/6615430/ [2m] http://www.geocrawler.com/lists/3/SourceForge/8077/0/6639831/ XFS by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Masahino Asano was unable to mount a LVM snapshot of an XFS filesystem because the device was marked read-only by LVM [1m]. [1m] http://oss.sgi.com/projects/xfs/mail_archive/0109/msg00304.html and the following messages. News on other cluster related topics ====================================================================== linux-ia64 by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * Doug Beattie started an interesting thread about 32 bit & 64 bit libraries coexistence. (start [m4]) [m4] https://external-lists.valinux.com/archives/linux-ia64/2001-September/002149.html LTSP by Bruno Muller (bmuller@alinka.com) ======== * Jim McQuillan announced that LTSP 2.09pre2 is available for download [m1]. [m1] http://www.geocrawler.com/lists/3/SourceForge/10022/100/6637852/ ====================================================================== To subscribe to the list, send e-mail to clustering@alinka.com from the address you wish to subscribe, with the word "subscribe" in the subject. To unsubscribe from the list, send e-mail to clustering@alinka.com from the address you wish to unsubscribe from, with the word "unsubscribe" in the subject. Alinka is the editor of the ALINKA ORANGES and ALINKA RAISIN administration software for Linux clusters. (Web site: http://www.alinka.com ) From Daniel.Kidger at quadrics.com Wed Sep 19 11:06:28 2001 From: Daniel.Kidger at quadrics.com (Daniel Kidger) Date: Wed Nov 25 01:01:41 2009 Subject: PGI and HPL benchmarking Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D21D@stegosaurus.bristol.quadrics.com> I wrote: >You do not say why you want to use PGI's BLAS? > >Check out >http://computational-battery.org/Programvare/blas-lib-comparison.html > >Their results show that Portland's BLAS implementation trails a long way >behind ATLAS and Intel's MKL. I investigated a little further. It seems that PGI's BLAS is compiled with backward compatibility so that it will run on Pentium IIs and earlier. Hence it makes no use of PIII and P4 instructions, i.e. no SSE / SSE2 or prefetching. There is a note in the release notes about recompiling their BLAS, but my distribution doesn't seem to have their source. Daniel. -------------------------------------------------------------- Dr. Dan Kidger, Quadrics Ltd.
daniel.kidger@quadrics.com One Bridewell St., Bristol, BS1 2AA, UK 0117 907 5375 ----------------------- www.quadrics.com -------------------- From gcd at chandra.bgsu.edu Wed Sep 19 12:47:29 2001 From: gcd at chandra.bgsu.edu (Comer Duncan) Date: Wed Nov 25 01:01:41 2009 Subject: query about mpi neural network computing Message-ID: Can people who know where I may find public-domain f90 routines for implementing various and sundry neural networks using Beowulf clusters and MPI please point me in the right direction? Also, any known reviews of parallel computing methods and issues for neural networks would be much appreciated. Thanks, Comer Duncan ------------ Comer Duncan Professor of Physics Department of Physics and Astronomy Bowling Green State University Bowling Green, OH 43403 email: gcd@chandra.bgsu.edu phone: (419) 372 8108 fax: (419) 372 9938 ------------ From pdiaz88 at terra.es Wed Sep 19 15:51:04 2001 From: pdiaz88 at terra.es (Pedro =?iso-8859-1?q?D=EDaz=20Jim=E9nez?=) Date: Wed Nov 25 01:01:42 2009 Subject: ALINKA Linux Clustering Letter, September 19th. 2001 (fwd) In-Reply-To: References: Message-ID: <01091922510400.00569@duero> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 BTW, a web mirror of the latest ALINKA LCL (and soon also other cluster-related newsletters) can be found here: http://planetcluster.org/alinka.php (mirrored on the web with the kind permission of the Alinka team) Regards Pedro On Wednesday 19 September 2001 16:56, Eugene Leitl wrote: > < ALINKA newsletter quoted in full above - omitted >
- -- /* * Pedro Diaz Jimenez: pdiaz88@terra.es, pdiaz@acm.asoc.fi.upm.es * * GPG KeyID: E118C651 * Fingerprint: 1FD9 163B 649C DDDC 422D 5E82 9EEE 777D E118 C65 * * http://planetcluster.org * Clustering & H.P.C. news and documentation * */ - -- "Attention to health is life greatest hindrance. " - Plato (427-347 B.C.) "Plato was a bore. " - Friedrich Nietzsche (1844-1900) "Nietzsche was stupid and abnormal. " - Leo Tolstoy (1828-1910) "I'm not going to get into the ring with Tolstoy. " - Ernest Hemingway (1899-1961) "Hemingway was a jerk. 
" - Harold Robbins -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.4 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7qSFjnu53feEYxlERAoh/AJsGQvW30SvHm2yJBbEcG8zAhpHUIACgnfHF tYcLA3/qZxSk1+9tbfPLRII= =V/xM -----END PGP SIGNATURE----- From Eugene.Leitl at lrz.uni-muenchen.de Thu Sep 20 08:40:28 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:42 2009 Subject: linuxbios mailing list digest ? (fwd) Message-ID: -- Eugen* Leitl leitl ______________________________________________________________ ICBMTO : N48 10'07'' E011 33'53'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Thu, 20 Sep 2001 09:16:27 -0600 (MDT) From: Ronald G Minnich To: Ghozlane Toumi Cc: linuxbios@lanl.gov Subject: Re: linuxbios mailing list digest ? On Thu, 20 Sep 2001, Ghozlane Toumi wrote: > Hi, > is there any kind of digest mailing list for the linuxbios project ? Thanks to the folks at U. Md., we have: http://www.missl.cs.umd.edu/linuxbios/ ron From haddadj at cs.orst.edu Thu Sep 20 14:39:04 2001 From: haddadj at cs.orst.edu (Jalal Haddad) Date: Wed Nov 25 01:01:42 2009 Subject: dual P4 board ? Message-ID: <200109202139.f8KLd4X01030@rasta.CS.ORST.EDU> hey there, We are looking for a low end (ie cheap) dual P4 (1.6-2Ghz) motherboard. The only one that I have seen is Tyan's, and that is out of our price range. Any others that you might want to recommend ? thanks From joelja at darkwing.uoregon.edu Thu Sep 20 15:46:49 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:42 2009 Subject: dual P4 board ? In-Reply-To: <200109202139.f8KLd4X01030@rasta.CS.ORST.EDU> Message-ID: you won't find them cheap because the only chipset available for dual p4's in the intel 860 which is itself far from inexpensive... joelja On Thu, 20 Sep 2001, Jalal Haddad wrote: > hey there, > > We are looking for a low end (ie cheap) dual P4 (1.6-2Ghz) motherboard. The > only one that I have seen is Tyan's, and that is out of our price range. Any > others that you might want to recommend ? > > thanks > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From ctierney at hpti.com Thu Sep 20 16:18:19 2001 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:01:42 2009 Subject: dual P4 board ? In-Reply-To: <200109202139.f8KLd4X01030@rasta.CS.ORST.EDU>; from haddadj@cs.orst.edu on Thu, Sep 20, 2001 at 02:39:04PM -0700 References: <200109202139.f8KLd4X01030@rasta.CS.ORST.EDU> Message-ID: <20010920171819.F7330@hpti.com> Supermicro has a board. I don't think it is very cheap though (~ $600). I think Serverworks is coming out with a P4 dual chipset which will improve the performance and should bring the prices down somewhat. If you want cheap (~ $200) I don't know if/when that will be available. 
Craig On Thu, Sep 20, 2001 at 02:39:04PM -0700, Jalal Haddad wrote: > hey there, > > We are looking for a low end (ie cheap) dual P4 (1.6-2Ghz) motherboard. The > only one that I have seen is Tyan's, and that is out of our price range. Any > others that you might want to recommend ? > > thanks > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From n.gregory at garageflowers.co.uk Fri Sep 21 03:35:25 2001 From: n.gregory at garageflowers.co.uk (n.gregory@garageflowers.co.uk) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices Message-ID: <20010921103525.5662.qmail@fran.supanetwork.co.uk> (sorry if this gets posted twice) Hello, I am doing some research into the configuration of a 32 1U node Beowulf cluster and have a question regarding CPU configuration. The current choice is between Intel P4s, as Itaniums seem a little bleeding edge at the moment, or the latest AMD chips. AMD seem to be getting an impressive performance for the price, but I'm a little concerned about the lack of mature multiprocessor chipsets and their heat issues. Intel on the other hand have the MP chipsets but seem to be falling down with the current lack of (non-commercial) compilers that support MMX and SSE, and the whole issue of Rambus vs DDR memory. I would be grateful for any insight into a choice of CPU and its configuration in terms of price/performance/expandability, or any other factor I should be considering. Thanks Nick From ctierney at hpti.com Fri Sep 21 09:10:35 2001 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices In-Reply-To: <20010921103525.5662.qmail@fran.supanetwork.co.uk>; from n.gregory@garageflowers.co.uk on Fri, Sep 21, 2001 at 10:35:25AM -0000 References: <20010921103525.5662.qmail@fran.supanetwork.co.uk> Message-ID: <20010921101035.C9122@hpti.com> On Fri, Sep 21, 2001 at 10:35:25AM -0000, n.gregory@garageflowers.co.uk wrote: > (sorry if this gets posted twice) > Hello, > I am doing some research into the configuration of a 32 1U node Beowulf cluster > and have a question regarding CPU configuration. > > The current choice is between Intel P4s, as Itaniums seem a little bleeding edge at > the moment, or the latest AMD chips. > > AMD seem to be getting an impressive performance for the price, but I'm a little > concerned about the lack of mature multiprocessor chipsets and their heat issues. Have you run your code on an AMD and P4? A lot of the published reports show AMD as faster, but mainly for codes that I don't care about and that have not been recompiled. For the codes I am interested in, a dual P4 1.7 Ghz is 30% faster than a dual Athlon 1.2 Ghz when using the Intel compiler for Linux. SSE support is better in the Intel compiler than the Portland Group compiler. I didn't see much improvement using Portland Group and SSE over non-SSE. With the Portland Group compiler the two systems perform similarly (within 5%). You are going to spend probably 150K (or more) on hardware, but aren't willing to shell out $1k (or less) for commercial compilers? For the Portland Group compiler, you only need to buy a license for the nodes you are going to compile on (front ends). I do not know about the Intel compiler, but I hope it is the same. Anyone know?
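One caveat on the SSE/SSE2 comparison above: the result depends heavily on the code-generation flags. The lines below are a hedged sketch of typical settings for the compilers of that generation (ifc 5.x and pgf90 3.2 are assumed; flag spellings vary between releases, so check the man pages rather than treating this as a recipe):

# Intel Fortran (ifc) targeting a Pentium 4 / Xeon, SSE2 enabled
F77      = ifc
F77FLAGS = -O3 -tpp7 -xW
#
# Portland Group (pgf90) on an Athlon, SSE vectorization where supported
# F77      = pgf90
# F77FLAGS = -fast -Mvect=sse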
Craig > > Intel on the other-hand have the MP chipsets but seem to be falling down with > current lack of (non-commercial ) complier that support MMX and SSE, and the > whole issue of Rambus Vs DDR memory. > > I would be grateful for any insight into a choice of CPU and its configuration in > terms of price/performance/expandability, or any other factor I should be > considering. > > Thanks > Nick > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney@hpti.com) From John_Berninger at ncsu.edu Fri Sep 21 09:43:48 2001 From: John_Berninger at ncsu.edu (John Berninger) Date: Wed Nov 25 01:01:42 2009 Subject: Python bindings for libbeostat Message-ID: <20010921124348.I20537@belgarath.math.ncsu.edu> Folks - Following tradition of pybproc, I've decided to write up Python bindings for libbeostat functions and thought they might be of use to others on this list, since I know they're of use to me. If anyone's interested in downloading / testing them, the packages can be found at http://www.berningeronline.net/projects.php#pybeostat; I've made up binary RPMS, source RPMS, and a tarball, all with md5sums and GPG signatures available for those who want to verify the downloads. This package is designed for Scyld's 27z-8 release; there are packages for the 27bz-7 release at http://www.berningeronline.net/projects.php#pybeostat-0.2. If there's sufficient interest, I may even do up some manual pages and include them in the RPMS in future versions, although I've not done so as of yet. -- Thank you, John Berninger Systems Administrator John_Berninger@ncsu.edu Department of Mathematics Box 8205, Harrelson Hall NC State University Raleigh, NC 27695 Phone: (919)515-6315 Fax: (919)515-3798 GPG Key ID: A8C1D45C Fingerprint: B1BB 90CB 5314 3113 CF22 66AE 822D 42A8 A8C1 D45C -- From ron_chen_123 at yahoo.com Sat Sep 22 11:07:34 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:42 2009 Subject: Fwd: Grid Engine: ongoing and future work Message-ID: <20010922180734.86634.qmail@web14702.mail.yahoo.com> If you are interested in HPC, Linux clusters, batch systems, please join this opensource project. http://gridengine.sunsource.net/ (See below for the top 8 projects) -Ron --- Fritz Ferstl wrote: > Reply-to: dev@gridengine.sunsource.net > Date: Tue, 18 Sep 2001 15:17:42 +0200 (MEST) > From: Fritz Ferstl > To: Grid Engine dev alias > > CC: Grid Engine users alias > > Subject: [GE dev] ongoing and future work > > Hi (and sorry for cross-posting), > > the Grid Engine project is now in the open source > since 6 weeks and it is > time to make better use of the dev mailing list. > Those of you, who > are on the users mailing list and not on dev, but > who would wish to be > kept up-to-date on Grid Engine development issue, > might want to register > to the dev alias by sending e-mail to > > dev-subscribe@gridengine.sunsource.net > > > Some of you may not know that Grid Engine has a > history as a commercial > distributed resource management product: first > CODINE, then GRD and later > Sun Grid Engine - see also > > > http://gridengine.sunsource.net/project/gridengine/background.html > > As a result, a group of developers has been working > on Grid Engine already > when it was still proprietary. Also several > development projects have been > initiated before the project became open source. > These projects are still > ongoing. 
> > We should try to use the dev mailing list for > exchanging information > between those who are new to Grid Engine and those > who have more > experience. We also should try to get as much > community feedback as > possible on the already ongoing development > projects. > > Here is a brief description of the currently active > projects. I would like > to ask all who have questions or feedback to any of > them to respond. In > particular, I would ask developers working on them > to provide information > on their status shortly: > > 1. Grid Engine Enhanced Edition Dispatch Algorithm > > Development in this project is close to be > finished. The corresponding > changes will be checked in within the next few > weeks. The result will be > an improved algorithm for making dispatch > decisions in the Enhanced > Edition case being better aligned with the > (share-tree, functional, > etc.) policies. The corresponding mail from > Shannon in the users archive > > > http://gridengine.sunsource.net/servlets/ReadMsg?msgId=880&listName=users > > provides a good overview. > > 2. Array Jobs > > Several improvements are to come in the context > of the array job > facility. The first part concerns a more > efficient file spooling for > array jobs. It is going to be checked in within > the next few days. > > 3. Scalability, Performance Tuning > > Scalability and performance tuning, in particular > with respect to the > scheduler, are a topic of ongoing efforts. Some > modifications in this > area have been checked in recently, more is to > come. > > 3. Security > > Efforts are underway to include a security system > into the Grid Engine > framework, which provides state-of-the-art > protection against malicious > use and doesn't require any outside > administrative security measures > such as DCE, Kerberos or reserved port > installations. > > 4. Monitoring/Accounting > > There is a project which aims at providing more > information in the areas > of Grid Engine monitoring and accounting to the > administrator and user. > The first draft of a project spec will be > available within the next 2 > weeks. > > 5. Job Interface and other DRM Interfaces > > We intend to start a discussion on interfaces to > DRM software in general > and interfaces for the manipulation of jobs > (submission, monitoring, > control) in particular. A discussion paper is > currently developed and > will be presented in a few weeks. > > 6. Communication System > > Concepts are being developed to improve and > redesign the Grid Engine > communication system. A draft of the new concepts > is under construction > currently. It will be posted within a few weeks. > All feed-back and > information in advance is highly appreciated. > > 7. Multi-Threading > > This project explores further possibilities to > use multi-threading > approaches in the Grid Engine daemons for better > scalability and for > more efficient memory usage. The project is not > far advanced yet and any > early input is very welcome. > > 8. Cluster Queues > > The cluster queue project aims at introducing the > notion of cluster > queues into Grid Engine as an abstraction of the > current host orient > queue definition and for more ease of > administration as well as better > scalability. The project is in an early > specification stage. Please send > feed-back and comments in advance. > > There are a number of smaller development efforts > always ongoing in > parallel to the above more long ranging and larger > projects. 
Such small > development efforts are usually in response to bug > reports or enhancement > requests, which can be found in IssueZilla > (http://gridengine.sunsource.net/servlets/ProjectIssues). > > I have taken the action to prepare a short list of > top priority issues > regularly, which I will post soon for the first time > to the dev alias. > Input on which Issue to put on this list or on the > prioritization of > individual issues therein is very welcome. > > Beyond all those ongoing efforts, there is a (long) > list of potential > development projects, of course, to enhance Grid > Engine in a wide variety > of ways. The following is a short snippet of this > list, touching some of > the areas in which work has been performed in the > past: > > - Integration with existing schedulers, e.g. MAUI, > RWTH-Aachen, ... > - Interfaces/GUIs, e.g. Web, Java, Perl, Windows, > ... > - Grid framework integration (Globus, Legion, Punch > - some work already > done for all) > - Advanced scheduler policies (e.g., resource > reservation in combination with > preemption or policy driven management of > arbitrary resources) > - Hierarchical configuration and policy > administration > - ... > > Please send your input on these topics or add > further items to this list by > sending your contributions to the dev alias. > > Thanks, > > Fritz > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: > dev-unsubscribe@gridengine.sunsource.net > For additional commands, e-mail: > dev-help@gridengine.sunsource.net > __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com From ron_chen_123 at yahoo.com Sat Sep 22 11:13:23 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices In-Reply-To: <20010921101035.C9122@hpti.com> Message-ID: <20010922181323.53698.qmail@web14706.mail.yahoo.com> Some interesting articles: Floating-Point Compiler Performance Analysis ============================================ http://www.aceshardware.com/Spades/read.php?article_id=40000189 Workstation Battle Royale ========================= (The Top Dual Xeon Workstations and the Dual Athlon MP) http://www.aceshardware.com/Spades/read.php?article_id=45000195 -Ron --- Craig Tierney wrote: > On Fri, Sep 21, 2001 at 10:35:25AM -0000, > n.gregory@garageflowers.co.uk wrote: > > (sorry if this gets posted twice) > > Hello, > > I am doing some research into the configuration of > a 32 1U node Beowulf cluster > > and have a question regarding CPU configuration. > > > > The current choice is between Intel P4s as > Itaniums seem a little bleeding edge at > > the moment, or the latest AMD chips. > > > > AMD seem to be getting a impressive performance > for the price, but I?m a little > > concerned about the lack of mature multiprocessor > chipsets and their heat issues. > > Have you run your code on an AMD and P4? Alot of > the published > reports show AMD as faster, but mainly for codes > that I don't care > about and that havenot been recompiled. > > For the codes I am interested in a dual P4 1.7 Ghz > is 30% faster > than a dual Athlon 1.2 Ghz when using the Intel > compiler for Linux. > SSE support is better in the Intel compiler than the > Portland Group > Compiler. I didn't seem much improvement using > Portland Group and SSE > than non-SSE. With the Portland Group compiler the > two systems perform > similiary (within 5%). 
> > You are going to spend probably 150K (or more) on > hardware, but aren't > willing to shell out $1k (or less) for commercial > compilers? For the > Portland Group compiler, you only need to buy a > license for the nodes you > are going to compile on (Front ends). I do not know > about the > Intel compiler, but I hope it is the same. Anyone > know? > > > Craig > > > > > > Intel on the other-hand have the MP chipsets but > seem to be falling down with > > current lack of (non-commercial ) complier that > support MMX and SSE, and the > > whole issue of Rambus Vs DDR memory. > > > > I would be grateful for any insight into a choice > of CPU and its configuration in > > terms of price/performance/expandability, or any > other factor I should be > > considering. > > > > Thanks > > Nick > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > Craig Tierney (ctierney@hpti.com) > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com From rojesh_p at yahoo.com Sun Sep 23 08:08:55 2001 From: rojesh_p at yahoo.com (rojesh p) Date: Wed Nov 25 01:01:42 2009 Subject: multiple ethernet load sharing Message-ID: <20010923150855.44327.qmail@web14610.mail.yahoo.com> hello, I am trying to develop an ethernet driver for using more than one ethernet card per machine so that I can increase the speed of the network. I am writing the driver for the ne2000 ethernet card. Can anyone help in finding the necessary details relating to my project? Has anybody written a driver for multiple ethernet cards which transmit data simultaneously? If so, where can I find the source code? __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com From dbernick at angstrom.com Sun Sep 23 13:22:11 2001 From: dbernick at angstrom.com (David Bernick) Date: Wed Nov 25 01:01:42 2009 Subject: multiple ethernet load sharing References: <20010923150855.44327.qmail@web14610.mail.yahoo.com> Message-ID: <3BAE4473.5000103@angstrommicro.com> > > >I am trying to develop an ethernet driver for >using more than one ethernet card per machine so that >I can increase the speed of the network. I am writing >the driver for the ne2000 ethernet card. Can anyone help >in finding the necessary details relating to my >project? Has anybody written a driver for multiple >ethernet cards which transmit data simultaneously? >If so, where can I find the source code? > check out things about "channel bonding" because that's the concept you want. there's an open source product called FIREHOSE that does this. look around on yahoo for it. i forget the company's name. -- David Bernick Senior Technologist - Angstrom Microsystems http://www.angstrommicro.com I guess Bart's not to blame. He's lucky, too, because it's spanking season, and I got a hankering for some spankering!
-- Homer Simpson Two Dozen and One Greyhounds From alvin at Maggie.Linux-Consulting.com Sun Sep 23 14:09:47 2001 From: alvin at Maggie.Linux-Consulting.com (alvin@Maggie.Linux-Consulting.com) Date: Wed Nov 25 01:01:42 2009 Subject: multiple ethernet load sharing In-Reply-To: <3BAE4473.5000103@angstrommicro.com> Message-ID: hi david i hear that the ne2000 driver is a major pain and slow??? ( transfers handled by cpu....vs onchip nic-based transfers ne2000 based cards - $15-$25... tulip based cards - $25... eepro100 based cards $45... ( you'd get a bigger/better price performace for tulip/eepro - if you wanna write stuff, i'd politely suggest using a tulip driver or eepro based drivers?? - transferring data at sustained 100/200/400 mbps becomes a (fun) problem have fun alvin On Sun, 23 Sep 2001, David Bernick wrote: > > > > > >i am trying to develop a ethernet driver for > >using more than one ethernet card per machine so that > >i can increase the speed of the network.i am writing > >the driver for ne2000 ethernet card. can anyone help > >in finding the necessary details relating to my > >project. has anybody written the dirver for multiple > >ethernet cards which transmit data simultaneously.if > >so where can i find the source code. > > > > check out things about "channel bonding" because that's the concept you > want. > > there's an open source product called FIREHOSE that does this. look > around on yahoo for it. i forget the company's name. > From ron_chen_123 at yahoo.com Sun Sep 23 15:29:28 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:42 2009 Subject: Unix/Win2k mixed cluster! Message-ID: <20010923222928.21033.qmail@web14704.mail.yahoo.com> --- Albeaus Bayucan wrote: > Date: Thu, 20 Sep 2001 09:28:53 -0700 > From: Albeaus Bayucan > To: "Ethan R. Case" > CC: "'pbs-users@OpenPBS.org'" > Subject: Re: [PBS-USERS] porting PBS to Windows > > "Ethan R. Case" wrote: > > > Has anyone ever ported PBS to Windows? What does > it involve, and how > > difficult is it? Any information would be greatly > appreciated. Thanks. > > Ethan Case > > We have ported PBS on windows 2000. Still some fine > tuning to be done but it > will be released soon under pbspro. Some differences > and mimicking had to be > done from UNIX to windows but they have been dealt > with so far... > > Albeaus > PBS Group > > __________________________________________________________________________ > To unsubscribe: email majordomo@openpbs.org with > body "unsubscribe pbs-users" > For message archives: visit > http://openpbs.org/UserArea/pbs-users.html > - - - - - - - - - - > - - - - > Academic Site? Use PBS Pro free, see: > http://www.pbspro.com/academia.html > OpenPBS and the pbs-users mailing list is sponsored > by Veridian. > __________________________________________________________________________ __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com From hanzl at noel.feld.cvut.cz Mon Sep 24 05:22:58 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:42 2009 Subject: batch systems with job deps (afterok) In-Reply-To: References: Message-ID: <20010924142258M.hanzl@unknown-domain> Hi batch system gurus, is there any opensource batch system with support for job dependencies? The only one I found is OpenPBS (afterok feature of qsub) but it is not easy to run on scyld. Any other ideas please? 
All we need is to run 100 jobs in any order, then one job to merge results, then 100 jobs again and so on. Thanks for any help Vaclav Hanzl From jakob at unthought.net Mon Sep 24 06:32:43 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:42 2009 Subject: batch systems with job deps (afterok) In-Reply-To: <20010924142258M.hanzl@unknown-domain>; from hanzl@noel.feld.cvut.cz on Mon, Sep 24, 2001 at 02:22:58PM +0200 References: <20010924142258M.hanzl@unknown-domain> Message-ID: <20010924153243.A26106@unthought.net> On Mon, Sep 24, 2001 at 02:22:58PM +0200, hanzl@noel.feld.cvut.cz wrote: > Hi batch system gurus, > > is there any opensource batch system with support for job > dependencies? > > The only one I found is OpenPBS (afterok feature of qsub) but it is > not easy to run on scyld. Any other ideas please? > > All we need is to run 100 jobs in any order, then one job to merge > results, then 100 jobs again and so on. If your jobs are relatively short lived (not days long), you could do this with ANTS (http://unthought.net/antsd) and a Makefile to get the dependencies right. I think that could be a simple and efficient solution to your problem - but note that ANTS is not meant to be a real batch queue system, and it therefore does not have features such as queues, accounting, etc. etc. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From ctierney at hpti.com Mon Sep 24 12:41:33 2001 From: ctierney at hpti.com (Craig Tierney) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices In-Reply-To: <20010922181323.53698.qmail@web14706.mail.yahoo.com>; from ron_chen_123@yahoo.com on Sat, Sep 22, 2001 at 11:13:23AM -0700 References: <20010921101035.C9122@hpti.com> <20010922181323.53698.qmail@web14706.mail.yahoo.com> Message-ID: <20010924134133.D19063@hpti.com> Thanks for the pointers. They are very interesting reading. However, the conclusions are: 1) You need a good compiler to get the most out of your hardware. 2) The only good benchmark is your own code. I care about Fortran 77 and 90 performance, not raytracers and modelling packages. My results show that the Dual Xeon 1.7 Ghz is 30% faster than the Dual Athlon 1.2 Ghz for the 3 codes I ran when compiled with the Intel Fortran compiler for Linux. I have some more to do, but it seems that for my codes the Xeon processor is best. Craig On Sat, Sep 22, 2001 at 11:13:23AM -0700, Ron Chen wrote: > Some interesting articles: > > Floating-Point Compiler Performance Analysis > ============================================ > http://www.aceshardware.com/Spades/read.php?article_id=40000189 > > Workstation Battle Royale > ========================= > (The Top Dual Xeon Workstations and the Dual Athlon > MP) > http://www.aceshardware.com/Spades/read.php?article_id=45000195 > > -Ron > > --- Craig Tierney wrote: > > On Fri, Sep 21, 2001 at 10:35:25AM -0000, > > n.gregory@garageflowers.co.uk wrote: > > > (sorry if this gets posted twice) > > > Hello, > > > I am doing some research into the configuration of > > a 32 1U node Beowulf cluster > > > and have a question regarding CPU configuration. > > > > > > The current choice is between Intel P4s as > > Itaniums seem a little bleeding edge at > > > the moment, or the latest AMD chips. 
> > > > > > AMD seem to be getting a impressive performance > > for the price, but I?m a little > > > concerned about the lack of mature multiprocessor > > chipsets and their heat issues. > > > > Have you run your code on an AMD and P4? Alot of > > the published > > reports show AMD as faster, but mainly for codes > > that I don't care > > about and that havenot been recompiled. > > > > For the codes I am interested in a dual P4 1.7 Ghz > > is 30% faster > > than a dual Athlon 1.2 Ghz when using the Intel > > compiler for Linux. > > SSE support is better in the Intel compiler than the > > Portland Group > > Compiler. I didn't seem much improvement using > > Portland Group and SSE > > than non-SSE. With the Portland Group compiler the > > two systems perform > > similiary (within 5%). > > > > You are going to spend probably 150K (or more) on > > hardware, but aren't > > willing to shell out $1k (or less) for commercial > > compilers? For the > > Portland Group compiler, you only need to buy a > > license for the nodes you > > are going to compile on (Front ends). I do not know > > about the > > Intel compiler, but I hope it is the same. Anyone > > know? > > > > > > Craig > > > > > > > > Intel on the other-hand have the MP chipsets but > > seem to be falling down with > > > current lack of (non-commercial ) complier that > > support MMX and SSE, and the > > > whole issue of Rambus Vs DDR memory. > > > > > > I would be grateful for any insight into a choice > > of CPU and its configuration in > > > terms of price/performance/expandability, or any > > other factor I should be > > > considering. > > > > > > Thanks > > > Nick > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or > > unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > > Craig Tierney (ctierney@hpti.com) > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or > > unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > __________________________________________________ > Do You Yahoo!? > Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger. http://im.yahoo.com > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (ctierney@hpti.com) From brian at heptane.eng.yale.edu Mon Sep 24 21:07:07 2001 From: brian at heptane.eng.yale.edu (Brian) Date: Wed Nov 25 01:01:42 2009 Subject: LSI PCI-X Fibre Channel Host Adapters? Message-ID: Hi guys, I'm spec'ing out a system for a lab here, and although we're most likely leaning towards the eventual use of Myrinet, I also saw that LSI Logic has a 2Gb/s PCI-X adapter, and I was wondering if anyone has any experience with it? Also, back on the subject of Myrinet, (.. and I know a few of you Myri guys read this list!..), is there anything remotely like a product roadmap available, so I'd be able to plan out a time-frame for our purchases? Or, at the very least, know what sort of hardware I'd need? On a mostly unrelated note, I just saw something on a new (upcoming) motherboard from Gigabyte, the GA-6MXDR, allowing for 2 x Tualatin CPUs, and 8 GB of DDR RAM, and equipped with 2 PCI-X 64-bit 100/133Mhz slots. 
I prefer the Athlons, but hey, it still sounds like a pretty well loaded boad. (www.ocworkbench.com) Thanks for any info, - Brian From dfsousa at uol.com.br Tue Sep 25 01:15:27 2001 From: dfsousa at uol.com.br (Delcides =?iso-8859-1?q?Fl=E1vio=20de=20Sousa=20Jr=2E?=) Date: Wed Nov 25 01:01:42 2009 Subject: Advice needed Message-ID: <20010925081527.061CB245C4@Ulysses> Hi, I'm building a Beowulf cluster ( guess you never heard that before :-) and I'd really appreciate some advice on some questions. 1. Is it better to get the fastest CPU or maybe something slower with more memory ? 2. From tutorials, papers and web pages I get the impression that it's better to start with less nodes with better hardware, e.g. 4 PIV 1.4GHz 512 Mb RAM versus 8 PIII 700MHz 256 Mb RAM ( This example is merely illustrative - not wanting to dwell on the issue PIII vs. PIV :-) 3. What is the performance penalty on running diskless nodes ? I mean, comparing with a full/partial OS install on each node. 4. Could you recommend a good ethernet card/switch ? I've read the page on Gigabit and 100Mbps ethernet technology and I'm inclined to adopt the DEC "Tulip" model. Thanks in advance Delcides F. Sousa Jr. Institute of Physics - State University of Campinas - Brazil From canon at nersc.gov Tue Sep 25 08:22:41 2001 From: canon at nersc.gov (Shane Canon) Date: Wed Nov 25 01:01:42 2009 Subject: linpack/hpl Message-ID: <200109251522.IAA18883@pookie.nersc.gov> Greetings, We are attempting to get a linpack number for our cluster. We have around 250 nodes (dual PIIIs) with an ethernet interconnect (100baseT and 1000baseT). We have tried running it on a subset of machines and are getting disappointing numbers. We are adjusting the problem size and block size primarily. We have also tried a variety of library and compiler combinations (atlas/blas, intel's blas,pgi,gcc). The numbers for under 10 nodes look reasonable, but as we edge higher (>30) things start to tank. I had always understood that linpack was fairly insensitive to the interconnect. How true is this? I understand you can increase the blocksize to limit communications, but this also causes cache thrashing. Right? Are there any other handles to turn in HPL? Has anyone every written a linpack code that would work good on this type of architecture? Thanks in advance, --Shane Canon -- ------------------------------------------------------------------------ Shane Canon voice: 510-486-6981 National Energy Research Scientific fax: 510-486-7520 Computing Center 1 Cyclotron Road Mailstop 50D-106 Berkeley, CA 94720 canon@nersc.gov ------------------------------------------------------------------------ From lindahl at conservativecomputer.com Tue Sep 25 08:22:29 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:42 2009 Subject: Advice needed In-Reply-To: <20010925081527.061CB245C4@Ulysses>; from dfsousa@uol.com.br on Tue, Sep 25, 2001 at 05:15:27AM -0300 References: <20010925081527.061CB245C4@Ulysses> Message-ID: <20010925112229.A2721@wumpus.foo> On Tue, Sep 25, 2001 at 05:15:27AM -0300, Delcides Fl?vio de Sousa Jr. wrote: > 1. Is it better to get the fastest CPU or maybe something slower with more > memory ? It depends. If your application fits into a small amount of memory, then you don't need to buy too much. If your application doesn't fit in memory, you'll get a horrible slowdown. > 2. From tutorials, papers and web pages I get the impression that > it's better to start with less nodes with better hardware, e.g. It depends. 
You might find your performance is better with more, slower nodes. Depends on how tightly coupled your application is. > 3. What is the performance penalty on running diskless nodes ? I mean, > comparing with a full/partial OS install on each node. It depends. Today, the Scyld diskless distribution is definitely the easiest to admin, and for your average MPI program, it works great. But if you wanted to run scripts on all the nodes that accessed a lot of other programs, Scyld wouldn't do the job. greg From hanzl at noel.feld.cvut.cz Tue Sep 25 08:39:07 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:42 2009 Subject: batch systems with job deps (afterok) In-Reply-To: <20010924153243.A26106@unthought.net> References: <20010924142258M.hanzl@unknown-domain> <20010924153243.A26106@unthought.net> Message-ID: <20010925173907Y.hanzl@unknown-domain> >> opensource batch system with support for job dependencies? >> ... >> OpenPBS (afterok feature of qsub) ... not easy to run on scyld >> ... >> All we need is to run 100 jobs in any order, then one job to merge >> results, then 100 jobs again and so on. > >If your jobs are relatively short lived (not days long), you could do this with >ANTS (http://unthought.net/antsd) and a Makefile to get the dependencies right. > >I think that could be a simple and efficient solution to your problem - but >note that ANTS is not meant to be a real batch queue system, and it therefore >does not have features such as queues, accounting, etc. etc. Thanks, using 'make' this way really makes sense as our speech recognition training is something like huge build composed of five-minutes tasks. What I would miss with make (and is easy with 'afterok' job dependency) is the possibility of gradual creation of program to run - usually I am just a few steps ahead with program debugging - previous steps run while I prepare things which should follow (and once prepared, I would like them to run as soon as the previous step is finished). But maybe it is still possible to use make, antsd and some additional machinery, I will look into this. Thanks Vaclav From lindahl at conservativecomputer.com Tue Sep 25 08:30:37 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:42 2009 Subject: LSI PCI-X Fibre Channel Host Adapters? In-Reply-To: ; from brian@heptane.eng.yale.edu on Tue, Sep 25, 2001 at 12:07:07AM -0400 References: Message-ID: <20010925113037.B2721@wumpus.foo> On Tue, Sep 25, 2001 at 12:07:07AM -0400, Brian wrote: > Also, back on the subject of Myrinet, (.. and I know a few of you Myri > guys read this list!..), is there anything remotely like a product roadmap > available, Well, I'm not sure what's non disclosure and what isn't, but I'll ask. The main benefit of PCIX is that it will allow Myricom to again double their bandwidth, and it will allow significantly lower latencies for tiny packets. > On a mostly unrelated note, I just saw something on a new (upcoming) > motherboard from Gigabyte, the GA-6MXDR, allowing for 2 x Tualatin CPUs, > and 8 GB of DDR RAM, and equipped with 2 PCI-X 64-bit 100/133Mhz slots. > I prefer the Athlons, but hey, it still sounds like a pretty well loaded > boad. (www.ocworkbench.com) It's not clear that the Pentium III "Tulatin" cpus are ever going to catch up to the Pentium 4 or AMD offerings. Unfortunately the Pentium III currently has the best PCI bus implementations. 
However, if you examine your applications, I suspect that you'll find that you aren't maxing out communications, and that a mildly faster CPU is better than a mildly faster PCI bus. BTW there's a comparison of PCI bus speeds measured by Myrinet cards at: http://conservativecomputer.com/myrinet/ greg From hanzl at noel.feld.cvut.cz Tue Sep 25 08:51:06 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:42 2009 Subject: Fwd: Grid Engine: ongoing and future work In-Reply-To: <20010922180734.86634.qmail@web14702.mail.yahoo.com> References: <20010922180734.86634.qmail@web14702.mail.yahoo.com> Message-ID: <20010925175106R.hanzl@unknown-domain> I just found in documentation that gridengine supports batch job dependency which I desperately look for (so far having found it in PBS only, but I am unable to run PBS on Scyld). So I am willing to join the work. Anybody knows how hard it is (would be) to run gridengine on Scyld Beowulf? Thanks Vaclav From gary at umsl.edu Tue Sep 25 09:46:32 2001 From: gary at umsl.edu (Gary Stiehr) Date: Wed Nov 25 01:01:42 2009 Subject: Advice needed References: <20010925081527.061CB245C4@Ulysses> Message-ID: <3BB0B4E8.6090902@umsl.edu> Delcides Fl?vio de Sousa Jr. wrote: > Hi, > > I'm building a Beowulf cluster ( guess you never heard that before > :-) and I'd really appreciate some advice on some questions. > > 1. Is it better to get the fastest CPU or maybe something slower with more > memory ? Hi, Unless you have a large budget, you should analyze the application(s) you are going to run on the cluster to determine the most cost-effective way to build your cluster. * How much RAM does your application use? * Do you have large input files or do you generate a lot of output? * Does your application generate a lot of disk activity during its execution? * Will the parallel processes in your application need to communicate often? * and so on A lot of terms I have used above are vague ("a lot of disk activity", "comminicate often", etc.). Basically, if you have some idea of how much of each resource your application uses (disk space, RAM, processor time, network bandwidth), you will have a better idea of the type of hardware you need to buy. Obviously if you have a big budget, it doesn't hurt to get more RAM and disk space or faster processorsthan you will need. -- Gary Stiehr gary@umsl.edu > > > Thanks in advance > > Delcides F. Sousa Jr. > Institute of Physics - State University of Campinas - Brazil > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From lindahl at conservativecomputer.com Tue Sep 25 10:07:57 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:42 2009 Subject: linpack/hpl In-Reply-To: <200109251522.IAA18883@pookie.nersc.gov>; from canon@nersc.gov on Tue, Sep 25, 2001 at 08:22:41AM -0700 References: <200109251522.IAA18883@pookie.nersc.gov> Message-ID: <20010925130757.B3177@wumpus.foo> On Tue, Sep 25, 2001 at 08:22:41AM -0700, Shane Canon wrote: > We are adjusting the problem > size and block size primarily. You need to set the problem size as large as possible. If it were too small, you'd see exactly what you report, namely that above a certain # of nodes, it begins to suck. 
To set the problem size, figure out how much memory per node you want to use, sum it up, and figure out how big of an array you can then use (sqrt and divide by the element size, which is double precision complex, or 16 bytes.) greg From patrick at myri.com Tue Sep 25 12:15:03 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:42 2009 Subject: LSI PCI-X Fibre Channel Host Adapters? References: <20010925113037.B2721@wumpus.foo> Message-ID: <3BB0D7B7.150EF52A@myri.com> Greg Lindahl wrote: > > On Tue, Sep 25, 2001 at 12:07:07AM -0400, Brian wrote: > > > Also, back on the subject of Myrinet, (.. and I know a few of you Myri > > guys read this list!..), is there anything remotely like a product roadmap > > available, > > Well, I'm not sure what's non disclosure and what isn't, but I'll > ask. The main benefit of PCIX is that it will allow Myricom to again > double their bandwidth, and it will allow significantly lower > latencies for tiny packets. I cannot, of course, confirm that ;-) If a such NIC would ever exist, the expected improvement would be close to Greg's guess. Patrick ----------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | | Myricom, Inc. http://www.myri.com | | Cell: 865-389-8852 685 Emory Valley Rd (B) | | Fax: 865-425-0978 Oak Ridge, TN 37830 | ----------------------------------------------------------- From patrick at myri.com Tue Sep 25 12:20:19 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:42 2009 Subject: linpack/hpl References: <200109251522.IAA18883@pookie.nersc.gov> <20010925130757.B3177@wumpus.foo> Message-ID: <3BB0D8F3.915993A3@myri.com> Greg Lindahl wrote: > To set the problem size, figure out how much memory per node you want > to use, sum it up, and figure out how big of an array you can then use > (sqrt and divide by the element size, which is double precision > complex, or 16 bytes.) Greg, It's not double precision complex, it's double precision real, so sqrt and then divide per 8. Patrick ----------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | | Myricom, Inc. http://www.myri.com | | Cell: 865-389-8852 685 Emory Valley Rd (B) | | Fax: 865-425-0978 Oak Ridge, TN 37830 | ----------------------------------------------------------- From brian at heptane.eng.yale.edu Tue Sep 25 11:31:00 2001 From: brian at heptane.eng.yale.edu (Brian) Date: Wed Nov 25 01:01:42 2009 Subject: LSI PCI-X Fibre Channel Host Adapters? In-Reply-To: <3BB0D7B7.150EF52A@myri.com> Message-ID: > If a such NIC would ever exist, the expected improvement would be > close to Greg's guess. If such a NIC were to ever possibly exist, would we be able to speculate on the possible time-frame for it's introduction (and requirements)? All purely hypothetical guesses, of course, since such a NIC might, in fact, never exist. Riiiight. Is that pushing for too much? ;) - Brian From patrick at myri.com Tue Sep 25 12:57:53 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:42 2009 Subject: LSI PCI-X Fibre Channel Host Adapters? References: Message-ID: <3BB0E1C1.A1BF7A39@myri.com> Brian wrote: > > > If a such NIC would ever exist, the expected improvement would be > > close to Greg's guess. > > If such a NIC were to ever possibly exist, would we be able to speculate > on the possible time-frame for it's introduction (and requirements)? All > purely hypothetical guesses, of course, since such a NIC might, in fact, > never exist. Riiiight. 
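Returning to the linpack sizing question: taking the rule of thumb above together with the 8-bytes-per-element correction, N is roughly sqrt(0.8 * total_memory_in_bytes / 8) when about 80% of memory is given to the matrix. A back-of-the-envelope check in awk, assuming (for illustration only, not the poster's actual figures) 30 nodes with 512 MB each:

    echo 30 512 | awk '{ bytes = $1 * $2 * 1024 * 1024 * 0.80
                         printf "N ~ %d\n", int(sqrt(bytes / 8)) }'

which comes out near N = 40000 for that configuration; running with a problem size much smaller than the memory allows is the usual reason the efficiency curve falls over as more nodes are added.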
Right :-) For all products, it's always a bad idea to speculate on the time-frame before production is ready to go. I don't know if you are looking to delay your choice waiting for a such hypothetical product, but if Myricom decide to go PCI-X, it would definitely not be in 2001. Patrick PS: I would be very carefull about early PCI-X motherboards. It's not easy to make a good PCI-X chipset (it's not easy to make a good PCI chipset today), and the learning curve will be steep for the vendors. ----------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | | Myricom, Inc. http://www.myri.com | | Cell: 865-389-8852 685 Emory Valley Rd (B) | | Fax: 865-425-0978 Oak Ridge, TN 37830 | ----------------------------------------------------------- From erayo at cs.bilkent.edu.tr Tue Sep 25 20:45:39 2001 From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa)) Date: Wed Nov 25 01:01:42 2009 Subject: Using firewire for a torus? Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, Do you think it would be worthwhile to use the new 3.2 gb/s Firewire to make a cluster with torus topology? Thanks, - -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara www: http://www.cs.bilkent.edu.tr/~erayo GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7sU9jfAeuFodNU5wRAhzfAKCbPTWmwL4/sBrGMCIjIfXCPEGo6ACfcLv5 ETquIhJqDYBoDGq7pucZc94= =hicm -----END PGP SIGNATURE----- From jakob at unthought.net Tue Sep 25 20:59:27 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:42 2009 Subject: batch systems with job deps (afterok) In-Reply-To: <20010925173907Y.hanzl@unknown-domain>; from hanzl@noel.feld.cvut.cz on Tue, Sep 25, 2001 at 05:39:07PM +0200 References: <20010924142258M.hanzl@unknown-domain> <20010924153243.A26106@unthought.net> <20010925173907Y.hanzl@unknown-domain> Message-ID: <20010926055927.B7750@unthought.net> On Tue, Sep 25, 2001 at 05:39:07PM +0200, hanzl@noel.feld.cvut.cz wrote: > >> opensource batch system with support for job dependencies? > >> ... > >> OpenPBS (afterok feature of qsub) ... not easy to run on scyld > >> ... > >> All we need is to run 100 jobs in any order, then one job to merge > >> results, then 100 jobs again and so on. > > > >If your jobs are relatively short lived (not days long), you could do this with > >ANTS (http://unthought.net/antsd) and a Makefile to get the dependencies right. > > > >I think that could be a simple and efficient solution to your problem - but > >note that ANTS is not meant to be a real batch queue system, and it therefore > >does not have features such as queues, accounting, etc. etc. > > Thanks, using 'make' this way really makes sense as our speech > recognition training is something like huge build composed of > five-minutes tasks. antsd was made for tasks of that duration (C++ compilation in my case). > > What I would miss with make (and is easy with 'afterok' job > dependency) is the possibility of gradual creation of program to run - > usually I am just a few steps ahead with program debugging - previous > steps run while I prepare things which should follow (and once > prepared, I would like them to run as soon as the previous step is > finished). If your runs take an input file and generate an output file from the input, make should be able to see that it should not re-run already completed jobs. 
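As a concrete, make-free approximation of the idea being discussed, a plain shell loop that skips results which already exist gives much of what make's timestamp check buys, and on a Scyld system the per-task command can simply be pushed out to a node with bpsh. The node count, the train/merge commands and the file names below are invented for illustration:

    NODES=8                                    # assumed number of compute nodes
    for i in $(seq 1 100); do
        [ -s result.$i ] && continue           # skip tasks that already produced output
        bpsh $(( i % NODES )) ./train input.$i > result.$i &
        [ $(( i % NODES )) -eq 0 ] && wait     # one wave per pass over the nodes
    done
    wait
    ./merge result.* > merged.out              # runs only after every task has finished

Re-running the script after adding new steps then behaves much like re-running make: finished results are left alone and only the missing ones are computed (a half-written result from a crashed task would need to be removed by hand).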
Thus, typing "make" again as your next task is ready should only result in the non-completed tasks being run. Or am I missing something ? -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From ivan at biocomp.unibo.it Wed Sep 26 01:57:18 2001 From: ivan at biocomp.unibo.it (Ivan Rossi) Date: Wed Nov 25 01:01:42 2009 Subject: Scyld: bad scaling Message-ID: Beowulfers, recently i rebuilt our tiny 10 CPUs cluster using Scyld. Before i have been using RedHat 6.2 + LAM MPI. And i like it, it is easier to mantain. Unfortunately, after the rebuild, I found a marked performance degradation with respect to the former installation. In particular i found a disappointingly bad scaling for the application we use most, the MD program Gromacs 2.0. Now scaling goes almost exactly as the square root of the number of nodes, that is it takes 4 CPUs to double performance and nine CPUs to triple them. Since no hardware has been changed, in my opinion it must be either the pre-compiled Scyld kernel, bpsh or Scyld MPICH. So i hope that some fine tuning of them should solve the problem. Do you have any advice about the cause of the problem and about what to do? I really would like to stay with Scyld. Thanks in advance Ivan PS The cluster is composed by 4 Dual PIII 500, a dual PIII 700 front-end and an Intel Express 520 switch. RAM is 512MB on each machine. NIC are Intel eepro100 cards -- Dr. Ivan Rossi - CIRB Biocomputing Unit - University of Bologna (Italy) e-mail: ivan@biocomp.unibo.it Web: http://www.biocomp.unibo.it/ivan From hanzl at noel.feld.cvut.cz Wed Sep 26 02:03:10 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:42 2009 Subject: batch systems with job deps (afterok) In-Reply-To: <20010926055927.B7750@unthought.net> References: <20010924153243.A26106@unthought.net> <20010925173907Y.hanzl@unknown-domain> <20010926055927.B7750@unthought.net> Message-ID: <20010926110310Q.hanzl@unknown-domain> > > What I would miss with make (and is easy with 'afterok' job > > dependency) is the possibility of gradual creation of program to run - > > usually I am just a few steps ahead with program debugging - previous > > steps run while I prepare things which should follow (and once > > prepared, I would like them to run as soon as the previous step is > > finished). > > If your runs take an input file and generate an output file from the input, > make should be able to see that it should not re-run already completed > jobs. > > Thus, typing "make" again as your next task is ready should only result in > the non-completed tasks being run. > > Or am I missing something ? Typically, previous invocation of 'make' would still run when I have the next steps ready. Typing "make" again at this moment would cause both copies to work on unfinished step (second make would not wait as it should). (Typically, I might finish the next step on friday and would like it to be invoked during the weekend, as soon as possible.) However there might be a simple solution (like one 'make' in loop, until there is nothing to do). I wonder why my needs seem to be uncommon - is it because you guys on the beowulf list 1) are not as lazy as I am and you have the whole program ready in time? 
2) you change data/parameters rather then programs? 3) you have plenty of time and do not need to work in parallel with your cluster? I do not beleive any of these, please tell me why you all are not calling for job dependencies in any job spooling system :) Regards Vaclav From siegert at sfu.ca Wed Sep 26 17:54:45 2001 From: siegert at sfu.ca (Martin Siegert) Date: Wed Nov 25 01:01:42 2009 Subject: GigE fiber NIC Message-ID: <20010926175445.A8037@stikine.ucs.sfu.ca> I am looking for a good GigE fiber NIC that is well supported under Linux 2.4.x kernels. The NIC is going to be used for NFS trafic between the master node that holds the home directories and the switch. All 70 slave nodes have a 100baseT NFS connection to that switch. Thus performance and reliability of the NIC/driver combination are equally important. The motherboard (Tyan Thunder K7) supports 64bit/33MHz PCI. There seem to be the following cards that are supported under Linux: - 3Com 3c985B-SX - Netgear GA620 - Syskonnect SK-9843 - National Semiconductor DP83820 Intel makes GigE cards as well, but the driver is not distributed with the kernel. Thus I would rely on Intel to have a driver available when I want to upgrade the kernel. My google searches did not come up with any information about the availability of Packet Engines NICs (now Alcatel) and Alteon AceNIC (now Nortel). Any suggestions/recommendations? Thanks, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert@sfu.ca Canada V5A 1S6 ======================================================================== From joelja at darkwing.uoregon.edu Wed Sep 26 23:47:58 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:42 2009 Subject: GigE fiber NIC In-Reply-To: <20010926175445.A8037@stikine.ucs.sfu.ca> Message-ID: On Wed, 26 Sep 2001, Martin Siegert wrote: > - 3Com 3c985B-SX > - Netgear GA620 > - Syskonnect SK-9843 > - National Semiconductor DP83820 > > Intel makes GigE cards as well, but the driver is not distributed with > the kernel. Thus I would rely on Intel to have a driver available when > I want to upgrade the kernel. I'd probably go with the sysconnect for performance/support followed by the ga620 which is the cheapest of the acenic (the differences between them aren't substantive)based cards. the nat semi chipset has the distinction of being really cheap (the 10/100/1000 copper cards from dlink are $89. but the chipset on has an 8k transmit and 32k recieve buffer which makes it not the most desireable for a high-end gig card... > My google searches did not come up with any information about the availability > of Packet Engines NICs (now Alcatel) and Alteon AceNIC (now Nortel). you won't find either of these cards out there, alcatel killed the nic business, and nortel merged alteons desktop products with their baynetworks netgear division... > Any suggestions/recommendations? 
> > Thanks, > Martin > > ======================================================================== > Martin Siegert > Academic Computing Services phone: (604) 291-4691 > Simon Fraser University fax: (604) 291-4242 > Burnaby, British Columbia email: siegert@sfu.ca > Canada V5A 1S6 > ======================================================================== > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843. From dean at ucdavis.edu Fri Sep 21 09:07:31 2001 From: dean at ucdavis.edu (Dean Lavelle) Date: Wed Nov 25 01:01:42 2009 Subject: Scyld and mpi FASTA Makefile Problems Message-ID: <3BAB65C3.1E811A62@ucdavis.edu> Dear users, I am a beowulf newbie who has almost gone completely mad trying to rework the Makefile for the mpi FASTA under the Scyld release 7operating system. The small Scyld beowulf system that I have constructed works perfectly for the Linpack as well as the mpi-mandel test applications. included in the Scyld distribution. However, I cannot seem to get the included Makefile for FASTA to compile. Does anyone have a Makefile for the mpi enabled FASTA that compiles under Scyld? Any pointers would be immensly helpful (I wouldn be surprised to learn that I have done everything incorrectly). Much thanks, Dean Lavelle From mxtd at vision.ime.usp.br Tue Sep 25 04:35:47 2001 From: mxtd at vision.ime.usp.br (Martha Ximena Torres Delgado) Date: Wed Nov 25 01:01:42 2009 Subject: PBS pro and Debian Message-ID: <3BB06C12.1859F697@vision.ime.usp.br> Hi, Has anyone experiencie to install PBS Pro ( free University licenses) with Debian, using the comand "alien"?? Any information would be greatly appreciated, Thanks Martha From bruno_richard at hp.com Tue Sep 18 08:45:12 2001 From: bruno_richard at hp.com (RICHARD,BRUNO (HP-France,ex1)) Date: Wed Nov 25 01:01:42 2009 Subject: Paper showing Linpack scalability of mainstream clusters Message-ID: Sorry Scott, I sent you a wrong reference. The actual link is http://www.hpl.hp.com/techreports/2001/HPL-2001-206.html. Enjoy, -bruno -----Original Message----- From: Scott Shealy [mailto:sshealy@asgnet.psc.sc.edu] Sent: Tuesday, September 18, 2001 17:40 To: beowulf@beowulf.org Cc: bruno_richard@hp.com Subject: Re: Paper showing Linpack scalability of mainstream clusters Hey Richard is that the right link? Whenever I try to hit it I always get "a page cannot be displayed error" . In fact it appears as if the whole site is inaccesible. I have been trying to access it now for several days. Is this just me? I would be really interested in reading the paper... Thanks for any help Scott Shealy Message: 6 From: "RICHARD,BRUNO (HP-France,ex1)" To: "'beowulf@beowulf.org'" Subject: Paper showing Linpack scalability of mainstream clusters Date: Mon, 10 Sep 2001 09:45:01 +0200 charset="iso-8859-1" Available from : I-Cluster: Reaching TOP500 performance using mainstream hardware By B. Richard (hp Laboratories Grenoble), P. Augerat, N. Maillard, S. 
Derr, S. Martin, C. Robert (ID Laboratory) Abstract: A common topic for PC clusters is the use of mainstream instead of dedicated hardware i.e., using standard desktop PCs and standard network connectivity, with technology to organize them so that they can be used as a single computing entity. Current work in this "off-the-shelf cluster" domain usually focuses on how to reach a high availability infrastructure, on how to efficiently balance the work between nodes of such clusters, or on how to get the most computing power for loosely-coupled (large grained) problems. hp Labs Grenoble, teaming with INRIA Rh?ne-Alpes, teamed up to build a cluster out of 225 standard hp e-PC interconnected by standard Ethernet, with the objective of getting the highest computational performance and scaling from the simplest desktop PC to the most powerful computers in the world. As an additional constraint, we decided to use a cluster that models a modern enterprise network, using standard machines interconnected through standard Ethernet connectivity. This paper describes the issues and challenges we had to overcome in order to reach the 385th rank in the TOP500 list of most powerful supercomputers in the world on June 21st, 2001, being the first mainstream cluster to enter TOP500 ever. Also we provide hereafter some details about the software and middleware tuning we have done, as well as the impact of different factors on performance such as the network topology and infrastructure hardware. From dean at ucdavis.edu Tue Sep 18 14:56:07 2001 From: dean at ucdavis.edu (Dean Lavelle) Date: Wed Nov 25 01:01:42 2009 Subject: Scyld and mpi FASTA Makefile Problems Message-ID: <3BA7C2F7.9A544096@ucdavis.edu> I have almost gone completely mad trying to rework the Makefile for the mpi FASTA under the Scyld operating system. The small Scyld beowulf system that I have constructed works perfectly for the Linpack as well as the mpi-mandel test applications. included in the Scyld distribution. However, I cannot seem to get the included Makefile for FASTA to compile. Does anyone have a Makefile for the mpi enabled FASTA that compiles under Scyld? Any pointers would be immensly helpful (I wouldn be surprised to learn that I have done everything incorrectly). Much thanks, Dean Lavelle From larry at kronos.jpl.nasa.gov Tue Sep 18 18:42:59 2001 From: larry at kronos.jpl.nasa.gov (Larry A. Bergman) Date: Wed Nov 25 01:01:42 2009 Subject: IEEE Cluster 2001 Conference To Go Forward as planned Message-ID: ** 2ND NOTICE ** (our apologies if you receive more than one copy of this) TO: All IEEE Cluster 2001 Participants and Prospective Participants SUBJECT: Status of IEEE Cluster 2001 Conference: October 8-11, 2001 Extended Advanced Registration Deadline: now Monday September 24th. In view of the tragic events of this past week, our hearts go out to all those who lost loved ones or colleagues in the horrific acts of terrorism in New York and Washington. As many of you know, the brief shutdown of the American air transportation system last week did cause the cancellation of a number of professional meetings throughout the USA. This week, the air transport system has resumed operation, with greatly heightened security on both domestic and international flights. As of yesterday, flights were roughly 60% of normal, and increasing day-by-day. Increased patrols and surveillance in and around U.S. 
cities, airports, borders, vital infrastructure, and national landmarks (to name a few) are also in effect all designed to increase safety of the US population and foreign visitors. THE BOTTOM LINE: We believe that travel to California will be stabile and safe by early October. Therefore, we plan to GO FORWARD WITH IEEE CLUSTER 2001 AS PLANNED. Because many businesses had shut down temporarily this last week, or had curtailed working hours, we would like to announce that the ADVANCED REGISTRATION DEADLINE will be slipped from Monday September 17th until Monday September 24th. This will give you adequate time to make your travel arrangements through your respective business travel departments. In spite of recent events, we still strongly believe that IEEE CLUSTER 2001 promises to be one of the best professional meetings of its kind, bringing together the industry, development, applications, and research communities collectively at a premier venue. If you have not already done so, we encourage all of you to take advantage of the reduced registration cost up until Monday September 24th. Further details of the event are available on the conference Web site at http://www.cacr.caltech.edu/cluster2001 We look forward to seeing all of you in Newport Beach, LA, in October. Mark Baker (University of Portsmouth, UK) Larry Bergman (JPL, USA) General Chairs, Cluster 2001 From cui at chem.wisc.edu Sat Sep 15 10:40:21 2001 From: cui at chem.wisc.edu (Qiang Cui) Date: Wed Nov 25 01:01:42 2009 Subject: auto-restart Message-ID: Hi, folks I have been encountering a strange situation - my new Linux box has been rebooting itself spontaneously rather randomly. I thought it's related to some strange cron jobs, but it still happens after I disabled cron. Sometimes it happens during a Netscape session, sometimes it just happens without any CPU-intensive jobs running.... Any suggestions? Could this be related to a hardware problem, or some bug/security feature related to Redhat 7.1? Where do I even begin to solve this problem? Thanks! -- ________________________________________________________ Qiang Cui Assistant Professor of Chemistry Department of Chemistry University of Wisconsin, Madison 1101 University Ave Madison, WI 53706 Phone: 608-262-9801 Fax: 608-262-4782 Email: cui@chem.wisc.edu Web: http://www.chem.wisc.edu/main/people/faculty/cui.html Group: http://www.chem.wisc.edu/~cui ________________________________________________________ From korsedal at zaiqtech.com Tue Sep 18 14:39:27 2001 From: korsedal at zaiqtech.com (Korsedal, Brian) Date: Wed Nov 25 01:01:42 2009 Subject: Linux cluster in commercial office? Message-ID: <706F0A000C79D5118BB10004ACC5603D15FDE9@acad-hq-ex-1.woburn.asic-alliance.com> I work in a company that designs ASIC's (application specific integrated circuits ... chips) and FPGA's. We use two sun servers and a bunch of PC's. The software we use for simulations runs on Unix and Linux. We'd like to look into clustering our PC's so that we can have an extra high performance server. Each PC would still have to function as a terminal (office apps and the ability to run processes on the unix machines) but use the free CPU time to run simulations. Is there any implementation of clustering software for this? If there isn't, it would be an interesting thing to look into, there are many offices with computers that are barely used. My CPU sits idle 95% of the time and it would be great to caputer the extra CPU cycles. Does anybody have any thoughts about this or know how to make it happen? 
My theory would work something like this: Partition each hard drive with RedHat 7.0. Upgrade to 1Gig ethernet. Install Bewolf software. Run StarOffice, Netscape Nav and other linux tools for Office functions. I probably need a very comprehensive plan if I am going to convince my company that it is worth trying. Any help would be greatly appriciated. Sincerely, Brian Korsedal From j.a.white at larc.nasa.gov Mon Sep 17 07:15:37 2001 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Wed Nov 25 01:01:42 2009 Subject: [Fwd: Problem using -p4pg and procgroup file] Message-ID: <3BA60588.CF05F00@larc.nasa.gov> Dear Group, I recently sent an email message to the mpich help address and the response I got back was rather unsatisfactory. Basically they told me to RTFM. Considering that I had attached several pages of MPICH debug information that (I think) showed a problem I was not satisfied with thier response. I would greatly appreciate any help/suggestion anyone on the list can provide. To view a complete description of the problem and the mpich debug information please see the attached file. Thanks, Jeff White Jeffery A. White email : j.a.white@larc.nasa.gov Phone : (757) 864-6882 ; Fax : (757) 864-6243 URL : http://hapb-www.larc.nasa.gov/~jawhite/ -------------- next part -------------- To whom it may concern, I am trying to figure out how to use the -p4pg option in mpirun and I am experiencing some difficulties. My cluster configuration is as follows: node0 : machine : Dual processor Supermicro Super 370DLE cpu : 1 GHz Pentium 3 O.S. : Redhat Linux 7.1 kernel : 2.4.2-2smp mpich : 1.2.1 nodes1->18 : machine : Compaq xp1000 cpu : 667 MHz DEC alpha 21264 O.S. : Redhat Linux 7.0 kernel : 2.4.2 mpich : 1.2.1 nodes 19->34 : machine : Microway Screamer cpu : 667 MHz DEC alpha 21164 O.S. : Redhat Linux 7.0 kernel : 2.4.2 mpich : 1.2.1 The heterogeneous nature of the machine has made me migrate from using the -machinefile option to the -p4pg option. I have been trying to get a 2 processor job to run while submitting the mpirun command from node0 (-nolocal is specified) and using either nodes 1 and 2 or nodes 2 and 3. If I use the -machinefile approach I am able to run on any homogeneous combination of nodes. However, if I use the -p4pg approach I have not been able to run unless my mpi master node is node1. As long as node1 is the mpi master node then I can use any one of nodes 2 through 18 as the 2nd processor. THe following 4 runs illustrates what I have gotten to work as well as what doesn't work (and the subsequent error message). Runs 1, 2 and 3 worked and run 4 failed. 
1) When submitting from node0 using the -machinefile option to run on nodes 1 and 2 using mpirun configured as: mpirun -v -keep_pg -nolocal -np 2 -machinefile vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver the machinefile file vulcan.hosts contains: node1 node2 the PIXXXX file created contains: node1 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node2 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v option reports running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver on 2 LINUX ch_p4 processors Created /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI10802 the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : Sep 13 15:49:32 hyprwulf1 xinetd[21912]: START: shell pid=23013 from=192.168.47.31 Sep 13 15:49:32 hyprwulf1 pam_rhosts_auth[23013]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 15:49:32 hyprwulf1 PAM_unix[23013]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:49:32 hyprwulf1 in.rshd[23014]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI15564 -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node2 contains : Sep 13 15:49:32 hyprwulf2 xinetd[13163]: START: shell pid=13490 from=192.168.47.32 Sep 13 15:49:32 hyprwulf2 pam_rhosts_auth[13490]: allowed to jawhite@hyprwulf-boot1.hapb as jawhite Sep 13 15:49:32 hyprwulf2 PAM_unix[13490]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:49:32 hyprwulf2 in.rshd[13491]: jawhite@hyprwulf-boot1.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node1 34248 \-p4amslave' and the program executes successfully 2) When submitting from node0 using the -p4pg option to run on nodes 1 and 2 using mpirun configured as: mpirun -v -nolocal -p4pg vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the progroup file vulcan.hosts contains: node1 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node2 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver and the -v options reports running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : Sep 13 15:41:46 hyprwulf1 xinetd[21912]: START: shell pid=22978 from=192.168.47.31 Sep 13 15:41:46 hyprwulf1 pam_rhosts_auth[22978]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 15:41:46 hyprwulf1 PAM_unix[22978]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:41:46 hyprwulf1 in.rshd[22979]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node2 contains : Sep 13 15:41:46 hyprwulf2 xinetd[13163]: START: shell pid=13472 from=192.168.47.32 Sep 13 15:41:46 hyprwulf2 pam_rhosts_auth[13472]: allowed to jawhite@hyprwulf-boot1.hapb as jawhite Sep 13 15:41:46 hyprwulf2 PAM_unix[13472]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:41:46 hyprwulf2 in.rshd[13473]: jawhite@hyprwulf-boot1.hapb as jawhite: 
cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node1 34240 \-p4amslave' and the program executes successfully 3) When submitting from node0 using the -machinefile option to run on nodes 2 and 3 using mpirun configured as: mpirun -v -keep_pg -nolocal -np 2 -machinefile vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the machinefile file vulcan.hosts contains: node2 node3 the PIXXXX file created contains: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v options report running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver on 2 LINUX ch_p4 processors Created /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI11592 the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : no events during this time frame the /var/log/messages file on node2 contains : Sep 13 15:35:29 hyprwulf2 xinetd[13163]: START: shell pid=13451 from=192.168.47.31 Sep 13 15:35:29 hyprwulf2 pam_rhosts_auth[13451]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 15:35:29 hyprwulf2 PAM_unix[13451]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:35:29 hyprwulf2 in.rshd[13452]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI15167 -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node3 contains : Sep 13 15:35:29 hyprwulf3 xinetd[11167]: START: shell pid=11435 from=192.168.47.33 Sep 13 15:35:29 hyprwulf3 pam_rhosts_auth[11435]: allowed to jawhite@hyprwulf-boot2.hapb as jawhite Sep 13 15:35:29 hyprwulf3 PAM_unix[11435]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:35:29 hyprwulf3 in.rshd[11436]: jawhite@hyprwulf-boot2.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node2 33713 \-p4amslave' and the program executes successfully 4) When submitting from node0 using the -p4pg option to run on nodes 2 and 3 using mpirun configured as: mpirun -v -nolocal -p4pg vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the progroup file vulcan.hosts contains: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver and the -v options report running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : Sep 13 14:54:48 hyprwulf1 xinetd[21912]: START: shell pid=22917 from=192.168.47.31 Sep 13 14:54:48 hyprwulf1 pam_rhosts_auth[22917]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 14:54:48 hyprwulf1 PAM_unix[22917]: (rsh) session opened for user jawhite by (uid=0) Sep 13 14:54:48 hyprwulf1 in.rshd[22918]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node2 contains : no events during this time frame the /var/log/messages file on node3 contains : Sep 13 14:54:48 hyprwulf3 xinetd[11167]: START: shell pid=11395 from=192.168.47.32 Sep 13 14:54:48 hyprwulf3 
pam_rhosts_auth[11395]: allowed to jawhite@hyprwulf-boot1.hapb as jawhite Sep 13 14:54:48 hyprwulf3 PAM_unix[11395]: (rsh) session opened for user jawhite by (uid=0) Sep 13 14:54:48 hyprwulf3 in.rshd[11396]: jawhite@hyprwulf-boot1.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node2 34232 \-p4amslave' and the following error message is generated rm_10957: p4_error: rm_start: net_conn_to_listener failed: 34133 It appear that in case 4 even though I have requested node2 and node3 be used that a process is being rhsh'd to node1 instead. The log message from node3 indicates it expects to connect to node2 (partial proof that really did request node2) but since there is no process on node2 an error occurs. The information below is the output stream from case 4 after envoking the -echo and -mpiversion options ++ echo 'default_arch = LINUX' ++ echo 'default_device = ch_p4' ++ echo 'machine = ch_p4' ++ '[' 1 -le 5 ']' ++ arg=-mpiversion ++ shift ++ '[' -x /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4.args ']' ++ device_knows_arg=0 ++ . /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4.args ++ '[' 0 '!=' 0 ']' +++ echo -mpiversion +++ sed s/%a//g ++ proginstance=-mpiversion ++ '[' '' = '' -a '' = '' -a '!' -x -mpiversion ']' ++ fake_progname=-mpiversion ++ '[' 1 -le 4 ']' ++ arg=-nolocal ++ shift ++ nolocal=1 ++ '[' 1 -le 3 ']' ++ arg=-p4pg ++ shift ++ '[' -x /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4.args ']' ++ device_knows_arg=0 ++ . /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4.args +++ '[' 1 -gt 1 ']' +++ p4pgfile=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts +++ shift +++ leavePGFile=1 +++ device_knows_arg=1 ++ '[' 1 '!=' 0 ']' ++ continue ++ '[' 1 -le 1 ']' ++ arg=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ++ shift ++ '[' -x /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4.args ']' ++ device_knows_arg=0 ++ . /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4.args ++ '[' 0 '!=' 0 ']' +++ echo /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver +++ sed s/%a//g ++ proginstance=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ++ '[' '' = '' -a -mpiversion = '' -a '!' -x /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ']' ++ '[' '' = '' -a -x /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ']' ++ progname=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ++ '[' 1 -le 0 ']' ++ '[' 1 -le 0 ']' ++ '[' '' = '' -a /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver = '' ']' ++ '[' -n -mpiversion -a -n /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ']' ++ echo 'Unrecognized argument -mpiversion ignored.' ++ larch= ++ '[' -z '' ']' ++ larch=LINUX ++ '[' -n 'sed -e s@/tmp_mnt/@/@g' ']' +++ pwd +++ sed -e s@/tmp_mnt/@/@g ++ PWDtest=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ++ '[' '!' -d /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ']' ++ '[' -n /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ']' +++ echo /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases +++ sed -e s@/tmp_mnt/@/@g ++ PWDtest2=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ++ /bin/rm -f /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/.mpirtmp16410 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/.mpirtmp16410 +++ eval 'echo test > /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/.mpirtmp16410' ++ '[' '!' 
-s /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/.mpirtmp16410 ']' ++ PWD=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ++ /bin/rm -f /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/.mpirtmp16410 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/.mpirtmp16410 ++ '[' -n /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ']' ++ PWD_TRIAL=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases +++ echo /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver +++ sed 's/\/.*//' ++ tail= ++ '[' '' = '' ']' ++ true ++ '[' '' = '' -a -x /usr/local/pkgs/mpich_1.2.1/bin/tarch ']' +++ /usr/local/pkgs/mpich_1.2.1/bin/tarch ++ arch=LINUX ++ '[' LINUX = IRIX64 -a '(' LINUX = IRIX -o LINUX = IRIXN32 ')' ']' ++ archlist=LINUX ++ '[' ch_p4 = '' ']' ++ '[' ch_p4 = p4 -o ch_p4 = execer -o ch_p4 = sgi_mp -o ch_p4 = ch_p4 -o ch_p4 = ch_p4-2 -o ch_p4 = globus -o ch_p4 = globus ']' ++ '[' '' = '' ']' ++ MPI_HOST= ++ '[' LINUX = ipsc860 ']' +++ hostname ++ MPI_HOST=hyprwulf00 ++ '[' hyprwulf00 = '' ']' ++ '[' /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases '!=' '' ']' +++ pwd +++ sed -e s%/tmp_mnt/%/%g ++ PWD_TRIAL=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ++ '[' '!' -d /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ']' ++ '[' 1 = 1 ']' ++ cnt=1 ++ '[' 0 -gt 1 ']' ++ echo 'running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors' + argsset=1 + mpirun_version= + mpirun_version=/usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4 + exitstat=1 + '[' -n /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4 ']' + '[' -x /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4 ']' + . /usr/local/pkgs/mpich_1.2.1/bin/mpirun.ch_p4 ++ exitstatus=1 ++ '[' -z 1 ']' ++ '[' -n '' ']' ++ '[' -n '' ']' ++ '[' '' = shared ']' ++ MPI_MAX_CLUSTER_SIZE=1 ++ . /usr/local/pkgs/mpich_1.2.1/bin/mpirun.pg +++ '[' 1 = '' ']' +++ '[' 0 = 0 ']' +++ narch=1 +++ arch1=LINUX +++ archlist1=LINUX +++ archlocal=LINUX +++ np1=1 +++ '[' 1 = 1 ']' +++ procFound=0 +++ machinelist= +++ archuselist= +++ nprocuselist= +++ curarch=1 +++ nolocalsave=1 +++ archlocal=LINUX +++ '[' 1 -le 1 ']' +++ eval 'arch=$arch1' ++++ arch=LINUX +++ eval 'archlist=$archlist1' ++++ archlist=LINUX +++ '[' -z LINUX ']' +++ eval 'np=$np1' ++++ np=1 +++ '[' -z 1 ']' +++ eval 'mFile=$machineFile1' ++++ mFile= +++ '[' -n '' -a -r '' ']' +++ '[' -z '' ']' +++ '[' ch_p4 = ibmspx -a -x /usr/local/bin/getjid ']' +++ machineDir=/usr/local/pkgs/mpich_1.2.1/share +++ machineFile=/usr/local/pkgs/mpich_1.2.1/share/machines.LINUX +++ '[' -r /usr/local/pkgs/mpich_1.2.1/share/machines.LINUX ']' +++ break +++ '[' -z /usr/local/pkgs/mpich_1.2.1/share/machines.LINUX -o '!' -s /usr/local/pkgs/mpich_1.2.1/share/machines.LINUX -o '!' 
-r /usr/local/pkgs/mpich_1.2.1/share/machines.LINUX ']' ++++ expr hyprwulf00 : '\([^\.]*\).*' +++ MPI_HOSTLeader=hyprwulf00 +++ '[' '' = yes ']' +++ '[' 1 = 0 -o 1 -gt 1 ']' +++ '[' 1 -gt 1 -o 1 = 1 ']' ++++ cat /usr/local/pkgs/mpich_1.2.1/share/machines.LINUX ++++ sed -e '/^#/d' -e 's/#.*^//g' ++++ grep -v '^hyprwulf00\([ -\.:]\)' ++++ head -1 ++++ tr '\012' ' ' +++ machineavail=mpi1 +++ KeepHost=0 +++ loopcnt=0 +++ '[' -z 1 ']' +++ '[' 1 = 0 -a 1 -gt 1 ']' ++++ expr 1 - 0 +++ nleft=1 +++ '[' 1 -lt 0 ']' +++ '[' 0 -lt 1 ']' +++ nfound=0 +++ nprocmachine=1 ++++ expr mpi1 : '.*:\([0-9]*\)' +++ ntest= +++ '[' -n '' -a '' '!=' 0 ']' ++++ expr mpi1 : '\([^\.]*\).*' +++ machineNameLeader=mpi1 +++ '[' 1 = 1 -o 0 = 1 -o '(' mpi1 '!=' hyprwulf00 -a mpi1 '!=' hyprwulf00 ')' ']' +++ '[' 1 -gt 1 ']' +++ machinelist= mpi1 +++ archuselist= LINUX +++ nprocuselist= 1 ++++ expr 0 + 1 +++ procFound=1 ++++ expr 0 + 1 +++ nfound=1 ++++ expr 1 - 1 +++ nleft=0 +++ '[' 1 = 1 ']' +++ break ++++ expr 0 + 1 +++ loopcnt=1 +++ '[' 1 = 0 -a 1 -gt 1 ']' +++ '[' 1 -lt 1 ']' ++++ expr 1 + 1 +++ curarch=2 +++ procFound=0 +++ nolocal=1 +++ machineFile= +++ '[' 2 -le 1 ']' +++ nolocal=1 +++ '[' 1 '!=' 1 ']' +++ break ++ prognamemain=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver ++ '[' -z /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts ']' ++ /bin/sync ++ '[' '' = '' ']' ++ p4workdir=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ++ startpgm=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases ++ '[' '' '!=' '' ']' ++ MPIRUN_DEVICE=ch_p4 ++ export MPIRUN_DEVICE ++ '[' 0 = 1 ']' ++ doitall=eval ++ '[' 1 = 1 ']' ++ '[' '' = /dev/null ']' ++ doitall=eval /usr/bin/rsh -n mpi1 ++ eval /usr/bin/rsh -n mpi1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases +++ /usr/bin/rsh -n mpi1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases default_arch = LINUX default_device = ch_p4 machine = ch_p4 Unrecognized argument -mpiversion ignored. running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors rm_11548: p4_error: rm_start: net_conn_to_listener failed: 34288 bm_list_23231: p4_error: interrupt SIGINT: 2 p0_23230: p4_error: interrupt SIGINT: 2 Broken pipe P4 procgroup file is /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts. The result from cat /usr/local/pkgs/mpich_1.2.1/share/machines.LINUX is mpi1 mpi2 mpi3 mpi4 mpi5 mpi6 mpi7 mpi8 mpi9 mpi10 mpi11 mpi12 mpi13 mpi14 mpi15 mpi16 mpi17 mpi18 however our /etc/hosts file contains entries to mpi1 node1 mpi2 node2 so using the p4pg file vulcan.hosts containing: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver or mpi2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver mpi3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver both produce the same result/error message. Removing mpi1 from the machines.LINUX file seems to fix the problem by shifting the master process to mpi2/node2. 
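For illustration only (this sketch is not from the original post): the -p4pg style of launch described above can be scripted roughly as follows, using only the hostnames, executable path and mpirun location already quoted in this thread. The comments restate the behaviour reported here, not documented MPICH semantics.

#!/bin/sh
# Build a ch_p4 procgroup file with the master (rank 0) on node2 and one
# process on node3, mirroring the vulcan.hosts examples shown above.
EXE=/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver
PG=vulcan.hosts
echo "node2 0 $EXE"  > $PG
echo "node3 1 $EXE" >> $PG
# -nolocal keeps the job off node0; -p4pg hands mpirun the procgroup file.
# In the runs reported here, mpirun from mpich 1.2.1 still rsh'd the rank-0
# process to the first entry of share/machines.LINUX (mpi1) rather than to
# the first procgroup host, which is the problem being described.
/usr/local/pkgs/mpich_1.2.1/bin/mpirun -v -nolocal -p4pg $PG $EXE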
But I suspect that if I requested nodes 3 and 4 the error would happen again. I had hoped that using the p4pg file would have allowed me to pick any node as my master node. From vanw at cae.wisc.edu Mon Sep 17 08:35:40 2001 From: vanw at cae.wisc.edu (Kevin Van Workum) Date: Wed Nov 25 01:01:42 2009 Subject: Network RAM : Comm. issues In-Reply-To: <20010917100123.5884.qmail@web20310.mail.yahoo.com> Message-ID: Amber, Have you looked at the GAMMA project? Sound's like what they are doing. http://www.disi.unige.it/project/GAMMA/ -- Online Computing Resources at: www.tsunamictechnologies.com Kevin Van Workum Vice President and Co-Founder Tsunamic Technologies Inc. On Mon, 17 Sep 2001, Amber Palekar wrote: > Hi, > We are planning to implement Network RAM as our > syllabus project . Could someone suggest some > communication mechanisms for passing messages over the > ethernet ( which is what we are restricting ourselves > to) . We are initially restricting to using RAW > sockets only but are in a fix about what to use in the > subsequent prototypes. Should MPIs and VIAs be looked > at or could we develop our own protocol at the device > driver level ? ( as we're restricitng ourselves to > ethernet only .) Any other pointers for Network RAM > implemntation would be of great help ! > > Amber > > > __________________________________________________ > Terrorist Attacks on U.S. - How can you help? > Donate cash, emergency relief information > http://dailynews.yahoo.com/fc/US/Emergency_Information/ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From wharman at altatech.com Fri Sep 21 08:18:50 2001 From: wharman at altatech.com (Bill Harman) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices In-Reply-To: <20010921103525.5662.qmail@fran.supanetwork.co.uk> Message-ID: Nick; Forget about using the P4 in a 1U configuration. The voltage regulator on the motherboard takes up more than 1U. You will need to work with a 2U footprint as a minimum. You can get the AMD MP in a 1U, but, beware of the heat issues, you will need above average air flow. Bill -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of n.gregory@garageflowers.co.uk Sent: Friday, September 21, 2001 4:35 AM To: beowulf@beowulf.org Subject: CPU Choices (sorry if this gets posted twice) Hello, I am doing some research into the configuration of a 32 1U node Beowulf cluster and have a question regarding CPU configuration. The current choice is between Intel P4s as Itaniums seem a little bleeding edge at the moment, or the latest AMD chips. AMD seem to be getting a impressive performance for the price, but I?m a little concerned about the lack of mature multiprocessor chipsets and their heat issues. Intel on the other-hand have the MP chipsets but seem to be falling down with current lack of (non-commercial ) complier that support MMX and SSE, and the whole issue of Rambus Vs DDR memory. I would be grateful for any insight into a choice of CPU and its configuration in terms of price/performance/expandability, or any other factor I should be considering. 
Thanks Nick _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- A non-text attachment was scrubbed... Name: William Harman.vcf Type: text/x-vcard Size: 464 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010921/acca75ab/WilliamHarman.vcf From eennmg at electeng.leeds.ac.uk Thu Sep 20 04:30:04 2001 From: eennmg at electeng.leeds.ac.uk (Nick Gregory) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices Message-ID: <3BA9D327.2066.5127FF4@localhost> Hello, I am doing some research into the configuration of a 32 1U node Beowulf cluster and have a question regarding CPU configuration. The current choice is between Intel P4s as Itaniums seem a little bleeding edge at the moment, or the latest AMD chips. AMD seem to be getting a impressive performance for the price, but I抦 a little concerned about the lack of mature multiprocessor chipsets and their heat issues. Intel on the other-hand have the MP chipsets but seem to be falling down with current lack of (non-commercial ) complier that support MMX and SSE, and the whole issue of Rambus Vs DDR memory. I would be grateful for any insight into a choice of CPU and its configuration in terms of price/performance/expandability, or any other factor I should be considering. Thanks Nick ________________________________ _______----^^----_______ (========================( || )-==~~~~ ~~~~=== """/"""""""/"""""""""""""""";""" """"-------__________-------"""" (_ '-------======~~~ =' EENNMG@ELECTENG.LEEDS.AC.UK """""""""""\_._________________,' NCC 1701-D U.S.S ENTERPRISE ----------------------------- From j.a.white at larc.nasa.gov Thu Sep 13 13:25:02 2001 From: j.a.white at larc.nasa.gov (Jeffery A. White) Date: Wed Nov 25 01:01:42 2009 Subject: mpich bug? Message-ID: <3BA1161E.BF235E80@larc.nasa.gov> Dear list, My previous message had some incorrect information. My apologies. I have investigated further looking in the /var/log/message file and have found the following. My cluster configuration is as follows: node0 : machine : Dual processor Supermicro Super 370DLE cpu : 1 GHz Pentium 3 O.S. : Redhat Linux 7.1 kernel : 2.4.2-2smp mpich : 1.2.1 nodes1->18 : machine : Compaq xp1000 cpu : 667 MHz DEC alpha 21264 O.S. : Redhat Linux 7.0 kernel : 2.4.2 mpich : 1.2.1 nodes 19->34 : machine : Microway Screamer cpu : 667 MHz DEC alpha 21164 O.S. : Redhat Linux 7.0 kernel : 2.4.2 mpich : 1.2.1 The heterogeneous nature of the machine has made me migrate from using the -machinefile option to the -p4pg option. I have been trying to get a 2 processor job to run while submitting the mpirun command from node0 (-nolocal is specified) and using either nodes 1 and 2 or nodes 2 and 3. If I use the -machinefile approach I am able to run on any homogeneous combination of nodes. However, if I use the -p4pg approach I have not been able to run unless my mpi master node is node1. As long as node1 is the mpi master node then I can use any one of nodes 2 through 18 as the 2nd processor. THe following 4 runs illustrates what I have gotten to work as well as what doesn't work (and the subsequent error message). Runs 1, 2 and 3 worked and run 4 failed. 
1) When submitting from node0 using the -machinefile option to run on nodes 1 and 2 using mpirun configured as: mpirun -v -keep_pg -nolocal -np 2 -machinefile vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver the machinefile file vulcan.hosts contains: node1 node2 the PIXXXX file created contains: node1 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node2 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v option reports running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver on 2 LINUX ch_p4 processors Created /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI10802 the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : Sep 13 15:49:32 hyprwulf1 xinetd[21912]: START: shell pid=23013 from=192.168.47.31 Sep 13 15:49:32 hyprwulf1 pam_rhosts_auth[23013]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 15:49:32 hyprwulf1 PAM_unix[23013]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:49:32 hyprwulf1 in.rshd[23014]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI15564 -p4wd /home0/jawhite/Vul 2) When submitting from node0 using the -p4pg option to run on nodes 1 and 2 using mpirun configured as: mpirun -v -nolocal -p4pg vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the p4pg file vulcan.hosts contains: node1 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node2 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver and the -v options reports running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : Sep 13 15:41:46 hyprwulf1 xinetd[21912]: START: shell pid=22978 from=192.168.47.31 Sep 13 15:41:46 hyprwulf1 pam_rhosts_auth[22978]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 15:41:46 hyprwulf1 PAM_unix[22978]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:41:46 hyprwulf1 in.rshd[22979]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node2 contains : Sep 13 15:41:46 hyprwulf2 xinetd[13163]: START: shell pid=13472 from=192.168.47.32 Sep 13 15:41:46 hyprwulf2 pam_rhosts_auth[13472]: allowed to jawhite@hyprwulf-boot1.hapb as jawhite Sep 13 15:41:46 hyprwulf2 PAM_unix[13472]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:41:46 hyprwulf2 in.rshd[13473]: jawhite@hyprwulf-boot1.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node1 34240 \-p4amslave' and the program executes successfullycan/DEC_21264/Ver_4.3/Sample_cases' 3) When submitting from node0 using the -machinefile option to run on nodes 2 and 3 using mpirun configured as: mpirun -v -keep_pg -nolocal -np 2 -machinefile vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the machinefile file vulcan.hosts contains: node2 node3 the PIXXXX file created contains: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node3 1 
/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver and the -v options report running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver on 2 LINUX ch_p4 processors Created /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI11592 the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : no events during this time frame the /var/log/messages file on node2 contains : Sep 13 15:35:29 hyprwulf2 xinetd[13163]: START: shell pid=13451 from=192.168.47.31 Sep 13 15:35:29 hyprwulf2 pam_rhosts_auth[13451]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 15:35:29 hyprwulf2 PAM_unix[13451]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:35:29 hyprwulf2 in.rshd[13452]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/PI15167 -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node3 contains : Sep 13 15:35:29 hyprwulf3 xinetd[11167]: START: shell pid=11435 from=192.168.47.33 Sep 13 15:35:29 hyprwulf3 pam_rhosts_auth[11435]: allowed to jawhite@hyprwulf-boot2.hapb as jawhite Sep 13 15:35:29 hyprwulf3 PAM_unix[11435]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:35:29 hyprwulf3 in.rshd[11436]: jawhite@hyprwulf-boot2.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node2 33713 \-p4amslave' and the program executes successfully 4) When submitting from node0 using the -p4pg option to run on nodes 2 and 3 using mpirun configured as: mpirun -v -nolocal -p4pg vulcan.hosts /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver the p4pg file vulcan.hosts contains: node2 0 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node3 1 /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver and the -v options report running /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver on 1 LINUX ch_p4 processors the /var/log/messages file on node0 contains : no events during this time frame the /var/log/messages file on node1 contains : Sep 13 14:54:48 hyprwulf1 xinetd[21912]: START: shell pid=22917 from=192.168.47.31 Sep 13 14:54:48 hyprwulf1 pam_rhosts_auth[22917]: allowed to jawhite@hyprwulf-boot0.hapb as jawhite Sep 13 14:54:48 hyprwulf1 PAM_unix[22917]: (rsh) session opened for user jawhite by (uid=0) Sep 13 14:54:48 hyprwulf1 in.rshd[22918]: jawhite@hyprwulf-boot0.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver -p4pg /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/vulcan.hosts -p4wd /home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases' the /var/log/messages file on node2 contains : no events during this time frame the /var/log/messages file on node3 contains : Sep 13 14:54:48 hyprwulf3 xinetd[11167]: START: shell pid=11395 from=192.168.47.32 Sep 13 14:54:48 hyprwulf3 pam_rhosts_auth[11395]: allowed to jawhite@hyprwulf-boot1.hapb as jawhite Sep 13 14:54:48 hyprwulf3 PAM_unix[11395]: (rsh) session opened for user jawhite by (uid=0) Sep 13 14:54:48 hyprwulf3 in.rshd[11396]: jawhite@hyprwulf-boot1.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Executable/VULCAN_solver node2 34232 \-p4amslave' and the following error message is generated rm_10957: p4_error: rm_start: net_conn_to_listener failed: 34133 It appear that in case 4 even though I have requested node2 and node3 be used that a process 
is being rsh'd to node1 instead. The log message from node3 indicates it expects to connect to node2 (partial proof that I really did request node2) but since there is no process on node2 an error occurs. Is this an mpich bug or am I trying to use mpich incorrectly? Thanks for any and all help! Jeff the /var/log/messages file on node2 contains : Sep 13 15:49:32 hyprwulf2 xinetd[13163]: START: shell pid=13490 from=192.168.47.32 Sep 13 15:49:32 hyprwulf2 pam_rhosts_auth[13490]: allowed to jawhite@hyprwulf-boot1.hapb as jawhite Sep 13 15:49:32 hyprwulf2 PAM_unix[13490]: (rsh) session opened for user jawhite by (uid=0) Sep 13 15:49:32 hyprwulf2 in.rshd[13491]: jawhite@hyprwulf-boot1.hapb as jawhite: cmd='/home0/jawhite/Vulcan/DEC_21264/Ver_4.3/Sample_cases/VULCAN_solver node1 34248 \-p4amslave' and the program executes successfully -- Jeffery A. White email : j.a.white@larc.nasa.gov Phone : (757) 864-6882 ; Fax : (757) 864-6243 URL : http://hapb-www.larc.nasa.gov/~jawhite/ From dean at ucdavis.edu Fri Sep 21 14:42:20 2001 From: dean at ucdavis.edu (Dean Lavelle) Date: Wed Nov 25 01:01:42 2009 Subject: Scyld and mpi FASTA Makefile Problems Message-ID: <3BABB43C.FA90A7FC@ucdavis.edu> Dear Users, I figured out my problem in the Makefile for mpi enabled FASTA. It was a result of my ignorance. FASTA is running very well under Scyld at the present. Dean Lavelle From mxtd at lsi.usp.br Wed Sep 26 09:39:35 2001 From: mxtd at lsi.usp.br (mxtd@lsi.usp.br) Date: Wed Nov 25 01:01:42 2009 Subject: pbs pro and debian Message-ID: <200109261644.MAA11265@blueraja.scyld.com> Hi, Has anyone experience installing PBS Pro (free University licenses) with Debian, using the command "alien"? Any information would be greatly appreciated, Thanks Martha From Eugene.Leitl at lrz.uni-muenchen.de Thu Sep 27 05:32:11 2001 From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl) Date: Wed Nov 25 01:01:42 2009 Subject: ALINKA Linux Clustering Letter, September 26th. 2001 (fwd) Message-ID: -- Eugen* Leitl leitl ______________________________________________________________ ICBMTO: N48 04'14.8'' E11 36'41.2'' http://www.lrz.de/~ui22204 57F9CFD3: ED90 0433 EB74 E4A9 537F CFF5 86E7 629B 57F9 CFD3 ---------- Forwarded message ---------- Date: Thu, 27 Sep 2001 14:25:04 +0200 From: Antoine Brenner To: clustering@alinka.com Subject: ALINKA Linux Clustering Letter, September 26th. 2001 Resent-Date: Thu, 27 Sep 2001 14:25:08 +0200 (CEST) Resent-From: clustering@alinka.com The ALINKA Linux Clustering Letter, Wednesday, September the 26th. 2001 Dear readers, I am happy to send you this week's edition of clustering@alinka.com clustering@alinka.com is a free weekly e-mail newsletter on linux clustering. It provides a summary of the weekly activity in mailing-lists relative to linux clustering (such as beowulf, linux virtual server or linux-ha) and general clustering news. For more information about ALINKA, see: http://www.alinka.com News from the High Performance world, by Dr Laurent Gatineau (lgatineau@alinka.com) ====================================================================== Tips and tricks from the Beowulf mailing list ======== * Delcides Flávio de Sousa Jr. [m1] is wondering which kind of configuration he needs to start his beowulf cluster. As Greg Lindahl [m2] tries to answer, it depends on the applications.
[m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001304.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001306.html * Shane Canon [m1] has problem to configure linpack, Greg Lindahl [m2] and Patrick Geoffray [m3] explained how to set the problem size. [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001305.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001311.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001313.html * Jalal Haddad [m1] is looking for cheap dual P4 motherboard, Joel Jaeggli [m2] answered that actually the only available chipset is the intel 860 which is expensive. Ron Chen [m3] gave two links [1,2] where Dual P4 and Dual Athlon are compared, showing that performances depends on the compiler and the code [m4]. About PCI bus speed, Greg Lindahl [m5] posted a link [3] where some comparison have been done. [1] http://www.aceshardware.com/Spades/read.php?article_id=40000189 [2] http://www.aceshardware.com/Spades/read.php?article_id=45000195 [3] http://conservativecomputer.com/myrinet/perf.html [m1] http://www.beowulf.org/pipermail/beowulf/2001-September/001288.html [m2] http://www.beowulf.org/pipermail/beowulf/2001-September/001289.html [m3] http://www.beowulf.org/pipermail/beowulf/2001-September/001295.html [m4] http://www.beowulf.org/pipermail/beowulf/2001-September/001302.html [m5] http://www.beowulf.org/pipermail/beowulf/2001-September/001308.html News from MOSIX mailing list by Benoit des Ligneris ===================================================================== * Jame Troup ask for support [m1] in order to use Mosix in the Linux Terminal Server Project [1]. An old debate about development of mosix for the ``last'' linux kernel arise. Only last versions of Mosix are developped and patches are not ported to old version. As stated by Thomas Webb [m2] it's not good for a production environment ! [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0090.html [m2] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0095.html [1] http://www.ltsp.org/ * Discussion lauched by John Strange about GFS + Mosix [m1]. One of the problem was mexecd that did not work on recent Mosix kernels. Paul Mundt told [m2] that he has almost all rewritten for recent mosix. Then Jacob Gorn Hansen [m3] says that CODA/intermezzo will surely have better performance. [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0102.html [m2] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0102.html [m3] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0121.html * Tim Chipman asked [m1] for a simple queue management for mosix cluster. John Strange [m2] suggested Gnu Queue [1] or Sun Grid [2] and lots of other posters gave advices for a "home made" solution but nothing concrete and really mosix-specific. [m1] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0103.html [m2] http://www.mosix.cs.huji.ac.il/month-arch/2001/Sep/0104.html [1] http://bioinfo.mbb.yale.edu/~wkrebs/queue.html [2] http://www.sun.com/software/gridware/ News from the High Availability world ====================================================================== DRBD devel by Guillaume GIMENEZ (ggimenez@alinka.com) ======== * Martin Bene posted a patch [m1] to solve some troubles. it applies against drbd 0.61-pre2 and is useful for redhat users. 
[m1] http://www.geocrawler.com/lists/3/SourceForge/3756/0/6698522/ Linux-HA dev by Rached Ben Mustapha (rached@alinka.com) ======== * Alan Robertson proposed [m1] that dependencies on resources be implemented in future versions of linux-ha software. [m1] http://marc.theaimsgroup.com/?l=linux-ha-dev&m=99980244716915&w=2 Linux-HA by Rached Ben Mustapha (rached@alinka.com) ======== * Jean-Yves Bouet asked [m1] what was the role of job of each thread in heartbeat. Alan Robertson replied [m2] this: master control process: read requests to send messages from a FIFO, and sends them to the master status proces Master status process, tracks status of nodes, initiates takeovers, handles heartbeat protocol, etc. Each medium declared in the ha.cf file has a read and a write process. [m1] http://marc.theaimsgroup.com/?l=linux-ha&m=100141586826041&w=2 [m2] http://marc.theaimsgroup.com/?l=linux-ha&m=100142758208102&w=2 LVS by Rached Ben Mustapha (rached@alinka.com) ======== * malalon@poczta.onet.pl asked [m1] what was the problem with testing a NAT-based service, with rr scheduling. Joseph Mack replied [m2] that the recommended method to test lvs was with telnet. * Dmitry Dan Brovkovich asked [m3] why trying to telnet to the VIP resulted in an inactive connection, and the client not receiving anything. Joseph Mack replied [m4] that the realserver probably didn't have its root set properly to the client. [m1] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=100141845402872&w=2 [m2] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=100141983407703&w=2 [m3] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=100108131816446&w=2 [m4] http://marc.theaimsgroup.com/?l=linux-virtual-server&m=100115505406875&w=2 News on the Filesystems front ====================================================================== Coda by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Jan Harkes posted a patch specific to Coda kernel code on Linux that makes Coda client be able to use other filesystem than ext2 [1m]. * Phil Nelson gave information about Coda on Solaris 8 [2m]. * Jan Harkes posted comments about Coda and Intermezzo [3m]. [1m] http://www.coda.cs.cmu.edu/maillists/codalist-2001/0808.html [2m] http://www.coda.cs.cmu.edu/maillists/codalist-2001/0807.html [3m] http://www.coda.cs.cmu.edu/maillists/codalist-2001/0811.html Intermezzo by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Shirish Phatak posted status about Reiserfs support with Intermezzo [1m]. [1m] http://www.geocrawler.com/lists/3/SourceForge/8077/0/6666780/ JFS by Ludovic Ishiomin (lishiomin@alinka.com) ======== * Steve Best announced JFS 1.0.5 [1m]. * Dave Kleikamp posted a patch to the latest release of JFS [2m]. * Anthony Liu confirmed that this drop won't compile with 2.4.10 kernel because of major VM changes [3m]. [1m] http://oss.software.ibm.com/pipermail/jfs-discussion/2001-September/000591.html [2m] http://oss.software.ibm.com/pipermail/jfs-discussion/2001-September/000593.html [3m] http://oss.software.ibm.com/pipermail/jfs-discussion/2001-September/000602.html XFS by Ludovic Ishiomin (lishiomin@alinka.com) ======== * There was a long thread about mainstream kernel inclusion of XFS [1m]. [1m] http://oss.sgi.com/projects/xfs/mail_archive/0109/msg00486.html and the followings News on other cluster related topics ====================================================================== Linux Cluster by Bruno Muller (bmuller@alinka.com) ======== * Lars Marowsky-Bree announced the Linux Kongress 2001 Clustering Workshop[m1]. 
[m1] http://mail.nl.linux.org/linux-cluster/2001-09/msg00011.html LVM by Bruno Muller (bmuller@alinka.com) ===== * AJ Lewis posted a patch to configure LVM 1.0.1-rc2 [m1]. [m1] http://lists.sistina.com/pipermail/linux-lvm/2001-September/008936.html ====================================================================== To subscribe to the list, send e-mail to clustering@alinka.com from the address you wish to subscribe, with the word "subscribe" in the subject. To unsubscribe from the list, send e-mail to clustering@alinka.com from the address you wish to unsubscribe from, with the word "unsubscribe" in the subject. Alinka is the editor of the ALINKA ORANGES and ALINKA RAISIN administration software for Linux clusters. (Web site: http://www.alinka.com) From rjones at merl.com Thu Sep 27 05:40:21 2001 From: rjones at merl.com (Ray Jones) Date: Wed Nov 25 01:01:42 2009 Subject: linpack/hpl Message-ID: <1dk7yk6c3u.fsf@jitter.merl.com> Greg Lindahl writes: > On Tue, Sep 25, 2001 at 08:22:41AM -0700, Shane Canon wrote: > > > We are adjusting the problem > > size and block size primarily. > > You need to set the problem size as large as possible. If it were too > small, you'd see exactly what you report, namely that above a certain > # of nodes, it begins to suck. Linpack is also be very sensitive to P and Q, the dimensions of the virtual grid that your nodes are organized into for the computation. In my experience (limited to 1 cluster), more square grids are best. Ray From roger at ERC.MsState.Edu Thu Sep 27 06:33:29 2001 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:01:42 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: Message-ID: On Tue, 18 Sep 2001, RICHARD,BRUNO (HP-France,ex1) wrote: > Sorry Scott, I sent you a wrong reference. The actual link is > http://www.hpl.hp.com/techreports/2001/HPL-2001-206.html. Enjoy, -bruno I'd be REALLY interested in hearing how you justify the following statement in the paper: "Being the first ones to enter the TOP500 using only mainstream hardware (standard PCs, standard Ethernet connectivity)...". _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From cblack at eragen.com Thu Sep 27 07:06:31 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:42 2009 Subject: Linux cluster in commercial office? In-Reply-To: <706F0A000C79D5118BB10004ACC5603D15FDE9@acad-hq-ex-1.woburn.asic-alliance.com>; from korsedal@zaiqtech.com on Tue, Sep 18, 2001 at 05:39:27PM -0400 References: <706F0A000C79D5118BB10004ACC5603D15FDE9@acad-hq-ex-1.woburn.asic-alliance.com> Message-ID: <20010927100631.A23322@getafix.EraGen.com> On Tue, Sep 18, 2001 at 05:39:27PM -0400, Korsedal, Brian wrote: [stuff deleted] > look into clustering our PC's so that we can have an extra high performance > server. Each PC would still have to function as a terminal (office apps and > the ability to run processes on the unix machines) but use the free CPU time > to run simulations. Is there any implementation of clustering software for > this? If there isn't, it would be an interesting thing to look into, there > are many offices with computers that are barely used. My CPU sits idle 95% > of the time and it would be great to caputer the extra CPU cycles. 
Does > anybody have any thoughts about this or know how to make it happen? You should look into sun gridengine (http://www.sun.com/software/gridware/) which has a nice mechanism for detecting idle time and running jobs when interactive idle time reaches a certain point. You can even schedule your workstations to become available for compute jobs after a certain time using their calendar function. They even have an appnote on how to do what you want to do (http://supportforum.sun.com/gridengine/appnote_idle.html). We have recently (last few months) moved from OpenPBS to SGE and I must say SGE is quite nice. It is much more stable for us with our large job arrays. Chris -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20010927/b2e7b960/attachment.bin From online_hpc at tticluster.com Thu Sep 27 08:04:21 2001 From: online_hpc at tticluster.com (Kevin V.) Date: Wed Nov 25 01:01:42 2009 Subject: Linux cluster in commercial office? Message-ID: Brian wrote: > Each PC would still have to function as a terminal (office apps and the > ability to run processes on the unix machines) but use the free CPU time > to run simulations. Is there any implementation of clustering software > for this? Take a look at Condor. It does exactly that. We've used it on our linux and MSWindows machines together. If you install Cygwin on your windows machines, you can run your linux apps on your either your linux or windows machines during idle cpu cycles. Here are the links. http://www.cs.wisc.edu/condor/ http://www.cygwin.com/ Also if you need extra highspeed CPU's where you only pay per hour of usage, checkout http://www.tsunamictechnologies.com ********************************* Online and On Demand HPC Pay per Use CPU Time www.tsunamictechnologies.com ********************************* From rob.myers at gtri.gatech.edu Thu Sep 27 10:27:11 2001 From: rob.myers at gtri.gatech.edu (Rob Myers) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices In-Reply-To: <3BA9D327.2066.5127FF4@localhost> References: <3BA9D327.2066.5127FF4@localhost> Message-ID: <1001611631.14792.111.camel@ransom> On Thu, 2001-09-20 at 07:30, Nick Gregory wrote: > Hello, > I am doing some research into the configuration of a 32 1U node Beowulf cluster > and have a question regarding CPU configuration. > > The current choice is between Intel P4s as Itaniums seem a little bleeding edge at > the moment, or the latest AMD chips. you should not even consider itaniums unless you absolutely must have a 64bit platform. > AMD seem to be getting a impressive performance for the price, but I'm a little > concerned about the lack of mature multiprocessor chipsets and their heat issues. i have 2 tyan thunder k7 based systems (amd-760mp chipset) which have been rock solid after upgrading to the latest bios. they do much better clock for clock than my dual p3 systems. of course they are quite warm, and i don't think i would have the guts to stuff them in a 1U case. :) > Intel on the other-hand have the MP chipsets but seem to be falling down with > current lack of (non-commercial ) complier that support MMX and SSE, and the > whole issue of Rambus Vs DDR memory. whatever platform you choose you should really invest in a good compiler. my experience is that commercial compilers can improve speeds anywhere from 20-50%. which means they are well worth the money in most cases. 
i know portland groups latest supports SSE2 (p4). also you can download intel's compiler for p4's for linux for a free trial period. (http://developer.intel.com/software/products/compilers/f50/linux/) > I would be grateful for any insight into a choice of CPU and its configuration in > terms of price/performance/expandability, or any other factor I should be > considering. > > Thanks > Nick keep in mind that cooling will be an issue for any 32 nodes you assemble, no matter what chips you choose. hope that helps rob. > > > ________________________________ _______----^^----_______ > (========================( || )-==~~~~ ~~~~=== > """/"""""""/"""""""""""""""";""" """"-------__________-------"""" > (_ '-------======~~~ =' EENNMG@ELECTENG.LEEDS.AC.UK > """""""""""\_._________________,' NCC 1701-D U.S.S ENTERPRISE > > ----------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From eric at fnordsystems.com Thu Sep 27 10:53:22 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:42 2009 Subject: CPU Choices In-Reply-To: Message-ID: Hi Bill, Nick: I second what Bill says about 1Us... there is a Dual AthlonMP 1.2Ghz 1U platform available, but it *just barely* manages to cool itself. If you stack more than 4 of them, you'll have a meltdown. So if your heart is set on 1Us, you'll be going with dual P3/1GHz boxes. Certainly not slow at all, my company sells a lot of them in various configurations. We've had good luck with the Tyan S2462 Dual Athlon board, and socketted P4/Xeon boards in 2U chassis. Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Bill Harman Sent: Friday, September 21, 2001 8:19 AM To: n.gregory@garageflowers.co.uk; beowulf@beowulf.org Subject: RE: CPU Choices Nick; Forget about using the P4 in a 1U configuration. The voltage regulator on the motherboard takes up more than 1U. You will need to work with a 2U footprint as a minimum. You can get the AMD MP in a 1U, but, beware of the heat issues, you will need above average air flow. Bill -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of n.gregory@garageflowers.co.uk Sent: Friday, September 21, 2001 4:35 AM To: beowulf@beowulf.org Subject: CPU Choices (sorry if this gets posted twice) Hello, I am doing some research into the configuration of a 32 1U node Beowulf cluster and have a question regarding CPU configuration. The current choice is between Intel P4s as Itaniums seem a little bleeding edge at the moment, or the latest AMD chips. AMD seem to be getting a impressive performance for the price, but I?m a little concerned about the lack of mature multiprocessor chipsets and their heat issues. Intel on the other-hand have the MP chipsets but seem to be falling down with current lack of (non-commercial ) complier that support MMX and SSE, and the whole issue of Rambus Vs DDR memory. I would be grateful for any insight into a choice of CPU and its configuration in terms of price/performance/expandability, or any other factor I should be considering. 
Thanks Nick _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From roger at ERC.MsState.Edu Thu Sep 27 11:38:17 2001 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:01:42 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: Message-ID: Bruno, Hmmm, I thought that some of the others were purely ethernet based, but after doing some quick research, I guess I'll stand corrected. If you intend "mainstream" to mean only single-processor desktop-type machines, then I'll completely concede your point. As of the last Top 500 list, our cluster was listed as 158th, and is entirely ethernet based, using a single 100Mb/s interconnect to each node, and GigE interconnects between switches. However, our nodes are 1U with dual processors, so I guess maybe it doesn't fit the definition of mainstream that you stated. It doesn't predate the I-cluster, but it does at least tie with it. :-) -Roger On Thu, 27 Sep 2001, RICHARD,BRUNO (HP-France,ex1) wrote: > Hi Roger, > > What we mean by "mainstream" is that these could be your grandma's machine, > unmodified except for the Software. > Actually I-Cluster *is* the first cluster of this type to enter the TOP500. > Some PC-based clusters are already registered there of course, but they > cannot be called "mainstream": Most require specific (non-mainstream at > all!) connectivity such as Myrinet, SCI, Quadrix... Some are based on PCs > equipped with several LAN boards (not mainstream either). > If you restrict to off-the-shelf monoprocessor (excluding non-mainstream > Alpha, MIPS and such) interconnected through standard Ether100, no cluster > ever entered the TOP500 list. And I-Cluster is still the only one to be > there. Let me know if you think otherwise. > > Regards, -bruno > _____________________________________________ > Bruno RICHARD - Research Program Manager > HP Laboratories > 38053 Grenoble Cedex 9 - FRANCE > Phone: +33 (4) 76 14 15 38 > bruno_richard@hp.com > > > -----Original Message----- > From: Roger L. Smith [mailto:roger@ERC.MsState.Edu] > Sent: Thursday, September 27, 2001 15:33 > To: RICHARD,BRUNO (HP-France,ex1) > Cc: 'Scott Shealy'; beowulf@beowulf.org > Subject: RE: Paper showing Linpack scalability of mainstream clusters > > > On Tue, 18 Sep 2001, RICHARD,BRUNO (HP-France,ex1) wrote: > > > Sorry Scott, I sent you a wrong reference. The actual link is > > http://www.hpl.hp.com/techreports/2001/HPL-2001-206.html. Enjoy, > > -bruno > > > I'd be REALLY interested in hearing how you justify the following statement > in the paper: > > "Being the first ones to enter the TOP500 using only mainstream hardware > (standard PCs, standard Ethernet connectivity)...". > > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ > | Roger L. Smith Phone: 662-325-3625 | > | Systems Administrator FAX: 662-325-7692 | > | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | > | Mississippi State University | > |_______________________Engineering Research > |Center_______________________| > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. 
Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From sshealy at asgnet.psc.sc.edu Thu Sep 27 12:14:25 2001 From: sshealy at asgnet.psc.sc.edu (Scott Shealy) Date: Wed Nov 25 01:01:42 2009 Subject: Paper showing Linpack scalability of mainstream clusters References: Message-ID: <00a601c14788$a2e51110$3a5d893f@machavelli> From becker at scyld.com Thu Sep 27 13:49:46 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:42 2009 Subject: Scyld: bad scaling In-Reply-To: Message-ID: On Wed, 26 Sep 2001, Ivan Rossi wrote: > recently i rebuilt our tiny 10 CPUs cluster using Scyld. Before i have been > using RedHat 6.2 + LAM MPI. And i like it, it is easier to mantain. > Unfortunately, after the rebuild, I found a marked performance degradation > with respect to the former installation. In particular i found a > disappointingly bad scaling for the application we use most, the MD program > Gromacs 2.0. > > Now scaling goes almost exactly as the square root of the number of nodes, > that is it takes 4 CPUs to double performance and nine CPUs to triple them. > > Since no hardware has been changed, in my opinion it must be either the > pre-compiled Scyld kernel, bpsh or Scyld MPICH. So i hope that some fine > tuning of them should solve the problem. There isn't an inherent problem with Scyld and scaling. (Obviously we wouldn't have released a product with a specific problem.) Some things you should initially check Verify that you are not seeing network errors check /proc/net/dev for non-zero error counts Verify that you are using the SMP kernel CPU1 should show some activity with beostat. Verify that jobs are being places on all nodes beostat again. For reference, the Scyld releases up through "-8" use MPICH as the base. We modified the process initiation code to work with the Scyld system (it's now much faster to start jobs), but not the code of the run-time e.g. send/receive calls. It's very easy to use LAM on Scyld, however that's beyond the limit of our commercial support. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From becker at scyld.com Thu Sep 27 14:00:09 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:42 2009 Subject: GigE fiber NIC In-Reply-To: Message-ID: On Wed, 26 Sep 2001, Joel Jaeggli wrote: > On Wed, 26 Sep 2001, Martin Siegert wrote: > > > - 3Com 3c985B-SX > > - Netgear GA620 > > - Syskonnect SK-9843 > > - National Semiconductor DP83820 > > > > Intel makes GigE cards as well, but the driver is not distributed with > > the kernel. Thus I would rely on Intel to have a driver available when > > I want to upgrade the kernel. We have an alternate Intel driver named intel-gige.c, however its continued evolution does depend on development information from Intel. > I'd probably go with the sysconnect for performance/support followed by > the ga620 which is the cheapest of the acenic (the differences > between them aren't substantive)based cards. > > the nat semi chipset has the distinction of being really cheap (the > 10/100/1000 copper cards from dlink are $89. but the chipset on has an 8k > transmit and 32k recieve buffer which makes it not the most desireable for > a high-end gig card... 
The primary performance limitations of the NatSemi chips are Must receive into word-aligned Rx buffers, which misaligns the IP header and payload. No UDP/TCP/IP checksum support, or other network work "offload". The 8KB/32KB on-chip FIFOs don't limit the chip in most systems, nor does the limit of a 1KB burst have a big impact. (Some chips, such as the 3c905CX, can burst a whole 1514 byte packet.) Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From roger at ERC.MsState.Edu Thu Sep 27 15:16:52 2001 From: roger at ERC.MsState.Edu (Roger L. Smith) Date: Wed Nov 25 01:01:43 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: <00a601c14788$a2e51110$3a5d893f@machavelli> Message-ID: On Thu, 27 Sep 2001, Scott Shealy wrote: > >From reading the paper I was particularly impressed with the 50W power > consumption of each node. We have an Athlon Cluster(1.33 GHZ) and those > things are very power hungry. We underestimated how much they draw and had > to install an extra circuit in addition to the ones we had planned for... > > Perhaps there is yet another efficiency measure - MFlops/kwatt or something > like that. > > So for the I-cluster that would have been > > 75 Gflops/(210 * 50 W/1000) = 7.14GFlops/kW > > Roger, out of curiosity could you do a similar calculation? I checked our UPS before and after powering up the first 128 nodes (256 PIII 1GHz, 1GB, 20GB IDE), and came up with around 20kVA of power draw. I haven't converted that to Wattage (they are running at 208V, if anyone wants to do the math). _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research Center_______________________| From sjarczyk at wist.net.pl Thu Sep 27 15:30:43 2001 From: sjarczyk at wist.net.pl (Sergiusz Jarczyk) Date: Wed Nov 25 01:01:43 2009 Subject: Linux cluster in commercial office? In-Reply-To: Message-ID: Guys If I understand Brian, what he want to do is to use spare time of his company's workstations for ASIC and FCPGA design. Solutions like Condor are good, if you're going to use spice or verilog extensions for emacs ;-), since you have full source code. But if you want to use tools like Mentor Graphics, Cadence or Actel, the only way to do that is to use LFS from Platform Computing or EnFusion from TurboLinux. I don't know SGE, so I can't tell if it works. Correct me if I'm wrong. Best regards, Sergiusz Jarczyk From carlos at megatonmonkey.net Thu Sep 27 18:24:18 2001 From: carlos at megatonmonkey.net (Carlos O'Donell Jr.) Date: Wed Nov 25 01:01:43 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: ; from roger@ERC.MsState.Edu on Thu, Sep 27, 2001 at 05:16:52PM -0500 References: <00a601c14788$a2e51110$3a5d893f@machavelli> Message-ID: <20010927212418.G2611@megatonmonkey.net> > > So for the I-cluster that would have been > > > > 75 Gflops/(210 * 50 W/1000) = 7.14GFlops/kW > > > > Roger, out of curiosity could you do a similar calculation? > > > I checked our UPS before and after powering up the first 128 nodes (256 > PIII 1GHz, 1GB, 20GB IDE), and came up with around 20kVA of power draw. > I haven't converted that to Wattage (they are running at 208V, > if anyone wants to do the math). 
> kVA's are simply apparent power rather than average power or watts. They should be almost the same thing if the power factor of the PC supply is relatively "good." P = Vrms*Irms*cos(theta_V - theta_I) Where (theta_V - theta_I) is the phase difference between voltage and current. The cosine is dimensionless and thus it really is still watts. And if the supply is _good_enough_ then it will approach 1 and give you the worst case GFlops/kW ;) So XX GFlops/(20,000W/1000) = XX GFlops / kW Cheers, Carlos. From lindahl at conservativecomputer.com Thu Sep 27 20:29:41 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:43 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: <20010927212418.G2611@megatonmonkey.net>; from carlos@megatonmonkey.net on Thu, Sep 27, 2001 at 09:24:18PM -0400 References: <00a601c14788$a2e51110$3a5d893f@machavelli> <20010927212418.G2611@megatonmonkey.net> Message-ID: <20010927232941.A10325@wumpus.foo> On Thu, Sep 27, 2001 at 09:24:18PM -0400, Carlos O'Donell Jr. wrote: > And if the supply is _good_enough_ then it will approach 1 and give > you the worst case GFlops/kW ;) I don't believe that modern power supplies actually have an "efficiency factor" that close to 1. I'll have to ask my business guy (David Rhoades), he used to specialize in weird trivia like building machine rooms, something that most computer people tend to avoid... greg From jacsib at lutecium.org Fri Sep 28 06:39:53 2001 From: jacsib at lutecium.org (Jacques B. Siboni) Date: Wed Nov 25 01:01:43 2009 Subject: NIS server core dump References: <5.1.0.14.2.20010927232147.035127b0@pop.mtlug.org> Message-ID: <3BB47DA9.B3A6E974@lutecium.org> Hi all, I use ypserv on the Linux 7.0 box to serve the NIS clients on the slave machines. ypserv version is ypserv-1.3.12 (previous versions act the same way). ypserv starts OK, but as soon as I try to access the service I get a segmentation fault and core dump. A simple command such as rpcinfo -u localhost ypserv generates the problem. The conf files are very simple. The funny thing is that it seems to serve OK but then crashes. For instance, the command 'ypbind' produces the following from ypserv -d: [root@lutecium /root]# ypserv -d [Welcome to the NYS YP Server, version 1.3.12 (with securenets)] Find securenet: 255.0.0.0 127.0.0.0 Find securenet: 0.0.0.0 0.0.0.0 ypserv.conf: 0.0.0.0/0.0.0.0:*:0:0:2 ypproc_domain_nonack("nis.lutecium") [From: 192.168.1.1:3142] connect from 192.168.1.1 -> OK. Segmentation fault (core dumped)
I would have to check this, but get the impression that with many power supplies cos(phi) goes way down during power on, when they tend to suck in current. Regards, Alan Ward Greg Lindahl ha escrit: > On Thu, Sep 27, 2001 at 09:24:18PM -0400, Carlos O'Donell Jr. wrote: > > > And if the supply is _good_enough_ then it will approach 1 and give > > you the worst case GFlops/kW ;) > > I don't believe that modern power supplies actually have an > "efficiency factor" that close to 1. I'll have to ask my business guy > (David Rhoades), he used to specialize in weird trivia like building > machine rooms, something that most computer people tend to avoid... > > greg > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From asabigue at bmw.fing.edu.uy Fri Sep 28 11:11:01 2001 From: asabigue at bmw.fing.edu.uy (Ariel Sabiguero Yawelak) Date: Wed Nov 25 01:01:43 2009 Subject: anyone LXT1001 dirver for 2.4.x kernel? Message-ID: <3BB4BD35.7000901@bmw.fing.edu.uy> Hi all! I have been trying to find a driver for a Gigabit (1000Base-SX) NIC for the Kernel 2.4.x unsuccessfully. The NICs chipset is LXT1001, from Level One Communications,a company that has been acquired by Intel. We have been using the jt1lin driver by Douglas Greiman and Antonio Torrini, with 2.2.x kernels succesfully, but no answer from authors... If any one can provide us with a link to a newer kernel driver (we would not like to drop the adapters!).... Regards Ariel From lindahl at conservativecomputer.com Fri Sep 28 12:22:45 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:43 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: <3BB49A4B.285EB16E@andorra.ad>; from award@andorra.ad on Fri, Sep 28, 2001 at 05:42:03PM +0200 References: <00a601c14788$a2e51110$3a5d893f@machavelli> <20010927212418.G2611@megatonmonkey.net> <20010927232941.A10325@wumpus.foo> <3BB49A4B.285EB16E@andorra.ad> Message-ID: <20010928152245.A12347@wumpus.foo> On Fri, Sep 28, 2001 at 05:42:03PM +0200, Alan Ward wrote: > In many (european) countries, the "efficiency factor" -- or "cos(phi)" > as we call it -- must be over 0.9 . A transformer, or computer power supply, > must be designed to comply with this, at least under normal working > conditions. An overloaded transformer won't. David's number was also 0.9, so I was misremembering, and it is pretty close to one. greg From lindahl at conservativecomputer.com Fri Sep 28 12:24:16 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:43 2009 Subject: NIS server core dump In-Reply-To: <3BB47DA9.B3A6E974@lutecium.org>; from jacsib@lutecium.org on Fri, Sep 28, 2001 at 01:39:53PM +0000 References: <5.1.0.14.2.20010927232147.035127b0@pop.mtlug.org> <3BB47DA9.B3A6E974@lutecium.org> Message-ID: <20010928152416.B12347@wumpus.foo> On Fri, Sep 28, 2001 at 01:39:53PM +0000, Jacques B. Siboni wrote: > I use ypserv on the Linux 7.0 box to serv the NIS clients on the slave > machines. ypserv version is ypserv-1.3.12 (previous versions act the same > way). Since you say 7.0 I guess you really mean RedHat Linux 7.0. If so the first place to check is redhat.com/errata, look to see if that RPM has an update available. Usually when they really goof up, they quickly have an update. If it's Mandrake or another distribution, they also have similar spots on their website. 
greg From lindahl at conservativecomputer.com Fri Sep 28 13:06:24 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:43 2009 Subject: Baltimore Washington Beowulf User Group meeting Message-ID: <20010928160624.A12456@wumpus.foo> The Baltimore Washington Beowulf User Group (bwbug) is beginning to have monthly meetings once again. For more details, see http://bwbug.org or subscribe to our mailing list. Here's the announcement. In the future we might try to broadcast these meetings on the nationwide NCSA ACCESS videoconferencing network. If you are interested in attending meetings this way, please drop me email, and I'll know if it's worth asking NCSA. greg ====================================================================== Date: Fri, 28 Sep 2001 15:55:33 -0400 From: David Rhoades To: bwbug@bwbug.org Subject: bwbug: October BWBUG meeting FINALLY! The Baltimore Washing Beowulf Users Group is having a meeting! We've got a place, date, time, and topic. DATE/TIME Tuesday October 16 3:00 pm TOPIC Panel Discussion: Clusters and non MPI programs We all know how to run an MPI program or two on a cluster, with a reasonable user interface. But what about non MPI programs? Do we have an embarrassingly easy way to run embarrassingly parallel problems? Is it possible to use the wide variety of tools for non MPI problems on the wide of styles of clusters? Confirmed panelists: Don Becker, CTO, Scyld Greg Lindahl, CTO, Conservative Computer other panelists wanted; contact lindahl@conservativecomputer.com to suggest or VOLUNTEER DIRECTIONS The address is: 7501 Greenway Center Drive Suite 1000 (10th floor) Greenbelt, MD 20770, phone 703-628-7451 Mapquest does a reasonable job of directions. from the Beltway, take the BWPkwy (295) North. Take the first exit onto MD-193 towards Greenbelt/NASA Goddard. At the bottom of the ramp you will turn left onto MD193, and then IMMEDIATELY right onto Hanover Parkway. Take the next right onto Greenway Center Drive and the Logicon building is near the end of the road on the left. Logicon/FDC has graciously opened their auditorium in their Greenbelt facility for this meeting, and future ones if this is convenient for everyone. While it's not right on the Metro, I will offer to run a van from Metro at 2:40PM if that helps some folks get there. Just don't wait until the last minute to tell me, as it is a 15-passenger van and I don't usually drive it unless I know I need to (also, I can have the kids clean it up ;-). NOTE We need ideas for speakers and topics. If you can help out, please contact me at drhoades@conservativecomputer.com. From mudguy at speedfactory.net Sat Sep 29 10:24:26 2001 From: mudguy at speedfactory.net (Sam Harper) Date: Wed Nov 25 01:01:43 2009 Subject: OT: Nodes Message-ID: Hi. My name is Sam Harper, and I own a small system-building company in Alpharetta, Georgia called Distortion Limited. We primarily build home computers and high-performance workstations, and are known by word of mouth. I've been on this list for a year now, and I've watched people price shop for nodes on their Beowulfs. Please excuse me for this offer, but I'm interested in helping anyone with their node requirements, and I feel fairly confident that I can offer a much better price than major competitors. I sell AMD [SDR or DDR ram] and Intel [rambus or DDR ram] in both single and dual configurations, and in mid-tower or rackmount chassis. I look forward to helping anyone with their node or server needs. Thanks again for your time. 
Sam Harper mudguy@speedfactory.net Distortion Limited 770-740-8417 -- From dvos12 at calvin.edu Sat Sep 29 13:12:26 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:43 2009 Subject: Gaussian 98 Message-ID: Has anyone on this list gotten Gaussian 98 using Linda to run on a Redhat 7.x cluster? David From eric at fnordsystems.com Sat Sep 29 13:32:55 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:43 2009 Subject: Nodes In-Reply-To: Message-ID: Nice try, Sam. :) But I'm of the opinion people usually want to buy beowulf nodes from an established rackmount server vendor like my company... particularly one that has a web presence and onsite service waranty. Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Sam Harper Sent: Saturday, September 29, 2001 10:24 AM To: beowulf@beowulf.org Subject: OT: Nodes Hi. My name is Sam Harper, and I own a small system-building company in Alpharetta, Georgia called Distortion Limited. We primarily build home computers and high-performance workstations, and are known by word of mouth. I've been on this list for a year now, and I've watched people price shop for nodes on their Beowulfs. Please excuse me for this offer, but I'm interested in helping anyone with their node requirements, and I feel fairly confident that I can offer a much better price than major competitors. I sell AMD [SDR or DDR ram] and Intel [rambus or DDR ram] in both single and dual configurations, and in mid-tower or rackmount chassis. I look forward to helping anyone with their node or server needs. Thanks again for your time. Sam Harper mudguy@speedfactory.net Distortion Limited 770-740-8417 -- _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jakob at unthought.net Sat Sep 29 14:14:03 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:43 2009 Subject: Nodes In-Reply-To: ; from eric@fnordsystems.com on Sat, Sep 29, 2001 at 01:32:55PM -0700 References: Message-ID: <20010929231403.B14711@unthought.net> On Sat, Sep 29, 2001 at 01:32:55PM -0700, Eric Kuhnke wrote: > Nice try, Sam. :) > > But I'm of the opinion people usually want to buy beowulf nodes from an > established rackmount server vendor like my company... particularly one > that has a web presence and onsite service waranty. Nice tries, both of you ;) I'm confident that people on this list are capable of making their own decisions - *and* looking for vendors and vendor oppinions when they feel they need them. No harm done, but let's not turn beowulf into an advertisement list anymore than it is (hi Scyld ;) Cheers all, -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From carlos at megatonmonkey.net Sat Sep 29 22:02:39 2001 From: carlos at megatonmonkey.net (Carlos O'Donell Jr.) 
Date: Wed Nov 25 01:01:43 2009 Subject: Paper showing Linpack scalability of mainstream clusters In-Reply-To: <20010928152245.A12347@wumpus.foo>; from lindahl@conservativecomputer.com on Fri, Sep 28, 2001 at 03:22:45PM -0400 References: <00a601c14788$a2e51110$3a5d893f@machavelli> <20010927212418.G2611@megatonmonkey.net> <20010927232941.A10325@wumpus.foo> <3BB49A4B.285EB16E@andorra.ad> <20010928152245.A12347@wumpus.foo> Message-ID: <20010930010239.A18746@megatonmonkey.net> > > > In many (european) countries, the "efficiency factor" -- or "cos(phi)" > > as we call it -- must be over 0.9 . A transformer, or computer power supply, > > must be designed to comply with this, at least under normal working > > conditions. An overloaded transformer won't. > > David's number was also 0.9, so I was misremembering, and it is pretty > close to one. > > greg > Since PC supplies are really nothing more than switching supplies, they generally have good power factor (the cos(phi) in the equation). What they also have is good harmonics ;) Your PC will generate higher harmonic distortions that make all the power supplied to your systems ugly. Infact most of the load is on the peak of the waveform and tends to flatten out the voltage supplied. If you have delicate scientific instrumentaiton, do not run it on the same circuit as any of your cluster! Large companies with many computers and odd devices, may need to hire a power systems engineer to clean up all the _bad_ power devices. In Canada, anything under 0.9 starts getting you charged extra. c. From jakob at unthought.net Sun Sep 30 01:57:04 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:43 2009 Subject: batch systems with job deps (afterok) In-Reply-To: <20010926110310Q.hanzl@unknown-domain>; from hanzl@noel.feld.cvut.cz on Wed, Sep 26, 2001 at 11:03:10AM +0200 References: <20010924153243.A26106@unthought.net> <20010925173907Y.hanzl@unknown-domain> <20010926055927.B7750@unthought.net> <20010926110310Q.hanzl@unknown-domain> Message-ID: <20010930105704.B20110@unthought.net> On Wed, Sep 26, 2001 at 11:03:10AM +0200, hanzl@noel.feld.cvut.cz wrote: ... > Typically, previous invocation of 'make' would still run when I have > the next steps ready. Typing "make" again at this moment would cause > both copies to work on unfinished step (second make would not wait as > it should). (Typically, I might finish the next step on friday and > would like it to be invoked during the weekend, as soon as possible.) > However there might be a simple solution (like one 'make' in loop, > until there is nothing to do). > > > I wonder why my needs seem to be uncommon - is it because you guys on > the beowulf list > 1) are not as lazy as I am and you have the whole program ready in time? If you believe that one, I have this big tower in london that I can sell you really cheap - it's got a clock on it too ;) > 2) you change data/parameters rather then programs? I believe it is rare to change your program, and have the new "incarnation" depend on results from an older incarnation (eg. obsolete or at least different version). In fact, I can't imagine how you manage to keep track of how you got which results from which code, with a pipeline of various incremental versions of your software depending on results from other versions. But maybe that's just me :) > 3) you have plenty of time and do not need to work in parallel with > your cluster? 
Well, most scientists don't have a lot of work to do, and besides, you need time to spend all that money too... Eh... ;) > > I do not beleive any of these, please tell me why you all are not > calling for job dependencies in any job spooling system :) I think (2) comes fairly close to at least some of the truth. Of course you change your algorithms. But I have never worked with multiple revisions of code depending on results from other revisions. And I have never heard of anyone else (except you) doing so to such an extent. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From ron_chen_123 at yahoo.com Sun Sep 30 12:20:56 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:43 2009 Subject: Fwd: Grid Engine: ongoing and future work In-Reply-To: <20010925175106R.hanzl@unknown-domain> Message-ID: <20010930192056.50349.qmail@web14708.mail.yahoo.com> SGE runs on Linux, with some minor modifications, it should be able to run on Scyld. -Ron --- hanzl@noel.feld.cvut.cz wrote: > I just found in documentation that gridengine > supports batch job > dependency which I desperately look for (so far > having found it in PBS > only, but I am unable to run PBS on Scyld). So I am > willing to join > the work. > > Anybody knows how hard it is (would be) to run > gridengine on Scyld > Beowulf? > > Thanks > > Vaclav __________________________________________________ Do You Yahoo!? Listen to your Yahoo! Mail messages from any phone. http://phone.yahoo.com From asabigue at fing.edu.uy Fri Sep 28 09:49:23 2001 From: asabigue at fing.edu.uy (Ariel Sabiguero Yawelak) Date: Wed Nov 25 01:01:43 2009 Subject: Gigabit driver Message-ID: <3BB4AA13.6020803@fing.edu.uy> Hi all! I have been trying to find a driver for a Gigabit (1000Base-SX) NIC for the Kernel 2.4.x unsuccessfully. The NICs chipset is LXT1001, from Level One Communications,a company that has been acquired by Intel. We have been using the jt1lin driver by Douglas Greiman and Antonio Torrini, with 2.2.x kernels succesfully. If any one can provide us with a link to a newer kernel driver.... Regards Ariel From bruno_richard at hp.com Fri Sep 28 00:05:51 2001 From: bruno_richard at hp.com (RICHARD,BRUNO (HP-France,ex1)) Date: Wed Nov 25 01:01:43 2009 Subject: Paper showing Linpack scalability of mainstream clusters Message-ID: Interesting. We have been doing some performance/m3, perfomance/m2, performance/$ computations but I did not think about this performance/kW one. 2 additional comments: - The noise level of I-Cluster is very low (A/C is by far the top noise generator), which is good as well. - The drawback of the HP e-PC is that it is not easily rackable, as it is designed to be on users desks. -bruno -----Original Message----- From: Scott Shealy [mailto:sshealy@asgnet.psc.sc.edu] Sent: Thursday, September 27, 2001 21:14 To: beowulf@beowulf.org Cc: Roger L. 
Smith; bruno_richard@hp.com Subject: Re: Paper showing Linpack scalability of mainstream clusters From bruno_richard at hp.com Fri Sep 28 00:00:16 2001 From: bruno_richard at hp.com (RICHARD,BRUNO (HP-France,ex1)) Date: Wed Nov 25 01:01:44 2009 Subject: Paper showing Linpack scalability of mainstream clusters Message-ID: Hi Roger, If you read our paper, you will see that one of our goals is to model an enterprise network. Which typically will not have SMP machines but standard PCs. If we were targeting pure performance/price, we would have rather chosen SMP machines. An experiment is ongoing at INRIA, with a 200 nodes cluster, SMP. This will have a much better performance. Here the goal is to use *existing* PCs, hence acquisition cost is not considered (so infinite performance/price ratio;) Regards, -bruno -----Original Message----- From: Roger L. Smith [mailto:roger@ERC.MsState.Edu] Sent: Thursday, September 27, 2001 20:38 To: RICHARD,BRUNO (HP-France,ex1) Cc: 'Scott Shealy'; beowulf@beowulf.org Subject: RE: Paper showing Linpack scalability of mainstream clusters Bruno, Hmmm, I thought that some of the others were purely ethernet based, but after doing some quick research, I guess I'll stand corrected. If you intend "mainstream" to mean only single-processor desktop-type machines, then I'll completely concede your point. As of the last Top 500 list, our cluster was listed as 158th, and is entirely ethernet based, using a single 100Mb/s interconnect to each node, and GigE interconnects between switches. However, our nodes are 1U with dual processors, so I guess maybe it doesn't fit the definition of mainstream that you stated. It doesn't predate the I-cluster, but it does at least tie with it. :-) -Roger On Thu, 27 Sep 2001, RICHARD,BRUNO (HP-France,ex1) wrote: > Hi Roger, > > What we mean by "mainstream" is that these could be your grandma's > machine, unmodified except for the Software. Actually I-Cluster *is* > the first cluster of this type to enter the TOP500. Some PC-based > clusters are already registered there of course, but they cannot be > called "mainstream": Most require specific (non-mainstream at > all!) connectivity such as Myrinet, SCI, Quadrix... Some are based on > PCs equipped with several LAN boards (not mainstream either). If you > restrict to off-the-shelf monoprocessor (excluding non-mainstream > Alpha, MIPS and such) interconnected through standard Ether100, no > cluster ever entered the TOP500 list. And I-Cluster is still the only > one to be there. Let me know if you think otherwise. > > Regards, -bruno > _____________________________________________ > Bruno RICHARD - Research Program Manager > HP Laboratories > 38053 Grenoble Cedex 9 - FRANCE > Phone: +33 (4) 76 14 15 38 > bruno_richard@hp.com > > > -----Original Message----- > From: Roger L. Smith [mailto:roger@ERC.MsState.Edu] > Sent: Thursday, September 27, 2001 15:33 > To: RICHARD,BRUNO (HP-France,ex1) > Cc: 'Scott Shealy'; beowulf@beowulf.org > Subject: RE: Paper showing Linpack scalability of mainstream clusters > > > On Tue, 18 Sep 2001, RICHARD,BRUNO (HP-France,ex1) wrote: > > > Sorry Scott, I sent you a wrong reference. The actual link is > > http://www.hpl.hp.com/techreports/2001/HPL-2001-206.html. Enjoy, > > -bruno > > > I'd be REALLY interested in hearing how you justify the following > statement in the paper: > > "Being the first ones to enter the TOP500 using only mainstream > hardware (standard PCs, standard Ethernet connectivity)...". 
> > > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ > | Roger L. Smith Phone: 662-325-3625 | > | Systems Administrator FAX: 662-325-7692 | > | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | > | Mississippi State University | > |_______________________Engineering Research > |Center_______________________| > _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research |Center_______________________| From bropers at lsu.edu Thu Sep 27 11:44:23 2001 From: bropers at lsu.edu (Brian D. Ropers-Huilman) Date: Wed Nov 25 01:01:44 2009 Subject: Linux cluster in commercial office? In-Reply-To: <20010927100631.A23322@getafix.EraGen.com> Message-ID: Also, you should check out the Condor project at UW-Madison: http://www.cs.wisc.edu/condor/ This is _exactly_ what you're looking for. On Thu, 27 Sep 2001 beowulf-admin@beowulf.org wrote: > On Tue, Sep 18, 2001 at 05:39:27PM -0400, Korsedal, Brian wrote: > [stuff deleted] > > look into clustering our PC's so that we can have an extra high performance > > server. Each PC would still have to function as a terminal (office apps and > > the ability to run processes on the unix machines) but use the free CPU time > > to run simulations. Is there any implementation of clustering software for > > this? If there isn't, it would be an interesting thing to look into, there > > are many offices with computers that are barely used. My CPU sits idle 95% > > of the time and it would be great to caputer the extra CPU cycles. Does > > anybody have any thoughts about this or know how to make it happen? > > You should look into sun gridengine (http://www.sun.com/software/gridware/) > which has a nice mechanism for detecting idle time and running jobs when > interactive idle time reaches a certain point. You can even schedule your > workstations to become available for compute jobs after a certain time > using their calendar function. They even have an appnote on how to do > what you want to do (http://supportforum.sun.com/gridengine/appnote_idle.html). > We have recently (last few months) moved from OpenPBS to SGE and I must say > SGE is quite nice. It is much more stable for us with our large job arrays. > > Chris > -- Brian D. Ropers-Huilman (225) 578-0461 (V) Systems Administrator (225) 578-6400 (F) Office of Computing Services brian@ropers-huilman.net High Performance Computing http://www.ropers-huilman.net/ Fred Frey Building, Rm. 201, E-1Q \o/ Louisiana State University -- __o / | Baton Rouge, LA 70803-1900 --- `\<, / `\\, O/ O / O/ O From bruno_richard at hp.com Thu Sep 27 09:28:34 2001 From: bruno_richard at hp.com (RICHARD,BRUNO (HP-France,ex1)) Date: Wed Nov 25 01:01:44 2009 Subject: Paper showing Linpack scalability of mainstream clusters Message-ID: Hi Roger, What we mean by "mainstream" is that these could be your grandma's machine, unmodified except for the Software. Actually I-Cluster *is* the first cluster of this type to enter the TOP500. Some PC-based clusters are already registered there of course, but they cannot be called "mainstream": Most require specific (non-mainstream at all!) connectivity such as Myrinet, SCI, Quadrix... Some are based on PCs equipped with several LAN boards (not mainstream either). 
If you restrict to off-the-shelf monoprocessor (excluding non-mainstream Alpha, MIPS and such) interconnected through standard Ether100, no cluster ever entered the TOP500 list. And I-Cluster is still the only one to be there. Let me know if you think otherwise. Regards, -bruno _____________________________________________ Bruno RICHARD - Research Program Manager HP Laboratories 38053 Grenoble Cedex 9 - FRANCE Phone: +33 (4) 76 14 15 38 bruno_richard@hp.com -----Original Message----- From: Roger L. Smith [mailto:roger@ERC.MsState.Edu] Sent: Thursday, September 27, 2001 15:33 To: RICHARD,BRUNO (HP-France,ex1) Cc: 'Scott Shealy'; beowulf@beowulf.org Subject: RE: Paper showing Linpack scalability of mainstream clusters On Tue, 18 Sep 2001, RICHARD,BRUNO (HP-France,ex1) wrote: > Sorry Scott, I sent you a wrong reference. The actual link is > http://www.hpl.hp.com/techreports/2001/HPL-2001-206.html. Enjoy, > -bruno I'd be REALLY interested in hearing how you justify the following statement in the paper: "Being the first ones to enter the TOP500 using only mainstream hardware (standard PCs, standard Ethernet connectivity)...". _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_ | Roger L. Smith Phone: 662-325-3625 | | Systems Administrator FAX: 662-325-7692 | | roger@ERC.MsState.Edu http://WWW.ERC.MsState.Edu/~roger | | Mississippi State University | |_______________________Engineering Research |Center_______________________| From mpaindav at toad.net Thu Sep 27 13:48:01 2001 From: mpaindav at toad.net (Matthieu Paindavoine) Date: Wed Nov 25 01:01:44 2009 Subject: Scyld and mpi FASTA Makefile Problems (fwd) In-Reply-To: References: Message-ID: <20010927164800.F1830@piddly> Dear Dean, I have put a FASTA RPM in ftp.scyld.com/pub/applications that you might find easier to use. I haven't had a problem compiling FASTA on the Scyld cluster; I used the existing Makefile.mpi4 without any modification. I would be pleased to assist you further with the compile problem if you provide me with the output of make. I hope this reply will reach you before you've gone completely mad :) Cheers, Matt > Dear users, > > I am a beowulf newbie who has almost gone completely mad trying to > rework the Makefile for the MPI FASTA under the Scyld release 7 operating > system. The small Scyld beowulf system that I have constructed works > perfectly for the Linpack as well as the mpi-mandel test applications > included in the Scyld distribution. However, I cannot seem to get the > included Makefile for FASTA to compile. > > Does anyone have a Makefile for the MPI-enabled FASTA that compiles > under Scyld? > > Any pointers would be immensely helpful (I wouldn't be surprised to learn > that I have done everything incorrectly). > > Much thanks, > > Dean Lavelle > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
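For anyone else stuck at the same point: before fighting the FASTA Makefile itself, it is worth confirming that the MPI toolchain on the cluster is healthy with a trivial program. The sketch below is not part of FASTA; it only assumes that MPICH-style mpicc and mpirun wrappers, such as those shipped with the Scyld distribution, are on your PATH.

    /* mpi_hello.c - minimal sanity check of the cluster's MPI installation.
     * Build and run with something like:
     *     mpicc -o mpi_hello mpi_hello.c
     *     mpirun -np 4 ./mpi_hello
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int  rank, size, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* id of this process     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks  */
        MPI_Get_processor_name(name, &namelen);  /* node this rank runs on */

        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }

If this compiles and prints one line per rank, the MPI installation is fine and the trouble lies in the FASTA Makefile (compiler and library paths), which the output of make that Matt asked for should pinpoint.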