From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Mar 2 02:08:41 2009 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] small distro for PXE boot, autostarts sshd? In-Reply-To: <20090227110525.GD6210@sillage.bis.pasteur.fr> References: <20090226234606.GA11226@sillage.bis.pasteur.fr> <20090227110525.GD6210@sillage.bis.pasteur.fr> Message-ID: On Fri, 27 Feb 2009, Tru Huynh wrote: > On Fri, Feb 27, 2009 at 11:28:30AM +0100, Bogdan Costescu wrote: >> On Fri, 27 Feb 2009, Tru Huynh wrote: >> >>> I am using kickstart + CentOS-5 + %pre feature to download the >>> required dropbear daemon + partprobe (parted/fdisk/sfdisk are >>> included) + your nettee for a post kickstart cloning of a "gold" >>> image. >> I haven't seen kickstart being used this way so far... > if it was done in the %post part of the kickstart, yes > > 1) first regular kickstart your "gold" image > 2) start another kickstart and use only the %pre part Ah, I had parsed 'post kickstart' to mean 'in the %post part of kickstart'. Now it makes sense, thanks for the enlightenment! -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From kishandba at gmail.com Mon Mar 2 04:34:59 2009 From: kishandba at gmail.com (kishan gandhi) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] channel bonding three lan cards (ether net cards) Message-ID: Hi, how do I configure channel bonding? I have three LAN cards: eth0 (public IP), eth1 (public IP), eth2 (private IP). How do I bond them? Can you solve this problem? -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20090302/1da2dc4f/attachment.html From hearnsj at googlemail.com Mon Mar 2 08:34:01 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] channel bonding three lan cards (ether net cards) In-Reply-To: References: Message-ID: <9f8092cc0903020834w13ff3058j3c8ce54a85bcc991@mail.gmail.com> 2009/3/2 kishan gandhi : > > Hi, > > how to configure > > i have three lan cards > > eth0 public ip > eth1 public ip > eth2 private ip Kishan, please say if these are on separate physical networks. It looks like the eth2 card is on a different physical network (read 'collision domain' or 'network segment'). In this case you cannot bond it with the other cards. For the other two cards, eth0 and eth1, can we ask what the purpose of bonding would be - is it a failover to cope with the failure of one interface? From raysonlogin at gmail.com Mon Mar 2 10:57:28 2009 From: raysonlogin at gmail.com (Rayson Ho) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] user stats on clusters In-Reply-To: <49A85DF7.7090502@tamu.edu> References: <49A85DF7.7090502@tamu.edu> Message-ID: <73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> SGE has the Accounting and Reporting Console module (aka ARCo). It basically logs the job and host data from the qmaster+scheduler into an SQL database (MySQL, Oracle, or PostgreSQL). The pie charts are nice: http://wikis.sun.com/display/GridEngine/Starting+the+Accounting+and+Reporting+Console And, you can write your own SQL queries to get the data you want. Homepage: http://arco.sunsource.net/ Rayson On Fri, Feb 27, 2009 at 4:41 PM, Gerry Creager wrote: > A general question: What're folks using for stats, including queue wait, > execution times, hours/month? Any suggestions?
> > gerry > -- > Gerry Creager -- gerry.creager@tamu.edu > Texas Mesonet -- AATLT, Texas A&M University > Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 > Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From bill at cse.ucdavis.edu Tue Mar 3 14:01:06 2009 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Dual Nehalem announced. In-Reply-To: <73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> References: <49A85DF7.7090502@tamu.edu> <73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> Message-ID: <49ADA8A2.9030200@cse.ucdavis.edu> I noticed that apple's selling single/dual nehalems, which they claim to ship within 4 days. They offer 2.26, 2.66, and 2.93 GHz duals. Hopefully that triggers the NDAs to evaporate. From kilian.cavalotti.work at gmail.com Tue Mar 3 15:50:48 2009 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] user stats on clusters In-Reply-To: <49A85DF7.7090502@tamu.edu> References: <49A85DF7.7090502@tamu.edu> Message-ID: <200903040050.49150.kilian.cavalotti.work@gmail.com> Hi Gerry, On Friday 27 February 2009 22:41:11 Gerry Creager wrote: > A general question: What're folks using for stats, including queue wait, > execution times, hours/month? Any suggestions? We use LSF reporting tools, which are a bit raw, but do their job just fine. For the users and PIs, I wrote a web wrapper to present usage statistics and usage reports (for billing purposes) in a more user-friendly manner.
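The "write your own SQL queries" approach Rayson mentions earlier in the thread (and the per-user billing reports described here) boils down to a GROUP BY over the accounting table. A minimal sketch, using SQLite in memory; note that the `jobs` table and its columns (`owner`, `wallclock_secs`, `slots`) are invented for illustration and are not ARCo's or LSF's actual schema:

```python
import sqlite3

# Hypothetical accounting table -- the schema is invented for this sketch,
# standing in for whatever ARCo or LSF actually logs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (owner TEXT, wallclock_secs INTEGER, slots INTEGER)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)",
                 [("alice", 3600, 4), ("bob", 7200, 1), ("alice", 1800, 2)])

# CPU-hours per user: wallclock time times slots, summed, converted to hours.
rows = conn.execute("""
    SELECT owner, SUM(wallclock_secs * slots) / 3600.0 AS cpu_hours
    FROM jobs GROUP BY owner ORDER BY cpu_hours DESC
""").fetchall()

for owner, hours in rows:
    print(f"{owner}: {hours:.1f} CPU-hours")
```

The same query, pointed at the real accounting database, is roughly what a billing web wrapper would render as tables or pie charts.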
Most features are described here: https://biox2.stanford.edu/doc/wiki/WebProfile Due to the specificity of the environment and of our requirements, I never bothered making this tool usable outside of our cluster, but that's probably something which can be done in a reasonable amount of time. Other than that, Platform has a nice monitoring tool for clusters using LSF, which is scheduler-centric, and based on the open-source Cacti. It's called RTM, and is really helpful for both admins and users. See http://www.platform.com/Products/platform-rtm Cheers, -- Kilian From thakur at mcs.anl.gov Wed Mar 4 10:23:16 2009 From: thakur at mcs.anl.gov (Rajeev Thakur) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] SC09: Call for tutorial proposals Message-ID: <402E92EFEE554F45A510C43F65D1343C@mcs.anl.gov> Call for SC09 Tutorial Proposals Experts in high performance computing are invited to share their expertise with the High Performance Computing (HPC) community by submitting proposals for tutorials at the SC09 conference to be held in Portland, Oregon, November 14-20, 2009. The SC09 Tutorials program will give attendees the opportunity to explore a wide variety of important topics related to high-performance computing, networking, and storage. SC09 invites proposals for introductory, intermediate, and advanced tutorials, either full-day (six hours) or half-day (three hours). A distinguished panel of experts will select the tutorials from the submitted proposals. Submissions for tutorials and other aspects of the SC09 Technical Program open Monday, March 16, 2009. The deadline for submission is April 6, 2009.
Detailed submission information: http://sc09.supercomputing.org/?pg=tutorials.html Questions: tutorials@info.supercomputing.org From forum.san at gmail.com Wed Mar 4 22:29:09 2009 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP Message-ID: Hello everyone, Is there a Grid scheduler (only open source, like SGE) tool which can be installed/run on Windows XP Desktop systems (there is no Linux involvement strictly). The applications used under this grid are Native to Windows XP. Thanks, Sangamesh From hunting at ix.netcom.com Thu Mar 5 00:24:03 2009 From: hunting at ix.netcom.com (Michael Huntingdon) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Dual Nehalem announced. In-Reply-To: <49ADA8A2.9030200@cse.ucdavis.edu> References: <49A85DF7.7090502@tamu.edu><73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> <49ADA8A2.9030200@cse.ucdavis.edu> Message-ID: <9F1244F4B74B49C38122B0FE2E5E675F@MichaelPC> Bill I'm told the availability of systems within 4 days can't be accurate at this point. Any product available now is a prerelease and not a final as Intel has not released the volume chips yet. Michael A. Huntingdon Higher Education Sales Account Manager Systems Performance Consultants Office (408) 294-6811 Cell (408) 531-7422 Fax (601) 510-3808 ----- Original Message ----- From: "Bill Broadley" To: Sent: Tuesday, March 03, 2009 2:01 PM Subject: [Beowulf] Dual Nehalem announced. > > I noticed that apple's selling single/dual nehalems, claim to ship within > 4 > days. They offer 2.26, 2.66, and 2.93 GHz duals. Hopefully that triggers > the > NDAs to evaporate. 
> > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From lynesh at cardiff.ac.uk Thu Mar 5 00:28:36 2009 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP In-Reply-To: References: Message-ID: <1236241716.16016.2.camel@desktop> On Thu, 2009-03-05 at 01:29 -0500, Sangamesh B wrote: > Hello everyone, > > Is there a Grid scheduler (only open source, like SGE) tool which > can be installed/run on Windows XP Desktop systems (there is no Linux > involvement strictly). > > The applications used under this grid are Native to Windows XP. Condor would be my first thought. http://www.cs.wisc.edu/condor/ We use it to run batch jobs across our Windows XP desktops. We run the central manager on a linux box, but you can use Windows for that as well. Cheers, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From forum.san at gmail.com Thu Mar 5 00:57:29 2009 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP In-Reply-To: <1236241716.16016.2.camel@desktop> References: <1236241716.16016.2.camel@desktop> Message-ID: Thanks Huw for your suggestion. By batch jobs did you mean only serial jobs? Has anybody successfully run SGE in a Cygwin-like environment on Windows machines? (Both master & execution hosts) Thanks, Sangamesh On Thu, Mar 5, 2009 at 3:28 AM, Huw Lynes wrote: > On Thu, 2009-03-05 at 01:29 -0500, Sangamesh B wrote: >> Hello everyone, >> >> Is there a Grid scheduler (only open source, like SGE) tool which >> can be installed/run on Windows XP Desktop systems (there is no Linux >> involvement strictly).
>> >> The applications used under this grid are Native to Windows XP. > > Condor would be my first thought. > http://www.cs.wisc.edu/condor/ > > We use it to run batch jobs across our Windows XP desktops. We run the > central manager on a linux box, but you can use Windows for that as > well. > > Cheers, > Huw > > -- > Huw Lynes | Advanced Research Computing > HEC Sysadmin | Cardiff University > | Redwood Building, > Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB > > From hearnsj at googlemail.com Thu Mar 5 01:24:36 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] channel bonding three lan cards (ether net cards) In-Reply-To: References: <9f8092cc0903020834w13ff3058j3c8ce54a85bcc991@mail.gmail.com> Message-ID: <9f8092cc0903050124t59ce6952vba8f575215dbc79e@mail.gmail.com> 2009/3/5 kishan gandhi : > Hi, > > exactly, failover purpose > > if in any case one public IP fails, then it will bring the second public IP up; > that is why I am asking Kishan, it is not clear what you are trying to achieve. Do you have: a) two ethernet cards, two cables, these cables attached to either one ethernet switch or two ethernet switches one IP address for this server - ie. you would like the second ethernet card to come up if the primary link fails Bonding can do this. b) two IP Addresses for this server From lynesh at cardiff.ac.uk Thu Mar 5 04:02:54 2009 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP In-Reply-To: References: <1236241716.16016.2.camel@desktop> Message-ID: <1236254574.2806.78.camel@w609.insrv.cf.ac.uk> On Thu, 2009-03-05 at 03:57 -0500, Sangamesh B wrote: > Thanks Huw for your suggestion. > > By batch jobs did you mean only serial jobs? > We only run serial jobs under condor because most of the desktops are single-core and are only connected with 10 or 100base ethernet. So running parallel jobs wouldn't make any sense for us.
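The Condor setup described above comes down to a short submit description file per job. For what it's worth, a serial ("vanilla" universe) Windows job might look roughly like the following sketch; the program and file names are placeholders, and the exact `OpSys` value for Windows XP should be checked against your Condor version's documentation:

```
# Hypothetical Condor submit description file for a serial Windows job.
# myprog.exe and the input/output file names are placeholders.
universe                = vanilla
executable              = myprog.exe
arguments               = input.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
output                  = myprog.out
error                   = myprog.err
log                     = myprog.log
requirements            = (OpSys == "WINNT51")
queue
```

The job is submitted with `condor_submit`, and `condor_status` shows which desktops advertise matching slots.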
Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From kus at free.net Thu Mar 5 09:04:25 2009 From: kus at free.net (Mikhail Kuzminsky) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP In-Reply-To: Message-ID: In message from Sangamesh B (Thu, 5 Mar 2009 01:29:09 -0500): >Hello everyone, > > Is there a Grid scheduler (only open source, like SGE) tool which >can be installed/run on Windows XP Desktop systems (there is no Linux >involvement strictly). > >The applications used under this grid are Native to Windows XP. The GRAM component of the Globus Toolkit (http://www.globus.org/) gives you some batch queue system capabilities, and there are SGE interfaces to Globus. Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow > >Thanks, >Sangamesh >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf > >-- >This message has been scanned for viruses and >dangerous content by MailScanner, and is >believed to be clean. > From coutinho at dcc.ufmg.br Thu Mar 5 09:05:26 2009 From: coutinho at dcc.ufmg.br (Bruno Coutinho) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Dual Nehalem announced. In-Reply-To: <9F1244F4B74B49C38122B0FE2E5E675F@MichaelPC> References: <49A85DF7.7090502@tamu.edu> <73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> <49ADA8A2.9030200@cse.ucdavis.edu> <9F1244F4B74B49C38122B0FE2E5E675F@MichaelPC> Message-ID: 2009/3/5 Michael Huntingdon > Bill > I'm told the availability of systems within 4 days can't be accurate at > this point. Any product available now is a prerelease and not a final as > Intel has not released the volume chips yet. When is the official release? I heard it is March 28 ...
> > > Michael A. Huntingdon > Higher Education Sales Account Manager > Systems Performance Consultants > Office (408) 294-6811 > Cell (408) 531-7422 > Fax (601) 510-3808 > ----- Original Message ----- From: "Bill Broadley" > To: > Sent: Tuesday, March 03, 2009 2:01 PM > Subject: [Beowulf] Dual Nehalem announced. > > > >> I noticed that apple's selling single/dual nehalems, claim to ship within >> 4 >> days. They offer 2.26, 2.66, and 2.93 GHz duals. Hopefully that triggers >> the >> NDAs to evaporate. >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090305/43c5df8e/attachment.html From gerry.creager at tamu.edu Thu Mar 5 10:55:59 2009 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] switch selection Message-ID: <49B0203F.3030808@tamu.edu> I recall a couple of recent arguments, so I'm wondering if someone could summarize the discussion on gigabit switches for HPC/clusters. 1. If memory serves, NetGear wasn't a great choice? 2. HP Procurve got some good reviews. I've been asked for an opinion by some folks building a small, dedicated cluster. I'd told them of my misgivings on NetGear, and now I've gotta give a recommendation. I am asking for the collective wisdom. That said, I'll comment that I've been happy with our Procurve 5012zl but it's a bit larger than they need or would grow into.
thanks, Gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From landman at scalableinformatics.com Thu Mar 5 11:17:55 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] switch selection In-Reply-To: <49B0203F.3030808@tamu.edu> References: <49B0203F.3030808@tamu.edu> Message-ID: <49B02563.5090705@scalableinformatics.com> Gerry Creager wrote: > I recall a couple of recent arguments, so I'm wondering if someone could > summarize the discussion on gigabit switches for HPC/clusters. > > 1. If memory serves, NetGear wasn't a great choice? > 2. HP Procurve got some good reviews. > > I've been asked for an opinion by some folks building a small, dedicated > cluster. I'd told them of my misgivings on NetGear, and now I've gotta > give a recommendation. How large and what traffic over it? > > I am asking for the collective wisdom. That said, I'll comment that > I've been happy with out Procurve 5012zl but it's a bit larger than they > need or would grow into. > > thanks, Gerry -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From jlforrest at berkeley.edu Thu Mar 5 11:20:13 2009 From: jlforrest at berkeley.edu (Jon Forrest) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] switch selection In-Reply-To: <49B0203F.3030808@tamu.edu> References: <49B0203F.3030808@tamu.edu> Message-ID: <49B025ED.8080505@berkeley.edu> Gerry Creager wrote: > I recall a couple of recent arguments, so I'm wondering if someone could > summarize the discussion on gigabit switches for HPC/clusters. > > 1. If memory serves, NetGear wasn't a great choice? > 2. 
HP Procurve got some good reviews. > > I've been asked for an opinion by some folks building a small, dedicated > cluster. I'd told them of my misgivings on NetGear, and now I've gotta > give a recommendation. I've been using NetGear with OK results, although I have to admit that we're not super critical about switches. My primary suggestion would be to not pay too much attention to the subjective reviews you'll see here, including mine. Instead, I'd suggest finding and understanding a switch benchmark program that gives a more objective evaluation. Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest@berkeley.edu From gerry.creager at tamu.edu Thu Mar 5 11:36:21 2009 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] switch selection In-Reply-To: <49B02563.5090705@scalableinformatics.com> References: <49B0203F.3030808@tamu.edu> <49B02563.5090705@scalableinformatics.com> Message-ID: <49B029B5.4040408@tamu.edu> Joe Landman wrote: > Gerry Creager wrote: >> I recall a couple of recent arguments, so I'm wondering if someone >> could summarize the discussion on gigabit switches for HPC/clusters. >> >> 1. If memory serves, NetGear wasn't a great choice? >> 2. HP Procurve got some good reviews. >> >> I've been asked for an opinion by some folks building a small, >> dedicated cluster. I'd told them of my misgivings on NetGear, and now >> I've gotta give a recommendation. > > How large and what traffic over it? <= 24 nodes, with gigabit as the primary interconnect. MPI for data assimilation on custom weather codes. >> I am asking for the collective wisdom. That said, I'll comment that >> I've been happy with our Procurve 5012zl but it's a bit larger than >> they need or would grow into.
>> >> thanks, Gerry > > -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From landman at scalableinformatics.com Thu Mar 5 11:38:58 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] switch selection In-Reply-To: <49B029B5.4040408@tamu.edu> References: <49B0203F.3030808@tamu.edu> <49B02563.5090705@scalableinformatics.com> <49B029B5.4040408@tamu.edu> Message-ID: <49B02A52.7040608@scalableinformatics.com> Gerry Creager wrote: > Joe Landman wrote: >> Gerry Creager wrote: >>> I recall a couple of recent arguments, so I'm wondering if someone >>> could summarize the discussion on gigabit switches for HPC/clusters. >>> >>> 1. If memory serves, NetGear wasn't a great choice? >>> 2. HP Procurve got some good reviews. >>> >>> I've been asked for an opinion by some folks building a small, >>> dedicated cluster. I'd told them of my misgivings on NetGear, and >>> now I've gotta give a recommendation. >> >> How large and what traffic over it? > > <= 24 nodes, with gigabit as the primary interconnect. MPI for data > assimilation on custom weather codes. It would be hard to beat the Procurve 2900-24G or 2824 units for performance. We have had customers slot them in after other (disastrous) switch choices. Often times we hear "faster", "NFS doesn't die", "PXE/bootp finally works", and other nice things like that. They will cost you more, but we haven't seen an unhappy cluster customer using them yet. 
Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From gerry.creager at tamu.edu Thu Mar 5 12:33:11 2009 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Dual Nehalem announced. In-Reply-To: References: <49A85DF7.7090502@tamu.edu> <73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> <49ADA8A2.9030200@cse.ucdavis.edu> <9F1244F4B74B49C38122B0FE2E5E675F@MichaelPC> Message-ID: <49B03707.3010105@tamu.edu> I've heard 30 MAR Bruno Coutinho wrote: > > > 2009/3/5 Michael Huntingdon > > > Bill > I'm told the availability of systems within 4 days can't be accurate > at this point. Any product available now is a prerelease and not a > final as Intel has not released the volume chips yet. > > > When is the official release? > I heard that is March 28 ... > > > > > Michael A. Huntingdon > Higher Education Sales Account Manager > Systems Performance Consultants > Office (408) 294-6811 > Cell (408) 531-7422 > Fax (601) 510-3808 > ----- Original Message ----- From: "Bill Broadley" > > > To: > > Sent: Tuesday, March 03, 2009 2:01 PM > Subject: [Beowulf] Dual Nehalem announced. > > > > I noticed that apple's selling single/dual nehalems, claim to > ship within 4 > days. They offer 2.26, 2.66, and 2.93 GHz duals. Hopefully > that triggers the > NDAs to evaporate. 
> > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From forum.san at gmail.com Thu Mar 5 23:41:42 2009 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP In-Reply-To: References: Message-ID: Thanks everyone for your suggestions. Globus has an interface to SGE, but SGE can't be deployed completely on Windows. Right? regards, Sangamesh On Thu, Mar 5, 2009 at 10:34 PM, Mikhail Kuzminsky wrote: > In message from Sangamesh B (Thu, 5 Mar 2009 01:29:09 > -0500): >> >> Hello everyone, >> >> Is there a Grid scheduler (only open source, like SGE) tool which >> can be installed/run on Windows XP Desktop systems (there is no Linux >> involvement strictly). >> >> The applications used under this grid are Native to Windows XP. > > GRAM component of Globus Toolkit (http://www.globus.org/) give you some > possibilities of batch queue system, and there is SGE interfaces to Globus.
> Mikhail Kuzminsky > Computer Assistance to Chemical Research Center > Zelinsky Institute of Organic Chemistry RAS > Moscow >> >> Thanks, >> Sangamesh >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. >> > > From sabujp at gmail.com Tue Mar 3 18:11:54 2009 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Dual Nehalem announced. In-Reply-To: <49ADA8A2.9030200@cse.ucdavis.edu> References: <49A85DF7.7090502@tamu.edu> <73a01bf20903021057x3e49ab3ax7f77f18c17f94f0a@mail.gmail.com> <49ADA8A2.9030200@cse.ucdavis.edu> Message-ID: On Tue, Mar 3, 2009 at 4:01 PM, Bill Broadley wrote: > > I noticed that apple's selling single/dual nehalems, claim to ship within 4 > days. They offer 2.26, 2.66, and 2.93 GHz duals. Hopefully that triggers the > NDAs to evaporate. _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf Argh it's not an xserve yet, just a mac pro http://store.apple.com/us/browse/home/shop_mac/family/mac_pro From chichan2008 at gmail.com Wed Mar 4 08:52:22 2009 From: chichan2008 at gmail.com (Chi Chan) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid Engine training courses? In-Reply-To: References: Message-ID: Does anyone have experience with Grid Engine/SGE training? I have the following questions: 1. How good are the SGE training courses? Did anyone attend any of the courses before: http://blogs.sun.com/HPC/entry/workshop_migrating_from_lsf_to http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=109060 2. Do they run the SGE cluster in Virtual Machines? 3.
How big is each class? 4. How related are those courses for a Rocks cluster user and admin? TIA :-) --Chi From matt at technoronin.com Thu Mar 5 11:34:15 2009 From: matt at technoronin.com (Matt Lawrence) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] switch selection In-Reply-To: <49B0203F.3030808@tamu.edu> References: <49B0203F.3030808@tamu.edu> Message-ID: On Thu, 5 Mar 2009, Gerry Creager wrote: > I recall a couple of recent arguments, so I'm wondering if someone could > summarize the discussion on gigabit switches for HPC/clusters. > > 1. If memory serves, NetGear wasn't a great choice? > 2. HP Procurve got some good reviews. > > I've been asked for an opinion by some folks building a small, dedicated > cluster. I'd told them of my misgivings on NetGear, and now I've gotta give > a recommendation. > > I am asking for the collective wisdom. That said, I'll comment that I've > been happy with our Procurve 5012zl but it's a bit larger than they need or > would grow into. The Linksys 48 port managed switches seem to be very unreliable. I recommend avoiding them. -- Matt It's not what I know that counts. It's what I can remember in time to use. From hearnsj at googlemail.com Fri Mar 6 07:06:04 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes - SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program In-Reply-To: <98861235872102@webmail23.yandex.ru> References: <98861235872102@webmail23.yandex.ru> Message-ID: <9f8092cc0903060706g332e1f17l5f0f157e3b29f8e3@mail.gmail.com> 2009/3/1 Dmitry Zaletnev : > I have such a configuration and now I'm interested, where can I obtain Beowulf & PVM, in "Operating Systems. Internals and Design Principles. 4th Ed." by W. Stallings, it is said that I can obtain Beowulf from www.beowulf.org. But I can't find the path to download. The same is with PVM.
By the way, what OS should I use: Fedora-based Yellow Dog Linux and Fedora for the head; NetBSD everywhere, or something else? I am new to Linux; this cluster is intended for my scientific activity, to do the work that would make it possible for me to become a Ph.D. in heat-transfer coupled with hydrodynamics. > Sincerely yours, Dmitry Zaletnev. Dmitry, there is really no such thing as 'Beowulf'. It is a combination of using a parallel library - such as PVM or MPI - with commodity hardware. That looks like a really, really interesting setup! My advice to you - download OpenSUSE 11.1 for your head node, and download the CUDA development kit from http://www.nvidia.com/object/cuda_home.html That should give you plenty of CPU power with 4 CUDA cards. And please let us know how you progress - I'm very interested in running CFD codes on CUDA. (Hint - I look after CFD machines) Sorry I cannot help with the preferred way to run Sony PS3s. However, I think you should concentrate on the mix of x86_64 and CUDA - as I say you have lots of potential there. John Hearns From gus at ldeo.columbia.edu Fri Mar 6 08:04:36 2009 From: gus at ldeo.columbia.edu (Gus Correa) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes - SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program In-Reply-To: <98861235872102@webmail23.yandex.ru> References: <98861235872102@webmail23.yandex.ru> Message-ID: <49B14994.30608@ldeo.columbia.edu> Hello Dmitry, list Can you explain your hardware configuration a little better, please? It came only on the message subject. Too telegraphic for an explanation. Way too long for a message subject title. I copied it to the body of the message below. 1) Are your two nodes Core2Duo like the head node? 2) Are they two Sony PlayStation-3 boxes? 3) Or are they something else? 4) Do you have Nvidia 8800 GTX only on the head node or also on the two (compute) nodes (if they are not PS3)?
5) Do you plan to use CUDA for CFD parallel processing along with PVM? I don't think there is any downloadable setup for clusters in the Beowulf site. There is great information, but not a full OS plus clustering software. However, you can find a full cluster setup in Rocks Clusters, which is free and easy to install: http://www.rocksclusters.org/wordpress/ Nevertheless, it won't work with PlayStation, Fedora, or FreeBSD, I guess (they use CentOS, RHEL or Scientific Linux). MPI superseded PVM a while ago. These are the two main open source versions of MPI: http://www.open-mpi.org/ http://www.mcs.anl.gov/research/projects/mpich2/ The old PVM is available here: http://www.csm.ornl.gov/pvm/ I hope this helps. Gus Correa --------------------------------------------------------------------- Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- Dmitry Zaletnev wrote: > Subject: > > Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + > 2 nodes - SonyPS3 + > AlliedTelesyn GLAN Switch running > "home-made CFD-program". > > Message: > > I have such a configuration and now I'm interested, > where can I obtain Beowulf & PVM, in "Operating systems. > Internals and Design Principles. 4-th Ed." by W.Stollings, > said that i can obtain Beowulf from www.beowulf.org. > But I cann't find the path to download. > The same is with PVM. > By the way, what OS should i use: > Fedora-based Yellow Dog Linux and Fedora for Head; > NetBSD everywhere or something else? > I am new in Linux, this cluster is intended > for my scientific activity to make a work, > that would make possible to me to became Ph.D. > in heat-transfer coupled with hydrodynamics. > Sincerely yours, Dmitry Zaletnev. 
> _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From niftyompi at niftyegg.com Fri Mar 6 11:39:07 2009 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes - SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program In-Reply-To: <98861235872102@webmail23.yandex.ru> References: <98861235872102@webmail23.yandex.ru> Message-ID: <20090306193907.GA2961@tosh2egg.wr.niftyegg.com> On Sun, Mar 01, 2009 at 04:48:22AM +0300, Dmitry Zaletnev wrote: > > I have such a configuration and now I'm interested, where can I obtain Beowulf & PVM, in "Operating systems. Internals and Design Principles. 4-th Ed." by W.Stollings, said that i can obtain Beowulf from www.beowulf.org. But I cann't find the path to download. The same is with PVM. By the way, what OS should i use: Fedora-based Yellow Dog Linux and Fedora for Head; NetBSD everywhere or something else? I am new in Linux, this cluster is intended for my scientific activity to make a work, that would make possible to me to became Ph.D. in heat-transfer coupled with hydrodynamics. > Sincerely yours, Dmitry Zaletnev. PVM.. http://www.csm.ornl.gov/pvm/ Also look at MPI, http://www.open-mpi.org/ As an aside, PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) have become the standard interfaces for high-performance parallel programming in the message-passing paradigm. I mention this because I have a bias toward MPI. One nice thing about Open MPI is the range of hardware and batch system integrations. Also, MPI may be the best place to develop new parallel code; it appears to me that new application code design work is happening in the MPI world. Perhaps more important is the integration and development of drivers for fast interconnects.
As for the OS, Red Hat Enterprise Linux or one of the derived "free" clones like CentOS or Scientific Linux will take you a long way. Use the same OS on all your systems.. CentOS -- http://www.centos.org/ Scientific Linux -- https://www.scientificlinux.org/ -- T o m M i t c h e l l Found me a new hat, now what? From raysonlogin at gmail.com Fri Mar 6 11:59:01 2009 From: raysonlogin at gmail.com (Rayson Ho) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid Engine training courses? In-Reply-To: References: Message-ID: <73a01bf20903061159y6377edend16a7bad98fc491a@mail.gmail.com> (Warning: except for the free SGE web training on the Sun website, I have never taken any extra SGE training courses) The feedback on the Grid Engine mailing list about an earlier version of the SGE training course was positive. You can download the course materials from: https://www.middleware.georgetown.edu/confluence/display/HPCT/Advanced+Sun+Grid+Engine+Configuration+and+Administration It covers everything from basic SGE configuration to advanced scheduling algorithms (fair share, share tree), and then to the accounting console (ARCo). Rayson On Wed, Mar 4, 2009 at 11:52 AM, Chi Chan wrote: > Anyone has experiences with Grid Engine/SGE training? I have the > following questions: > > 1. How good are the SGE training courses? Did anyone attend any of the > courses before: > > http://blogs.sun.com/HPC/entry/workshop_migrating_from_lsf_to > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=109060 > > 2. Do they run the SGE cluster in Virtual Machines? > > 3. How big is each class? > > 4. How related are those courses for a Rocks cluster user and admin?
> > TIA :-) > > --Chi > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From herborn at usna.edu Fri Mar 6 12:41:54 2009 From: herborn at usna.edu (Steve Herborn) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes -SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program In-Reply-To: <98861235872102@webmail23.yandex.ru> References: <98861235872102@webmail23.yandex.ru> Message-ID: If you are starting from scratch with only hardware and little to no previous experience, you might want to read this article ==> http://www.linux-mag.com/id/7239 According to the author, he built a functional Caos NSA/Perceus cluster in 23 minutes, which included building the master node, downloading the capsules, and booting the first compute node. Steven A. Herborn U.S. Naval Academy Advanced Research Computing 410-293-6480 (Desk) 757-418-0505 (Cell) -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Dmitry Zaletnev Sent: Saturday, February 28, 2009 8:48 PM To: beowulf@beowulf.org Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes -SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program I have such a configuration and now I'm interested, where can I obtain Beowulf & PVM, in "Operating systems. Internals and Design Principles. 4-th Ed." by W.Stollings, said that i can obtain Beowulf from www.beowulf.org. But I cann't find the path to download. The same is with PVM. By the way, what OS should i use: Fedora-based Yellow Dog Linux and Fedora for Head; NetBSD everywhere or something else? I am new in Linux, this cluster is intended for my scientific activity to make a work, that would make possible to me to became Ph.D. in heat-transfer coupled with hydrodynamics.
Sincerely yours, Dmitry Zaletnev. _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gus at ldeo.columbia.edu Fri Mar 6 14:20:08 2009 From: gus at ldeo.columbia.edu (Gus Correa) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes - SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program In-Reply-To: <49B14994.30608@ldeo.columbia.edu> References: <98861235872102@webmail23.yandex.ru> <49B14994.30608@ldeo.columbia.edu> Message-ID: <49B1A198.8080002@ldeo.columbia.edu> PS - Dmitry: ClusterMonkey is another excellent site where you can get information about clusters: http://www.clustermonkey.net/ Gus Correa Gus Correa wrote: > Hello Dmitry, list > > Can you explain your hardware configuration a little better, please? > It came only on the message subject. > Too telegraphic for an explanation. > Way too long for a message subject title. > I copied it to the body of the message below. > > 1) Are your two nodes Core2Duo like the head node? > 2) Are they two Sony PlayStation-3 boxes? > 3) Or are they something else? > 4) Do you have Nvidia 8800 GTX only on the head node or also on > the two (compute) nodes (if they are not PS3)? > 5) Do you plan to use CUDA for CFD parallel processing along with PVM? > > > I don't think there is any downloadable setup for clusters in > the Beowulf site. > There is great information, > but not a full OS plus clustering software. > > However, you can find a full cluster setup in Rocks Clusters, > which is free and easy to install: > > http://www.rocksclusters.org/wordpress/ > > Nevertheless, it won't work with PlayStation, Fedora, > or FreeBSD, I guess (they use CentOS, RHEL or Scientific Linux). > > MPI superseded PVM a while ago. 
> These are the two main open source versions of MPI: > > http://www.open-mpi.org/ > http://www.mcs.anl.gov/research/projects/mpich2/ > > The old PVM is available here: > > http://www.csm.ornl.gov/pvm/ > > I hope this helps. > > Gus Correa > --------------------------------------------------------------------- > Gustavo Correa > Lamont-Doherty Earth Observatory - Columbia University > Palisades, NY, 10964-8000 - USA > --------------------------------------------------------------------- > > Dmitry Zaletnev wrote: > > > Subject: > > > > Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + > > 2 nodes - SonyPS3 + > > AlliedTelesyn GLAN Switch running > > "home-made CFD-program". > > > > Message: > > > > I have such a configuration and now I'm interested, > > where can I obtain Beowulf & PVM, in "Operating systems. > > Internals and Design Principles. 4-th Ed." by W.Stollings, > > said that i can obtain Beowulf from www.beowulf.org. > > But I cann't find the path to download. > > The same is with PVM. > > By the way, what OS should i use: > > Fedora-based Yellow Dog Linux and Fedora for Head; > > NetBSD everywhere or something else? > > I am new in Linux, this cluster is intended > > for my scientific activity to make a work, > > that would make possible to me to became Ph.D. > > in heat-transfer coupled with hydrodynamics. > > Sincerely yours, Dmitry Zaletnev. 
> > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From reuti at staff.uni-marburg.de Fri Mar 6 15:01:14 2009 From: reuti at staff.uni-marburg.de (Reuti) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Grid scheduler for Windows XP In-Reply-To: References: Message-ID: <6ED4CB95-75EC-40D4-99BA-9F23D9F09812@staff.uni-marburg.de> Hi, On 05.03.2009 at 07:29, Sangamesh B wrote: > Hello everyone, > > Is there a Grid scheduler (only open source, like SGE) tool which > can be installed/run on Windows XP Desktop systems (there is no Linux > involvement strictly). > > The applications used under this grid are Native to Windows XP. I came across this: http://jobscheduler.sourceforge.net/ I don't know how it will operate in a cluster, though, but maybe it's worth checking. -- Reuti > Thanks, > Sangamesh > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Fri Mar 6 15:45:44 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] Beowulf on head Core2Duo/8RAM/8800GTX(4 CUDA) + 2 nodes - SonyPS3 + AlliedTelesyn GLAN Switch running "home-made CFD-program In-Reply-To: <98861235872102@webmail23.yandex.ru> References: <98861235872102@webmail23.yandex.ru> Message-ID: > I have such a configuration and now I'm interested, where can I obtain > Beowulf & PVM, in "Operating systems. Internals and Design Principles. 4-th why PVM? it's a fine system, but MPI has, practically speaking, obsoleted it.
> Ed." by W.Stollings, said that i can obtain Beowulf from > www.beowulf.org. beowulf is not really a singular thing, but a concept. there are many instances which are downloadable and relatively turnkey if you're not very picky. Oscar, Rocks, Perceus/Warewulf, etc. > But I cann't find the path to download. The same is with > PVM. By the way, what OS should i use: Fedora-based Yellow Dog Linux and > Fedora for Head; NetBSD everywhere or something else? I am new in Linux, you should DEFINITELY use the same OS on all nodes. mixing OSs merely adds support issues and complexity. if you simply have a handful of machines, I think you should take the lowest-tech approach: install a comfortable distro on all of them, and you're done. distros like fedora already include working versions of MPI (indeed sometimes even PVM). for a personal cluster, you don't necessarily need anything else: schedulers, monitoring, private networks, etc. it's not absolutely necessary, but the first feature I'd add would be a shared filesystem (just an NFS export from somewhere). > this cluster is intended for my scientific activity to make a work, that > would make possible to me to became Ph.D. in heat-transfer coupled with > hydrodynamics. well, I think there's more to becoming a phd than running sims on a cluster ;) From thakur at mcs.anl.gov Fri Mar 6 11:34:46 2009 From: thakur at mcs.anl.gov (Rajeev Thakur) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] [hpc-announce] SC09: Call for Tutorial Proposals Message-ID: Call for SC09 Tutorial Proposals Experts in high performance computing are invited to share their expertise with the High Performance Computing (HPC) community by submitting proposals for tutorials at the SC09 conference to be held in Portland, Oregon, November 14-20, 2009. The SC09 tutorials program will give attendees the opportunity to explore a wide variety of important topics related to high-performance computing, networking, and storage. 
SC09 invites proposals for introductory, intermediate, and advanced tutorials, either full-day (six hours) or half-day (three hours). A distinguished panel of experts will select the tutorials from the submitted proposals. Submissions for tutorials and other aspects of the SC09 technical program open Monday, March 16, 2009. The deadline for submission is April 6, 2009. Detailed information: http://sc09.supercomputing.org/?pg=tutorials.html Questions: tutorials@info.supercomputing.org Regards, Fred Johnson and Rajeev Thakur SC09 Tutorials Co-Chairs From bejosukamto at gmail.com Sun Mar 8 13:20:32 2009 From: bejosukamto at gmail.com (Bejo Sukamto) Date: Wed Nov 25 01:08:20 2009 Subject: [Beowulf] How to Configure and install xmpi-2.2.3b8 Message-ID: <6395e7990903081320u62da0c99ufa3436d0d3c5976@mail.gmail.com> How to install and configure xmpi-2.2.3b8 with lam-7.1.4 for this specification: - opensuse 8.1 with openmotif 2.2 and default C and C++ compiler - and lam configuration ./configure --prefix=/usr/local --with-trillium --with-rsh="ssh -x" --without-fc ? Thanks for your attention and answers. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090309/04f9264f/attachment.html From polk678 at gmail.com Mon Mar 9 02:08:25 2009 From: polk678 at gmail.com (gossips J) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] HPCC "intel_mpi" error Message-ID: Hi, We are using ICR validation. We are facing the following problem while running the command below: cluster-check --debug --include_only intel_mpi /root/sample.xml The problem is: the output of the cluster checker shows us that "intel_mpi" FAILED, whereas by looking into the debug.out file it is seen that "Hello World" is returned from all nodes. I have a 16-node configuration and we are running 8 procs/node. The above behavior is observed even with 1 proc/node, 2 proc/node, and 4 proc/node. I also tried "rdma" and "rdssm" as a DEVICE in the XML file but no luck.
If anyone can shed some light on this issue, it would be a great help. Another thing I would like to know is: Is there a way to specify the "-env RDMA_TRANSLATION_CACHE" option with Intel Cluster Checker? Awaiting your kind response, Thanks in advance, Polk. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090309/44deb969/attachment.html From herborn at usna.edu Tue Mar 10 11:35:39 2009 From: herborn at usna.edu (Steve Herborn) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cluster doesn't like being moved In-Reply-To: References: Message-ID: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> I have a small test cluster built off Novell SUSE Enterprise Server 10.2 that is giving me fits. It seems that every time the hardware is physically moved (keep getting kicked out of the space I'm using), I end up with any number of different problems. Personally I suspect some type of hardware issue (this equipment is about 5 years old), but one of my co-workers isn't so sure hardware is in play. I was having problems with the RAID initializing after one move back which I resolved a while back by reseating the RAID controller card. This time it appears that the file system & configuration databases became corrupted after moving the equipment. Several services aren't starting up (LDAP, DHCP, PBS to name a few) and YAST2 hangs any time an attempt is made to use it. For example adding a printer or software package. My co-worker feels the issue may be related to the ReiserFS file system with AMD processors. The ReiserFS file system was the default presented when I initially installed SLES so I went with it. Do you know of any issues with using the ReiserFS file system on AMD based systems or have any other ideas what I may be facing? Steven A. Herborn U.S.
Naval Academy Advanced Research Computing 410-293-6480 (Desk) 757-418-0505 (Cell) _____ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of gossips J Sent: Monday, March 09, 2009 5:08 AM To: beowulf@beowulf.org Subject: [Beowulf] HPCC "intel_mpi" error Hi, We are using ICR validation. We are facing following problem while running below command: cluster-check --debug --include_only intel_mpi /root/sample.xml Problem is: Output of cluster checker shows us that "intel_mpi" FAILED, where as by looking into debug.out file it is seen that "Hello World" is returned from all nodes. I have 16 nodes configuration and we are running 8 proc/node. Above behavior is observed with even 1 proc/node, 2 proc/node, 4 proc/node as well. I also tried "rdma" and "rdssm" as a DEVICE in XML file but no luck. If anyone can shed some light on this issue, it would be great help. Another thing I would like to know is: Is there a way to specify "-env RDMA_TRANSLATION_CACHE" option with Intel Cluster Checker? Awaiting for kind response, Thanks in advance, Polk. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090310/57513a00/attachment.html From hahn at mcmaster.ca Tue Mar 10 12:05:50 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cluster doesn't like being moved In-Reply-To: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> References: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> Message-ID: > moved (keep getting kicked out of the space I'm using), I end up with any > number of different problems. debugging is mainly about breaking down the system into components whose correctness can be observed separately. > Personally I suspect some type of hardware issue (this equipment is about 5 > years old), but one of my co-workers isn't so sure hardware is in play. 
I > was having problems with the RAID initializing after one move back which I > resolved a while back by reseating the RAID controller card. sounds a bit like black magic to me. I don't believe I've ever had a problem solved by card reseating (though dimm reseating does seem to clean up 40% of the nodes I see that are reporting a lot of corrected ecc's.) > This time It appears that the file system & configuration databases became > corrupted after moving the equipment. Several services aren't starting up > (LADP, DHCP, PBS to name a few) and YAST2 hangs any time an attempt is made simplify. to me, it sounds like your network (ip, route, dns) is confused. From mathog at caltech.edu Tue Mar 10 12:38:16 2009 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Re: Cluster doesn't like being moved (Steve Herborn) Message-ID: "Steve Herborn" wrote: > I have a small test cluster built off Novell SUES Enterprise Server 10.2 > that is giving me fits. It seems that every time the hardware is physically > moved (keep getting kicked out of the space I'm using), I end up with any > number of different problems. Off the top of my head... 1. motherboard batteries may be going/gone, leading to BIOS changes when unplugged during the move (or shut down for any extended period of time), leading to failures. 2. iffy wiring connections of any type (cards, data cables, power supply cables, jumpers from case to motherboard, etc.) > > Personally I suspect some type of hardware issue (this equipment is about 5 > years old), but one of my co-workers isn't so sure hardware is in play. I > was having problems with the RAID initializing after one move back which I > resolved a while back by reseating the RAID controller card. That would be consistent with (2). If moving involves any "rolling on small wheels over rough surfaces", failed electrical connections are a common result.
We have a cart with about 10" inflated tires which is used to move equipment, specifically to minimize this issue. The last 2 racks we moved were completely disassembled, the frame moved first, then the nodes moved to it on this cart. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From jmdavis1 at vcu.edu Tue Mar 10 12:46:03 2009 From: jmdavis1 at vcu.edu (Mike Davis) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cluster doesn't like being moved In-Reply-To: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> References: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> Message-ID: <49B6C37B.9040504@vcu.edu> Steve Herborn wrote: > > I have a small test cluster built off Novell SUES Enterprise Server > 10.2 that is giving me fits. It seems that every time the hardware is > physically moved (keep getting kicked out of the space I'm using), I > end up with any number of different problems. > > Personally I suspect some type of hardware issue (this equipment is > about 5 years old), but one of my co-workers isn't so sure hardware is > in play. I was having problems with the RAID initializing after one > move back which I resolved a while back by reseating the RAID > controller card. > > This time It appears that the file system & configuration databases > became corrupted after moving the equipment. Several services aren't > starting up (LADP, DHCP, PBS to name a few) and YAST2 hangs any time > an attempt is made to use it. For example adding a printer or software > package. My co-worker feels the issue maybe related to the ReiserFS > file system with AMD processors. The ReiserFS file system was the > default presented when I initially installed SLES so I went with it. > > Do you know of any issues with using the ReiserFS file system on AMD > based systems or have any other ideas what I maybe facing? > > > > *Steven A. Herborn* > > *U.S. 
Naval Academy* > > *Advanced Research Computing* > > *410-293-6480 (Desk)* > > *757-418-0505 (Cell)* > > My experience is that anytime a system runs for months or years, a shutdown leads to issues. Sometimes, hard drives fail to spin up, other times power supplies may fail, and occasionally RAM or MB's may go bad. It may be hardware, and it may be a vibration issue brought on by the move. Then again, it may be magic! -- Mike Davis Technical Director (804) 828-3885 Center for High Performance Computing jmdavis1@vcu.edu Virginia Commonwealth University "Never tell people how to do things. Tell them what to do and they will surprise you with their ingenuity." George S. Patton From landman at scalableinformatics.com Tue Mar 10 14:06:39 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cluster doesn't like being moved In-Reply-To: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> References: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> Message-ID: <49B6D65F.2010104@scalableinformatics.com> Steve Herborn wrote: > This time It appears that the file system & configuration databases > became corrupted after moving the equipment. Several services aren't > starting up (LADP, DHCP, PBS to name a few) and YAST2 hangs any time an > attempt is made to use it. For example adding a printer or software > package. My co-worker feels the issue maybe related to the ReiserFS file > system with AMD processors. The ReiserFS file system was the default > presented when I initially installed SLES so I went with it. Ouch. Can you boot an OpenSuSE disk in rescue mode and fsck the file system? I have had two (severe) data losses in my work on Linux, one was with ext2, and the other with Reiserfs. Wouldn't recommend using either one in a production mode for data that needed long-term viability. > Do you know of any issues with using the ReiserFS file system on AMD > based systems or have any other ideas what I maybe facing?
Yes, reiserfs may have been silently accumulating errors that only became apparent upon restart. Or its fsck munged the file system. If you can move off of it, I would urge you to do that. It is likely that the configuration data that your non-starting services depend upon are lost. Can you rebuild this (or pay someone to do so) without too much pain? -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From hearnsj at googlemail.com Wed Mar 11 02:47:41 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cluster doesn't like being moved In-Reply-To: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> References: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> Message-ID: <9f8092cc0903110247g54f75c95l6dd5b5db4838df35@mail.gmail.com> 2009/3/10 Steve Herborn : > > Do you know of any issues with using the ReiserFS file system on AMD based > systems or have any other ideas what I maybe facing? > This might be a wake-up call to change your storage. It looks like you are using Reiser for the root filesystem - in which case it might be a pain to change the filesystem. I would still consider it though - source a new system disk and put it in tandem with the original disk. Boot the system with a Knoppix CD, format and make new ext3 filesystems on the new disk. Rsync the root filesystem, plus /usr, /var, etc. filesystems across. One other tip - if you are getting system services failing to start, calm down and take things logically step by step. Connect a monitor and keyboard to the system. Boot it in single user. Look at the boot log and dmesg - there are no disk errors, right? Test the disk by writing some junk data to /tmp. Again, no errors, right? Bring up the network interface by hand. Ping some hosts on your network.
Finally run 'init 3' in one console window, and in another have a 'tail -f /var/log/messages' running. From hearnsj at googlemail.com Wed Mar 11 02:58:04 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Re: Cluster doesn't like being moved (Steve Herborn) In-Reply-To: References: Message-ID: <9f8092cc0903110258t72a4f4edkf06a71f88c1a5c44@mail.gmail.com> 2009/3/10 David Mathog : >> That would be consistent with (2). If moving involves any "rolling on > small wheels over rough surfaces" failed electrical connections are a > common result. We have a cart with about 10" inflated tires which is > used to move equipment, specifically to minimize this issue. The last 2 > racks we moved were completely disassembled, the frame moved first, then > the nodes moved to it on this cart. This is good advice. I once participated in a move of clusters to a new building at an oil company. I was very, very impressed with their professionalism. They had hired a moving company, who unracked every node and wrapped it in bubble wrap before moving it across to the new building. As a result, the systems came back up without major problems. Five-year-old kit is getting old and cranky, and should be treated like your arthritic grandmother. From eugen at leitl.org Thu Mar 12 08:20:03 2009 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cray uses Supermicro blades for CX1 Message-ID: <20090312152003.GV11917@leitl.org> http://www.theinquirer.net/inquirer/news/356/1051356/cray-supermicro-blades-cx1 Cray uses Supermicro blades for CX1 CeBit 2009 Deskbottom supercomputer By Charlie Demerjian Tuesday, 10 March 2009, 16:33 YOU MAY HAVE heard about the Cray CX1, a 'desktop cluster' in the vein of the Tyan Typhoon. They are interesting little beasts, and with the addition of Nehalems, now have the grunt to be a deskbottom supercomputer.
[Image: Cray_LX1 - the front of the beastlet] The CX1 isn't a breakthrough in technology by any means; it is just a product to fill a niche. The machine itself has eight blades that will each take two Nehalem EPs, and a claimed 32G of RAM/blade. We won't bring up that Nehalems use RAM in 6G increments; if you can count to 32 evenly in sixes, you are using a different base than most people. In addition to Nehalems, you can opt for a storage blade (normally you only get two drives per blade) or a GPGPU blade. Out the back, there is Infiniband or Ethernet - take your pick. That said, the cable routing could use a little cleaning up... [Image: LX1_rear - the back of the beastlet] Cray probably doesn't want you to know that the blades are in fact Supermicro blades, so you could probably get away with a little less cost should you shop around. That said, the version you see above, with four Harpertown x 2 blades, a GPGPU blade and active noise cancellation, is available online for about $30K. It may be a little while to Christmas, but what tot wouldn't be delighted to see one of these under the tree? From smulcahy at atlanticlinux.ie Fri Mar 13 04:50:46 2009 From: smulcahy at atlanticlinux.ie (stephen mulcahy) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] No guards on 1u server fans - common practice? Message-ID: <49BA4896.9000801@atlanticlinux.ie> Hi, I have inherited 40 servers which I'm currently building a Hadoop[1] cluster out of (I know, not typical beowulf fare but we're using the same principles to roll out the systems, manage them and so on). We've been upgrading the drives in these servers (they are a few years old and came with 1 x 80GB drives, we're moving to 2 x 1TB drives since we're using Hadoop mainly for its distributed filesystem properties) and I've run into a few problems now with a pretty simple cause.
I guess I'm mailing the beowulf list with a view to adding something else to your list of things to look out for when troubleshooting servers, and to ask whether what I've found is general practice. We ran into some fan problems after powering up some of the systems to do initial smoke testing. After a little head-scratching and looking at a system running on a workbench, it became obvious what was causing the fans to fail. The SATA power cables and, in some cases, the SATA data cables were sticking into some of the fans, blocking them. Hence the error. The fans don't feature any guards (plastic or metal). I've since re-examined all the systems and in most cases, we had inadvertently created these cable+fan problems during the installation of the second drive. The only way to avoid it that I can see is to stuff the excess cable under one of the drives. This is fine, now that I realise it is necessary, but it strikes me that putting unguarded fans in 1u servers with very little space for cables is a pretty poor design. I don't think I've seen this in other 1u servers I've worked on (but I'll certainly be keeping an eye out in future). Is this really poor design (the server vendor shall go nameless though they are a tier-1) or am I expecting too much out of 1u server fans?
We had a supplier that liked to avoid putting the guards on the fans. It's generally not a good thing ... there is injury risk as well as other issues (possible electrical shorts from cut wires, fan self-destruction due to impacts ... and subsequent shrapnel ...). Whenever possible, we like to try to route wires away from the fans, or if we have no choice due to length constraints, we will try to tie them to rigid structures to prevent interference.

-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web: http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax: +1 866 888 3112 cell: +1 734 612 4615

From bart at attglobal.net Tue Mar 10 12:00:27 2009 From: bart at attglobal.net (Bart Jennings) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Cluster doesn't like being moved In-Reply-To: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> References: <3A7364A8B6624B9D9EEEE46D0C21E298@dynamic.usna.edu> Message-ID: <49B6B8CB.7000403@attglobal.net>

Just some thoughts... Since you are physically moving the machines, things like loose cards, processors, heat sinks/fans, memory, and cables come to mind. I've personally had loose heat sinks cause processors to do funky things (software crashes/corruption, etc.). I've heard of issues with disk heads hitting the platters while drives were moved, which led to data loss. Have you tried running a full file system check? I think most modern disks lock the disk armatures in place now, but the disks/RAID device might still have software to do this for you. Other problem sources might include weird environmental ones, like excessive heat and magnetic fields playing havoc with the hardware during the transition. Good luck figuring it out.

Bart

Steve Herborn wrote: > > I have a small test cluster built off Novell SUSE Enterprise Server > 10.2 that is giving me fits.
It seems that every time the hardware is > physically moved (I keep getting kicked out of the space I'm using), I > end up with any number of different problems. > > Personally I suspect some type of hardware issue (this equipment is > about 5 years old), but one of my co-workers isn't so sure hardware is > in play. I was having problems with the RAID initializing after one > move, which I resolved a while back by reseating the RAID > controller card. > > This time it appears that the file system & configuration databases > became corrupted after moving the equipment. Several services aren't > starting up (LDAP, DHCP, PBS to name a few) and YaST2 hangs any time > an attempt is made to use it, for example when adding a printer or software > package. My co-worker feels the issue may be related to the ReiserFS > file system with AMD processors. The ReiserFS file system was the > default presented when I initially installed SLES, so I went with it. > > Do you know of any issues with using the ReiserFS file system on AMD-based > systems, or have any other ideas about what I may be facing? > > > > *Steven A. Herborn* > > *U.S. Naval Academy* > > *Advanced Research Computing* > > *410-293-6480 (Desk)* > > *757-418-0505 (Cell)* > > > > ------------------------------------------------------------------------ > *From:* beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org] *On Behalf Of *gossips J > *Sent:* Monday, March 09, 2009 5:08 AM > *To:* beowulf@beowulf.org > *Subject:* [Beowulf] HPCC "intel_mpi" error > > Hi, > > We are using ICR validation. > > We are facing the following problem while running the command below: > > cluster-check --debug --include_only intel_mpi /root/sample.xml > > > The problem is: > > The output of cluster checker shows us that "intel_mpi" FAILED, whereas by > looking into the debug.out file it is seen that "Hello World" is returned from > all nodes. > > > I have a 16-node configuration and we are running 8 proc/node.
> > The above behavior is observed with even 1 proc/node, 2 proc/node, and 4 proc/node > as well. I also tried "rdma" and "rdssm" as a DEVICE in the XML file, but no luck. > > If anyone can shed some light on this issue, it would be a great help. > > > Another thing I would like to know is: > > Is there a way to specify the "-env RDMA_TRANSLATION_CACHE" option with Intel Cluster Checker? > Awaiting your kind response, > > Thanks in advance, > Polk. > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >

From feng at cs.vt.edu Wed Mar 11 10:06:09 2009 From: feng at cs.vt.edu (Wuchun Feng) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] [hpc-announce] IEEE Computer: Special Issue on "Tools and Environments for Multi- and Many-Core Architectures" Message-ID: <56627F95-D5D9-4222-8F31-90E958F6D45F@cs.vt.edu>

An amended CFP ... this time, with web links to the submission guidelines and instructions. Regards, Wu

---- IEEE Computer Special Issue on Tools and Environments for Multi- and Many-Core Architectures DEADLINE EXTENDED to March 31, 2009 For submission instructions, please visit http://www.computer.org/portal/pages/computer/content/author.html To submit, go to https://mc.manuscriptcentral.com/cs-ieee. In the past, computing speeds doubled every 18-24 months by increasing the clock speed, thus giving software a "free ride" to better performance whenever the clock speed increased. This free ride is now over, and such automatic performance improvement is no longer possible. With clock speeds stalling out and computational horsepower instead increasing due to the rapid doubling of the number of cores per processor, serial computing is now dead, and the vision for parallel computing, which started over forty years ago, is a revolution that is now upon us.
With the advent of multi-core chips --- from the traditional AMD and Intel multi-core to the more exotic hybrid multi-core of IBM Cell and many-core of AMD/ATi and nVidia GPGPUs --- parallel computing across multiple cores on a single chip has become a necessity. However, writing parallel applications is a significant undertaking that will create more, not less, problematic software. In order for parallelism to succeed, it must ultimately produce better performance relative to speed and efficiency. However, not only are most programmers ill-equipped to produce proper parallel programs, but they also lack the tools and environments for producing such programs. Therefore, the purpose of this special issue is to present the latest advances in next-generation tools and environments for multi- and many- core architectures. We solicit contributions in areas including, but not limited to: - Programming models and environments for multi-core and many-core architectures - Systems scheduling and management between different subsystems of multi-core and many-core architectures - Compile-time and run-time optimizations in multi-core and many-core architectures - Tools to enhance programming productivity in multi-core and many- core architectures - Performance evaluation of applications and system software in multi- core and many-core architectures - Software productivity studies - Fault tolerance and virtualization - Monitoring and measurement tools to better enable debugging and performance optimization -- Prof. Wu FENG | Synergy Laboratory | Depts. 
of CS and ECE | 2202 Kraft Dr | Virginia Tech | Blacksburg, VA 24060-6356 | 540-231-1192 | feng@cs.vt.edu | http://www.cs.vt.edu/~feng

From traff at it.neclab.eu Thu Mar 12 01:29:17 2009 From: traff at it.neclab.eu (Jesper Larsson Traeff) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] [hpc-announce] CFP: Highly Parallel Processing on a Chip (HPPC 2009) Euro-Par Workshop Message-ID: <20090312082917.GB9748@fourier.it.neclab.eu>

CALL FOR PAPERS 3rd full-day Workshop on Highly Parallel Processing on a Chip (HPPC) August 25, 2009, Delft, The Netherlands http://www.hppc-workshop.org/ to be held in conjunction with the 15th International European Conference on Parallel and Distributed Computing (Euro-Par), August 25-28, 2009, Delft, The Netherlands http://europar2009.ewi.tudelft.nl/

AIMS AND SCOPE The decline in the growth of single-processor performance, the growing concerns with energy consumption, and the still exponential increase in transistors per chip as per Moore's law will set the scene for single-chip processors with a substantial amount of parallelism to meet the demands for extremely high performance, reliability, and controllable power consumption in all areas of computing. The major challenge for the coming years will be the design of architectures supporting manageable programming abstractions to allow the mainstream programmer to take advantage of the processing power promised by the technological developments. HPPC, the third workshop in the series, co-located with the Euro-Par conference, is *the* workshop dedicated to addressing all aspects of highly parallel processing on a chip, be it in existing or emerging multi-core designs, or in bold, new proposals for architectures, programming models, languages and libraries for managing and exploiting massive levels of parallelism on a chip.
Particular emphasis is on the interaction between hardware, architecture (processors, on-chip networks, cache and memory system), programming models and languages, and algorithms as well as applications in need of significant amounts of single-chip parallelism. The workshop will be conducted in an informal atmosphere, stressing interaction and discussion between presenters and audience. Topics of interest include, but are not limited to - hardware techniques (e.g. power saving, clocking, fault-tolerance) - processor core architectures (homogeneous and heterogeneous) - special purpose processors (accelerators, GPUs) - on-chip memory and cache (or cache-less) organization, and interconnects - off-chip memory, I/O, and multi-core interconnects - overall system design (resource allocation and balancing) - programming models (e.g. PRAM, BSP, data parallel, vector, transactional) - languages and software libraries - implementation techniques (e.g. multi-threading, work-stealing) - support and performance tools, performance evaluation - parallel algorithms and applications - migration of existing codebase - teaching of parallel computing for/on highly parallel multi-core systems. SUBMISSION Authors are encouraged to submit original, unpublished research or overviews addressing issues in the design and application of highly parallel multi-core processors as outlined above. Papers should be limited to 10 pages, and typeset in the Springer LNCS style (for details, see www.springer.de/comp/lncs/authors.html). Accepted papers that are presented at the workshop, will be published in revised form in a special Euro-Par Workshop Volume in the Lecture Notes in Computer Science (LNCS) series AFTER the Euro-Par conference. The proceedings of the first HPPC workshop appeared in Springer LNCS Volume 4854. 
SUBMISSION GUIDELINES Please see the workshop www-page: http://www.hppc-workshop.org IMPORTANT DATES Submission of manuscripts: Friday, 5th June, 2009 Notification of acceptance: Monday, 20th July 2009 Date of workshop: Tuesday 25th August, 2009 Deadline for final version (post-proceedings): September, 2009 WORKSHOP ORGANIZERS Martti Forsell, VTT, Finland Jesper Larsson Traff, NEC Laboratories Europe, NEC Europe Ltd, Germany PROGRAM COMMITTEE David Bader, Georgia Institute of Technology, USA Gianfranco Bilardi, University of Padova, Italy Marc Daumas, University of Perpignan Via Domitia, France Martti Forsell, VTT, Finland Peter Hofstee, IBM, USA Chris Jesshope, University of Amsterdam, The Netherlands Ben Juurlink, Technical University of Delft, The Netherlands Jorg Keller, University of Hagen, Germany Christoph Kessler, University of Linkoping, Sweden Dominique Lavenier, IRISA - CNRS, France Ville Leppanen, University of Turku, Finland Radu Marculescu, Carnegie Mellon University, USA Lasse Natvig, NTNU, Norway Geppino Pucci, University of Padova, Italy Jesper Larsson Traff, NEC Laboratories Europe, NEC Europe Ltd, Germany Uzi Vishkin, University of Maryland, USA CONTACT INFO Email: chair@hppc-workshop.org SPONSORS VTT NEC Euro-Par From niftyompi at niftyegg.com Fri Mar 13 11:50:06 2009 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] No guards on 1u server fans - common practice? In-Reply-To: <49BA4896.9000801@atlanticlinux.ie> References: <49BA4896.9000801@atlanticlinux.ie> Message-ID: <20090313185006.GA3425@compegg.wr.niftyegg.com> On Fri, Mar 13, 2009 at 11:50:46AM +0000, stephen mulcahy wrote: > > I have inherited 40 servers which I'm currently building a Hadoop[1] .... > We ran into some fan problems after powering up some of the systems to > do initial smoke testing. After a little head-scratching and looking at > a system running on a workbench - it became obvious what was causing the > fans to fail. 
The SATA power cables and, in some cases, the SATA data > cables were sticking into some of the fans, blocking them. Hence the > error. > > The fans don't feature any guards (plastic or metal). I've since

This is a common situation. It pays to invest in a tie-wrap gun and a big bag of cable wraps. A tie-wrap gun automatically tightens and cuts tie wraps flush with the head in one operation. Adjust the cut-off trigger to be gentle so it doesn't over-clamp the cables. Inside the chassis, small ties are better. Also look for hook-and-loop ties for data and power cables outside of the box. I have found that the garden supply can sometimes have them for less than computer stores. I like the two or three meter roll where hooks are on one side and loops are on the other (i.e. no adhesive). I cut them into various lengths and like them because they are so handy for organizing wires prior to using a tie-wrap for more permanent installation. I especially like the hook-and-loop product for cables like InfiniBand and coax, where too-tight tie-wraps can mess with the cable quality. In a repair shop you will see the tech snip the old wraps with diagonal cutters, replace the part, and zip/snip to reinstall the ties again. There is a knack, in how the nippers are oriented, to not cutting the wires. When you are shopping, pick up a bag of multi-colored ties and you can quickly add a color pattern to both ends of a cable to make it easy to find the other end. I do this on most of my 'test' and 'temp' cables; it can be a lot quicker than colored tape. This way I am less apt to pull a production wire in error in a service situation.

-- T o m M i t c h e l l Found me a new hat, now what?

From kus at free.net Mon Mar 16 10:48:31 2009 From: kus at free.net (Mikhail Kuzminsky) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Sun X4600 STREAM results Message-ID:

Sorry, does somebody have X4600 M2 STREAM results (or the corresponding URLs) for DDR2/667, with dependence on processor core count?
Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow

From francesco.pietra at accademialucchese.it Fri Mar 13 08:47:39 2009 From: francesco.pietra at accademialucchese.it (Francesco Pietra) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Resource conflict Message-ID:

I have resumed a Tyan S2895 with two dual-core Opterons, 400MHz Kingston DDR1 ECC RAM in 2 slots, two SATA 250GB Maxtors, a new CD-ROM and floppy, PS2 mouse, and US keyboard (it was previously running fine with 16GB RAM, two WD 150GB Raptors, and a different CD-ROM and floppy; the machine then remained unused for one year).

Yesterday I installed Debian amd64 lenny 5.0.0 on this machine (netinstall from CD-ROM), creating 3 partitions (1, 2, 3) on sda and sdb of 0.2G, 1.0G, and nearly the rest of the drive. I selected all the partitions to be RAID devices and configured RAID as:

md0 = sda1 sdb1
md1 = sda2 sdb2
md2 = sda3 sdb3

I selected md0 as type ext2, mounted at /boot; md1 as type ext3, mounted at /; and md2 as an LVM device, with LVM configured for /home /usr /var /tmp /swap /opt. I installed GRUB on the master boot record. Reboot: OK.

The next day, on booting:

Phoenix Trusted Server 1.03.2895
CPU0 MemClk 200MHz Tcl=3.0 Trc=3 Tras=8 Trp=3
CPU1 MemClk 100MHz Tcl=??? Trc=0 Tras=Trp=0
LTD frequency=1000MHz LTD width=16 bit DOWN-16 bit UP
2048M System RAM passed
2048K Cache SRAM passed
System BIOS shadowed
Video BIOS shadowed
ATAPI CD-ROM: HL-DT-STDVD-RAM GH22LP20
Mouse (PS2) initialized
System configuration data updated
ERROR
Resource conflict - PCI Mass Storage Controller in slot 01
Bus:01, Device:04, Function:00 was unable to run due to memory constraints.
=====

Well, I am no system maintainer (there is nobody here who qualifies for that) and I can imagine a lot of different possibilities, from faulty RAM (that suspicious reading of 200MHz and 100MHz, while the DIMMs are 400MHz) on...
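[For concreteness, the layout the installer built above corresponds roughly to the following commands when done by hand. This is only a sketch: the md/sd device names are from the post, but the volume sizes and the volume group name "vg0" are invented here for illustration, and these commands are destructive.]

```shell
# Sketch of the mirrored layout described in the post. Sizes and the
# volume group name "vg0" are illustrative; these commands destroy data.

# RAID1 mirrors over each sda/sdb partition pair.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3

# ext2 /boot on md0, ext3 root on md1.
mkfs.ext2 /dev/md0
mkfs.ext3 /dev/md1

# LVM on the large mirror, carved into the remaining mounts.
pvcreate /dev/md2
vgcreate vg0 /dev/md2
for lv in home usr var tmp opt; do
    lvcreate -L 8G -n "$lv" vg0      # size is illustrative
done
lvcreate -L 4G -n swap vg0
mkswap /dev/vg0/swap
```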
Thanks for suggestions francesco pietra From hahn at mcmaster.ca Mon Mar 16 20:24:24 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Resource conflict In-Reply-To: References: Message-ID: > of 400MHz Kingston DDR1 ECC, two sata 250GB Maxtor, new CD-ROM and so this is ddr1/pc3200, right? > Phoenix Trusted Server 1.03.2895 I think that bios version is quite out of date: http://www.tyan.com/support_download_bios.aspx?model=S.S2895 > CPU0 MemClk 200MHz Tcl=3.0 Trc=3 Tras=8 Trp=3 200 gets doubled (ddr) to 400 mega-transfers per second, which is pc3200. > CPU1 MemClk 100MHz Tcl=??? Trc=0 Tras=Trp=0 this indicates that the memory isn't detected, I think. I think you have just two dimms installed, which would agree. > Resource conflict - PCI Mass Storage Controller in slot 01 > Bus:01, Device:04, Function:00 was unable to run due to memory constraints. that sounds like the device is requiring its PCI resources mapped to physical addresses occupied by memory. I would guess that there are bios options that would work around this. but I'd update the bios first. > Well, I am no system maintainer (there is nobody here that qualifies > for that) and I can imagine a lot of different possibilities, from > faulty RAM (that suspicious reading 200MHz and 100MHz, while they are > 400MHz) on... nah, the ram sounds fine. the issue is getting it to boot into the flash image, I think. From peter.st.john at gmail.com Wed Mar 18 06:56:26 2009 From: peter.st.john at gmail.com (Peter St. 
John) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine Message-ID:

This article at Wired is about Go playing computers: http://blog.wired.com/wiredscience/2009/03/gobrain.html Includes a pic of a 24 node cluster at Santa Cruz, and a YouTube video of a famous game set to music :-)

My beef, which started with Ken Thompson saying he was disappointed by how little we learned about human cognition from chess computers, is about statements like this:

"People hoped that if we had a strong Go program, it would teach us how our minds work. But that's not the case," said Bob Hearn, a Dartmouth College artificial intelligence programmer. "We just threw brute force at a program we thought required intellect."

And yet the article points out:

[our brain is an]...efficiently configured biological processor - sporting 10^15 neural connections, capable of 10^16 calculations per second

Our brains do brute-force massively distributed computing. We just aren't conscious of most of it.

Peter

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090318/14d97d87/attachment.html

From cap at nsc.liu.se Wed Mar 18 08:55:00 2009 From: cap at nsc.liu.se (Peter Kjellstrom) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: Message-ID: <200903181655.05735.cap@nsc.liu.se>

On Wednesday 18 March 2009, Peter St. John wrote: > This article at Wired is about Go playing computers: > http://blog.wired.com/wiredscience/2009/03/gobrain.html

It should have read: "Humans No Match for Go Bot Overlords with large handicaps" not: "Humans No Match for Go Bot Overlords" That said, incredible advances have been made lately by the "Bot Overlords". /Peter

-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part.
Url : http://www.scyld.com/pipermail/beowulf/attachments/20090318/9ad58fbe/attachment.bin

From james.p.lux at jpl.nasa.gov Wed Mar 18 10:34:25 2009 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: Message-ID:

________________________________ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Peter St. John Sent: Wednesday, March 18, 2009 6:56 AM To: Beowulf Mailing List Subject: [Beowulf] Wired article about Go machine

This article at Wired is about Go playing computers: http://blog.wired.com/wiredscience/2009/03/gobrain.html Includes a pic of a 24 node cluster at Santa Cruz, and a YouTube video of a famous game set to music :-) My beef, which started with Ken Thompson saying he was disappointed by how little we learned about human cognition from chess computers, is about statements like this: "People hoped that if we had a strong Go program, it would teach us how our minds work. But that's not the case," said Bob Hearn, a Dartmouth College artificial intelligence programmer. "We just threw brute force at a program we thought required intellect." And yet the article points out: [our brain is an]...efficiently configured biological processor - sporting 10^15 neural connections, capable of 10^16 calculations per second Our brains do brute-force massively distributed computing. We just aren't conscious of most of it. Peter

--- Of course, those 10 "calculations" per second per neuron are basically logical operations like AND/OR/NOT with a single bit, and fairly high error rates. So it's not quite as impressive as all that. Modern high performance desktop CPUs might have 1E8 transistors, and 4GB of memory accounts for another 4E10. Let's call it 1E11 "active" devices, running at, say, 1E8 operations/second, so we're up to 1E19 "calculations per second", which is 3 orders of magnitude more than the count attributed to the brain.
The brain is pretty fault tolerant, but also can't run at full compute load 24/7.

See, for instance, Crichton, "The Terminal Man", 1972, which introduced the term "watershed week" (not that it actually exists) for when the total information processing capacity of all computers exceeds that of all humans (Crichton gives March 1969, but hey, it's fiction). Clarke has a short story along the same lines, except he's talking about telephone switching systems: "Dial F for Frankenstein". (And Clarke wrote about a potential hazard of high performance computing in "The Nine Billion Names of God".)

For a really, really turgid look at such things, I ran across: Hahn, Torsten, "Risk Communication and Paranoid Hermeneutics: Towards a Distinction Between 'Medical Thrillers' and 'Mind-Control Thrillers' in Narrations on Biocontrol", New Literary History, Volume 36, Number 2, Spring 2005, pp. 187-204.

See... If the members of the list hadn't gone the hard science/engineering/math route, you could have majored in the more liberal arts and wound up writing about things like "paranoid hermeneutics". From the abstract: "Industrial society in its specific modernity is shown as a sociological form of the past, which has already been replaced by what is called risk society. According to this suggestion, the society we live in can no longer be understood by observing politics, for it is marked by different subpolitics operating beyond democratic legitimation."

Jim

From diep at xs4all.nl Wed Mar 18 14:05:49 2009 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: Message-ID: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl>

Ken Thompson, with all respect, forgets to mention something: there is nothing to earn with computer-go. You get what you pay for. Sometimes someone starts a go program first, in order to figure out the above later.
Instantly, work stops and the world's strongest go program no longer gets maintained, let alone improved. A few hobbyists will continue and, as a result, progress very slowly, now on clusters, using the world's most inefficient search that exists in game-tree search. They haven't even discovered yet how to use hashtables efficiently (which reduces the search space exponentially). In short, zero Einsteins in computer-go so far on the search front, whereas the hardware they can get their hands on for computer-go is big, and there is a lot possible in computer-go; forward pruning and selectivity work better there than in chess (to put it politely). Hopefully a Chinese Einstein will turn up one day for computer-go search algorithms. They already found a lot that works for computer-chess. For super-selective search, however, you definitely need a few clever guys.

It is interesting how you try to grab attention for a game where the strongest current commercial go program sold fewer copies than the number of posts on this mailing list in the past 24 hours.

Thanks, Vincent

On Mar 18, 2009, at 2:56 PM, Peter St. John wrote: > This article at Wired is about Go playing computers: http:// > blog.wired.com/wiredscience/2009/03/gobrain.html > Includes a pic of a 24 node cluster at Santa Cruz, and a YouTube > video of a famous game set to music :-) > > My beef, which started with Ken Thompson saying he was disappointed > by how little we learned about human cognition from chess > computers, is about statements like this: > > "People hoped that if we had a strong Go program, it would teach us > how our minds work. But that's not the case," said Bob Hearn, a > Dartmouth College artificial intelligence programmer. "We just > threw brute force at a program we thought required intellect." > > And yet the article points out: > > [our brain is an]...efficiently configured biological processor -
> sporting 10^15 neural connections, capable of 10^16 calculations per > second > > Our brains do brute-force massively distributed computing. We just > aren't conscious of most of it. > > Peter > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf

From xclski at yahoo.com Wed Mar 18 17:47:35 2009 From: xclski at yahoo.com (Ellis Wilson) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine Message-ID: <78341.46286.qm@web37908.mail.mud.yahoo.com>

Peter St. John wrote: > This article at Wired is about Go playing computers: > http://blog.wired.com/wiredscience/2009/03/gobrain.html > Includes a pic of a 24 node cluster at Santa Cruz, and a YouTube video of a > famous game set to music :-) > > My beef, which started with Ken Thompson saying he was disappointed by how > little we learned about human cognition from chess computers, is about > statements like this: > > "People hoped that if we had a strong Go program, it would teach us how our > minds work. But that's not the case," said Bob > Hearn, > a Dartmouth College artificial intelligence programmer. "We just threw brute > force at a program we thought required intellect." > > And yet the article points out: > > [our brain is an]...efficiently configured biological processor - sporting > 10^15 neural connections, capable of 10^16 calculations per second > > Our brains do brute-force massively distributed computing. We just aren't > conscious of most of it. > > Peter

Peter, I would agree with Ken in that it is a disappointing and ultimately fruitless process to attempt to learn about human cognition by building a program to emulate some very specific activity of human beings. This line of thought, in its purest sense, is reductionism.
While I do find artificial intelligence to be very interesting, I believe at some point or another we will have to recognize that the brain (and our subsequent existence) is something more than the result of the perceivable atoms therein. No viewpoint is completely objective as long as we are finite human beings and occupy a place in the world we perceive. To say that all simulation of some portion of our thoughts is fruitless is incorrect, as I think some insight into the mind is possible through codifying thought. However, there exist far too many catch-22s and logical fallacies in using the mind to understand the mind to ever fully understand how it works from a scientific point of view. Philosophy will at some point have to step in to explain the (possibly huge) gaps between even the future's fastest simulated brains and our own. In a book by Thomas Nagel, "The View from Nowhere", I believe he puts it most poignantly by stating, "Eventually, I believe, current attempts to understand the mind by analogy with man-made computers that can perform superbly some of the same external tasks as conscious beings will be recognized as a gigantic waste of time". This was written over twenty years ago.

Science has given us tools to make our lives wonderfully easier and thereby has proven to be useful, but it answers none of the multitude of mind-body dilemmas, validates the reality of our perception, nor will it or any other reductionist theory provide insight into the much more complex areas of cognition. This is especially true with the discovery of quantum mechanics, which makes the observer's subjective perception absolutely necessary. Full objectivity (or in this application full codification of human thought) just isn't possible.

I wish it weren't so, for by study I am a computer scientist and by hobby a philosopher; however, at present I remain skeptical.

Ellis Wilson

From rgb at phy.duke.edu Wed Mar 18 21:52:49 2009 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <78341.46286.qm@web37908.mail.mud.yahoo.com> References: <78341.46286.qm@web37908.mail.mud.yahoo.com> Message-ID: On Wed, 18 Mar 2009, Ellis Wilson wrote: > into the much more complex areas of cognition. This is especially true > with the discovery of quantum mechanics, which makes the observer's > subjective perception absolutely necessary. Full objectivity (or in > this application full codification of human thought) just isn't possible. I disagree with this statement, as it is not an accurate description of quantum theory (a common one, but inaccurate nonetheless) and it does a mild disservice to the theory of cognition as well. Regarding quantum theory: The "observer" in quantum theory is nothing more than a subsystem of a quantum mechanical whole. The correct mathematical treatment of this OPEN system is the Nakajima-Zwanzig generalized master equation, which describes it as a non-Markovian integrodifferential equation with a memory kernel that integrates over prior states and an immediate differential input from the instantaneous environment. It is a description that can be made recursive and further generalized by block diagonalization of multiple subsystems. In a fully relativistic, time reversible quantum description of an entire closed system there is no need for an observer and no lack of determinism in the resulting time evolution. Indeterminate time evolution and the appearance of "wavefunction collapse" is a consequence of entropy and obviously breaks time reversal invariance. The NZGME explicitly traces over the state of the "bath" in its partition of Universe into "system" + "bath" (everything else) and the projection of this classical, stochastic state into the residual quantum state results in all of the oddities. 
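[For orientation, the Nakajima-Zwanzig generalized master equation referred to above has the standard projection-operator form, where P projects onto the relevant subsystem, Q = 1 - P, and L is the Liouvillian; this is the textbook schematic form, quoted here for reference rather than derived in this discussion:]

```latex
\frac{\partial}{\partial t}\, P\rho(t)
  = P\mathcal{L}P\,\rho(t)
  + P\mathcal{L}\, e^{Q\mathcal{L}t}\, Q\rho(0)
  + \int_0^{t} ds\; P\mathcal{L}\, e^{Q\mathcal{L}s}\, Q\mathcal{L}P\,\rho(t-s)
```

[The integral is the non-Markovian memory kernel integrating over prior states; the middle term carries the dependence on the initial state of the bath.]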
To put it still another way, Schrodinger's cat is always definitely alive until it is definitely dead because it is impossible to adiabatically disconnect the inside of the infernal device from the outside, to untangle the entangled quantum state of everything from some particular part of it. Every particle inside the box is always interacting with every particle outside the box, and so quantum "collapse" is either ongoing or presumptive of a degree of separation that is not possible in our spacetime. > I wish it weren't so, for by study I am a computer scientist and by > hobby philosopher, however, at present I remain skeptical. I'm rather hopeful, myself. I think that some real progress has been made and think that we are five, at most ten, years away from "real AI" -- true machine intelligence and self-aware systems. Not just programmed simulations of intelligence or decision trees -- the real thing. I think that the computational problem is well within the reach of modern beowulfs; the hard part is the formulation of the awareness kernel and having just the right insights. Even there there appears to be progress. Only in the last decade has it finally been recognized that awareness is a DYNAMIC process, not a static one, in certain crucial ways (see e.g. Tim van Gelder's work). This is what I hope to work on next, aside from random numbers. rgb > > Ellis Wilson > > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From herborn at usna.edu Thu Mar 19 07:13:08 2009 From: herborn at usna.edu (Steve Herborn) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl> References: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl> Message-ID: While I don't intend to wax as philosophical as some in this thread, I think many are forgetting one important aspect. The one thing that mankind has done for a very long time is use tools to solve problems, i.e. how do I kill that Mastodon better than with my bare hands? etc. A computer playing Go against a human is just really playing against the collective knowledge & abilities of humans to that point in time. It was humans who built the computer and wrote the software. If Computer-Go is improved, its improvements will come from the hands & minds of man and something is gained: our overall collective body-of-knowledge is increased. It may not be used to re-solve that particular problem, but perhaps other problems in the future. Also, if you get what you pay for -- exactly what do you get when you use Open-source software? Steven A. Herborn U.S. Naval Academy Advanced Research Computing 410-293-6480 (Desk) 757-418-0505 (Cell) -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Vincent Diepeveen Sent: Wednesday, March 18, 2009 5:06 PM To: Peter St. John Cc: Beowulf Mailing List Subject: Re: [Beowulf] Wired article about Go machine Ken Thompson, with all respect, forgets to mention something: that there is nothing to earn with computer-go. You get what you pay for. Sometimes someone starts a go program first, in order to figure out the above later. Instantly work stops then, and the world's strongest go program no longer gets maintained, let alone improved.
A few hobbyists will continue and progress very slowly as a result, now at clusters, using the world's most inefficient search that exists in game tree search. They haven't even discovered yet how to use hashtables efficiently (which reduce the search space exponentially). In short, zero Einsteins in computer-go so far on the search front, whereas the hardware they can get their hands on for computer-go is big, and there is a lot possible in computer-go; forward pruning and selectivity work better there than in chess (to put it politely). Hopefully a Chinese Einstein one day for computer-go search algorithms. They already found a lot that works for computer-chess. For super selective search however you definitely need a few clever guys. It is interesting how you try to grab attention for a game where the strongest current commercial go-program sold fewer copies than there were posts on this mailing list in the past 24 hours. Thanks, Vincent On Mar 18, 2009, at 2:56 PM, Peter St. John wrote: > This article at Wired is about Go playing computers: http:// > blog.wired.com/wiredscience/2009/03/gobrain.html > Includes a pic of a 24 node cluster at Santa Cruz, and a YouTube video > of a famous game set to music :-) > > My beef, which started with Ken Thompson saying he was disappointed by > how little we learned about human cognition from chess computers, is > about statements like this: > > "People hoped that if we had a strong Go program, it would teach us > how our minds work. But that's not the case," said Bob Hearn, a > Dartmouth College artificial intelligence programmer. "We just threw > brute force at a program we thought required intellect." > > And yet the article points out: > > [our brain is an]...efficiently configured biological processor - > sporting 10^15 neural connections, capable of 10^16 calculations per > second > > Our brains do brute-force massively distributed computing. We just > aren't conscious of most of it.
> > Peter > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org To change your subscription > (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Thu Mar 19 09:34:39 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] running hot? Message-ID: are you running your machinerooms warm to save power on cooling? if so, could you pass on anything you learned about, for instance, whether this degrades the lifespan of components? I guess the first issue is how to ensure that any thermostatic fans in the nodes don't freak out. can this be done using ACPI? I suppose it depends on what controls the bios exposes (our HP DL's don't appear to offer much control.) I'm also very interested to know whether you have hard data on how much power/money this saves. anyone omitting chillers entirely and using filtered outside air? thanks, mark hahn. From eugen at leitl.org Thu Mar 19 10:36:08 2009 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] running hot? In-Reply-To: References: Message-ID: <20090319173608.GD11917@leitl.org> On Thu, Mar 19, 2009 at 12:34:39PM -0400, Mark Hahn wrote: > are you running your machinerooms warm to save power on cooling? Speaking about warm: http://www.datacenterknowledge.com/archives/2009/03/19/rackable-cloudrack-turns-up-the-heat/ Rackable CloudRack Turns Up The Heat March 19th, 2009 : Rich Miller The server trays from the CloudRack C2 have no on-board fans or power supplies. Are you ready for the 100-degree data center?
Rackable Systems has introduced a new version of its CloudRack enclosure that it says can operate in environments as hot as 104 degrees, offering customers the option of saving energy costs by raising the temperature in their data center. The new CloudRack C2 is Rackable's latest effort to combine higher density and lower power usage by shifting components out of the server tray and into the enclosure. The C2 introduces cabinet-level power distribution technology, using rectifiers to convert AC power to 12V DC power. This innovation, combined with the cabinet-level fans introduced in the initial CloudRack, means that the server trays contain no fans or power supplies. Rackable says the CloudRack fans and rectifiers equate to an N+1 redundancy. Rackable says the design innovations will allow data center operators to safely run server-packed CloudRacks at temperatures up to 40 degrees C, or 104 degrees Fahrenheit. Most data centers operate in a temperature range between 68 and 74 degrees, and some are as cold as 55 degrees. "The CloudRack C2 is a landmark achievement," said Mark Barrenechea, president and CEO of Rackable Systems (RACK). "Most notably, it solves the problem of stranded power. Data centers can now also reduce power consumption by simply turning up the thermostat while using CloudRack C2. It is the most energy-efficient and thermally-intelligent cabinet technology Rackable has ever offered." The first CloudRack design introduced last fall featured two to four large fans in the rear of the enclosure. The C2 goes with a denser configuration of 18 smaller fans in the rear of the 23U half-rack, with 42 fans cooling the 46U full rack. Rackable says this can support up to 1,280 cores per cabinet using the company's MicroSlice servers. Raising the baseline temperature inside the data center - known as a set point - can save money spent on air conditioning.
Data center managers can save 4 percent in energy costs for every degree of upward change in the set point. Google and Intel have encouraged data center engineers to consider raising their set point as a way to improve energy efficiency, while HP and Sun Microsystems have made higher temperatures a focus of their data center efficiency services. In January the American Society for Heating, Refrigerating and Air-conditioning Engineers (ASHRAE) expanded its recommendations for ambient data center temperatures, raising its recommended upper limit from 77 degrees to 80.6 degrees. Some data center managers warn that running equipment near the high end of the manufacturers' suggested range for equipment could void warranties with equipment vendors. Another major concern is what happens in the event of a cooling failure, when a lower set point could buy a few additional minutes of recovery time before the room heat reaches unacceptable levels. Running your data center warmer also raises the potential for "hot spots" to form in areas where cooling airflow doesn't reach an entire rack. That's why it's a good idea to implement advanced monitoring of rack temperatures and data center airflow before nudging the set point higher. But the focus on temperature and energy efficiency is unlikely to abate. "Energy has become a central design point for the data center," said Jed Scaramella, senior research analyst, Datacenters, IDC. "The density, power and thermal efficiencies Rackable achieves with CloudRack C2 enable customers to drive meaningful performance gains, while at the same time helping to reduce overall data center operating expenses."
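[The 4-percent rule of thumb above is easy to turn into a back-of-the-envelope estimate. A small sketch; note the compounding-per-degree model is an assumption of this sketch, since the article states only the per-degree figure:]

```python
def cooling_savings(delta_degrees_f, savings_per_degree=0.04):
    """Fraction of cooling energy saved by raising the set point
    delta_degrees_f degrees, compounding the 4%-per-degree figure."""
    return 1.0 - (1.0 - savings_per_degree) ** delta_degrees_f

# Raising a 72F room to ASHRAE's new 80.6F recommended upper limit:
print(round(cooling_savings(80.6 - 72), 2))  # roughly 0.3, i.e. ~30%
```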
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From mathog at caltech.edu Thu Mar 19 12:13:28 2009 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Re:running hot? Message-ID: Mark Hahn wrote: > are you running your machinerooms warm to save power on cooling? How much would that really save? Is there a study somewhere demonstrating substantial power savings? Whatever the steady-state temperature in the room, the AC still has to pump out heat at the same rate it is generated. Raising the room temperature could affect heat exchange slightly because of a steeper/shallower T gradient across the walls/floor/ceiling. For instance, increasing RT would let a little more power drain out through the walls instead of the AC, assuming it is cooler outside than inside. Does a PC doing the same work use more or less power at 70 or 80 degrees? Off the top of my head I wouldn't expect a huge change in either number for a 10 degree RT change. Maybe it makes a big difference if the machine room is very poorly insulated. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From bill at cse.ucdavis.edu Thu Mar 19 12:39:26 2009 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Re:running hot? In-Reply-To: References: Message-ID: <49C29F6E.3020305@cse.ucdavis.edu> David Mathog wrote: > Mark Hahn wrote: > >> are you running your machinerooms warm to save power on cooling? > > How much would that really save? Is there a study somewhere > demonstrating substantial power savings?
There's a data center going in in the Bay Area, and with a few concessions from vendors they were able to raise the standard temperature enough that they expect to run their cooling only approximately 100 hours a year. > Whatever the steady state temperature in the room the AC still has to > pump out heat at the same rate it is generated. Exactly right. > Raising the room > temperature could affect heat exchange slightly because of a > steeper/shallower T gradient across the walls/floor/ceiling. For > instance, increasing RT would let a little more power drain out through > the walls instead of the AC, assuming it is cooler outside than inside. > > Does a PC doing the same work uses more or less power at 70 or 80 degrees? Don't forget to factor in the ambient temp. Say it's 70F, it's much easier to extract useful cooling from 70F air if your internal temperature is 100F instead of 80F. > Off the top of my head I wouldn't expect a huge change in either number > for a 10 degree RT change. Maybe it makes a big difference if the > machine room is very poorly insulated. Sounds like you are assuming a mostly closed system, it's not. From hahn at mcmaster.ca Thu Mar 19 12:51:04 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Re:running hot? In-Reply-To: References: Message-ID: >> are you running your machinerooms warm to save power on cooling? > > How much would that really save? Is there a study somewhere > demonstrating substantial power savings? yes, that's unclear to me as well. I've heard people in the machineroom infrastructure biz claim that 25% of your power goes to cooling. that seems like a lot to me. we have a number of 30T Liebert chillers: 3-phase 575V, total required fuse is 80A. but we don't run the humidifier (11.6A) and rarely the reheat (30.1A). fan is 11A (10 HP) and each compressor is 20.5A. these numbers are all taken from the electrical sticker inside the unit, and must represent peak.
I had an electrician measure currents on the feed, and IIRC, the number was cycling between 24 and 40A depending on whether the unit was at 50% or 100% cooling. (that makes some sense: 8A fan, 16A/compressor.) I don't know offhand how to convert 40A 3phase 575V into KW power, though - is there a sqrt(3) in there? 30T extracts 105.5KW, though. > Whatever the steady state temperature in the room the AC still has to > pump out heat at the same rate it is generated. Raising the room > temperature could affect heat exchange slightly because of a > steeper/shallower T gradient across the walls/floor/ceiling. For sure - I'm assuming no heat flux through walls/floor/etc. but if the setpoint is close to outside, won't the HVAC do less work and consume less power? the ultimate temptation is to dispense with HVAC entirely and use filtered outside air. I guess there are two components: the heat-pump efficiency and the delta-t (setpoint vs outside). I agree that the former won't care about fiddling the setpoint; power dissipated that way may dominate. From james.p.lux at jpl.nasa.gov Thu Mar 19 14:17:23 2009 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Re:running hot? In-Reply-To: Message-ID: On 3/19/09 12:13 PM, "David Mathog" wrote: > Mark Hahn wrote: > >> are you running your machinerooms warm to save power on cooling? > > How much would that really save? Is there a study somewhere > demonstrating substantial power savings? > > Whatever the steady state temperature in the room the AC still has to > pump out heat at the same rate it is generated. Raising the room > temperature could affect heat exchange slightly because of a > steeper/shallower T gradient across the walls/floor/ceiling. For > instance, increasing RT would let a little more power drain out through > the walls instead of the AC, assuming it is cooler outside than inside. The efficiency of a heat pump changes a lot as the differential temperature changes. 
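[On the three-phase question: yes, there is a sqrt(3). For a balanced three-phase load, real power is P = sqrt(3) x V(line-to-line) x I x PF. A quick sketch; unity power factor is assumed here, which a chiller's motor loads won't actually achieve, so this is an upper bound on real power:]

```python
from math import sqrt

def three_phase_kw(v_line_to_line, amps, power_factor=1.0):
    """Real power of a balanced three-phase load in kW:
    P = sqrt(3) * V_LL * I * PF."""
    return sqrt(3) * v_line_to_line * amps * power_factor / 1000.0

# The chiller at 100% cooling (40 A at 575 V, PF assumed to be 1.0):
print(round(three_phase_kw(575, 40), 1))  # about 39.8 kW
```

At that draw the unit would consume very roughly 40 kW to extract 105.5 kW of heat, i.e. an effective COP somewhere below 2.7 once power factor is accounted for.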
COP = Qc/Work (Qc total heat pulled out of cold side, W work done to do it, Qh = Qc+W, of course) Carnot says COP<= Tc/(Th-Tc), so the closer Th and Tc are, the higher the "best" COP can be This is true in practice as well as theory. There's a certain "base" power consumption and a part that's dependent on the delta T. EER is the BTU/Hr per Watt assuming the condenser is at 95F (lots of good British units there..). EER/3.413 = COP. The SEER is calculated using a seasonal pattern of condenser temps.. And will always be higher than the EER. A typical EER might be 10-12, corresponding to a COP of around 3. That is, to remove a 100kW heat load from your machine room will take 30kW. Home AC is usually in the SEER around 13 or higher range (13 is the lowest you can sell today, in the US), but industrial systems can do much better (but probably not 2 or 3 times better): Variable compressor speeds and variable fan speeds are two ways to get there. > Does a PC doing the same work uses more or less power at 70 or 80 degrees? Probably not a big difference. The resistance of the wiring and components goes up as the temperature goes up, but it's a small effect (ppm sort of scale) and there's enough other things happening that it would be hard to predict. > > Off the top of my head I wouldn't expect a huge change in either number > for a 10 degree RT change. Maybe it makes a big difference if the > machine room is very poorly insulated. It could make a pretty large difference, especially if your room temp is close to outside temp, because the denominator of that COP fraction is small, so COP changes fast. Going from a delta T of 10 to 5 could double the COP, halving the energy required to pump the same amount of heat. Jim From dnlombar at ichips.intel.com Thu Mar 19 16:08:20 2009 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] running hot? 
In-Reply-To: References: Message-ID: <20090319230820.GA8509@nlxdcldnl2.cl.intel.com> On Thu, Mar 19, 2009 at 09:34:39AM -0700, Mark Hahn wrote: > are you running your machinerooms warm to save power on cooling? > Here's a relevant paper at IEEE: -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From diep at xs4all.nl Thu Mar 19 18:30:24 2009 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <78341.46286.qm@web37908.mail.mud.yahoo.com> References: <78341.46286.qm@web37908.mail.mud.yahoo.com> Message-ID: On Mar 19, 2009, at 1:47 AM, Ellis Wilson wrote: > > Peter St. John wrote: >> This article at Wired is about Go playing computers: >> http://blog.wired.com/wiredscience/2009/03/gobrain.html >> Includes a pic of a 24 node cluster at Santa Cruz, and a YouTube >> video of a >> famous game set to music :-) >> >> My beef, which started with Ken Thompson saying he was >> disappointed by how >> little we learned about human cognition from chess computers, is >> about >> statements like this: >> >> "People hoped that if we had a strong Go program, it would teach >> us how our >> minds work. But that's not the case," said Bob >> Hearn, >> a Dartmouth College artificial intelligence programmer. "We just >> threw brute >> force at a program we thought required intellect." >> >> And yet the article points out: >> >> [our brain is an]...efficiently configured biological processor ? >> sporting >> 1015 neural connections, capable of 1016 calculations per second >> >> Our brains do brute-force massively distributed computing. We just >> aren't >> conscious of most of it. >> >> Peter > > Peter, > > I would agree with Ken in that it is a disappointing and ultimately > fruitless process to attempt to learn about human cognition by > building > a program to emulate some very specific activity of human beings. 
> This > line of thought, in its purest sense, is reductionism. While I do > find > artificial intelligence to be very interesting, I believe at some > point > or another we will have to recognize that the brain (and our > subsequent > existence) is something more than the result of the perceivable atoms > therein. No viewpoint is completely objective as long as we are > finite > human beings and occupy a place in the world we perceive. > Well, to avoid having too much of a discussion on artificial intelligence on a beowulf mailing list, as subjects in this range seem to pop up a lot, being one of the world's leading experts here, let me drop my 2 cents. First of all, it is my impression that too much gets written about Artificial Intelligence by people who just base themselves upon literature and especially their wildest dreams. A big problem of 99.99% of all publications being just wet dreams from professors and researchers and PhD theses is that the field hardly progresses in the public sector. This is what you typically see. If someone writes about how to approach the human brain in software, without EVER having written 1 line of code, it is of course wishful thinking that the field EVER progresses. Seriously, 99.99% of all research, sometimes bigtime funded, is like that. Some of them, usually students, go a step further already and write software. Yet again they usually are stubborn as hell and are just redoing each other's experiments in a zillion different incarnations. With respect to 'learning' in chess, like 99.99% of all attempts, you can by means of deduction prove that they basically optimize a simple piece-square table. Really, it is that bad. There are really 100k+ research attempts, and nearly all of them are in fact a very INEFFICIENT form of parameter optimization. This is why the field gets dominated by low-level coders and, sometimes, especially in the past few years, mathematically creative types.
They manage to write something using KISS principles, but most of them have commercial goals and achievement goals in mind; very few of them have pure scientific goals in mind. This is for a simple reason: just like physicists, it takes like 10 years to get some expertise in the field, even for the best, and by the time they understand what happens there, they already have a job or work for some sort of organisation, and publication of papers is not their intention. Then what is there? Well, basically the research done, the vast majority with respect to how the brain works, has been done in big secrecy. They do progress from the biological viewpoint: "how does the brain work". Though the majority doesn't say a word there of course, in the few "public words" i exchange with researchers who examine brains (quite literally, using MRI-type scanners and such stuff), they definitely claim a lot of progress. What gets said there makes sense. A much larger field is of course the military-type experiments. Now i directly want to mention that i find these experiments disgusting, revolting and in the vast majority of cases totally unnecessary. The experiments involve monkeys. They all die. Experiments with brains are the most disgusting form of experiments on animals. I was in Ramat-Gan a few years ago. In the Bar-Ilan research center. Greenpeace had some demonstrations there for a while, but after some time they left. They should not have... What the many military organisations do for these types of experiments, probably even on humans there, well, you can safely assume, gets done. Sometimes you see a Discovery Channel episode from the 50s. If they did do it back then, they sure do now. Brain manipulation attempts, i'd call them. That's quite different from what i and many others try to do in software. That's APPROACHING the human brain.
Now from the 70s and 80s of course there is already some primitive software there which just gets expanded for the medical world: giving a diagnosis based upon guidelines programmed in the database. Yet that is quite a collection of small knowledge rules. The big difference with a chess evaluation function (not to be confused with the search part, which makes moves in an algorithmic and nowadays massively parallel manner) is that the evaluation function is one big complex whole. In my case i really wrote the world's largest and most complex evaluation function for the game of chess. As the years progress you get more insight into what is important there. You then observe quite interesting phenomena for specific things, such as that certain forms of knowledge need to get corrected with quite complex functions. In complex manners an indexation takes place to get information patterns from a position. Using these patterns, with n-th order logic you then create new "overview" functions, just like a human is doing it. That is highly effective. Yet there is another aspect where mankind seems to be really superior. It is exactly what Jaap van den Herik has been claiming for years already: the relative weight you give to patterns. Being a titled chessplayer, of course i "estimate" myself first what it probably is for my program. You toy some and draw a conclusion. These parameter optimizations are not linear programming optimizations. They are quite complex. In fact no good algorithms are there yet to get it done, as the number of parameters is too large (far over 10000). You can't independently pick a value for a parameter and/or pin it down to a value, and then try to tune the other parameters. It's getting done that way, but it never resulted in a really objective, great parameter tuning. A lot of progress gets made here. Though of course the pure tuning in itself is less relevant than the bugfixing which is a result of it, scientifically seen a lot of experiments are possible there.
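[One generic way around the can't-tune-parameters-independently problem is simultaneous perturbation (SPSA), which estimates a descent direction from just two evaluations no matter how many parameters interact. This is only an illustrative sketch of that family of methods, not Vincent's actual tuner, and the objective below is a toy stand-in for "engine strength as a function of its parameters":]

```python
import random

random.seed(0)  # reproducible toy run

def spsa_step(params, objective, a=0.01, c=0.1):
    """One SPSA iteration: a gradient estimate from only TWO objective
    evaluations, regardless of how many parameters there are."""
    delta = [random.choice((-1.0, 1.0)) for _ in params]
    f_plus = objective([p + c * d for p, d in zip(params, delta)])
    f_minus = objective([p - c * d for p, d in zip(params, delta)])
    g = (f_plus - f_minus) / (2.0 * c)
    # per-parameter gradient estimate is g/delta_i; take a descent step
    return [p - a * g / d for p, d in zip(params, delta)]

# Toy objective: minimize a sum of squares over 50 parameters at once.
objective = lambda v: sum(x * x for x in v)
params = [random.uniform(-1, 1) for _ in range(50)]
before = objective(params)
for _ in range(300):
    params = spsa_step(params, objective)
after = objective(params)
print(after < before)  # progress without probing parameters one by one
```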
I'm busy carrying out a plan there which will probably take years. The actual optimization i intend to run on some tens of thousands of cores altogether. I found some manners to increase tuning speed. Note this is a totally scientific project; there is no direct 'benefit' from it in terms of the Elo rating of the chess engine. So where scientifically it may look great, most computerchess programmers will ignore it. For the parameter tuning world it is very relevant however, as only in the past few years has some progress been made here. Yet the combination of the 2 things (a lot of chess domain knowledge and accurate tuning) makes the chess engines a lot more human. Especially the tuning, as the majority really doesn't have a very big evaluation function. Yet one shouldn't forget that sometimes very big lookup tables can replace a lot of knowledge rules. For today's processors those tables are faster, though, than evaluating all that code. You can, in short, precalculate them. A lookup from L1 usually, and sometimes L2, is on average at most a few cycles; in most cases even the full latency gets prefetched and hidden fully automatically by the L1. That's great about today's processors. So it looks bloody fast; the theory behind it is not so simple though. As you might see, just that seemingly simple lookup table inside the code that is quickly searching millions of chess positions a second in reality replaces a lot of code. Of course combined with the deep searches they do nowadays, it plays really strong. But that was the case at the end of the 90s also. Yet there is still a lot possible there. The future will tell. In today's computerchess, when not cut'n pasting existing codes, making your 'own' chess engine, it is really complicated to reach the top soon, as it takes 10 years to build a strong engine if you didn't make one before. So very few start that challenge nowadays, which of course hurts the field bigtime now.
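[The precalculated-table idea is easy to illustrate with something much smaller than a real evaluation term; the popcount table below is a generic example of the technique, not code from any particular engine. A function over a small domain is tabulated once at startup, and the per-node work collapses to array indexing that the cache hides:]

```python
# Built once at startup: bit counts for every 16-bit value (64K entries,
# small enough to sit largely in cache during search).
POPCOUNT16 = [bin(i).count("1") for i in range(1 << 16)]

def popcount64(bb):
    """Set bits in a 64-bit bitboard via four table lookups,
    instead of a 64-iteration loop at every node."""
    return (POPCOUNT16[bb & 0xFFFF]
            + POPCOUNT16[(bb >> 16) & 0xFFFF]
            + POPCOUNT16[(bb >> 32) & 0xFFFF]
            + POPCOUNT16[(bb >> 48) & 0xFFFF])

print(popcount64(0xFF00000000000001))  # 9 bits set
```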
In the 80s and 90s it was still possible, without being algorithmically strong, to have some sort of impact in the field by being a good low-level programmer (say in assembler). That's simply impossible nowadays. In the long run i intend to publish something there, but i'm not 100% sure yet. A lot have tried climbing the Olympus and very few managed to reach the top. Let's put it this way: i see possibilities, which i discussed with others already (and they find it a great idea), to do parameter tuning in a generic human manner, yet all these extreme experimental attempts are only possible thanks to today's huge crunching power. Don't forget that. In the 90s most attempts typically had so little calculation power that the number of experiments done was already so insignificant that even the world's most brilliant tuning algorithm would have failed, as statistical sureness was never used to be sure that a new tuning of the parameters would work. Most "automatic learning" attempts trying to approach the human manner of learning, like TD learning (temporal difference learning), already undertake action after 1 single game. That's of course far too few data points. Then the question is how human something is, if it basically randomly flips a few parameters. That is IMHO a very weak manner of tuning which, expressed in big O, is exponential in the number of parameters; if you're feeling so lucky, that is, to win the lotto anyway one day in your life. Yet the bottom line is that today's huge number crunching solutions, such as gpgpu, with all their many cores, give huge possibilities here to really do a lot of experiments. In such a case it is worth trying to tune thousands of parameters, which some years ago was just totally impossible. There was just too little crunching power for artificial intelligence to let clever algorithms have an impact.
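[The "statistical sureness" point can be made quantitative. A rough normal-approximation sketch under the standard logistic Elo model (draws ignored, which makes these counts optimistic) shows why one game, or even a hundred, resolves nothing at small Elo differences:]

```python
def expected_score(elo_diff):
    """Standard logistic Elo model: expected score of the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Games until the observed score differs from 50% by z standard
    errors (binomial SE ~ 0.5/sqrt(n) near an even match)."""
    edge = expected_score(elo_diff) - 0.5
    return (z * 0.5 / edge) ** 2

print(round(games_needed(100)))  # tens of games for a 100-Elo gap
print(round(games_needed(10)))   # thousands of games for a 10-Elo gap
```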
To quote someone, Frans Morsch (won the 1995 world championship, beating for example Deep Blue): "One simply doesn't have the system time for sophisticated algorithmic tricks in the last few plies, when each node is far under 1000 clock cycles of the cpu". So where it is true that most money incentives have gone away out of computerchess, the clever guys are still there and/or help others brainstorm with ideas on what to try and how to progress, and the increase in computing power gives rise to new algorithmic and tuning possibilities to approach the human mind in manners that are real simple. Because after all, let's be realistic. If computers can't perfectly play the game of chess using sophisticated chess knowledge (so i do not mean having a database with 10^43 possibilities), how is an artificially intelligent program EVER going to drive a car from your home to a destination in a safe manner? In that, computerchess will be a major contributor, simply because what happens there is public. Maybe researchers are not very talkative, but you CAN speak to them if you want to and you DO see the progress they make. Vincent > To say that all simulation of some portion of our thoughts is > fruitless > is incorrect, as I think some insight into the mind is possible > through > codifying thought. However, there exist far too many catch-22's and > logical fallacies in using the mind to understand the mind to ever > fully > understand how it works from a scientific point of view. Philosophy > will at some point have to step in to explain the (possibly huge) gaps > between even the future's fastest simulated brains and our own. > > In a book by Thomas Nagel, "The View from Nowhere" I believe he > puts it > most poignantly by stating, "Eventually, I believe, current > attempts to > understand the mind by analogy with man-made computers that can > perform > superbly some of the same external tasks as conscious beings will be > recognized as a gigantic waste of time".
This was written over twenty > years ago. Science has given us tools to make our lives wonderfully > easier and thereby has proven to be useful, but it answers none of the > multitude of mind-body dilemmas, validates the reality of our > perception, nor will it or any other reductionist theory provide > insight > into the much more complex areas of cognition. This is especially > true > with the discovery of quantum mechanics, which makes the observer's > subjective perception absolutely necessary. Full objectivity (or in > this application full codification of human thought) just isn't > possible. > > I wish it weren't so, for by study I am a computer scientist and by > hobby philosopher, however, at present I remain skeptical. > > Ellis Wilson > > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From marcelosoaressouza at gmail.com Fri Mar 20 04:51:14 2009 From: marcelosoaressouza at gmail.com (Marcelo Souza) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] MPICH2 1.0.8 x86_64 (amd64) Package for Debian 5.0 (Lenny) Message-ID: <12c9ca330903200451i56f8281apc50aa94d2d53e100@mail.gmail.com> http://www.cebacad.net/files/mpich2_1.0.8_amd64.deb http://www.cebacad.net/files/mpich2_1.0.8_amd64.deb.md5 -- Marcelo Soares Souza http://marcelo.cebacad.net From marcelosoaressouza at gmail.com Fri Mar 20 04:53:14 2009 From: marcelosoaressouza at gmail.com (Marcelo Souza) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] OpenMPI 1.3.1 x86_64 (amd64) Package for Debian 5.0 (Lenny) Message-ID: <12c9ca330903200453t658a47abya494e2a87b60aa01@mail.gmail.com> http://www.cebacad.net/files/openmpi_1.3.1_amd64.deb http://www.cebacad.net/files/openmpi_1.3.1_amd64.deb.md5 -- Marcelo Soares Souza http://marcelo.cebacad.net From dnlombar at ichips.intel.com Fri Mar 20 06:36:39 2009 From: dnlombar at ichips.intel.com 
(David N. Lombard) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] running hot? In-Reply-To: <20090319230820.GA8509@nlxdcldnl2.cl.intel.com> References: <20090319230820.GA8509@nlxdcldnl2.cl.intel.com> Message-ID: <20090320133639.GB8509@nlxdcldnl2.cl.intel.com> On Thu, Mar 19, 2009 at 04:08:20PM -0700, David N. Lombard wrote: > On Thu, Mar 19, 2009 at 09:34:39AM -0700, Mark Hahn wrote: > > are you running your machinerooms warm to save power on cooling? > > > Here's a relevant paper at IEEE: > Also this: -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From deadline at eadline.org Mon Mar 23 11:58:27 2009 From: deadline at eadline.org (Douglas Eadline) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl> Message-ID: <40680.192.168.1.213.1237834707.squirrel@mail.eadline.org> > > Also, if you get what you pay for -- exactly what do you get when you use > Open-source software? > Interesting question. How do you define "pay" ? -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From chiendarret at gmail.com Thu Mar 19 02:22:28 2009 From: chiendarret at gmail.com (Francesco Pietra) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] ssh connection passwordless Message-ID: Hi: I have a computing machine and a desktop ssh passwordless interconnected through a Zyxel router (which is DHCP on the Internet). I have now added a second computing machine. I am unable to get all three machines passwordless interconnected at the same time. Only two at a time. If I want to have the third computer passwordless connected to one of the other two, I have to exchange id_rsa.pub between the two again. Mistake or intrinsic feature of ssh? 
What I did: (1)generating the keys with "ssh-keygen -t rsa" (2) getting "reserved" the machines on the router (3)scp id_rsa.pub to the "authorized_keys" It is also mandatory that asking the "date" to the other computer (slogin ... date), the date is given without asking the password. That is an issue of a computational code that for its internal parallelization needs that (I have not investigated why). thanks francesco From reuti at staff.uni-marburg.de Tue Mar 24 03:29:40 2009 From: reuti at staff.uni-marburg.de (Reuti) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] ssh connection passwordless In-Reply-To: References: Message-ID: Hi, Am 19.03.2009 um 10:22 schrieb Francesco Pietra: > I have a computing machine and a desktop ssh passwordless > interconnected through a Zyxel router (which is dhpc on Internet). I > have now added a second computing machine. I am unable to get all > three machines passwordless interconnected at the same time. Just only > two. If I want to have the third computer passwordless connected to > one of the other two, I have to exchange id_rsa.pub between the two > again. Mistake or intrinsic feature of ssh? > > What I did: > > (1)generating the keys with "ssh-keygen -t rsa" > > (2) getting "reserved" the machines on the router > > (3)scp id_rsa.pub to the "authorized_keys" - you can have more than one line in the authorized keys file, hence put there the id_rsa.pub from all other nodes in addition. 
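Reuti's first point above (one public key per line in authorized_keys, appended rather than copied over) can be sketched with placeholder key material; the hostnames and key strings below are made up for illustration only:

```shell
# Illustrative only: ~/.ssh/authorized_keys holds one public key per
# line, so every machine that should log in gets its own line APPENDED.
# The key strings and comments here are placeholders, not real keys.
demo=$(mktemp -d)
printf 'ssh-rsa AAAAB3...keyA francesco@desktop\n'  > "$demo/id_rsa_desktop.pub"
printf 'ssh-rsa AAAAB3...keyB francesco@compute1\n' > "$demo/id_rsa_compute1.pub"
# Append (>>), never copy straight over the file: an scp of id_rsa.pub
# onto authorized_keys replaces it and the previously trusted host is lost.
cat "$demo"/id_rsa_*.pub >> "$demo/authorized_keys"
chmod 600 "$demo/authorized_keys"
grep -c '^ssh-rsa' "$demo/authorized_keys"   # prints 2: one line per trusted key
```

This is exactly why adding the third machine appeared to "undo" the earlier pair: step (3)'s scp overwrote the file instead of adding a line to it.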
- when you need this only for interactive work, you can have a local ssh-agent running on your desktop and put in ~/.ssh/config on both nodes these two lines: Host * ForwardAgent yes A good explanation you can find here: http://unixwiz.net/techtips/ssh-agent-forwarding.html - another option might be to set up /etc/ssh/ssh_known_hosts on your two compute nodes to include, per line, the short hostname, the FQDN and the TCP/IP address besides the other host's ssh key (not the user's one), as this would avoid any host-key confirmation prompt or adding of the machines to your personal ~/.ssh/known_hosts file. This won't work with your workstation of course, as its TCP/IP address varies. -- Reuti > It is also mandatory that asking the "date" to the other computer > (slogin ... date), the date is given without asking the password. That > is an issue of a computational code that for its internal > parallelization needs that (I have not investigated why). > > thanks > > francesco > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Tue Mar 24 04:41:30 2009 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <40680.192.168.1.213.1237834707.squirrel@mail.eadline.org> References: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl> <40680.192.168.1.213.1237834707.squirrel@mail.eadline.org> Message-ID: You pay a fulltime sysadmin to solve your problems in that case :) pay as in 'salary pay'. Though I'm very positive about for example Sun's OpenOffice, and open source in general, it's quite clumsy to use in practice for simple things like printing name labels to stick on envelopes ('etiketten' we call 'em). 
If experienced IT guys don't manage within 1 day to get something like that done with it, for sure office personnel with less experience there will fail. On top of that, the documentation totally fails there. Now I won't bother you with the fact that I have an Apple MacBook Pro laptop with OpenOffice for it, and that despite hours of googling, it just doesn't work. Good old Win2000 + an old Word version had to solve it. In short, open source can work only if you have experienced Linux guys who set up whatever you need on it, and if the functionality you need is sufficient and documented. This usually is the case for the top-1000 companies. The Netherlands has about 1021 (roughly) companies of 1000+ personnel, not to mention governments. For these, open source is a possibility. Not for the majority of users and companies. Clusters and Beowulf-type systems are definitely the exception here; for them, modifying the kernel and a security that only allows intelligence agencies to enter and no one else, is important. On Mar 23, 2009, at 7:58 PM, Douglas Eadline wrote: > >> >> Also, if you get what you pay for -- exactly what do you get when >> you use >> Open-source software? >> > > Interesting question. How do you define "pay" ? > > > > -- > Doug > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. 
> > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From gerry.creager at tamu.edu Tue Mar 24 07:32:19 2009 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl> <40680.192.168.1.213.1237834707.squirrel@mail.eadline.org> Message-ID: <49C8EEF3.2000206@tamu.edu> I'm not even sure why I'm entering into this... Vincent, I use OpenOffice on a daily basis, interact with Windows users w/ Word, and have no problems. I do considerably more than printing labels, too. We trade documents and spreadsheets back and forth, in support of my projects. The only application I've seen trouble with was a document created using Office (not OpenOffice) for the Mac, by a user who sent the result out in RTF. I'm not sure what he did but I couldn't process it in MS Office on *my* iMac at the office, nor on OpenOffice on the iMac, my laptop, nor my home systems. My family uses OpenOffice, including my kids for whom "Office" is a school requirement. They have no problems, and their teachers see no difference. My wife is not an IT professional (she delivers babies as a midwife) and her frustration with Office is greater than with OpenOffice. If you don't like open-source solutions, fine, but why don't you stop trying to convince a reasonably large group of reasonably intelligent folk to follow your lead? gc Vincent Diepeveen wrote: > You pay a fulltime sysadmin to solve your problems in that case :) > > pay as in 'salary pay'. > > Though i'm very positive about for example Sun's open office, > and open source in general, > it's quite clumsy to use practical for simple things like printing name > labels > to stick on envelopes ('etiketten' we call 'em). 
> > If experienced IT guys don't manage within 1 day to get something like > that done with it, > for sure office personnel with less of an experience there will fail. > Then additional the > documentation totally fails there. > > Now i won't bother you with the fact that i have an apple macbookpro > laptop with > open-office for it, and that despite hours of googling, it just doesn't > work. > > Good old win2000 + old word version had to solve it. > > In short open source can work only if you have experienced Linux guys > who make ready > whatever you need on it, and if the functionality you need is sufficient > and documented. > > This usually is the case for the top1000 companies. > > Netherlands has about 1021 (roughly) companies of 1000+ personnel, not > to mention > governments. For these open source is a possibility. > > Not for the majority of users and companies. > > Clusters and Beowulf type systems are definitely the exception here; for > them modifying that kernel > and a security that only allows intelligence agencies to enter and no > one else, is important. > > On Mar 23, 2009, at 7:58 PM, Douglas Eadline wrote: > >> >>> >>> Also, if you get what you pay for -- exactly what do you get when you >>> use >>> Open-source software? >>> >> >> Interesting question. How do you define "pay" ? >> >> >> >> -- >> Doug >> >> -- >> This message has been scanned for viruses and >> dangerous content by MailScanner, and is >> believed to be clean. 
>> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From landman at scalableinformatics.com Tue Mar 24 08:02:47 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <49C8EEF3.2000206@tamu.edu> References: <97AFFFED-8A62-42D8-952A-610FAA092D61@xs4all.nl> <40680.192.168.1.213.1237834707.squirrel@mail.eadline.org> <49C8EEF3.2000206@tamu.edu> Message-ID: <49C8F617.6070904@scalableinformatics.com> Gerry Creager wrote: > I'm not even sure why I'm entering into this... > > Vincent, I use OpenOffice on a daily basis, interact with Windows users > w/ Word, and have no problems. I do considerably more than printing > labels, too. We trade documents and spreadsheets back and forth, in > support of my projects. > > The only application I've seen trouble with was a document created using > Office (not OpenOffice) for the Mac, by a user who sent the result out We too use OOo for pretty much everything. Though when we get documents from outside, in many cases they are OfficeXXXX where XXXX is some number (2000, 2003, 2007). Invariably some formatting gets lost in the conversion process. Annoying but it happens. I should note also that formatting gets lost converting from 2000 to 2003 to 2007. Its not an OOo feature ... it is a problem with format conversion ... 
and the economic self interest of one organization to get everyone to run their latest and greatest (Hey Microsoft ... if you actually would release an Office for Linux, more than a few people might, I dunno ... buy it? I would) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From xclski at yahoo.com Tue Mar 24 09:06:38 2009 From: xclski at yahoo.com (Ellis Wilson) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine Message-ID: <316141.15220.qm@web37904.mail.mud.yahoo.com> Vincent Diepeveen wrote: > If experienced IT guys don't manage within 1 day to get something like > that done with it, > for sure office personnel with less of an experience there will fail. > Then additional the > documentation totally fails there. Actually, I would be inclined to think that office personnel are MORE capable for this type of task than "IT guys". Not only will they have spent a vast majority of their day working with all types of word processing software, they would be in their position less single-minded about one particular brand of software and thereby more open-minded in their approach. I can't tell you the number of times I've battled with "IT guys" who have severe prejudices against things that in fact they don't know about because they are too lazy to experiment with different solutions to the same problem. 
Ellis From diep at xs4all.nl Tue Mar 24 13:28:55 2009 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <316141.15220.qm@web37904.mail.mud.yahoo.com> References: <316141.15220.qm@web37904.mail.mud.yahoo.com> Message-ID: <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> I'm not sure what environment you guys are working in, but average IQ-100 office personnel are a lot more clumsy than you guys can imagine. In general, of course, if someone sucks at everything, he or she can still go work for a bank. Though nowadays I shouldn't say that too loud either, it seems :) Badly paid and simple work. Not so long ago I saw them still using OS/2 from IBM as a client at one bank :) Most posters here are so far away from the normal world that they have no clue about 99.99% of the work floor. Mistake 1 they make is retrying the same thing 100 times. Those posting here for sure will try something else each time until they figure it out. At best you can say that open source is progressing. It's far from usable. Then we had some workable X Windows-type GUI; suddenly it was kicked out and replaced by big crap called x.org. Eating huge RAM and ugly slow. It doesn't even run well on slightly older machines with a bit less RAM. So in order to run the latest OpenOffice versions, you also need an expensive new machine. That's another weird phenomenon. Vincent On Mar 24, 2009, at 5:06 PM, Ellis Wilson wrote: > > Vincent Diepeveen wrote: >> If experienced IT guys don't manage within 1 day to get something >> like >> that done with it, >> for sure office personnel with less of an experience there will fail. >> Then additional the >> documentation totally fails there. > > Actually, I would be inclined to think that office personnel are MORE > capable for this type of task than "IT guys". 
Not only will they have > spent a vast majority of their day working with all types of word > processing software, they would be in their position less single- > minded > about one particular brand of software and thereby more open-minded in > their approach. > > I can't tell you the number of times I've battled with "IT guys" who > have severe prejudices against things that in fact they don't know > about > because they are too lazy to experiment with different solutions to > the > same problem. > > Ellis > > > > > > > From polk678 at gmail.com Tue Mar 24 09:28:17 2009 From: polk678 at gmail.com (gossips J) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] mvapich2 does not run successfully mpi1 application Message-ID: Hi Folks, I am a student and want to know about mvapich2-1.2p1. It does not run my MPI1 application successfully. Basically it stuck somewhere in middle of execution. I am running this for 80 processes. I figured out that if i do set "on demand threshold" environment settings to anything above 80 it works fine with out any issues. Basically what is causing this behavior? Why test gets stuck up at some point? How to debug this??? If anybody can provide some insight on how to handle this with mvapich2 than it would be great. Looking for help, Thanks in advance, Polk J. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090324/6db7a5b0/attachment.html From carlossegurag at gmail.com Tue Mar 24 03:05:49 2009 From: carlossegurag at gmail.com (Carlos Segura) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] ssh connection passwordless In-Reply-To: References: Message-ID: The step (3) is not correct, because you are deleting the old authorized_keys. 
The steps are: (1) Generate the keys in the client: ssh-keygen -t rsa (2) Copy the public key to the servers: ssh-copy-id -i ~/.ssh/id_rsa.pub user@server1 ssh-copy-id -i ~/.ssh/id_rsa.pub user@server2 Step (2) will add id_rsa.pub to the authorized keys. Carlos 2009/3/19 Francesco Pietra > HI: > > I have a computing machine and a desktop ssh passwordless > interconnected through a Zyxel router (which is dhpc on Internet). I > have now added a second computing machine. I am unable to get all > three machines passwordless interconnected at the same time. Just only > two. If I want to have the third computer passwordless connected to > one of the other two, I have to exchange id_rsa.pub between the two > again. Mistake or intrinsic feature of ssh? > > What I did: > > (1)generating the keys with "ssh-keygen -t rsa" > > (2) getting "reserved" the machines on the router > > (3)scp id_rsa.pub to the "authorized_keys" > > It is also mandatory that asking the "date" to the other computer > (slogin ... date), the date is given without asking the password. That > is an issue of a computational code that for its internal > parallelization needs that (I have not investigated why). > > thanks > > francesco > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090324/52e82dd5/attachment.html From rgb at phy.duke.edu Tue Mar 24 15:25:57 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] One time password generators... Message-ID: Doing certain classes of work one has to satisfy e.g. banking due diligence, which tends to be stronger than ordinary cluster due diligence. 
One aspect of that security (generally required, quite independent of whether or not it really increases security) is "strong authentication", currently held to be multifactor authentication, e.g. SSH keys AND a one-time password, a password AND biometrics, etc. I've got a possible gig set up that may need this and have been investigating the OTP devices for cost and linux capability. The cost seems generally to be "high", and while there are a few that are up-front linux capable, it seems to be really difficult to find a company that will just sell you a key generator at (say) $10 a pop and give you a matching piece of software to run on your linux server. There are a couple of possible exceptions to pursue in addition to the e.g. RSA-like solutions with their enormous cost, but I thought I'd throw it out to the group here too. Is there a straightforward low-cost way to generate OTP's without ten thousand dollar server software packages? rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From billycrook at gmail.com Tue Mar 24 15:42:21 2009 From: billycrook at gmail.com (Billy Crook) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: On Tue, Mar 24, 2009 at 17:25, Robert G. Brown wrote: > Doing certain classes of work one has to satisfy e.g. banking due > diligence, which tends to be stronger than ordinary cluster due > diligence. One aspect of that security (generally required, quite > independent of whether or not it really increases security) is "strong > authentication", currently held to be multifactor authentication, e.g. > SSH keys AND a one-time password, a password AND biometrics, etc. > > I've got a possible gig set up that may need this and have been > investigating the OTP devices for cost and linux capability. The cost > seems generally to be "high", and while there are a few that are > up-front linux capable, it seems to be really difficult to find a > company that will just sell you a key generator at (say) $10 a pop and > give you a matching piece of software to run on your linux server. > > There are a couple of possible exceptions to pursue in addition to the > e.g. RSA-like solutions with their enormous cost, but I thought I'd > throw it out to the group here too. Is there a straightforward low-cost > way to generate OTP's without ten thousand dollar server software > packages? > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu If you want to spend as little as possible: http://www.cl.cam.ac.uk/~mgk25/otpw.html And if your users don't like typing long random things in, but you still want them to use one-time credentials: http://www.yubico.com/products/yubikey/ Both can be integrated with PAM. Yubikeys go for $25 (less in quantity). Their server side software is Free Software, hosted on Google Code. http://code.google.com/u/simon75j/ 
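For a feel of how a cheap token and a server can agree on one-time passwords from a shared key and a counter, here is the standard HOTP construction from RFC 4226. Note this is neither otpw's scheme nor Yubico's classic AES-based OTP; it is just the common OATH building block, shown for illustration:

```python
import hashlib
import hmac
import struct

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password per RFC 4226."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Token and server share the key and a counter; each login burns one value.
key = b"12345678901234567890"   # the RFC 4226 test-vector key
print(hotp(key, 0))   # 755224 (first RFC 4226 test value)
print(hotp(key, 1))   # 287082
```

The server only has to remember the key and the last accepted counter, which is why such tokens can be validated with a small PAM module instead of a heavyweight commercial server.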
> And if your users don't like typing long random things in, but you > still want them to use one-time credentials: > http://www.yubico.com/products/yubikey/ This one I had found -- it isn't exactly like the secureid thing, but it looks like it would work in a self-sufficient way, and one can overload/reload it with your own AES keys so that you really aren't relying in any way on a third party for authentication. > > Both can be integrated with PAM. Yubikeys go for $25 (less in > quantity). Their server side software is Free Software, hosted on > Google Code. http://code.google.com/u/simon75j/ Have you tried either or both of them? rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From billycrook at gmail.com Tue Mar 24 21:31:22 2009 From: billycrook at gmail.com (Billy Crook) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: On Tue, Mar 24, 2009 at 22:28, Robert G. Brown wrote: > On Tue, 24 Mar 2009, Billy Crook wrote: >> Both can be integrated with PAM. ?Yubikeys go for $25 (less in >> quantity). ?Their server side software is Free Software, hosted on >> Google Code. http://code.google.com/u/simon75j/ > > Have you tried either or both of them? > ? rgb I've considered the former, but I wouldn't have the patience to hand type something unique every time, so I just keep long passphrases and regularly change them. As for the latter, I purchased a few yubikeys to play with a month ago, and have personalized (re-keyed) one. Sort of... Their GNU+Linux personalization tool has a ways to go. I worked with them to get it to compile under 64bit distributions. While the tool will "allow" you to choose a passphrase and random seed, it did not as of a couple weeks ago provide any means of directly assigning an AES key. 
I spoke with a developer there, and they are going to implement that in the immediate future, along with some sort of official format for storing key data (in databases or .ssh/authorized_yubikeys files). They seem to have focused mostly on Windows for the programming tool though. To program them in GNU+Linux, one must first unload the usbhid module, or load it in a quirks mode, because the module otherwise locks the device and it's not accessible to the personalization tool even as root. They're working on that as well. As of right now, their current version of the personalization tool doesn't compile. As of yet, I've only made real use of them with their factory-programmed keys, to authenticate to yubico's openid provider. Other people to whom I have given some yubikeys have been using the pam module on their servers to ssh with a one-time password, with much success. They are, of course, using yubico to authenticate the OTPs. I plan to check back every few weeks to watch the progress on their Free Software tools for personalization, and eventually use mine as additional factors of authentication for ssh and openvpn. From what I understand they do entirely intend for users to be able to operate completely independently of yubico without having to pay for software licenses. They even publish their enterprisey 'yubikey management server' for administering your users' yubikeys, pam modules, re-keying tools, the actual authentication code, and many other things on that Google Code page. I've not tested most of it. Your mileage may vary. I'd like to hear what others think of these little gadgets as well. 
Here's what a few from my 'demo key' look like: ecebedeeefegeheiejekecebhvbcdiiiirfekttdkvlfhbuldbgedtlc ecebedeeefegeheiejekecebhktreuklveuvgbhhfcrlfduvjrvinbtc ecebedeeefegeheiejekecebcvfkvtbnhhtifgckuffffklcnjbjcbdu -Billy From smulcahy at atlanticlinux.ie Tue Mar 24 23:45:23 2009 From: smulcahy at atlanticlinux.ie (stephen mulcahy) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: <49C9D303.3050000@atlanticlinux.ie> Billy Crook wrote: > If you want to spend as little as possible: > http://www.cl.cam.ac.uk/~mgk25/otpw.html This looks pretty good on paper and as a bonus (for us Debian users at least ;) it's included in Debian Lenny and up. I wonder has anyone done an analysis of the security of this? -stephen -- Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway) From lynesh at cardiff.ac.uk Wed Mar 25 01:55:38 2009 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] any users of MyPBS accounting package? Message-ID: <1237971338.2821.3.camel@w609.insrv.cf.ac.uk> I'm looking at it as an interim accounting system. The page at my-pbs.sourceforge.net looks like there hasn't been any development in the last couple of years. Is anyone currently using it? Does it work? Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From rgb at phy.duke.edu Wed Mar 25 04:22:11 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:21 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: On Tue, 24 Mar 2009, Billy Crook wrote: >> Have you tried either or both of them? >> ? 
rgb > > I've considered the former, but I wouldn't have the patience to hand > type something unique every time, so I just keep long passphrases and > regularly change them. ... > Google Code page. I've not tested most of it. Your mileage may vary. > I'd like to hear what others think of these little gadgets as well. > > Here's what a few from my 'demo key' look like: > ecebedeeefegeheiejekecebhvbcdiiiirfekttdkvlfhbuldbgedtlc > ecebedeeefegeheiejekecebhktreuklveuvgbhhfcrlfduvjrvinbtc > ecebedeeefegeheiejekecebcvfkvtbnhhtifgckuffffklcnjbjcbdu Well, that's more than enough to convince me to give them a trial. They're more expensive than mypw in the short run, but general purpose and rekeyable means a lot, and it sounds like they (at least) have a strong commitment to the open source universe. Sure, their tool may be windows-centric at first, but I can't blame them for that -- that's where a good chunk of their money will come from. But the open source community will make even better tools on their own given time and information, and it sounds like the information required is openly provided. So I guess I'll visit their site again and buy one or two. Maybe I'll see if I can get OPM^* to pay for the OTPT, heh heh... rgb * OPM = "Other People's Money" > > -Billy > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From Glen.Beane at jax.org Wed Mar 25 04:34:44 2009 From: Glen.Beane at jax.org (Glen Beane) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] any users of MyPBS accounting package? 
In-Reply-To: <1237971338.2821.3.camel@w609.insrv.cf.ac.uk> Message-ID: it is a dead project I worked on it over 3 years ago, not much has happened since then. I think it works OK if you use TORQUE or some other PBS, but I would look for something else. It is no longer developed, and if I were doing it again there are definitely things I would want to do differently in some of the back end code. -glen On 3/25/09 4:55 AM, "Huw Lynes" wrote: I'm looking at it as an interim accounting system. The page at my-pbs.sourceforge.net looks like there hasn't been any development in the last couple of years. Is anyone currently using it? Does it work? Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen L. Beane Software Engineer The Jackson Laboratory Phone (207) 288-6153 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090325/6070d18a/attachment.html From Glen.Beane at jax.org Wed Mar 25 04:45:27 2009 From: Glen.Beane at jax.org (Glen Beane) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] any users of MyPBS accounting package? In-Reply-To: Message-ID: I forgot to mention, one of the last big changes we made was remove the need for the qsub/qdel wrapper scripts and the MyPBS website is a bit outdated since it does not seem to mention that. I think this made the package a lot more usable since the wrapper scripts could be a little problematic (they broke interactive jobs). On 3/25/09 7:34 AM, "Glen Beane gbeane" wrote: it is a dead project I worked on it over 3 years ago, not much has happened since then. 
I think it works OK if you use TORQUE or some other PBS, but I would look for something else. It is no longer developed, and if I were doing it again there are definitely things I would want to do differently in some of the back end code. -glen On 3/25/09 4:55 AM, "Huw Lynes" wrote: I'm looking at it as an interim accounting system. The page at my-pbs.sourceforge.net looks like there hasn't been any development in the last couple of years. Is anyone currently using it? Does it work? Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Glen L. Beane Software Engineer The Jackson Laboratory Phone (207) 288-6153 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090325/f2179ffc/attachment.html From lynesh at cardiff.ac.uk Wed Mar 25 05:11:57 2009 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] any users of MyPBS accounting package? In-Reply-To: References: Message-ID: <1237983117.2821.8.camel@w609.insrv.cf.ac.uk> On Wed, 2009-03-25 at 07:45 -0400, Glen Beane wrote: > I forgot to mention, one of the last big changes we made was remove > the need for the qsub/qdel wrapper scripts and the MyPBS website is a > bit outdated since it does not seem to mention that. I think this made > the package a lot more usable since the wrapper scripts could be a > little problematic (they broke interactive jobs). > > > On 3/25/09 7:34 AM, "Glen Beane gbeane" wrote: > > it is a dead project > > I worked on it over 3 years ago, not much has happened since > then. 
I think it works OK if you use TORQUE or some other > PBS, but I would look for something else. It is no longer > developed, and if I were doing it again there are definitely > things I would want to do differently in some of the back end > code. > Thanks for the info Glen. I might still install it to gather stats while I implement something better suited to our specific needs. Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From rgb at phy.duke.edu Wed Mar 25 06:25:30 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: <49C9D303.3050000@atlanticlinux.ie> References: <49C9D303.3050000@atlanticlinux.ie> Message-ID: On Wed, 25 Mar 2009, stephen mulcahy wrote: > Billy Crook wrote: >> If you want to spend as little as possible: >> http://www.cl.cam.ac.uk/~mgk25/otpw.html > > This looks pretty good on paper and as a bonus (for us Debian users at least > ;) it's included in Debian Lenny and up. On paper is right! It requires one to carry around paper or a PDA of some sort to literally look up an OTP. > I wonder has anyone done an analysis of the security of this? One problem is that I'm not sure that this meets the requirements for two-factor security according to the due-diligence spec of e.g. a bank. I fail to see how it is more secure than e.g. dumping /dev/random through an ASCII translator and into a file, and just working through the file sequentially on both ends -- in fact, to me it seems to be less secure, because it is at least partially keyed and there seems to be no point in having a key if you're going to carry a table of shared secrets around with you. One way or the other, one has to carry the PSK file with you -- printed on paper, on a USB stick. Paper has the obvious problem that you can run out of passwords.
A USB stick could hold enough to not run out, but is subject to snooping on the host you're using to read it while logging in. All of them are subject to the theft/loss of the USB stick or paper, all of them are subject to man in the middle on the host you're logging in from. Basically, if you are logging in from an untrusted host you can ALWAYS be presented with a SHELL that records your login keystrokes, logs in as you, permits you to do your work by passing through both directions of the traffic transparently, and then simply simulates your logout on your end while holding on to the remote shell. To me this suggests that the real marginal benefit of ALL of the two-factor authentication methods, secureid or otpw or whatever, is that it raises the bar a tiny bit on a snooper presumed to have root control of a system one is coming in from. Really, just a tiny bit. I don't think that it would be terribly difficult to write a general purpose network module for any operating system that could both sit in the middle and offer up a trojan port for a third party to come in at will and take over the "terminated" session(s) from an arbitrary remote/breakout site. The attacker might not have the convenience of being able to login as you whenever they want, as the session in question cannot be restarted once THEY choose to terminate it, but hey, do they NEED to be able to restart it or can they do tremendous damage at the end of the one session? I rather think the latter. Yeah, raising the bar even this trifle probably knocks out most of the simple script-kiddies and over the counter rootkit or web-borne viral spyware on Windoze boxen, but they aren't the real problem with high-profile targets such as banks or organizations with lots of e.g. SSNs and personal data. The danger there is the professional ubercracker, the very person two-factor auth is supposed to foil. OTP in general doesn't really foil them. 
The ONLY thing that can foil them, really, is to have trusted/trustable systems on both ends of a connection, in which case plain old one factor passwordless shared secret ssh is more than adequate -- if you brought your own secrets. Try explaining that to a security officer at a bank, of course, and you'll get a polite smile and a CYA insistence that you use two-factor auth anyway or you aren't getting close to their data. And besides, what can they do? The majority of the clients accessing protected servers in the internet Universe is still Windows, and Windows security is a really, really unfunny joke. IMO a secure login from a Windows box is an oxymoron, no matter what the authentication factors used or software interface in question might be, but alas, I haven't yet seen questions on a due-diligence form that mandates the non-use of Windows systems as clients permitted to access the protected data/server. The fundamental problem with security is that it is a weak-link problem. You are never more secure than the weakest exploitable chink in your armor. You can pile on locks and armed guards at the door to let in only properly authenticated sheep, but that will never prevent the egress of a properly disguised wolf, or a wolf that goes in through the wide open window on the side of the barn. You therefore have to extend the security perimeter to the meadow where the sheep play no matter what or you are just fooling yourself. And I say this as a sheep. I work on lots of very secure systems with confidential data on them and with root privileges. I don't sweat the integrity of the ssh sessions themselves -- if ssh is cracked civilization as we know it is doomed anyway, no due diligence can protect you then. I sweat over the crackability of my laptop. 
It has to be just as secure as the systems I log into, and yeah, I can't afford to have it get stolen as my ssh keys are right there on its disk, readable by anyone who reaches root, just as my passwords are snoopable by anyone who reaches root, just as the connections themselves can easily be hijacked by anyone who reaches root. Running nightly-yum-updated linux with basically no open ports, I can sleep at night. I >>would<< sleep better with a two-factor auth system in place -- every little bit helps. But nothing, really, provides protection against the pro-grade ubercracker, one who can take over your system at the kernel level, once they gain access as root or as myself (who pops in and out of being root all day). Well, that's not quite true. Monitoring and reading the logs on the servers, on the firewalls, paying attention, using passive no-open-port network monitors on the switched wires, looking for anomalies -- that, and the fact that relatively stupid script-kiddie crackers often have broken scripts that leave footprints where ubercracker tools in the hands of an ubercracker do not -- give sysadmins a fighting chance. But no more than that. One has to BE an ubercracker (just a white-hat one) to defend effectively against an ubercracker. rgb > > -stephen > > -- > Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie > Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway) > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From nixon at nsc.liu.se Wed Mar 25 03:53:24 2009 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: (Robert G. Brown's message of "Tue\, 24 Mar 2009 23\:28\:24 -0400 \(EDT\)") References: Message-ID: "Robert G. 
Brown" writes: > On Tue, 24 Mar 2009, Billy Crook wrote: > >> And if your users don't like typing long random things in, but you >> still want them to use one-time credentials: >> http://www.yubico.com/products/yubikey/ > > This one I had found -- it isn't exactly like the secureid thing, but it > looks like it would work in a self-sufficient way, and one can > overload/reload it with your own AES keys so that you really aren't > relying in any way on a third party for authentication. The Yubikey is really nifty. (Of course, it's Swedish. 8^) ) I like the price and the form factor, and the really clever, in-hindsight-obvious idea of the Yubikey pretending to be a USB keyboard and entering the OTP for you. The one thing I dislike is that it is based on a symmetric scheme. All AES keys are stored on the authentication server. If the authentication server ever gets compromised, you have to replace or rekey your entire deployed base of Yubikeys. -- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From james.p.lux at jpl.nasa.gov Wed Mar 25 07:47:15 2009 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: Message-ID: To me this suggests that the real marginal benefit of ALL of the two-factor authentication methods, secureid or otpw or whatever, is that it raises the bar a tiny bit on a snooper presumed to have root control of a system one is coming in from. Really, just a tiny bit. I don't think that it would be terribly difficult to write a general purpose network module for any operating system that could both sit in the middle and offer up a trojan port for a third party to come in at will and take over the "terminated" session(s) from an arbitrary remote/breakout site. 
The attacker might not have the convenience of being able to login as you whenever they want, as the session in question cannot be restarted once THEY choose to terminate it, but hey, do they NEED to be able to restart it or can they do tremendous damage at the end of the one session? I rather think the latter. For SecureID, you can set up your application to periodically reauthenticate either on a clock schedule or when you ask to do things that are particularly sensitive (>ftp GET "nuclear weapon release code"... Please reauthenticate..). Since knowing the pseudorandom 6 digit number now doesn't help you some 60 seconds into the future, you can make this pretty strong (at the cost of annoyance). For FIPS201 badges, since they have both contact and contactless interfaces, you can do a strategy where the initial authentication is via the contact interface (which can see the crypto engine), and then you periodically ping the RFID part to make sure that the physical badge is still in the vicinity. (or, more painfully, make it so that the badge has to be always connected.. But that raises real usability issues with having two computers) As always, the idea is to require both "a thing you know" and "a thing you have".. The man in the middle can figure out the thing you know (e.g. by a spoof interface that grabs keystrokes), but it's tough to emulate the "thing you have", since its behavior over time isn't predictable. IMO a secure login from a Windows box is an oxymoron, no matter what the authentication factors used or software interface in question might be, but alas, I haven't yet seen questions on a due-diligence form that mandates the non-use of Windows systems as clients permitted to access the protected data/server. I would qualify the Windows Box term.. If you lock down the software configuration, I think one can make sure it's relatively secure. If you allow casual admin access to install whatever apps you want, then, yes, it's insecure.
However, most banks (for example) do NOT do this, at least for inhouse PCs.. They rigorously control the software image (to the extent that you boot from a shared image over the network).. The only thing on the local disk is essentially a "cache" which gets compared/refreshed against the master image. No sticking in random USB widgets either.. If it looks like a disk drive, it gets encrypted (causing wailing and gnashing of teeth for employees who plug their MP3 players or cameras in). And yes, they DO have a variety of processes in place to require business partners to have appropriately secured systems. Where it gets loose is the "customer contact at home" end, where they're trading off annoyance of customers against security. This is like the credit card fraud situation.. If you lock it down, nobody will be able to use the card, so you trade off some losses (a few percent) against having volume. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090325/f8bcc551/attachment.html From rgb at phy.duke.edu Wed Mar 25 08:14:51 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: On Wed, 25 Mar 2009, Lux, James P wrote: > For SecureID, you can set up your application to periodically reauthenticate > either on a clock schedule or when you ask to do things that are > particularly sensitive (>ftp GET "nuclear weapon release code"... Please > reauthenticate..). Since knowing the pseudorandom 6 digit number now > doesn't help you some 60 seconds into the future, you can make this pretty > strong (at the cost of annoyance). A trivial annoyance to the ubercracker. Remember, his agent is intercepting ALL of your passwords and passing them through, and snooping ALL of the data returned. You might prevent him from looking up the code himself later in the session.
You won't prevent him from seeing what you see, ever, or holding onto the session itself (which will often give him the opportunity to trojan the server itself and completely bypass its usual auth scheme, subject to firewalls and monitoring in between). > For FIPS201 badges, since they have both contact and contactless interfaces, > you can do a strategy where the initial authentication is via the contact > interface (which can see the crypto engine), and then you periodically ping > the RFID part to make sure that the physical badge is still in the vicinity. > (or, more painfully, make it so that the badge has to be always connected.. > But that raises real usability issues with having two computers) > > As always, the idea is to require both "a thing you know" and "a thing you > have".. The man in the middle can figure out the thing you know (e.g. by a > spoof interface that grabs keystrokes), but it's tough to emulate the "thing > you have", since its behavior over time isn't predictable. There's man in the middle as in out on the Internet, and there's man in the middle as in "owning your client". I claim that the former is trivially solved already by ordinary ssh, the latter cannot be solved, period. Access of any sort from a compromised client compromises the protected host, quite possibly seriously compromises it. Although yeah, adding multiple auths adds both annoyance and an incrementally raising bar... Booting your access host from a read-only flash drive, with built in PSK credentials -- now THAT'S two-factor authentication. > IMO a secure login from a Windows box is an oxymoron, no matter > what the > authentication factors used or software interface in question > might be, > but alas, I haven't yet seen questions on a due-diligence form > that > mandates the non-use of Windows systems as clients permitted to > access > the protected data/server. > > I would qualify the Windows Box term..
If you lock down the software > configuration, I think one can make sure it's relatively secure. If you Sure. But that's simply controlling the incoming client, and I AGREE that this is what one has to do to make ANYTHING secure. Now demonstrate to me any additional advantage to using yubikeys, secureids, or anything else you like over simple ssl or ssh bidirectionally secured unspoofable unsnoopable connections with no password at all. Passwords are overrated. In fact, they are very nearly pointless as far as strong security is concerned -- they add almost nothing once you've established two reliable endpoints with an encrypted link between them, provided only that the mechanism for establishing the link cannot itself be snooped or cracked. And there are at least three (related) ways to do that that I know of, probably more. > allow casual admin access to install whatever apps you want, then, yes, it's > insecure. However, most banks (for example) do NOT do this, at least for > inhouse PCs.. They rigorously control the software image (to the extent that > you boot from a shared image over the network).. The only thing on the local > disk is essentially a "cache" which gets compared/refreshed against the > master image. No sticking in random USB widgets either.. If it looks like a > disk drive, it gets encrypted (causing wailing and gnashing of teeth for > employees who plug their MP3 players or cameras in). All pointless, presuming a compromised system anywhere, all overkill assuming a secure system on both ends. Well, not quite overkill -- part of this is MAKING the system secure, which I strongly approve of. And there is a point to encrypting drives, because drives can be stolen by non-ubercrackers. No ubercracker would deign to steal a drive, because the way he gets access is on the backs of those with perfect rights and permissions granting access. He can't get those after the fact, from an encrypted stolen drive.
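The "no password at all" ssh setup rgb describes (key-only authentication between two trusted endpoints) comes down to a few server-side OpenSSH options. A minimal sketch, not a complete hardening guide:

```
# /etc/ssh/sshd_config (server side): accept public keys only, never passwords
PubkeyAuthentication yes
PasswordAuthentication no
ChallengeResponseAuthentication no
PermitRootLogin no
```

The client's public key goes into ~/.ssh/authorized_keys on the server; the private half never leaves the trusted client, which is exactly why the integrity of the client machine becomes the weak link being discussed.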
> And yes, they DO have a variety of processes in place to require business > partners to have appropriately secured systems. Not an easy problem -- I'm not making fun of it, and a lot of what they require, while inadequate and somewhat misleading, is indeed helpful against certain things. Close the doors you can, then cross your fingers. > Where it gets loose is the "customer contact at home" end, where they're > trading off annoyance of customers against security. This is like the > credit card fraud situation.. If you lock it down, nobody will be able to > use the card, so you trade off some losses (a few percent) against having > volume. Security is always a cost-benefit problem. More secure is more annoying and harder and more expensive to use. Easy to use for idiots is almost always equivalent to insecure, although Linux does far, far better for those idiots than Windows does -- consumer Windows goes out of its way to be vulnerable, which is what is so annoying about it. Default insecure, not default secure. And their silly tell-me-twices don't make them a damn iota more secure... rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From gdjacobs at gmail.com Wed Mar 25 13:52:10 2009 From: gdjacobs at gmail.com (Geoffrey Jacobs) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> Message-ID: <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> On Tue, Mar 24, 2009 at 3:28 PM, Vincent Diepeveen wrote: > I'm not sure what environment you guys are working, > but the average IQ100 office personnel is a lot more clumsy than > you guys can imagine.
> > In general of course if someone sucks in everything, he or she still can go > work > for a bank. > > Though nowadays i shouldn't say that too loud either it seems :) > > Bad paid and simple work. Not so long ago i saw 'em still use at one > bank OS-2 from IBM as client :) Bank employees use what they're told to use. No exceptions. Most posters here are so far away from normal world that they have no clue > about 99.99% of workfloor. > Define normal in terms of business. Ford used to announce an executive firing by chopping up the poor sod's furniture. Is that normal? Mistake 1 they make is retry the same thing 100 times. Those posting here > for sure will > try each time something else until they figure it out. > > At best you can say that open source is progressing. It's far from usable. > Then we had some workable x-windows type GUI, suddenly it was kicked out > and replaced by > big crap called x.org. Eating huge RAM and ugly slow. Doesn't even run > well at a tad older machines > with a bit less RAM. I hope you're distinguishing between x.org and the windowing manager. So in order to run open-office latest versions, you also need an expensive > new machine. That's another > weird phenomena. No. Please stop. A processor like a Celeron 600 will run OpenOffice easily on Fedora/Mandrake/Ubuntu. The same processor will run wonderfully using something lightweight like XFCE. It was a little slow, but I used to run OpenOffice 2 on a 266 MHz IBM laptop. Not recommended for XP and Office. > > > Vincent > > -- MORE CORE AVAILABLE, BUT NOT FOR YOU -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090325/961f6413/attachment.html From rgb at phy.duke.edu Wed Mar 25 20:55:37 2009 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: On Wed, 25 Mar 2009, Geoffrey Jacobs wrote: > Bad paid and simple work. Not so long ago i saw 'em still use at > one > bank OS-2 from IBM as client :) > > > Bank employees use what they're told to use. No exceptions. Not only are they told what to do -- in banks in particular, they cannot make ANY CHANGE in ANY COMPUTER SYSTEM associated with the actual banking process without going through an extensive and expensive auditing and certification process. Banks are locked down tighter than a drum. As they should be, although the lockdown IIRC interferes with anything like a normal update stream, as all updates have to be tested and certified. Banks tend to stick with systems on a decadal time scale, because when I say expensive I mean expensive, like six month long periods of testing and six more months of training, that kind of thing. It's not like running out to the store and getting a box with the latest version of Vista Ultimate and slapping it inside your bank's trusted financial network. > No. Please stop. A processor like a Celeron 600 will run OpenOffice easily > on Fedora/Mandrake/Ubuntu. The same processor will run wonderfully using > something lightweight like XFCE. It was a little slow, but I used to run > OpenOffice 2 on a 266 MHz IBM laptop. Not recommended for XP and Office. OO is a bit bloated -- I seem to recall 300 MB+ of rpms in the last round -- but hey, WYSIWYG integrated office tools are the very DEFINITION of bloat so it is really no surprise. You want non-bloat, stop using word processors or integrated office suites altogether and stick with jove. Small. Tight. Fast. The essential tool of rgbots.
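Bloat claims like these are easy to check directly from the VSZ and RSS numbers ps reports. A small sketch (Linux-specific, since it reads /proc; the helper name is made up for illustration):

```python
# Report a process's virtual (VmSize) and resident (VmRSS) memory in kB,
# read from /proc -- the same numbers ps reports as VSZ and RSS. Linux-only.
def mem_usage(pid="self"):
    with open("/proc/%s/status" % pid) as f:
        fields = dict(line.split(":", 1) for line in f if ":" in line)
    vsz = int(fields["VmSize"].split()[0])  # total mapped address space
    rss = int(fields["VmRSS"].split()[0])   # pages actually resident in RAM
    return vsz, rss

vsz, rss = mem_usage()
print("VSZ=%d kB, RSS=%d kB" % (vsz, rss))
```

Point it at the PID of soffice.bin or firefox to reproduce the comparison; RSS is the better measure of felt bloat, since VSZ counts every mapped page, shared libraries included.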
OO to my experience generally runs just fine ONCE IT LOADS on just about any linux box with 512 MB or more of memory (where memory is important -- ooffice beats out even the bloat of firefox in VSZ, although firefox has a much larger RSS), and yeah, on older systems it can take a while to load. On my current-gen laptop, it's downright peppy, though, usually, and its RSS is under 100 MB (the relatively few times I have to use it) which I guess is better than being over 100 MB. Sort of. But it isn't really a problem. I'm more interested in trying to figure out why 32 bit Centos under 64 bit VMware on a dual core 4 GB laptop is really, really slow while it boots and for a short while afterwards, then speeds up until it runs almost normally. I'm guessing it's either a memory management problem or a 32/64 bit problem. Interestingly, 64 bit vmware workstation wouldn't let me install 64 bit centos on my 64 bit laptop. The 32 bit version works fine, but it may be causing my split processor to have a severe identity crisis as the code mix percolates through it. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From nixon at nsc.liu.se Thu Mar 26 06:57:42 2009 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: (Robert G. Brown's message of "Wed\, 25 Mar 2009 11\:14\:51 -0400 \(EDT\)") References: Message-ID: "Robert G. Brown" writes: > But that's simply controlling the incoming client, and I AGREE > that this is what one has to do to make ANYTHING secure. Now > demonstrate to me any additional advantage to using yubikeys, secureids, > or anything else you like over simple ssl or ssh bidirectionally secured > unspoofable unsnoopable connections with no password at all. 
Well, some banks over here have an authentication system that uses a hardware crypto token with a keypad. You use it for a challenge-response procedure to log in to the Internet banking site - nothing new so far - but you also use it to sign (using challenge-response) each bunch of transactions you perform on the banking site. And - this is the key point - to sign the transactions you actually enter certain parts of the transaction data (like the total amount to transfer) into the crypto token. Even with total control over the client PC, it's real hard for an attacker to do anything really evil in that setting. -- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From nixon at nsc.liu.se Thu Mar 26 07:03:00 2009 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: (Robert G. Brown's message of "Wed\, 25 Mar 2009 23\:55\:37 -0400 \(EDT\)") References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: "Robert G. Brown" writes: > Not only are they told what to do -- in banks in particular, they cannot > make ANY CHANGE in ANY COMPUTER SYSTEM associated with the actual > banking process without going through an extensive and expensive > auditing and certification process. As in health-care. Which is why you get hospitals with Conficker/Downadup running rampant through medical equipment with embedded Windows systems. Basically, you're not allowed to patch them without FDA approval. That's scary.
-- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From rgb at phy.duke.edu Thu Mar 26 07:28:12 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: On Thu, 26 Mar 2009, Leif Nixon wrote: > "Robert G. Brown" writes: > >> But that's simply controlling the incoming client, and I AGREE >> that this is what one has to do to make ANYTHING secure. Now >> demonstrate to me any additional advantage to using yubikeys, secureids, >> or anything else you like over simple ssl or ssh bidirectionally secured >> unspoofable unsnoopable connections with no password at all. > > Well, some banks over here have a authentication system that uses a > hardware crypto token with a keypad. You use it for a challenge-response > procedure to log in to the Internet banking site - nothing new so far - > but you also use it to sign (using challenge-response) each bunch of > transactions you perform on the banking site. And - this is the key > point - to sign the transactions you actually enter certain parts of the > transaction data (like the total amount to transfer) into the crypto token. > > Even with total control over the client PC, it's real hard for an > attacker to do anything really evil in that setting. I agree. Of course, what you're saying is that the actual transaction agent is the token, and the token is separate and secure. The PC is already a part of the external network back to the trusted host. I stand corrected (sort of) for this exception, although it is really just an example of a perfectly controlled transactional client (and the PC itself is no longer really the client). 
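The transaction-signing scheme Leif describes (keying parts of the transaction itself into the token) can be sketched as a MAC over the challenge plus the amount. A toy model only; real tokens use their own algorithms and key handling, and every value below is hypothetical:

```python
import hashlib
import hmac

def token_sign(secret, challenge, amount):
    # The token MACs the bank's challenge together with the amount the
    # user keyed in; the bank verifies with the same shared secret.
    msg = (challenge + "|" + amount).encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:8]

secret = b"per-token-secret"   # embedded in the token at manufacture
challenge = "493817"           # displayed by the banking site
code = token_sign(secret, challenge, "2500.00")

# The bank recomputes the code over the same data and it matches...
assert code == token_sign(secret, challenge, "2500.00")
# ...but a man-in-the-browser that alters the amount invalidates it:
assert code != token_sign(secret, challenge, "9999.00")
```

This is why even a fully compromised client cannot silently redirect the transfer: the signed amount is entered on the token's own keypad, outside the PC.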
rgb > > -- > Leif Nixon - Systems expert > ------------------------------------------------------------ > National Supercomputer Centre - Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Mar 26 07:42:52 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: On Thu, 26 Mar 2009, Leif Nixon wrote: > "Robert G. Brown" writes: > >> Not only are they told what to do -- in banks in particular, they cannot >> make ANY CHANGE in ANY COMPUTER SYSTEM associated with the actual >> banking process without going through an extensive and expensive >> auditing and certification process. > > As in health-care. Which is why you get hospitals with > Conficker/Downadup running rampant through medical equipment with > embedded Windows systems. Basically, you're not allowed to patch them > without FDA approval. > > That's scary. Um, I don't believe that this is the case, and I say this as a semi-pro consultant in health care. Most hospitals probably do something along these lines as part of the standard CYA, but the regulations, especially HIPAA, are "due diligence" recommendations with an amazing {\em lack} of specification. You can pretty much do whatever you like, but heaven help you if you drop your patients' data or violate their confidentiality. 
At the very least you'd better be able to show that you tried hard to keep things secure... This leads to an extremely wide range of IT practice in the EMR revolution that congress has more or less mandated as a condition of getting paid for medicare and medicaid. Very small practices run whatever they can manage, usually a small/cheap EMR on a Windows server, with virtually unsecured Windows clients -- again, pretty much whatever Windows systems one happens to own, with whatever mix of Win95 on up on systems up to 8 or 9 years old that happen to be lying around. Seriously. No regulation, no government certification process, no full time IT staff -- if you're lucky (or hire a good consultant:-) they'll figure out that they need actual antivirus on all of their systems, regular Windows updates on their server and clients, and that they shouldn't use WEP on their over-the-counter wireless network. Intermediate practices (like the one I do most of my consulting for) start OUT like that -- it had a 10 year old SOLARIS x86 server and a truly terrifying mix of PCs when I started out (and the Solaris server is still running, sort of, under a desk, 4 GB hard drives and all -- go figure:-). Now it runs with locked down linux servers running vmware, a mix of linux and windows vm servers (including the primary EMR under LINUX, thankfully, data relatively protected) and I still view the goddamn WinXX PC clients to be the weak link in the security of the whole system, but we have no choice. Only hospitals are as slow and ponderous as you describe (my sister works for ex-A4healthsys, and has been doing hospital systems for close to 20 years now). They aren't ponderous because of the need for certification, but because they are ponderous and because of the expense of change. 
Which is what keeps my sister in business, basically -- she goes around and messes with the infinite problems in the legacy hospital management suites running on antique hardware being managed by borderline incompetents when the original authors of those suites are long since gone, the operating systems are no longer supported, the hardware is obsolete and breaks a lot, and the underlying database is something of dark evil. Believe me, I know, as she bends my ear a lot and asks me for help with perl scripts designed to scrape the data out of this or that nightmarish interface. rgb > > -- > Leif Nixon - Systems expert > ------------------------------------------------------------ > National Supercomputer Centre - Linkoping University > ------------------------------------------------------------ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From tjrc at sanger.ac.uk Thu Mar 26 08:44:17 2009 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: <3F1DAD5F-556B-4388-9179-F8CC754622B9@sanger.ac.uk> On 26 Mar 2009, at 2:42 pm, Robert G. Brown wrote: > Um, I don't believe that this is the case, and I say this as a semi- > pro > consultant in health care. I don't know about hospital software, but it's certainly the case for some DNA sequencer instruments. Our ABI 3700 capillary sequencers have Windows machines attached for the data collection. 
ABI explicitly forbid us from either patching Windows, or from installing antivirus software. Doing so would drop us off support. Consequently, all those machines are on their own strongly firewalled network where hopefully they can't get infected, and if they are, the infections can't get back out again. At least, not easily. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jlb17 at duke.edu Thu Mar 26 08:54:04 2009 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: On Thu, 26 Mar 2009 at 10:42am, Robert G. Brown wrote > On Thu, 26 Mar 2009, Leif Nixon wrote: >> As in health-care. Which is why you get hospitals with >> Conficker/Downadup running rampant through medical equipment with >> embedded Windows systems. Basically, you're not allowed to patch them >> without FDA approval. >> >> That's scary. > > Um, I don't believe that this is the case, and I say this as a semi-pro > consultant in health care. Most hospitals probably do something along > these lines as part of the standard CYA, but the regulations, especially > HIPAA, are "due diligence" recommendations with an amazing {\em lack} of > specification. You can pretty much do whatever you like, but heaven > help you if you drop your patients' data or violate their > confidentiality. At the very least you'd better be able to show that > you tried hard to keep things secure... Note that Leif mentioned medical equipment with embedded Windows systems. 
And he's right -- you're not allowed to touch the software build on those without getting the new build approved by the FDA (at least, not if you want to use said equipment on real live patients). And those machines are generally networked so that the data (images, e.g.) can be uploaded. It is very, very scary. Why anyone ever made the decision to run medical equipment on Windows (over the screams of the engineering team) is utterly beyond me. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From a.travis at abdn.ac.uk Thu Mar 26 09:18:22 2009 From: a.travis at abdn.ac.uk (Tony Travis) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <3F1DAD5F-556B-4388-9179-F8CC754622B9@sanger.ac.uk> References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> <3F1DAD5F-556B-4388-9179-F8CC754622B9@sanger.ac.uk> Message-ID: <49CBAACE.2090704@abdn.ac.uk> Tim Cutts wrote: > [...] > I don't know about hospital software, but it's certainly the case for > some DNA sequencer instruments. Our ABI 3700 capillary sequencers > have Windows machines attached for the data collection. ABI > explicitly forbid us from either patching Windows, or from installing > antivirus software. Doing so would drop us off support. Hello, Tim. You were lucky ;-) ABI told us that we should NOT TOUCH the keyboard of our old Mac-based DNA sequencer during a run because it might cause the system to crash! Bye, Tony. -- Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis@abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From rgb at phy.duke.edu Thu Mar 26 10:01:02 2009 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: <3F1DAD5F-556B-4388-9179-F8CC754622B9@sanger.ac.uk> References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> <3F1DAD5F-556B-4388-9179-F8CC754622B9@sanger.ac.uk> Message-ID: On Thu, 26 Mar 2009, Tim Cutts wrote: > > On 26 Mar 2009, at 2:42 pm, Robert G. Brown wrote: > >> Um, I don't believe that this is the case, and I say this as a semi-pro >> consultant in health care. > > I don't know about hospital software, but it's certainly the case for some > DNA sequencer instruments. Our ABI 3700 capillary sequencers have Windows > machines attached for the data collection. ABI explicitly forbid us from > either patching Windows, or from installing antivirus software. Doing so > would drop us off support. > > Consequently, all those machines are on their own strongly firewalled network > where hopefully they can't get infected, and if they are, the infections > can't get back out again. At least, not easily. That I'll believe, although there you're not dealing with the government, you're dealing with a vendor, and the price you pay to secure the machines is exactly what you stated -- put the system(s) in a box and pray. There are also quite possibly legal liability issues -- those are common reasons for a policy like this on the part of a vendor. (Legal) risk management is far more likely to be dictating policy than government edict, and you'll often see very different strategies for that management depending on whether it is IT dominated or lawyer dominated. IT people want to patch and test but stay current, lawyers want CYA and no change. The latter often don't UNDERSTAND the arguments for staying current and patching holes -- they only understand "certification", which they interpret as "they get sued if we have a problem, not us". 
Truthfully, this is one reason a lot of people stay with MS, in spite of their abysmal track record with security. They're so bad, they provide an automatic "it's not our fault" escape clause, and the company is so big that they have deep pockets should they get sued due to a contretemps and they make the lawyers feel all warm and fuzzy because how'd they get so big if their systems weren't reliable? Twenty years ago it was "Nobody ever got fired for buying IBM", same argument. Red Hat has been working hard at providing at least the illusion of a similar level of stability and risk assumptions, and of course in general they have a much easier time of actually delivering. rgb > > Tim > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a > charity registered in England with number 1021457 and a company registered in > England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From tjrc at sanger.ac.uk Thu Mar 26 10:04:01 2009 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: <709891E6-D374-48D8-B034-EC6EC4C55596@sanger.ac.uk> On 26 Mar 2009, at 3:54 pm, Joshua Baker-LePain wrote: > Note that Leif mentioned medical equipment with embedded Windows > systems. And he's right -- you're not allowed to touch the software > build on those without getting the new build approved by the FDA (at > least, not if you want to use said equipment on real live > patients). And those machines are generally networked so that the > data (images, e.g.) can be uploaded. It is very, very scary.
Why > anyone ever made the decision to run medical equipment on Windows > (over the screams of the engineering team) is utterly beyond me. I suspect the reason is usually that the raw devices the equipment uses (typically a CCD camera or something similar) are only shipped with drivers for Windows, and the upstream component vendor won't support the instrument vendor controlling their hardware with their own software drivers on some other operating system. It all comes down to support matrixes. Old ABI 377 gel sequencers used to use Macintoshes, but that was back in the days of System 7 with a software stack that was so terrible a single error in the network stack would cause your entire sequencing run to be lost (which is expensive, and sometimes unrepeatable) and the ABI 3700 move to Windows was actually an improvement, at the time... Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From rgb at phy.duke.edu Thu Mar 26 10:03:46 2009 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: On Thu, 26 Mar 2009, Joshua Baker-LePain wrote: > On Thu, 26 Mar 2009 at 10:42am, Robert G. Brown wrote > >> On Thu, 26 Mar 2009, Leif Nixon wrote: > >>> As in health-care. Which is why you get hospitals with >>> Conficker/Downadup running rampant through medical equipment with >>> embedded Windows systems. Basically, you're not allowed to patch them >>> without FDA approval. >>> >>> That's scary. >> >> Um, I don't believe that this is the case, and I say this as a semi-pro >> consultant in health care. 
Most hospitals probably do something along >> these lines as part of the standard CYA, but the regulations, especially >> HIPAA, are "due diligence" recommendations with an amazing {\em lack} of >> specification. You can pretty much do whatever you like, but heaven >> help you if you drop your patients' data or violate their >> confidentiality. At the very least you'd better be able to show that >> you tried hard to keep things secure... > > Note that Leif mentioned medical equipment with embedded Windows systems. And > he's right -- you're not allowed to touch the software build on those without > getting the new build approved by the FDA (at least, not if you want to use > said equipment on real live patients). And those machines are generally > networked so that the data (images, e.g.) can be uploaded. It is very, very > scary. Why anyone ever made the decision to run medical equipment on Windows > (over the screams of the engineering team) is utterly beyond me. Ah, I see, thanks. I completely missed the point about medical equipment. Sorry Leif. Or as Jane used to say on SNL, "Never Mind..." Need coffee, need coffee, got no coffee. Maybe a bite of chocolate instead. ;-) > > -- > Joshua Baker-LePain > QB3 Shared Cluster Sysadmin > UCSF > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From james.p.lux at jpl.nasa.gov Thu Mar 26 10:17:04 2009 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: > -----Original Message----- > From: beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org] On Behalf Of Joshua Baker-LePain > Sent: Thursday, March 26, 2009 8:54 AM > To: Robert G. Brown > Cc: Leif Nixon; beowulf@beowulf.org > Subject: Re: [Beowulf] Wired article about Go machine > > Note that Leif mentioned medical equipment with embedded > Windows systems. > And he's right -- you're not allowed to touch the software > build on those without getting the new build approved by the > FDA (at least, not if you want to use said equipment on real > live patients). And those machines are generally networked > so that the data (images, e.g.) can be uploaded. It is very, > very scary. Why anyone ever made the decision to run medical > equipment on Windows (over the screams of the engineering > team) is utterly beyond me. > Not to be a MS fanboy, but it should also be recognized that Windows Embedded is a different animal than consumer Windows. Or, more properly, Windows Embedded CAN be made a different animal. It's up to the system integrator/manufacturer to "do the right thing". After all, it's not the windows kernel that's the problem, it's the "other stuff" and "configuration" that is the problem. 
It's the person who decides "Hey, let's let the user run PowerPoint on the EKG monitor" that is the real problem, which is exacerbated if your system development team has come from a more traditional non-windows embedded development world, where the idea of network connectivity was a pipe dream (gosh, wouldn't it be neat if we could get the data out by some means other than a 9600 bps serial link! And no more GPIB/IEEE-488!). Windows *is* a seductive trap.. Hey, we can load windows up and then we can just write to USB thumb drives, or use a browser!.. It creates the illusion that you don't need to invest some serious resources in the OS for your device. The problem being that those developers (who are very development cost sensitive) AREN'T usually people who have enterprise/networking system experience. They're "consumers" of desktop services, but their mindset is focussed on running the crosscompiler for the Z80 or x86 that's on the embedded card. There's a whole different mindset when you go from "microcontroller running test equipment" to "networked attached computing device that happens to make measurements". When you're running on a dedicated box with limited interfaces, the very box itself enforces a form of security. You don't even think about viruses, because there's no way to load new software short of sending it back to the factory to burn a new PROM. Heck, they may not even be aware of the existence of Embedded Windows, so they may think that loading up consumer XP is where it's at. It solves the immediate need: network and disk accesses without having to actually write any software. Maybe security winds up on some "to-do for the next release" list. 
I know lots of embedded developers who basically have devoted all their mindshare to their particular embedded platform (be it VxWorks, eCos, RTEMS, or whatever) and don't really have the time and inclination to become an expert on Windows (which, itself, is probably a >1 year task, PARTICULARLY if you come from a Unix environment. It's the long time Unix SysAdmin guys who you find writing weird little scripts and stuff to do things that Windows actually already does, but in a non-unix-obvious way) I think (and it's just speculation, so "flame on" if you will) that those very same developers, if they put Linux on the piece of gear, would make the same boneheaded system configuration issues, etc. that they do with Windows. The only saving grace is that the "vanilla" installation of Linux, particularly if they picked a distro targeted at embedded apps, *is* somewhat better configured than the "vanilla" installation of XP. Jim From jlb17 at duke.edu Thu Mar 26 10:23:05 2009 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Wired article about Go machine In-Reply-To: References: <316141.15220.qm@web37904.mail.mud.yahoo.com> <4EE78016-9754-4800-9F42-CD71038B45C4@xs4all.nl> <5d1ee7420903251352x15d74c7brc5a65b2cb610f6bb@mail.gmail.com> Message-ID: On Thu, 26 Mar 2009 at 10:17am, Lux, James P wrote >> From: beowulf-bounces@beowulf.org >> [mailto:beowulf-bounces@beowulf.org] On Behalf Of Joshua Baker-LePain >> >> Note that Leif mentioned medical equipment with embedded >> Windows systems. >> And he's right -- you're not allowed to touch the software >> build on those without getting the new build approved by the >> FDA (at least, not if you want to use said equipment on real >> live patients). And those machines are generally networked >> so that the data (images, e.g.) can be uploaded. It is very, >> very scary.
Why anyone ever made the decision to run medical >> equipment on Windows (over the screams of the engineering >> team) is utterly beyond me. > > Not to be a MS fanboy, but it should also be recognized that Windows > Embedded is a different animal than consumer Windows. Or, more properly, > Windows Embedded CAN be made a different animal. It's up to the system I should have clarified that when I said "embedded" above I didn't mean "Embedded". The equipment I was obliquely referring to was running bog standard XP. *shudder* -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From jcownie at cantab.net Thu Mar 26 12:23:36 2009 From: jcownie at cantab.net (James Cownie) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: On 26 Mar 2009, at 13:57, Leif Nixon wrote: > > Well, some banks over here have a authentication system that uses a > hardware crypto token with a keypad. You use it for a challenge- > response > procedure to log in to the Internet banking site - nothing new so > far - > but you also use it to sign (using challenge-response) each bunch of > transactions you perform on the banking site. And - this is the key > point - to sign the transactions you actually enter certain parts of > the > transaction data (like the total amount to transfer) into the crypto > token. > > Even with total control over the client PC, it's real hard for an > attacker to do anything really evil in that setting. > But check this analysis of the UK version, which seems to be almost exactly as described... http://www.cl.cam.ac.uk/~sjm217/papers/fc09optimised.pdf -- -- Jim -- James Cownie -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20090326/fb46d989/attachment.html From DPHURST at uncg.edu Thu Mar 26 20:32:23 2009 From: DPHURST at uncg.edu (Dow Hurst DPHURST) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? Message-ID: We've got a couple of weeks max to finalize spec'ing a new cluster. Has anyone knowledge of lowering latency for NAMD by implementing a multi-rail IB solution using MVAPICH or Intel's MPI? My research tells me low latency is key to scaling our code of choice, NAMD, effectively. Has anyone cut down real effective latency to below 1.0us using multi-rail IB for molecular dynamics codes such as Gromacs, Amber, CHARMM, or NAMD? What about lowered latency for parallel ab initio calculations involving NwChem, Jaguar, or Gaussian using multi-rail IB? If so, what was the configuration of cards and software? Any caveats involved, except price? ;-) Multi-rail IB is not something I know much about so am trying to get up to speed on what is possible and what is not. I do understand that lowering latency using multi-rail has to come from the MPI layer knowing how to use the hardware properly and some MPI implementations have such options and others don't. I understand that MVAPICH has some capabilities to use multi-rail and that NAMD is run on top of MVAPICH on many IB based clusters. Any links or pointers to how I can quickly educate myself on the topic would be appreciated. Best wishes, Dow __________________________________ Dow P. Hurst, Research Scientist Department of Chemistry and Biochemistry University of North Carolina at Greensboro 435 New Science Bldg. Greensboro, NC 27402-6170 dphurst@uncg.edu Dow.Hurst@mindspring.com 336-334-5122 office 336-334-4766 lab 336-334-5402 fax -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20090326/ac5111c9/attachment.html From lindahl at pbm.com Thu Mar 26 21:03:30 2009 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:08:22 2009 Subject: Re: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: References: Message-ID: <20090327040330.GB5661@bx9.net> On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: > We've got a couple of weeks max to finalize spec'ing a new cluster. Has > anyone knowledge of lowering latency for NAMD by implementing a > multi-rail IB solution using MVAPICH or Intel's MPI? Multi-rail is likely to increase latency. BTW, Intel MPI usually has higher latency than other MPI implementations. If you look around for benchmarks you'll find that QLogic InfiniPath does quite well on NAMD and friends, compared to that other brand of InfiniBand adaptor. For example, at http://www.ks.uiuc.edu/Research/namd/performance.html the lowest line == best performance is InfiniPath. Those results aren't the most recent, but I'd bet that the current generation of adaptors has the same situation. -- Greg (yeah, I used to work for QLogic.) From niftyompi at niftyegg.com Thu Mar 26 21:36:57 2009 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Wed Nov 25 01:08:22 2009 Subject: Re: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: <20090327043657.GA4061@tosh2egg.wr.niftyegg.com> On Thu, Mar 26, 2009 at 10:28:12AM -0400, Robert G. Brown wrote: > Subject: Re: [Beowulf] One time password generators... Scanning back I did not see VPN as a component of a solution. Perhaps I missed it. Layered security should be part of most projects... IMO It makes sense to me that the keyboard box find itself well inside a DMZ zone with the only "live" network being the secured net. It may be that a VPN solution with integrated OTP support will prove easier to evaluate, justify, install, support and REPLACE.
Once inside the VPN, ssh and friends might be used to manage resources (in contrast to access). One value of this is that once inside the VPN, cluster tools and applications can use different access methods as appropriate to the task at hand. I.e. I cannot see a per host OTP solution for an MPI cluster or multiple NFS server mounts. Later, mitch -- T o m M i t c h e l l Found me a new hat, now what? From niftyompi at niftyegg.com Thu Mar 26 22:20:18 2009 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Wed Nov 25 01:08:22 2009 Subject: Re: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: <20090327040330.GB5661@bx9.net> References: <20090327040330.GB5661@bx9.net> Message-ID: <20090327052018.GB4061@tosh2egg.wr.niftyegg.com> On Thu, Mar 26, 2009 at 09:03:30PM -0700, Greg Lindahl wrote: > On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: > > > We've got a couple of weeks max to finalize spec'ing a new cluster. Has > > anyone knowledge of lowering latency for NAMD by implementing a > > multi-rail IB solution using MVAPICH or Intel's MPI? > > Multi-rail is likely to increase latency. > > BTW, Intel MPI usually has higher latency than other MPI implementations. > > If you look around for benchmarks you'll find that QLogic InfiniPath > does quite well on NAMD and friends, compared to that other brand of > InfiniBand adaptor. For example, at > > http://www.ks.uiuc.edu/Research/namd/performance.html > > the lowest line == best performance is InfiniPath. Those results > aren't the most recent, but I'd bet that the current generation of > adaptors has the same situation. What this implies is that NAMD is not purely bandwidth limited. Rather it is limited by other quickness issues. For the most part multi-rail is a bandwidth enhancement play... With multi-rail do double check the system bus (PCI-e) bandwidth.
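Mitchell's advice to double check the PCI-e bus is easy to quantify with back-of-envelope arithmetic. A rough sketch, where the numbers are 2009-era assumptions rather than vendor specs (4X DDR InfiniBand at roughly 16 Gb/s of data per rail, PCIe Gen1 at roughly 2 Gb/s of usable data per lane after 8b/10b encoding):

```python
# Back-of-envelope check: can the host bus actually feed two rails?
# All figures are rough assumptions for 2009-era hardware.

PCIE_GEN1_PER_LANE_GBS = 2.0   # ~2.5 GT/s lane minus 8b/10b overhead
IB_DDR_4X_DATA_GBS     = 16.0  # 4X DDR InfiniBand data rate, per rail

def bus_limited(num_rails: int, pcie_lanes: int) -> bool:
    """True if aggregate rail bandwidth exceeds the PCIe slot's
    capacity, i.e. a second rail cannot deliver its full bandwidth."""
    need = num_rails * IB_DDR_4X_DATA_GBS
    have = pcie_lanes * PCIE_GEN1_PER_LANE_GBS
    return need > have

print(bus_limited(1, 8))   # one DDR rail in a PCIe Gen1 x8 slot: fits
print(bus_limited(2, 8))   # two rails behind the same x8 slot: bus-limited
```

Under these assumptions, dual-rail DDR behind a single Gen1 x8 slot is already bus-limited, which is exactly why the check matters before paying for a second HCA.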
If multi-rail is used determine how the data is mux-ed between rails and what the impact of that decision code path has on quickness and/or bandwidth. If multi-rail is to go very fast MPI needs to manage each rail/LID in productive ways for the application. I doubt that this "productive way" has a simple general one size fits all answer. NAMD is clearly a "got to benchmark it" application! Both the data link hardware and the MPI library integration with that hardware are important... The last table on Greg's URI pointer -- NAMD version is also important! It is possible that NAMD.next will move to be more bandwidth limited than it is today and then the notion of best interconnect/ platform will change. > -- Greg > (yeah, I used to work for QLogic.) Me too. Later, mitch -- T o m M i t c h e l l Found me a new hat, now what? From DPHURST at uncg.edu Thu Mar 26 22:46:46 2009 From: DPHURST at uncg.edu (Dow Hurst DPHURST) Date: Wed Nov 25 01:08:22 2009 Subject: Re: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: <20090327040330.GB5661@bx9.net> References: , <20090327040330.GB5661@bx9.net> Message-ID: To: beowulf@beowulf.org From: Greg Lindahl Sent by: beowulf-bounces@beowulf.org Date: 03/27/2009 12:03AM Subject: Re: [Beowulf] Lowered latency with multi-rail IB? On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: > We've got a couple of weeks max to finalize spec'ing a new cluster. Has > anyone knowledge of lowering latency for NAMD by implementing a > multi-rail IB solution using MVAPICH or Intel's MPI?
Those results aren't the most recent, but I'd bet that the current generation of adaptors has the same situation. -- Greg (yeah, I used to work for QLogic.) _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf I'm very familiar with that benchmark page. ;-) One motivation for designing an MPI layer to lower latency with multi-rail is when making use of accelerator cards or GPUs. There is so much more work being done that the interconnect quickly becomes the limiting factor. One Tesla GPU is equal to 12 cores for the current implementation of NAMD/CUDA so the scaling efficiency really suffers. I'd like to see how someone could scale efficiently beyond 16 IB connections with only two GPUs per IB connection when running NAMD/CUDA. Some codes are sped up far beyond 12x and reach 100x such as VMD's cionize utility. I don't think that particular code requires parallelization (not sure). However, as NAMD/CUDA is tuned, the efficiency on the GPU is increased, and new bottlenecks found and fixed from previously ignored sections of code, there will be even more than a 12x speedup. So, a solution to the interconnect bottleneck needs to be developed and I wondered if multi-rail would be the answer. Thanks so much for your thoughts! Best wishes, Dow -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090327/5ae7c169/attachment.html From nixon at nsc.liu.se Thu Mar 26 13:18:36 2009 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: (James Cownie's message of "Thu\, 26 Mar 2009 19\:23\:36 +0000") References: Message-ID: James Cownie writes: > But check this analysis of the UK version, which seems to be almost exactly as described...
> > http://www.cl.cam.ac.uk/~sjm217/papers/fc09optimised.pdf Interesting. Thanks. -- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From hbugge at platform.com Fri Mar 27 01:45:14 2009 From: hbugge at platform.com (=?ISO-8859-1?Q?H=E5kon_Bugge?=) Date: Wed Nov 25 01:08:22 2009 Subject: Re: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: <20090327052018.GB4061@tosh2egg.wr.niftyegg.com> References: <20090327040330.GB5661@bx9.net> <20090327052018.GB4061@tosh2egg.wr.niftyegg.com> Message-ID: On Mar 27, 2009, at 6:20 , Nifty Tom Mitchell wrote: > On Thu, Mar 26, 2009 at 09:03:30PM -0700, Greg Lindahl wrote: >> On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: >> >>> We've got a couple of weeks max to finalize spec'ing a new >>> cluster. Has >>> anyone knowledge of lowering latency for NAMD by implementing a >>> multi-rail IB solution using MVAPICH or Intel's MPI? >> >> Multi-rail is likely to increase latency. In general true, but it depends. With a heavy all-to-all communication pattern, I have seen intelligent use of two HCAs (not multirail per se in order to increase bandwidth). I have observed an almost 2x performance increase on the SPEC MPI2007 pop2 application when going from one to two HCAs. This is a typical many-small-packets application. I am not saying the same is true for NAMD, but I wouldn't rule it out. >> BTW, Intel MPI usually has higher latency than other MPI >> implementations. Intel MPI and MVAPICH are not what I would call top-of-the-line performers (but I am biased as I work for a company delivering an MPI product). You might find it useful to look at pop2 performance for different interconnects and MPI implementations at http://www.spec.org/mpi2007/results/mpi2007.html Thanks, Håkon From rgb at phy.duke.edu Fri Mar 27 06:11:17 2009 From: rgb at phy.duke.edu (Robert G.
Brown) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] One time password generators... In-Reply-To: <20090327043657.GA4061@tosh2egg.wr.niftyegg.com> References: <20090327043657.GA4061@tosh2egg.wr.niftyegg.com> Message-ID: On Thu, 26 Mar 2009, Nifty Tom Mitchell wrote: > On Thu, Mar 26, 2009 at 10:28:12AM -0400, Robert G. Brown wrote: >> Subject: Re: [Beowulf] One time password generators... > > > Scanning back I did not see VPN as a component > of a solution. Perhaps I missed it. > > Layered security should be part of most projects... IMO > It makes sense to me that the keyboard box > find itself well inside a DMZ zone with the only > "live" network being the secured net. > > It may be that a VPN solution with integrated OTP > support will prove easier to evaluate, justify, install, support > and REPLACE. > > Once inside the VPN, ssh and friends might be used > to manage resources (in contrast to access). > > One value of this is that once inside the VPN, cluster tools and applications > can use different access methods as appropriate to the task at hand. > I.e. I cannot see a per host OTP solution for an MPI cluster or > multiple NFS server mounts. Well, of course I and many others use VPNs, but: a) VPNs often provide one with the illusion of security more than the reality. In far too many cases, they encourage people to build intranets that basically have a hard and crunchy exterior and a soft and chewy center. After all, why protect those interior systems? They're inside our VPN! Which means that if someone ever does succeed in snooping the VPN credentials -- something that is generally appallingly easy from a compromised Windows box, where the VPN shared secrets are stored readable by at least the user and where one can assume that the keyboard is being snooped by trolls -- they can get into your protected network. Welcome to my personal nightmare on at least one of the networks I consult for.
Sure, I'm not stupid and I harden the INTERIOR servers but a troll loose on the inside disguised as a rug can do plenty of damage from underneath his bridge. b) An ssh-only solution is hard through and through. A firewall with only ssh ports open to pass through, ssh-based tunnel/vpn connections for specific services or general access you want to open to specific people on the outside. You're no better off in terms of risk of exterior Windows boxes being compromised, but you will generally not directly compromise the servers and traffic is NEVER unencrypted. In the case of a VPN, as soon as you're inside all those encapsulated packets are deencapsulated and everything is plaintext again by default. c) SSL solutions are more or less equivalent to ssh, except that they "can" have one way positive host identification that prevents one kind of MitM attack. Ultimately, while no security model that I know of is foolproof (and there are plenty of fools in the world, alas) the best compromises seem to be things like: Access from clients that are themselves "hardened" in terms of security. No Explorer! No Outlook! No Windows (unless it was installed and is maintained by a real professional, and not used for anything except business purposes, no permitting your teenager to use it to play games and download game buffs from random sites or to cruise porn). Hard exterior, permitting only authorized participants in through authenticated holes from those hardened outside clients. Bidirectional encryption of all traffic between client and all resources inside. NO plaintext, anywhere on the network. Layered/castle keep model inside. Everything hard, but server resources cased in diamond with very specific holes drilled for service connections and with only competent, strong-authenticated staff permitted access from their own personally secured diamond hard systems. Strong encryption (goes without saying) and passwords or crypt credentials, not wussy ones. 
This goes for users, and must be checked regularly. Multifactor auth seems to add little to this, although perhaps it helps a bit to protect against lousy user passwords and easily accessible crypt credentials on exterior clients. Helps hold off script kiddies, but not the Ubercracker. rgb > > Later, > mitch > > > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From gasreaa at yahoo.com Thu Mar 26 20:30:05 2009 From: gasreaa at yahoo.com (Adrian Wong) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Uses for AMQP in HPC clusters Message-ID: <33220.95019.qm@web52605.mail.re2.yahoo.com> I have been tracking the developments in Advanced Message Queuing Protocol (AMQP) and there has been some progress in implementation. Does this middleware have any applicability in the HPC cluster domain? What would you use it for? Adrian Wong Slide From joshua_mora at usa.net Thu Mar 26 22:15:17 2009 From: joshua_mora at usa.net (Joshua mora acosta) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? Message-ID: <401NcAFoR2746S11.1238130917@cmsweb11.cms.usa.net> The only way I got under 1usec in PingPong test or with ib_[write/send/read]_lat is with QDR and back to back (i.e. no switch). With switch I get 1.1[3-7]usec [HP-MPI, OpenMPI, MVAPICH]. It does not matter which MPI, although I have to agree with Greg that multirail also increases latency. Multirail is used for: i) reliability ii) higher bandwidth Best regards, Joshua ------ Original Message ------ Received: 11:11 PM CDT, 03/26/2009 From: Greg Lindahl To: beowulf@beowulf.org Subject: Re: [Beowulf] Lowered latency with multi-rail IB?
> On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: > > > We've got a couple of weeks max to finalize spec'ing a new cluster. Has > > anyone knowledge of lowering latency for NAMD by implementing a > > multi-rail IB solution using MVAPICH or Intel's MPI? > > Multi-rail is likely to increase latency. > > BTW, Intel MPI usually has higher latency than other MPI > implementations. > > If you look around for benchmarks you'll find that QLogic InfiniPath > does quite well on NAMD and friends, compared to that other brand of > InfiniBand adaptor. For example, at > > http://www.ks.uiuc.edu/Research/namd/performance.html > > the lowest line == best performance is InfiniPath. Those results > aren't the most recent, but I'd bet that the current generation of > adaptors has the same situation. > > -- Greg > (yeah, I used to work for QLogic.) > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From prajeev at tuxcentrix.com Thu Mar 26 23:35:47 2009 From: prajeev at tuxcentrix.com (Puthanveettil Prabhakaran Prajeev) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: References: <20090327040330.GB5661@bx9.net> Message-ID: http://www.penguincomputing.com/cluster_computing Can the above be of any help to you? Regards Prajeev On Fri, Mar 27, 2009 at 11:16 AM, Dow Hurst DPHURST wrote: > To: beowulf@beowulf.org > From: Greg Lindahl > Sent by: beowulf-bounces@beowulf.org > Date: 03/27/2009 12:03AM > Subject: Re: [Beowulf] Lowered latency with multi-rail IB? > > On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: > > > We've got a couple of weeks max to finalize spec'ing a new cluster. Has > > anyone knowledge of lowering latency for NAMD by implementing a > > multi-rail IB solution using MVAPICH or Intel's MPI?
> > Multi-rail is likely to increase latency. > > BTW, Intel MPI usually has higher latency than other MPI > implementations. > > If you look around for benchmarks you'll find that QLogic InfiniPath > does quite well on NAMD and friends, compared to that other brand of > InfiniBand adaptor. For example, at > > http://www.ks.uiuc.edu/Research/namd/performance.html > > the lowest line == best performance is InfiniPath. Those results > aren't the most recent, but I'd bet that the current generation of > adaptors has the same situation. > > -- Greg > (yeah, I used to work for QLogic.) > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > I'm very familiar with that benchmark page. ;-) > > One motivation for designing a MPI layer to lower latency with multi-rail > is when making use of accelerator cards or GPUs. There is so much more work > being done that the interconnect quickly becomes the limiting factor. One > Tesla GPU is equal to 12 cores for the current implementation of NAMD/CUDA > so the scaling efficiency really suffers. I'd like to see how someone could > scale efficiently beyond 16 IB connections with only two GPUs per IB > connection when running NAMD/CUDA. > > Some codes are sped up far beyond 12x and reach 100x such as VMD's cionize > utility. I don't think that particular code requires parallelization (not > sure). However, as NAMD/CUDA is tuned, the efficiency on the GPU is > increased, and new bottlenecks found and fixed from previously ignored > sections of code, there will be even more than a 12x speedup. So, a > solution to the interconnect bottleneck needs to be developed and I wondered > if multi-rail would be the answer. Thanks so much for your thoughts! 
> Best wishes, > Dow > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090327/c1a93109/attachment.html From balaji at mcs.anl.gov Wed Mar 25 13:29:23 2009 From: balaji at mcs.anl.gov (Pavan Balaji) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] [hpc-announce] CFP: International Workshop on High Performance Interconnects for Distributed Computing (HPI-DC) Message-ID: <49CA9423.10400@mcs.anl.gov> Workshop on High Performance Interconnects for Distributed Computing (HPI-DC'09) in conjunction with Cluster 2009 August 31, 2009, New Orleans, Louisiana http://www.cercs.gatech.edu/hpidc2009 ********************************************************************* Call for Papers ********************************************************************* The emergence of 10.0 GigE and above, InfiniBand and other high-performance interconnection technologies, programmable NICs and networking platforms, and protocols like DDP and RDMA over IP, make it possible to create tightly linked systems across physical distances that exceed those of traditional single cluster or server systems. These technologies can deliver communication capabilities that achieve the performance levels needed by high end applications in enterprise systems and like those produced by the high performance computing community. Furthermore, the manycore nature of next generation platforms and the creation of distributed cloud computing infrastructure will greatly increase the demand for high performance communication capabilities over wide area distances. 
The purpose of this workshop is to explore the confluence of distributed computing and communications technologies with high performance interconnects, as applicable or applied to realistic high end applications. The intent is to create a venue that will act as a bridge between researchers developing tools and platforms for high-performance distributed computing, end user applications seeking high performance solutions, and technology providers aiming to improve interconnect and networking technologies for future systems. The hope is to foster knowledge creation and intellectual interchanges between HPC and Cloud computing end users and technology developers, in the specific domain of high performance distributed interconnects. Topics of interest include but are not limited to: # Hardware/software architectures for communication infrastructures for HPC and Cloud Computing # Data and control protocols for interactive and large data volume applications # Novel devices and technologies to enhance interconnect properties # Interconnect-level issues when extending high performance beyond single machines, including architecture, protocols, services, QoS, and security # Remote storage (like iSCSI), remote databases, and datacenters, etc. # Development tools, programming environments and models (like PGAS, OpenShmem, Hadoop, etc.), ranging from programming language support to simulation environments. PAPER SUBMISSIONS: HPI-DC invites authors to submit original and unpublished work. Please submit extended abstracts or full papers, not exceeding 8 double-column pages in 10 point font or larger, in IEEE format. Electronic submission is strongly encouraged. Hard copies will be accepted only if electronic submission is not possible. Submission implies the willingness of at least one of the authors to register and present the paper. Any questions concerning hardcopy submissions or any other issues may be directed to the Program Co-Chairs. 
IMPORTANT DATES: # Paper submission: June 5th, 2009 # Notification of acceptance: July 10th, 2009 # Final manuscript due: July 29th, 2009 # Workshop date: Aug. 31st, 2009 ORGANIZATION: General Chair # Steve Poole, Oak Ridge National Lab Program Co-Chairs # Pavan Balaji, Argonne National Lab # Ada Gavrilovska, Georgia Institute of Technology Technical Program Committee: # Ahmad Afsahi, Queen's University, Canada # Taisuke Boku, University of Tsukuba, Japan # Ron Brightwell, Sandia National Laboratory # Patrick Geoffray, Myricom # Kei Hiraki, University of Tokyo, Japan # Hyun-wook Jin, Konkuk University, Korea # Pankaj Mehra, HP Research # Guillaume Mercier, INRIA, France # Scott Pakin, Los Alamos National Laboratory # D. K. Panda, Ohio State University # Fabrizio Petrini, IBM Research # Karsten Schwan, Georgia Tech # Jesper Traeff, NEC, Europe # Sudhakar Yalamanchili, Georgia Tech # Weikuan Yu, Auburn University If you have any questions about the workshop, please contact us at hpidc09-chairs@mcs.anl.gov. ==================================================================== If you do not wish to receive any more emails on this list, you can unsubscribe here: https://lists.mcs.anl.gov/mailman/listinfo/hpc-announce ==================================================================== -- Pavan Balaji http://www.mcs.anl.gov/~balaji From hearnsj at googlemail.com Fri Mar 27 09:52:58 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: References: <20090327040330.GB5661@bx9.net> Message-ID: <9f8092cc0903270952x680196dct66d7fc160a3f3e39@mail.gmail.com> >From the blurb on SGI MPT - their ICE systems have multirail- http://www.sgi.fr/WP_MPT_SGI.pdf SGI MPT utilizes multiple InfiniBand rails to perform message pathway distribution and message striping. Message pathway distribution is done by strategically mapping individual routes (source to destination) to the available rails. 
Since routes are mapped to different rails, more aggregate bandwidth is available in situations where many MPI processes are communicating at the same time. SGI MPT performs message striping by sending portions of large messages on each rail in parallel with the effect of nearly doubling the effective MPI point-to-point bandwidth. From Craig.Tierney at noaa.gov Fri Mar 27 10:20:03 2009 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: <401NcAFoR2746S11.1238130917@cmsweb11.cms.usa.net> References: <401NcAFoR2746S11.1238130917@cmsweb11.cms.usa.net> Message-ID: <49CD0AC3.9000301@noaa.gov> Joshua mora acosta wrote: > The only way I got under 1usec in PingPong test or with > ib_[write/send/read]_lat is with QDR and back to back (i.e. no switch). > With switch I get 1.1[3-7]usec [HP-MPI, OpenMPI, MVAPICH]. > It does not matter which MPI, although I have to agree with Greg that multirail > also increases latency.
> Multirail is used for: > i) reliability > ii) higher bandwidth > > Best regards, > Joshua > What about using multi-rail to increase message rate? That isn't the same as latency, but if you put messages on both wires you should get more. A vendor tried to convince me recently that this was important, but they had no benchmarks to back it up. Craig > > ------ Original Message ------ > Received: 11:11 PM CDT, 03/26/2009 > From: Greg Lindahl > To: beowulf@beowulf.org > Subject: Re: [Beowulf] Lowered latency with multi-rail IB? > >> On Thu, Mar 26, 2009 at 11:32:23PM -0400, Dow Hurst DPHURST wrote: >> >>> We've got a couple of weeks max to finalize spec'ing a new cluster. Has > >>> anyone knowledge of lowering latency for NAMD by implementing a >>> multi-rail IB solution using MVAPICH or Intel's MPI? >> Multi-rail is likely to increase latency. >> >> BTW, Intel MPI usually has higher latency than other MPI >> implementations. >> >> If you look around for benchmarks you'll find that QLogic InfiniPath >> does quite well on NAMD and friends, compared to that other brand of >> InfiniBand adaptor. For example, at >> >> http://www.ks.uiuc.edu/Research/namd/performance.html >> >> the lowest line == best performance is InfiniPath. Those results >> aren't the most recent, but I'd bet that the current generation of >> adaptors has the same situation. >> >> -- Greg >> (yeah, I used to work for QLogic.) 
>> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Craig Tierney (craig.tierney@noaa.gov) From hbugge at platform.com Fri Mar 27 10:37:28 2009 From: hbugge at platform.com (Håkon Bugge) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: <49CD0AC3.9000301@noaa.gov> References: <401NcAFoR2746S11.1238130917@cmsweb11.cms.usa.net> <49CD0AC3.9000301@noaa.gov> Message-ID: On Mar 27, 2009, at 18:20 , Craig Tierney wrote: > What about using multi-rail to increase message rate? That isn't > the same as latency, but if you put messages on both wires you > should get more. Exactly why we saw almost 2x speedup on message rate (latency) sensitive apps using Platform MPI. We call the technique alternating; any communicating peer will use an HCA or port, but a single MPI process will alternate between different HCAs (or ports), depending on which peer he communicates with. Håkon From hearnsj at googlemail.com Fri Mar 27 10:43:07 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: References: <401NcAFoR2746S11.1238130917@cmsweb11.cms.usa.net> <49CD0AC3.9000301@noaa.gov> Message-ID: <9f8092cc0903271043kd69761bme47973e744703a36@mail.gmail.com> JC on a bike... I need multirail email for resiliency. If this email is blank I will shoot myself. >From the blurb on SGI MPT - their ICE systems have multirail- http://www.sgi.fr/WP_MPT_SGI.pdf SGI MPT utilizes multiple InfiniBand rails to perform message pathway distribution and message striping.
Message pathway distribution is done by strategically mapping individual routes (source to destination) to the available rails. Since routes are mapped to different rails, more aggregate bandwidth is available in situations where many MPI processes are communicating at the same time. SGI MPT performs message striping by sending portions of large messages on each rail in parallel with the effect of nearly doubling the effective MPI point-to-point bandwidth. From kus at free.net Fri Mar 27 10:58:20 2009 From: kus at free.net (Mikhail Kuzminsky) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? In-Reply-To: Message-ID: In message from Dow Hurst DPHURST (Thu, 26 Mar 2009 23:32:23 -0400): >We've got a couple of weeks max to finalize spec'ing a new cluster. >Has >anyone knowledge of lowering latency for NAMD by implementing a >multi-rail IB solution using MVAPICH or Intel's MPI? My research >tells >me low latency is key to scaling our code of choice, NAMD, >effectively. Has anyone cut down real effective latency >to below 1.0us using multi-rail IB for molecular dynamics codes such >as Gromacs, Amber, CHARMM, or NAMD? What about lowered latency for >parallel ab initio calculations involving NwChem, Jaguar, or Gaussian >using multi-rail IB? In contrast to molecular dynamics programs (Gromacs/Amber/Charmm), where low latency is necessary, some quantum chemical programs (Gaussian, Gamess-US) have relatively low interconnect dependency. I measured message lengths for Gaussian-03 for a set of calculation methods, and these messages are middle-to-large in size. NWChem is the only quantum-chemical program I know which requires high interconnect performance. I don't know about Jaguar. Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow > > >If so, what was the configuration of cards and software? Any caveats >involved, except price?
>;-) > >Multi-rail IB is not something I know much about so am trying to get >up to speed on what is possible and what is not. I do understand >that lowering latency using multi-rail has to come from the MPI layer >knowing how to use the hardware properly and some MPI implementations >have such options and others don't. I understand that MVAPICH has >some capabilities to use multi-rail and that NAMD is run on top of >MVAPICH on many IB based clusters. Any links or pointers to how I >can quickly educate myself on the topic would be appreciated. >Best wishes, > >Dow > >__________________________________ >Dow P. Hurst, Research Scientist >Department of Chemistry and Biochemistry >University of North Carolina at Greensboro >435 New Science Bldg. >Greensboro, NC 27402-6170 >dphurst@uncg.edu >Dow.Hurst@mindspring.com >336-334-5122 office >336-334-4766 lab >336-334-5402 fax >-- >This message has been scanned for viruses and >dangerous content by MailScanner, and is >believed to be clean. > From lindahl at pbm.com Fri Mar 27 11:01:49 2009 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Uses for AMQP in HPC clusters In-Reply-To: <33220.95019.qm@web52605.mail.re2.yahoo.com> References: <33220.95019.qm@web52605.mail.re2.yahoo.com> Message-ID: <20090327180149.GA6195@bx9.net> On Thu, Mar 26, 2009 at 08:30:05PM -0700, Adrian Wong wrote: > I have been tracking the developments in Advanced Message Queuing > Protocol (AMQP) and there has been some progress in implementation. > Does this middleware have any applicability in the HPC cluster > domain? What would you use it for? It kinda looks like Revenge of CORBA to me. HPC people wasted a lot of time with CORBA. -- greg From hbugge at platform.com Fri Mar 27 11:27:15 2009 From: hbugge at platform.com (Håkon Bugge) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB?
In-Reply-To: <383NcAsij1662S13.1238177375@cmsweb13.cms.usa.net> References: <383NcAsij1662S13.1238177375@cmsweb13.cms.usa.net> Message-ID: <53272520-FD5A-4432-BB09-126439843124@platform.com> On Mar 27, 2009, at 19:09 , Joshua mora acosta wrote: > So a way to quantify if multirail helps on network latency driven > workloads > there should be a synthetic benchmark that can be built to show off > the impact > of balancing these requests among multiple HCAs bound to different > network > paths or core pairs like an all-to-[all,gather,scatter] or barrier > benchmark > and in theory observe half the total latency of that overall > communication. > So I think multirail will reduce latency of synchronizations > (collective > calls) but not to latency-driven point-to-point communications. > Well, you're wrong. As stated, we do see speedup due to increased message rate on apps not using collectives. As to a benchmark, an all-to-all with _many_ processes per node will show you this. ($MPI_HOME/examples/bin/mpi_msg_rate in our distro). I would actually claim the opposite; it will _not_ help on most collective operations, because they perform SMP optimizations and one process sends and receives on behalf of the other processes. Uncorrelated messages from many processes on a single node will, on the other hand, take advantage of the accumulated increased message rate provided by multiple HCAs. Håkon > Joshua > > ------ Original Message ------ > Received: 12:37 PM CDT, 03/27/2009 > From: Håkon Bugge > To: Craig Tierney Cc: Joshua mora acosta > , DPHURST@uncg.edu, beowulf@beowulf.org > Subject: Re: [Beowulf] Lowered latency with multi-rail IB? > >> On Mar 27, 2009, at 18:20 , Craig Tierney wrote: >> >>> What about using multi-rail to increase message rate? That isn't >>> the same as latency, but if you put messages on both wires you >>> should get more. >> >> Exactly why we saw almost 2x speedup on message rate (latency) >> sensitive apps using Platform MPI.
We call the technique alternating; >> any communicating peer will use an HCA or port, but a single MPI >> process will alternate between different HCAs (or ports), depending >> on >> which peer he communicates with. >> >> >> Håkon > > > From joshua_mora at usa.net Fri Mar 27 11:09:35 2009 From: joshua_mora at usa.net (Joshua mora acosta) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Lowered latency with multi-rail IB? Message-ID: <383NcAsij1662S13.1238177375@cmsweb13.cms.usa.net> So a way to quantify if multirail helps on network latency driven workloads there should be a synthetic benchmark that can be built to show off the impact of balancing these requests among multiple HCAs bound to different network paths or core pairs like an all-to-[all,gather,scatter] or barrier benchmark and in theory observe half the total latency of that overall communication. So I think multirail will reduce latency of synchronizations (collective calls) but not to latency-driven point-to-point communications. Joshua ------ Original Message ------ Received: 12:37 PM CDT, 03/27/2009 From: Håkon Bugge To: Craig Tierney Cc: Joshua mora acosta , DPHURST@uncg.edu, beowulf@beowulf.org Subject: Re: [Beowulf] Lowered latency with multi-rail IB? > On Mar 27, 2009, at 18:20 , Craig Tierney wrote: > > > What about using multi-rail to increase message rate? That isn't > > the same as latency, but if you put messages on both wires you > > should get more. > > Exactly why we saw almost 2x speedup on message rate (latency) > sensitive apps using Platform MPI. We call the technique alternating; > any communicating peer will use an HCA or port, but a single MPI > process will alternate between different HCAs (or ports), depending on > which peer he communicates with.
> > > H?kon From asabigue at fing.edu.uy Sun Mar 29 06:22:10 2009 From: asabigue at fing.edu.uy (ariel sabiguero yawelak) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Memory errors poll Message-ID: <49CF7602.6030205@fing.edu.uy> Hi all. This is not a direct HPC question per-se, but your clusters are an excellent source for the information I need, so here it goes: /Could those of you running ECC memory give me an updated figure on the number of errors detected/corrected per day per system? / We are working on self-healing mechanisms and we need actual information on the number of errors that state-of-the-art systems are facing today. You can imagine why I envy your farms.... I have an old figure of about 1 error-bit per day per system at sea level, but I would like to know if it is getting worse or better. thanks in advance ariel -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20090329/b2ab3668/attachment.html From hahn at mcmaster.ca Sun Mar 29 22:11:20 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Memory errors poll In-Reply-To: <49CF7602.6030205@fing.edu.uy> References: <49CF7602.6030205@fing.edu.uy> Message-ID: > /Could those of you running ECC memory give me an updated figure on the > number of errors detected/corrected per day per system? / we replace dimms which show > 1000 corrected ECCs per day (or any overflows, for which counts are inaccurate, or any uncorrectable errors.) > I have an old figure of about 1 error-bit per day per system at sea > level, but I would like to know if it is getting worse or better. we have several thousand nodes, and most of them go for months without any corrected ECCs (probably all within 200M of sea level). 
From james.p.lux at jpl.nasa.gov Mon Mar 30 08:42:03 2009 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] Memory errors poll In-Reply-To: References: <49CF7602.6030205@fing.edu.uy> Message-ID: > -----Original Message----- > From: beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org] On Behalf Of Mark Hahn > Sent: Sunday, March 29, 2009 10:11 PM > To: ariel sabiguero yawelak > Cc: Beowulf@beowulf.org > Subject: Re: [Beowulf] Memory errors poll > > > /Could those of you running ECC memory give me an updated figure on > > the number of errors detected/corrected per day per system? / > > we replace dimms which show > 1000 corrected ECCs per day (or > any overflows, for which counts are inaccurate, or any > uncorrectable errors.) That seems a remarkably high rate for the raw memory errors. Micron quotes something like 100 soft errors per 1E9 device hours. (That's a FIT, or failure-in-time, of 100.) If I saw that rate, I'd assume that there's something seriously wrong with the part. > > > I have an old figure of about 1 error-bit per day per system at sea > > level, but I would like to know if it is getting worse or better. This is something readily available from the memory manufacturers, at the device level. Beware of random stuff you read on the web. That is, check the date of the data being used in the article. Technologies change over time, pretty substantially, so observations about DRAM error rates in 1998 probably aren't applicable to DRAM error rates in 2008 (unless you happen to be using 10 year old memory!) A recent paper by Borucki, Schindlbeck and Slayman (IEEE CFP 08 RPS-CDR 46th ann. Intl. Rel. Physics Symp. 2008, pp482ff) comments that for modern parts, high energy cosmic rays are more important than alpha particles, and reports on measurements made on DIMMs. They blasted modern mobos in a neutron test facility, and then scaled for New York.
It looks like about 100-200 FIT/Gb, which corresponds with Micron's numbers, above. They also looked at multibit and logic errors as well as simple memory cell errors. As expected, the SEU rate (per bit) is going down as features get smaller, but logic error rates stay roughly the same. OK.. So you got a box with, say, 4Gbyte of RAM.. That's 32 Gb, so you'd expect something like 5000 errors per 1E9 hours, or 5 errors per 1E6 hours.. An error every 200,000 hours or 22 years (if my before coffee math in my head is right) I suspect that most "memory errors" reported for PCs (whether in clusters or not) are manifestations of bus timing problems, perhaps over temperature, rather than actual bit flips in memory. The actual measured rate of single event upsets is so low > > we have several thousand nodes, and most of them go for > months without any corrected ECCs (probably all within 200M > of sea level). > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org To change your > subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From mathog at caltech.edu Mon Mar 30 09:56:02 2009 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] GPU diagnostics? Message-ID: Have any of you CUDA folks produced diagnostic programs you run during "burn in" of new GPU based systems, in order to weed out problem units before putting them into service? Minimally, something resembling memtest86, to be used to find buggy memory associated with the GPU? Optimally, it would also more directly exercise the GPU's capabilities. I asked on the NV linux forum if there were any official Nvidia graphics card diagnostic programs, and nobody there answered with one. 
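Jim Lux's before-coffee FIT arithmetic above is easy to re-run for other configurations; a quick sketch, where the 150 FIT/Gb figure is just the midpoint of the 100-200 range he quotes and 8766 hours per year (365.25 days) is assumed:

```python
def mean_hours_between_errors(gbytes, fit_per_gbit=150.0):
    """FIT = failures per 1e9 device-hours; scale by capacity in Gbit."""
    gbits = gbytes * 8
    errors_per_hour = fit_per_gbit * gbits / 1e9
    return 1.0 / errors_per_hour

# 4 GB box: 32 Gb * 150 FIT/Gb = 4800 errors per 1e9 hours,
# i.e. one soft error roughly every 208,000 hours (~24 years).
hours = mean_hours_between_errors(4)
print(round(hours), "hours,", round(hours / 8766, 1), "years")
```

That lands in the same ballpark as the ~22 years worked out in the message above, which is the point: per-bit soft errors in a single commodity box are rare.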
This was originally with respect to some VDPAU issues, where it looked at first like there might be a hardware problem on a small set of systems, including mine, although in the end it turned out to be an uninitialized variable (it was not my code). There was no objective way to demonstrate for VDPAU based software that "this graphics card is functioning normally" to help sort this out. I figured the CUDA folks should have something like this, else how could you trust the results from the GPU calculations? Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Mon Mar 30 10:10:17 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:22 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: References: Message-ID: <49D0FCF9.3070708@scalableinformatics.com> David Mathog wrote: > Have any of you CUDA folks produced diagnostic programs you run during > "burn in" of new GPU based systems, in order to weed out problem units > before putting them into service? Minimally, something resembling > memtest86, to be used to find buggy memory associated with the GPU? > Optimally, it would also more directly exercise the GPU's capabilities. > > I asked on the NV linux forum if there were any official Nvidia graphics > card diagnostic programs, and nobody there answered with one. This was > originally with respect to some VDPAU issues, where it looked at first > like there might be a hardware problem on a small set of systems, > including mine, although in the end it turned out to be an uninitialized > variable (it was not my code). There was no objective way to > demonstrate for VDPAU based software that "this graphics card is > functioning normally" to help sort this out. I figured the CUDA folks > should have something like this, else how could you trust the results > from the GPU calculations? Vendors have an nVidia supplied *GEMM based burn in test. 
Been thinking about a set of diagnostics end users can run as a sanity check. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From mathog at caltech.edu Mon Mar 30 10:45:17 2009 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? Message-ID: Joe Landman wrote: > Vendors have an nVidia supplied *GEMM based burn in test. Been thinking > about a set of diagnostics end users can run as a sanity check. My suspicion is that vendors run such burn in tests only for a very brief time. That time being "the minimum time required to find the percentage of failed units above which it would cost us more if they were found to be bad in the field" - and not a second longer. Finding marginal memory, certainly one of the easier tests, can easily take 24 hours of testing. Somehow I cannot imagine vendors spending quite that long burning in a graphics card. Well, maybe a top of the line pro card, but certainly not your run of the mill $39 budget card. Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From becker at scyld.com Mon Mar 30 14:10:35 2009 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: Message-ID: On Mon, 30 Mar 2009, David Mathog wrote: > Joe Landman wrote: > > Vendors have an nVidia supplied *GEMM based burn in test. Been thinking > > about a set of diagnostics end users can run as a sanity check. > > My suspicion is that vendors run such burn in tests only for a very > brief time. That time being "the minimum time required to find the > percentage of failed units above which it would cost us more if they > were found to be bad in the field" - and not a second longer. 
I don't know about other vendors, but that's not Penguin's approach. One reason is that we don't know the failure profile. But really it's a trade-off between delivery expectations, likelihood of failures, and even how much air conditioning capacity remains in the burn-in room. We used to have a published policy of a minimum three day successful burn-in. If a part failed, or even if the machine rebooted, the three day clock started again. The challenge with that policy is that it leads to unpredictable delivery, which is distressing to someone that needs servers or workstations Right Now. Today the policy is much more flexible, in part driven by Penguin's change to building mostly clusters. Burn-in time is based on the product, potentially modified by per-machine notes on the customer delivery requirements. Cluster nodes have a preliminary stand-alone burn-in before being racked into a cluster. Whole clusters then have a full burn-in, usually running benchmarks and demo applications. You might expect nearly zero errors when already-tested machines are grouped in a cluster, but cluster applications can reveal errors that typical burn-in tests don't trigger. And even a low percentage of failures looks pretty bad when you have a few hundred machines in a cluster. > Finding > marginal memory, certainly one of the easier tests, can easily take 24 > hours of testing. And typically those memory modules test OK in a tester, even after being pulled from a machine showing memory errors. (That's not surprising, since most distributors test modules just before shipping them, and they are tested again just before installation.) > Somehow I cannot imagine vendors spending quite that > long burning in a graphics card. Well, maybe a top of the line pro > card, but certainly not your run of the mill $39 budget card. 
I'm guessing every vendor shipping big clusters or CUDA GPU systems does a substantial burn-in, although it's likely rare that they use parallel applications and check for successful runs. It's consumer-oriented low end production lines that can't fit a longer burn-in into the process. A production line with pre-imaged OS installations pretty much cannot do a full burn-in. -- Donald Becker becker@scyld.com Penguin Computing / Scyld Software www.penguincomputing.com www.scyld.com Annapolis MD and San Francisco CA From mathog at caltech.edu Mon Mar 30 15:02:00 2009 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? Message-ID: Donald Becker wrote: > On Mon, 30 Mar 2009, David Mathog wrote: > > > Joe Landman wrote: > > > Vendors have an nVidia supplied *GEMM based burn in test. Been thinking > > > about a set of diagnostics end users can run as a sanity check. > > > > My suspicion is that vendors run such burn in tests only for a very > > brief time. That time being "the minimum time required to find the > > percentage of failed units above which it would cost us more if they > > were found to be bad in the field" - and not a second longer. > > I don't know about other vendors, but that's not Penguin's approach. By "vendor" I meant graphics card vendors, not cluster or HPC vendors. My interest in this sort of diagnostic arose in relation to an inexpensive graphics card bought at Newegg. I was asking here specifically because it seemed likely that HPC vendors _would_ have the sort of GPU diagnostic I was seeking, and might be willing to share it. (As opposed to the tool Joe referred to, which seems not to be generally available.) 
Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Mon Mar 30 15:31:17 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: References: Message-ID: <49D14835.9040803@scalableinformatics.com> David Mathog wrote: > Donald Becker wrote: >> On Mon, 30 Mar 2009, David Mathog wrote: >> >>> Joe Landman wrote: >>>> Vendors have an nVidia supplied *GEMM based burn in test. Been > thinking >>>> about a set of diagnostics end users can run as a sanity check. >>> My suspicion is that vendors run such burn in tests only for a very >>> brief time. That time being "the minimum time required to find the >>> percentage of failed units above which it would cost us more if they >>> were found to be bad in the field" - and not a second longer. >> I don't know about other vendors, but that's not Penguin's approach. > > By "vendor" I meant graphics card vendors, not cluster or HPC vendors. > My interest in this sort of diagnostic arose in relation to an > inexpensive graphics card bought at Newegg. I was asking here > specifically because it seemed likely that HPC vendors _would_ have > the sort of GPU diagnostic I was seeking, and might be willing to share > it. (As opposed to the tool Joe referred to, which seems not to be > generally available.) FWIW, we agree with (and implement something similar to) Don's burn in procedure, and yes, it sometimes annoys customers who want it *now*. But it also (massively) reduces infant mortality rates (and we have even designed new disk packaging to reduce the impact of the sometimes fatal disk malady named UPS/Fedex-osis). This said, there really isn't a memory checker for GPUs just yet. Could be done, and probably should be ...
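Such a GPU memory checker would walk the classic test patterns over device memory and compare on read-back. Since no CUDA device can be assumed here, this sketch runs the same pattern logic over a plain host-side buffer standing in for device memory, with an optional injected stuck-at-0 fault to show what detection looks like; a real tool would push each write and read-back pass through cudaMemcpy:

```python
def pattern_test(n_words, stuck=None,
                 patterns=(0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555)):
    """Write each 32-bit pattern to every word, read back, collect bad words.
    `stuck` = (word_index, and_mask) injects a simulated stuck-at-0 fault
    so the detection path can be demonstrated without broken hardware."""
    def write(mem, i, value):
        if stuck and i == stuck[0]:
            value &= ~stuck[1] & 0xFFFFFFFF  # those bits read back as zero
        mem[i] = value

    mem = [0] * n_words
    bad = set()
    for p in patterns:
        for i in range(n_words):             # write pass
            write(mem, i, p)
        for i in range(n_words):             # read-back pass
            if mem[i] != p:
                bad.add(i)
    for i in range(n_words):                 # address-in-address pass:
        write(mem, i, i & 0xFFFFFFFF)        # catches addressing faults,
    for i in range(n_words):                 # not just stuck bits
        if mem[i] != i & 0xFFFFFFFF:
            bad.add(i)
    return sorted(bad)
```

As the later messages in this thread point out, a pattern walk like this is only a gross checker; an intensive known-answer computation (a *GEMM that verifies its result) stresses the part far harder.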
Also, likely we should have a long term crunching diagnostic, where we already know the answer to a computational problem, and simply have it burn cycles. But GPUs are more complex than this, we need to worry about PCIe bus transfers, several different flavors of memory, etc. Really, since there is very little you can do if a GPU card is toast, other than replace it, it might be better to have the test done at this granularity. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From lindahl at pbm.com Mon Mar 30 15:45:51 2009 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: <49D14835.9040803@scalableinformatics.com> References: <49D14835.9040803@scalableinformatics.com> Message-ID: <20090330224551.GA11672@bx9.net> On Mon, Mar 30, 2009 at 06:31:17PM -0400, Joe Landman wrote: > This said, there really isn't a memory checker for GPUs just yet. Could > be done, and probably should be ... But will it be like memtest86, which isn't as good as HPL at finding problems? If you've got DGEMM for your GPU, you're there. -- greg From landman at scalableinformatics.com Mon Mar 30 16:09:31 2009 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: <20090330224551.GA11672@bx9.net> References: <49D14835.9040803@scalableinformatics.com> <20090330224551.GA11672@bx9.net> Message-ID: <49D1512B.3060309@scalableinformatics.com> Greg Lindahl wrote: > On Mon, Mar 30, 2009 at 06:31:17PM -0400, Joe Landman wrote: > >> This said, there really isn't a memory checker for GPUs just yet. Could >> be done, and probably should be ... > > But will it be like memtest86, which isn't as good as HPL at finding > problems? If you've got DGEMM for your GPU, you're there. Heh... 
I erased the paragraph where I tore into using memtest* as anything other than a gross checker ... felt it wasn't too relevant. We run a few parallel codes as our testers. Beats the heck out of the system (you can hear the fans spin up on variable speed systems). Specifically, we purposefully (computationally) overload the unit and make sure we don't throw EDACs/MCEs. Yeah, *GEMM is good (some GPU cards don't do DGEMMs on them though ... older nVidia/ATI don't). Too bad Cuda won't run on the ATIs. Would really make maintaining this thing easy. If people can live with SGEMMs, and other FFT-like things, we can probably leverage (and make available) an older code we used a while ago. Actually, for another project, we just did a DGETRF and a few other ports. Let me know if you want me to clean it up and make it available. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From james.p.lux at jpl.nasa.gov Mon Mar 30 16:38:30 2009 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: References: Message-ID: > > Finding > > marginal memory, certainly one of the easier tests, can > easily take 24 > > hours of testing. > > And typically those memory modules test OK in a tester, even > after being pulled from a machine showing memory errors. > (That's not surprising, since most distributors test modules > just before shipping them, and they are tested again just > before installation.) I suspect that the problem is not a "memory" problem per se, but some other aspect. Maybe a marginal timing thing on the bus. A lot of "memory tester" boxes basically just test that the memory is functional (i.e. you can read and write all locations at the rated speed).
Looking at products from http://www.memorytest.com/ (which happened to be the first google hit) it looks like it does a basic functional test, but, in its normal stock configuration, doesn't exercise the parts at the timing margins (i.e. drive it with setup and hold times at minimums, or perhaps the worst case transition timing). Nor does a simple tester really test whether the logic level voltage tolerances are what they should be (i.e. is the "eye" as open as it should be). The tester here http://www.microtestsystem.com/rs800-166.html seems to be able to just step the timing in suitable multiples of the basic clock rate (e.g. 2, 3, 4 clocks for Trd), but doesn't check to see if, maybe, the part stops working at 1.9 clocks. But hey, it DOES have a "heavy duty test start" button, which would be important! We won't even get into the possibility of latent ESD damage from handling. From lindahl at pbm.com Mon Mar 30 17:48:26 2009 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] Memory errors poll In-Reply-To: References: <49CF7602.6030205@fing.edu.uy> Message-ID: <20090331004826.GA3408@bx9.net> On Mon, Mar 30, 2009 at 01:11:20AM -0400, Mark Hahn wrote: >> /Could those of you running ECC memory give me an updated figure on the >> number of errors detected/corrected per day per system? / > > we replace dimms which show > 1000 corrected ECCs per day > (or any overflows, for which counts are inaccurate, or any uncorrectable > errors.) These systems are a couple of generations old, right? I think I have Linux set up to record single-bit errors, and the rate I get is basically zero on, uh, 5 terabytes of modern ram, at sea level. When I installed some new memory I had a few systems with modest numbers of single-bit upsets, and the vendor was happy to swap dimms until the problem went away. I think he also does that during his factory burn-in.
-- greg From lindahl at pbm.com Mon Mar 30 18:00:26 2009 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: <49D1512B.3060309@scalableinformatics.com> References: <49D14835.9040803@scalableinformatics.com> <20090330224551.GA11672@bx9.net> <49D1512B.3060309@scalableinformatics.com> Message-ID: <20090331010026.GA5666@bx9.net> On Mon, Mar 30, 2009 at 07:09:31PM -0400, Joe Landman wrote: > If people can live with SGEMMs, and other FFT-like things, we can > probably leverage (and make available) an older code we used a while > ago. Well, anything that's intensive and checks its answer is likely to be pretty good. -- greg From marcelosoaressouza at gmail.com Fri Mar 27 13:31:28 2009 From: marcelosoaressouza at gmail.com (Marcelo Souza) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] MPICH2 1.0.8p1 x86_64 (amd64) Package for Debian 5.0 (Lenny) Message-ID: <12c9ca330903271331w1fc7330cp96c87bbdaf01216b@mail.gmail.com> http://www.cebacad.net/files/mpich2_1.0.8p1_amd64.deb http://www.cebacad.net/files/mpich2_1.0.8p1_amd64.deb.md5 Tomorrow i386 packages. -- Marcelo Soares Souza http://marcelo.cebacad.net From chiendarret at gmail.com Sun Mar 29 01:25:30 2009 From: chiendarret at gmail.com (Francesco Pietra) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] ssh connection passwordless In-Reply-To: <63a33c4e0903242021q55999430oa45f77cad5646831@mail.gmail.com> References: <63a33c4e0903242021q55999430oa45f77cad5646831@mail.gmail.com> Message-ID: I got everything working fine - like in the past - via the more pedestrian route of cross-copying the authorized keys. Then, slogin put the other machines on known_hosts. Next slogin occurs without password. Also (which is mandatory to run a special computational code in parallel) issuing slogin servername on the server itself places it on known_hosts. Then, issuing ssh servername date gives the date without requesting the password.
That is, this is an ssh to the same machine from which it is commanded and it works because the machine knows itself. I am no system expert, just need to have the machine running the codes; I have not much investigated why I was unable to send the keys. Maybe that above is inelegant. I know that other users solved the issue of getting the date that way with keychain. For me it was easier as above. The permissions on the desktop required changes to get the security status you suggested. regards francesco On Wed, Mar 25, 2009 at 5:21 AM, Ashish Zankar wrote: > In addition to the above steps follow these steps: > > 1) check the permission for the user's home directory: it should be 755, i.e. group > and others should not have write permission on the user's home directory. > 2) check the permission on the .ssh directory in the user's home directory. > it should look like this: drwx------ 2 tjx864 scs 4096 Jan 28 14:33 > .ssh/ > 3) check the permission of the authorized keys file in the .ssh directory of > the user's home directory. > it should be like this: -rw------- 1 user_id group 793 Jan 28 14:35 > authorized_keys > and also -rw------- 1 user_id group 1675 Jan 28 > 14:34 id_rsa > Please let me know if the above solution has helped you or not. > Thanks & Regards, > Ashish Zankar > Bangalore. > On Thu, Mar 19, 2009 at 2:52 PM, Francesco Pietra > wrote: >> >> HI: >> >> I have a computing machine and a desktop ssh passwordless >> interconnected through a Zyxel router (which is dhcp on Internet). I >> have now added a second computing machine. I am unable to get all >> three machines passwordless interconnected at the same time. Just only >> two. If I want to have the third computer passwordless connected to >> one of the other two, I have to exchange id_rsa.pub between the two >> again. Mistake or intrinsic feature of ssh?
>> >> What I did: >> >> (1) generating the keys with "ssh-keygen -t rsa" >> >> (2) getting "reserved" the machines on the router >> >> (3) scp id_rsa.pub to the "authorized_keys" >> >> It is also mandatory that asking the "date" to the other computer >> (slogin ... date), the date is given without asking the password. That >> is an issue of a computational code that for its internal >> parallelization needs that (I have not investigated why). >> >> thanks >> >> francesco >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > Thanks & Regards, > Ashish Zankar > Bangalore > From jorg.sassmannshausen at strath.ac.uk Mon Mar 30 06:14:50 2009 From: jorg.sassmannshausen at strath.ac.uk (Jörg Saßmannshausen) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] TCP connect error: ECONNREFUSED. Message-ID: <49D0C5CA.4050008@strath.ac.uk> Dear all, I am having this rather annoying problem with the parallel execution of one of the programs (GAMESS US version) on our cluster. The error message is: TCP connect error: ECONNREFUSED. TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208. A fatal error occurred on DDI Process 0. TCP connect error: ECONNREFUSED. TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208. A fatal error occurred on DDI Process 60. TCP connect error: ECONNREFUSED. TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208. A fatal error occurred on DDI Process 2. TCP connect error: ECONNREFUSED. [ ... ] Eventually, the ddikick tips over and the whole thing crashes. The program is using rsh (yes, I know, security, I did not install the cluster!) and I can rsh comp10 -> comp02 and there is no firewall installed between the nodes (at least, not that I am aware of).
Trying to run the same job with the same number of nodes will fail X times and at X+1 suddenly work. I could not work out a pattern for that (other than I get exponentially annoyed). Right now, there is only one gigabit network connecting the cluster, so nfs, mpi etc. is all running over one interface (again, I did not set up the cluster). I have run out of ideas of where to look. I checked (as quickly as possible) at some nodes with netstat; the ddikick program is actually running. Changing to ssh did not solve the problem. I would appreciate any feedback as it is highly annoying to wait Y days to get the job running and then it crashes. All the best from Glasgow! Jörg -- ************************************************************* Jörg Saßmannshausen Research Fellow University of Strathclyde Department of Pure and Applied Chemistry 295 Cathedral St. Glasgow G1 1XL email: jorg.sassmannshausen@strath.ac.uk web: http://sassy.formativ.net Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html From hahn at mcmaster.ca Mon Mar 30 21:12:43 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] Memory errors poll In-Reply-To: References: <49CF7602.6030205@fing.edu.uy> Message-ID: >>> /Could those of you running ECC memory give me an updated figure on >>> the number of errors detected/corrected per day per system? / >> >> we replace dimms which show > 1000 corrected ECCs per day (or >> any overflows, for which counts are inaccurate, or any >> uncorrectable errors.) > > > That seems a remarkably high rate, for the raw memory errors. Micron quotes > something like 100 soft errors per 1E9 device hours. (That's a > FIT: failure in time of 100) 1000 per day seems high? it doesn't worry me much, since it's low enough that there will be very few double errors by coincidence, and almost certainly no measurable overhead.
(overhead of polling and logging CEs _is_ measurable on machines with bad dimms, btw.) these dimms have 16 chips. also, these are observed CEs, which includes problems due to other dimms, sockets, the csrow bus and the (opteron) memory controller. I'm also not claiming that there are a significant number of dimms showing > 0 but < 1000 CEs/day. > If I saw that rate, I'd assume that there's something seriously wrong with the part. perhaps. one problem is that I don't have a good load-generator. when idle, or loaded with light-footprint jobs, even nodes with a real problem can wind up reporting few CEs. initially, my attempt at a load-generator was simply a multithreaded stream-like thing that kept blasting bit-patterns into big arrays. as far as I know, it's as likely to write bad ECC as read it, so you have to alternate r/w cycles. but being sequential is probably less than optimal (indeed, perhaps why memtest86 sometimes gives false negatives). > I suspect that most "memory errors" reported for PCs (whether in clusters >or not) are manifestations of bus timing problems, perhaps over temperature, >rather than actual bit flips in memory. The actual measured rate of single >event upsets is so low sure. I'm just talking about observed events reported by ECC hardware. interestingly, it's easy to imagine a scenario where the MC trains its dram parameters at one temperature, but winds up operating at another. and possibly operating poorly - things like skew are set by the bios and afaik never recalibrated. From hahn at mcmaster.ca Mon Mar 30 21:14:06 2009 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] Memory errors poll In-Reply-To: <20090331004826.GA3408@bx9.net> References: <49CF7602.6030205@fing.edu.uy> <20090331004826.GA3408@bx9.net> Message-ID: >> we replace dimms which show > 1000 corrected ECCs per day >> (or any overflows, for which counts are inaccurate, or any uncorrectable >> errors.) 
> > These systems are a couple of generations old, right? waaait a minute - I think I gave the wrong impression. we have about 13 TB of this gen hardware (yes, from 3 years ago). our observed rate is that at a given moment, a fraction of 1% of the nodes have any ECs at all. our vendor is happy to replace dimms that have a nontrivial rate, and there aren't a lot of nodes that have had this done. one interesting thing is that during a 3-year period, seems like about 1% of nodes developed higher EC rates that disappeared when the dimms were reseated. I wonder whether this was the result of thermal cycling... > I think I have Linux set up to record single-bit errors, and the rate using edac? I toyed with mcelog before that, but never really got much traction until edac came with an updated kernel. From lindahl at pbm.com Mon Mar 30 21:25:43 2009 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] Memory errors poll In-Reply-To: References: <49CF7602.6030205@fing.edu.uy> <20090331004826.GA3408@bx9.net> Message-ID: <20090331042543.GB4943@bx9.net> On Tue, Mar 31, 2009 at 12:14:06AM -0400, Mark Hahn wrote: > using edac? I toyed with mcelog before that, but never really got much > traction until edac came with an updated kernel. Yes, EDAC. -- greg From hearnsj at googlemail.com Tue Mar 31 00:11:52 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] TCP connect error: ECONNREFUSED. In-Reply-To: <49D0C5CA.4050008@strath.ac.uk> References: <49D0C5CA.4050008@strath.ac.uk> Message-ID: <9f8092cc0903310011r496819e1m195b9a685a61107c@mail.gmail.com> 2009/3/30 Jörg Saßmannshausen : > Dear all, > > I am having this rather annoying problem with the parallel execution of one > of the programs (GAMESS US version) on our cluster. The error message is: > Guys and girls, I am not putting Jorg in the spotlight here, I hope he understands this.
As a general point - I have seen on several mailing lists folks asking for specific help with a system. That is of course what community mailing lists are for, and we all can learn from the answers. However, please, please contact the company who installed the cluster. They will be happy to provide support for it. I can state categorically that when working for two leading cluster companies we always went above and beyond what was strictly required for support, and would delve into issues like this at a very low level. That is why you buy a prebuilt and tested cluster with support rather than a pile of cardboard boxes. Cluster vendors also keep records of the configuration of systems - so if you are landed with an 'orphaned' system, or one you are not familiar with, again just phone the vendor. As a personal point, there is nothing worse than being told three years down the line that a certain system never worked properly if that comes out of the blue and the end users never reported it or asked for help. While I'm on a rant, the staff working for cluster vendors are very knowledgeable - there is a definite revolving door between academia, industry and HPC integrators. You might find the guy who visits you to debug some TCP/IP problem today is sitting beside you next week. From kilian.cavalotti.work at gmail.com Tue Mar 31 01:27:55 2009 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] X5500 Message-ID: <200903311027.56296.kilian.cavalotti.work@gmail.com> So, Nehalem-EP is out, NDAs are gone, and we're all waiting for figures. :) Being advantaged by my easterly timezone, I'll start: SGI published a press release about the inclusion of Nehalems into their ICE platform, and added some benchmark galore.
""" Performance Soars by 140 Percent with Unprecedented Scalability The new system delivers reliably scalable performance gains of up to 140 percent across a variety of data-intensive applications, including the Fluent computational fluid dynamics (CFD) application, the VASP Ab-Initio simulation, and WRF weather modeling. * Fluent. A 14-million-cell model of the external flow of a truck body was tested on 64 cores of an Altix ICE 8200EX with Intel® Xeon® processor 5500 series at 2.93GHz. On the new SGI system, the test ran 1.59x faster than AMD Shanghai at 2.7GHz and 1.73x faster than Altix ICE 8200EX with Intel® Xeon® processor X5472 at 3.0GHz on 64 cores - with near linear scalability. For a larger 111-million-cell model, the performance improvement is even higher. Altix ICE 8200EX with Intel® Xeon® processor 5500 series at 2.93GHz/1067 is 1.64x faster than AMD Shanghai on 64 cores and 1.81x faster than Altix ICE 8200EX with Intel® Xeon® X5472 3.0GHz on 128 cores. * VASP. When running the bench.PdO standard benchmark for the Vienna Ab-Initio Simulation Package (VASP), the Altix ICE 8200EX with Intel® Xeon® processor 5500 series at 2.93GHz on 32 cores is 1.95x faster than the same system equipped with Intel® Xeon® processors X5470 at 3.0GHz. * WRF. The Weather Research and Forecasting (WRF) model of the continental United States at 2.5km resolution runs 2.4x faster on Altix ICE 8200EX with Intel® Xeon® processor 5500 series at 2.93GHz on 128 cores than it does on a similarly configured system featuring Intel® Xeon® processors X5470 at 3.0GHz. """ http://www.sgi.com/company_info/newsroom/press_releases/2009/march/altix_ice.html Any other numbers, people? Cheers, -- Kilian From kilian.cavalotti.work at gmail.com Tue Mar 31 02:52:18 2009 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] One time password generators...
In-Reply-To: References: <49C9D303.3050000@atlanticlinux.ie> Message-ID: <200903311152.19553.kilian.cavalotti.work@gmail.com> On Wednesday 25 March 2009 14:25:30 Robert G. Brown wrote: > in fact, to me it seems to be less > secure, because it is at least partially keyed and there seems to be no > point in having a key if you're going to carry a table of shared secrets > around with you. Well, I think that the point of otpw is indeed to use OTPs which are made of a password prefix and a generated key suffix. So each time you log on, it requires something you know (the password), and something you have (the generated key on paper). It seems much more secure to me than, say, the traditional OPIE or S/KEY, as those use only the generated key list to authenticate. Moreover, in those traditional schemes, the generated keys are deduced from each other, so that if you know the last one, you can basically regenerate the whole list. Cheers, -- Kilian From kilian.cavalotti.work at gmail.com Tue Mar 31 03:16:16 2009 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] One time password generators... In-Reply-To: References: Message-ID: <200903311216.18692.kilian.cavalotti.work@gmail.com> On Tuesday 24 March 2009 23:25:57 Robert G. Brown wrote: > There are a couple of possible exceptions to pursue in addition to the > e.g. RSA-like solutions with their enormous cost, but I thought I'd > throw it out to the group here too. Is there a straightforward low-cost > way to generate OTP's without ten thousand dollar server software > packages? When administering a previous cluster, I had to set up this kind of secure access for users. Management had a keen sense of systems security, and absolutely rejected the idea of seeing their multi-million dollar cluster pwned and transformed into a spam-sending workhorse. So users *had* to authenticate using one-time passwords.
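As an aside, the hash-chain construction behind OPIE/S/KEY, and the regenerate-the-whole-list weakness mentioned earlier, can be sketched in a few lines of shell. This is a toy illustration only: it uses sha256sum, whereas real S/KEY uses truncated MD4/MD5 and a word encoding.

```shell
# Toy hash-chain OTP list, in the spirit of S/KEY (illustration only;
# the real algorithm uses truncated MD4/MD5, not sha256).
key="some-secret-seed"
for i in 1 2 3 4 5; do
    key=$(printf '%s' "$key" | sha256sum | cut -d' ' -f1)
    echo "key $i: $key"
done
# The keys are consumed in reverse order (key 5 first): the server
# remembers the last key presented and accepts a login only if hashing
# the new key reproduces it. That is also the weakness noted above:
# anyone holding an unconsumed key can hash it repeatedly to derive
# every key used after it in login order.
```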
To do so, users were provided a web-based OTP generator (through an SSL connection, identification being taken care of by a campus-wide authentication mechanism). With this OTP, they could authenticate to a firewall running authpf [1]. After successful authentication, and for as long as they kept their authpf session open, they could then log on to the cluster frontends, using regular SSH authentication, delegated to campus Kerberos servers. MITM attacks (from the network) were somewhat mitigated by the OTP usage, but the security of the whole chain relied on the campus authentication mechanism, which was, well, secure. It was far from a flawless setup, but at least access to the cluster was only allowed at the firewall level to currently authenticated users. Access was denied as soon as the firewall connection was closed. Authpf is a really useful piece of software. [1] http://www.openbsd.org/faq/pf/authpf.html Cheers, -- Kilian From hearnsj at googlemail.com Tue Mar 31 04:21:21 2009 From: hearnsj at googlemail.com (John Hearns) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] TCP connect error: ECONNREFUSED. In-Reply-To: <9f8092cc0903310011r496819e1m195b9a685a61107c@mail.gmail.com> References: <49D0C5CA.4050008@strath.ac.uk> <9f8092cc0903310011r496819e1m195b9a685a61107c@mail.gmail.com> Message-ID: <9f8092cc0903310421gd232941q27314d393d4ab267@mail.gmail.com> 2009/3/31 John Hearns : > 2009/3/30 Jörg Saßmannshausen : >> Dear all, >> >> I am having this rather annoying problem with the parallel execution of one >> of the programs (GAMESS US version) on our cluster. The error message is: >> > As a general point - I have seen on several mailing lists folks asking > for specific help with a system. I got out of bed on the wrong side this morning. I do hope Jörg stays with the list - having expertise in computational chemistry will only add to the list's power.
Maybe I should say things differently - your vendor's responsibility does not stop at hardware replacements. They will often go beyond their strictly contractual obligations and will diagnose problems with Linux kernel drivers, batch schedulers, application software etc. This is part of the "value add", which is why you (read your institution) went to a specialised cluster vendor, and not the lowest bidder. From kus at free.net Tue Mar 31 09:05:16 2009 From: kus at free.net (Mikhail Kuzminsky) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] X5500 In-Reply-To: <200903311027.56296.kilian.cavalotti.work@gmail.com> Message-ID: In message from Kilian CAVALOTTI (Tue, 31 Mar 2009 10:27:55 +0200): > ... >Any other numbers, people? I believe there are also some other important numbers - prices for Xeon 55XX and system boards ;-) I didn't see prices on pricegrabber, for example. Is there some price information available? Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow From m.j.harvey at imperial.ac.uk Tue Mar 31 09:05:19 2009 From: m.j.harvey at imperial.ac.uk (M J Harvey) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] GPU diagnostics? In-Reply-To: References: Message-ID: <49D23F3F.9070308@imperial.ac.uk> David Mathog wrote: > Have any of you CUDA folks produced diagnostic programs you run during > "burn in" of new GPU based systems, in order to weed out problem units > before putting them into service? A while ago I wrote a CUDA implementation of a subset of the Memtest86+ algorithms, to test the reliability of the consumer GPUs used by our distributed computing project, GPUGRID. You can get them here: http://ccs.chem.ucl.ac.uk/~matt/cudamemtest.tgz That said, we never really used it in anger (most of the stability problems we were having turned out to be due to 'factory-overclocked' GPUs) so YMMV.
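For burn-in duty, a thin wrapper could loop such a test across each GPU overnight and log any failures. A minimal sketch follows; the `cuda_memtest` binary name, its `--device` flag, and the device count are assumptions here, not necessarily what the tarball above actually ships, so check its README for the real invocation.

```shell
#!/bin/sh
# Hypothetical burn-in wrapper: run a GPU memory test repeatedly on
# each device and record failures. Binary name and flags are assumed,
# not taken from the tarball's actual documentation.
LOG=gpu-burnin.log
for dev in 0 1; do
    pass=1
    while [ "$pass" -le 24 ]; do
        if ! ./cuda_memtest --device "$dev" >>"$LOG" 2>&1; then
            echo "GPU $dev failed on pass $pass" | tee -a "$LOG"
        fi
        pass=$((pass + 1))
    done
done
```

A node whose log accumulates failures across passes would be a candidate for pulling before it goes into service.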
MJH -- Matt Harvey Email: m.j.harvey@imperial.ac.uk HPC Systems Support Analyst Imperial College London PGP Key ID: 0xD234302E http://www.imperial.ac.uk/ict/services/highperformancecomputing From bill at cse.ucdavis.edu Tue Mar 31 11:24:22 2009 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] X5500 In-Reply-To: References: Message-ID: <49D25FD6.8040606@cse.ucdavis.edu> Mikhail Kuzminsky wrote: > In message from Kilian CAVALOTTI (Tue, > 31 Mar 2009 10:27:55 +0200): >> ... >> Any other numbers, people? > > I believe there are also some other important numbers - prices for Xeon > 55XX and system boards ;-) www.siliconmechanics.com has system pricing, I'm sure there are tons of others. > I didn't see prices on pricegrabber, for example. > Is there some price information available? If you want parts, Newegg has some, the E5520 is $379.99. The Tyan S7010 (dual-socket Nehalem server motherboard) is $389.99. I have to say that I'm surprised at the number of Nehalem chips released (around 11 for dual sockets) and the pricing seems surprisingly low for a new release. From rpnabar at gmail.com Tue Mar 31 15:54:55 2009 From: rpnabar at gmail.com (Rahul Nabar) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] job runs with mpirun on a node but not if submitted via Torque. Message-ID: I've a strange OpenMPI/Torque problem while trying to run a job on our Opteron-SC-1435 based cluster: Each node has 8 cpus. If I go to a node and run like so then the job works: mpirun -np 6 ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS} If I submit the same job through PBS/Torque, then it starts running but the individual processes keep crashing: mpirun -np 6 ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS} I know that the --hostfile directive is not needed with the latest Torque-integrated OpenMPI. I also tried including: mpirun -np 6 --hosts node17,node17,node17,node17,node17,node17 ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS} Still does not work. What could be going wrong?
Are there other things I need to worry about when PBS steps in? Any tips? The ${DACAPOEXE_PAR} refers to a fortran executable for the computational chemistry code DACAPO. What's the difference between submitting a job on a node via mpirun directly vs. via Torque? Shouldn't these both be transparent to the fortran calls? I am assuming I don't have to dig into the fortran code. Any debug tips? Thanks! -- Rahul From djholm at fnal.gov Tue Mar 31 16:43:50 2009 From: djholm at fnal.gov (Don Holmgren) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] job runs with mpirun on a node but not if submitted via Torque. In-Reply-To: References: Message-ID: How are your individual MPI processes crashing when run under Torque? Are there any error messages? The environment for a Torque job on a worker node under openMPI is inherited from the pbs_mom process. Sometimes differences between this environment and the standard login environment can cause trouble. For example, on Infiniband clusters the "maximum locked memory" ulimit may need to be adjusted by editing the script used to launch pbs_mom (usually the pbs-client init.d script). I've also seen stack size problems in some user binaries that require such a ulimit adjustment to mimic what they may have in their .bash_profile. Instead of logging into the node directly, you might want to try an interactive job (use "qsub -I") and then try your mpirun. This may give you messages that for some reason aren't getting back to you in your job's .o or .e files. Don Holmgren Fermilab On Tue, 31 Mar 2009, Rahul Nabar wrote: > I've a strange OpenMPI/Torque problem while trying to run a job on our > Opteron-SC-1435 based cluster: > > Each node has 8 cpus.
> > If I go to a node and run like so then the job works: > > mpirun -np 6 ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS} > > If I submit the same job through PBS/Torque, then it starts running but the > individual processes keep crashing: > > mpirun -np 6 ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS} > > I know that the --hostfile directive is not needed with the latest > Torque-integrated OpenMPI. > > I also tried including: > > mpirun -np 6 --hosts node17,node17,node17,node17,node17,node17 > ${EXE_PATH}/${DACAPOEXE_PAR} ${ARGS} > > Still does not work. > > What could be going wrong? Are there other things I need to worry > about when PBS steps in? Any tips? > > The ${DACAPOEXE_PAR} refers to a fortran executable for the > computational chemistry code DACAPO. > > What's the difference between submitting a job on a node via mpirun > directly vs. via Torque? Shouldn't these both be transparent to the > fortran calls? I am assuming I don't have to dig into the fortran code. > Any debug tips? > > Thanks! From rpnabar at gmail.com Tue Mar 31 16:58:45 2009 From: rpnabar at gmail.com (Rahul Nabar) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] job runs with mpirun on a node but not if submitted via Torque. In-Reply-To: References: Message-ID: On Tue, Mar 31, 2009 at 6:43 PM, Don Holmgren wrote: > > How are your individual MPI processes crashing when run under Torque? Are > there any error messages? Thanks Don! There aren't any useful error messages. My job hierarchy is actually like so: {shell_script submitted to Torque} --> calls Python--> Loop until convergence {Calls a fortran executable} The fortran executable is the one that has the mpi calls to parallelize over processors. The crash is *not* so bad that torque kills the job. What happens is that the fortran exec crashes and python continues to loop it over and over again. The crash happens only when I submit via Torque.
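One way to capture the environment difference Don mentions is to dump the environment and limits from both contexts and diff the results. A sketch (file names are just examples):

```shell
# From an interactive login on the compute node:
env | sort > /tmp/env.login
ulimit -a > /tmp/ulimit.login

# From inside a Torque job script (this inherits pbs_mom's environment):
env | sort > /tmp/env.torque
ulimit -a > /tmp/ulimit.torque

# Then, back on the node, compare the two:
diff /tmp/env.login /tmp/env.torque
diff /tmp/ulimit.login /tmp/ulimit.torque
```

Variables worth eyeballing in the diff include PATH, LD_LIBRARY_PATH, and any PBS_* variables, plus the stack size and max locked memory lines from ulimit.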
If I do this instead mpirun from node --> shell wrapper--> calls Python--> Loop until convergence {Calls a fortran executable} Then everything works fine. Note that the Python and shell are not truly parallelized. The fortran is the only place where actual parallelization happens. > The environment for a Torque job on a worker node under openMPI is inherited > from the pbs_mom process. Sometimes differences between this environment > and > the standard login environment can cause trouble. Exactly. Can I somehow obtain a dump of this environment to compare the direct mpirun vs the torque run? What would be the best way? Just a dump from set? Any crucial variables to look for? Maybe a ulimit? > > Instead of logging into the node directly, you might want to try an > interactive > job (use "qsub -I") and then try your mpirun. I'm trying that now. -- Rahul From rpnabar at gmail.com Tue Mar 31 17:05:40 2009 From: rpnabar at gmail.com (Rahul Nabar) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] job runs with mpirun on a node but not if submitted via Torque. In-Reply-To: References: Message-ID: On Tue, Mar 31, 2009 at 6:43 PM, Don Holmgren wrote: > Instead of logging into the node directly, you might want to try an > interactive > job (use "qsub -I") and then try your mpirun. This may give you messages > that > for some reason aren't getting back to you in your job's .o or .e files. I tried an interactive job; this seems the key: forrtl: error (78): process killed (SIGTERM) mpirun noticed that job rank 5 with PID 10580 on node node17 exited on signal 11 (Segmentation fault). I do not get this segfault when I run directly on the node but only when I run via Torque. Any clues? -- Rahul From rpnabar at gmail.com Tue Mar 31 17:10:27 2009 From: rpnabar at gmail.com (Rahul Nabar) Date: Wed Nov 25 01:08:23 2009 Subject: [Beowulf] job runs with mpirun on a node but not if submitted via Torque.
In-Reply-To: References: Message-ID: On Tue, Mar 31, 2009 at 6:43 PM, Don Holmgren wrote: > I've > also seen stack size problems in some user binaries that require such a > ulimit adjustment to mimic what they may have in their .bash_profile. Also, ulimit returns "unlimited" in both cases. -- Rahul From dgs at slac.stanford.edu Tue Mar 31 10:57:03 2009 From: dgs at slac.stanford.edu (David Simas) Date: Wed Nov 25 01:08:24 2009 Subject: [Beowulf] TCP connect error: ECONNREFUSED. In-Reply-To: <49D0C5CA.4050008@strath.ac.uk> References: <49D0C5CA.4050008@strath.ac.uk> Message-ID: <20090331175703.GC17687@horus.slac.stanford.edu> On Mon, Mar 30, 2009 at 02:14:50PM +0100, Jörg Saßmannshausen wrote: > Dear all, > > I am having this rather annoying problem with the parallel execution of > one of the programs (GAMESS US version) on our cluster. The error > message is: > > TCP connect error: ECONNREFUSED. > TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208. > A fatal error occurred on DDI Process 0. > TCP connect error: ECONNREFUSED. > TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208. > A fatal error occurred on DDI Process 60. > TCP connect error: ECONNREFUSED. > TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208. > A fatal error occurred on DDI Process 2. > TCP connect error: ECONNREFUSED. > > [ ... ] > > Eventually, the ddikick tips over and the whole thing crashes. The > program is using rsh (yes, I know, security, I did not install the > cluster!) and I can rsh comp10 -> comp02 and there is no firewall > installed between the nodes (at least, not that I am aware of). Trying > to run the same job with the same number of nodes will fail X times and > at X+1 suddenly work. I could not work out a pattern for that (other > than that I get exponentially annoyed). Right now, there is only one gigabit > network connecting the cluster, so nfs, mpi etc. is all running over one > interface (again, I did not set up the cluster).
How rapidly are these rsh connection attempts occurring? The rsh protocol requires connections from privileged ports - less than 1024. If a host attempts to make more than 1024 connections to another host in less than TCP TIME-WAIT seconds, it will run out of ports and the connections will fail. I've seen this occur with parallel applications using rsh. David S. > > I have run out of ideas of where to look. I checked (as quickly as > possible) on some nodes with netstat, the ddikick program is actually > running. Changing to ssh did not solve the problem. > > I would appreciate any feedback as it is highly annoying to wait Y days > to get the job running and then it crashes. > > All the best from Glasgow! > > Jörg > > > -- > ************************************************************* > Jörg Saßmannshausen > Research Fellow > University of Strathclyde > Department of Pure and Applied Chemistry > 295 Cathedral St. > Glasgow > G1 1XL > > email: jorg.sassmannshausen@strath.ac.uk > web: http://sassy.formativ.net > > Please avoid sending me Word or PowerPoint attachments. > See http://www.gnu.org/philosophy/no-word-attachments.html > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From jorg.sassmannshausen at strath.ac.uk Tue Mar 31 15:54:38 2009 From: jorg.sassmannshausen at strath.ac.uk (Jörg Saßmannshausen) Date: Wed Nov 25 01:08:24 2009 Subject: [Beowulf] TCP connect error: ECONNREFUSED. In-Reply-To: <200903311900.n2VJ06hD010607@bluewest.scyld.com> References: <200903311900.n2VJ06hD010607@bluewest.scyld.com> Message-ID: <200903312354.38497.jorg.sassmannshausen@strath.ac.uk> Dear John and others, first of all, I have been reading the list for some years now and I gained quite a bit of knowledge here, so why should I leave? Don't worry, I have no intention to do so.
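(Incidentally, David's privileged-port theory from earlier in the thread is easy to check on a node while a job is launching. A rough sketch; it assumes netstat's fourth column is the local address, which can vary between netstat versions:)

```shell
# Count TIME_WAIT sockets and privileged (<1024) local ports in use
# while the parallel job starts. rsh must bind local ports below 1024,
# so a privileged-port count approaching ~1000 would support the
# port-exhaustion theory.
netstat -tan | awk '
    $NF == "TIME_WAIT" { tw++ }
    { n = split($4, a, ":"); p = a[n] + 0
      if (p > 0 && p < 1024) priv++ }
    END { printf "TIME_WAIT: %d  privileged local ports: %d\n", tw + 0, priv + 0 }'
```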
I can see your point and I have already contacted the cluster support. However, I have had mixed experiences with support in the past. The last cluster I was working with was basically just the hardware, and the guys from our IT installed the OS on it. They probably had a hell of a time doing so, but the subsequent clusters are easier to install (for them). As they don't know computational chemistry and I want to use the programs, we made the unofficial deal that I deal with the programs and they with the OS. So we both learned something and the system was running. Things are different here and I slowly have to find my way around. To be honest, I stumbled over some severe misconfiguration, hence my trust in that company is not strong. Being upfront as I am: there is some seriously sloppy workmanship in the software installation and it has already taken me some days to iron out some of it. Even though the cluster grew over the last years, one can certainly do better. For that reason I was asking here for some new ideas on how to tackle the problem and, to be honest, I am impressed by the feedback I got from here. I am on the GAMESS list as well and this error comes up there sometimes. Usually it is a misconfigured hosts file, sometimes firewalls, as people are using machines connected to the WAN instead of a local, private LAN. I can rule that out. I thought I would reply in more detail to John's post to show that I am not offended and what motivations I had to post here. All the best from Glasgow! Jörg On Tuesday 31 March 2009 beowulf-request@beowulf.org wrote: > 2009/3/31 John Hearns : > > 2009/3/30 Jörg Saßmannshausen : > >> Dear all, > >> > >> I am having this rather annoying problem with the parallel execution of > >> one of the programs (GAMESS US version) on our cluster. The error > >> message is: > > > > As a general point - I have seen on several mailing lists folks asking > > for specific help with a system. > > I got out of bed on the wrong side this morning.
> I do hope Jörg stays with the list - having expertise in computational > chemistry will only add to > the list's power. > > Maybe I should say things differently - your vendor's responsibility > does not stop at hardware replacements. > They often will go beyond their strictly contractual obligations and > will diagnose problems with > Linux kernel drivers, batch schedulers, application software etc. This > is part of the "value add", which is why you (read your institution) > went to a specialised cluster vendor, and not the lowest bidder. -- ************************************************************* Jörg Saßmannshausen Research Fellow University of Strathclyde Department of Pure and Applied Chemistry 295 Cathedral St. Glasgow G1 1XL email: jorg.sassmannshausen@strath.ac.uk web: http://sassy.formativ.net Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html