From hahn at mcmaster.ca Tue Jan 1 12:01:12 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Building a 2 node cluster using mpich In-Reply-To: <2D1ECDD5-85D7-4A02-B49F-3BEE4D9CCB93@staff.uni-marburg.de> References: <2D1ECDD5-85D7-4A02-B49F-3BEE4D9CCB93@staff.uni-marburg.de> Message-ID: >> 4.Finally, create identical user accounts on each node. In our case, >> we create the user DevArticle on each node in our cluster. You can >> either create the identical user accounts during installation, or you >> can use the adduser command as root. > > better use NIS (or LDAP). So you only have to define the users once. for a small cluster, LDAP is overkill (and NIS is, afaik, still insecure). it's much easier to either have a single, shared NFS root (so /etc/{passwd,shadow,group} are inherently in sync) or else just periodically rsync these files from a master node to all others. From tcarroll at ursinus.edu Thu Jan 3 18:43:29 2008 From: tcarroll at ursinus.edu (Thomas Carroll) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Socket AM2 Opterons?? and comments requested... Message-ID: <1199414609.31014.23.camel@Inverness> Hi, I recently posted a potential build for my new cluster nodes and got some great advice (thanks especially to Bill and Mark). I've done some additional research and concluded that AMD will likely give the best performance for my code and also the best bang for the buck. AMD also has many more appealing mobo options. Here's my current prospective node: AMD Athlon 64 X2 6400+ Windsor 3.2GHz Socket AM2 125W Dual-Core G.SKILL 4GB(2 x 2GB) 240-Pin DDR2 SDRAM DDR2 800 (PC2 6400) GIGABYTE GA-M61P-S3 AM2 NVIDIA GeForce 6100 ATX AMD Motherboard Rosewill R804BK Black Steel ATX Mid Tower Computer Case 300W I've scored some free hard drives and dvd drives so that I can start diskful and get simulating before I figure out exactly how I want to do diskless. I'll also be going with GigE and doing the best I can; it seems others have had success with ScaLAPACK and GigE for my type of application. (Hopefully a small grant in the near future will allow an upgrade of the network - myrinet is beyond my budget right now.) My main question (besides throwing this configuration out there for comments) is about the CPU. I noticed on newegg that there are a few socket AM2 dual core opterons (the Santa Ana). Does anyone have any experience with these? There are only a few (most of the opterons seem to be socket F and the socket F mobos are far too expensive). The opteron seems to be a popular cluster choice - any thoughts on whether these would be superior to my choice above? Again, thanks everyone for the help! -tom From andrew at moonet.co.uk Fri Jan 4 04:50:37 2008 From: andrew at moonet.co.uk (andrew holway) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Virtual resource manager Message-ID: Hi, Id like to find out if there are any projects out there to develop a resource manager that can control a virtual cluster. We would like to explore the idea of using xen to deploy operating systems on nodes, checkpoint jobs and deploy MS ccs. Primarily interested in open source initiatives. If anyone has heard anything like this please let me know. 
Cheers Andy From ascheinine at tuffmail.us Fri Jan 4 05:35:55 2008 From: ascheinine at tuffmail.us (Alan Louis Scheinine) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Virtual resource manager In-Reply-To: References: Message-ID: <477E363B.1020901@tuffmail.us> Andrew Holway wrote: > I'd like to find out if there are any projects out there to develop a > resource manager that can control a virtual cluster. We would like to > explore the idea of using xen to deploy operating systems on nodes, > checkpoint jobs and deploy MS ccs. Primarily interested in open source > initiatives. A good question. I don't know the answer, nonetheless I would like to mention one point of view. From what I've seen with LSF and SGE, they expect to have a certain set of computers with specific names. In contrast, with Xen that number of computers with different names and different addresses is arbitrary. But on the other hand, if you want to balance the computational load, you need to know the number of actual processors. I realize that you asked about a "resource manager" which does not necessarily imply load balancing. Nevertheless, to focus on the load balancing aspect, it seems practical to have a batch system that is based on actual computers so that the job manager knows how much real resources have been given, then for a Xen-based job a set of computers runs a (parallel) job starts as a script that creates the Xen processes and when finished returns the nodes to the job queue pool as non-virtual machines. I don't know what is available for others to use, we are developing something in-house. It is not simple because the Xen processes will run parallel jobs, so they may need a NIS server and DNS server for their specialized names and users. Moreover, for security in a Grid computing environment, each collection of Xen processes for a parallel job will have its own VLAN. So the script that starts the Xen collection needs to also change the Ethernet switch. I look forward to reading suggestions from other Beowulf list members. Best regards, Alan Scheinine -- Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna Center for Advanced Studies, Research, and Development in Sardinia Postal Address: | Physical Address for FedEx, UPS, DHL: --------------- | ------------------------------------- Alan Scheinine | Alan Scheinine c/o CRS4 | c/o CRS4 C.P. n. 25 | Loc. Pixina Manna Edificio 1 09010 Pula (Cagliari), Italy | 09010 Pula (Cagliari), Italy Email: scheinin@crs4.it Phone: 070 9250 238 [+39 070 9250 238] Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220] Operator at reception: 070 9250 1 [+39 070 9250 1] Mobile phone: 347 7990472 [+39 347 7990472] From csamuel at vpac.org Fri Jan 4 15:22:53 2008 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Virtual resource manager In-Reply-To: <133368502.3801199488840255.JavaMail.root@zimbra.vpac.org> Message-ID: <232523152.3821199488973683.JavaMail.root@zimbra.vpac.org> Hi Andrew, ----- "andrew holway" wrote: > Id like to find out if there are any projects out there to develop a > resource manager that can control a virtual cluster. We would like to > explore the idea of using xen to deploy operating systems on nodes, > checkpoint jobs and deploy MS ccs. Primarily interested in open > source initiatives. How about Moab ? Not open source, but it does seem like it can do a fair bit of what you want.. 
http://www.clusterresources.com/products/mwm/docs/5.6resourceprovisioning.shtml > Enabling provisioning consists of configuring an interface to a > provisioning manager, specifying which nodes can take advantage > of this service, and what the estimated cost and duration of > each change will be. This interface can be used to contact a > system such as System Imager, XCat, Xen, RedCarpet or NIM or > to contact a locally developed system via a script or web service. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From gmpc at sanger.ac.uk Sat Jan 5 06:08:25 2008 From: gmpc at sanger.ac.uk (Guy Coates) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Virtual resource manager In-Reply-To: References: Message-ID: <477F8F59.3060709@sanger.ac.uk> andrew holway wrote: > Hi, > > Id like to find out if there are any projects out there to develop a > resource manager that can control a virtual cluster. You might want to take a look at openqrm as a starting point; http://www.openqrm.org/ It allows you to dynamically provision virtual or real machine images onto physical hardware. It will also grow or shrink the pool of virtual machines in response to changes in "load". Cheers, Guy -- Dr Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 ex 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From forum.san at gmail.com Fri Jan 4 23:59:41 2008 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] For only NAMD users Message-ID: Hi All, I installed NAMDCharm2.6 on AMD64 dual core dual processor with gcc and MPICH2. I don't know the science behind this application. As a HPC support engineer, I've to test it our cluster hardware. If any member of this Mailing list used NAMD please let me know how to run it and where I can get the input files. regards, Sangamesh HPC Engineer -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080105/76c26415/attachment.html From wrankin at ee.duke.edu Sat Jan 5 18:39:03 2008 From: wrankin at ee.duke.edu (Bill Rankin) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] For only NAMD users In-Reply-To: References: Message-ID: <7DE7AF46-D73F-4BE5-90F2-A13D06661D06@ee.duke.edu> On the NAMD website: http://www.ks.uiuc.edu/Research/namd/ If you look through the release notes: http://www.ks.uiuc.edu/Research/namd/2.6/notes.html towards the bottom of the document they have a note on running NAMD with some simple input files. Hope this helps, -bill On Jan 5, 2008, at 2:59 AM, Sangamesh B wrote: > > Hi All, > > > I installed NAMDCharm2.6 on AMD64 dual core dual processor with > gcc and MPICH2. > > I don't know the science behind this application. > > As a HPC support engineer, I've to test it our cluster hardware. > > If any member of this Mailing list used NAMD please let me know > how to run it and where I can get the input files. 
> > regards, > Sangamesh > HPC Engineer > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From john.leidel at gmail.com Mon Jan 7 08:14:09 2008 From: john.leidel at gmail.com (John Leidel) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] quad-socket opteron memory performance Message-ID: <1199722449.13428.40.camel@e521.site> Does anyone have any recent memory performance numbers [specifically latency] from the quad-socket opteron's? --john From raysonlogin at gmail.com Mon Jan 7 09:41:05 2008 From: raysonlogin at gmail.com (Rayson Ho) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Virtual resource manager In-Reply-To: References: Message-ID: <73a01bf20801070941n91ffa5bk72b101b8a7cda65f@mail.gmail.com> On Jan 4, 2008 7:50 AM, andrew holway wrote: > Id like to find out if there are any projects out there to develop a > resource manager that can control a virtual cluster. We would like to > explore the idea of using xen to deploy operating systems on nodes, > checkpoint jobs and deploy MS ccs. Primarily interested in open source > initiatives. Take a look at this paper: "Xen and the Art of Cluster Scheduling". It integrates SGE (Sun Grid Engine) and Xen, and creates XGE (Xen Grid Engine): http://ds.informatik.uni-marburg.de/de/publications/pdf/Xen%20and%20the%20Art%20of%20Cluster%20Scheduling.pdf And Sun has another open source project called Open xVM. xVM Server is a hypervisor based on Xen and xVM Ops Center allows provisioning of cluster nodes. http://openxvm.org/ http://en.wikipedia.org/wiki/Sun_xVM xVM is used to deploy cluster nodes at the Ranger supercomputer at TACC. With 3,936 nodes and 16 cores per node: http://blogs.sun.com/stevewilson/entry/xvm_at_tacc I believe you would be able to get some feedback from both the SGE and the OpenxVM projects... SGE homepage: http://gridengine.sunsource.net/ Rayson > > If anyone has heard anything like this please let me know. > > Cheers > > Andy > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From ntmoore at gmail.com Mon Jan 7 14:39:11 2008 From: ntmoore at gmail.com (Nathan Moore) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node Message-ID: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> Hi, I've aquired a few clusternodes that have multiple ethernet jacks. One the system-config-network applet I see several different adaptors (eg eth0 and eth1). Right now, I've got a many more free ports on my switch than I have nodes in my cluster, so I'm wondering if there's some performance benefit from hooking up the second NIC. Do any of you have a tutorial on multiple NIC's per compute node that you'd be willing to share? I'm assigning static IP's with named, and cocurrently maintaining /etc/hosts files on each machine with the full cluster map. Do I "just" assign a secon IP address for the second eth1 jack? Is there more to it? Nathan Moore -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - - -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080107/8c4ffa93/attachment.html From gdjacobs at gmail.com Mon Jan 7 15:15:47 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> Message-ID: <4782B2A3.1090205@gmail.com> Nathan Moore wrote: > Hi, > > I've aquired a few clusternodes that have multiple ethernet jacks. One > the system-config-network applet I see several different adaptors (eg > eth0 and eth1). Right now, I've got a many more free ports on my switch > than I have nodes in my cluster, so I'm wondering if there's some > performance benefit from hooking up the second NIC. > > Do any of you have a tutorial on multiple NIC's per compute node that > you'd be willing to share? I'm assigning static IP's with named, and > cocurrently maintaining /etc/hosts files on each machine with the full > cluster map. Do I "just" assign a secon IP address for the second eth1 > jack? Is there more to it? Is your switch capable of trunking, or can it be configured into multiple VLANs? -- Geoffrey D. Jacobs From ntmoore at gmail.com Mon Jan 7 16:38:38 2008 From: ntmoore at gmail.com (Nathan Moore) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> Message-ID: <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> I don't know. Its a 24 port cisco that I got from our local network admin. Nathan On Jan 7, 2008 5:15 PM, Geoff Jacobs wrote: > Nathan Moore wrote: > > Hi, > > > > I've aquired a few clusternodes that have multiple ethernet jacks. One > > the system-config-network applet I see several different adaptors (eg > > eth0 and eth1). Right now, I've got a many more free ports on my switch > > than I have nodes in my cluster, so I'm wondering if there's some > > performance benefit from hooking up the second NIC. > > > > Do any of you have a tutorial on multiple NIC's per compute node that > > you'd be willing to share? I'm assigning static IP's with named, and > > cocurrently maintaining /etc/hosts files on each machine with the full > > cluster map. Do I "just" assign a secon IP address for the second eth1 > > jack? Is there more to it? > > Is your switch capable of trunking, or can it be configured into > multiple VLANs? > > -- > Geoffrey D. Jacobs > > -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - Nathan Moore Assistant Professor, Physics Winona State University AIM: nmoorewsu - - - - - - - - - - - - - - - - - - - - - -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080107/1de1f1e4/attachment.html From gdjacobs at gmail.com Tue Jan 8 05:10:25 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> Message-ID: <47837641.9010707@gmail.com> Nathan Moore wrote: > I don't know. Its a 24 port cisco that I got from our local network admin. > > Nathan Timothy Mattox describes the network engineering challenges wrt switches far better than I could: http://www.beowulf.org/archive/2001-March/002760.html You're going to have to make a decision on what strategy to follow, and part of that decision is going to be informed by the performance characteristics of your application, as well as the networking hardware your cluster will be equipped with. So, if your application does a great deal of file I/O on the nodes, you might consider implementing a service network through the second network ports. However, if your application needs more total bandwidth on a single network, you will want to go with channel bonding. If the driver(s) for your network ports do not play well with the channel bonding interface, you will have to go with another option, or buy some different network cards. Also, depending on the capability of the switch, you might have to buy a second (dumb) switch to make your plans work. -- Geoffrey D. Jacobs From hahn at mcmaster.ca Tue Jan 8 08:40:04 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: <47837641.9010707@gmail.com> References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> <47837641.9010707@gmail.com> Message-ID: > So, if your application does a great deal of file I/O on the nodes, you > might consider implementing a service network through the second network > ports. this is convenient, since if you use two nics with different subnets, traffic will be segregated and non-interfering. > However, if your application needs more total bandwidth on a > single network, you will want to go with channel bonding. If the besides being slightly trickier to configure, it also only gives you higher _aggregate_ bandwidth. any single flow between a pair of IPs will not be faster than a single link. bonding/teaming/link-aggregation is mainly useful for inter-switch links and hosts like a fileserver which are effectively a hotspot and can take advantage of multiple concurrent flows (again, where each flow is no faster than 1 link.) in other words, there's no standard "raid0 of nics" ;) From peter.st.john at gmail.com Tue Jan 8 10:12:33 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> <47837641.9010707@gmail.com> Message-ID: Mark, I don't get it? 
I would have thought that if a large package were split between two NICs with two cables, then assuming the buffering and recombination at each end to be faster than the transmission, then the transmission would be faster than over a single cable? You don't mean that the router must be a bottleneck, by giving necessarily only one pathway to a pair of IPs? Probably I'm missing something about what is would be meant by "(merely) aggregate bandwidth"? Thanks, Peter On Jan 8, 2008 11:40 AM, Mark Hahn wrote: > > So, if your application does a great deal of file I/O on the nodes, you > > might consider implementing a service network through the second network > > ports. > > this is convenient, since if you use two nics with different subnets, > traffic will be segregated and non-interfering. > > > However, if your application needs more total bandwidth on a > > single network, you will want to go with channel bonding. If the > > besides being slightly trickier to configure, it also only gives you > higher _aggregate_ bandwidth. any single flow between a pair of IPs > will not be faster than a single link. bonding/teaming/link-aggregation > is mainly useful for inter-switch links and hosts like a fileserver > which are effectively a hotspot and can take advantage of multiple > concurrent flows (again, where each flow is no faster than 1 link.) > > in other words, there's no standard "raid0 of nics" ;) > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080108/38b127b4/attachment.html From patrick at myri.com Tue Jan 8 10:29:46 2008 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> <47837641.9010707@gmail.com> Message-ID: <4783C11A.3050308@myri.com> Peter St. John wrote: > I don't get it? I would have thought that if a large package were split > between two NICs with two cables, then assuming the buffering and > recombination at each end to be faster than the transmission, then the > transmission would be faster than over a single cable? You don't mean that The problem is ordering of packets and TCP. When you send a single TCP stream over two (or more) paths, then some packets will arrive out-of-order at the destination. TCP really does not like out-of-order packets and performance takes a (big) hit. That's why most channel bonding mechanisms balance multiple streams over multiple NICs and send each stream on a single NIC. Other protocols than TCP may not have this problem if they don't require strict ordering for performance. Patrick From peter.st.john at gmail.com Tue Jan 8 10:56:11 2008 From: peter.st.john at gmail.com (Peter St. 
John) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: <4783C11A.3050308@myri.com> References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> <47837641.9010707@gmail.com> <4783C11A.3050308@myri.com> Message-ID: One could use the ...I'm thinking of the extra-big-packet size in IP6. But if you have small numbers of large datasets, you could increase your perceived bandwidth with two NICs and larger packets, maybe by using some protocol other than TCP? Thanks, Peter On Jan 8, 2008 1:29 PM, Patrick Geoffray wrote: > Peter St. John wrote: > > I don't get it? I would have thought that if a large package were split > > between two NICs with two cables, then assuming the buffering and > > recombination at each end to be faster than the transmission, then the > > transmission would be faster than over a single cable? You don't mean > that > > The problem is ordering of packets and TCP. When you send a single TCP > stream over two (or more) paths, then some packets will arrive > out-of-order at the destination. TCP really does not like out-of-order > packets and performance takes a (big) hit. > > That's why most channel bonding mechanisms balance multiple streams over > multiple NICs and send each stream on a single NIC. Other protocols than > TCP may not have this problem if they don't require strict ordering for > performance. > > Patrick > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080108/404a9d51/attachment.html From patrick at myri.com Tue Jan 8 11:20:09 2008 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Multiple NIC on a node In-Reply-To: References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com> <4782B2A3.1090205@gmail.com> <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com> <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com> <47837641.9010707@gmail.com> <4783C11A.3050308@myri.com> Message-ID: <4783CCE9.1060809@myri.com> Peter St. John wrote: > One could use the ...I'm thinking of the extra-big-packet size in IP6. But > if you have small numbers of large datasets, you could increase your > perceived bandwidth with two NICs and larger packets, maybe by using some > protocol other than TCP? If you don't drop packets, UDP is the simplest solution. However, you will always lose a packet at some point, so you will need a reliable protocol. You can build one on top of UDP at the host level, or you can do your own wire protocol on Ethernet. SCTP has some support for multipath out of the box last time I looked. Patrick From supercomputer at gmail.com Wed Jan 9 08:07:20 2008 From: supercomputer at gmail.com (Chris Vaughan) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Why need of a scheduler?? In-Reply-To: <428810f20711290514o1912c265q2f7fdd7dd45fa960@mail.gmail.com> References: <428810f20711290514o1912c265q2f7fdd7dd45fa960@mail.gmail.com> Message-ID: <216ee070801090807m6d0abb29n66f35ad126e07ff5@mail.gmail.com> On Nov 29, 2007 1:14 PM, amjad ali wrote: > Hello all, > > I want to develop and run my parallel code (MPI based) on a Beowulf > cluster. I have no problem as such that many user might log on to the > cluster simultaneously. Suppose that I am free to use cluster dedicatedly > for my single parallel application. 
> > 1) Do I really need a cluster scheduler installed on the cluster? Should I > use scheduler? > Yes, it makes things easier to control and keep track of. > > 2) Is there any effect/benefit on the running of a parallel code with or > without cluster job scheduler? > It depends how many jobs/nodes you run on a 4 node system you would be fine running something like torque with pbs_sched. If your requirements become higher I'd recommend Maui and for those complex environments with many cores/nodes I'd recommend Moab. The benefit is ease of use, the more jobs you run the harder it is to manage those jobs. > > 3) How you differentiate between cluster scheduler and cluster resource > manager? > One schedules the other gives back information about what resources are available. > > 4) If there is any significant difference between a scheduler and manager > then plaese tell me that which of the fall in which category: > > OpenPBS, PBS Professional, SGE, Maui, Moab, Torque, Scyld, LSF, SLURM etc. > Torque=Resource Manager (RM) w/basic scheduling PBS=(RM) w/some scheduling functionality SGE=(RM) w/some scheduling functionality Maui=Scheduler Moab=Scheduler + More OpenPBS=Use Torque LSF=(RM) w/some scheduling functionality SLURM=(RM) w/basic scheduling A resource manager manages resources where as a scheduler can schedule these resources. Although something like torque resource manager (OpenPBS) has pbs _sched a fifo scheduler it is still in-adequate in most environments and you would need a scheduler such as Maui or Moab to schedule it. > > 5) What is maent by " PBS/SGE/LSF supports integration with the Maui > scheduler? > You have this mixed up with Moab, Moab can talk to all of these resource managers and give you a single point of job submission/administration over all resource managers. Cluster Resources provide free support to eval Moab which can be quite handy http://www.clusterresources.com/pages/products/evaluate.php > > Precise, easy and brief reply requested. Thanks to all. > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- ------------------------------ Christopher Vaughan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080109/8892874d/attachment.html From tom.elken at qlogic.com Fri Jan 11 10:13:58 2008 From: tom.elken at qlogic.com (Tom Elken) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] quad-socket opteron memory performance In-Reply-To: <1199722449.13428.40.camel@e521.site> References: <1199722449.13428.40.camel@e521.site> Message-ID: <6DB5B58A8E5AB846A7B3B3BFF1B4315A019A88D7@AVEXCH1.qlogic.org> > -----Original Message----- > From: beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org] On Behalf Of John Leidel > Sent: Monday, January 07, 2008 8:14 AM > To: beowulf@beowulf.org > Subject: [Beowulf] quad-socket opteron memory performance > > Does anyone have any recent memory performance numbers [specifically > latency] from the quad-socket opteron's? On a quad-socket system from a major system vendor with 4x Opteron 2218 (Rev. F, dual-core, 2.6 GHz), I measure 90 - 92 nsec for memory latency, 5.5 GB/s for serial STREAM performance, and 17 GB/s for OpenMP STREAM w/ 8 threads, on 4 sockets. 
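For readers who want to sanity-check aggregate-bandwidth figures like the OpenMP STREAM number above, here is a minimal triad-style sketch in C. It is not the official STREAM benchmark; the array size, trial count, and build flags (gcc with -fopenmp is assumed) are illustrative only.

/* triad.c -- minimal OpenMP triad sketch, not the official STREAM code.
 * Build (assumed): gcc -std=gnu99 -O2 -fopenmp triad.c -o triad
 * Run:             OMP_NUM_THREADS=8 ./triad
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (32 * 1024 * 1024)   /* three arrays of doubles, ~768 MB total */
#define NTRIES 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }

    /* First-touch initialization in parallel so pages end up near the
       threads that will use them -- this matters on NUMA Opteron boxes. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.5; }

    double best = 0.0;
    for (int t = 0; t < NTRIES; t++) {
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];                    /* triad: a = b + s*c */
        double secs = omp_get_wtime() - t0;
        double gbs = 3.0 * N * sizeof(double) / secs / 1e9;  /* 2 loads + 1 store */
        if (gbs > best) best = gbs;
    }
    printf("best triad bandwidth: %.2f GB/s\n", best);
    free(a); free(b); free(c);
    return 0;
}

Counting 24 bytes moved per element follows the STREAM convention; hardware write-allocate traffic can make the true figure somewhat higher.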
-Tom > > --john > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) > visit http://www.beowulf.org/mailman/listinfo/beowulf > From orion at cora.nwra.com Fri Jan 11 10:19:41 2008 From: orion at cora.nwra.com (Orion Poplawski) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] SSE4 benefits? Message-ID: <4787B33D.5050905@cora.nwra.com> Does anyone have a feel for what the benefits of SSE4 are? What kind of codes, compilers take advantage of it? Thanks! -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA/CoRA Division FAX: 303-415-9702 3380 Mitchell Lane orion@cora.nwra.com Boulder, CO 80301 http://www.cora.nwra.com From jpilldev at gmail.com Sun Jan 13 17:11:54 2008 From: jpilldev at gmail.com (J Pill) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Problem with a simple MPI Program Message-ID: Hello. I'm trying to run a simple hello word program: #include "mpi.h" #include int main (argc, argv) int argc; char **argv; { MPI_Init (&argc, &argv); printf ("hello word\n"); MPI_Finalize(); return 0; } I compile with mpicc and there's no problem, but when i try to run with mpiexec or mpirun y have the folliwing: $ mpirun -np 2 hello problem with execution of hello on DebianJPill: [Errno 2] No such file or directory problem with execution of hello on DebianJPill: [Errno 2] No such file or directory But running the file generated there no problem. $./hello What i have doing wrong? thanks a lot -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080113/5c107457/attachment.html From mengkuan at sxven.com Sun Jan 13 20:03:10 2008 From: mengkuan at sxven.com (Meng Kuan) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console Message-ID: Greetings, I would like to announce the availability of VMC (Virtual Machine Console). VMC is an attempt to provide an opensource, web-based VM management infrastructure. It uses libvirt as the underlying library to manage para-virtualized Xen VMs. In time we intend to scale this to manage VM clusters running HPC applications. You can find out more on our "Introduction to VMC" page: http://www.sxven.com/vmc List of current features and future plans: http://www.sxven.com/vmc/features To get started, we have made available a "VMC Install" document: http://www.sxven.com/vmc/gettingstarted We invite people to take a look at VMC and tell us what you like and what you don't like. If you have any problems, questions or suggestions please feel free to contact us at dev@sxven.com or post them on our forum: http://forum.sxven.com/ Best regards, Meng Kuan From csamuel at vpac.org Sun Jan 13 22:17:07 2008 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Problem with a simple MPI Program In-Reply-To: Message-ID: <1801047904.42861200291427379.JavaMail.root@zimbra.vpac.org> ----- "J Pill" wrote: > Hello. Hiya, > I compile with mpicc and there's no problem, but when i try to run > with mpiexec or mpirun y have the folliwing: > > $ mpirun -np 2 hello > problem with execution of hello on DebianJPill: [Errno 2] No such file > or directory > problem with execution of hello on DebianJPill: [Errno 2] No such file > or directory You probably just need to do: mpirun -np 2 ./hello (or with the full path) to make the location explicit as it will just be searching your $PATH otherwise (and you don't want . in your $PATH).. 
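An aside on the hello-world listing earlier in this thread: it compiles as posted, but an ANSI-C main signature plus per-rank output makes it easier to confirm that both processes really started. A sketch (any MPI installation providing mpicc and mpirun is assumed):

/* hello.c -- ANSI-C variant of the hello-world program from this thread,
 * printing rank and size so a 2-process run is easy to verify.
 * Build: mpicc hello.c -o hello
 * Run:   mpirun -np 2 ./hello    (note the explicit ./ as suggested above)
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello world from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}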
cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From jac67 at georgetown.edu Mon Jan 14 07:26:08 2008 From: jac67 at georgetown.edu (Jess Cannata) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Off Topic: HPC Training Message-ID: <478B7F10.3010603@georgetown.edu> We have some upcoming HPC/Beowulf Systems Administration training courses. We will be also be holding an Advanced Sun Grid Engine class in the next couple of months. For more information, see the following link: http://www.gridswatch.com/index.php?option=com_content&task=view&id=25&Itemid=16 -- Jess Cannata Advanced Research Computing Georgetown University 202-687-3661 From mwill at penguincomputing.com Mon Jan 14 13:51:05 2008 From: mwill at penguincomputing.com (Michael Will) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] Why need of a scheduler?? In-Reply-To: <216ee070801090807m6d0abb29n66f35ad126e07ff5@mail.gmail.com> Message-ID: <433093DF7AD7444DA65EFAFE3987879C5ABA26@orca.penguincomputing.com> If you only run your application one at a time interacatively, then you don't need to deal with the overhead and complexity of a scheduler. However if you are planning to batch queue up a few runs or several different applications, then it might be worthwhile to read into torque/maui/moab and the like. You mentioned Scyld in your question below, which within the categories you where interested in is basically a ressource manager which comes with torque prebundled to allow scheduling and batch queueing. Moab/Taskmaster is then the add-on module to allow more complex scheduling. Michael ________________________________ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Chris Vaughan Sent: Wednesday, January 09, 2008 8:07 AM To: amjad ali Cc: beowulf@beowulf.org Subject: Re: [Beowulf] Why need of a scheduler?? On Nov 29, 2007 1:14 PM, amjad ali wrote: Hello all, I want to develop and run my parallel code (MPI based) on a Beowulf cluster. I have no problem as such that many user might log on to the cluster simultaneously. Suppose that I am free to use cluster dedicatedly for my single parallel application. 1) Do I really need a cluster scheduler installed on the cluster? Should I use scheduler? Yes, it makes things easier to control and keep track of. 2) Is there any effect/benefit on the running of a parallel code with or without cluster job scheduler? It depends how many jobs/nodes you run on a 4 node system you would be fine running something like torque with pbs_sched. If your requirements become higher I'd recommend Maui and for those complex environments with many cores/nodes I'd recommend Moab. The benefit is ease of use, the more jobs you run the harder it is to manage those jobs. 3) How you differentiate between cluster scheduler and cluster resource manager? One schedules the other gives back information about what resources are available. 4) If there is any significant difference between a scheduler and manager then plaese tell me that which of the fall in which category: OpenPBS, PBS Professional, SGE, Maui, Moab, Torque, Scyld, LSF, SLURM etc. 
Torque=Resource Manager (RM) w/basic scheduling PBS=(RM) w/some scheduling functionality SGE=(RM) w/some scheduling functionality Maui=Scheduler Moab=Scheduler + More OpenPBS=Use Torque LSF=(RM) w/some scheduling functionality SLURM=(RM) w/basic scheduling A resource manager manages resources where as a scheduler can schedule these resources. Although something like torque resource manager (OpenPBS) has pbs _sched a fifo scheduler it is still in-adequate in most environments and you would need a scheduler such as Maui or Moab to schedule it. 5) What is maent by " PBS/SGE/LSF supports integration with the Maui scheduler? You have this mixed up with Moab, Moab can talk to all of these resource managers and give you a single point of job submission/administration over all resource managers. Cluster Resources provide free support to eval Moab which can be quite handy http://www.clusterresources.com/pages/products/evaluate.php Precise, easy and brief reply requested. Thanks to all. _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- ------------------------------ Christopher Vaughan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080114/3a5a58c6/attachment.html From deadline at eadline.org Wed Jan 16 05:19:02 2008 From: deadline at eadline.org (Douglas Eadline) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: Message-ID: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> While your project looks interesting and I like the idea of VMs, however I have not seen a good answer to the fact that VM = layers and in HPC layers = latency. Any thoughts? Also, is it open source? -- Doug > Greetings, > > I would like to announce the availability of VMC (Virtual Machine > Console). VMC is an attempt to provide an opensource, web-based VM > management infrastructure. It uses libvirt as the underlying library > to manage para-virtualized Xen VMs. In time we intend to scale this to > manage VM clusters running HPC applications. > > You can find out more on our "Introduction to VMC" page: > > http://www.sxven.com/vmc > > List of current features and future plans: > > http://www.sxven.com/vmc/features > > To get started, we have made available a "VMC Install" document: > > http://www.sxven.com/vmc/gettingstarted > > We invite people to take a look at VMC and tell us what you like and > what you don't like. If you have any problems, questions or > suggestions please feel free to contact us at dev@sxven.com or post > them on our forum: > > http://forum.sxven.com/ > > Best regards, > Meng Kuan > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > !DSPAM:478b0403265441923983023! 
> -- Doug From geoff at galitz.org Wed Jan 16 05:39:02 2008 From: geoff at galitz.org (Geoff) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: I certainly cannot speak for the VMC project, but application migration and fault tolerance (the primary benefits other than easy access to heterogeneus environments from VMs) are always going to result in a peformance hit of some kind. You cannot expect to do more things with no overhead. There is great value in introducing HA concepts into an HPC cluster depending on the goals and configuration of the cluster in question (as always). I cannot count the number of times a long running job (weeks) crashed, bumming me out as a result, even with proper checkpointing routines integrated into the code and/or system. As a funny aside, I once knew a sysadmin who applied 24 hour timelimits to all queues of all clusters he managed in order to force researchers to think about checkpoints and smart restarts. I couldn't understand why so many folks from his particular unit kept asking me about arrays inside the scheduler submission scripts and nested commends until I found that out. Unfortunately I came to the conclusion that folks in his unit were spending more time writing job submission scripts than code... well... maybe that is an exaggeration. -geoff Am 16.01.2008, 14:19 Uhr, schrieb Douglas Eadline : > While your project looks interesting and I like the idea of > VMs, however I have not seen a good answer to the fact that VM = layers > and in HPC layers = latency. Any thoughts? Also, is it open source? > > -- > Doug > > >> Greetings, >> >> I would like to announce the availability of VMC (Virtual Machine >> Console). VMC is an attempt to provide an opensource, web-based VM >> management infrastructure. It uses libvirt as the underlying library >> to manage para-virtualized Xen VMs. In time we intend to scale this to >> manage VM clusters running HPC applications. >> >> You can find out more on our "Introduction to VMC" page: >> >> http://www.sxven.com/vmc >> >> List of current features and future plans: >> >> http://www.sxven.com/vmc/features >> >> To get started, we have made available a "VMC Install" document: >> >> http://www.sxven.com/vmc/gettingstarted >> >> We invite people to take a look at VMC and tell us what you like and >> what you don't like. If you have any problems, questions or >> suggestions please feel free to contact us at dev@sxven.com or post >> them on our forum: >> >> http://forum.sxven.com/ >> >> Best regards, >> Meng Kuan >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> !DSPAM:478b0403265441923983023! 
>> > > > -- > Doug > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- ------------------------------- Geoff Galitz, geoff@galitz.org Blankenheim, Deutschland From deadline at eadline.org Wed Jan 16 06:18:20 2008 From: deadline at eadline.org (Douglas Eadline) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> I get the desire for fault tolerance etc. and I like the idea of migration. It is just that many HPC people have spent careers getting applications/middleware as close to the bare metal as possible. The whole VM concept seems orthogonal to this goal. I'm curious how people are approaching this problem. -- Doug > > > I certainly cannot speak for the VMC project, but application migration > and fault tolerance (the primary benefits other than easy access to > heterogeneus environments from VMs) are always going to result in a > peformance hit of some kind. You cannot expect to do more things with no > overhead. There is great value in introducing HA concepts into an HPC > cluster depending on the goals and configuration of the cluster in > question (as always). > > I cannot count the number of times a long running job (weeks) crashed, > bumming me out as a result, even with proper checkpointing routines > integrated into the code and/or system. > > > As a funny aside, I once knew a sysadmin who applied 24 hour timelimits to > all queues of all clusters he managed in order to force researchers to > think about checkpoints and smart restarts. I couldn't understand why so > many folks from his particular unit kept asking me about arrays inside the > scheduler submission scripts and nested commends until I found that out. > Unfortunately I came to the conclusion that folks in his unit were > spending more time writing job submission scripts than code... well... > maybe that is an exaggeration. > > -geoff > > > > Am 16.01.2008, 14:19 Uhr, schrieb Douglas Eadline : > >> While your project looks interesting and I like the idea of >> VMs, however I have not seen a good answer to the fact that VM = layers >> and in HPC layers = latency. Any thoughts? Also, is it open source? >> >> -- >> Doug >> >> >>> Greetings, >>> >>> I would like to announce the availability of VMC (Virtual Machine >>> Console). VMC is an attempt to provide an opensource, web-based VM >>> management infrastructure. It uses libvirt as the underlying library >>> to manage para-virtualized Xen VMs. In time we intend to scale this to >>> manage VM clusters running HPC applications. >>> >>> You can find out more on our "Introduction to VMC" page: >>> >>> http://www.sxven.com/vmc >>> >>> List of current features and future plans: >>> >>> http://www.sxven.com/vmc/features >>> >>> To get started, we have made available a "VMC Install" document: >>> >>> http://www.sxven.com/vmc/gettingstarted >>> >>> We invite people to take a look at VMC and tell us what you like and >>> what you don't like. 
If you have any problems, questions or >>> suggestions please feel free to contact us at dev@sxven.com or post >>> them on our forum: >>> >>> http://forum.sxven.com/ >>> >>> Best regards, >>> Meng Kuan >>> _______________________________________________ >>> Beowulf mailing list, Beowulf@beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> >>> >> >> >> -- >> Doug >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > ------------------------------- > Geoff Galitz, geoff@galitz.org > Blankenheim, Deutschland > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > !DSPAM:478e094566431543480883! > -- Doug From mengkuan at sxven.com Wed Jan 16 06:31:10 2008 From: mengkuan at sxven.com (Meng Kuan) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: On Jan 16, 2008 9:19 PM, Douglas Eadline wrote: > While your project looks interesting and I like the idea of > VMs, however I have not seen a good answer to the fact that VM = layers > and in HPC layers = latency. Any thoughts? Also, is it open source? We performed some benchmark testing with linpack and bonnie++ on the VM and on the physical host. For para-virtualized VMs, the linpack performance is on par with the physical host. However, for bonnie++ tests, para-virtualized VMs fell way behind physical host's performance. In short, CPU-bound and memory intensive HPC apps should do ok but not IO-intensive apps. More testing and fine-tuning will probably be needed to see how far we can push the VM in terms of IO-intensive operations but we are hoping that in time to come virtualization technologies will be able to narrow that gap. And yes, the VMC application is open source. You can find the download links in the VMC Install document. Regards, Meng Kuan From apittman at concurrent-thinking.com Wed Jan 16 06:35:57 2008 From: apittman at concurrent-thinking.com (Ashley Pittman) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> Message-ID: <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> On Wed, 2008-01-16 at 09:18 -0500, Douglas Eadline wrote: > I get the desire for fault tolerance etc. and I like the idea > of migration. It is just that many HPC people have spent > careers getting applications/middleware as close to the bare > metal as possible. The whole VM concept seems orthogonal to > this goal. I'm curious how people are approaching this > problem. There was a paper on this at SC, I don't know if you caught it... 
http://sc07.supercomputing.org/schedule/event_detail.php?evid=11066 If I was to try and sum it up in one paragraph it would be: "The advantages of virtulisation are obvious but for some reason the HPC community have been slow to reap these benefits, we predict that this is because of a perception that the performance of comms and VM operations suffers when virtulised. This is true however we have demonstrated that with months of work this performance loss could be minimised such that instead of slowing down performance a lot it would only slow down performance a bit." I think progress is being made on the comms front, both in terms of raw numbers (bandwidth/latency) but also in reducing CPU usage but we are still a long way from it being widely used. Ashley, From rgb at phy.duke.edu Wed Jan 16 06:55:28 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> Message-ID: On Wed, 16 Jan 2008, Douglas Eadline wrote: > > I get the desire for fault tolerance etc. and I like the idea > of migration. It is just that many HPC people have spent > careers getting applications/middleware as close to the bare > metal as possible. The whole VM concept seems orthogonal to > this goal. I'm curious how people are approaching this > problem. As previously noted, however, YMMV and one size does not fit all. There are two distinct ways of managing the heterogeneous environments that some cluster applications might require. One is indeed the creation of VMs -- running an extremely thin toplevel operating system that does little else but to run the host VM and respond to provisioning requests, as is the case in many corporate HA environments. The other is to create a similar provisioning system that works at the level of e.g. grub and/or PXE to provide the ability to easily boot a node into a unique environment that might last only for the duration of a particular computation. Neither is particularly well supported in current clustering, although projects for both have been around for some time (Duke's Cluster On Demand project and wulfware being examples of one, Xen and various VMs as examples of the other). There are plenty of parallel chores that are tolerant of poor latency -- the whole world of embarrassingly parallel computations plus some extension up into merely coarse grained, not terribly synchronous real parallel computations. Remember, people did parallel computation effectively with 10Base ethernet for many years (more than a decade) before 100Base came along, and cluster nodes would now ROUTINELY be provisioned with at least 1000Base. Even a 1000Base VM is going to have better latency in most cases than a 10Base ever did on the old hardware it ran on, and it might well compete with early 100Base latencies. It isn't exactly like running in a VM is going to cripple all code. VMs can also be wonderful for TEACHING clustering and for managing "political" problems. In many environments there are potential nodes with lots of spare cycles that "have to run Windows" 24x7 and have a Windows console available at the desktop at all times (and thus cannot be dual booted) but which CAN run e.g. VMware and an "instant node" VM under Windows. 
Having any sort of access to a high-latency Linux VM node running on a Windows box beats the hell out of having no node at all or having to port one's code to work under Windows. We can therefore see that there are clearly environments where the bulk of the work being done is latency tolerant and where VMs may well have benefits in administration and security and fault tolerance and local politics that make them a great boon in clustering, just as there are without question computations for which latency is the devil and any suggestion of adding a layer of VM latency on top of what is already inherent to the device and minimal OS will bring out the peasants with pitchforks and torches. Multiboot systems, via grub and local provisioning or PXE and remote e.g. NFS provisioning is also useful but is not always politically possible or easy to set up. It is my hope that folks working on both sorts of multienvironment provisioning and sysadmin environments work hard and produce spectacular tools. I've done way more work than I care to setting up both of these sorts of things. It is not easy, and requires a lot of expertise. Hiding this detail and expertise from the user would be a wonderful contribution to practical clustering (and of course useful in the HA world as well). rgb > > -- > Doug > > > >> >> >> I certainly cannot speak for the VMC project, but application migration >> and fault tolerance (the primary benefits other than easy access to >> heterogeneus environments from VMs) are always going to result in a >> peformance hit of some kind. You cannot expect to do more things with no >> overhead. There is great value in introducing HA concepts into an HPC >> cluster depending on the goals and configuration of the cluster in >> question (as always). >> >> I cannot count the number of times a long running job (weeks) crashed, >> bumming me out as a result, even with proper checkpointing routines >> integrated into the code and/or system. >> >> >> As a funny aside, I once knew a sysadmin who applied 24 hour timelimits to >> all queues of all clusters he managed in order to force researchers to >> think about checkpoints and smart restarts. I couldn't understand why so >> many folks from his particular unit kept asking me about arrays inside the >> scheduler submission scripts and nested commends until I found that out. >> Unfortunately I came to the conclusion that folks in his unit were >> spending more time writing job submission scripts than code... well... >> maybe that is an exaggeration. >> >> -geoff >> >> >> >> Am 16.01.2008, 14:19 Uhr, schrieb Douglas Eadline : >> >>> While your project looks interesting and I like the idea of >>> VMs, however I have not seen a good answer to the fact that VM = layers >>> and in HPC layers = latency. Any thoughts? Also, is it open source? >>> >>> -- >>> Doug >>> >>> >>>> Greetings, >>>> >>>> I would like to announce the availability of VMC (Virtual Machine >>>> Console). VMC is an attempt to provide an opensource, web-based VM >>>> management infrastructure. It uses libvirt as the underlying library >>>> to manage para-virtualized Xen VMs. In time we intend to scale this to >>>> manage VM clusters running HPC applications. 
>>>> >>>> You can find out more on our "Introduction to VMC" page: >>>> >>>> http://www.sxven.com/vmc >>>> >>>> List of current features and future plans: >>>> >>>> http://www.sxven.com/vmc/features >>>> >>>> To get started, we have made available a "VMC Install" document: >>>> >>>> http://www.sxven.com/vmc/gettingstarted >>>> >>>> We invite people to take a look at VMC and tell us what you like and >>>> what you don't like. If you have any problems, questions or >>>> suggestions please feel free to contact us at dev@sxven.com or post >>>> them on our forum: >>>> >>>> http://forum.sxven.com/ >>>> >>>> Best regards, >>>> Meng Kuan >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf@beowulf.org >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >>>> >>>> >>>> >>> >>> >>> -- >>> Doug >>> _______________________________________________ >>> Beowulf mailing list, Beowulf@beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> >> >> >> -- >> ------------------------------- >> Geoff Galitz, geoff@galitz.org >> Blankenheim, Deutschland >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> !DSPAM:478e094566431543480883! >> > > > -- > Doug > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From gerry.creager at tamu.edu Wed Jan 16 07:25:13 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> Message-ID: <478E21D9.60900@tamu.edu> Ashley Pittman wrote: > On Wed, 2008-01-16 at 09:18 -0500, Douglas Eadline wrote: >> I get the desire for fault tolerance etc. and I like the idea >> of migration. It is just that many HPC people have spent >> careers getting applications/middleware as close to the bare >> metal as possible. The whole VM concept seems orthogonal to >> this goal. I'm curious how people are approaching this >> problem. > > There was a paper on this at SC, I don't know if you caught it... > > http://sc07.supercomputing.org/schedule/event_detail.php?evid=11066 > > If I was to try and sum it up in one paragraph it would be: > > "The advantages of virtulisation are obvious but for some reason the HPC > community have been slow to reap these benefits, we predict that this is > because of a perception that the performance of comms and VM operations > suffers when virtulised. 
This is true however we have demonstrated that > with months of work this performance loss could be minimised such that > instead of slowing down performance a lot it would only slow down > performance a bit." > > I think progress is being made on the comms front, both in terms of raw > numbers (bandwidth/latency) but also in reducing CPU usage but we are > still a long way from it being widely used. I'm constantly reminded of a meeting early on in the SCOOP project, which I participate in (http://scoop.sura.org). "We're able to virtualize our model applications using VMware and only see a 13% performance hit". Note that, at this time I was tweaking for ms upgrades in MPI communications.... We need to look at virtualization as a means of mitigating, on a heterogeneous hardware environment, the concept of porting to every different available machine type. In other words, I think that for a grid environment, we might see a lot of benefit for virtualization but for a local, homogeneous, cluster, it's less an issue. By the way: In order to compensate for their "13%" degradation, I had to nearly double the number of virtual nodes over real nodes to get the same performance data. That's "expensive" but very do-able on a grid environment. gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From laytonjb at charter.net Wed Jan 16 07:31:11 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> Message-ID: <478E233F.9080103@charter.net> Douglas Eadline wrote: > I get the desire for fault tolerance etc. and I like the idea > of migration. It is just that many HPC people have spent > careers getting applications/middleware as close to the bare > metal as possible. The whole VM concept seems orthogonal to > this goal. I'm curious how people are approaching this > problem. > Like many things, the devil is in the details. While I don't want to be as prodigious as rgb, I want to mention a few things and ask some questions: - With multi-core processors, to get the best performance you want to assign a process to a core. But this can cause problems when moving a process or creating a checkpoint. For example VMware explicitly tells you not to do this. While I can't state their position, in general the idea is that restarting a check-pointed VM may have problems when a process is pinned to a core (even more so if the CPU is different). Also, moving a pinned process to another node may cause problems if the nodes is different in pretty much any way (it may also be affected by what's on the new node). - As Ashley pointed out, the network aspect is still very problematic. Getting good performance out of a NIC in a VM is not easy and from what I understand difficult or impossible to do with multi-core nodes (I would love to hear if someone has gotten very good performance out of a NIC in a VM when other VM's are also using the same NIC. Please give as many details as possible) - As Meng mentioned, IO is still problematic (I think for the same reasons that interconnects are). - I haven't seen any benchmarks run in VM's using several nodes with an interconnect. 
Does anyone know of any? - Has anyone tried moving processes around to different nodes for an MPI job? I'm curious what they found. I would like to see virtualization take off in HPC, but I have to see a few demos of things working and I need to see reasons why I should adopt it. Right not I don't relish taking my "High" Performance Computing system and turning it into "Kind-of-High" Performance Computing because it would allow non-code specific checkpointing or movement of processes. Losing 10% in performance, for example, in HPC is a big deal, and I haven't yet seen the benefits of virtirualization for giving up the 10% (I'm dying to be shown to be wrong though). The only aspect of virtualization that could make some sense in HPC is what rgb mentioned - allowing the user to select and OS as part of their job and installing or tearing down the OS as part of the job. I can see this being very useful if the details could be worked out (I know there are people working on it but I haven't seen any large demonstrations of it yet and I would really like to see such a beastie). Anyway, my 2 cents (and probably my last since this topic falls under Landman's Rule: of flammability). Jeff From landman at scalableinformatics.com Wed Jan 16 08:26:14 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E233F.9080103@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> Message-ID: <478E3026.2030206@scalableinformatics.com> Jeffrey B. Layton wrote: > Anyway, my 2 cents (and probably my last since this topic falls under > Landman's Rule: of flammability). uh... er ... uh .... huh ? Hey ... the coffee hasn't quite kicked in yet, and we have been pounding out DragonFly code (and it is working ... woo hoo! Jobs submit and all that) ... I saw the VMC bit and decided it wasn't worth spending time talking about it as Doug, Jeff, RGB, and others would pound it into the dirt^H^H^H^H^H^H discuss the salient aspects ... yeah thats the ticket. That and I was busy. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From Michael.Frese at NumerEx.com Wed Jan 16 08:32:49 2008 From: Michael.Frese at NumerEx.com (Michael H. Frese) Date: Wed Nov 25 01:06:47 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E233F.9080103@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> Message-ID: <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >- With multi-core processors, to get the best performance you want to > assign a process to a core. Excuse my ignorance, please, but can someone tell me how to do that on Linux (2.6 kernels would be fine)? The kernel scheduler -- as opposed to a cluster scheduler -- is a complete black box as far as I know. While I am it, where do I find a minimal list of processes necessary to run a cluster node. I can't see any reason to run the PC Smart Card demon, pcscd, but I don't know what else I can pitch. Mike -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/fcf9cd66/attachment.html From Michael.Frese at NumerEx.com Wed Jan 16 08:50:20 2008 From: Michael.Frese at NumerEx.com (Michael H. Frese) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3415.6040208@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> Message-ID: <6.2.5.6.2.20080116095006.04ed6cc8@NumerEx.com> Cool. Thanks. Mike At 09:43 AM 1/16/2008, Shannon V. Davidson wrote: >Michael H. Frese wrote: >>At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >>>- With multi-core processors, to get the best performance you want to >>> assign a process to a core. >> >>Excuse my ignorance, please, but can someone tell me how to do that >>on Linux (2.6 kernels would be fine)? > >sched_setaffinity(2) >taskset(1) >numactl(1) > >> >>The kernel scheduler -- as opposed to a cluster scheduler -- is a >>complete black box as far as I know. >> >>While I am it, where do I find a minimal list of processes >>necessary to run a cluster node. I can't see any reason to run the >>PC Smart Card demon, pcscd, but I don't know what else I can pitch. >> >> >>Mike >> >> >> >> >> >>_______________________________________________ >>Beowulf mailing list, Beowulf@beowulf.org >>To change your subscription (digest mode or unsubscribe) visit >>http://www.beowulf.org/mailman/listinfo/beowulf >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/02ffec2a/attachment.html From bill at platform.com Wed Jan 16 08:53:39 2008 From: bill at platform.com (Bill Bryce) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console Message-ID: Try the man pages for the taskset command on Linux 2.6 machine. There are also system calls sched_setaffinity() and sched_getaffinity() Regards, Bill. -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org]On Behalf Of Michael H. Frese Sent: January 16, 2008 11:33 AM To: "Jeffrey B. Layton"laytonjb@charter.net Cc: beowulf@beowulf.org Subject: Re: [Beowulf] VMC - Virtual Machine Console At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: - With multi-core processors, to get the best performance you want to assign a process to a core. Excuse my ignorance, please, but can someone tell me how to do that on Linux (2.6 kernels would be fine)? The kernel scheduler -- as opposed to a cluster scheduler -- is a complete black box as far as I know. While I am it, where do I find a minimal list of processes necessary to run a cluster node. I can't see any reason to run the PC Smart Card demon, pcscd, but I don't know what else I can pitch. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/e795bb9a/attachment.html From anandvaidya.ml at gmail.com Wed Jan 16 07:21:50 2008 From: anandvaidya.ml at gmail.com (Anand Vaidya) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Question on COAMPS, WRF and NHM Message-ID: <200801162321.50928.anandvaidya.ml@gmail.com> We are in the process of acquiring a new cluster for running weather modelling software viz. 
NRL COAMPS, WRF and NHM (Japan) We are currently running COAMPS on a Cluster of 50+ GigE and dual socket DC Opterons, NFS, CentOS4, RAM size=1GB/core, the performance seems to be limited by I/O (network I/O primarily). The performance flattens out at about 32CPU. Looking at the budget, current hardware availability, we have narrowed down to dual socket Intel Quad Cores, with 2GB/core and DDR infiniband, and CentOS 5.x, OpenMPI 1.2.x, SGE 6.x (Or maybe we will buy faster D-DC AMDs) We did enquire with the organizations regarding suitability of these, they could only offer limited help (understandably, the orgs may not be running the configs we intend to buy) I do understand that factors such as grid size etc play a role. I am right now looking at gross factors before getting into actual test runs with different configs. I would like to to whether any users of the aforementioned software can help answer the following questions: - Does memory bandwidth (STREAMS?) have a significant impact? (Intel shared bus -vs- AMD's dedicated interconnect), since QCs worsen the shared bus loading - Is infiniband worth it? (NRL seems to think it does enhance performance), however no additional details are available. - Is a parallel filesystem (eg: Lustre, GPFS, GFS) vs NFS Regards Anand From svdavidson at charter.net Wed Jan 16 08:43:01 2008 From: svdavidson at charter.net (Shannon V. Davidson) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> Message-ID: <478E3415.6040208@charter.net> Michael H. Frese wrote: > At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >> - With multi-core processors, to get the best performance you want to >> assign a process to a core. > > Excuse my ignorance, please, but can someone tell me how to do that on > Linux (2.6 kernels would be fine)? sched_setaffinity(2) taskset(1) numactl(1) > > The kernel scheduler -- as opposed to a cluster scheduler -- is a > complete black box as far as I know. > > While I am it, where do I find a minimal list of processes necessary > to run a cluster node. I can't see any reason to run the PC Smart > Card demon, pcscd, but I don't know what else I can pitch. > > > Mike > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/91a95e1d/attachment.html From landman at scalableinformatics.com Wed Jan 16 09:09:55 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: <478E3A63.1090703@scalableinformatics.com> Meng Kuan wrote: > We performed some benchmark testing with linpack and bonnie++ on the > VM and on the physical host. For para-virtualized VMs, the linpack > performance is on par with the physical host. However, for bonnie++ > tests, para-virtualized VMs fell way behind physical host's > performance. 
In short, CPU-bound and memory intensive HPC apps should > do ok but not IO-intensive apps. More testing and fine-tuning will > probably be needed to see how far we can push the VM in terms of > IO-intensive operations but we are hoping that in time to come > virtualization technologies will be able to narrow that gap. Hi Meng: Not to ignite flammable substances here ... but there are a few hallmarks of HPC applications. One of those is "beating the heck out of a specific available resource". Extra layers only add to this. What I want is thunking-free VMs. It would be really nice to take an 8 core workstation/server, run our base OS on one or two cores, and run other OSes on the other cores. The problem is that this is not easy to do with todays commodity hardware. Moreover, you pay a (sometimes huge) performance penalty for doing this, as you have single points of information flow (SPIF). These SPIFs are anathema to HPC. They are rate limiting. They can increase contention/latency, decrease effective bandwidth. I like the idea of VMs for services that need HA, and for OSes like windows that need a safe place to run in. HPC apps will stress one or the other portion of the machine. They will beat on the memory bandwidth in some cases, which is why, despite AMD Opterons of old (single/dual core) having a disadvantage in computational performance to older Xeons of woodcrest derivation, they are still faster on specific memory bound problems and code (that second memory bus is hard to beat). That said, and the point of this is that many HPC apps are rapidly becoming IO bound, as they need to move ginormous (meaning really large) amounts of data to and from disk, and MPI codes usually need to move data at the lowest latency possible. There VMs which negatively impact IO performance (bandwidth/latency) will be problematic. What would be interesting is a VM OS bypass for IO. VM talk directly to hardware. Not sure it is possible though, unless you are using a hypervisor, and a thin VM (OpenVZ?). Just some thoughts, hopefully not all that flammable (Jeff, what is that rule? I am being asked, and I don't have an answer ...) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From gerry.creager at tamu.edu Wed Jan 16 09:39:24 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Question on COAMPS, WRF and NHM In-Reply-To: <200801162321.50928.anandvaidya.ml@gmail.com> References: <200801162321.50928.anandvaidya.ml@gmail.com> Message-ID: <478E414C.10700@tamu.edu> No experience running COAMPS but for WRF I think your proposed system will work well. Memory bandwidth will play a role in preformance but file IO will also. Infiniband _is_ worth the cost/effort. I'd strongly recomment Luster/Gluster or GFS over NFS for this. gerry Anand Vaidya wrote: > We are in the process of acquiring a new cluster for running weather modelling > software viz. NRL COAMPS, WRF and NHM (Japan) > > We are currently running COAMPS on a Cluster of 50+ GigE and dual socket DC > Opterons, NFS, CentOS4, RAM size=1GB/core, the performance seems to be > limited by I/O (network I/O primarily). The performance flattens out at about > 32CPU. 
> > Looking at the budget, current hardware availability, we have narrowed down to > dual socket Intel Quad Cores, with 2GB/core and DDR infiniband, and CentOS > 5.x, OpenMPI 1.2.x, SGE 6.x (Or maybe we will buy faster D-DC AMDs) > > We did enquire with the organizations regarding suitability of these, they > could only offer limited help (understandably, the orgs may not be running > the configs we intend to buy) > > I do understand that factors such as grid size etc play a role. I am right now > looking at gross factors before getting into actual test runs with different > configs. > > I would like to to whether any users of the aforementioned software can help > answer the following questions: > > - Does memory bandwidth (STREAMS?) have a significant impact? (Intel shared > bus -vs- AMD's dedicated interconnect), since QCs worsen the shared bus > loading > > - Is infiniband worth it? (NRL seems to think it does enhance performance), > however no additional details are available. > > - Is a parallel filesystem (eg: Lustre, GPFS, GFS) vs NFS > > Regards > Anand > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From lindahl at pbm.com Wed Jan 16 09:53:46 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> Message-ID: <20080116175346.GA18703@bx9.net> > >- With multi-core processors, to get the best performance you want to > > assign a process to a core. > > Excuse my ignorance, please, but can someone tell me how to do that > on Linux (2.6 kernels would be fine)? Use an MPI which does this for you? Two examples are InfiniPath MPI and OpenMPI. -- greg From laytonjb at charter.net Wed Jan 16 10:08:12 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3A63.1090703@scalableinformatics.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> Message-ID: <478E480C.5020503@charter.net> Joe Landman wrote: > Just some thoughts, hopefully not all that flammable (Jeff, what is > that rule? I am being asked, and I don't have an answer ...) Rule: (Theorem) Anything that appears to be flame-bait, actually is. Corollary: Not matter what you say, no matter how much experience you have, no matter how much evidence you have, someone will always either: (a) violently disagree with you to their death bed, inviting more posts on the subject or any other subject that appears to be flame bait. -or- (b) Misunderstand everything and post something worthless possibly inviting more posts on the subject or any other subject that appears to be flame bait. 
From landman at scalableinformatics.com Wed Jan 16 10:28:03 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E480C.5020503@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> <478E480C.5020503@charter.net> Message-ID: <478E4CB3.5090202@scalableinformatics.com> Jeffrey B. Layton wrote: > Joe Landman wrote: >> Just some thoughts, hopefully not all that flammable (Jeff, what is >> that rule? I am being asked, and I don't have an answer ...) > Rule: (Theorem) > Anything that appears to be flame-bait, actually is. Ahhh.... I wonder if we can say "flame-bait is isomorphic to text editor wars, c.f. vi vs emacs". > Corollary: > Not matter what you say, no matter how much experience > you have, no matter how much evidence you have, someone > will always either: > (a) violently disagree with you to their death bed, inviting more > posts on the subject or any other subject that appears to be flame > bait. > -or- > (b) Misunderstand everything and post something worthless > possibly inviting more posts on the subject or any other subject > that appears to be flame bait. Heh... -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From rgb at phy.duke.edu Wed Jan 16 15:09:47 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E480C.5020503@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> <478E480C.5020503@charter.net> Message-ID: On Wed, 16 Jan 2008, Jeffrey B. Layton wrote: Dear Jeff: > Joe Landman wrote: >> Just some thoughts, hopefully not all that flammable (Jeff, what is that >> rule? I am being asked, and I don't have an answer ...) > Rule: (Theorem) > Anything that appears to be flame-bait, actually is. > > Corollary: > Not matter what you say, no matter how much experience > you have, no matter how much evidence you have, someone > will always either: > (a) violently disagree with you to their death bed, inviting more > posts on the subject or any other subject that appears to be flame > bait. As I sit here in my comfortable bed experiencing severe chest pain, I have to tell you that you are wrong, wrong, wrong. This is not what flame-bait is. I may have to cut you with a knife. > -or- > (b) Misunderstand everything and post something worthless > possibly inviting more posts on the subject or any other subject > that appears to be flame bait. Flame bait (as all proper fishermen know) is what you get when you spill your glass of straight Everclear into the worms and then "accidentally" knock the coal of your cigar in on top as you sway gently from side to side in the boat, waiting for fish to bite. It's a variant of stink bait and cut bait -- fish don't bite on them much either. In fact, fish don't bite much. But they REALLY don't bite on flame bait. Only cluster-fanatics bite on flame bait. Usually after a tall, cool glass of Everclear and a cigar... 
rgb > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From mengkuan at sxven.com Wed Jan 16 18:31:10 2008 From: mengkuan at sxven.com (Meng Kuan) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3A63.1090703@scalableinformatics.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> Message-ID: On Jan 17, 2008 1:09 AM, Joe Landman wrote: > That said, and the point of this is that many HPC apps are rapidly > becoming IO bound, as they need to move ginormous (meaning really large) > amounts of data to and from disk, and MPI codes usually need to move > data at the lowest latency possible. > > There VMs which negatively impact IO performance (bandwidth/latency) > will be problematic. > > What would be interesting is a VM OS bypass for IO. VM talk directly to > hardware. Not sure it is possible though, unless you are using a > hypervisor, and a thin VM (OpenVZ?). I believe Xen is working towards that. For instance, their latest release (Xen 3.2.0) has: - Preliminary PCI pass-through support (using appropriate Intel or AMD I/O-virtualisation hardware) I have read on the Xen lists that some folks have successfully increased network performance this way. OpenVZ is a possibility and it definitely is "thinner" than Xen in this aspect. This is why we are using the libvirt library which is starting to include support for containers like OpenVZ. > > Just some thoughts, hopefully not all that flammable (Jeff, what is > that rule? I am being asked, and I don't have an answer ...) Not at all. In fact, its great to hear and learn from you guys. Thanks! Regards, Meng Kuan From Craig.Tierney at noaa.gov Wed Jan 16 09:16:18 2008 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Wed Nov 25 01:06:48 2009 Subject: Time limits in queues (was: Re: [Beowulf] VMC - Virtual Machine Console) In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: <478E3BE2.8060301@noaa.gov> Geoff wrote: > ..Interesting discussion deleted.. > > As a funny aside, I once knew a sysadmin who applied 24 hour timelimits > to all queues of all clusters he managed in order to force researchers > to think about checkpoints and smart restarts. I couldn't understand > why so many folks from his particular unit kept asking me about arrays > inside the scheduler submission scripts and nested commends until I > found that out. Unfortunately I came to the conclusion that folks in > his unit were spending more time writing job submission scripts than > code... well... maybe that is an exaggeration. > Our queue limits are 8 hours. They are set this way for two reasons. First, we have real time jobs that need to get through the queues and we believe that allowing significantly longer jobs would block those really important jobs. Second, for a multi-user system, it isn't very fair for a user to run multi-day jobs and prevent shorter jobs from getting in. It is about being fair. Use the resource and then get back in line. 
I know that at other US Government facilities it is common practice to set sub-day queue limits. I recently helped setup one site that had queue limits set at 12 hours. Another large organization near the top of the top 500 list does this as well. This means that codes need check-pointing. Although we are all waiting for the holy grail of system level check-pointing, the odds of that being implemented consistently across architectures AND not have a significant performance hit is unlikely. This means that researchers have to also be software engineers. If they want to get real work done, adding check-pointing is one of the steps. As one operations manager at a major HPC site once said to me 'codes that don't support check-pointing aren't real codes'. Allowing users to run for days or weeks as SOP is begging for failure. Did that sysadmin who set 24 hour time limits ever analyze the amount of lost computational time because of larger time limits? Craig -- Craig Tierney (craig.tierney@noaa.gov) From nixon at nsc.liu.se Thu Jan 17 00:31:42 2008 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: <478E3BE2.8060301@noaa.gov> (Craig Tierney's message of "Wed\, 16 Jan 2008 10\:16\:18 -0700") References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: Craig Tierney writes: > Allowing users to run for days or weeks as SOP is begging for failure. Define failure. Our time limit is typically somewhere around 5 or 6 days. Many codes don't have checkpointing, and it's often simply not possible to add it because you don't have access to the source code. With backfill scheduling, short and narrow jobs typically don't have to wait *that* long, at least with the job mixture we see. -- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From nixon at nsc.liu.se Thu Jan 17 00:52:02 2008 From: nixon at nsc.liu.se (Leif Nixon) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E21D9.60900@tamu.edu> (Gerry Creager's message of "Wed\, 16 Jan 2008 09\:25\:13 -0600") References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> <478E21D9.60900@tamu.edu> Message-ID: Gerry Creager writes: > I'm constantly reminded of a meeting early on in the SCOOP project, > which I participate in (http://scoop.sura.org). "We're able to > virtualize our model applications using VMware and only see a 13% > performance hit". Oops. Please note that the VMware license agreement forbids the users to publish benchmark figures, unless the benchmark method has been cleared with VMware beforehand. 
-- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From Hakon.Bugge at scali.com Thu Jan 17 02:09:59 2008 From: Hakon.Bugge at scali.com (=?iso-8859-1?Q?H=E5kon?= Bugge) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <200801162000.m0GK08wZ023932@bluewest.scyld.com> References: <200801162000.m0GK08wZ023932@bluewest.scyld.com> Message-ID: <20080117101001.0F05F35AEA2@mail.scali.no> At 21:00 16.01.2008, Greg Lindahl wrote: >Use an MPI which does this for you? > >Two examples are InfiniPath MPI and OpenMPI. .. and another is Scali MPI Connect. We do it in two dimensions; latency or bandwidth policy, that is to use as few or many sockets as possible. Once that is selected, the resolution can be defined as a hyperthread, core (all HTs constituting a core), socket (all cores constituting a socket), or a node (all sockets on a node). The resolution is important for hybrid application; on a dual-socket, quad-core system, you can specify bandwidth policy and socket resolution staring two MPI processes. The first rank will be bound to all the cores on the first socket, the second on all the cores on the other socket. Further, the decision on which cores/sockets to use is determined dynamically, so multiple MPI instances on the same node is supported. Thanks, Hakon From Bogdan.Costescu at iwr.uni-heidelberg.de Thu Jan 17 05:53:36 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: <478E3BE2.8060301@noaa.gov> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: On Wed, 16 Jan 2008, Craig Tierney wrote: > Our queue limits are 8 hours. > ... > Did that sysadmin who set 24 hour time limits ever analyze the amount > of lost computational time because of larger time limits? While I agree with the idea and reasons of short job runtime limits, I disagree with your formulation. Being many times involved in discussions about what runtime limits should be set, I wouldn't make myself a statement like yours; I would say instead: YMMV. In other words: choose what fits better the job mix that users are actually running. If you have determined that 8h max. runtime is appropriate for _your_ cluster and increasing it to 24h would lead to a waste of computational time due to the reliability of _your_ cluster, then you've done your job well. But saying that everybody should use this limit is wrong. Furthermore, although you mention that system-level checkpointing is associated with a performance hit, you seem to think that user-level checkpointing is a lot lighter, which is most often not the case. Apart from the obvious I/O limitations that could restrict saving & loading of checkpointing data, there are applications for which developers have chosen to not store certain data but recompute it every time it is needed because the effort of saving, storing & loading it is higher than the computational effort of recreating it - but this most likely means that for each restart of the application this data has to be recomputed. And smaller max. runtimes mean more restarts needed to reach the same total runtime... 
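To make the user-level mechanics concrete: the plumbing itself can be quite small, and the hard part is exactly what is described above -- deciding which data defines the restart point and whether it is cheaper to store it or recompute it. A minimal sketch in C follows; the state layout (an iteration counter plus a small array) is invented purely for illustration and is not taken from any code discussed in this thread.

/* Minimal user-level checkpoint sketch for an iterative code.
 * The "state" here is hypothetical; a real application would dump
 * whatever actually defines its restart point.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 1000
#define CKPT_FILE "state.ckpt"
#define CKPT_TMP  "state.ckpt.tmp"

struct state {
    long iter;
    double field[N];
};

/* Write the whole state to a temporary file, then rename() it, so a
 * crash in the middle of a checkpoint never clobbers the last good one. */
static int save_checkpoint(const struct state *s)
{
    FILE *f = fopen(CKPT_TMP, "wb");
    if (!f) return -1;
    if (fwrite(s, sizeof(*s), 1, f) != 1) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;
    return rename(CKPT_TMP, CKPT_FILE);
}

/* Returns 1 if a checkpoint was loaded, 0 if we are starting fresh. */
static int load_checkpoint(struct state *s)
{
    FILE *f = fopen(CKPT_FILE, "rb");
    if (!f) return 0;
    int ok = (fread(s, sizeof(*s), 1, f) == 1);
    fclose(f);
    return ok;
}

int main(void)
{
    struct state s;
    if (!load_checkpoint(&s))
        memset(&s, 0, sizeof(s));          /* fresh start */

    for (; s.iter < 1000000; s.iter++) {
        s.field[s.iter % N] += 1.0;        /* stand-in for real work */
        if (s.iter % 100000 == 0)
            save_checkpoint(&s);           /* cheap here because the state is tiny */
    }
    save_checkpoint(&s);
    return 0;
}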
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From dnlombar at ichips.intel.com Thu Jan 17 08:34:19 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: <20080117163419.GA27510@nlxdcldnl2.cl.intel.com> On Thu, Jan 17, 2008 at 02:53:36PM +0100, Bogdan Costescu wrote: > On Wed, 16 Jan 2008, Craig Tierney wrote: > > >Our queue limits are 8 hours. > >... > >Did that sysadmin who set 24 hour time limits ever analyze the amount > >of lost computational time because of larger time limits? > > While I agree with the idea and reasons of short job runtime limits, I > disagree with your formulation. Being many times involved in > discussions about what runtime limits should be set, I wouldn't make > myself a statement like yours; I would say instead: YMMV. In other > words: choose what fits better the job mix that users are actually > running. If you have determined that 8h max. runtime is appropriate > for _your_ cluster and increasing it to 24h would lead to a waste of > computational time due to the reliability of _your_ cluster, then > you've done your job well. But saying that everybody should use this > limit is wrong. Completely agree. > Furthermore, although you mention that system-level checkpointing is > associated with a performance hit, you seem to think that user-level > checkpointing is a lot lighter, which is most often not the case. Hmmm. A system level checkpoint must save the complete state of the process to be checkpointed plus all of its siblings/children plus varying amounts of external state; a machine level checkpoint must save complete machine(s) state. A user level checkpoint need only save the data that define the current state--that could well be a small set of values. Having written that, it may be *easier* (even cheaper) to expend the resources to save the complete state than to restructure some suitably complex code to expose a restart state. I certainly know an application that fits that model during most of its runtime. But, at the end of the day, that is just trading runtime for design/coding/validation time and the notion's validity depends on which side of the operation you sit. Consider this though, if as an admin you only rely on user- level checkpoint, you *will* end up with an argument from one or more users about the maximum runtime at some point; with a system (or machine) checkpoint, you'll likely avoid a lot of agida[1], especially when unplanned or emergency outages/reprioritzations occur. > Apart from the obvious I/O limitations that could restrict saving & > loading of checkpointing data, there are applications for which > developers have chosen to not store certain data but recompute it > every time it is needed because the effort of saving, storing & > loading it is higher than the computational effort of recreating it - > but this most likely means that for each restart of the application > this data has to be recomputed. And smaller max. runtimes mean more > restarts needed to reach the same total runtime... As you note, only the application can know that it's easier to recompute than save and restore. 
I suspect many of us can site specific examples where it's easier to recompute; some could probably also cite cases where recomputing is faster too... [1] Hearburn, indigestion, general upset or agitation. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From Craig.Tierney at noaa.gov Thu Jan 17 08:43:19 2008 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: <478F85A7.9040600@noaa.gov> Bogdan Costescu wrote: > On Wed, 16 Jan 2008, Craig Tierney wrote: > >> Our queue limits are 8 hours. >> ... >> Did that sysadmin who set 24 hour time limits ever analyze the amount >> of lost computational time because of larger time limits? > > While I agree with the idea and reasons of short job runtime limits, I > disagree with your formulation. Being many times involved in discussions > about what runtime limits should be set, I wouldn't make myself a > statement like yours; I would say instead: YMMV. In other words: choose > what fits better the job mix that users are actually running. If you > have determined that 8h max. runtime is appropriate for _your_ cluster > and increasing it to 24h would lead to a waste of computational time due > to the reliability of _your_ cluster, then you've done your job well. > But saying that everybody should use this limit is wrong. First all I agree that it is always a YMMV case. We good about that here (the list). My point was, that in every instance that I have seen, multi-day queue limits are not the norm. Those places do have exceptions for particular codes and particular projects. I know our system would handle 24h queues in terms of reliability, but with the job mix we have, it would cause problems beyond stability (we are currently looking at a new scheduler to solve that problem). > > Furthermore, although you mention that system-level checkpointing is > associated with a performance hit, you seem to think that user-level > checkpointing is a lot lighter, which is most often not the case. There was an assumption in my statement that I didn't share with people. I was thinking about system-level checkpointing that will probably work for clusters which will be some sort of VM based solution. That will have the overhead of the virtual machine as well as moving the data when the time comes. > Apart > from the obvious I/O limitations that could restrict saving & loading of > checkpointing data, there are applications for which developers have > chosen to not store certain data but recompute it every time it is > needed because the effort of saving, storing & loading it is higher than > the computational effort of recreating it - but this most likely means > that for each restart of the application this data has to be recomputed. Yes, but didn't you just say the recomputing that data are faster than the IO time associated with reading it? A checkpoint isn't model results. A checkpoint is a state of the model at a particular time, so in this case you would save that data. Its already in memory, you just need to write it out with every other bit of relevant information. No extra needed computations. > And smaller max. runtimes mean more restarts needed to reach the same > total runtime... > Yes, anytime you are doing something other than the model run (like checkpointing) your run will take longer. 
This is another one of those "it depends" scenario. If the runtime takes 1% longer, and it makes the other users happier or lessens the loss due to an eventual crash, is it worth it? The 1% number is a target I would design for, based on the workload we experience (multitude of different sized jobs, not one big job). I would buy a couple of nodes with 3ware cards and run either Lustre or PVFS2 over it for a place to dump the checkpoints. The filesystem would be mostly volatile (so redundancy wouldn't be critical), and would more than meet the reliability requirements of my system (>97%). Craig -- Craig Tierney (craig.tierney@noaa.gov) From smulcahy at aplpi.com Fri Jan 18 00:57:45 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3415.6040208@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> Message-ID: <47906A09.2040908@aplpi.com> Shannon V. Davidson wrote: > > Michael H. Frese wrote: >> At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >>> - With multi-core processors, to get the best performance you want to >>> assign a process to a core. >> >> Excuse my ignorance, please, but can someone tell me how to do that on >> Linux (2.6 kernels would be fine)? > > sched_setaffinity(2) > taskset(1) > numactl(1) Hi, As an aside to this, do 2.6 kernels make some efforts to keep a process on a specific core anyways recognising the benefits to the cache of doing so (I suspect they do but maybe I just dreamed it up)? As a further aside, some MPI libraries (OpenMPI comes to mind) seem to make some efforts to keep processes on the same cores also (or can be instructed to via a run-time option). I'm wondering how much of a performance benefit there is to using the above-mentioned OS commands to set affinity (versus the trade-off in setting this up). -stephen -- Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway) From hahn at mcmaster.ca Fri Jan 18 12:15:16 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <47906A09.2040908@aplpi.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com> Message-ID: > As an aside to this, do 2.6 kernels make some efforts to keep a process on a > specific core anyways recognising the benefits to the cache of doing so (I > suspect they do but maybe I just dreamed it up)? yes - the code is pretty reasonable, though probably more tuned towards typical desktop/webhost-type applications. there are affinity heuristics for managing which core a proc will be run on, as well as for guiding memory allocation on numa machines. (pretty soon, of course, all multi-socket machines will be numa and need these issues handled...) 
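For anyone who wants to try the explicit route mentioned earlier in the thread (sched_setaffinity(2) / taskset(1)), here is a minimal sketch of pinning the calling process to one core. The choice of core 0 as the default is arbitrary; an MPI library or batch system would normally pick the core from the rank or the node allocation.

/* Pin the calling process to one core with sched_setaffinity(2).
 * Minimal sketch: the core number comes from argv or defaults to 0;
 * a real launcher would derive it from the MPI rank or the scheduler.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int core = (argc > 1) ? atoi(argv[1]) : 0;

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);

    /* pid 0 means "the calling process"; children inherit the mask */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pinned to core %d\n", core);
    /* ... exec or compute here; the affinity mask stays in effect ... */
    return 0;
}

The shell equivalent is roughly taskset -c 0 ./myapp, and numactl(1) additionally lets you bind the memory allocation to a particular NUMA node.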
From mathog at caltech.edu Fri Jan 18 12:43:45 2008 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] network question Message-ID: The questions is: do modern networks bundle multiple TCP "ack" together into a single packet? If so, on linux does ifconfig count all N acks in that single packet as if they were separate packets? Here's the background: I have been modifying nettee lately and ran across something which was a bit mysterious. The initial observation had TCP_NODELAY set on the data line. This was a mistake but it was largely being compensated for by a minwrite variable which controlled how big the buffer had to be before it was emptied by a read. Anyway, when running in that mode these 3 tests A->B B->C A->B->C were performed. (All 3 are on a single 100baset switch.) The first two ran at "full speed" (11.x Mb/sec) and the third much slower. Which is odd, since B could read and write at "full speed", just not both at the same time. So to work on this issue runs on B were instrumented like this: ifconfig eth0 | grep packet; nettee... ; ifconfig eth0 and the RX/TX counts compared before and after to see how many packets moved in/out on each test. For A->B, on B there were 27857 packets in and 13963 packets out. For A->B->C on B there were 41887 packets in and 42441 packets out. Since the sum of in + out on B for the first test is 41820, which is very close to 41887 for B in the second test, there are definitely a lot of ack packets coming back to B from C, and that seemed likely to be the problem. Only it wasn't, at least according to ifconfig. By varying the minwrite parameter described above the nettee throughput on B (and so the whole 3 member chain) could be adjusted to "full speed". The same speed was obtained by not setting TCP_NODELAY, in which case the minwrite parameter made no difference. Oddly, in these configurations where the program ran fastest the RX/TX counts changed only very slightly and not as I expected. In one typical "optimized" relay B had 41744 in RX and 41822 in TX. These numbers are only very slightly different from the unoptimized example shown above. The one way in which they were remarkable, and this could just be a coincidence, was that for the highest transfer rates the observed RX/TX ratio was closer to 1.0 than for other configurations. So my best guess for explaining how the data rate could increase so much is that there really were fewer packets, and ifconfig has somehow concealed this fact by breaking the multiple acks out and counting them as separate packets. Or is there something else going on? I also discovered that "athcool" really screws up network throughput on these Athlon machines. Dropping from about 11.4 Mb/sec to about 6.7 Mb/sec. We run a script that keeps track of CPU usage and shuts athcool off when CPU use peaks, but nettee only reached 20-30 percent of CPU so that never kicked in. I wonder if Athlon64's have the same issue when they are in their power saving mode, but have not run tests yet to find out. 
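(For readers who have not played with it: TCP_NODELAY is the per-socket option that disables Nagle coalescing of small writes, which is what interacts with the receiver's delayed ACKs discussed above. A minimal sketch of how it is toggled follows; the socket here is created but never connected, purely for illustration.)

/* Toggle TCP_NODELAY (Nagle) on a TCP socket.  With Nagle enabled,
 * small writes are held back until the previous segment is ACKed,
 * which is where the peer's delayed-ACK behaviour enters the picture. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdio.h>

static int set_nodelay(int sock, int enable)
{
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                   &enable, sizeof(enable)) != 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);  /* demo only, not connected */
    if (sock < 0) { perror("socket"); return 1; }
    if (set_nodelay(sock, 1) == 0)
        printf("TCP_NODELAY set\n");
    return 0;
}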
Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From bernard at vanhpc.org Fri Jan 18 14:53:26 2008 From: bernard at vanhpc.org (Bernard Li) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <47906A09.2040908@aplpi.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com> Message-ID: On 1/18/08, stephen mulcahy wrote: > I'm wondering how much of a performance benefit there is to using the > above-mentioned OS commands to set affinity (versus the trade-off in > setting this up). I guess the answer to the above question is "it depends on your code" -- but I'd also like to hear whether there are any general performance benefits to setting CPU affinity. Do major schedulers support this? Would this help with embarrassingly parallel jobs VS large MPI jobs on manycore machines? Thanks, Bernard From geoff at galitz.org Sun Jan 20 10:40:50 2008 From: geoff at galitz.org (Geoff) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] who is buying those $200 PCs from wal-mart? In-Reply-To: <6.2.3.4.2.20071115101758.02f266d8@mail.jpl.nasa.gov> References: <6.2.3.4.2.20071115101758.02f266d8@mail.jpl.nasa.gov> Message-ID: Am 15.11.2007, 19:55 Uhr, schrieb Jim Lux : > > > That dissatisfaction is among the small subset of consumers who read > Slashdot or this list or who write for and read those magazines. For > them Vista is a pain. > Just my $.02 worth... even me for Vista is not a pain. I use it in my work to run VMware workstation so I can write and test the tools needed to manage my clusters. I use putty to connect to them, deploy my tools and use them. The $200 PC would surely be underpowered to run something like VMware Workstation but it does mean that genuine real work can happen on that platform... just not running 3 or 4 VM's at once like I do. Running a single VM for that kind of work would be ok, I think. At my old lab we had about a dozen boxes whose purpose was to prepare jobs for cluster submission. No compilation was required, it was all defining parameters and visualization. There is definitely room for these lower power machines in the universe. Even if they do run Vista. -geoff -- ------------------------------- Geoff Galitz, geoff@galitz.org Blankenheim, Deutschland From geoff at galitz.org Sun Jan 20 10:42:03 2008 From: geoff at galitz.org (Geoff) Date: Wed Nov 25 01:06:48 2009 Subject: Time limits in queues (was: Re: [Beowulf] VMC - Virtual Machine Console) In-Reply-To: <478E3BE2.8060301@noaa.gov> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: Interesting. We (and by we, I refer to my time at UC Berkeley College of Chemistry) used to implement multiple queues with various time restrictions to accomdate short, medium, long and extended run jobs. It was an honor to system to be sure, but I spent a great amount of time working with the researchers on an indvidual level to foster the trust that an honor system needs. There was also a little logic to allow submitted jobs to skew towards one end of the spectrum if the cluster was not fully utilized, and not expected to be so. 
Working that closely with folks also allowed us to chart cluster usage for about a month (and sometimes much more) so we can tweak cluster policy if appropriate. It worked out for the most part, but there was the occasional scofflaw. With the trust relationship I had with the researchers, we could usually nag the scofflaws back into line. Layer 8 issues can certainly lead to trouble, but it can also be used to your advantage! Just a personal observation. I realize this kind of thing would not work everywhere. -geoff > > Our queue limits are 8 hours. They are set this way for two reasons. > First, we have real time jobs that need to get through the queues and > we believe that allowing significantly longer jobs would block those > really important jobs. Second, for a multi-user system, it isn't very > fair for a user to run multi-day jobs and prevent shorter jobs from > getting > in. It is about being fair. Use the resource and then get back in line. > > I know that at other US Government facilities it is common practice to > set sub-day queue limits. I recently helped setup one site that had > queue limits set at 12 hours. Another large organization near the top > of the top 500 list does this as well. > > This means that codes need check-pointing. Although we are all waiting > for the holy grail of system level check-pointing, the odds of that being > implemented consistently across architectures AND not have a significant > performance hit is unlikely. This means that researchers have to also be > software engineers. If they want to get real work done, adding > check-pointing > is one of the steps. As one operations manager at a major HPC site once > said > to me 'codes that don't support check-pointing aren't real codes'. > > Allowing users to run for days or weeks as SOP is begging for failure. > Did that sysadmin who set 24 hour time limits ever analyze the amount > of lost computational time because of larger time limits? > > Craig > -- ------------------------------- Geoff Galitz, geoff@galitz.org Blankenheim, Deutschland From geoff at galitz.org Sun Jan 20 10:42:05 2008 From: geoff at galitz.org (Geoff) Date: Wed Nov 25 01:06:48 2009 Subject: Time limits in queues (was: Re: [Beowulf] VMC - Virtual Machine Console) In-Reply-To: <478E3BE2.8060301@noaa.gov> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: Interesting. We (and by we, I refer to my time at UC Berkeley College of Chemistry) used to implement multiple queues with various time restrictions to accomdate short, medium, long and extended run jobs. It was an honor to system to be sure, but I spent a great amount of time working with the researchers on an indvidual level to foster the trust that an honor system needs. There was also a little logic to allow submitted jobs to skew towards one end of the spectrum if the cluster was not fully utilized, and not expected to be so. Working that closely with folks also allowed us to chart cluster usage for about a month (and sometimes much more) so we can tweak cluster policy if appropriate. It worked out for the most part, but there was the occasional scofflaw. With the trust relationship I had with the researchers, we could usually nag the scofflaws back into line. Layer 8 issues can certainly lead to trouble, but it can also be used to your advantage! Just a personal observation. I realize this kind of thing would not work everywhere. 
-geoff PS, sorry for any duplicate copies of this email, I am having some ISP issues this week. Am 16.01.2008, 18:16 Uhr, schrieb Craig Tierney : > Geoff wrote: >> > > ..Interesting discussion deleted.. > >> As a funny aside, I once knew a sysadmin who applied 24 hour >> timelimits to all queues of all clusters he managed in order to force >> researchers to think about checkpoints and smart restarts. I couldn't >> understand why so many folks from his particular unit kept asking me >> about arrays inside the scheduler submission scripts and nested >> commends until I found that out. Unfortunately I came to the >> conclusion that folks in his unit were spending more time writing job >> submission scripts than code... well... maybe that is an exaggeration. >> > > Our queue limits are 8 hours. They are set this way for two reasons. > First, we have real time jobs that need to get through the queues and > we believe that allowing significantly longer jobs would block those > really important jobs. Second, for a multi-user system, it isn't very > fair for a user to run multi-day jobs and prevent shorter jobs from > getting > in. It is about being fair. Use the resource and then get back in line. > > I know that at other US Government facilities it is common practice to > set sub-day queue limits. I recently helped setup one site that had > queue limits set at 12 hours. Another large organization near the top > of the top 500 list does this as well. > > This means that codes need check-pointing. Although we are all waiting > for the holy grail of system level check-pointing, the odds of that being > implemented consistently across architectures AND not have a significant > performance hit is unlikely. This means that researchers have to also be > software engineers. If they want to get real work done, adding > check-pointing > is one of the steps. As one operations manager at a major HPC site once > said > to me 'codes that don't support check-pointing aren't real codes'. > > Allowing users to run for days or weeks as SOP is begging for failure. > Did that sysadmin who set 24 hour time limits ever analyze the amount > of lost computational time because of larger time limits? > > Craig > -- ------------------------------- Geoff Galitz, geoff@galitz.org Blankenheim, Deutschland From raysonlogin at gmail.com Sun Jan 20 19:20:33 2008 From: raysonlogin at gmail.com (Rayson Ho) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com> Message-ID: <73a01bf20801201920q3ac7b647q335e1cd13bec4cdb@mail.gmail.com> On Jan 18, 2008 5:53 PM, Bernard Li wrote: > -- but I'd also like to hear whether there are any general performance > benefits to setting CPU affinity. Do major schedulers support this? > Would this help with embarrassingly parallel jobs VS large MPI jobs on > manycore machines? I am working on adding processor affinity support for serial and parallel jobs for Grid Engine, and I am working with the OpenMPI developers to define an interface. http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=27044 http://www.open-mpi.org/community/lists/devel/2008/01/2949.php http://www.open-mpi.org/community/lists/devel/2008/01/2964.php BTW, LSF 7.0.2 supports processor affinity for serial jobs. 
However, supporting processor affinity for serial jobs is only useful when the OS scheduler is dumb... See also: "Enhancing an Open Source Resource Manager with Multi-Core/Multi-threaded Support" -- this paper talks about the support of processor affinity in SLURM: http://www.cs.huji.ac.il/~feit/parsched/jsspp07/p2-balle.pdf Rayson > > Thanks, > > Bernard > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From forum.san at gmail.com Sun Jan 20 21:43:36 2008 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] For GROMACS users Message-ID: Hi, I'm a Linux guy and now it is required to install Gromacs on Solaris 10, x86 system. I faced lot of problems but still not installed. I think this is because, the binaries are not in the path. Don't know where the binaries are available in Solaris. I downloaded the binutils from Gromacs website, but the installation gave following error: ls/windres ] ; then echo $r/./binutils/windres ; else if [ ' i386-pc-solaris2.10' = 'i386-pc-solaris2.10 ' ] ; then echo windres; else echo windres ; fi; fi`" "CONFIG_SHELL=/bin/sh" "MAKEINFO=`if [ -f $r/build-i386-pc-solaris2.10/texinfo/makeinfo/Makefile ] ; then echo $r/build-i386-pc-solaris2.10 /texinfo/makeinfo/makeinfo ; else if (makeinfo --version | egrep 'texinfo[^0-9]*([1-3][0-9]|4\.[2-9]|[5-9])') >/dev/null 2>&1; then echo makeinfo; else echo $s/missing makeinfo; fi; fi` --split-size=5000000" 'AR=ar' 'AS=as' 'CC=gcc' 'CXX=c++' 'DLLTOOL=dlltool' 'LD=/usr/ccs/bin/ld' 'NM=nm' 'RANLIB=ranlib' 'WINDRES=windres' install) make: Fatal error: Command failed for target `install-binutils' Can any one guide me to install gromacs on Solaris? regards, Sangamesh HPC Engineer -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080121/194ae225/attachment.html From toon.knapen at gmail.com Mon Jan 21 22:49:03 2008 From: toon.knapen at gmail.com (Toon Knapen) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] how to detect boundedness Message-ID: I would like to ask you all what your preferred method is to detect if and how strongly an application is cpu-, memory- or I/O-bound. Do you 1) just run the app. on different machines (with diff. characteristics) 2) use the profiler 3) use hardware monitors such as cache-miss rate, ... .... Thanks in advance, toon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080122/349c4c11/attachment.html From smulcahy at aplpi.com Tue Jan 22 00:38:48 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] how to detect boundedness In-Reply-To: References: Message-ID: <4795AB98.1050306@aplpi.com> Toon Knapen wrote: > I would like to ask you all what your preferred method is to detect if > and how strongly an application is cpu-, memory- or I/O-bound. Do you > 1) just run the app. on different machines (with diff. characteristics) > 2) use the profiler > 3) use hardware monitors such as cache-miss rate, ... > .... > Hi, I'm inclined to use a bunch of tools (htop, dstat, vmstat, iostat, free, ganglia) to get a picture of whats happening on the system while my app is running and then start making some inferences from the behaviour I observe. 
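A minimal sketch of that sort of observation run, assuming the procps/sysstat tools (vmstat, iostat) are installed and using ./my_app purely as a stand-in for the real code:

#!/bin/sh
# log system-wide counters once a second for the lifetime of the run
vmstat 1 > vmstat.log &
VMSTAT_PID=$!
iostat -x 1 > iostat.log &
IOSTAT_PID=$!
./my_app
kill $VMSTAT_PID $IOSTAT_PID

Roughly speaking, mostly "us" time in vmstat.log with the disks idle suggests CPU-bound, sustained "wa" time plus high %util in iostat.log points at I/O, and heavy "si"/"so" (swap) traffic or throughput that falls off as the problem size grows hints at memory.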
Having the ability to test your application on hardware with different characteristics after coming to some tentative conclusions about the charactertistics of your application sounds like a good option but could be pretty time consuming. -stephen -- Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway) From carsten.aulbert at aei.mpg.de Wed Jan 23 01:43:42 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Vendor/Distributor for customized ethernet cables? Message-ID: <47970C4E.7010703@aei.mpg.de> Hi, we need a few thousand cables (mix of Cat5e and/or Cat6) but I have a very hard time finding a distributor which can offer me other lengths than the "standard European" .5, 1.0 and 2.0m. But I need cables with various lengths, about half of the cables ranging from 50 to 100cm and the other half from 50 to 200cm. Thus my question, are you willing to share your secret with me where to buy those cables in other lengths and with acceptable time frames, i.e. only a few weeks instead of 6 week lead time plus 1-6 weeks delivery depending on air or ship freight? Please reply either on the list or privately, whatever you prefer. Companies form within the EU or North America are preferred. Thanks a lot Carsten From andrew at moonet.co.uk Wed Jan 23 07:02:36 2008 From: andrew at moonet.co.uk (andrew holway) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] fast disks for fluent Message-ID: hi, We were thinking of using tempfs to make fast scratch disks for fluent. Has anyone done this or any other method to ease the disk bottleneck. Thanks Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080123/37b19bb8/attachment.html From landman at scalableinformatics.com Wed Jan 23 07:31:24 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] fast disks for fluent In-Reply-To: References: Message-ID: <47975DCC.8020402@scalableinformatics.com> andrew holway wrote: > hi, > > We were thinking of using tempfs to make fast scratch disks for fluent. Has > anyone done this or any other method to ease the disk bottleneck. Without turning this into an advertisement, we have used our JackRabbit system as the launch node for Fluent jobs for a few customers. They regularly work with 20-30 GB case files, and this has made a significant (positive) impact upon their work. I haven't seen it in fluent 6.3, but there has been some talk of parallel IO at some point. The issue with tmpfs is that it is a ram disk, so you are trading ram for storage space. You could potentially starve the fluent run with low memory conditions while doing this. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From laytonjb at charter.net Wed Jan 23 08:31:45 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] fast disks for fluent In-Reply-To: References: Message-ID: <47976BF1.3020402@charter.net> andrew holway wrote: > hi, > > We were thinking of using tempfs to make fast scratch disks for > fluent. 
Has anyone done this or any other method to ease the disk > bottleneck. Can you describe the problem you are running? If it's not something like an LES, time accurate, or unsteady problem, then the IO requirements are minimal (compared to the run time). But if it's one of these problems, then IO can be a bottleneck. If this is the case, can you describe the hardware you are using, the problem size, the number of cores (nodes) you are using, etc. Then we (the list) can look at some options for you. The next version of Fluent (call it 6.4 for the sake of argument) will have parallel IO capability (MPI-IO). It's going to be tuned for several specific parallel storage systems, but should work for other systems. The IO improvements are really good if you have large problems, not so much for smaller problems. Jeff From laytonjb at charter.net Thu Jan 24 13:50:20 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Any Slurm experts out there? Message-ID: <4799081C.10502@charter.net> Afternoon all, Are there any Slurm experts out there? I'm playing with Slurm for the first time and need some help converting PBS scripts to Slurm scripts. Thanks! Jeff From hahn at mcmaster.ca Thu Jan 24 15:39:49 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Any Slurm experts out there? In-Reply-To: <4799081C.10502@charter.net> References: <4799081C.10502@charter.net> Message-ID: > Are there any Slurm experts out there? I'm playing with Slurm > for the first time and need some help converting PBS scripts to > Slurm scripts. I've gotten pretty familiar with slurm, but otoh I'm opposed to job scripts, and normally use slurm "inline". (I strongly believe that the queueing system should be a prefix to the user's command, as if they were running it directly. ie: sqsub -o outlog ./myserialprog sqsub -o outlog -i in -e err --mail -q mpi -n 32 -N 8 ./mpihello --verbose that sort of thing. maybe this drives our users nuts, and they're just hankering to screw around with pesky little scripts. donno. it drives me crazy when some user discovers that we haven't gotten around to removing the PBS compatibility commands that LSF supplies. the problem is that LSF then records the literal content of the PBS script as the job's command. and that means that our unified sql db of all jobs have garbage for some jobs, rather than a meaningful string that hints at being nwchem, blast, etc.) sorry for the peeve-venting, but thanks for the opportunity ;) From eagles051387 at gmail.com Wed Jan 23 08:35:33 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] rendering cluster Message-ID: i am about to begin creating a small cluster for someone whose a graphic designer and does alot of rendering on large files. what would be the easiest way to go about setting up a cluster like this. i was thinking of setting it up and configuring it from scratch using kubuntu linux. -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080123/2437423c/attachment.html From bernard at vanhpc.org Wed Jan 23 12:10:57 2008 From: bernard at vanhpc.org (Bernard Li) Date: Wed Nov 25 01:06:48 2009 Subject: CPU affinity for serial jobs (was Re: [Beowulf] VMC - Virtual Machine Console) Message-ID: Hi Rayson: On 1/20/08, Rayson Ho wrote: Long time no talk! 
> I am working on adding processor affinity support for serial and > parallel jobs for Grid Engine, and I am working with the OpenMPI > developers to define an interface. > > http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=27044 > http://www.open-mpi.org/community/lists/devel/2008/01/2949.php > http://www.open-mpi.org/community/lists/devel/2008/01/2964.php > > BTW, LSF 7.0.2 supports processor affinity for serial jobs. However, > supporting processor affinity for serial jobs is only useful when the > OS scheduler is dumb... > > See also: "Enhancing an Open Source Resource Manager with > Multi-Core/Multi-threaded Support" -- this paper talks about the > support of processor affinity in SLURM: > http://www.cs.huji.ac.il/~feit/parsched/jsspp07/p2-balle.pdf Thanks for the information. I get the sense that CPU affinity is beneficial even for embarrassingly/serial jobs -- however I am curious whether anybody has actual numbers to back this? And is the potential benefits worth the time/effort to set this up rather than let the default Linux scheduler deal with it. Cheers, Bernard From forum.san at gmail.com Fri Jan 25 06:41:20 2008 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Tight MPICH2 Integration with SGE Message-ID: Hi all, I'm doing the Tight MPICH2 (not MPICH) Integration with SGE on a cluster with, dual core dual AMD64 opteron processor. Followed the sun document located at: http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html The document explains following three kinds of TI: Tight Integration(TI) using Process Manager(PM): gforker TI using PM: SMPD ? Daemonless TI using PM: SMPD ? Daemonbased I did the TI with gforker and tested it successfully. But failed to do TI with daemonless-SMPD. Let me explain what I did. Installed the MPICH2 with smpd configuration. The sge is installed at: /opt/gridengine And created MPICH2-SM folder in /opt/gridengine/mpi by referring the following lines from the document start_proc_args /usr/sge/mpich2_smpd_rsh/startmpich2.sh -catch_rsh $pe_hostfile stop_proc_args /usr/sge/mpich2_smpd_rsh/stopmpich2.sh Copied the startmpi.sh, stopmpi.sh from /opt/gridengine/mpi to /opt/gridengine/mpi/MPICH2-SM dir, because nothing has given in the doc what to include in these scripts. Using qmon, created MPICH2-GF pe. # qconf -sp MPICH2-SM pe_name MPICH2-SM slots 999 user_lists rootuserset xuser_lists NONE start_proc_args /opt/gridengine/mpi/MPICH2-SM/startmpich2sm.sh stop_proc_args /opt/gridengine/mpi/MPICH2-SM/stopmpich2sm.sh allocation_rule $round_robin control_slaves FALSE job_is_first_task TRUE urgency_slots min Added this PE to default queue all.q. Then submitted the job with following script: # cat sgeSM.sh #!/bin/sh #$ -cwd #$ -pe MPICH2-SM 4 #$ -e msge2.Err #$ -o msge2.out #$ -v MPI_HOME=/opt/MPI_LIBS/MPICH2-GNU/MPICH2-SM/bin #$ -v MEME_DIRECTORY=/opt/MEME-MAX $MPI_HOME/mpiexec -np 4 -machinefile /root/MFM /opt/MEME-MAX/bin/meme_p /opt/MEME-MAX/NCCS/samevivo_sample.txt -dna -mod tcm -nmotifs 10 -nsites 100 -minw 5 -maxw 50 -revcomp -text -maxsize 200500 It gave following error: # cat msge2.Err startmpich2sm.sh: got wrong number of arguments rm: cannot remove `/tmp/92.1.all.q/machines': No such file or directory rm: cannot remove `/tmp/92.1.all.q/rsh': No such file or directory I guess the problem might be with the scripts startmpich2sm.sh and stopmpich2sm.sh. Can any one guide me to resolve this issue.. 
Thanks & Regards, Sangamesh HPC Engineer -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080125/279bdef4/attachment.html From john.leidel at gmail.com Sat Jan 26 15:51:47 2008 From: john.leidel at gmail.com (John Leidel) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] rendering cluster In-Reply-To: References: Message-ID: <1201391507.4863.9.camel@e521.site> I would suggest using the Rocks toolkit + the Animation Studio Roll. check out: www.rocksclusters.org http://www.rocksclusters.org/ftp-site/pub/rocks/beta/4.3/ news post: http://insidehpc.com/2007/10/15/rocks-teamthaigrid-release-animation-studio-roll/ On Wed, 2008-01-23 at 17:35 +0100, Jon Aquilina wrote: > i am about to begin creating a small cluster for someone whose a > graphic designer and does alot of rendering on large files. what would > be the easiest way to go about setting up a cluster like this. i was > thinking of setting it up and configuring it from scratch using > kubuntu linux. > > -- > Jonathan Aquilina > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Sat Jan 26 16:02:22 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:48 2009 Subject: CPU affinity for serial jobs (was Re: [Beowulf] VMC - Virtual Machine Console) In-Reply-To: References: Message-ID: <479BCA0E.7040801@scalableinformatics.com> Bernard Li wrote: > > I get the sense that CPU affinity is beneficial even for > embarrassingly/serial jobs -- however I am curious whether anybody has Well, yes it is. > actual numbers to back this? And is the potential benefits worth the Here are some, more about process CPU->memory affinity than process CPU->cache affinity... I am forcing the memory to be on a different CPU than running the code:

landman@dualcore:/big/stream-jb-2006-6-8$ numactl --physcpubind=1 --membind=1 ./stream_d_c_omp_x86_64
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           3122.6206     0.2050     0.2050     0.2052
Scale:          3107.5644     0.2061     0.2059     0.2065
Add:            3118.4948     0.3080     0.3078     0.3084
Triad:          3107.2203     0.3091     0.3090     0.3093

I am forcing the memory to be on the same CPU running the code:

landman@dualcore:/big/stream-jb-2006-6-8$ numactl --physcpubind=1 --membind=0 ./stream_d_c_omp_x86_64
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           3705.1849     0.1729     0.1727     0.1734
Scale:          3954.8385     0.1619     0.1618     0.1623
Add:            3893.1524     0.2468     0.2466     0.2472
Triad:          3857.3264     0.2490     0.2489     0.2493

This is more of a memory affinity issue, you want your code running on the memory local to the particular CPU. A decade ago, we saw all sorts of performance degradation when processes would migrate from CPU to CPU on the big SGI machines, partially defeating the utility of cache (not to mention forcing a huge amount of inter-cpu traffic when you had massive invalidation storms as a result of the scheduler moving the process). Simply having the scheduler leave the process on the same CPU turned out to be a significant win, especially for long-running jobs that were very cache intensive (e.g. such as EP codes, Monte Carlo, ...) > time/effort to set this up rather than let the default Linux scheduler > deal with it. The Linux scheduler is actually quite reasonable about this (these days, with a modern kernel). Some older kernels had problems.
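For anyone who wants to repeat that comparison on their own box, a minimal sketch (numactl and taskset assumed to be installed, and ./my_app is just a placeholder for the real binary):

# pin to core 1, with memory allocated on node 0 vs. node 1, as in the runs above
numactl --physcpubind=1 --membind=0 ./my_app
numactl --physcpubind=1 --membind=1 ./my_app
# taskset fixes only the CPU mask and leaves memory placement to the kernel
taskset -c 1 ./my_app

Batch systems can do the equivalent per job slot once they grow affinity support, which is what the Grid Engine and SLURM work mentioned earlier in the thread is aiming at.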
> > Cheers, > > Bernard -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From dag at sonsorol.org Sat Jan 26 19:40:40 2008 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Tight MPICH2 Integration with SGE In-Reply-To: References: Message-ID: Hi Sangamesh, First things first - Not sure if this affects you but the mpich2-1.06p1 release does not currently work with tight SGE integration. The specific SGE mailing list thread where this is discussed is linked to from here: http://gridengine.info/articles/2008/01/25/tight-mpich2-integration-broken-with-mpich2-1-0-6p1 Another problem I see is inside your job script: >> $MPI_HOME/mpiexec -np 4 -machinefile /root/MFM /opt/MEME-MAX/bin/ >> meme_p /opt/MEME-MAX/NCCS/samevivo_sample.txt -dna -mod tcm - >> nmotifs 10 -nsites 100 -minw 5 -maxw 50 -revcomp -text -maxsize >> 200500 In this command you are explicitly asking for 4 CPUs and you are hard- coding in the path to a MPI machines file. This makes nonsense of the entire concept of Grid Engine MPICH integration, the whole point which is to allow the SGE scheduler to control how many CPUs you job gets and (more importantly) where those CPUs actually are. Your mpiexec command needs to take the value for "-np" and the value for "-machinefile" from the SGE scheduler. This is done via environment variables. Your command should probably look something like this: $MPI_HOME/mpiexec -np $NSLOTS -machinefile $TMPDIR/machines Finally, your PE configuration does not match what you say is in the documentation: > start_proc_args /usr/sge/mpich2_smpd_rsh/startmpich2.sh -catch_rsh > $pe_hostfile vs. > start_proc_args /opt/gridengine/mpi/MPICH2-SM/startmpich2sm.sh I would guess that not passing $pe_hostfile to startmpich2.sh in your start_proc_args is probably the reason for the specific error you quote. So my specific advice boils down to: (1) Make sure you are not using the MPICH2 that has been causing problems for SGE people recently (2) Fix your SGE job script by adding in "-np $NSLOTS" and "- machinefile $TMPDIR/machines" (3) Pass the parameter $pe_hostfile to your start_proc_args line in your parallel environment (PE) config Regards, Chris On Jan 25, 2008, at 9:41 AM, Sangamesh B wrote: > Hi all, > > I'm doing the Tight MPICH2 (not MPICH) Integration with SGE on > a cluster with, dual core dual AMD64 opteron processor. > > Followed the sun document located at: > > http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html > > The document explains following three kinds of TI: > Tight Integration(TI) using Process Manager(PM): gforker > TI using PM: SMPD ? Daemonless > TI using PM: SMPD ? Daemonbased > > I did the TI with gforker and tested it successfully. > > > But failed to do TI with daemonless-SMPD. > > Let me explain what I did. > > Installed the MPICH2 with smpd configuration. 
> > The sge is installed at: /opt/gridengine > > And created MPICH2-SM folder in /opt/gridengine/mpi by referring the > following lines from the document > > start_proc_args /usr/sge/mpich2_smpd_rsh/startmpich2.sh -catch_rsh > $pe_hostfile > stop_proc_args /usr/sge/mpich2_smpd_rsh/stopmpich2.sh > Copied the startmpi.sh, stopmpi.sh from /opt/gridengine/mpi to /opt/ > gridengine/mpi/MPICH2-SM dir, because nothing has given in the doc > what to include in these scripts. > > Using qmon, created MPICH2-GF pe . > > # qconf -sp MPICH2-SM > pe_name MPICH2-SM > slots 999 > user_lists rootuserset > xuser_lists NONE > start_proc_args /opt/gridengine/mpi/MPICH2-SM/startmpich2sm.sh > stop_proc_args /opt/gridengine/mpi/MPICH2-SM/stopmpich2sm.sh > allocation_rule $round_robin > control_slaves FALSE > job_is_first_task TRUE > urgency_slots min > > Added this PE to default queue all.q . > > Then submitted the job with following script: > > # cat sgeSM.sh > #!/bin/sh > > #$ -cwd > > #$ -pe MPICH2-SM 4 > > #$ -e msge2.Err > > #$ -o msge2.out > > #$ -v MPI_HOME=/opt/MPI_LIBS/MPICH2-GNU/MPICH2-SM/bin > > #$ -v MEME_DIRECTORY=/opt/MEME-MAX > > $MPI_HOME/mpiexec -np 4 -machinefile /root/MFM /opt/MEME-MAX/bin/ > meme_p /opt/MEME-MAX/NCCS/samevivo_sample.txt -dna -mod tcm -nmotifs > 10 -nsites 100 -minw 5 -maxw 50 -revcomp -text -maxsize 200500 > > It gave following error: > > # cat msge2.Err > > startmpich2sm.sh: got wrong number of arguments > rm: cannot remove `/tmp/92.1.all.q/machines': No such file or > directory > rm: cannot remove `/tmp/92.1.all.q/rsh': No such file or directory > > I guess the problem might be with the scripts startmpich2sm.sh and > stopmpich2sm.sh. > > Can any one guide me to resolve this issue.. > > Thanks & Regards, > Sangamesh > HPC Engineer > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From reuti at staff.uni-marburg.de Sun Jan 27 03:17:12 2008 From: reuti at staff.uni-marburg.de (Reuti) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <73a01bf20801201920q3ac7b647q335e1cd13bec4cdb@mail.gmail.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com> <73a01bf20801201920q3ac7b647q335e1cd13bec4cdb@mail.gmail.com> Message-ID: Hi, Am 21.01.2008 um 04:20 schrieb Rayson Ho: > On Jan 18, 2008 5:53 PM, Bernard Li wrote: >> -- but I'd also like to hear whether there are any general >> performance >> benefits to setting CPU affinity. Do major schedulers support this? >> Would this help with embarrassingly parallel jobs VS large MPI >> jobs on >> manycore machines? > > I am working on adding processor affinity support for serial and > parallel jobs for Grid Engine, and I am working with the OpenMPI > developers to define an interface. > > http://gridengine.sunsource.net/servlets/BrowseList? > list=dev&by=thread&from=27044 > http://www.open-mpi.org/community/lists/devel/2008/01/2949.php > http://www.open-mpi.org/community/lists/devel/2008/01/2964.php > > BTW, LSF 7.0.2 supports processor affinity for serial jobs. However, > supporting processor affinity for serial jobs is only useful when the > OS scheduler is dumb... 
independent from any timing benefits: it will help to prevent users to use more than the one granted slot. Some parallel libs are just forking or using threads and don't need any qrsh to spawn a parallel job. Nasty users can just use OpenMP with a thread count greater than 1 for now. -- Reuti From deadline at clustermonkey.net Mon Jan 28 10:30:43 2008 From: deadline at clustermonkey.net (Douglas Eadline) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] IB for smallish clusters In-Reply-To: References: <4799081C.10502@charter.net> Message-ID: <44749.192.168.1.1.1201545043.squirrel@mail.eadline.org> For those that long for IB on their small cluster: http://www.clustermonkey.net//content/view/222/1/ -- Doug From laytonjb at charter.net Mon Jan 28 10:39:06 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB Message-ID: <479E214A.5010005@charter.net> Just in case you've missed the announcements: http://www.clustermonkey.net//content/view/222/1/ http://www.hpcwire.com/hpc/2073649.html Enjoy! Jeff From dag at sonsorol.org Tue Jan 29 08:00:14 2008 From: dag at sonsorol.org (Chris Dagdigian) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Sun has posted 10 HPC/Grid positions open in Regensburg and Prague Message-ID: <25DC06C3-5529-4B47-A56A-8D5399DCE05A@sonsorol.org> I *think* topical job postings are OK for this list. If not, I apologize & moderators can feel free to wipe this message ... One of the Grid Engine managers just posted a summary of open HPC/Grid positions at Sun, specifically in the Regensburg and Prague offices. For people looking to work in Germany or the Czech Republic they could be interesting positions. Via the SGE list archives, the message can be found here: http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=23265 -Chris From hahn at mcmaster.ca Wed Jan 30 05:58:18 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <479E214A.5010005@charter.net> References: <479E214A.5010005@charter.net> Message-ID: > Just in case you've missed the announcements: > http://www.clustermonkey.net//content/view/222/1/ I'm always happy about new levels pricing agression, but I'm a bit puzzled about for what kind of workloads this will matter. whenever I ask about IB bandwidth, people always point fingers at weather codes, which apparently are fond of doing the transpose in multi-dimension FFT's using all-to-all. while convenient, this seems a bit silly, since transpose is O(N) communications, not O(N^2). higher bandwidth/node also makes sense if you're configuring fairly fat nodes (many cores, probably also lots of ram). but if you do that, you also amortize the networking, so a cheaper IB setup matters less. perhaps there are some extremely file-IO intensive workloads that can sustain ~1 GB/s, but I'd expect them to require some hefty fileserving hardware, which would also hide the IB cost. IB for gaming? I have one ratio: 1e-1/3e-6. that's human reaction time versus IB latency. also, I think it's a bit disingenous to use 10G Chelsio TOE to compare, rather than 10G Myri which is cheaper and faster. also: http://www.chelsio.com/sandia_benchmark_tech.html finally, how the heck do you make Gb as slow as 120 us? -Mark "not actually anti-IB" Hahn. PS: does anyone have first-hand experience with ConnectX performance? From laytonjb at charter.net Wed Jan 30 07:34:57 2008 From: laytonjb at charter.net (Jeffrey B. 
Layton) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: References: <479E214A.5010005@charter.net> Message-ID: <47A09921.9030503@charter.net> Mark, Thanks for being the knucklehead that allows me to respond to you and a bunch of other knuckleheads :) >> Just in case you've missed the announcements: >> http://www.clustermonkey.net//content/view/222/1/ > > I'm always happy about new levels pricing agression, but I'm > a bit puzzled about for what kind of workloads this will matter. > > whenever I ask about IB bandwidth, people always point fingers at > weather codes, which apparently are fond of doing the transpose > in multi-dimension FFT's using all-to-all. while convenient, this > seems a bit silly, since transpose is O(N) communications, not O(N^2). > > higher bandwidth/node also makes sense if you're configuring fairly > fat nodes (many cores, probably also lots of ram). but if you do that, > you also amortize the networking, so a cheaper IB setup matters less. > > perhaps there are some extremely file-IO intensive workloads that can > sustain ~1 GB/s, but I'd expect them to require some hefty fileserving > hardware, which would also hide the IB cost. > > IB for gaming? I have one ratio: 1e-1/3e-6. that's human reaction > time versus IB latency. Mark - I know these comments are not directed at me per say, but at the community in general. My response is - test your own applications and then determine their characteristics and let us all know about what you have learned. I'm sure everyone is dying to learn. But, let me also say, that extrapolating characteristics from one code to other codes that do similar things is dead wrong. It depends on the quality of the code (as always :) ). > also, I think it's a bit disingenous to use 10G Chelsio TOE to compare, > rather than 10G Myri which is cheaper and faster. also: > http://www.chelsio.com/sandia_benchmark_tech.html Sigh... I've got a couple of emails from people on this. In general the emails revolve around a single thing "I don't like your numbers!" and some of them seem to come from vendors. I won't get into full rant mode on this one (no one would read it anyway), but let me just say, that the numbers in the table are taken from a table that is about a year old. I picked just a few numbers that I thought were worthwhile rather than reproduce the whole table. I'm sorry your favorite interconnect wasn't in there - I'm sure I'm doing you a great disservice my not including and the company is mortally wounded and lief s we know it will collapse and the energy from the collapse will be great enough to form a black hole centered on the earth which will cause a chain reaction and the entire universe will be sucked into it and that's that. I post what benchmark numbers I have and I always say that the numbers are probably not comparable between vendors, don't use them to make a decision, and please test your applications. All of the numbers are at least a year old, but I haven't had time to look or ask for updated numbers so I used the best I have clearly saying that the numbers are old. So, Mark, if you don't like the numbers I invite you, nay I beg you, to write your own interconnect article with your favorite numbers or interconnect in there. I'm sure we would all LOVE to hear what you say (that is said in a serious tone, not condescending like the rest of this email). > finally, how the heck do you make Gb as slow as 120 us? Those numbers are from some old tests. The NICs were pretty crappy to be honest. 
But it does represent an upper bound. But then again, as I said before, please post your own numbers to either this list (please be sure to include copious notes as I'm sure some knucklehead will chime in and complain that your numbers stink and that you should do the test a completely different way. It happens all of the time). Even better, how about writing something for ClusterMonkey? The site is for the community and we make no money what so ever from it. So giving back to the community is a worthwhile thing (IMHO) when you don't agree with anything. Enough of my rant - I could go on for days and days about the crap we see as authors trying to help the community and help people. It's truly amazing. And ranting is not the point of this list :) > > -Mark "not actually anti-IB" Hahn. How about just "anti-everything" That's seems much more appropriate. Jeff From peter.st.john at gmail.com Wed Jan 30 07:56:28 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: References: <479E214A.5010005@charter.net> Message-ID: On Jan 30, 2008 8:58 AM, Mark Hahn wrote: > > IB for gaming? I have one ratio: 1e-1/3e-6. that's human reaction > time versus IB latency. > > Not to stray off-topic, but I must defend the needs of gamers. There are e+6 pixels and the video card has to react to a very great deal, sometimes, before the player is presented with his decisecond opportunity. I've spent minutes staring at a lagged screen without being able to take an action. Math, Physics, and Computer Science are all great challanges, but nobody has a harder job than Necromancers. But the bottleneck seems to be the video card, not the network pipeline; I just mean the net's job is bigger than my reaction time. I used to think, "this 300 baud modem is great, it's faster than I can type" but I wouldn't be able to handshake with the ISP, now, with that. Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080130/b87a02c7/attachment.html From Shainer at mellanox.com Wed Jan 30 09:33:47 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A56F@mtiexch01.mti.com> Dear Mark, > > Just in case you've missed the announcements: > > http://www.clustermonkey.net//content/view/222/1/ > > I'm always happy about new levels pricing agression, but I'm > a bit puzzled about for what kind of workloads this will matter. > > whenever I ask about IB bandwidth, people always point > fingers at weather codes, which apparently are fond of doing > the transpose in multi-dimension FFT's using all-to-all. > while convenient, this seems a bit silly, since transpose is > O(N) communications, not O(N^2). > > higher bandwidth/node also makes sense if you're configuring > fairly fat nodes (many cores, probably also lots of ram). > but if you do that, you also amortize the networking, so a > cheaper IB setup matters less. > > perhaps there are some extremely file-IO intensive workloads > that can sustain ~1 GB/s, but I'd expect them to require some > hefty fileserving hardware, which would also hide the IB cost. Bandwidth is one aspect of an interconnect. Other aspects are of course latency, CPU overhead etc etc. Some application will benefit from the latency, and some form bandwidth and some from a combination of all. 
Weather codes will show great benefits, and also CFD, rendering, bio codes (NAMD etc), Monte Carlo simulations and even mathematica users. Just a partial list. > IB for gaming? I have one ratio: 1e-1/3e-6. that's human > reaction time versus IB latency. > Oh yes... I guess you did not play for a long time. Did you? Talk with someone who suffer from lagging and you will get the story, even When he has a great video card. It's the network and the CPU overhead that are the cause of this issue > also, I think it's a bit disingenous to use 10G Chelsio TOE > to compare, rather than 10G Myri which is cheaper and faster. also: > http://www.chelsio.com/sandia_benchmark_tech.html > I really don't want to set fire here but funny that you point to such data as a proof point. If you want a serious discussion on Chelsio "testing" we can have it, probably in a different mail thread. > finally, how the heck do you make Gb as slow as 120 us? > > -Mark "not actually anti-IB" Hahn. Good one. > PS: does anyone have first-hand experience with ConnectX performance? I do, but you probably want to hear from Myricom ... :-) From landman at scalableinformatics.com Wed Jan 30 10:04:37 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784FE5A56F@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A56F@mtiexch01.mti.com> Message-ID: <47A0BC35.6030700@scalableinformatics.com> Gilad Shainer wrote: >> IB for gaming? I have one ratio: 1e-1/3e-6. that's human >> reaction time versus IB latency. >> > > Oh yes... I guess you did not play for a long time. Did you? Talk > with someone who suffer from lagging and you will get the story, even > When he has a great video card. It's the network and the CPU overhead > that are the cause of this issue Er... ah ... yeah. Milliseconds is typical in FPS games. hundreds of ms are bad. Hundreds of microseconds aren't ... ok, depends upon your FPS, I am sure the military folks have *really* fun ones which require that sort of latency. Either that, or I have played the wrong games. I thought 300 baud was enough ... ya know ... "adventure" with an acoustic coupler ... (sorry, couldn't resist ... been one of those days looking for bugs in the code, when it wasn't in the code). >> also, I think it's a bit disingenous to use 10G Chelsio TOE >> to compare, rather than 10G Myri which is cheaper and faster. also: >> http://www.chelsio.com/sandia_benchmark_tech.html >> > > I really don't want to set fire here but funny that you point to such > data > as a proof point. If you want a serious discussion on Chelsio "testing" > we > can have it, probably in a different mail thread. Well, as a big proponent of real-world application benchmarks, I would like to hear your take on this. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From hahn at mcmaster.ca Wed Jan 30 10:49:11 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784FE5A56F@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A56F@mtiexch01.mti.com> Message-ID: > Bandwidth is one aspect of an interconnect. Other aspects are > of course latency, CPU overhead etc etc. 
Some application will > benefit from the latency, and some form bandwidth and some > from a combination of all. Weather codes will show great benefits, > and also CFD, rendering, bio codes (NAMD etc), Monte Carlo simulations > and even mathematica users. Just a partial list. duh. >> IB for gaming? I have one ratio: 1e-1/3e-6. that's human >> reaction time versus IB latency. > > Oh yes... I guess you did not play for a long time. Did you? Talk I understand human psychophysics and I understand cluster performance. I have never spent much time playing games. it is perhaps ironic that my wife is a psych professor who _does_ actually study psychophysics and behavior, including gaming. humans really do run at only about 100 Hz, so whether the interconnect is 1 or 50 us is really really not going to make a difference. > with someone who suffer from lagging and you will get the story, even I suspect you are referring mainly to wide-area gaming, which is so entirely different as to be not comparable. it's certainly true that wide-area gaming suffers network issues, but isn't that equally obvious? cable-modem congestion, for instance, or geographic timelag has no bearing on this IB-gaming idea. > When he has a great video card. It's the network and the CPU overhead > that are the cause of this issue thanks for reiterating the obvious again. >> PS: does anyone have first-hand experience with ConnectX performance? > > I do, but you probably want to hear from Myricom ... :-) why the heck do you think I did not mean exactly what I said? I would be most interested in hearing from someone who has bought and is using a connectx cluster, especially the latency they experience. in fact, my comparison would be to our elan4 clusters. From mathog at caltech.edu Wed Jan 30 13:05:13 2008 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:06:48 2009 Subject: [Beowulf] Re: Cheap SDR IB Message-ID: Joe Landman wrote: > Gilad Shainer wrote: > > >> IB for gaming? I have one ratio: 1e-1/3e-6. that's human > >> reaction time versus IB latency. > >> > > > > Oh yes... I guess you did not play for a long time. Did you? Talk > > with someone who suffer from lagging and you will get the story, even > > When he has a great video card. It's the network and the CPU overhead > > that are the cause of this issue > > Er... ah ... yeah. Milliseconds is typical in FPS games. hundreds of > ms are bad. Hundreds of microseconds aren't ... ok, depends upon your > FPS, I am sure the military folks have *really* fun ones which require > that sort of latency. Many FPS games are still keyboard driven, and the scan rate on the keyboard is likely only on the order of 10Hz. Gaming mice scan position a lot faster though, last I looked they were closing in on 10000 data points per second. Even so, human reaction time is now, and probably will be forever, at the .1 second level, so even if that gaming mouse could record 1000 button presses a second, no gamer is ever going to be able to push that button at anywhere near that rate. IB would be massive overkill for gaming, 100 (or even 10) baseT should work just fine unless the network is hideously congested, in which case the game is probably going to become unplayable due to dropped UDP packets. 
Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From jeff.blasius at yale.edu Wed Jan 30 13:24:57 2008 From: jeff.blasius at yale.edu (Jeff Blasius) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: References: Message-ID: On Jan 30, 2008 4:05 PM, David Mathog wrote: > Joe Landman wrote: > > Gilad Shainer wrote: > > > > >> IB for gaming? I have one ratio: 1e-1/3e-6. that's human > > >> reaction time versus IB latency. > > >> > > > > > > Oh yes... I guess you did not play for a long time. Did you? Talk > > > with someone who suffer from lagging and you will get the story, even > > > When he has a great video card. It's the network and the CPU overhead > > > that are the cause of this issue > > > > Er... ah ... yeah. Milliseconds is typical in FPS games. hundreds of > > ms are bad. Hundreds of microseconds aren't ... ok, depends upon your > > FPS, I am sure the military folks have *really* fun ones which require > > that sort of latency. > > Many FPS games are still keyboard driven, and the scan rate on the > keyboard is likely only on the order of 10Hz. Gaming mice scan position > a lot faster though, last I looked they were closing in on 10000 data > points per second. Even so, human reaction time is now, and probably > will be forever, at the .1 second level, so even if that gaming mouse > could record 1000 button presses a second, no gamer is ever going to be > able to push that button at anywhere near that rate. > > IB would be massive overkill for gaming, 100 (or even 10) baseT should > work just fine unless the network is hideously congested, in which case > the game is probably going to become unplayable due to dropped UDP packets. Yes, but put "Gaming" in front of any device name and it'll sell. Gaming mice are a good example. This is another http://www.killernic.com/ The $300 NIC. -jeff > > Regards, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Jeff Blasius / jeff.blasius@yale.edu Phone: (203)432-9940 51 Prospect Rm. 011 High Performance Computing (HPC) UNIX Systems Administrator, Linux Systems Design & Support (LSDS) Yale University Information Technology Services (ITS) From hahn at mcmaster.ca Wed Jan 30 14:18:21 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: References: Message-ID: > Yes, but put "Gaming" in front of any device name and it'll sell. I think Mellanox should really look into making UV-active IB cables, or at least with blue leds ;) From bill at cse.ucdavis.edu Wed Jan 30 14:43:03 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <479E214A.5010005@charter.net> References: <479E214A.5010005@charter.net> Message-ID: <47A0FD77.7000103@cse.ucdavis.edu> Jeffrey B. Layton wrote: > Just in case you've missed the announcements: > > http://www.clustermonkey.net//content/view/222/1/ Interesting that Infiniband is getting down close to the predicted $100 per HCA price. Certainly $250 per node (HCA, cable, and switch) for 24 ports makes it easier to justify if the performance is there. So of course that begs the question, what is the performance? 
I noticed that the price is for the InfiniHost III Lx, does anyone know if the Ex (mentioned in the performance table) and Lx (mentioned in the price list) perform identically? Or any of the standard numbers for MPI latency and/or bandwidth for the Lx part? I googled around and found a few PDFs, but no real numbers. Some EE magazine mentioned 4us roughly, but it wasn't clear exactly what they were measuring. From rgb at phy.duke.edu Wed Jan 30 14:47:37 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: References: Message-ID: On Wed, 30 Jan 2008, David Mathog wrote: > IB would be massive overkill for gaming, 100 (or even 10) baseT should > work just fine unless the network is hideously congested, in which case > the game is probably going to become unplayable due to dropped UDP packets. And in fact most online games (e.g. World of Warcraft) are played over connections that are almost certainly slower than 1.5 Mbps -- my boys play (two or three at a time) over a shared DSL link at 768 kbps. It can be laggy if anyone (like me, yum updating a host or their mother working on her EMR) hogs the link but otherwise it is playable. Major lag in gametime is usually due to bottlenecks elsewhere. They accomplish this by putting movement in a lagged map of the "universe" on your local machine, and updating object data across the slow link as fast as possible to keep the Universe views of everybody in any given visual field in sync (PC and NPC alike). You can tell when certain network outages occur because you can e.g. move your character around but nothing happens and all the other players disappear. There are a few orders of magnitude difference between DSL (acceptable when it isn't too congested due to OTHER people using the line) and GigE, let alone IB. GigE >>or<< IB might well be able to run the gaming universe on the >>server<< and just render the display on the host without too much lag, but that's very different from what real RPGs do. rgb > > Regards, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From Shainer at mellanox.com Wed Jan 30 15:20:43 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <47A0FD77.7000103@cse.ucdavis.edu> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A60B@mtiexch01.mti.com> > > Just in case you've missed the announcements: > > > > http://www.clustermonkey.net//content/view/222/1/ > > Interesting that Infiniband is getting down close to the > predicted $100 per HCA price. Certainly $250 per node (HCA, > cable, and switch) for 24 ports makes it easier to justify if > the performance is there. So of course that begs the > question, what is the performance? > > I noticed that the price is for the InfiniHost III Lx, does > anyone know if the Ex (mentioned in the performance table) > and Lx (mentioned in the price list) perform identically? 
Or > any of the standard numbers for MPI latency and/or bandwidth > for the Lx part? > > I googled around and found a few PDFs, but no real numbers. > Some EE magazine mentioned 4us roughly, but it wasn't clear > exactly what they were measuring. > Mellanox has 4 IB families - InfiniHost, InfiniHost III Lx, InfiniHost III Ex and ConnectX. InfiniHost is a PCI-X device and the other 3 are PCIe based. The Ex and LX support IB DDR and ConnectX supports IB QDR and PCIe Gen2. >From latency point of view, the LX is a 3.5-3.3us MPI latency (SDR/DDR), the Ex is 2.6-2.3us (SDR/DDR) and ConnectX is 1.2-1us (PCIe Gen1/Gen2). For BW, Lx provides ~1400MB/s, EX is ~1500MB/s and ConnectX is ~1900MB/s uni-directional on PCIe Gen2. Feel free to contact me directly for more info. Gilad. From Shainer at mellanox.com Wed Jan 30 15:30:11 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A60D@mtiexch01.mti.com> > I would be most interested in hearing from someone who has > bought and is using a connectx cluster, especially the > latency they experience. > in fact, my comparison would be to our elan4 clusters. > You can run your own tests to measure the latency and other characteristics of ConnectX on Mellanox Cluster Center. Mellanox provide cluster access free of charge to anyone who want to benchmark InfiniBand. More info and the run-time request form in on Mellanox web site - http://www.mellanox.com/applications/clustercenter.php Gilad From pherrero at fi.upm.es Tue Jan 29 07:19:28 2008 From: pherrero at fi.upm.es (Pilar Herrero) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] OTM'08: CALL FOR WORKSHOP PROPOSALS Message-ID: <479F4400.2020802@fi.upm.es> *************************************************************** CALL FOR WORKSHOP PROPOSALS OnTheMove OTM Federated Conferences and Workshops 2008 (OTM'08) 9-14th November, 2008 Monterrey, Mexico http://www.cs.rmit.edu.au/fedconf Proceedings will be published by Springer Verlag *************************************************************** The 14 workshops of OTM'06 as well as the 11 workshops of OTM'07 were a real success as hundreds of researchers converged through the presentation of interesting ideas in several domains relevant to the themes of distributed, meaningful and ubiquitous computing and information systems. Proposals for new workshops are presently solicited for affiliation with OTM 2008. The main goal of the OTM 2008 workshops is to stimulate and facilitate an active exchange, interaction and comparison of new approaches and methods. OTM'08 provides an opportunity for a highly diverse body of researchers and practitioners by federating five successful related and complementary conferences: * GADA'08 (International Conference on Grid computing, high-performAnce and Distributed Applications) * CoopIS'08 (International Conference on Cooperative Information Systems) * DOA'08 (International Symposium on Distributed Objects and Applications) * ODBASE'08 (International Conference on Ontologies, DataBases, and applications of Semantics) * IS'08 (Information Security Symposium) OTM'08 especially encourages proposals that are related to the OnTheMove themes. The format of each workshop is to be determined by the organisers. 
Please consult the formats of the workshops held in the previous editions for examples of successful workshops: * http://www.cs.rmit.edu.au/fedconf/2007/index.html?page=persys2007cfp * http://www.cs.rmit.edu.au/fedconf/2006/index.html?page=is2006cfp * http://www.cs.rmit.edu.au/fedconf/2005/cams2005cfp.html SUBMISSION: ----------- Researchers and practitioners are invited to submit workshop proposals to the OTM 2008 Workshop Chair: Pilar Herrero (pherrero@fi.upm.es) no later than *** Monday, February 18, 2008 *** Submission should be made by e-mail (ASCII/PS/PDF/DOC format are accepted) using "OTM Workshop Proposal Submission" as the email subject. Prospective organizers are also encouraged to discuss with the Workshops Chair prior to submitting proposals. PROPOSAL CONTENTS: ------------------ In order to make it easier to evaluate your proposal, it would be greatly appreciated if in crafting your proposal you could include the following information: * A brief technical description of the workshop, specifying the workshop goals and the technical issues that will be its focus. * A brief discussion of why and to whom the workshop is of interest. * A list of related workshops held within the last two years, if any, and their relation to the proposed workshop. * If applicable, detailed information about previous editions of the same workshop (e.g., number of submissions, number of attendees). * A preliminary call for participation/papers. * The names, postal addresses, phone numbers, and email addresses of the proposed workshop organizing committee. * The name of the primary contact for the organizing committee; an email address of this person should be given. * A description of the qualifications of the individual committee members with respect to organizing scientific events, including a list of workshops previously arranged by any members of the proposed organizing committee, if any. * The Advertising procedure: How do the committee members plan to advertise their workshop * A brief description of how the proposed workshop could complement the four main conferences scopes. SELECTION CRITERIA: ------------------- The selection of the workshops to be included in the final OTM 2008 program will be based upon a number of factors, including: the scientific/technical interest of the topics, the quality of the proposal, the need to avoid strictly overlapping workshops, and the unavoidable need to limit the overall number of selected workshops. ORGANIZERS' RESPONSABILITIES: ----------------------------- Workshop organizers will be responsible for the following: * Making a Web site for the workshop to be located in the main web site http://www.cs.rmit.edu.au/fedconf (All the conferences and workshops MUST be located in this main web site) * Advertising the workshop and issuing a call for participation/papers. * Collecting submissions, notifying acceptances in due time, and ensuring a transparent and fair selection process. All workshop organizers commit themselves to adopt the same deadlines for submissions and notifications of acceptance. * Ensuring that the workshop organizers and the participants get registered to the workshop and are invited to register to the main conference. * At least one PC co-chair per OTM Workshop MUST be registered and present for the whole event. OTM'08 reserves the right to cancel any workshop if the above responsibilities are not fulfilled or if too few attendees register for the workshop. 
WORKSHOPS PROCEEDINGS: ---------------------- Papers accepted by the workshops are likely to be published as a joint volume of Lecture Notes in Computer Science (LNCS) by Springer. We look forward to your support in making OTM?08 workshops the most exciting one. *********************************************************************** Send proposals (in ASCII/PS/PDF/DOC format) and inquiries via email to: Pilar Herrero (pherrero@fi.upm.es) using "OTM Workshop Proposal Submission" as the email subject. ******** Deadline: Monday, February 18, 2008 ************ *********************************************************************** From forum.san at gmail.com Wed Jan 30 08:03:22 2008 From: forum.san at gmail.com (Sangamesh B) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] For CPMD users Message-ID: Hi CPMD users, With a CPMD parallel job, I'm getting a Segmentation Fault error. Let me explain what I did. Installed MPICH with Intel Compilers. Configure looks as follows: ./configure --prefix=/opt/MPI_LIBS/MPICH-Intel -cc=/opt/intel/cce/10.1.008/bin/icc -fc=/opt/intel/fce/10.1.008/bin/ifort --enable-f77 --with-device=ch_p4 --with-arch=LINUX When I run a CPMD job with 1-4 proceses, the job is getting killed and gives following error: # mpirun -machinefile /export/M4 -np 4 ./cpmd.x /opt/APPLICATIONS/CPMD/singlemol.input > single4.out Killed by signal 2. forrtl: error (69): process interrupted (SIGINT) Killed by signal 2. Killed by signal 2. If only one process is used and without redirection, the following error occurred: p0_16857: p4_error: interrupt SIGSEGV: 11 Can anybody explain what might be the cause for this? regards, Sangamesh -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080130/e93d4eec/attachment.html From atchley at myri.com Wed Jan 30 19:32:20 2008 From: atchley at myri.com (Scott Atchley) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784FE5A60B@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A60B@mtiexch01.mti.com> Message-ID: On Jan 30, 2008, at 6:20 PM, Gilad Shainer wrote: > For BW, Lx provides ~1400MB/s, EX is ~1500MB/s and ConnectX is > ~1900MB/s > uni-directional on PCIe Gen2. > > Feel free to contact me directly for more info. > > Gilad. My god, IB bandwidths always confuse me. :-) I thought IB SDR was 10 GB/s signal rate and 8 Gb/s data rate. How do you squeeze ~1400 MB/s out of 8 Gb/s? I see you offer Lx cards in PCIe 4x and 8x. Again, PCIe is encoded at 10bit/8bit so the data rate is 8 Gb/s. So the above value is for your 8x cards only, no? The thread is about your 4x cards, no? Are there many PCIe Gen2 motherboards yet? Scott From eugen at leitl.org Wed Jan 30 23:31:08 2008 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: References: Message-ID: <20080131073108.GT10128@leitl.org> On Wed, Jan 30, 2008 at 01:05:13PM -0800, David Mathog wrote: > IB would be massive overkill for gaming, 100 (or even 10) baseT should > work just fine unless the network is hideously congested, in which case > the game is probably going to become unplayable due to dropped UDP packets. Modern games are moving towards realtime large-scale physical simulations, and there's a natural mapping of the terrain to a 2d grid (torus) or 3d grid of nodes. (Unfortunately, e.g. Second Life seems to be written in .Net (Mono) and partitioned by virtual servers, so no MPI there). 
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From Hakon.Bugge at scali.com Thu Jan 31 01:50:50 2008 From: Hakon.Bugge at scali.com (Håkon Bugge) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <200801302001.m0UK0UCS015867@bluewest.scyld.com> References: <200801302001.m0UK0UCS015867@bluewest.scyld.com> Message-ID: <20080131095052.EB94635B03D@mail.scali.no> At 21:01 30.01.2008, Mark Hahn wrote: >whenever I ask about IB bandwidth, people always point fingers >at weather codes, which apparently are fond of doing the transpose >in multi-dimension FFT's using all-to-all. while convenient, this >seems a bit silly, since transpose is O(N) communications, not O(N^2). Mark, interconnect does matter. Here is a solid benchmark using WRF, 128 cores, Woodcrest 3.00GHz. System spec can be found at http://www.spec.org/mpi2007/results/res2007q4/mpi2007-20071013-00029.html Scali MPI Connect 5.6.2 using IB (IB as specified in the link above): Success 127.wrf2 base mref ratio 21.03, runtime 370.653066 Scali MPI Connect 5.6.2 using GbE (Broadcom NetXtreme II BCM5708 1000Base-T (B2), Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.3.29, 9k MTU, Switch? Jeff, can you fill in here, the system should be familiar to you): Success 127.wrf2 base mref ratio=4.82, runtime=1618.248048 That's a pretty decent advantage to IB, isn't it? Thanks, Hakon From i.kozin at dl.ac.uk Thu Jan 31 04:20:43 2008 From: i.kozin at dl.ac.uk (Kozin, I (Igor)) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <20080131095052.EB94635B03D@mail.scali.no> Message-ID: I thought the thrust of the original post was that you can now build a cheap IB cluster with up to 24 nodes. The subsequent discussion was around questioning whether you need IB for up to 16-24 nodes. The advantage you point to is for 32 nodes. There is no question that IB is much better at this scaling point for many codes, not just WRF. I used to think that the typical break point is about 16 nodes. We have a lot of app data on our web site which confirms this and will be looking into how (if at all) quad cores change this. As far as the games are concerned, never underestimate what people can do if sufficient and affordable resources are provided. If the opposite were true we'd still be staring at black and white/green screens and playing Pac-Man. Game developers have to offload a lot (too much in fact) to a client out of necessity, not because they want to (BTW, there are amazing exploits because of the lags and the way the clients are built). Having said that, there is no question that IB is hardly applicable to MMORPGs on the scale they are, probably not even for LAN parties. However, a small cluster for real time ray tracing might be a good proposition. Igor -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Håkon Bugge Sent: 31 January 2008 09:51 To: Mark Hahn Cc: Beowulf Mailing list Subject: Re: [Beowulf] Cheap SDR IB At 21:01 30.01.2008, Mark Hahn wrote: >whenever I ask about IB bandwidth, people always point fingers >at weather codes, which apparently are fond of doing the transpose >in multi-dimension FFT's using all-to-all. while convenient, this >seems a bit silly, since transpose is O(N) communications, not O(N^2). Mark, interconnect does matter.
Here is a solid benchmark using WRF, 128 cores, Woodcrest 3.00GHz. System spec can be found at http://www.spec.org/mpi2007/results/res2007q4/mpi2007-20071013-00029.html Scali MPI Connect 5.6.2 using IB (IB as specified in the link above): Success 127.wrf2 base mref ratio 21.03, runtime 370.653066 Scali MPI Connect 5.6.2 using Gbe (Broadcom NetXtreme II BCM5708 1000Base-T (B2), Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.3.29, 9k MTU, Switch? Jeff, can you fill in here, the system should be familiar to you): Success 127.wrf2 base mref ratio=4.82, runtime=1618.248048 That's a pretty decent advantage to IB, isn't? Thanks, Hakon _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Thu Jan 31 04:29:33 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: <20080131073108.GT10128@leitl.org> References: <20080131073108.GT10128@leitl.org> Message-ID: On Thu, 31 Jan 2008, Eugen Leitl wrote: > On Wed, Jan 30, 2008 at 01:05:13PM -0800, David Mathog wrote: > >> IB would be massive overkill for gaming, 100 (or even 10) baseT should >> work just fine unless the network is hideously congested, in which case >> the game is probably going to become unplayable due to dropped UDP packets. > > Modern games are moving towards realtime large-scale physical simulations, > and there's a natural mapping of the terrain to a 2d grid (torus) or 3d > grid of nodes. (Unfortunately, e.g. Second Life seems to be written > in .Net (Mono) and partitioned by virtual servers, so no MPI there). Yeah, and Second Life is pig-dog-slow over DSL, as well. I don't think it puts much of a virtual world on your PC and relies on actually rendering information sent from the servers. Not enough bandwidth or server time in the world yet to make that work particularly well except to e.g. University workstations with minimum 45 Mbps bottlenecks in between... rgb > > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From hahn at mcmaster.ca Thu Jan 31 06:09:35 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <20080131095052.EB94635B03D@mail.scali.no> References: <200801302001.m0UK0UCS015867@bluewest.scyld.com> <20080131095052.EB94635B03D@mail.scali.no> Message-ID: >> whenever I ask about IB bandwidth, people always point fingers >> at weather codes, which apparently are fond of doing the transpose >> in multi-dimension FFT's using all-to-all. while convenient, this >> seems a bit silly, since transpose is O(N) communications, not O(N^2). > > Mark, > > interconnect does matter. I did not claim the opposite - I said that for small, cost-sensitive clusters, it would be unusual to need IB's advantages (high bandwidth and latency comparable to other non-Gb interconnects.) in particular, I'm curious about the conventional wisdom about weather codes and bandwidth. > Here is a solid benchmark using WRF, 128 cores, > Woodcrest 3.00GHz. 
System spec can be found at > http://www.spec.org/mpi2007/results/res2007q4/mpi2007-20071013-00029.html > > Scali MPI Connect 5.6.2 using IB (IB as specified in the link above): > Success 127.wrf2 base mref ratio 21.03, runtime 370.653066 I was curious about this: you only used one DDR port; was that because of lack of switch ports, or because WRF uses bandwidth <= DDR? > Scali MPI Connect 5.6.2 using Gbe (Broadcom NetXtreme II BCM5708 1000Base-T > (B2), Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.3.29, 9k MTU, > Switch? Jeff, can you fill in here, the system should be familiar to you): > Success 127.wrf2 base mref ratio=4.82, runtime=1618.248048 > > That's a pretty decent advantage to IB, isn't? sure, and these are very fat nodes for which a fat interconnect is appropriate for almost any workload that's not embarassing. but really I wasn't suggesting that plain old Gb (bandwidth in particular) was adequate for all possible clusters. I was questioning whether IB was a panacea for small, cost-sensitive ones... From landman at scalableinformatics.com Thu Jan 31 06:46:53 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: References: <200801302001.m0UK0UCS015867@bluewest.scyld.com> <20080131095052.EB94635B03D@mail.scali.no> Message-ID: <47A1DF5D.4010805@scalableinformatics.com> Mark Hahn wrote: > sure, and these are very fat nodes for which a fat interconnect is > appropriate for almost any workload that's not embarassing. but really > I wasn't suggesting that plain old Gb (bandwidth in particular) was > adequate for all possible clusters. I was questioning whether IB was a > panacea for small, cost-sensitive ones... Cheap gigabit is fine for small cost sensitive clusters. You can get cheap (not great, but ok) gigabit switches with 48 ports for under $700 today. They are not as fast as the higher cost ones from HP and others, but they are great for inexpensive clusters. At the small cluster side of things, the cost per core and cost per node (fully burdened with switches, cables, OS, compilers, etc) is very important. At these prices for some small clusters, the cost to add IB is no longer completely prohibative. But, at the same time, the benefit needs to outweigh the costs. I would argue that the more interesting small clusters with IB probably won't be used for message passing, but for storage using NFSoverRDMA to move large chunks of data back and forth. There you get 5-8x better performance on your data xfer from storage than you get with gigabit. For *some* workloads (life science "database" analysis , large image processing, ...) this could be quite important. Most of the benchmark results (life science codes, chemistry codes, engineering codes, ...) I have seen/worked on don't show a huge difference between gigabit and IB until you get north of 32 cores. Of course, that is 2-4 nodes these days ... 
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From ascheinine at tuffmail.us Thu Jan 31 07:10:01 2008 From: ascheinine at tuffmail.us (Alan Louis Scheinine) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: References: <200801302001.m0UK0UCS015867@bluewest.scyld.com> <20080131095052.EB94635B03D@mail.scali.no> Message-ID: <47A1E4C9.6030705@tuffmail.us> With regard to weather codes. I looked at a program for local forecasting. Just six or eight computational nodes are used. The grid of physical data is not very dense because the initial conditions do not have high spatial resolution. The consequence is that each subdomain has alot of surface area, that is, alot of communication. Moreover, there are many variables, temperature and pressure for diffusion, momentum and pressure for convection, humidity and ground conditions. The typical structure is a simple finite-difference block then exchange of data, then a finite-difference block on another group of variables and on and on. There is no "kernel" of calculation but rather the code is "flat". The result is that alot of time is spent waiting for the exchange of data at the boundaries of subdomains on different compute nodes. Best regards, Alan Scheinine Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna Center for Advanced Studies, Research, and Development in Sardinia Postal Address: | Physical Address for FedEx, UPS, DHL: --------------- | ------------------------------------- Alan Scheinine | Alan Scheinine c/o CRS4 | c/o CRS4 C.P. n. 25 | Loc. Pixina Manna Edificio 1 09010 Pula (Cagliari), Italy | 09010 Pula (Cagliari), Italy Email: scheinin@crs4.it Phone: 070 9250 238 [+39 070 9250 238] Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220] Operator at reception: 070 9250 1 [+39 070 9250 1] Mobile phone: 347 7990472 [+39 347 7990472] From peter.st.john at gmail.com Thu Jan 31 08:03:00 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] For CPMD users In-Reply-To: References: Message-ID: Sangamesh, If it turns out that you need to recompile with the debugging symbol table, and then use a symbolic debugger to examine the "core" file against the application source, and you've never done that before, then drop me a line (but I would not be up-to-date about your compiler or your debugger). However, we are hoping someone has a better, easier response for you :-) Good luck, Peter On Jan 30, 2008 11:03 AM, Sangamesh B wrote: > Hi CPMD users, > > With a CPMD parallel job, I'm getting a Segmentation Fault error. > > Let me explain what I did. > > Installed MPICH with Intel Compilers. Configure looks as follows: > > ./configure --prefix=/opt/MPI_LIBS/MPICH-Intel -cc=/opt/intel/cce/10.1.008/bin/icc > -fc=/opt/intel/fce/10.1.008/bin/ifort --enable-f77 --with-device=ch_p4 > --with-arch=LINUX > > > When I run a CPMD job with 1-4 proceses, the job is getting killed and > gives following error: > > # mpirun -machinefile /export/M4 -np 4 ./cpmd.x > /opt/APPLICATIONS/CPMD/singlemo l.input > single4.out > Killed by signal 2. > forrtl: error (69): process interrupted (SIGINT) > Killed by signal 2. > Killed by signal 2. 
> > If only one process is used and without redirection, the following error > occurred: > > p0_16857: p4_error: interrupt SIGSEGV: 11 > > Can anybody explain what might be the cause for this? > > regards, > Sangamesh > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080131/b8ed7a0a/attachment.html From deadline at eadline.org Thu Jan 31 08:34:11 2008 From: deadline at eadline.org (Douglas Eadline) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: References: <20080131073108.GT10128@leitl.org> Message-ID: <59768.192.168.1.1.1201797251.squirrel@mail.eadline.org> Look if you want to do on-line gaming right you need to bypass the physical input devices and jack directly into the brain. That is unless I'm already sitting in a chair (or pod) and jacked into this reality. Now were did I put my cool sunglasses :-) -- Doug > On Thu, 31 Jan 2008, Eugen Leitl wrote: > >> On Wed, Jan 30, 2008 at 01:05:13PM -0800, David Mathog wrote: >> >>> IB would be massive overkill for gaming, 100 (or even 10) baseT should >>> work just fine unless the network is hideously congested, in which case >>> the game is probably going to become unplayable due to dropped UDP >>> packets. >> >> Modern games are moving towards realtime large-scale physical >> simulations, >> and there's a natural mapping of the terrain to a 2d grid (torus) or 3d >> grid of nodes. (Unfortunately, e.g. Second Life seems to be written >> in .Net (Mono) and partitioned by virtual servers, so no MPI there). > > Yeah, and Second Life is pig-dog-slow over DSL, as well. I don't think > it puts much of a virtual world on your PC and relies on actually > rendering information sent from the servers. Not enough bandwidth or > server time in the world yet to make that work particularly well except > to e.g. University workstations with minimum 45 Mbps bottlenecks in > between... > > rgb > >> >> > > -- > Robert G. Brown Phone(cell): 1-919-280-8443 > Duke University Physics Dept, Box 90305 > Durham, N.C. 27708-0305 > Web: http://www.phy.duke.edu/~rgb > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > !DSPAM:47a1bff969671446633523! > -- Doug From rgb at phy.duke.edu Thu Jan 31 09:42:14 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: Cheap SDR IB In-Reply-To: <59768.192.168.1.1.1201797251.squirrel@mail.eadline.org> References: <20080131073108.GT10128@leitl.org> <59768.192.168.1.1.1201797251.squirrel@mail.eadline.org> Message-ID: On Thu, 31 Jan 2008, Douglas Eadline wrote: > > Look if you want to do on-line gaming right > you need to bypass the physical input devices > and jack directly into the brain. That is > unless I'm already sitting in a chair (or pod) > and jacked into this reality. Now were did I > put my cool sunglasses :-) Yeah, I keep waiting for the transducers, but they never quite appear. Still using fingers to type, still using eyes to see. 
I do so want a mobile neural interface backed by a small mountain of processors and a petabyte or so of RAID. Transparent overlay of normal vision, noise suppression capabilities of the ears, a few hundred movies, six month's worth of music, all the e-books in the existence (all with a "dream mode" where content is delivered at very low levels when I'm sleeping) AND the the reverse ability to record dreams and create things in my sleep. Literally. rgb > > -- > Doug > >> On Thu, 31 Jan 2008, Eugen Leitl wrote: >> >>> On Wed, Jan 30, 2008 at 01:05:13PM -0800, David Mathog wrote: >>> >>>> IB would be massive overkill for gaming, 100 (or even 10) baseT should >>>> work just fine unless the network is hideously congested, in which case >>>> the game is probably going to become unplayable due to dropped UDP >>>> packets. >>> >>> Modern games are moving towards realtime large-scale physical >>> simulations, >>> and there's a natural mapping of the terrain to a 2d grid (torus) or 3d >>> grid of nodes. (Unfortunately, e.g. Second Life seems to be written >>> in .Net (Mono) and partitioned by virtual servers, so no MPI there). >> >> Yeah, and Second Life is pig-dog-slow over DSL, as well. I don't think >> it puts much of a virtual world on your PC and relies on actually >> rendering information sent from the servers. Not enough bandwidth or >> server time in the world yet to make that work particularly well except >> to e.g. University workstations with minimum 45 Mbps bottlenecks in >> between... >> >> rgb >> >>> >>> >> >> -- >> Robert G. Brown Phone(cell): 1-919-280-8443 >> Duke University Physics Dept, Box 90305 >> Durham, N.C. 27708-0305 >> Web: http://www.phy.duke.edu/~rgb >> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php >> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> !DSPAM:47a1bff969671446633523! >> > > > -- > Doug > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From Shainer at mellanox.com Thu Jan 31 09:54:10 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <488039019.20080131084823@gmx.net> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A677@mtiexch01.mti.com> Donnerstag, 31. Januar 2008, meintest Du: SA> On Jan 30, 2008, at 6:20 PM, Gilad Shainer wrote: >> For BW, Lx provides ~1400MB/s, EX is ~1500MB/s and ConnectX is >> ~1900MB/s >> uni-directional on PCIe Gen2. >> Feel free to contact me directly for more info. >> Gilad. SA> My god, IB bandwidths always confuse me. :-) SA> I thought IB SDR was 10 GB/s signal rate and 8 Gb/s data rate. How do SA> you squeeze ~1400 MB/s out of 8 Gb/s? The 1400 MB/ are probably for DDR mode which is 20 GB/s signal rate and 16 GB/s data rate. SA> I see you offer Lx cards in PCIe 4x and 8x. Again, PCIe is encoded at SA> 10bit/8bit so the data rate is 8 Gb/s. So the above value is for your SA> 8x cards only, no? The thread is about your 4x cards, no? Values are for x8 cards. 
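The encoding arithmetic behind these figures can be worked out in a few lines; the sketch below is an illustration only, assuming 2.5 Gb/s per lane for IB SDR and PCIe 1.x, 5 Gb/s per lane for IB DDR, and 8b/10b encoding throughout.

/* Back-of-the-envelope link arithmetic (illustrative only; all the
   per-lane rates and the 8b/10b factor are the stated assumptions). */
#include <stdio.h>

static double payload_MBps(int lanes, double gbps_per_lane)
{
    double signal_gbps = lanes * gbps_per_lane;     /* raw signalling rate */
    double data_gbps   = signal_gbps * 8.0 / 10.0;  /* strip 8b/10b coding */
    return data_gbps * 1000.0 / 8.0;                /* Gb/s -> MB/s        */
}

int main(void)
{
    printf("IB 4x SDR  : %4.0f MB/s theoretical payload\n", payload_MBps(4, 2.5));
    printf("IB 4x DDR  : %4.0f MB/s theoretical payload\n", payload_MBps(4, 5.0));
    printf("PCIe 1.x x4: %4.0f MB/s theoretical payload\n", payload_MBps(4, 2.5));
    printf("PCIe 1.x x8: %4.0f MB/s theoretical payload\n", payload_MBps(8, 2.5));
    return 0;
}

It prints a ~1000 MB/s payload ceiling for 4x SDR and for a Gen1 x4 slot, and ~2000 MB/s for 4x DDR and a Gen1 x8 slot; IB and PCIe protocol overhead pull measured numbers below those ceilings, which is consistent with the ~700-750 MB/s (x4) and ~900-950 MB/s (x8) SDR figures and the ~1400-1500 MB/s DDR figures quoted in this thread.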
I tested the x4 cards a few months ago and what i measured was: - Latency was not affected by the slower PCIe connection - Bandwidth dropped to 700-750 MB/s where it is 900 to 950 with a x8 SDR card. Gilad: Lx is been used in PCIe x8 and x4, and in the PCIe x8 adapters - you can find IB SDR and IB DDR. The cards mentioned in the article are the IB SDR PCIe x4 with provide the same low latency, and the BW is limited by the PCIe to ~750MB/s. With the recent chipsets it is little bit higher. (Intel MPI Benchmarks, pingpong) It depends on your application if the lower bandwidth affects performance. Regards, Jan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080131/3d063f26/attachment.html From Shainer at mellanox.com Thu Jan 31 10:09:39 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com> > > I thought the thrust of the original post was that you can > build now a cheap IB cluster with up to 24 nodes. The > subsequent discussion was around questioning whether you need > IB for up to 16-24 nodes. > The advantage you point to is for 32 nodes. There is no > question that IB is much better at this scaling point for > many codes, not just WRF. > I used to think that the typical break point is about 16 > nodes. We have a lot of app data on our web site which > confirm this and will be looking into how (if at all) quad > cores change this. > With more cores on a single node, the IB benefits are seen in much lower number of nodes. I am testing some applications on a new cluster that I have (dual sockets quad core Barcelona), and my first results are with Fluent new benchmarks. I will have the numbers posted soon, so you all can take a look. For 2 nodes, IB shows an average of 15-20% higher performance then GigE, and this gap gets bigger with cluster size. At 4 nodes the difference was 40-50%. Even more important, 3 nodes results with IB were higher then 8 nodes with GigE, and GigE stop scaling after 3-4 nodes (performance numbers were flat after 3-4 nodes). This is only one example, I know, but I am sure there will be many more. I personally going to check it on more applications, and would appreciate any suggestion on other applications people have interest to check. > As far as the games are concerned never underestimate what > people can do if sufficient and affordable resources are > provided. If the opposite were true we'd be still staring at > black and white/green screens and playing packman. Game > developers have to offload a lot (too much in fact) to a > client out of necessity, not because they want to (BTW, there > are amazing exploits because of the lags and the way the > clients are built). Having said that there is no question > that IB is hardly applicable to MMORPGs on the scale they > are, probably not even for LAN parties. However a small > cluster for real time ray tracing might be a good proposition. > > Igor > From xclski at yahoo.com Wed Jan 30 19:01:17 2008 From: xclski at yahoo.com (Ellis Wilson) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB Message-ID: <498137.43833.qm@web37901.mail.mud.yahoo.com> David Mathog wrote: > Joe Landman wrote: > >> Gilad Shainer wrote: >> >> >>>> IB for gaming? I have one ratio: 1e-1/3e-6. that's human >>>> reaction time versus IB latency. >>>> >>>> >>> Oh yes... I guess you did not play for a long time. 
Did you? Talk >>> with someone who suffers from lag and you will get the story, even >>> when he has a great video card. It's the network and the CPU overhead >>> that are the cause of this issue >>> >> Er... ah ... yeah. Milliseconds is typical in FPS games. hundreds of >> ms are bad. Hundreds of microseconds aren't ... ok, depends upon your >> FPS, I am sure the military folks have *really* fun ones which require >> that sort of latency. >> > > Many FPS games are still keyboard driven, and the scan rate on the > keyboard is likely only on the order of 10Hz. Gaming mice scan position > a lot faster though, last I looked they were closing in on 10000 data > points per second. Even so, human reaction time is now, and probably > will be forever, at the .1 second level, so even if that gaming mouse > could record 1000 button presses a second, no gamer is ever going to be > able to push that button at anywhere near that rate. > > IB would be massive overkill for gaming, 100 (or even 10) baseT should > work just fine unless the network is hideously congested, in which case > the game is probably going to become unplayable due to dropped UDP packets. > > Regards, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > Hate to jump in on this one since it's rapidly approaching "dead horse" level, however I have to agree (though without any nice numeric evidence to back it up) that the vast majority of gamers will not benefit from this level of latency or bandwidth whatsoever. This is largely because the vast majority play while connected through their ISP, not during a "LAN party" or the like. Even on the few occasions of "LAN parties", where the advantages of IB would (in theory) be realized, many of these are simply for companionship and the advantages of natural communication, but still play on a server connected through the ISP. Thus, even to kill the person beside you, a packet would need to travel to the ISP, then the server, any number of intermediate hops, then back to you and the opponent. Obviously, the cost of these traversals greatly outweighs the cost of it coming in through your modem and being routed to your particular PC. The only interest of mine (because I am unaware of the differences in costs) is the benefit of running a NIC that has the lowest processing overhead. It could be very possible that the simpler, older NICs would out-perform the more complicated interconnects because your frames per second would be somewhat better, having more CPU resources oriented towards the game. Again, since I do not have numbers or knowledge on the specifics of various interconnects and their local NIC costs, this is simply speculation. Though I should chime in since my generation is typified as being addicted to computer games :), Ellis ____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping From mark.kosmowski at gmail.com Thu Jan 31 07:40:01 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Re: For CPMD users Message-ID: Sangamesh: I am by no means an expert with either clustering or CPMD, but am learning both. I am using OpenMPI, not MPICH, but can relate some things that I would look for. 1) First, have other CPMD parallel jobs worked correctly on the same nodes with the same executable?
2) Does the cpmd executable work for this input file on a single processor (i.e. not calling it as an mpich job)? >From 1 and 2 you can determine if you have an input file issue or a parellelization issue. 3) Does calling the "hostname" command using the same MPICH configuration return the expected result? My cluster is three dual Opteron machines - if they were named Node1, Node2 and Node3 and I ran hostname using two processors on each of the three machines I would expect to see: "Node1; Node1; Node2; Node2; Node3; Node3" where the semi-colons are actually line breaks. 4) Can all of the nodes freely talk to one another (i.e. if using ssh, can each node ssh correctly to every other node)? 5) Where does the cpmd output file terminate? If it can't find the pseudopotentials, you may not be properly passing PP_LIBRARY_PATH to the mpich call of cpmd. Good luck, Mark E. Kosmowski > > Hi CPMD users, > > With a CPMD parallel job, I'm getting a Segmentation Fault error. > > Let me explain what I did. > > Installed MPICH with Intel Compilers. Configure looks as follows: > > ./configure --prefix=/opt/MPI_LIBS/MPICH-Intel > -cc=/opt/intel/cce/10.1.008/bin/icc > -fc=/opt/intel/fce/10.1.008/bin/ifort --enable-f77 --with-device=ch_p4 > --with-arch=LINUX > > > When I run a CPMD job with 1-4 proceses, the job is getting killed and > gives following error: > > # mpirun -machinefile /export/M4 -np 4 ./cpmd.x > /opt/APPLICATIONS/CPMD/singlemol.input > single4.out > Killed by signal 2. > forrtl: error (69): process interrupted (SIGINT) > Killed by signal 2. > Killed by signal 2. > > If only one process is used and without redirection, the following error > occurred: > > p0_16857: p4_error: interrupt SIGSEGV: 11 > > Can anybody explain what might be the cause for this? > > regards, > Sangamesh > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://www.scyld.com/pipermail/beowulf/attachments/20080130/e93d4eec/attachment-0001.html > > ------------------------------ > From richard.walsh at comcast.net Thu Jan 31 10:48:31 2008 From: richard.walsh at comcast.net (richard.walsh@comcast.net) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB Message-ID: <013120081848.1710.47A217FE000E3CFA000006AE2200763704089C040E99D20B9D0E080C079D@comcast.net> Gilad Shainer wrote: > With more cores on a single node, the IB benefits are seen in much lower number > of nodes. I am testing some applications on a new cluster that I have (dual > sockets quad core Barcelona), and my first results are with Fluent new > benchmarks. I will have the numbers posted soon, so you all can take a look. For > 2 nodes, IB shows an average of 15-20% higher performance then GigE, and this > gap gets bigger with cluster size. At 4 nodes the difference was 40-50%. Even > more important, 3 nodes results with IB were higher then 8 nodes with GigE, and > GigE stop scaling after 3-4 nodes (performance numbers were flat after 3-4 > nodes). Interesting. I am guessing this is with ConnectX adapters and PCIe Gen2. When does the InfiniBand technology used in this test flatten? How about the less expensive SDR technology referred to in Jeff's article? Can you provide the curves up to 8 nodes for both IB and GE in this test? A quick analysis, limited to this Fluent test, suggests that 4 nodes plus ConnectX technology roughly equals 8 nodes plus on-board GE technology. Can you provide the system per node price differences? Then we can rougly determine the cost benefit relationship. 
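As a purely illustrative way to frame that cost/benefit question, the sketch below assumes every number it uses: the node price and per-node SDR cost follow the rough switchless figures quoted elsewhere in this thread, and the 4-IB-nodes-for-8-GigE-nodes equivalence is the hypothesis stated above, not a benchmark result.

/* Illustrative cost-per-rating comparison only -- every value here is an
   assumption, not a measurement: node_cost and ib_cost_per_node follow the
   rough switchless 2-node pricing quoted elsewhere in this thread, and the
   4-node-IB == 8-node-GigE equivalence is the hypothesis stated above. */
#include <stdio.h>

int main(void)
{
    double node_cost        = 8000.0; /* assumed price of one fat node      */
    double ib_cost_per_node = 150.0;  /* assumed SDR HCA + share of a cable */

    double gige_total = 8 * node_cost;                      /* 8 nodes, GigE */
    double ib_total   = 4 * (node_cost + ib_cost_per_node); /* 4 nodes, IB   */

    printf("GigE: $%.0f for the reference rating\n", gige_total);
    printf("IB  : $%.0f for the same rating (%.0f%% of the GigE spend)\n",
           ib_total, 100.0 * ib_total / gige_total);
    return 0;
}

A real comparison would also have to fold in switch ports beyond two nodes, plus the observation made elsewhere in the thread that GigE stops scaling after a handful of these fat nodes.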
Regards, rbw -- "Making predictions is hard, especially about the future." Niels Bohr -- Richard Walsh Thrashing River Consulting-- 5605 Alameda St. Shoreview, MN 55126 Phone #: 612-382-4620 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080131/5c0ee3d2/attachment.html From Shainer at mellanox.com Thu Jan 31 11:32:56 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <013120081848.1710.47A217FE000E3CFA000006AE2200763704089C040E99D20B9D0E080C079D@comcast.net> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A6AB@mtiexch01.mti.com> Richard Walsh wrote: >> With more cores on a single node, the IB benefits are seen in much lower number >> of nodes. I am testing some applications on a new cluster that I have (dual >> sockets quad core Barcelona), and my first results are with Fluent new >> benchmarks. I will have the numbers posted soon, so you all can take a look. For >> 2 nodes, IB shows an average of 15-20% higher performance then GigE, and this >> gap gets bigger with cluster size. At 4 nodes the difference was 40-50%. Even >> more important, 3 nodes results with IB were higher then 8 nodes with GigE, and >> GigE stop scaling after 3-4 nodes (performance numbers were flat after 3-4 >> nodes). > Interesting. I am guessing this is with ConnectX adapters and PCIe Gen2. When does > the InfiniBand technology used in this test flatten? How about the less expensive SDR > technology referred to in Jeff's article? Can you provide the curves up to 8 nodes for > both IB and GE in this test? > > A quick analysis, limited to this Fluent test, suggests that 4 nodes plus ConnectX > technology roughly equals 8 nodes plus on-board GE technology. Can you provide > the system per node price differences? Then we can rougly determine the cost > benefit relationship. In my testing I have used ConnectX and PCIe Gen1. I will have the numbers posted soon and will send a note to this list once done. The major take was that GigE did not scale beyond 3 servers, so adding more nodes did not provide more performance (Fluent rating stayed the same from 3 nodes with GigE and above). The IB shows almost linear scaling as we have added more nodes. The less expensive SDR technology should follow the same lines, as latency was more important in those benchmarks, so the numbers should be in the same range. For 2 nodes only the performance gap was 15-20%. If I am taking the safe approach and assuming 10-15% gap with the low cost SDR. The servers will cost you ~$8K, the IB part (for 2 nodes you don't need a switch) according to the pricing in Jeffs is <$300 (2 adapters and one cable) which is less than 4% added cost. Of course when you scale beyond 2 nodes, the cost/performance advantage with IB increases dramatically (as GigE performance is flat beyond 3 servers). From Shainer at mellanox.com Thu Jan 31 12:12:03 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <47A228F2.1070309@physics.isu.edu> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FE5A6C2@mtiexch01.mti.com> > With more cores on a single node, the IB benefits are seen in much lower number of nodes. I am testing some applications on a new cluster that I have (dual sockets quad core Barcelona), and my first results are with Fluent new benchmarks. I will have the numbers posted soon, so you all can take a look. 
For 2 nodes, IB shows an average of 15-20% higher performance then GigE, and this gap gets bigger with cluster size. A quick side question. Is it possible to use IB as a cross-over with no switch? If I had just 2 fat nodes could I connect the HCAs directly to each other and avoid the switch costs? Could this be extended to ring or hypercube topologies? Gilad: Yes, for 2 nodes you don't need a switch, just connect the HCAs directly. For more than 2 nodes, it is easier just to use a switch. Thanks, Brian Oborn From oborbria at athena.physics.isu.edu Thu Jan 31 12:00:50 2008 From: oborbria at athena.physics.isu.edu (Brian Oborn) Date: Wed Nov 25 01:06:49 2009 Subject: [Beowulf] Cheap SDR IB In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com> Message-ID: <47A228F2.1070309@physics.isu.edu> > With more cores on a single node, the IB benefits are seen in much lower number of nodes. I am testing some applications on a new cluster that I have (dual sockets quad core Barcelona), and my first results are with Fluent new benchmarks. I will have the numbers posted soon, so you all can take a look. For 2 nodes, IB shows an average of 15-20% higher performance then GigE, and this gap gets bigger with cluster size. A quick side question. Is it possible to use IB as a cross-over with no switch? If I had just 2 fat nodes could I connect the HCAs directly to each other and avoid the switch costs? Could this be extended to ring or hypercube topologies? Thanks, Brian Oborn From kekechen at cc.gatech.edu Wed Jan 30 23:01:19 2008 From: kekechen at cc.gatech.edu (Keke Chen) Date: Wed Nov 25 01:06:50 2009 Subject: [Beowulf] CFP: International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE08) Message-ID: <47A1723F.1080209@cc.gatech.edu> We apologize if you receive multiple copies ======== Call For Papers =================== The 7th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2008) Monterrey, Mexico, Nov 11 - 13, 2008 http://www.cs.rmit.edu.au/fedconf/ Scale of use, ease of use, breadth of use and choice of use have earmarked the most important transitions of semantic technologies in the years since the first ODBASE conference in 2002. Recent methods allow for scaling of semantic technologies to handling dozens of millions of triples; they allow for composing intriguing semantic applications within a few days; they address target applications from the sciences up to eCommerce; and they allow to chose among plenty of existing ontologies and half a dozen of RDF stores, inferencing engines, or ontology mapping systems. While these developments greatly contribute to the success of semantic technologies, for enterprise-wide and Web-scale applications, the envelope needs to be pushed much higher, faster, wider, and broader. The 2008 conference on Ontologies, DataBases, and Applications of Semantics (ODBASE'08) solicits original research papers that push the current boundaries. As in recent years, the focus of the conference lies in addressing research issues that bridge traditional boundaries between disciplines such as databases, artificial intelligence, semantic web, or data extraction. Also, ODBASE'08 encourages the submission of papers that examine the information needs of various applications, including electronic commerce, electronic government, bioinformatics, or emergency response. 
ODBASE'08 will consider two categories of papers: research and experience. Research papers must contain novel, unpublished results. Experience papers must describe existing, realistically large systems. In the latter case, preference will be given to papers that describe software products or systems that are in wide (experimental) use. ODBASE'08 intends to draw a highly diverse body of researchers and practitioners by being part of the Federated conferences Event "On the Move to Meaningful Internet Systems 2008" that co-locates five conferences: ODBASE'08, DOA'08 (International Symposium on Distributed Objects and Applications), CoopIS'08 (International Conference on Cooperative Information Systems), GADA'08 (International Conference on Grid computing, high-performAnce and Distributed Applications), and IS'08 (International Symposium on Information Security). TOPICS OF INTEREST Specific areas of interest to ODBASE'08 include but are not limited to: * Semantic data models and semantic querying * Semantic dataspaces * Ontology engineering * Semantic integration, including ontology matching, merging, etc. * Management of large ontology-driven data and knowledge bases * Semantic information retrieval * Emergent semantics * Social semantic systems * Semantic multimedia management * Metadata management * XML and Semantics * Hypertext, multimedia, and hypermedia semantics * Semantic middleware * Semantic SOA * Ontological support for location-aware services and mobile information systems * Searching and managing dynamic knowledge Applications, Evaluations, and Experiences in the following domains: * Web 2.0 * Personal Information Management * Media Archives and Digital Libraries * Enterprise-wide Information Systems * Web-based Information Systems * Web Services * eCommerce * eScience * eOrganizations (virtual organizations, virtual marketplaces, etc.) * Bioinformatics * Emergency Response * Ubiquitous and Mobile Information Systems IMPORTANT DATES Abstract Submission Deadline June 8, 2008 Paper Submission Deadline June 15, 2008 Acceptance Notification August 10, 2008 Camera Ready Due August 25, 2008 Registration Due August 25, 2008 OTM Conferences November 9 - 14, 2008 SUBMISSION GUIDELINES Papers submitted to ODBASE'08 must not have been accepted for publication elsewhere or be under review for another workshop or conference. All submitted papers will be carefully evaluated based on originality, significance, technical soundness, and clarity of expression. All papers will be refereed by at least three members of the program committee, and at least two will be experts from industry in the case of practice reports. All submissions must be in English. Submissions must not exceed 18 pages in the final camera-ready paper style. Submissions must be laid out according to the final camera-ready formatting instructions and must be submitted in PDF format. The paper submission site will be announced shortly Failure to comply with the formatting instructions for submitted papers will lead to the outright rejection of the paper without review. Failure to commit to presentation at the conference automatically excludes a paper from the proceedings. 
ORGANISATION COMMITTEE General Co-Chairs * Robert Meersman, VU Brussels, Belgium * Zahir Tari, RMIT University, Australia Program Committee Co-Chairs * Feng Ling, Tsinghua University, China * Fausto Giunchiglia, University of Trento, Italy * Malu Castellanos, HP, USA Program Committee Members (to be extended and confirmed) * Harith Alani, University of Southampton * Renato Barrera, UNAM * Sonia Bergamaschi, University of Modena and Reggio Emilia * Mohand Boughanem, Universit? Paul Sabatier of Toulouse * Edgar Chavez, Universidad de Michoacan * Oscar Corcho, Universidad Polit?cnica de Madrid * Farookh Hussain, Curtin University of Technology * Vipul Kashyap, Clinical Informatics R&D, Partners HealthCare System * Phokion Kolaitis, IBM * Manolis Koubarakis, National and Kapodistrian University of Athens * Maurizio Lenzerini, Universita di Roma "La Sapienza" * Juanzi Li Tsinghua University * Alexander L?ser, SAP Research * Riichiro Mizoguchi, Osaka University * Wenny Rahayu, La Trobe University * Rajugan, Rajagopalapillai, Curtin University of Technology * Arnon Rosenthal, The MITRE Corporation * Pavel Shvaiko, University of Trento * Umberto Straccia, ISTI-CNR * Eleni Stroulia, University of Alberta * Heiner Stuckenschmidt, University of Mannheim * York Sure, SAP * Yannis Velegrakis, University of Trento * Guido Vetere, IBM * Jose Luis Zechinelli, CENTIA * Yanchun Zhang Victoria University * Jingshan Huang, University of South Carolina * Octavian Udrea, University of Toronto * Li Ma, IBM * Wolfgang Nejdl, University of Hannover, Germany From kekechen at cc.gatech.edu Wed Jan 30 23:02:37 2008 From: kekechen at cc.gatech.edu (Keke Chen) Date: Wed Nov 25 01:06:50 2009 Subject: [Beowulf] CFP: International Conference on COOPERATIVE INFORMATION SYSTEMS (CoopIS 2008) Message-ID: <47A1728D.2040208@cc.gatech.edu> We apologize if you receive multiple copies ======== Call For Papers =================== 16th International Conference on COOPERATIVE INFORMATION SYSTEMS (CoopIS 2008) Monterrey, Mexico, Nov 12 - 14, 2008 http://www.cs.rmit.edu.au/fedconf Acceptance rate of CoopIS in recent years was approx. 20% Cooperative Information Systems are the cornerstone for moving the technical network infrastructure to a meaningful integrated information infrastructure. The CIS paradigm has traditionally encompassed distributed systems technologies such as middleware, business process management (BPM) and Web technologies. In recent years service oriented architectures have fundamentally altered the technological landscape of CIS systems. Service Oriented Computing (SOC) introduces the service abstraction (a remotely accessible software component) as the building block of both inter and intra organizational distributed applications and its supporting middleware. Cooperative Information Systems applications are heavily distributed and highly coordinated, often exhibiting inter-organizational interaction patterns and requiring distributed access and sharing of computing and information resources. Typically they fall under the categories of e-Business, e-Commerce, e-Government, e-Health, e-Science among others. The CoopIS conference series has established itself as a major international forum for exchanging ideas and results on scientific research for practitioners in fields such as computer supported cooperative work (CSCW), middleware, Internet data management, electronic commerce, human-computer interaction, workflow management, agent technologies, and software architectures, to name a few. 
In addition, the 2008 edition of CoopIs aims to highlight the impact of service oriented computing and the importance of sustainability of CIS as a necessary prerequisite for mission critical applications. As in previous years, CoopIS'08 will be part of a joint event with other conferences, in the context of the OTM ("On The Move") federated conferences, covering different aspects of distributed information systems. Topics that are addressed by CoopIS'08 are logically grouped in three broad areas, and include but are not limited to: * Business Process Management and Compliance o Business Process Integration and Management o Cooperation Aspects in Business Process Management o Distributed Workflow Management and Systems o Service orchestration and service compositions o Process choreographies o Business process compliance o Integrated supply chains o Concurrent engineering and distributed groupware o Business level policies o Governance, risk and compliance models and runtimes o Sustainability of processes * Advanced middleware and architectures and runtimes o Service oriented middleware o Web services standards and runtimes o Grid computing infrastructure o Enterprise Grids architectures and services o Web centric information and processing architectures o Semantic interoperability o Self-adapting and self-healing systems o Model driven middleware architectures o Multi-agent systems and architectures for CIS o Peer-to-peer technologies o Security and privacy in CIS o Quality of service in cooperative information systems o Mediation, matchmaking, and brokering architectures o Collaboration and negotiation protocols o Markets, auctions, exchanges, and coalitions * CIS Applications o Novel CIS applications for the large organizations: e-business, e-commerce, e-government o Advances in e-science and Grid computing applications o Medical and biological information systems o Industrial applications of CIS o Web 2.0 IMPORTANT DATES Abstract Submission Deadline June 8, 2008 Paper Submission Deadline June 15, 2008 Acceptance Notification August 10, 2008 Camera Ready Due August 25, 2008 Registration Due August 25, 2008 OTM Conferences November 9 - 14, 2008 SUBMISSION GUIDELINES Papers submitted to CoopIS'08 must not have been accepted for publication elsewhere or be under review for another workshop or conference. All submitted papers will be carefully evaluated based on originality, significance, technical soundness, and clarity of expression. All papers will be refereed by at least three members of the program committee, and at least two will be experts from industry in the case of practice reports. All submissions must be in English. Submissions must not exceed 18 pages in the final camera-ready paper style. Submissions must be laid out according to the final camera-ready formatting instructions and must be submitted in PDF format. The paper submission site will be announced later Failure to comply with the formatting instructions for submitted papers will lead to the outright rejection of the paper without review. Failure to commit to presentation at the conference automatically excludes a paper from the proceedings. 
ORGANISATION COMMITTEE General Co-Chairs * Robert Meersman, VU Brussels, Belgium * Zahir Tari, RMIT University, Australia Program Committee Co-Chairs * Johann Eder, University of Klagenfurt, Austria * Masaru Kitsuregawa, University of Tokyo, Japan * Ling Liu, Georgia Institute of Technology, USA Program Committee Members (to be extended and confirmed) * Ghaleb Abdulla, Lawrence Livermore National Laboratory, USA * Marco Aiello, University of Groningen, The Netherlands * Joonsoo Bae, Chonbuk National Universiry, South Korea * Alistair Barros, SAP, Research Centre Brisbane, Australia * Zohra Bellahsene, LIRMM- CNRS/Universit? Montpellier 2, France * Salima Benbernou, University Lyon 1, France * Djamal Benslimane, University of Lyon, France * M. Brian Blake, Georgetown University, Washington DC, USA * Klemens B?hm, University of Karlsruhe, Germany * Christoph Bussler, Cisco Systems, Inc, USA * Ying Cai, Iowa State University, USA * James Caverlee, Texas A&M University, USA * Keke Chen, Yahoo!, USA * Vincenzo D'Andrea, University of Trento, Italy * Xiaoyoung Du, Renmin University of China, PR China * Marlon Dumas, University of Tartu, Estonia * Schahram Dustdar, Vienna University of Technology, Austria * Rik Eshuis, Eindhoven University, The Netherlands * Opher Etzion, IBM Israel Software Lab * Renato Fileto, Federal University of Santa Catarina, Brazil * Klaus Fischer, DFKI, Germany * Avigdor Gal, Technion Israel Institute of Technology, Israel * Bugra Gedik, IBM TJ Watson, USA * Dimitrios Georgakopoulos, Telcordia, USA * Paul Grefen, Eindhoven University of Technology, The Netherlands * Amarnath Gupta, University of California San Diego, USA * Mohand-Said Hacid, Lyon University, France * Thorsten Hampel, University of Paderborn, Germany * Richard Hull, Lucent Technologies, USA * Patrick Hung, University of Ontario Institute of Technology (UOIT), Canada * Paul Johannesson, Royal Institute of Technology (KTH), Sweden * Dimka Karastoyanova, University of Stuttgart, Germany * Rania Khalaf, IBM Research * Hiroyuki Kitagawa, University of Tsukuba * Shim Kyusock, Seoul National Univ. * Akhil Kumar, Penn State University, USA * Wang-Chien Lee, Pennsylvania State University, USA * Frank Leymann, University of Stuttgart, Germany * Chen Li, University of California, Irvine, USA * Sanjay K. Madria, Missouri University of Science and Technology, USA * Leo Mark, Georgia Institute of Technology * Maristella Matera, DEI - Politecnico di Milano, Italy * Massimo Mecella, Universita' di Roma, Italy * Nirmal Mukhi, IBM T J Watson Research Center * Mohamed Mokbel, University of Minnessota, USA * Miyuki Nakano, University of Tokyo, Japan * Werner Nutt, Free University of Bozen-Bolzano, Italy * Andreas Oberweis, University of Karlsruhe, Germany * Cesare Pautasso, University of Lugano, Switzerland * Barbara Pernici, Politecnico di Milano, Italy * Frank Puhlmann, Hasso Plattner Institut, Germany * Manfred Reichert, Ulm University, Germany * Stefanie Rinderle-Ma, Ulm University, Germany * Lakshmish Ramaswamy, University of Georgia, USA * Duncan Ruiz, Catholic University of RS, Brazil * Kai-Uwe Sattler, TU Ilmenau, Germany * Jialie Shen, Singapore Management University, Singapore * Aameek Singh, IBM Almaden Research Center * Mudhakar Srivatsa, IBM TJ Watson Research Center, USA * Jianwen Su, University of California, Santa Barbara, USA * Wei Tang, Teradata Corp. 
USA * Anthony Tung, National University of Singapore, Singapore * Susan Urban, Texas Tech University, USA * Willem-Jan Van den Heuvel, Tilburg University, The Netherlands * Maria Esther Vidal, Universidad Simon Bolivar, Caracas Venezuela * Shan Wang, Renmin University of China, PR China * X. Sean Wang, University of Vermont, USA * Jeffrey Yu, Chinese University of Hong Kong, HK * Matthias Weske, University of Potsdam, Germany * Li Xiong, Emory University, USA * Jian Yang, Macquarie University, Australia * Leon Zhao, University of Arizona, USA * Xiaofang Zhou, University of Queensland, Australia * Aoying Zhou, East China Normal University, PR China * Michael zur Muehlen, Stevens Institute of Technology, USA From kekechen at cc.gatech.edu Wed Jan 30 23:04:29 2008 From: kekechen at cc.gatech.edu (Keke Chen) Date: Wed Nov 25 01:06:50 2009 Subject: [Beowulf] CFP: International Conference on Grid Computing, High-performance and Distributed Applications (GADA08) Message-ID: <47A172FD.4060103@cc.gatech.edu> We apologize if you receive multiple copies ======== Call For Papers =================== International Conference on Grid computing, high-performAnce and Distributed Applications (GADA'08) Monterrey, Mexico, Nov 13 - 14, 2008 http://www.cs.rmit.edu.au/fedconf In the last decade, grid computing has developed into one of the most important topics in the computing field. The research area of grid computing has been making particularly rapid progress in the last few years, due to the increasing number of scientific applications that are demanding intensive use of computational resources and a dynamic and heterogeneous infrastructure. Within this framework, the GADA workshop arose in 2004 as a forum for researchers in grid computing whose aim was to extend their background in this area, and more specifically, for those who used grid environments in managing and analyzing data. Both GADA'04 and GADA'05 were constituted as successful events, due to the large number of high-quality papers received, as well as the brainstorming of experiences and ideas interchanged in the associated forums. Because of this demonstrated success, GADA was upgraded as a Conference within On The Move Federated Conferences and Workshops (OTM'06). GADA'06 covered a broader set of disciplines, although grid computing kept a key role in the set of main topics of the conference. The objective of grid computing is the integration of heterogeneous computing systems and data resources with the aim of providing a global computing space. The achievement of this goal is creating revolutionary changes in the field of computation, because it enables resource sharing across networks, with data being one of the most important resources. Thus, data access, management and analysis within grid and distributed environments are also dealt as main part of the conference. Therefore, the main goal of GADA'08 is to provide a framework in which a community of researchers, developers and users can exchange ideas and works related to grid, high-performance and distributed applications and systems. The second goal of GADA'08 is to create interaction between grid computing researchers and the other OTM attendees. 
GADA'08 intends to draw a highly diverse body of researchers and practitioners by being part of the "On the Move to Meaningful Internet Systems and Ubiquitous Computing 2008" federated conferences event that includes five co-located conferences: * GADA'08 (International Conference on Grid computing, high-performAnce and Distributed Applications) * CoopIS'08 (International Conference on Cooperative Information Systems) * DOA'08 (International Symposium on Distributed Objects and Applications) * ODBASE'08 (International Conference on Ontologies, DataBases, and applications of Semantics) * IS'08 (Information Security Symposium) TOPICS OF INTEREST Topics of interest include, but are not limited to: * Computational grids * Data grids * High-performance computing * Distributed applications * Cluster computing * Parallel applications * Grid infrastructures for data analysis * High-performance computing for data-intensive applications * Grid computing infrastructures, middleware and tools * Mobile Grid Computing * Grid computing services * Collaboration technologies * Data analysis and management on grids * Distributed and parallel I/O systems * Extracting knowledge from data grids * Agent architectures for grid and distributed environments * Agent-based data extraction in distributed systems * Semantic Grid * Security in distributed environments * Security in computational and data grids * Grid standards as related to applications IMPORTANT DATES Abstract Submission Deadline June 8, 2008 Paper Submission Deadline June 15, 2008 Acceptance Notification August 10, 2008 Camera Ready Due August 25, 2008 Registration Due August 25, 2008 OTM Conferences November 9 - 14, 2008 SUBMISSION GUIDELINES Papers submitted to GADA'08 must not have been accepted for publication elsewhere or be under review for another workshop or conference. All submitted papers will be carefully evaluated based on originality, significance, technical soundness, and clarity of expression. All submissions must be in English. Submissions should be in PDF format and must not exceed 18 pages in the final camera-ready format. The paper submission site will be announced shortly Failure to commit to presentation at the conference automatically excludes a paper from the proceedings. GADA PC co-chairs * Dennis Gannon Computer Science Department Indiana University Lindley Hall, Room 215 150 S. Woodlawn Ave. Bloomington, IN 47405-7104 Phone: (812) 855-5184 Fax: (812) 855-4829 Email: gannon@cs.indiana.edu * Pilar Herrero Facultad de Inform?tica Universidad Polit?cnica de Madrid Madrid (Spain) Phone: (+34) 91.336.74.56 Fax: (+34) 91.336.65.95E Email: pherrero@fi.upm.es * Daniel S. Katz Louisiana State University Louisiana (USA) Phone: (+1) 225.578.2750 Fax: (+1) 225.578.5362 Email: d.katz@ieee.org * Mar?a S. P?rez Facultad de Inform?tica Universidad Polit?cnica de Madrid Madrid (Spain) Phone: (+34) 91.336.73.80 Fax: (+34) 91.336.73.73 Email: mperez@fi.upm.es Program Committee (to be confirmed and extended) * Jemal Abawajy, Deakin University, Victoria, Australia * Akshai Aggarwal, University of Windsor, Canada * Sattar B. Sadkhan Almaliky, Iraq - Alnahrain University, Iraq * Artur Andrzejak, Zuse Institute Berlin (ZIB), Germany * Amy Apon, University of Arkansas, USA * Oscar Ardaiz, Universidad de Navarra, Spain * Costin Badica, University of Craiova, Romania * Rosa M. Badia, UPC, Barcelona, Spain * Mark Baker, University of Reading, UK * Angelos Bilas, Univ. of Crete and FORTH, Greece * Jose L. Bosque, Universidad de Cantabria, Spain * Juan A. 
* Pascal Bouvry, Université du Luxembourg, Luxembourg
* Rajkumar Buyya, University of Melbourne, Melbourne, Australia
* Santi Caballé Llobet, Open University of Catalonia, Spain
* Mario Cannataro, Univ. of Catanzaro, Italy
* Jesús Carretero, Universidad Carlos III, Spain
* Charlie Catlett, Argonne National Laboratory, USA
* Pablo Chacin, Universitat Politècnica de Catalunya, Spain
* Isaac Chao, Universitat Politècnica de Catalunya, Spain
* Jinjun Chen, Swinburne University of Technology, Australia
* Félix J. García Clemente, Universidad de Murcia, Spain
* Carmela Comito, University of Calabria, Italy
* Toni Cortes, UPC, Barcelona, Spain
* Geoff Coulson, Lancaster University, UK
* Jose Cunha, Universidade Nova de Lisboa, Portugal
* Ewa Deelman, USC Information Sciences Institute, USA
* Marios Dikaiakos, University of Cyprus, Cyprus
* Beniamino Di Martino, Department of Information Engineering, Seconda Università di Napoli, Italy
* Jack Dongarra, University of Tennessee, Knoxville
* Markus Endler, PUC-Rio, Brazil
* Maria Ganzha, Elblag University of Humanities and Economy, Poland
* Felix García, Universidad Carlos III, Spain
* Alastair Hampshire, University of Nottingham, UK
* Neil P. Chue Hong, The University of Edinburgh, UK
* Eduardo Huedo, Universidad Complutense de Madrid, Spain
* Jan Humble, University of Nottingham, UK
* Liviu Joita, Cardiff University, UK
* Kostas Karasavvas, National e-Science Centre, UK
* Chung-Ta King, National Tsing Hua University, Taiwan
* Kamil Kuliberda, Polish-Japanese Institute of Information Technology, Poland
* Laurent Lefevre, INRIA, France
* Ignacio M. Llorente, UCM-CAB, Madrid, Spain
* Francisco Luna, University of Malaga, Spain
* Edgar Magana, Universitat Politècnica de Catalunya, Spain
* Gregorio Martinez, Universidad de Murcia, Spain
* Ruben S. Montero, UCM-CAB, Madrid, Spain
* Reagan Moore, San Diego Supercomputer Center (SDSC), USA
* Mirela Notare, Barddal University, Brazil
* Hong Ong, Oak Ridge National Laboratory, USA
* Mohamed Ould-Khaoua, University of Glasgow, UK
* Marcin Paprzycki, Warsaw School of Social Psychology, Poland
* Manish Parashar, Rutgers University, NJ
* Jose M. Peña, UPM, Spain
* Dana Petcu, Western University of Timisoara, Romania
* Beth A. Plale, Indiana University, USA
* José Luis Vázquez Poletti, Universidad Complutense de Madrid, Spain
* María Eugenia de Pool, Universidad Nacional Experimental de Guayana, Venezuela
* Bhanu Prasad, Florida A&M University, USA
* Thierry Priol, IRISA-INRIA, France
* Víctor Robles, UPM, Spain
* Rizos Sakellariou, Univ. of Manchester, UK
* Manuel Salvadores, Imbert Management Consulting Group, Spain
* Alberto Sanchez, UPM, Spain
* Hamid Sarbazi-Azad, Sharif University of Technology, Iran
* Franciszek Seredynski, Polish Academy of Science, Poland
* Francisco José da Silva e Silva, Universidade Federal do Maranhão, Brazil
* Antonio F. Gómez Skarmeta, Universidad de Murcia, Spain
* Enrique Soler, Universidad de Málaga, Spain
* Heinz Stockinger, Swiss Institute of Bioinformatics, Lausanne, Switzerland
* Alan Sussman, University of Maryland, College Park, USA
* Elghazali Talbi, University of Lille, France
* Jordi Torres, Barcelona SuperComputing Center (BSC-CNS), Spain
* Cho-Li Wang, Hong Kong University, China
* Adam Wierzbicki, Polish-Japanese Institute of Information Technology, Poland
* Laurence T. Yang, St. Francis Xavier University, Canada

From kekechen at cc.gatech.edu  Wed Jan 30 23:49:15 2008
From: kekechen at cc.gatech.edu (Keke Chen)
Date: Wed Nov 25 01:06:50 2009
Subject: [Beowulf] CALL FOR WORKSHOP PROPOSALS: OnTheMove OTM Federated Conferences and Workshops 2008
Message-ID: <47A17D7B.5050701@cc.gatech.edu>

***************************************************************
                 CALL FOR WORKSHOP PROPOSALS

  OnTheMove OTM Federated Conferences and Workshops 2008 (OTM'08)
                  9-14th November, 2008
                    Monterrey, Mexico
            http://www.cs.rmit.edu.au/fedconf

       Proceedings will be published by Springer Verlag
***************************************************************

The 14 workshops of OTM'06 and the 11 workshops of OTM'07 were a real
success: hundreds of researchers converged to present and discuss ideas
in several domains relevant to the themes of distributed, meaningful and
ubiquitous computing and information systems.

Proposals for new workshops are presently solicited for affiliation with
OTM 2008. The main goal of the OTM 2008 workshops is to stimulate and
facilitate an active exchange, interaction and comparison of new
approaches and methods.

OTM'08 brings together a highly diverse body of researchers and
practitioners by federating five successful, related and complementary
conferences:

* GADA'08 (International Conference on Grid computing, high-performAnce
  and Distributed Applications)
* CoopIS'08 (International Conference on Cooperative Information Systems)
* DOA'08 (International Symposium on Distributed Objects and Applications)
* ODBASE'08 (International Conference on Ontologies, DataBases, and
  applications of Semantics)
* IS'08 (Information Security Symposium)

OTM'08 especially encourages proposals that are related to the OnTheMove
themes. The format of each workshop is to be determined by the
organisers. Please consult the formats of the workshops held in previous
editions for examples of successful workshops:

* http://www.cs.rmit.edu.au/fedconf/2007/index.html?page=persys2007cfp
* http://www.cs.rmit.edu.au/fedconf/2006/index.html?page=is2006cfp
* http://www.cs.rmit.edu.au/fedconf/2005/cams2005cfp.html

SUBMISSION:
-----------

Researchers and practitioners are invited to submit workshop proposals
to the OTM 2008 Workshop Chair, Pilar Herrero (pherrero@fi.upm.es), no
later than

        *** Monday, February 18, 2008 ***

Submissions should be made by e-mail (ASCII/PS/PDF/DOC formats are
accepted) using "OTM Workshop Proposal Submission" as the email subject.
Prospective organizers are also encouraged to discuss their plans with
the Workshops Chair prior to submitting proposals.

PROPOSAL CONTENTS:
------------------

To make it easier to evaluate your proposal, please include the
following information:

* A brief technical description of the workshop, specifying the workshop
  goals and the technical issues that will be its focus.
* A brief discussion of why and to whom the workshop is of interest.
* A list of related workshops held within the last two years, if any,
  and their relation to the proposed workshop.
* If applicable, detailed information about previous editions of the
  same workshop (e.g., number of submissions, number of attendees).
* A preliminary call for participation/papers.
* The names, postal addresses, phone numbers, and email addresses of the
  proposed workshop organizing committee.
* The name and email address of the primary contact for the organizing
  committee.
* A description of the qualifications of the individual committee
  members with respect to organizing scientific events, including a list
  of workshops previously arranged by any members of the proposed
  organizing committee, if any.
* The advertising procedure: how do the committee members plan to
  advertise their workshop?
* A brief description of how the proposed workshop would complement the
  scopes of the five main conferences.

SELECTION CRITERIA:
-------------------

The selection of the workshops to be included in the final OTM 2008
program will be based upon a number of factors, including: the
scientific/technical interest of the topics, the quality of the
proposal, the need to avoid strictly overlapping workshops, and the
unavoidable need to limit the overall number of selected workshops.

ORGANIZERS' RESPONSIBILITIES:
-----------------------------

Workshop organizers will be responsible for the following:

* Creating a web site for the workshop, hosted under the main web site
  http://www.cs.rmit.edu.au/fedconf (all the conferences and workshops
  MUST be located under this main web site).
* Advertising the workshop and issuing a call for participation/papers.
* Collecting submissions, notifying acceptances in due time, and
  ensuring a transparent and fair selection process. All workshop
  organizers commit themselves to adopt the same deadlines for
  submissions and notifications of acceptance.
* Ensuring that the workshop organizers and participants register for
  the workshop and are invited to register for the main conference.
* At least one PC co-chair per OTM workshop MUST be registered and
  present for the whole event.

OTM'08 reserves the right to cancel any workshop if the above
responsibilities are not fulfilled or if too few attendees register for
the workshop.

WORKSHOP PROCEEDINGS:
---------------------

Papers accepted by the workshops are likely to be published as a joint
volume of Lecture Notes in Computer Science (LNCS) by Springer.

We look forward to your support in making the OTM'08 workshops the most
exciting yet.

***********************************************************************
Send proposals (in ASCII/PS/PDF/DOC format) and inquiries via email to:

        Pilar Herrero (pherrero@fi.upm.es)

using "OTM Workshop Proposal Submission" as the email subject.

        ******** Deadline: Monday, February 18, 2008 ************
***********************************************************************