From gdjacobs at gmail.com Thu Feb 1 02:49:27 2007 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Wed Nov 25 01:05:39 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <60632.192.168.1.1.1170268112.squirrel@mail.eadline.org> References: <45BE8E7E.4010808@brookes.ac.uk> <45C0791C.5080904@brookes.ac.uk> <60632.192.168.1.1.1170268112.squirrel@mail.eadline.org> Message-ID: <45C1C5B7.9080608@gmail.com> Douglas Eadline wrote: > If you want to do a little development and impress your friends, > try playing with pgapack (Parallel Genetic Algorithm Library) > > http://www-fp.mcs.anl.gov/CCST/research/reports_pre1998/comp_bio/stalk/pgapack.html > > You can develop a GA on single computer then run it on > a cluster. > > -- > Doug I see this and think "stock market" or "sports betting". -- Geoffrey D. Jacobs From gerry.creager at tamu.edu Thu Feb 1 03:43:32 2007 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:05:39 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070131212014.0304b408@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <6.2.3.4.2.20070131212014.0304b408@mail.jpl.nasa.gov> Message-ID: <45C1D264.4000209@tamu.edu> Jim Lux wrote: > At 02:03 PM 1/31/2007, Robert G. Brown wrote: >> On Wed, 31 Jan 2007, Mitchell Wisidagamage wrote: >> >>> Thank you very much for the fire dynamics idea. I will have a look at >>> it. >>> >>> I did try to contact many e-science projects including some >>> researchers at Oxford. But I got no reply. Then I went to get some >>> contacts from a tutor who worked at a e-science project himself. He >>> told me people, especially scientists are "very jealous" of their >>> data. And not replying is a kind way of saying "no". And there's the >>> problem of "who's this guy wanting my data", "what will he do with it?". >>> >>> I have given up the e-science idea. Now looking for other real world >>> applications. >> >> Remember, NASA puts all (or at least a lot) of its e.g. weather data >> online. > > Well.. not exactly NASA.. operational "weather" data is the province of > NOAA. NASA does research, not operational, data, so there's typically a > time lag, especially for processed and calibrated data. > > By and large, most environmental data collected by NASA winds up in > DAACs (Distributed Active Archiving Centers). Physical Oceanography > data, for instance, winds up at PO-DAAC... > http://www-podaac.jpl.nasa.gov/ which has data for sea surface > temperature, sea surface topography, and ocean vector winds acquired by > NASA instruments. This whole process is very well documented, and the > data moves through the various levels of processing and into the > archives in a regular and stately fashion. > > But, for instance, the live data from a single instrument (e.g. QuikSCAT > for ocean winds, on which I worked) also gets fed to a realtime process > at NOAA within about an hour after it's received on the ground every 100 > minutes, and thence to folks like NCAR who run numerical models, which > then winds up at the NWS and makes the weather predictions more accurate > on the evening news. This is a bit harder to find in a reliable online > source, especially if you want things gridded into standard geographic > grids, etc. 
It's all out there, but since the funding stream for > distribution is more tenuous (NOAA doesn't have as much money as NASA > for this sort of thing, but they do have "real time" requirements), the > data tends to be a bit more "raw" or idiosyncratic, and not necessarily > in HDF files, etc. It tends to be in whatever format is convenient for > them, which may or may not be convenient for you. For research purposes, the National Centers for Environmental Prediction (ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/) makes available all their model runs on a 6-hourly schedule. These data are available for ~3 days, then expire off the servers here. Historical data subsets are available via the National Climatic Data Center NOMADS portal (http://nomads.ncdc.noaa.gov/) which was designed to facilitate access to the datasets. The National Centers for Atmospheric Research (http://www.ncar.ucar.edu/) allows access to some limited historic data in their archives without restriction and facilitates scientific research with accounts to scientists. >> And there are many things one can do with it. Look for the >> NOAA sites. You can get sunspot data, proxy temperature data, and much >> more, and build your very own climate model. If you do, don't be >> surprised if it fails to agree with the current one (due to be >> re-released today, IIRC, from the IPCC). > > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From 06002352 at brookes.ac.uk Thu Feb 1 03:56:19 2007 From: 06002352 at brookes.ac.uk (Mitchell Wisidagamage) Date: Wed Nov 25 01:05:39 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070131211133.03400140@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <6.2.3.4.2.20070131211133.03400140@mail.jpl.nasa.gov> Message-ID: <45C1D563.5060201@brookes.ac.uk> > > Optimum path routing of ships and/or airplanes, taking into account the > winds, currents, sea state, temperatures, etc. > > Large realtime and climatological databases are available. > The path optimization algorithms are simple and fairly well known (A and > A-star are two to start with). The challenge is in suitable heuristics > to prune the search space. > > You can optimize for minimum time in transit, or minimum fuel cost, or > minimum probability of delay, etc. Very nice example! Thank you I requested some JPL CDs of images when I was a teenager. I was very impressed at the time. 
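The A-star approach Jim mentions above is easy to prototype. What follows is a minimal sketch in Python on a toy 5x5 grid with a straight-line-distance heuristic; the grid, the unit step cost and all the names are invented purely for illustration and come from no routing package. A real ship or aircraft router would fold winds, currents, sea state and fuel burn into the edge costs and the heuristic, and that is where the pruning heuristics Jim mentions earn their keep.

import heapq
import math

# Toy 2-D grid: 0 = passable cell, 1 = blocked (think land for a ship router).
# Everything here is made up for illustration only.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
            yield (nr, nc)

def heuristic(a, b):
    # Straight-line distance; it never overestimates the true cost on this
    # grid, which is what lets A-star return an optimal route.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def astar(start, goal):
    # Priority queue ordered by f = g (cost so far) + h (heuristic to goal).
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        for nxt in neighbors(cell):
            ng = g + 1.0  # unit step cost on the toy grid
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier,
                               (ng + heuristic(nxt, goal), ng, nxt, path + [nxt]))
    return None  # no route exists

if __name__ == "__main__":
    print(astar((0, 0), (4, 4)))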
:o) From 06002352 at brookes.ac.uk Thu Feb 1 04:04:59 2007 From: 06002352 at brookes.ac.uk (Mitchell Wisidagamage) Date: Wed Nov 25 01:05:39 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070131213744.03074e00@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <36397.192.168.1.1.1170189086.squirrel@mail.eadline.org> <45BFE17F.5080901@brookes.ac.uk> <45C0E921.7090600@tempemusic.com> <45C14798.9040304@brookes.ac.uk> <6.2.3.4.2.20070131213744.03074e00@mail.jpl.nasa.gov> Message-ID: <45C1D76B.5070204@brookes.ac.uk> >> >> Now I'm not sure what to do with these data sets. I should program my >> own application. But how should I be processing them?...without the >> algorithms for processing I'm lost. :o) > > > http://www.ocean-systems.com/VOSS.htm > www.weather.navy.mil/paoweb/starsams.ppt > > http://realdistance.com/ > > I'll need some time to digest these models. Hope it's not to complicated. Thank you every for taking time to help me. or Cheers as everyone here call it. Wow lots of people with scientific backgrounds on here. I thought this was geeky mailing list with programmers trying to solve cluster problems. From 06002352 at brookes.ac.uk Thu Feb 1 04:25:38 2007 From: 06002352 at brookes.ac.uk (Mitchell Wisidagamage) Date: Wed Nov 25 01:05:39 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <45C1542A.4030701@tamu.edu> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> Message-ID: <45C1DC42.90604@brookes.ac.uk> > > Please don't fall into the trap of thinking "e-Science" requires a tie > to the Globus Toolkit to be valid. > I do not think this (anymore). I queried Matthew Haynos from IBM who's an expert in this area some time ago as I'm new to grid computing. The silly questions are from me :o) Answers are his. Because at the moment distributed computing is only popular in the academic research and highly specialized part of the industry...atleast that's what I think. Any professional and personal comments from your expereince? Not true. Distributed computing is more and more mainstream. I think too that you are looking at distributed computing perhaps too narowly. Even if you are referring to supercomputing, witness that more and more of the Top 500 supercomputing sites are increasingly commerical (as opposed to academic or public institutions). Anyhow I just read it again and you stated that "Grid computing becoming more of a defacto standard for distributed computing in enterprises". May I ask why do you think that? I would say b/c of the growing ubiquity of scale-out computing (lots of machines, lots of resources, etc.) What's happening here is that scheduling, etc. is going from the machine into the network. People no longer know where things are going to run with hundreds / thousands of blade processors. This is a sea change. People use to say run this piece of work on this machine, now it's just run this work, I have no idea where. I've written an article series for IBM's grid site on developerWorks: Check out: http://www-128.ibm.com/developerworks/search/searchResults.jsp?searchType=1&pageLang=&displaySearchScope=dW&searchSite=dW&lastUserQuery1=perspectives+on+grid&lastUserQuery2=&lastUserQuery3=&lastUserQuery4=&query=perspectives+on+grid+haynos&searchScope=dW particularly the "Next-generation distributed computing" article for a primer. 
I think you'll find the five or so articles in the series interesting. From gerry.creager at tamu.edu Thu Feb 1 04:52:16 2007 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <45C1DC42.90604@brookes.ac.uk> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: <45C1E280.9000502@tamu.edu> Mitchell Wisidagamage wrote: >> >> Please don't fall into the trap of thinking "e-Science" requires a tie >> to the Globus Toolkit to be valid. >> > I do not think this (anymore). I queried Matthew Haynos from IBM who's > an expert in this area some time ago as I'm new to grid computing. The > silly questions are from me :o) Answers are his. > > Because at the moment distributed computing is only popular in the > academic research and highly specialized part of the industry...atleast > that's what I think. Any professional and personal comments from your > expereince? > > Not true. Distributed computing is more and more mainstream. I think > too that you are looking at distributed computing perhaps too narowly. > Even if you are referring to supercomputing, witness that more and more > of the Top 500 supercomputing sites are increasingly commerical (as > opposed to academic or public institutions). > > Anyhow I just read it again and you stated that "Grid computing becoming > more of a defacto standard for distributed computing in enterprises". > > May I ask why do you think that? > I would say b/c of the growing ubiquity of scale-out computing (lots of > machines, lots of resources, etc.) What's happening here is that > scheduling, etc. is going from the machine into the network. People no > longer know where things are going to run with hundreds / thousands of > blade processors. This is a sea change. People use to say run this > piece of work on this machine, now it's just run this work, I have no > idea where. I've written an article series for IBM's grid site on > developerWorks: > > Check out: > http://www-128.ibm.com/developerworks/search/searchResults.jsp?searchType=1&pageLang=&displaySearchScope=dW&searchSite=dW&lastUserQuery1=perspectives+on+grid&lastUserQuery2=&lastUserQuery3=&lastUserQuery4=&query=perspectives+on+grid+haynos&searchScope=dW > > > particularly the "Next-generation distributed computing" article for a > primer. I think you'll find the five or so articles in the series > interesting. I've read the article series and it is interesting. And, I'm not completely given over to anti-grid sentiment. The problem remains, however, to be embodied by a colleague, recounting his experience in running an ocean circulation model: "We only had a 13% slowdown running this as a grid application when compared to our local cluster." Now, there are several things to consider that go unsaid here. One is the degree of coupling in the code. Another is the size of the datasets that have to be moved to the various sites to facilitate operations. some codes will perform well when distributed broadly, while others will die a horrid death waiting for pieces of the result to come back from that P3 installation in Outer Geekdom. Some will suffer simply from communications latency. Others will just continue to chug along. 
By way of illustration, we benchmarked my MM5 semi-production run of 72 forecast hours for 3 domains of increasing resolution across the United States. To complete in the same timeframe as a locally submitted job, we found a requirement to double the number of processors when it was distributed out to the "grid". This is an extreme example, of course, and not one I propose to repeat anytime soon... It's much easier to run MM5 and WRF locally and not have to worry quite so much about resource reservation and odd processors failing mid-run. -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From deadline at clustermonkey.net Thu Feb 1 05:28:25 2007 From: deadline at clustermonkey.net (Douglas Eadline) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <45C1C5B7.9080608@gmail.com> References: <45BE8E7E.4010808@brookes.ac.uk> <45C0791C.5080904@brookes.ac.uk> <60632.192.168.1.1.1170268112.squirrel@mail.eadline.org> <45C1C5B7.9080608@gmail.com> Message-ID: <36197.192.168.1.1.1170336505.squirrel@mail.eadline.org> > Douglas Eadline wrote: >> If you want to do a little development and impress your friends, >> try playing with pgapack (Parallel Genetic Algorithm Library) >> >> http://www-fp.mcs.anl.gov/CCST/research/reports_pre1998/comp_bio/stalk/pgapack.html >> >> You can develop a GA on single computer then run it on >> a cluster. >> >> -- >> Doug > > I see this and think "stock market" or "sports betting". Good "luck" with that. In any case, GA's and cluster design are not that foreign http://aggregate.org/FNN/ For those interested in other engineering and scientific uses take a look at: http://www.talkorigins.org/faqs/genalg/genalg.html -- Doug > > -- > Geoffrey D. Jacobs > > > !DSPAM:45c1c5f1232511543480883! > -- Doug From peter.st.john at gmail.com Thu Feb 1 06:08:57 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <36197.192.168.1.1.1170336505.squirrel@mail.eadline.org> References: <45BE8E7E.4010808@brookes.ac.uk> <45C0791C.5080904@brookes.ac.uk> <60632.192.168.1.1.1170268112.squirrel@mail.eadline.org> <45C1C5B7.9080608@gmail.com> <36197.192.168.1.1.1170336505.squirrel@mail.eadline.org> Message-ID: Since I had wanted a cluster to run my GA, I thought about using the GA to configure the cluster. So that's a great link for me! Peter On 2/1/07, Douglas Eadline wrote: > > > Douglas Eadline wrote: > >> If you want to do a little development and impress your friends, > >> try playing with pgapack (Parallel Genetic Algorithm Library) > >> > >> > http://www-fp.mcs.anl.gov/CCST/research/reports_pre1998/comp_bio/stalk/pgapack.html > >> > >> You can develop a GA on single computer then run it on > >> a cluster. > >> > >> -- > >> Doug > > > > I see this and think "stock market" or "sports betting". > > Good "luck" with that. In any case, GA's and cluster design > are not that foreign > > http://aggregate.org/FNN/ > > For those interested in other engineering and scientific uses > take a look at: > > http://www.talkorigins.org/faqs/genalg/genalg.html > > -- > Doug > > > > > > -- > > Geoffrey D. Jacobs > > > > > > !DSPAM:45c1c5f1232511543480883! 
> > > > > -- > Doug > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070201/544d16d3/attachment.html From rgb at phy.duke.edu Thu Feb 1 06:20:11 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <36197.192.168.1.1.1170336505.squirrel@mail.eadline.org> References: <45BE8E7E.4010808@brookes.ac.uk> <45C0791C.5080904@brookes.ac.uk> <60632.192.168.1.1.1170268112.squirrel@mail.eadline.org> <45C1C5B7.9080608@gmail.com> <36197.192.168.1.1.1170336505.squirrel@mail.eadline.org> Message-ID: On Thu, 1 Feb 2007, Douglas Eadline wrote: > For those interested in other engineering and scientific uses > take a look at: > > http://www.talkorigins.org/faqs/genalg/genalg.html Fabulous article, actually. Thanks! I've actually written a parallel GA embedded in a NN training program, and have been working for years in a desultory fashion on building a "super"-GA that can get past several of the "problems" GAs have -- primarily premature convergence, which actually has a rather nasty scaling structure as one tries to find the "better" local optima in a problem with a complex/rugged fitness landscape in high dimensionality, and a few other problems that aren't well known (or at least aren't published much, possibly because they are worth a lot of money:-). This review of GAs is one of the best I've read, even better than Wikipedia's which is saying a lot! rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hahn at mcmaster.ca Thu Feb 1 07:40:39 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <45C1DC42.90604@brookes.ac.uk> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: > Not true. Distributed computing is more and more mainstream. I think too oh, one other snide comment about grid: I suspect the grid-fad could not have happened without the fraud perpetrated by worldcom and others during the internet bubble. in those days, it was popular to claim that the network was becoming truely ubiquitous and incomprehensibly fast. for instance: http://www-128.ibm.com/developerworks/grid/library/gr-heritage/#N100A6 I don't know about you, but in the 6 years since then, my home net connection has stayed the same speed, possibly a bit more expensive. desktop/LANs are still mostly at 100bT, with 1000bT in limited use. I do notice that grabbing large files off the net (ftp, RPMs, etc) often runs at O(MBps) which is about a 10x improvement over the past 10-15 years. so the doubling time turns out to be more like 3 years rather than 9 months. in-cluster networking has improved somewhat faster, but not dramatically so. 
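To make the pgapack/GA subthread above a little more concrete, here is a bare-bones generational GA in Python -- tournament selection, one-point crossover, bit-flip mutation -- run against the toy "one-max" objective (maximise the number of 1 bits). This is only a sketch of the general idea Doug and Robert are discussing, not pgapack's API; every name and parameter below is invented for illustration. The thing to notice is that the fitness evaluations within a generation are independent of each other, which is exactly the step that farms out naturally to cluster nodes (say, one MPI rank per slice of the population).

import random

GENOME_LEN = 40
POP_SIZE = 60
GENERATIONS = 80
MUTATION_RATE = 0.01

def fitness(genome):
    # Toy objective ("one-max"): count the 1 bits.  In a real application
    # this is the expensive part, and the part worth parallelising.
    return sum(genome)

def tournament(pop, scores, k=3):
    # Return the fittest of k randomly chosen individuals.
    contenders = random.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]

def crossover(a, b):
    # One-point crossover.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in genome]

def evolve():
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    for generation in range(GENERATIONS):
        scores = [fitness(g) for g in pop]      # the embarrassingly parallel step
        best = max(scores)
        if best == GENOME_LEN:
            break
        pop = [mutate(crossover(tournament(pop, scores), tournament(pop, scores)))
               for _ in range(POP_SIZE)]
    return generation, best

if __name__ == "__main__":
    print(evolve())

Shrink the population or the mutation rate and the premature convergence Robert describes becomes easy to reproduce; island-model variants, with occasional migration between subpopulations on different nodes, are one of the standard ways of fighting it.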
From atp at piskorski.com Thu Feb 1 07:54:41 2007 From: atp at piskorski.com (Andrew Piskorski) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] clusters in gaming In-Reply-To: <20070131164304.GB21677@leitl.org> References: <20070131164304.GB21677@leitl.org> Message-ID: <20070201155441.GA46052@tehun.pair.com> On Wed, Jan 31, 2007 at 05:43:04PM +0100, Eugen Leitl wrote: > I've been looking at Second Life recently, which does most > things server-side (in fact, running a distributed world > with game physics) unlike games like WoW, where the intelligence Why? Is there some compelling underlying reason they can't make use of all those desktop cycles like other massively multiplayer games do? > What I didn't like is that most of the game is purportedly > based on a byte-compiled language, with some long-term plans What language? Some ad-hoc thing of their own? > to switch to .Net (Mono, actually), which should result in > much improved performance. Current performance is > rather ridiculous, even high-priority simulations like > private islands only tolerate few 10 avatars before severe > performance degradation, and even crashes. > Can things be compiled in realtime by passing code snippets > in conventional compiled languages, or is this always limited Well, sure, I think that's been done, although I don't know if anyone's using it for real in a production setting. Here are a few links to related subjects - tcc, CriTcl, and LuaJIT: http://fabrice.bellard.free.fr/tcc/ http://wiki.tcl.tk/2523 http://luajit.luaforge.net/luajit.html But why would you think that just-in-time compilation of C or the like would be central in fixing Second Life's performance problems, rather than just doing a better job of software engineering in general? I know nothing about Second Life, but from your description, if they're looking to change programming languages, Erlang (or something like it) might be the best fit. -- Andrew Piskorski http://www.piskorski.com/ From peter.st.john at gmail.com Thu Feb 1 08:25:12 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: Moore's Law (which has grown in scope since Moore) applies to the aggregate effect of many technologies. Individual techs proceed in fits and starts. Predictions about FLOPS/dollar seem to be sustainable, but e.g. I predict a jump in chip density when the price point of vapor deposition manufactured diamond gets low enough (diamond conducts heat way better than silicon, and chips are suffering from thermodynamics limits). When AT&T divested, you could not get a decent telephone anymore; they were too expensive to make so well. Then after years of crummy phones, suddenly everyone had a cell-phone just like Captain Kirk's. Sure I want fiber optics to my house. But maybe the power company will carry data on the wasted bandwidth of power lines. Keep the faith :-) Peter On 2/1/07, Mark Hahn wrote: > > > Not true. Distributed computing is more and more mainstream. I think > too > > oh, one other snide comment about grid: I suspect the grid-fad could not > have happened without the fraud perpetrated by worldcom and others during > the internet bubble. 
in those days, it was popular to claim that the > network > was becoming truely ubiquitous and incomprehensibly fast. for instance: > > http://www-128.ibm.com/developerworks/grid/library/gr-heritage/#N100A6 > > I don't know about you, but in the 6 years since then, my home net > connection has stayed the same speed, possibly a bit more expensive. > desktop/LANs are still mostly at 100bT, with 1000bT in limited use. > I do notice that grabbing large files off the net (ftp, RPMs, etc) > often runs at O(MBps) which is about a 10x improvement over the past > 10-15 years. so the doubling time turns out to be more like 3 years > rather than 9 months. in-cluster networking has improved somewhat > faster, but not dramatically so. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070201/475922cd/attachment.html From hahn at mcmaster.ca Thu Feb 1 08:50:50 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: > Moore's Law (which has grown in scope since Moore) applies to the aggregate > effect of many technologies. Individual techs proceed in fits and starts. well, specifically it applies to fields there the primary metric is a function of density. for instance, disk capacity is on an exponential, since it's a product of in-track and inter-track density. just like chips, where each linear shrink of 1/sqrt(2) leads to a doubling of devices in the same area. in both cases, these curves are sometimes strongly modulated by "quantum" shifts in the technology (perhaps multi-gate transistors, or the succeeding generations of disk heads - perhaps patterned media upcoming.) in networking, I see generational shifts, but no area-driven exponential. so I think the application of moore's law to networking is mistaken... > Predictions about FLOPS/dollar seem to be sustainable, but e.g. I predict a > jump in chip density when the price point of vapor deposition manufactured > diamond gets low enough (diamond conducts heat way better than silicon, and > chips are suffering from thermodynamics limits). excellent example of a generational shift, rather than part of the relentless sequence of shrinks. (I guess you could argue that there are generational aspects to the shrink/area thing too, since, for instance, visible-optical gave way to UV and presumably eventually immersion litho. or maybe it'll be imprint litho next.) > When AT&T divested, you could not get a decent telephone anymore; they were > too expensive to make so well. Then after years of crummy phones, suddenly > everyone had a cell-phone just like Captain Kirk's. I guess that's more of an economic network effect. but am I alone in thinking that cellphones are one of the suckiest products on the market? (the phones themselves are OK; it's the bundling and customer-screwage I'm not fond of. 
imagine if your phone was an ipv6 device and contained an agent that simply negotiated quality*byte rates with whatever connectivity supplier happend to have good signal strength locally...) > Sure I want fiber optics to my house. But maybe the power company will carry > data on the wasted bandwidth of power lines. Keep the faith :-) call me an unrealistic idealist, but I'm hoping for wimax-like stuff (perhaps with some nice subversive/grassroots mesh routing) to eliminate the incredibly annoying cell monopolies. regards, mark. From jlb17 at duke.edu Thu Feb 1 09:15:57 2007 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: On Thu, 1 Feb 2007 at 11:50am, Mark Hahn wrote > but am I alone in > thinking that cellphones are one of the suckiest products on the market? No. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From peter.st.john at gmail.com Thu Feb 1 09:34:12 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: Mark, On 2/1/07, Mark Hahn wrote: > > > Moore's Law ... (I guess you could argue that there are > generational aspects to the shrink/area thing too, since, for instance, > visible-optical gave way to UV and presumably eventually immersion litho. > or maybe it'll be imprint litho next.) Yeah, I'm thinking of the smooth curve (to which we can apply cubic splines) is the combined effect of many discrete step-funcitons. I guess that's more of an economic network effect. but am I alone in > thinking that cellphones are one of the suckiest products on the market? > (the phones themselves are OK; it's the bundling and customer-screwage > I'm not fond of. Yes indeed; cell phones cool, cell phone comanies less so. Voice over IP ought to be free by now :-) > Sure I want fiber optics to my house. But maybe the power company will > carry > > data on the wasted bandwidth of power lines. Keep the faith :-) > > call me an unrealistic idealist, but I'm hoping for wimax-like stuff > (perhaps with some nice subversive/grassroots mesh routing) to eliminate > the incredibly annoying cell monopolies. Me too. I want a small laser on my rooftop, with prisms splitting to receivers on the roofs of two or four neighbors, with a uucp type friendly free protocol. I guess they should be MASERs but I'm no physicist. regards, mark. regards, Pete. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070201/713ac333/attachment.html From jmiguel at hpcc-usa.org Thu Feb 1 08:36:50 2007 From: jmiguel at hpcc-usa.org (John Miguel) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] HPCC'07 Government Supercomputer Conference April 3-5, 2007 - Please Post Message-ID: The National High Performance Computing and Communications Council will hold its 21st annual Computing and Information Technology conference April 3-5, 2007 at the Hyatt Regency Hotel and Spa in Newport, RI. Over the years, this high level conference has become known for its content, prominent speakers and networking, not commercial glitter and hype. ? The Council was established pursuant to the authority of and in accordance with the desires of the President of the United States as expressed in a White House memorandum to the heads of Departments and Agencies, dated 28 June 1966, and in the instructions of the 89th Congress as expressed in the summary of H.R. 4845. Its mission is education and training in computer and information technology for the public sector. ? The audience consists of Government, Industry and University CIOs, CTOs, CEOs, Technology and Business decision makers, IT and IRM Directors, System Managers, Department Heads, Computer Scientists, Computer Security Officers. ? Attendance is limited to 200 and all attendees receive the Government hotel room rate. Sample conference evaluation : "Hotel Awesome, Speakers Outstanding, Chocolate Fondue ... Priceless." ? Conference information is available at: WWW.HPCC-USA.ORG, or by phone at 401-624-1732. John Miguel Ph. D. President National HPCC Council 480-895-1326 ? Tentative Program for HPCC?07 January 1, 2007 HPCC?07 WWW.HPCC-USA.ORG 401-624-1723 April 3-5, 2007 Hyatt Hotel and Resort Newport, RI Theme: ?Supercomputing: Innovation, Imagination and Application? Tuesday April 3, 2007 Chairman: Steve Louis, Lawrence Livermore National Laboratory 9:00 Dr. Stephen R. Wheat Executive Director HPC Platform Organization, Intel Corporation 9:45 Dr. William Harrod (invited) High Productivity Program Office Defense Research Projects Agency ?High Productivity Computing Program? 10:30 AM Break 11:00 Dr. Douglas B. Kothe Oak Ridge National Laboratory 11:45 Dr. Andrew B. White Jr. Deputy Associate Director, Theory, Simulation and Computing Los Alamos National Laboratory ?Project Road Runner: Petaflop Computing? 12:30 Lunch 2:00 Dr. Daniel E. Atkins Director, Office of Cyberinfrastructure, National Science Foundation 2:45 Dr. Kelvin K. Droegemeier (invited) Associate Vice President for Research University of Oklahoma 3:30 PM Break 4:00 Panel Discussion ?Supercomputing: An Over the Horizon View? Dr. Kelley B. Gaither, Moderator Associate Director, TACC, University of Texas Dr. Jose Munoz, NSF Dr. Douglass E. Post, DOD HPC Modernization Office Dr. Karl Schulz, TACC, UTexas Dr. Walter Brooks, NASA Dr. Robert Graybill, USC, ISI 5:30 Networking Reception 7:00 Birds of a Feather Break Out Sessions Wednesday April 4, 2007 Chairman: Stephen Schneller, NUWC/DOD HPC MOD Office 9:00 Debra Goldfarb CEO, Tabor Communications 9:45 ?Tools For Debugging Multicore Applications? Chris Gottbrath, Product Manage Etnus, LLC 10:30 AM Break 11:00 Dr. Georges E. Karniadakis Brown University ?HPC in Medicine: The Computational Man? 11:45 Dr. John E. West Director, HPC Major Shared Resource Center U. S. Army Engineer Research and Development Center 12:30 Lunch 2:00 Dr. 
Charles Romine (invited) Director, National Coordination Office, Networking, IT R&D ?National Plans, Programs and Initiatives? 2:45 Microsoft ?Scaling Out Excel on Windows CCS Clusters? 3:30 PM Break 4:00 Panel Discussion ?Future Requirements for Storage and Backup? Dr. Robert Chadduck, National Archives & Records Administration Dr. Robert Ross, Argonne National Laboratory Ellen M. Salmon, NASA John L. Cole, Army Research Laboratory Lee West, Sandia National Laboratory Joshua Lubell, National Institute of Standards and Technology 6:30 Reception and Dinner Thursday April 5, 2007 8:30 Complimentary Full Breakfast 9:00 Director Data Center Technology American Power Conversion ?21st Century Data Center Design? 10:30 High-End Computing Market Trends Dan Little, CTO High-End Computing Market Services 11:00 Conference Close -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 4537 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070201/780c9e7d/attachment.bin From mikhailberis at gmail.com Thu Feb 1 09:42:53 2007 From: mikhailberis at gmail.com (Dean Michael Berris) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] clusters in gaming In-Reply-To: <20070131164304.GB21677@leitl.org> References: <20070131164304.GB21677@leitl.org> Message-ID: <45C2269D.1090202@gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Eugen Leitl wrote: > > While I do see what a usual C/C++ MPI approach wouldn't > be probably enough for a highly dynamic and flexible virtual > environment, the result still strikes me as inelegant, > and killing architectural deficiences by throwing enough > hardware at it (not necessary always wrong, mark, just > not in this case). > I don't see why a usual C/C++ MPI approach wouldn't work, though the scaling issues of adding a new node to the cluster is certainly a problem that may be a hindrance from the implementation -- but one that can be remedied by having local clusters "gridded" together using some protocol. As for throwing hardware at it, I don't think that's a problem -- that's actually a good solution. That being said, if the implementation was already good to start with then adding more hardware would have (supposedly) better effect on the overall performance/experience. > Can things be compiled in realtime by passing code snippets > in conventional compiled languages, or is this always limited > to highly dynamic environments like Smalltalk (which OpenCroquet > is based on) or Lisp (with sbcl and cmucl there are now great > compilers for Lisp, though I don't know about MPI support)? > Short answer is yes: it can even be done in C++. However what I would rather use in these situations would be a dynamic language like as you mention Lisp or things like Python (embedded in C++ or the other way around, see Boost.Python). I think it's an architecture problem more than anything as far as the SL server side is concerned. But then when you're faced with a problem like full-3D physics engine in the server side, that's not something "as easy as Hello, World" to implement (or fix for the matter). Though it certainly is not "impossible", "hard" would be an understatement especially now that it's in full-deployment with thousands of people getting on it at any given time. - -- Dean Michael C. 
Berris http://cplusplus-soup.blogspot.com/ mikhailberis AT gmail DOT com +63 928 7291459 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFwiadBGP1EujvLlsRAqm0AJ4poLgPs0dFqGSFfoNLn5qhe3h7sgCgrIoB sbwpSOkwDAlEWHBnbxbz/Vc= =sdze -----END PGP SIGNATURE----- From eugen at leitl.org Thu Feb 1 12:19:59 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] clusters in gaming In-Reply-To: <45C2269D.1090202@gmail.com> References: <20070131164304.GB21677@leitl.org> <45C2269D.1090202@gmail.com> Message-ID: <20070201201959.GF21677@leitl.org> On Fri, Feb 02, 2007 at 01:42:53AM +0800, Dean Michael Berris wrote: > I don't see why a usual C/C++ MPI approach wouldn't work, though the In theory, there is no reason why these (or even Fortran) wouldn't be adequate either, but in practice it would be very difficult to accomodate user-contributed scripted objects into a rigid array/pointer framework. Adding new methods in C to a brand new object instantiated at runtime is certainly possible, but it sounds intensely painful. >From the point of view of running a massively parallel realtime (fake) physics simulation with many 10 k simultaneous viewers/ points of input it looks as if requires a massive numerical performance, which suggests C (less C++). Common Lisp now has very good compilers, but I wonder how well that translates into numerics, and similiar to C++ the unwary programmer can produce very slow code (CONSing, GC, etc). > scaling issues of adding a new node to the cluster is certainly a There are two types of regions, isolated islands, and addition to the main "continent". Both look quite suitable for geometric problem tesselation (one node, one region) and incremental node addition as the terrain grows. > problem that may be a hindrance from the implementation -- but one that > can be remedied by having local clusters "gridded" together using some > protocol. As far as I know SL is run on one local cluster, which is why I thought of how a Beowulf approach would help. They're complicating it by using virtual machines, and packing several VMs on one physical server. After (frequent) upgrades servers are restarted in a rolling fashion, and I presume snapshot/resume migration is a useful function here. But then, there are cluster-wide process migration available, which are a lot more fine-grained. > As for throwing hardware at it, I don't think that's a problem -- that's > actually a good solution. That being said, if the implementation was I thought the cluster had some 1000 nodes, but http://gwynethllewelyn.net/article119visual1layout1.html claims there are just 5000 virtual servers. Maybe they just run 5 VServers/node, and there are really 1 kNodes, which is a reasonably large cluster for just 16 kUsers at peak (not for your garden-variety Beowulf, but for a game server). > already good to start with then adding more hardware would have > (supposedly) better effect on the overall performance/experience. It would be really interesting to learn how current SL scales. > Short answer is yes: it can even be done in C++. However what I would > rather use in these situations would be a dynamic language like as you > mention Lisp or things like Python (embedded in C++ or the other way > around, see Boost.Python). Thanks for the link. > I think it's an architecture problem more than anything as far as the SL > server side is concerned. 
But then when you're faced with a problem like > full-3D physics engine in the server side, that's not something "as easy > as Hello, World" to implement (or fix for the matter). OpenCroquet uses a deterministic computation model, which replicates worlds to the end unser nodes a la P2P, and synchronizes differing inputs so that each simulation instance doesn't diverge. It can also do a master/slave type of state replication, if I understand it correctly, so in theory it could use physics accelerators, and clone state to slower nodes. SL in comparison does about anything but primitive rendering cluster-side. Given current assymetric broadband, this seems to be a superior approach to do everything P2P. (And I would imagine OpenCroquet hasn't even begun to deal with the nasty problem of NAT penetration). > Though it certainly is not "impossible", "hard" would be an > understatement especially now that it's in full-deployment with > thousands of people getting on it at any given time. It's really interesting. I wish there was more information flow out of Linden Labs, on how they're doing it. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From rgb at phy.duke.edu Thu Feb 1 12:26:17 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: On Thu, 1 Feb 2007, Peter St. John wrote: > Moore's Law (which has grown in scope since Moore) applies to the aggregate > effect of many technologies. Individual techs proceed in fits and starts. > Predictions about FLOPS/dollar seem to be sustainable, but e.g. I predict a > jump in chip density when the price point of vapor deposition manufactured > diamond gets low enough (diamond conducts heat way better than silicon, and > chips are suffering from thermodynamics limits). > > When AT&T divested, you could not get a decent telephone anymore; they were > too expensive to make so well. Then after years of crummy phones, suddenly > everyone had a cell-phone just like Captain Kirk's. > > Sure I want fiber optics to my house. But maybe the power company will carry > data on the wasted bandwidth of power lines. Keep the faith :-) I'm not certain and am too lazy to plot it out and check, but it seems to me that communications has consistently lagged computation in the time constant used in a Moore's-type law. For CPUs it has been a fairly predictable 18-20 month doubling time (at constant cost, connected to the doubling time of VLSI for a long time but now more complex), which means a factor of ten takes somewhere around five or six years to accomplish. That's three orders of magnitude in 15 to 20 years. It took those same twenty years to go from 10 Mbps to 1 Gbps ethernet, only two orders of magnitude, at anything like constant cost. Most things I've read on the subject suggest that if anything the CPU/Communications gap is widening, forcing systems designers to use methodology developed for clusters and cluster communications even within a system (e.g. Hypertransport). 
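Robert's CPU-versus-network comparison above is easy to put into numbers. The sketch below simply reworks the figures already quoted in this thread -- an 18-20 month CPU doubling time, and 10 Mbps to 1 Gbps Ethernet over roughly twenty years -- it adds no new measurements, only the arithmetic:

import math

def years_for_factor(doubling_time_years, factor):
    # With a fixed doubling time d, growth by 'factor' takes d * log2(factor) years.
    return doubling_time_years * math.log(factor, 2)

def doubling_time(total_years, factor):
    # Inverse: growth by 'factor' over 'total_years' implies this doubling time.
    return total_years * math.log(2) / math.log(factor)

print("CPU, 10x at an 18 month doubling time: %.1f years" % years_for_factor(1.5, 10))
print("CPU, 10x at a 20 month doubling time:  %.1f years" % years_for_factor(20 / 12.0, 10))
print("Ethernet, 100x in 20 years => doubling every %.1f years" % doubling_time(20, 100))

Those come out at roughly five to five and a half years per factor of ten for CPUs, against a doubling time of about three years for Ethernet -- consistent with both Robert's and Mark's estimates earlier in the thread.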
Also, phone companies ARE gradually laying fiber everywhere, and while they may or may not take it right up to your house they'll certainly take it to your neighborhood, and maybe only "finish off" with copper. It's just that installing fiber is expensive, and takes time, and customers won't pay much of a premium for it. They "have" to do it anyway to compete with e.g. cable, and they are all doubtless running scared in front of the possibility that nobody will own non-cell phones anymore in a year or five so that either they are in a position to deliver streaming media to the home in competition with the cable company or they all belly right up in that market. A bit of a race, in other words, where they are ahead and behind at the same time. It won't be done for computer users, though. Not enough money in it, and what there is is already developed. Delivering entertainment, on the other hand -- there aren't any visible upper bounds on what one use there. If you treble the bandwidth, you just make HDTV cheaper and permit more stations and make it more feasible to deliver movies on demands in real time -- bleep through 4-5 GB in 1 minute or two, then display it at your liesure... rgb > > Peter > > > On 2/1/07, Mark Hahn wrote: >> >> > Not true. Distributed computing is more and more mainstream. I think >> too >> >> oh, one other snide comment about grid: I suspect the grid-fad could not >> have happened without the fraud perpetrated by worldcom and others during >> the internet bubble. in those days, it was popular to claim that the >> network >> was becoming truely ubiquitous and incomprehensibly fast. for instance: >> >> http://www-128.ibm.com/developerworks/grid/library/gr-heritage/#N100A6 >> >> I don't know about you, but in the 6 years since then, my home net >> connection has stayed the same speed, possibly a bit more expensive. >> desktop/LANs are still mostly at 100bT, with 1000bT in limited use. >> I do notice that grabbing large files off the net (ftp, RPMs, etc) >> often runs at O(MBps) which is about a 10x improvement over the past >> 10-15 years. so the doubling time turns out to be more like 3 years >> rather than 9 months. in-cluster networking has improved somewhat >> faster, but not dramatically so. >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Thu Feb 1 12:28:20 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: On Thu, 1 Feb 2007, Peter St. John wrote: > Me too. I want a small laser on my rooftop, with prisms splitting to > receivers on the roofs of two or four neighbors, with a uucp type friendly > free protocol. I guess they should be MASERs but I'm no physicist. Oh, just chop up your microwave oven, line up an old umbrella lined with foil, and beam away. 
Just keep your head out from in front and don't let your children or pets anywhere near it...;-) rgb > > regards, mark. > > > regards, Pete. > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From coutinho at dcc.ufmg.br Thu Feb 1 12:09:26 2007 From: coutinho at dcc.ufmg.br (Bruno Rocha Coutinho) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] failure rates Message-ID: <45C248F6.6080807@dcc.ufmg.br> Most fault-tolerance literature assume that system components have exponential failure rates. But software sometimes don't have exponential failure rates if the cause of the failure is related to a timer, a overflow or resource leaks. In this case failure rate could be fixed and you end with all process failing at the same time. I think that is safe to assume exponential failure rates for hardware and in spite of most machine crashes today are OS (not hardware) related, most people assume that OSs are well behaved and don't suffer of fixed rate failures. 2007/1/30, enver ever : Hello there I am a PhD student working on mathematical looking to the availability of Beowulf clusters. I was looking whether or not it is possible to take exponential failure rates fot the nodes. Thats the case in these publications: 1- "A Realistic Evaluation of Consistency Algorithms for Replicated Files"Annual Simulation Symposium archive Proceedings of the 21st annual symposium on Simulation table of contents Tampa, Florida, United States Pages: 121 - 130 Year of Publication: 1988 ISBN:0-8186-0845-5 2-"Availability Modeling and Analysis on High Performance ClusterComputing Systems"Availability, Reliability and Security, 2006. ARES 2006. The First International Conference on Publication Date: 20-22 April 2006 3-"A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster" Chokchai Leangsuksun1, Tong Liu1, Tirumala Rao1, Stephen L. Scott2, and Richard Libby Linux.com | LCI 5th International Linux Cluster Conference. I think it can be taken as exponentially distributed since in many multi-server systems this was the approach followed. I would appreciate if you could add any comments Many Regards _________________________________________________________________ MSN Hotmail is evolving ? check out the new Windows Live Mail http://ideas.live.com _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mikhailberis at gmail.com Thu Feb 1 13:13:48 2007 From: mikhailberis at gmail.com (Dean Michael Berris) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] clusters in gaming In-Reply-To: <20070201201959.GF21677@leitl.org> References: <20070131164304.GB21677@leitl.org> <45C2269D.1090202@gmail.com> <20070201201959.GF21677@leitl.org> Message-ID: <45C2580C.5040801@gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Eugen Leitl wrote: > On Fri, Feb 02, 2007 at 01:42:53AM +0800, Dean Michael Berris wrote: > >> I don't see why a usual C/C++ MPI approach wouldn't work, though the > > In theory, there is no reason why these (or even Fortran) wouldn't be adequate either, > but in practice it would be very difficult to accomodate user-contributed > scripted objects into a rigid array/pointer framework. 
Adding new > methods in C to a brand new object instantiated at runtime is certainly > possible, but it sounds intensely painful. > Actually I was thinking more of having just primitive operations being implemented as either free functions or functors in C++, and having a chaining approach to making more complex functors. The idea is that once complex operations can be "generated" and "(de)serialized", new stuff is apparently just a combination of old/primitive stuff. >>From the point of view of running a massively parallel realtime > (fake) physics simulation with many 10 k simultaneous viewers/ > points of input it looks as if requires a massive numerical > performance, which suggests C (less C++). Common Lisp now has > very good compilers, but I wonder how well that translates > into numerics, and similiar to C++ the unwary programmer can > produce very slow code (CONSing, GC, etc). > With the advances in C++ optimizing compilers and using modern C++ programming approaches (template metaprogramming, policy-driven programming, lazy-functional programming, etc.) there's a very good chance that a lot of the "slow code" can be avoided. But of course, there has to be a conscious effort to profile->benchmark->optimize C++ code which can only be done if you had 1) time and 2) resources at hand. But seeing how much money's being put into SL right now, I think it's just a matter of time before the resources will be available. :) >> scaling issues of adding a new node to the cluster is certainly a > > There are two types of regions, isolated islands, and addition to the > main "continent". Both look quite suitable for geometric problem tesselation > (one node, one region) and incremental node addition as the terrain > grows. > Sounds simple, but now that leads to non-optimal resource allocation. If it was made that one node was allocated to one island, then you run into scaling problems when you have very high traffic regions. That's why an architectural solution should be found, because mapping regions to nodes 1-1 doesn't seem to work: because if you have 1000 regions 1:1 to nodes and 20k people in one region, what are the 999 nodes going to do? >> problem that may be a hindrance from the implementation -- but one that >> can be remedied by having local clusters "gridded" together using some >> protocol. > > As far as I know SL is run on one local cluster, which is why I thought > of how a Beowulf approach would help. They're complicating it by using > virtual machines, and packing several VMs on one physical server. > After (frequent) upgrades servers are restarted in a rolling fashion, > and I presume snapshot/resume migration is a useful function here. > But then, there are cluster-wide process migration available, > which are a lot more fine-grained. > I don't have this information available, though it would be interesting to note how this would really work. As early as now, they're encountering scalability problems having hundreds of people packed into a region. Apparently it does work, because people can still (somehow) bear with the performance degradation in these areas. >> As for throwing hardware at it, I don't think that's a problem -- that's >> actually a good solution. That being said, if the implementation was > > I thought the cluster had some 1000 nodes, but > http://gwynethllewelyn.net/article119visual1layout1.html > claims there are just 5000 virtual servers. 
Maybe they > just run 5 VServers/node, and there are really 1 kNodes, > which is a reasonably large cluster for just 16 kUsers > at peak (not for your garden-variety Beowulf, but > for a game server). > But the problem is, the physics in areas where there are a lot of objects is still performed all in the cluster. So adding more people and more objects will overload the physics engine on their end, and at 16kUsers at peak, can definitely overload certain nodes allocated for certain regions. But then I don't have any idea how they have it coded or implemented, so I can only speculate. >> already good to start with then adding more hardware would have >> (supposedly) better effect on the overall performance/experience. > > It would be really interesting to learn how current SL scales. > I'll look forward to reading something about that. >> I think it's an architecture problem more than anything as far as the SL >> server side is concerned. But then when you're faced with a problem like >> full-3D physics engine in the server side, that's not something "as easy >> as Hello, World" to implement (or fix for the matter). > > OpenCroquet uses a deterministic computation model, which replicates > worlds to the end unser nodes a la P2P, and synchronizes differing inputs > so that each simulation instance doesn't diverge. It can also do a master/slave > type of state replication, if I understand it correctly, so in > theory it could use physics accelerators, and clone state to slower > nodes. SL in comparison does about anything but primitive rendering > cluster-side. Given current assymetric broadband, this seems > to be a superior approach to do everything P2P. (And I would imagine > OpenCroquet hasn't even begun to deal with the nasty problem of NAT > penetration). > Doing everything server side is a good idea, especially for giving better client-side experience IF the server can handle it. Apparently, SL on the server side is hitting the limits that their architecture have either explicitly or implicitly defined. Sounds still like the architecture might need more help to improve current performance. >> Though it certainly is not "impossible", "hard" would be an >> understatement especially now that it's in full-deployment with >> thousands of people getting on it at any given time. > > It's really interesting. I wish there was more information flow out > of Linden Labs, on how they're doing it. > I wish the same too... They've open sourced the viewer, I just hope they open source the server too. - -- Dean Michael C. Berris http://cplusplus-soup.blogspot.com/ mikhailberis AT gmail DOT com +63 928 7291459 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFwlgMBGP1EujvLlsRAj6ZAKCzSSXKrGU2RaKeTDhB/Tf3vgLKfwCfWszt nrL+cl7CvnRMaSm2QWQg6Tk= =owi6 -----END PGP SIGNATURE----- From James.P.Lux at jpl.nasa.gov Thu Feb 1 13:43:34 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> At 07:40 AM 2/1/2007, Mark Hahn wrote: >>Not true. Distributed computing is more and more mainstream. 
I think too > >oh, one other snide comment about grid: I suspect the grid-fad could >not have happened without the fraud perpetrated by worldcom and others during >the internet bubble. in those days, it was popular to claim that the network >was becoming truely ubiquitous and incomprehensibly fast. for instance: > >http://www-128.ibm.com/developerworks/grid/library/gr-heritage/#N100A6 In the long run, ubiquitous and fast IS going to be true (however, latency is something you can't get around... speed of light and all that). As long ago as 1993, I was at a conference where a speaker from AT&T commented that historical telecom pricing methods (longer distances cost more) were obsolete, since the dominant cost was in the termination, with, even then, a gross oversupply of fiber across the Atlantic. Hence the availability of cheap flat rate long distance (5c a minute anywhere, anytime).. the bulk of the system is no longer capacity limited. >I don't know about you, but in the 6 years since then, my home net >connection has stayed the same speed, possibly a bit more expensive. Interestingly, they've just rolled out FiOS (fiber to the home) in my area, which is a HUGE jump in potential bandwidth from the existing DSL or Cable Modem delivery methods. And, moderately competitive in price (5 Mbps is $40/month, including the bundled ISP kinds of features). What's fascinating is the faster tiers.. you can get 15 Mbps down/2 up for $50/mo and 30 M down/5 up for $180 Granted, these are consumer offerings and have all the usual network congestion caveats, but hey, at least they are offering 30 Mbps for the last mile, which is quite impressive. >desktop/LANs are still mostly at 100bT, with 1000bT in limited use. But that's more driven by replacement cycles and the lack of real demand for faster speeds to the desktop. If your facility has a 1.5 Mbps pipe to the internet, giving users a 1 Gb/s won't change their performance much compared to 100 Mb/s. There's also a wiring infrastructure issue. While desktops are typically replaced on a 3 year cycle, the wiring infrastructure cycles through a bit slower, especially in smaller businesses and residential (that is, I'm not likely to start ripping out the drywall to replace the Cat 5 wiring I put in back in 1998)... and frankly, since right now, I have maybe 700 kbps at home to the internet (one way), and then a wireless connection from laptop to home network, there's not much to be gained by improving the home wiring infrastructure. (If I go with the FiOS offering though, that may prompt some re-evaluation) Likewise, a small business with half a dozen or a dozen desktops and a couple servers isn't going to see a huge benefit from faster networking, because they're throttled by the server's disk speed, more than anything else. (assuming they're not hosting a big website, etc.) So, you're looking at GigE making a difference in two areas: replacing cable TV (all those 20 Mbps HDTV streams) and in big companies. But even in big companies, GigE to the desktop doesn't necessarily buy you much, if you're all competing for the same server resources. >I do notice that grabbing large files off the net (ftp, RPMs, etc) >often runs at O(MBps) which is about a 10x improvement over the past >10-15 years. so the doubling time turns out to be more like 3 years >rather than 9 months. Which is probably consistent with equipment refurbishment cycles. > in-cluster networking has improved somewhat faster, but not > dramatically so. 
>_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From hahn at mcmaster.ca Thu Feb 1 15:25:16 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> Message-ID: >> the internet bubble. in those days, it was popular to claim that the >> network >> was becoming truely ubiquitous and incomprehensibly fast. for instance: > > In the long run, ubiquitous and fast IS going to be true (however, latency is in the long run, everything is true ;) > gross oversupply of fiber across the Atlantic. Hence the availability of > cheap flat rate long distance (5c a minute anywhere, anytime).. the bulk of > the system is no longer capacity limited. interesting - I assumed that long-distance became cheap not due to oversupply of fiber and bandwidth, but rather transition away from old-fashioned circuit switching (ie, towards digital compressed voice over packets.) I know that buying fiber/lambdas/bandwidth is still very much not what I'd call cheap, though I have no doubt it's much better/cheaper than in the past. >> I don't know about you, but in the 6 years since then, my home net >> connection has stayed the same speed, possibly a bit more expensive. > > Interestingly, they've just rolled out FiOS (fiber to the home) in my area, > which is a HUGE jump in potential bandwidth from the existing DSL or Cable > Modem delivery methods. And, moderately competitive in price (5 Mbps is > $40/month, including the bundled ISP kinds of features). What's fascinating > is the faster tiers.. you can get 15 Mbps down/2 up for $50/mo and 30 M > down/5 up for $180 seems strange to me - what kind of residential customer would pay for that kind of thing (and remain free of the RIAA/MPAA)? some smart form of wireless seems like an obvious good solution for residential last-mile. maybe that's a disruptive innovation that will finally put the telco/cableco's out of their misery. > Likewise, a small business with half a dozen or a dozen desktops and a couple > servers isn't going to see a huge benefit from faster networking, because > they're throttled by the server's disk speed, more than anything else. if their servers disks are only 100bT speed, they're broken. it may well be that most SMB servers are that crappy, in spite of the fact that a recycled linux box and one disk will deliver 40 MB/s... > So, you're looking at GigE making a difference in two areas: replacing cable > TV (all those 20 Mbps HDTV streams) how many 20Mb streams does a typical endpoint need? either residential or commercial? > and in big companies. But even in big > companies, GigE to the desktop doesn't necessarily buy you much, if you're > all competing for the same server resources. wow, dim view of the competence of server admins, but you may be right... 
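to put rough numbers on the disk-vs-wire point, a quick sketch (nominal line
rates only, no protocol overhead; the 40 MB/s disk figure is the same ballpark
guess as above, not a measurement):

/* nominal line rates vs. one commodity disk -- illustrative only */
#include <stdio.h>

int main(void)
{
    double fast_ethernet_MBps = 100.0 / 8.0;    /* 100 Mb/s  ~ 12.5 MB/s  */
    double gige_MBps          = 1000.0 / 8.0;   /* 1000 Mb/s ~  125 MB/s  */
    double one_disk_MBps      = 40.0;           /* recycled box, one disk */

    printf("100bT   : %6.1f MB/s\n", fast_ethernet_MBps);
    printf("one disk: %6.1f MB/s\n", one_disk_MBps);
    printf("GigE    : %6.1f MB/s\n", gige_MBps);
    printf("20 Mb/s HDTV streams per GigE port: %.0f\n", 1000.0 / 20.0);
    return 0;
}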
regards, mark hahn. From James.P.Lux at jpl.nasa.gov Thu Feb 1 16:00:18 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: <6.2.3.4.2.20070201152945.030e8c08@mail.jpl.nasa.gov> At 08:25 AM 2/1/2007, Peter St. John wrote: >Moore's Law (which has grown in scope since Moore) applies to the >aggregate effect of many technologies. Individual techs proceed in >fits and starts. Predictions about FLOPS/dollar seem to be >sustainable, but e.g. I predict a jump in chip density when the >price point of vapor deposition manufactured diamond gets low enough >(diamond conducts heat way better than silicon, and chips are >suffering from thermodynamics limits). > >When AT&T divested, you could not get a decent telephone anymore; >they were too expensive to make so well. Then after years of crummy >phones, suddenly everyone had a cell-phone just like Captain Kirk's. > >Sure I want fiber optics to my house. But maybe the power company >will carry data on the wasted bandwidth of power lines. What wasted bandwidth on power lines? Wires of random composition and topology, some over 100 years old, strung hither and yon, above and below ground doesn't sound like a particularly good propagation medium for wideband signals. Sure, signal processing and adaptive processing can do some good, but it's still a shared medium (i.e. that same power line that serves you also serves 8 of your neighbors). Twisted pairs of wires, coaxial cable, optical waveguides.. that's a consistent broadband propagation medium. Data over powerlines might be useful for time of use electricity metering, etc... Jim, W6RMK From James.P.Lux at jpl.nasa.gov Thu Feb 1 16:24:13 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> Message-ID: <6.2.3.4.2.20070201161644.03125d90@mail.jpl.nasa.gov> At 12:26 PM 2/1/2007, Robert G. Brown wrote: >On Thu, 1 Feb 2007, Peter St. John wrote: > > >Also, phone companies ARE gradually laying fiber everywhere, and while >they may or may not take it right up to your house they'll certainly >take it to your neighborhood, and maybe only "finish off" with copper. >It's just that installing fiber is expensive, About $900 per house, where I live, according to some acquaintances in the telco. >and takes time, and >customers won't pay much of a premium for it. They "have" to do it >anyway to compete with e.g. cable, and they are all doubtless running >scared in front of the possibility that nobody will own non-cell phones >anymore in a year or five so that either they are in a position to >deliver streaming media to the home in competition with the cable >company or they all belly right up in that market. A bit of a race, in >other words, where they are ahead and behind at the same time. The term of art is "triple play"... phone, entertainment, internet access all from one provider. >It won't be done for computer users, though. 
Not enough money in it, >and what there is is already developed. Delivering entertainment, on >the other hand -- there aren't any visible upper bounds on what one use >there. If you treble the bandwidth, you just make HDTV cheaper and >permit more stations and make it more feasible to deliver movies on >demands in real time -- bleep through 4-5 GB in 1 minute or two, then >display it at your liesure... Subject to a raft of content management requirements (maybe I don't want you fast forwarding through commercials? Maybe I want to charge you "per viewing" The big question/challenge in that business is how do you monetize individual uses of something that has previously been consumed as a utility stream e.g. rather than broadcasting a program for all, or none, to view, and charging advertisers by using statistical measures (Nielsen ratings), can I actually measure the viewership (with demographic breakdowns) and charge on that basis.. Yes, Mr. Vendor, 354,313.5 people watched your commercial, of which 516 were in your target demographic.... Or, rather than charging you $10/month for HBO, and you can watch that movie as many times as you want, we can charge you only $0.99 per viewing of Movie #A (so the we can pay studio X their fee) and $0.89 for a viewing of Movie #B (because Studio Y didn't give as many gross points to their star, so they can discount it) Jim From James.P.Lux at jpl.nasa.gov Thu Feb 1 16:56:12 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> Message-ID: <6.2.3.4.2.20070201162503.02d3be80@mail.jpl.nasa.gov> At 03:25 PM 2/1/2007, Mark Hahn wrote: >>>the internet bubble. in those days, it was popular to claim that >>>the network >>>was becoming truely ubiquitous and incomprehensibly fast. for instance: >> >>In the long run, ubiquitous and fast IS going to be true (however, latency is > >in the long run, everything is true ;) > >>gross oversupply of fiber across the Atlantic. Hence the >>availability of cheap flat rate long distance (5c a minute >>anywhere, anytime).. the bulk of the system is no longer capacity limited. > >interesting - I assumed that long-distance became cheap not due to >oversupply of fiber and bandwidth, but rather transition away from >old-fashioned circuit switching (ie, towards digital compressed voice >over packets.) Not much compression going on for voice traffic. It's carried as 64 kbps data at 8 ksamples/second, pretty much. There is some statistical multiplexing possible (TASI) because people don't talk at 100% duty cycle, but not a huge amount. >I know that buying fiber/lambdas/bandwidth is still very much not >what I'd call cheap, though I have no doubt it's much better/cheaper >than in the past. In 1993, the capital cost of "one voice channel" worth (64 kbps) of capacity across the atlantic was less than $10, as I recall. Compare that to leased line T-1 rates back then of many dollars per eighth of a mile per month, and that's before you bought the CSU/DSU to connect to the copper. Mind you, the ATT guy thought that it would be 155 Mbps ATM to the desktop, and we see where that went. 
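For anyone who hasn't done the telco arithmetic in a while, here's the back of
the envelope (nominal rates only, ignoring signalling and SONET overhead):

/* classic digital-voice arithmetic, nominal figures only */
#include <stdio.h>

int main(void)
{
    double ds0_bps = 8000.0 * 8.0;            /* 8 ksamples/s x 8 bits = 64 kb/s   */
    double t1_bps  = 24.0 * ds0_bps + 8000.0; /* 24 channels + framing = 1.544 Mb/s */
    double oc3_bps = 155.52e6;                /* the 155 Mb/s SONET/ATM line rate   */

    printf("DS0  : %.0f kb/s\n", ds0_bps / 1e3);
    printf("T-1  : %.3f Mb/s (24 voice channels)\n", t1_bps / 1e6);
    printf("OC-3 : ~%.0f uncompressed voice channels\n", oc3_bps / ds0_bps);
    return 0;
}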
>>>I don't know about you, but in the 6 years since then, my home net >>>connection has stayed the same speed, possibly a bit more expensive. >> >>Interestingly, they've just rolled out FiOS (fiber to the home) in >>my area, which is a HUGE jump in potential bandwidth from the >>existing DSL or Cable Modem delivery methods. And, moderately >>competitive in price (5 Mbps is $40/month, including the bundled >>ISP kinds of features). What's fascinating is the faster tiers.. >>you can get 15 Mbps down/2 up for $50/mo and 30 M down/5 up for $180 > >seems strange to me - what kind of residential customer would pay >for that kind of thing (and remain free of the RIAA/MPAA)? An interesting question.. I think the upper tiers are there to complement similar offerings in the commercial/business market. Or, there IS a burgeoning market for live video feeds from adult entertainment providers. Without them, the VCR market would never have taken off. Contemplate Youtube type applications, but in HD.. 20 Mbps is the basic rate for HD. I might be interested, for instance, in seeing the Mentos and Pepsi artists in HD, rather than lowfi 15 fps QCIF. But, also, consider something like streaming audio at CD quality (not MP3 compressed).. A stereo 44.1ksps 16 bit stream is about 1.5 Mbps, and say I, my wife, and my daughters all want to listen to different programs at the same time. There will also be video that is not afflicted by MPAA. NasaTV is free to all and streamed over the network as well as being shoved out over C-band transponders. I can see using 15M+ sorts of rates in bursts for myself (downloading the aforementioned climate databases, for instance...) >some smart form of wireless seems like an obvious good solution for >residential last-mile. maybe that's a disruptive innovation that will >finally put the telco/cableco's out of their misery. Nobody has come up with a *good* wireless solution that is as cheap and reliable as pulling a physical media. There's a raft of spectrum occupancy issues, etc. Let's assume you've got a neighborhood with 400 houses in it at a density of, say, 500 square meters/house (roughly 8 houses/acre). It's perhaps 10-20 meters between houses on average. Say each house needs 50 Mbps of bandwidth (e.g. two cable channels worth). If you use a short range wireless scheme (notional range of 50 m) a given transmitter is going to cover half a dozen houses, so each transmitter would need a bandwidth of about 300 Mbps (which is fairly hefty, but not out of the question). AND there would need to be some smart switching in the system that feeds that transmitter the correct subset of the Terabits/second available... And, some way to cleverly do spectrum reuse (so that if you have houses A, B,C, D, and E lined up, A can use channel 1, B can use channel 6, C can use channel 11, and by house D, the signal for Channel 1 going to house A has faded enough that we can reuse it for D, 6 for E, etc.) This is highly nontrivial, and nobody has come up with a automagic way to do it that is efficient and self organizing. Right now, though, the Cable TV folks feed 1 GHz of bandwidth to you and YOU do the channel selection, which reduces their physical plant cost... all they need is power distribution with no intelligence, just management of SNR. 
(this breaks down in the upstream case, which is a fundamental problem with Cable Modems) >>Likewise, a small business with half a dozen or a dozen desktops >>and a couple servers isn't going to see a huge benefit from faster >>networking, because they're throttled by the server's disk speed, >>more than anything else. > >if their servers disks are only 100bT speed, they're broken. it may well >be that most SMB servers are that crappy, in spite of the fact that >a recycled linux box and one disk will deliver 40 MB/s... Not so much a limitation as that, as the 10 desktops aren't all going to be hitting the server at exactly the same time, most of the time. Relatively few business desktops are doing things like streaming video. They're just moving documents to and from the server, and that's a sort of bursty traffic, so its not a big deal. And 40 MB/s implies 13 Megatransfers/second across a 32 bit bus, with a 33 MHz bus, a transfer from the disk and a transfer to the NIC doesn't leave a whole lot of time for fetching instructions from RAM, etc. now, if your office is comprised of diskless clients....that's another story. >>So, you're looking at GigE making a difference in two >>areas: replacing cable TV (all those 20 Mbps HDTV streams) > >how many 20Mb streams does a typical endpoint need? either residential >or commercial? I can see at least 3 streams for residential. 1 for live viewing, 1 for recording on the TIVO, 1 for the second TV. >>and in big companies. But even in big companies, GigE to the >>desktop doesn't necessarily buy you much, if you're all competing >>for the same server resources. > >wow, dim view of the competence of server admins, but you may be right... No.. it's that network traffic from desktop to server just isn't all that high in most environments. For instance, I consume almost NO network bandwidth most of the time at work, because most of what I work with is on the local machine. Even in a high transaction rate call center, there's just not that many bytes flying back and forth. "yes, Mr. Lux, and your account number is? ...." blurp there's 100 bytes to the server in a SQL query, and maybe a kilobyte coming back. 10 seconds pass, "And you'd like the bullion delivered where?" blurp.. after 30 seconds, the operator sends the delivery address with a few hundred bytes to the transaction processor. blurp...100 bytes come back "your confirmation number is 2.71828, Thank you for calling" Then, that triggers a few more kbytes of traffic to the vault and the delivery truck company, etc. But overall, that's what, an average of 1 kb per second, at most? So the call center has 1000 people.. we're up to only 1 Megabit/second. Even if they do complete screen paints at every step over the network, it's still not that much traffic. Some sort of call center where they look at scanned images might be an example of a bigger volume user. "Yes.. I'm looking at your bearer bonds now, and we'll be able to execute that sell order for 100,000 shares MSFT." or, more realistically, "I'm looking through your loan application now and on page 32, there's a problem with the property description you submitted three years ago." or "Yes, Mr. Lux, that IS a big dent that we need to fix in your bumper" But even then, a full screen image is only a few megabytes, at most, unless you're totally profligate with uncompressed 24 bit TIFF images. The big advantage of GigE to the desktop is that when you do send big files (say a full screen image), it takes less time. But the average rate is still low. 
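To put numbers behind that last point, a back-of-envelope sketch (wire rate
only, no TCP/disk/server overhead; the 5 MB image and the 1 kB-every-30-seconds
transaction are illustrative guesses, not measurements):

/* big files get faster with GigE, but the average rate stays tiny */
#include <stdio.h>

int main(void)
{
    double image_bytes = 5e6;                   /* ~5 MB full-screen image   */
    double fe_bps = 100e6, ge_bps = 1e9;        /* 100bT and GigE line rates */

    printf("image over 100bT: %.2f s\n", image_bytes * 8.0 / fe_bps);
    printf("image over GigE : %.2f s\n", image_bytes * 8.0 / ge_bps);

    /* 1000 agents, ~1 kB of query/response every 30 seconds each */
    double aggregate_bps = 1000.0 * 1000.0 * 8.0 / 30.0;
    printf("call-center aggregate: ~%.2f Mb/s\n", aggregate_bps / 1e6);
    return 0;
}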
>regards, mark hahn. James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From deadline at eadline.org Thu Feb 1 18:34:46 2007 From: deadline at eadline.org (Douglas Eadline) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070201162503.02d3be80@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> <6.2.3.4.2.20070201162503.02d3be80@mail.jpl.nasa.gov> Message-ID: <48680.192.168.1.1.1170383686.squirrel@mail.eadline.org> --snip-- > There is some > statistical multiplexing possible (TASI) because people don't talk at > 100% duty cycle, but not a huge amount. You have not met my ... (never mind, you never know where these emails end up) -- Doug From James.P.Lux at jpl.nasa.gov Thu Feb 1 19:31:57 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <48680.192.168.1.1.1170383686.squirrel@mail.eadline.org> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> <6.2.3.4.2.20070201162503.02d3be80@mail.jpl.nasa.gov> <48680.192.168.1.1.1170383686.squirrel@mail.eadline.org> Message-ID: <6.2.3.4.2.20070201193107.02d30f48@mail.jpl.nasa.gov> At 06:34 PM 2/1/2007, Douglas Eadline wrote: > --snip-- > > > There is some > > statistical multiplexing possible (TASI) because people don't talk at > > 100% duty cycle, but not a huge amount. > >You have not met my ... > >(never mind, you never know where these emails end up) Hah.. I need 5 Mbps at home, just to keep with the traffic on this list. Talk about 100% duty cycle. Jim From gdjacobs at gmail.com Fri Feb 2 00:57:34 2007 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> Message-ID: <45C2FCFE.7020303@gmail.com> Jim Lux wrote: > (If I go with the FiOS offering though, that may prompt > some re-evaluation) Why? Only a third of the bandwidth of fast ethernet at peak speeds (which you aren't going to see). Hell, an rtl8139 could handle that. > Likewise, a small business with half a dozen or a dozen desktops and a > couple servers isn't going to see a huge benefit from faster networking, > because they're throttled by the server's disk speed, more than anything > else. (assuming they're not hosting a big website, etc.) More likely throttled by the operators. > So, you're looking at GigE making a difference in two areas: replacing > cable TV (all those 20 Mbps HDTV streams) and in big companies. 
But > even in big companies, GigE to the desktop doesn't necessarily buy you > much, if you're all competing for the same server resources. Certain areas, such as digital video content development, are much more accessible with high speed interconnect going commodity. However, very few companies have the concentrated, high volume databases which would really tax a network. > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 -- Geoffrey D. Jacobs From rgb at phy.duke.edu Fri Feb 2 04:29:06 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <6.2.3.4.2.20070201193107.02d30f48@mail.jpl.nasa.gov> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> <6.2.3.4.2.20070201162503.02d3be80@mail.jpl.nasa.gov> <48680.192.168.1.1.1170383686.squirrel@mail.eadline.org> <6.2.3.4.2.20070201193107.02d30f48@mail.jpl.nasa.gov> Message-ID: On Thu, 1 Feb 2007, Jim Lux wrote: > At 06:34 PM 2/1/2007, Douglas Eadline wrote: > >> --snip-- >> >> > There is some >> > statistical multiplexing possible (TASI) because people don't talk at >> > 100% duty cycle, but not a huge amount. >> >> You have not met my ... >> >> (never mind, you never know where these emails end up) > > > Hah.. I need 5 Mbps at home, just to keep with the traffic on this list. > > Talk about 100% duty cycle. Aw, c'mon, I don't type THAT fast... rgb > > > Jim > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From James.P.Lux at jpl.nasa.gov Fri Feb 2 06:19:47 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] massive parallel processing application required In-Reply-To: <45C2FCFE.7020303@gmail.com> References: <45BE8E7E.4010808@brookes.ac.uk> <45C04FE9.5050502@streamline-computing.com> <45C07AE4.1020508@brookes.ac.uk> <45C14358.8030800@brookes.ac.uk> <45C1542A.4030701@tamu.edu> <45C1DC42.90604@brookes.ac.uk> <6.2.3.4.2.20070201132817.030f61e8@mail.jpl.nasa.gov> <45C2FCFE.7020303@gmail.com> Message-ID: <6.2.3.4.2.20070202061200.030f4090@mail.jpl.nasa.gov> At 12:57 AM 2/2/2007, Geoff Jacobs wrote: >Jim Lux wrote: > > > (If I go with the FiOS offering though, that may prompt > > some re-evaluation) >Why? Only a third of the bandwidth of fast ethernet at peak speeds >(which you aren't going to see). Hell, an rtl8139 could handle that. > > > Likewise, a small business with half a dozen or a dozen desktops and a > > couple servers isn't going to see a huge benefit from faster networking, > > because they're throttled by the server's disk speed, more than anything > > else. (assuming they're not hosting a big website, etc.) >More likely throttled by the operators. The operators of the desktops, I assume. The business offerings have commited information rates, etc. 
> > So, you're looking at GigE making a difference in two areas: replacing > > cable TV (all those 20 Mbps HDTV streams) and in big companies. But > > even in big companies, GigE to the desktop doesn't necessarily buy you > > much, if you're all competing for the same server resources. >Certain areas, such as digital video content development, are much more >accessible with high speed interconnect going commodity. However, very >few companies have the concentrated, high volume databases which would >really tax a network. One comment the guy from ATT made back in the 90s was that it's impossible to predict what really might happen when you do have real ubiquitous high speed access to the desktop (which is only just now becoming available, in the sense that the network connection is faster than the disk or CPU). It's that paradigm shift thing. The current software model and the conceptual models of the vast majority of application developers (or users who want things done) tends to be framed by the assumption that network access is slow and/or expensive(hence my comment about having everything locally) If you have a very fat, low latency, cheap pipe, all of a sudden, there are classes of applications (some of which we, by definition, cannot anticipate) that might become possible. For instance, the vision of "per use pricing" for office tools with very thin clients becomes possible. With a fat pipe, you could go back to the 60s timesharing model, with the desktop being just a display and a keyboard. Jim From rgb at phy.duke.edu Mon Feb 5 04:04:32 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] wulfware update... Message-ID: In case anybody on list currently cares, I spent the weekend repackaging the wulfware suite (not to be confused with the warewulf suite:-): xmlsysd, libwulf, wulfstat, wulflogger, and wulf2html (previously wulfweb). They are now in a single source tree in a single source rpm or tarball, which builds rpms for each of these packages all at once. I also worked on making wulf2html into a chkconfig controllable service. Basically, if you install it and configure it (edit scripts and the wulfhosts file in /etc/wulfware) on a system that can write to webspace, you can chkconfig it on and it will automatically start up on boot. It's probably not the most robust application of this sort ever written yet but it works automagically for me -- it comes up with a page that shows localhost only by default. This repackaging should make it easier to develop UIs in a single tree that also contains the library, even on systems that don't have the rpms installed. It was a bit of a pain to work on the library and a UI for testing it at the same time. Hopefully this will facilitate my work (long suspended) on gwulfstat. The new one-stop shop link is: http://www.phy.duke.edu/~rgb/Beowulf/wulfware.php and the old links have gone away. I bumped the revision numbers to a notch above the highest number in the collective tree so that the rpms can be dropped into a yum repo and update happily -- from now on all numbers will advance together as a unit. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From walid.shaari at gmail.com Mon Feb 5 09:48:33 2007 From: walid.shaari at gmail.com (Walid) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] failure rates In-Reply-To: <45C248F6.6080807@dcc.ufmg.br> References: <45C248F6.6080807@dcc.ufmg.br> Message-ID: Hi, I do not know if i can help answering the original question really. but most of the failures we see from the system side are in that order hard disks interconnect cards misconfigured node Uncorrected Memory errors system board failures Unexplainable failures failures related to the application itself we do not see them as the user will resubmit his job and will correct their mistakes quietly. The question is cluster by definition are not highly available systems, they are made up of commodity hardware, and if most of these clusters are using the standard mpi implementation then they will work on the principle if it fails stop. and in most of the time failure investigation is minimal as the importance is getting the node back to work. so is failure rate really of concern? if it was so we would see more of fault tolerance layers in clusters and failure rate metrics in monitoring tools and reports. I am interested in reducing these failure rates as user demands are growing instead of using few nodes, now they are using as much as possible and requesting for even more, and the more you give them, the more failures we will get! What will you be trying to achieve with your thesis? will the question of how the reduce or manage the failures be part of it? regards Walid. From i.kozin at dl.ac.uk Tue Feb 6 10:57:23 2007 From: i.kozin at dl.ac.uk (Kozin, I (Igor)) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] benchmarking database Message-ID: Dear All, we have launched recently an on-line benchmarking database http://www.cse.clrc.ac.uk/disco/dbd/ The emphasis is on clusters but not exclusively so. At the moment it has two simple interfaces: one for applications http://www.cse.clrc.ac.uk/disco/dbd/search-parallel.php and another for communication benchmarks IMB/PMB http://www.cse.clrc.ac.uk/disco/dbd/search-pmb.php Internally the database treats everything equally. The most basic unit is an independent "processing element" (PE) which can be a single-core CPU, a core in a multi-core CPU, GPU, cell or whatever. PEs can be oversubscribed ie run more than one thread (e.g. when HT or SMT is enabled). PEs aggregate into nodes between which communication takes place via some sort of interconnect. Application performance is compared against the same number of PEs. Hopefully we will improve the interface eventually and grow the number of applications and benchmarks. All your feedback is highly appreciated. If you would like to share your benchmarking data please contact me off the list. We are happy to accommodate results from trusted sources. Regards, Igor I. Kozin (i.kozin at dl.ac.uk) CCLRC Daresbury Laboratory, WA4 4AD, UK skype: in_kozin tel: +44 (0) 1925 603308 http://www.cse.clrc.ac.uk/disco From mathog at caltech.edu Tue Feb 6 11:08:41 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 Message-ID: The Tyan S2466N-4M motherboards bite once again. For various reasons I need to upgrade the software on our cluster from Mandrake 10.1 to something a bit more modern, so I tried Mandriva 2007. 
Wiped / and /boot on a test node, did a clean install of Mandriva 2007, and pretty much everything worked as it should. Unfortunately this resurrected the old problem where "poweroff" leaves the machine in a dead state: it doesn't respond to the front panel button until the power is unplugged, 20 seconds pass, and the power is restored. This problem was resolved the first time it showed up many years ago by upgrading to BIOS 4.06. There's no newer BIOS, so that isn't going to fix it this time. It isn't a Mandriva 2007 problem per se because we have another machine (a very old Athlon 850 with a Gigabyte motherboard) running that OS and it does "poweroff" correctly. The two machines (poweroff working and not working) have exactly the same versions of every RPM package. LILO is pretty basic on both of them too: image=/boot/vmlinuz label="linux" root=/dev/hda5 initrd=/boot/initrd.img append="resume=/dev/hda2" I suppose I could try a vanilla kernel next, but maybe there's some way to diagnose what state (S5, S0, whatever) the machine is going to on poweroff, and why? The documentation for ACPI itself is humongous, and for the linux implementations essentially absent, so I don't know what tool to run to find or modify this info. Interestingly /proc/acpi/sleep is missing on Mandriva 2007, but that doesn't seem to hurt anything on the working machine, so maybe that's (yet another) change in the ACPI interface? Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Tue Feb 6 11:34:02 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 In-Reply-To: References: Message-ID: <45C8D82A.2090101@scalableinformatics.com> David Mathog wrote: > The Tyan S2466N-4M motherboards bite once again. > > For various reasons I need to upgrade the software on our cluster > from Mandrake 10.1 to something a bit more modern, so I tried > Mandriva 2007. Wiped / and /boot on a test node, did > a clean install of Mandriva 2007, and pretty much > everything worked as it should. Unfortunately > this resurrected the old problem where "poweroff" leaves the machine > in a dead state: it doesn't respond to the front panel button > until the power is unplugged, 20 seconds pass, and the power is > restored. This problem was resolved the first time it showed u Hi Dave: We have seen this on lots of Tyan boards in general. Kind of hard to recommend steering clear if you have a room full of them. > p > many years ago by upgrading to BIOS 4.06. There's no newer BIOS, > so that isn't going to fix it this time. It isn't a Mandriva 2007 > problem per se because we have another machine (a very old > Athlon 850 with a Gigabyte motherboard) running that OS and it > does "poweroff" correctly. The two machines (poweroff working and > not working) have exactly the same versions of every RPM package. > LILO is pretty basic on both of them too: > > > image=/boot/vmlinuz > label="linux" > root=/dev/hda5 > initrd=/boot/initrd.img > append="resume=/dev/hda2" I seem to remember having to do a noacpi option to make them behave. Something about acpi on these boards were horribly broken. FWIW we have seen this with a few late model Opteron (Tyan) boards as well. 
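If I recall correctly it was the kernel-side ACPI we ended up disabling, i.e.
something along the lines of the stanza below (from memory and untested here;
on some boards it may have been pci=noacpi rather than acpi=off):

image=/boot/vmlinuz
        label="linux"
        root=/dev/hda5
        initrd=/boot/initrd.img
        append="resume=/dev/hda2 acpi=off"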
:( -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From ballen at gravity.phys.uwm.edu Tue Feb 6 12:17:03 2007 From: ballen at gravity.phys.uwm.edu (Bruce Allen) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] important WDC firmware update Message-ID: Here is an important firmware update for WDC WDXXXXYS series drives on RAID controllers. Without this update you will see period drive dropouts and rebuilds on the RAID sets. We've been seeing this a lot with some Areca controllers; I am hoping that this firmware update will fix the problem. http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1493&p_created=1168299631&p_sid=La1EsAti&p_accessibility=0&p_redirect=&p_lva=&p_sp=cF9zcmNoPTEmcF9zb3J0X2J5PSZwX2dyaWRzb3J0PSZwX3Jvd19jbnQ9MjImcF9wcm9kcz0mcF9jYXRzPSZwX3B2PSZwX2N2PSZwX3NlYXJjaF90eXBlPXNlYXJjaF9mbmwmcF9wYWdlPTEmcF9zZWFyY2hfdGV4dD1maXJtd2FyZQ**&p_li=&p_topview=1 Cheers, Bruce From jlb17 at duke.edu Tue Feb 6 12:34:29 2007 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] important WDC firmware update In-Reply-To: References: Message-ID: On Tue, 6 Feb 2007 at 2:17pm, Bruce Allen wrote > Here is an important firmware update for WDC WDXXXXYS series drives on RAID > controllers. Without this update you will see period drive dropouts and > rebuilds on the RAID sets. We've been seeing this a lot with some Areca > controllers; I am hoping that this firmware update will fix the problem. Again!? You'd think they would have learned from last time. http://www.3ware.com/KB/article.aspx?id=10240 Note that more drives than just those mentioned in that link were affected, and note the date -- 2003. *sigh* What's old is new again, apparently. Thanks for the heads up. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From ballen at gravity.phys.uwm.edu Tue Feb 6 12:40:14 2007 From: ballen at gravity.phys.uwm.edu (Bruce Allen) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] important WDC firmware update In-Reply-To: References: Message-ID: On Tue, 6 Feb 2007, Joshua Baker-LePain wrote: > On Tue, 6 Feb 2007 at 2:17pm, Bruce Allen wrote > >> Here is an important firmware update for WDC WDXXXXYS series drives on RAID >> controllers. Without this update you will see period drive dropouts and >> rebuilds on the RAID sets. We've been seeing this a lot with some Areca >> controllers; I am hoping that this firmware update will fix the problem. > > Again!? You'd think they would have learned from last time. > > http://www.3ware.com/KB/article.aspx?id=10240 > > Note that more drives than just those mentioned in that link were affected, > and note the date -- 2003. *sigh* What's old is new again, apparently. > > Thanks for the heads up. You're welcome! The old problem was the infamous acoustic noise reduction setting. Here I think the onlly change needed was to modify the default value of the firmware setting, which could also have been done with hdparm. The new problem seems to be related to the SMART auto offline test which the drive periodically runs to update its SMART data. But this is just an educated guess based on what WDC has written in their FAQ. Cheers, Bruce From rgb at phy.duke.edu Tue Feb 6 18:33:52 2007 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 In-Reply-To: References: Message-ID: On Tue, 6 Feb 2007, David Mathog wrote: > The Tyan S2466N-4M motherboards bite once again. > > I suppose I could try a vanilla kernel next, but maybe there's some > way to diagnose what state (S5, S0, whatever) the machine is going > to on poweroff, and why? The documentation for ACPI itself is > humongous, and for the linux implementations essentially absent, > so I don't know what tool to run to find or modify this info. > Interestingly /proc/acpi/sleep is missing on Mandriva 2007, but that > doesn't seem to hurt anything on the working machine, so maybe that's > (yet another) change in the ACPI interface? Basically, good luck. This is why we left our 2466N's running RH 7.3 basically "forever". They were so damn touchy and difficult to get running so that they actually were stable and so that the buttons worked and so on that once we finally got there, I'd have taken a hammer to the head of anybody that tried to change them. Besides, they worked. Quite well and all the time. They were isolated so kernel security wasn't a major issue, so why change? Just put back the old OS image. Or is there some specific thing that you need to do that you can't on the old kernels? rgb > > Thanks, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Tue Feb 6 19:06:15 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] important WDC firmware update In-Reply-To: References: Message-ID: On Tue, 6 Feb 2007, Bruce Allen wrote: >>> Here is an important firmware update for WDC WDXXXXYS series drives on >>> RAID controllers. Without this update you will see period drive dropouts ... > The new problem seems to be related to the SMART auto offline test which the > drive periodically runs to update its SMART data. But this is just an > educated guess based on what WDC has written in their FAQ. I take it it isn't an issue with md raid? Linux can monitor via smartd and not get confused, we can hope? rgb > > Cheers, > Bruce > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From i.kozin at dl.ac.uk Wed Feb 7 02:59:13 2007 From: i.kozin at dl.ac.uk (Kozin, I (Igor)) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] benchmarking database In-Reply-To: Message-ID: Apologies for an error. It was pointed out to me that the 2nd and 3rd links are incorrect. They should read as http://www.cse.clrc.ac.uk/disco/database/search-parallel.php http://www.cse.clrc.ac.uk/disco/database/search-pmb.php respectively. You may have found them from the main page away. 
> -----Original Message----- > From: beowulf-bounces@beowulf.org > [mailto:beowulf-bounces@beowulf.org]On > Behalf Of Kozin, I (Igor) > Sent: 06 February 2007 18:57 > To: Beowulf Mailing List (E-mail) > Subject: [Beowulf] benchmarking database > > > Dear All, > we have launched recently an on-line benchmarking database > http://www.cse.clrc.ac.uk/disco/dbd/ > The emphasis is on clusters but not exclusively so. > At the moment it has two simple interfaces: one for applications > http://www.cse.clrc.ac.uk/disco/dbd/search-parallel.php > and another for communication benchmarks IMB/PMB > http://www.cse.clrc.ac.uk/disco/dbd/search-pmb.php > > Internally the database treats everything equally. > The most basic unit is an independent "processing element" (PE) > which can be a single-core CPU, a core in a multi-core CPU, GPU, cell > or whatever. PEs can be oversubscribed ie run more than one thread > (e.g. when HT or SMT is enabled). PEs aggregate into nodes > between which communication takes place via some sort of interconnect. > Application performance is compared against the same number of PEs. > > Hopefully we will improve the interface eventually and grow the > number of applications and benchmarks. > All your feedback is highly appreciated. > > If you would like to share your benchmarking data please contact > me off the list. We are happy to accommodate results from trusted > sources. > > Regards, > Igor > > I. Kozin (i.kozin at dl.ac.uk) > CCLRC Daresbury Laboratory, WA4 4AD, UK > skype: in_kozin > tel: +44 (0) 1925 603308 > http://www.cse.clrc.ac.uk/disco From mathog at caltech.edu Wed Feb 7 11:15:59 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 Message-ID: > We have seen this on lots of Tyan boards in general. This probably doesn't help, on the problem machine: % cd /tmp % cp /proc/acpi/dsdt . % iasl -d dsdt % iasl -tc dsdt.dsl Intel ACPI Component Architecture ASL Optimizing Compiler version 20060707 [Sep 8 2006] Copyright (C) 2000 - 2006 Intel Corporation Supports ACPI Specification Revision 3.0a dsdt.dsl 234: Store (Local0, Local0) Error 4049 - ^ Method local variable is not initialized (Local0) dsdt.dsl 239: Store (Local0, Local0) Error 4049 - ^ Method local variable is not initialized (Local0) dsdt.dsl 244: Store (Local0, Local0) Error 4049 - ^ Method local variable is not initialized (Local0) dsdt.dsl 295: Method (\_WAK, 1, NotSerialized) Warning 1079 - ^ Reserved method must return a value (_WAK) dsdt.dsl 309: Store (Local0, Local0) Error 4049 - ^ Method local variable is not initialized (Local0) dsdt.dsl 314: Store (Local0, Local0) Error 4049 - ^ Method local variable is not initialized (Local0) ASL Input: dsdt.dsl - 2550 lines, 85340 bytes, 671 keywords Compilation complete. 5 Errors, 1 Warnings, 0 Remarks, 346 Optimizations The _WAK warning is suspicious but I see that on other machines where the powerbutton does work, so that alone is not sufficient to cause the permanent poweroff. There's a note somewhere that at least older versions of linux ACPI did not check the return value in any case. However the 5 instances where uninitialized variables are used would go a long way towards explaining the flakiness of this Tyan board. That said, to date I've *never* seen a BIOS whose DSDT could be dumped and then recompiled cleanly. The best so far was a SuperMicro motherboard with only 1 error and 7 warnings. This is what comes, I believe, of 500 page specs like that for ACPI. 
Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From mathog at caltech.edu Wed Feb 7 11:26:42 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 Message-ID: > However the 5 > instances where uninitialized variables are used would go a long > way towards explaining the flakiness of this Tyan board. On second thought, no. I checked these code sections and each instance is like this one: Method (_MSG, 1, NotSerialized) { Store (Local0, Local0) } Apparently they had to put something into the body of the method and used "store a value back onto itself" as a sort of no-op. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From peter.st.john at gmail.com Wed Feb 7 11:59:42 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 In-Reply-To: References: Message-ID: That makes it sound like instead of NOOP, it's "test if value is initialized, and raise an error if not" which may not have been intended. You might try commenting out that line. Peter On 2/7/07, David Mathog wrote: > > > However the 5 > > instances where uninitialized variables are used would go a long > > way towards explaining the flakiness of this Tyan board. > > On second thought, no. I checked these code sections and > each instance is like this one: > > Method (_MSG, 1, NotSerialized) > { > Store (Local0, Local0) > } > > Apparently they had to put something into the body of the method > and used "store a value back onto itself" as a sort of no-op. > > Regards, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070207/bc900307/attachment.html From landman at scalableinformatics.com Wed Feb 7 18:11:50 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] fast compiler question (pathscale/portland group/gcc) Message-ID: <45CA86E6.5030201@scalableinformatics.com> Folks: Rebuilding a code that uses sse2 inlines. Apart from setting up the appropriate include path for the intrinsic headers, are there any magic switches I need to set? I had done this a while ago, and now I am rebuilding someone-elses-code, and trying to remember what I did before. Most interested in gcc/pathscale. Have PGI locally, and others on remote system. Pointers, clues, and larts welcome. 
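For the gcc side, the minimal pattern I'm trying to get building again looks
roughly like the sketch below (illustrative only -- the file name and the
packed-double example are placeholders; -msse2 should only be needed on 32 bit
targets, since gcc on x86_64 enables SSE2 by default):

/* sse2_sketch.c -- minimal SSE2 intrinsics compile check (illustrative)
 * 32 bit : gcc -O2 -msse2 -o sse2_sketch sse2_sketch.c
 * x86_64 : gcc -O2 -o sse2_sketch sse2_sketch.c
 */
#include <emmintrin.h>   /* SSE2 intrinsics live here */
#include <stdio.h>

int main(void)
{
    __m128d a = _mm_set_pd(1.0, 2.0);
    __m128d b = _mm_set_pd(3.0, 4.0);
    __m128d c = _mm_add_pd(a, b);        /* packed double add */

    double out[2];
    _mm_storeu_pd(out, c);
    printf("%f %f\n", out[0], out[1]);   /* expect 6.000000 4.000000 */
    return 0;
}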
Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From mathog at caltech.edu Thu Feb 8 13:18:18 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] RE: S2466 permanent poweroff, round 2 Message-ID: After a lot of work, and much help at the kernel level from Alexey Starikovskiy, the solution turned out to be using chkconfig --del to turn off all of these: acpi, acpid, harddrake, haldaemon, wltool, messagebus, mandi and also to move asus_acpi.ko out of the /lib/modules tree. I have no idea why the asus module was loading (this being a Tyan motherboard) but it was. Along the way, with various combinations of the above services turned on I observed some incredibly bizarre misbehavior on this system. While logged onto the console (not in X11) either "reboot" or "poweroff" would often lock at "Sending all processes the KILL signal...", which is killall5. Once or twice it locked at the message before, "Sending all processes the TERM signal...". In one instance it rebooted and then crashed in the BIOS. With all of these services disabled it seems to run reliably now. Additionally, when acpid was running it was possible to shutdown the system by pushing the front panel button, but then the next "poweroff" would lock at the "KILL signal" message. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From elken at pathscale.com Thu Feb 8 14:07:40 2007 From: elken at pathscale.com (Tom Elken) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] Re: fast compiler question (pathscale/portland group/gcc) Message-ID: <45CB9F2C.6040805@pathscale.com> > Date: Wed, 07 Feb 2007 21:11:50 -0500 > From: Joe Landman > Subject: [Beowulf] fast compiler question (pathscale/portland > group/gcc) > Rebuilding a code that uses sse2 inlines. Apart from setting up the > appropriate include path for the intrinsic headers, are there any magic > switches I need to set? I had done this a while ago, and now I am > rebuilding someone-elses-code, and trying to remember what I did before. > > Most interested in gcc/pathscale. Hi Joe, Regarding PathScale Compilers, I have this from one of our compiler engineers: ----------------------- If the code already uses SSE2 intrinsics, the PathScale compiler does not need any "magic switches" for SSE2 intrinsics to be enabled. Some applications may need a configuration switch like --enable-sse for the application to use sse intrinsics. This is just a configure switch for the application, and not an option for the compiler. 
----------------------- Cheers, Tom -- ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tom Elken Manager, Performance Engineering tom.elken@qlogic.com QLogic Corporation 650.934.8056 System Interconnect Group From dkondo at lri.fr Wed Feb 7 01:40:54 2007 From: dkondo at lri.fr (Derrick Kondo) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] [CFP] EuroPVM/MPI'07 Message-ID: <60ec14620702070140n5eb27a1frf3f68d798b68cfc1@mail.gmail.com> ************************************************************************ *** *** *** CALL FOR PAPERS *** *** *** ************************************************************************ EuroPVM/MPI 2007 14th European PVMMPI Users' Group Meeting Paris, France, September 30 - October 3, 2007 web: http://www.pvmmpi07.org e-mail: chairs@pvmmpi07.org organized by Project Grand-Large (http://grand-large.lri.fr/index.php/Accueil) from INRIA Futurs (http://www-futurs.inria.fr) BACKGROUND AND TOPICS PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) have evolved into the standard interfaces for high-performance parallel programming in the message-passing paradigm. EuroPVM/MPI is the most prominent meeting dedicated to the latest developments of PVM and MPI such as new support tools, implementation and applications using these interfaces. The EuroPVM/MPI meeting naturally encourages discussions of new message-passing and other parallel and distributed programming paradigms beyond MPI and PVM. The 14th European PVM/MPI Users' Group Meeting will be a forum for users and developers of PVM, MPI, and other message-passing programming environments. Through the presentation of contributed papers, vendor presentations, poster presentations and invited talks, attendees will have the opportunity to share ideas and experiences to contribute to the improvement and furthering of message-passing and related parallel programming paradigms. Topics of interest for the meeting include, but are not limited to: * PVM and MPI implementation issues and improvements * Latest extensions to PVM and MPI * PVM and MPI for high-performance computing, clusters and grid environments * New message-passing and hybrid parallel programming paradigms * Interaction between message-passing software and hardware * Fault tolerance in message-passing programs * Performance evaluation of PVM and MPI applications * Tools and environments for PVM and MPI * Algorithms using the message-passing paradigm * Applications in science and engineering based on message-passing This year special emphasis will be put on large-scale issues, such as those related to hardware and interconnect techologies, or the potential or demonstrated shortcomings of PVM or MPI. As in the preceding years, the special session 'ParSim' will focus on numerical simulation for parallel engineering environments. EuroPVM/MPI 2007 will also hold the new 'Outstanding Papers' session introduced in 2006, where the best papers selected by the program committee will be presented. SUBMISSION INFORMATION Contributors are invited to submit a full paper as a PDF (or Postscript) document not exceeding 8 pages in English (2 pages for poster abstracts and Late and Breaking Results). The title page should contain an abstract of at most 100 words and five specific keywords. The paper needs to be formatted according to the Springer LNCS guidelines [2]. The usage of LaTeX for preparation of the contribution as well as the submission in camera ready format is strongly recommended. Style files can be found at the URL [2]. 
New work that is not yet mature for a full paper, short observations, and similar brief announcements are invited for the poster session. Contributions to the poster session should be submitted in the form of a two-page abstract. All these contributions will be fully peer reviewed by the program committee. Submissions to the special session 'Current Trends in Numerical Simulation for Parallel Engineering Environments' (ParSim 2007) are handled and reviewed by the respective session chairs. For more information please refer to the ParSim website [1]. All accepted submissions are expected to be presented at the conference by one of the authors, which requires registration for the conference. IMPORTANT DATES Submission of full papers and poster abstracts May 7th, 2007 Notification of authors June 11th, 2007 Camera-ready papers July 2nd, 2007 Submission of Late and Breaking Results September 15th, 2007 Tutorials September 30th, 2007 Conference October 1st-3rd, 2007 For up-to-date information, visit the conference web site at http//www.pvmmpi07.org. PROCEEDINGS In addition, selected papers of the conference, including those from the 'Outstanding Papers' session, will be considered for publication in a special issue of Parallel Computing in an extended format. GENERAL CHAIR * Jack Dongarra (University of Tennessee) PROGRAM CHAIRS * Franck Cappello (INRIA Futurs) * Thomas Herault (Universite Paris Sud-XI / INRIA Futurs) CONFERENCE VENUE The conference will be held in the historical, cultural and economic center of Paris, the capital of France. The city, which is renowned for its neo-classical architecture, hosts many museums and galleries and has an active nightlife. The symbol of Paris is the 324 metre (1,063 ft) Eiffel Tower on the banks of the Seine. Dubbed "the City of Light" (la Ville Lumiere) since the 19th century, Paris is regarded by many as one of the most beautiful and romantic cities in the world. It is also the most visited city in the world with more than 30 million foreign visitors per year. Paris is easily reachable from any European capital and most of the large European, American and Asian cities. It is an ideal starting point for visiting european institutes and cities. REFERENCES [1] ParSim 2007: http://wwwbode.in.tum.de/Par/arch/events/parsim07/ [2] Springer Guidelines: http://www.springer.de/comp/lncs/authors.html From =?utf-8?Q?Pablo_Hern=C3=A1n_Rodr?= Wed Feb 7 07:10:09 2007 From: =?utf-8?Q?Pablo_Hern=C3=A1n_Rodr?= (=?utf-8?Q?Pablo_Hern=C3=A1n_Rodr?=) Date: Wed Nov 25 01:05:40 2009 Subject: [Beowulf] mpich ch_p4mpd problem.. Message-ID: Hello, my name is Pablo. I'm having problems with MPI. When i execute a MPI program this error ocurs MPI_INIT : MPIRUN chose the wrong device ch_p4; program needs device ch_p4mpd From your post, I believe that you know how to change from using ch_p4 mpd to ch_p4. I'd be glad if you could tell me how did you do that. Thanks Pablo -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/ __________________________________________________ Preguntá. Respondé. Descubrí. Todo lo que querías saber, y lo que ni imaginabas, está en Yahoo! Respuestas (Beta). ¡Probalo ya! 
http://www.yahoo.com.ar/respuestas From wavelet at iutlecreusot.u-bourgogne.fr Wed Feb 7 08:24:10 2007 From: wavelet at iutlecreusot.u-bourgogne.fr (Wavelet colloque) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Call for papers : Wavelet Applications in Industrial Processing V Message-ID: *** Call for Papers and Announcement *** Wavelet Applications in Industrial Processing V (SA109) Part of SPIE?s International Symposium on Optics East 2007 9-12 September 2007 ? Seaport World Trade Center ? Boston, MA, USA --- Abstract Due Date: 26 February 2007 --- --- Manuscript Due Date: 13 August 2007 --- Web site http://spie.org/Conferences/Calls/07/oe/submitAbstract/index.cfm? fuseaction=SA109 Conference Chairs: Fr?d?ric Truchetet, Univ. de Bourgogne (France); Olivier Laligant, Univ. de Bourgogne (France) Program Committee: Patrice Abry, ?cole Normale Sup?rieure de Lyon (France); Radu V. Balan, Siemens Corporate Research; Atilla M. Baskurt, Univ. Claude Bernard Lyon 1 (France); Amel Benazza-Benyahia, Ecole Sup?rieure des Communications de Tunis (Tunisia); Albert Bijaoui, Observatoire de la C?te d'Azur (France); Seiji Hata, Kagawa Univ. (Japan); Henk J. A. M. Heijmans, Ctr. for Mathematics and Computer Science (Netherlands); William S. Hortos, Associates in Communication Engineering Research and Technology; Jacques Lewalle, Syracuse Univ.; Wilfried R. Philips, Univ. Gent (Belgium); Alexandra Pizurica, Univ. Gent (Belgium); Guoping Qiu, The Univ. of Nottingham (United Kingdom); Hamed Sari-Sarraf, Texas Tech Univ.; Peter Schelkens, Vrije Univ. Brussel (Belgium); Paul Scheunders, Univ. Antwerpen (Belgium); Kenneth W. Tobin, Jr., Oak Ridge National Lab.; G?nther K. G. Wernicke, Humboldt-Univ. zu Berlin (Germany); Gerald Zauner, Fachhochschule Wels (Austria) The wavelet transform, multiresolution analysis, and other space- frequency or space-scale approaches are now considered standard tools by researchers in image and signal processing. Promising practical results in machine vision and sensors for industrial applications and non destructive testing have been obtained, and a lot of ideas can be applied to industrial imaging projects. This conference is intended to bring together practitioners, researchers, and technologists in machine vision, sensors, non destructive testing, signal and image processing to share recent developments in wavelet and multiresolution approaches. Papers emphasizing fundamental methods that are widely applicable to industrial inspection and other industrial applications are especially welcome. Papers are solicited but not limited to the following areas: o New trends in wavelet and multiresolution approach, frame and overcomplete representations, Gabor transform, space-scale and space- frequency analysis, multiwavelets, directional wavelets, lifting scheme for: - sensors - signal and image denoising, enhancement, segmentation, image deblurring - texture analysis - pattern recognition - shape recognition - 3D surface analysis, characterization, compression - acoustical signal processing - stochastic signal analysis - seismic data analysis - real-time implementation - image compression - hardware, wavelet chips. o Applications: - machine vision - aspect inspection - character recognition - speech enhancement - robot vision - image databases - image indexing or retrieval - data hiding - image watermarking - non destructive evaluation - metrology - real-time inspection. 
o Applications in microelectronics manufacturing, web and paper products, glass, plastic, steel, inspection, power production, chemical process, food and agriculture, pharmaceuticals, petroleum industry. All submissions will be peer reviewed. Please note that abstracts must be at least 500 words in length in order to receive full consideration. ----------------------------------------- ! Abstract Due Date: 26 February 2007 ! ! Manuscript Due Date: 13 August 2007 ! ----------------------------------------- ------------- Submission of Abstracts for Optics East 2007 Symposium ------------ Abstract Due Date: 26 February 2007 - Manuscript Due Date: 13 August 2007 Abstracts, if accepted, will be distributed at the meeting. * IMPORTANT! - Submissions imply the intent of at least one author to register, attend the symposium, present the paper (either orally or in poster format), and submit a full-length manuscript for publication in the conference Proceedings. - By submitting your abstract, you warrant that all clearances and permissions have been obtained, and authorize SPIE to circulate your abstract to conference committee members for review and selection purposes and if it is accepted, to publish your abstract in conference announcements and publicity. - All authors (including invited or solicited speakers), program committee members, and session chairs are responsible for registering and paying the reduced author, session chair, program committee registration fee. (Current SPIE Members receive a discount on the registration fee.) * Instructions for Submitting Abstracts via Web - You are STRONGLY ENCOURAGED to submit abstracts using the ?submit an abstract? link at: http://spie.org/events/oe - Submitting directly on the Web ensures that your abstract will be immediately accessible by the conference chair for review through MySPIE, SPIE?s author/chair web site. - Please note! When submitting your abstract you must provide contact information for all authors, summarize your paper, and identify the contact author who will receive correspondence about the submission and who must submit the manuscript and all revisions. Please have this information available before you begin the submission process. - First-time users of MySPIE can create a new account by clicking on the create new account link. You can simplify account creation by using your SPIE ID# which is found on SPIE membership cards or the label of any SPIE mailing. - If you do not have web access, you may E-MAIL each abstract separately to: abstracts@spie.org in ASCII text (not encoded) format. There will be a time delay for abstracts submitted via e-mail as they will not be immediately processed for chair review. IMPORTANT! To ensure proper processing of your abstract, the SUBJECT line must include only: SUBJECT: SA109, TRUCHETET, LALIGANT - Your abstract submission must include all of the following: 1. PAPER TITLE 2. AUTHORS (principal author first) For each author: o First (given) Name (initials not acceptable) o Last (family) Name o Affiliation o Mailing Address o Telephone Number o Fax Number o Email Address 3. PRESENTATION PREFERENCE "Oral Presentation" or "Poster Presentation." 4. PRINCIPAL AUTHOR?S BIOGRAPHY Approximately 50 words. 5. ABSTRACT TEXT Approximately 500 words. Accepted abstracts for this conference will be included in the abstract CD-ROM which will be available at the meeting. Please submit only 500-word abstracts that are suitable for publication. 6. KEYWORDS Maximum of five keywords. 
If you do not have web access, you may E-MAIL each abstract separately to: abstracts@spie.org in ASCII text (not encoded) format. There will be a time delay for abstracts submitted via e- mail as they will not be immediately processed for chair review. * Conditions of Acceptance - Authors are expected to secure funding for registration fees, travel, and accommodations, independent of SPIE, through their sponsoring organizations before submitting abstracts. - Only original material should be submitted. - Commercial papers, papers with no new research/development content, and papers where supporting data or a technical description cannot be given for proprietary reasons will not be accepted for presentation in this symposium. - Abstracts should contain enough detail to clearly convey the approach and the results of the research. - Government and company clearance to present and publish should be final at the time of submittal. If you are a DoD contractor, allow at least 60 days for clearance. Authors are required to warrant to SPIE in advance of publication of the Proceedings that all necessary permissions and clearances have been obtained, and that submitting authors are authorized to transfer copyright of the paper to SPIE. * Review, Notification, Program Placement - To ensure a high-quality conference, all abstracts and Proceedings manuscripts will be reviewed by the Conference Chair/Editor for technical merit and suitability of content. Conference Chair/Editors may require manuscript revision before approving publication, and reserve the right to reject for presentation or publication any paper that does not meet content or presentation expectations. SPIE?s decision on whether to accept a presentation or publish a manuscript is final. - Applicants will be notified of abstract acceptance and sent manuscript instructions by e-mail no later than 7 May 2007. Notification of acceptance will be placed on SPIE Web the week of 4 June 2007 at http://spie.org/events/oe - Final placement in an oral or poster session is subject to the Chairs' discretion. Instructions for oral and poster presentations will be sent to you by e-mail. All oral and poster presentations require presentation at the meeting and submission of a manuscript to be included in the Proceedings of SPIE. * Proceedings of SPIE - These conferences will result in full-manuscript Chairs/Editor- reviewed volumes published in the Proceedings of SPIE and in the SPIE Digital Library. - Correctly formatted, ready-to-print manuscripts submitted in English are required for all accepted oral and poster presentations. Electronic submissions are recommended, and result in higher quality reproduction. Submission must be provided in PostScript created with a printer driver compatible with SPIE?s online Electronic Manuscript Submission system. Instructions are included in the author kit and from the ?Author Info? link at the conference website. - Authors are required to transfer copyright of the manuscript to SPIE or to provide a suitable publication license. - Papers published are indexed in leading scientific databases including INSPEC, Ei Compendex, Chemical Abstracts, International Aerospace Abstracts, Index to Scientific and Technical Proceedings and NASA Astrophysical Data System, and are searchable in the SPIE Digital Library. Full manuscripts are available to Digital Library subscribers. 
- Late manuscripts may not be published in the conference Proceedings and SPIE Digital Library, whether the conference volume will be published before or after the meeting. The objective of this policy is to better serve the conference participants as well as the technical community at large, by enabling timely publication of the Proceedings. - Papers not presented at the meeting will not be published in the conference Proceedings, except in the case of exceptional circumstances at the discretion of SPIE and the Conference Chairs/Editors. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070207/4e79c879/attachment.html From dkondo at lri.fr Thu Feb 8 03:25:42 2007 From: dkondo at lri.fr (Derrick Kondo) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] [PCGrid 2007] call for participation: workshop on desktop grids Message-ID: <60ec14620702080325l5f683f10q299c061f05d2dbf3@mail.gmail.com> CALL FOR PARTICIPATION (see advance program below) WORKSHOP ON LARGE-SCALE, VOLATILE DESKTOP GRIDS (PCGRID 2007) held in conjunction with the IEEE International Parallel & Distributed Processing Symposium (IPDPS) March 30, 2007 Long Beach, California U.S.A. http://pcgrid07.lri.fr Desktop grids utilize the free resources available in Intranet or Internet environments for supporting large-scale computation and storage. For over a decade, desktop grids have been one of the largest and most powerful distributed computing systems in the world, offering a high return on investment for applications from a wide range of scientific domains (including computational biology, climate prediction, and high-energy physics). While desktop grids sustain up to Teraflops/second of computing power from hundreds of thousands to millions of resources, fully leveraging the platform's computational power is still a major challenge because of the immense scale, high volatility, and extreme heterogeneity of such systems. The purpose of the workshop is to provide a forum for discussing recent advances and identifying open issues for the development of scalable, fault-tolerant, and secure desktop grid systems. The workshop seeks to bring desktop grid researchers together from theoretical, system, and application areas to identify plausible approaches for supporting applications with a range of complexity and requirements on desktop environments. ##################################################################### ADVANCE PROGRAM (In each session below, the following list of papers will be presented. For the detailed schedule, see http://pcgrid07.lri.fr/program.html) --------------------------------------------------------------------- KEYNOTE SPEAKER: David P. Anderson, Director of BOINC and SETI@home, University of California at Berkeley --------------------------------------------------------------------- SESSION I: SYSTEMS Invited Paper: Open Internet-based Sharing for Desktop Grids in iShare Xiaojuan Ren, Purdue University, U.S.A. Ayon Basumallik, Purdue University, U.S.A. Zhelong Pan, VMWare, Inc., U.S.A. Rudolf Eigenmann, Purdue University, U.S.A. Invited Paper: Decentralized Dynamic Host Configuration in Wide-area Overlay Networks of Virtual Workstations Arijit Ganguly, University of Florida, U.S.A. David I. Wolinsky, University of Florida, U.S.A. P. Oscar Boykin, University of Florida, U.S.A. Renato J. Figueiredo, University of Florida, U.S.A. 
SZTAKI Desktop Grid: a Modular and Scalable Way of Building Large Computing Grids Zoltan Balaton, MTA SZTAKI Research Institute, Hungary Gabor Gombas, MTA SZTAKI Research Institute, Hungary Peter Kacsuk, MTA SZTAKI Research Institute, Hungary Adam Kornafeld, MTA SZTAKI Research Institute, Hungary Jozsef Kovacs, MTA SZTAKI Research Institute, Hungary Attila Csaba Marosi, MTA SZTAKI Research Institute, Hungary Gabor Vida, MTA SZTAKI Research Institute, Hungary Norbert Podhorszki, UC Davis, U.S.A. Tamas Kiss, University of Westminster, U.K. Direct Execution of Linux Binary on Windows for Grid RPC Workers Yoshifumi Uemura, University of Tsukuba, Japan Yoshihiro Nakajima, University of Tsukuba, Japan Mitsuhisa Sato, University of Tsukuba, Japan --------------------------------------------------------------------- SESSION II: SCHEDULING AND RESOURCE MANAGEMENT Local Scheduling for Volunteer Computing David Anderson, UC Berkeley, U.S.A. John McLeod VII, Sybase, Inc., U.S.A. Moving Volunteer Computing towards Knowledge-Constructed, Dynamically-Adaptive Modeling and Scheduling Michela Taufer, University of Texas at El Paso, U.S.A. Andre Kerstens, University of Texas at El Paso, U.S.A. Trilce Estrada, University of Texas at El Paso, U.S.A. David Flores, University of Texas at El Paso, U.S.A. Richard Zamudio, University of Texas at El Paso, U.S.A. Patricia Teller, University of Texas at El Paso, U.S.A. Roger Armen, The Scripps Research Institute, U.S.A. Charles L. Brooks III, The Scripps Research Institute, U.S.A. Proxy-based Grid Information Dissemination Deger Erdil, State University of New York at Binghamton, U.S.A. Michael Lewis, State University of New York at Binghamton, U.S.A. Nael Abu-Ghazaleh, State University of New York at Binghamton, U.S.A. --------------------------------------------------------------------- SESSION III: DATA-INTENSIVE APPLICATIONS AND DISTRIBUTED STORAGE Challenges in Executing Data Intensive Biometric Workloads on a Desktop Grid Christopher Moretti, University of Notre Dame, U.S.A. Timothy Faltemier, University of Notre Dame, U.S.A. Douglas Thain, University of Notre Dame, U.S.A. Patrick Flynn, University of Notre Dame, U.S.A. Invited Paper: Storage@home: Petascale Distributed Storage Adam L. Beberg, Stanford University, U.S.A. Vijay Pande, Stanford University, U.S.A. --------------------------------------------------------------------- SESSION IV: THEORY Applying IC-Scheduling Theory to Familiar Classes of Computations Gennaro Cordasco, University of Salerno, Italy Grzegorz Malewicz, Google, Inc., U.S.A. Arnold Rosenberg, University of Massachusetts at Amherst, U.S.A. Invited Paper: A Combinatorial Model for Self-Organizing Networks Yuri Dimitrov, Ohio State University, U.S.A. Gennaro Mango, Ohio State University, U.S.A. Carlo Giovine, Ohio State University, U.S.A. Mario Lauria, Ohio State University, U.S.A. Invited Paper: Towards Contracts & SLA in Large Scale Clusters & Desktops Grids Denis Caromel, INRIA, France Francoise Baude, INRIA, France Alexandre di Costanzo, INRIA, France Christian Delbe, INRIA, France Mario Leyton, INRIA, France ##################################################################### ORGANIZATION General Chairs Derrick Kondo, INRIA Futurs, France Franck Cappello, INRIA Futurs, France Program Chair Gilles Fedak, INRIA Futurs, France From ppk at ats.ucla.edu Thu Feb 8 13:37:19 2007 From: ppk at ats.ucla.edu (Korambath, Prakashan) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] New Daylight savings times rules and Linux OS updates? 
References: Message-ID: <43F64E86355A744E9D51506B6C6783B9014CE04C@EM2.ad.ucla.edu> As per the Energy Policy Act of 2005 (H.R.6.ENR), new daylight saving will start on the second Sunday of March and end in first Sunday of November starting 2007. I was just wondering whether there is any update to reflect this change for Linux OS machines running different versions of Fedora Core. Thanks. Prakashan Korambath -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070208/92781969/attachment.html From brian.ropers.huilman at gmail.com Thu Feb 8 18:02:37 2007 From: brian.ropers.huilman at gmail.com (Brian D. Ropers-Huilman) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] New Daylight savings times rules and Linux OS updates? In-Reply-To: <43F64E86355A744E9D51506B6C6783B9014CE04C@EM2.ad.ucla.edu> References: <43F64E86355A744E9D51506B6C6783B9014CE04C@EM2.ad.ucla.edu> Message-ID: I was just presented with the answer to this question recently, so it's on my mind. You can use zdump -v on the /etc/localtime file, or wherever it points and grep for 2007. If you see March and November, you're fine. Most modern distributions are updated for this already. FC6 and FC5 specifically are fine. FC4 needs the tzdata-2005m-1.fc4 and FC3 needs the tzdata-2005m-1.fc3 RPMs installed to be updated. Google can help on this one too. -- Brian D. Ropers-Huilman On 2/8/07, Korambath, Prakashan wrote: > > As per the Energy Policy Act of 2005 (H.R.6.ENR), new daylight saving will > start on the second Sunday of March and end in first Sunday of November > starting 2007. I was just wondering whether there is any update to reflect > this change for Linux OS machines running different versions of Fedora Core. > Thanks. > > Prakashan Korambath > > -- Brian D. Ropers-Huilman From hahn at mcmaster.ca Fri Feb 9 07:27:14 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: <20070209143941.GA10394@zresearch.com> References: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> <20070209143941.GA10394@zresearch.com> Message-ID: >> nice graph. but how does it look if you compare a single glusterfs >> brick with a single NFS brick? > > The purpose of glusterfs has never been to beat NFS in a point to point > throughput competition, sure. but my point is that comparing some large number of servers under protocol X to a single server under protocol Y is not all that meaningful. > since in real world there are a lot of requests > happening in parallel and it is more important to achieve a higher > aggregated bandwidth. surely a single glusterfs brick can handle more than one request at a time, though... > That being said, it is worthy to note that glusterfs is still better than > NFS in point-to-point (single NFS brick vs single glusterfs brick). > > On Gig/E - both nfs and glusterfs peak on the link speed for read. for write > glusterfs peaks on the link speed, but nfs did not that's odd, and indicates that the nfs config you tested was hitting disk limits. and unfortunately, that makes the comparison even less comprehensible. looking at the config again, it appears that the node might have just a single disk, which would make the results quite expected. > On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and ib-verbs, > from the source repository) and is clearly way faster than NFS. "clearly"s like that make me nervous. 
to an IB enthusiast, SDP may be more aesthetically pleasing, but why do you think IPoIB should be noticably slower than SDP? lower cpu overhead, probably, but many people have no problem running IP at wirespeed on IB/10GE-speed wires... From coutinho at dcc.ufmg.br Fri Feb 9 12:41:44 2007 From: coutinho at dcc.ufmg.br (Bruno Rocha Coutinho) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement Message-ID: <45CCDC88.8080102@dcc.ufmg.br> As glusterfs is a parallel filesystem, I think that a more valuable experiment is comparing it against another parallel filesystem, like pvfs2 or lustre, in a distributed environment. This could show the performance of glusterfs in its intended setting. 2007/2/9, Mark Hahn : >> nice graph. but how does it look if you compare a single glusterfs >> brick with a single NFS brick? > > The purpose of glusterfs has never been to beat NFS in a point to point > throughput competition, sure. but my point is that comparing some large number of servers under protocol X to a single server under protocol Y is not all that meaningful. > since in real world there are a lot of requests > happening in parallel and it is more important to achieve a higher > aggregated bandwidth. surely a single glusterfs brick can handle more than one request at a time, though... > That being said, it is worthy to note that glusterfs is still better than > NFS in point-to-point (single NFS brick vs single glusterfs brick). > > On Gig/E - both nfs and glusterfs peak on the link speed for read. for write > glusterfs peaks on the link speed, but nfs did not that's odd, and indicates that the nfs config you tested was hitting disk limits. and unfortunately, that makes the comparison even less comprehensible. looking at the config again, it appears that the node might have just a single disk, which would make the results quite expected. > On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and ib-verbs, > from the source repository) and is clearly way faster than NFS. "clearly"s like that make me nervous. to an IB enthusiast, SDP may be more aesthetically pleasing, but why do you think IPoIB should be noticably slower than SDP? lower cpu overhead, probably, but many people have no problem running IP at wirespeed on IB/10GE-speed wires... _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From avati at zresearch.com Fri Feb 9 06:39:41 2007 From: avati at zresearch.com (Anand Avati) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: References: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> Message-ID: <20070209143941.GA10394@zresearch.com> > nice graph. but how does it look if you compare a single glusterfs > brick with a single NFS brick? The purpose of glusterfs has never been to beat NFS in a point to point throughput competition, since in real world there are a lot of requests happening in parallel and it is more important to achieve a higher aggregated bandwidth. That being said, it is worthy to note that glusterfs is still better than NFS in point-to-point (single NFS brick vs single glusterfs brick). On Gig/E - both nfs and glusterfs peak on the link speed for read. 
for write glusterfs peaks on the link speed, but nfs did not On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and ib-verbs, from the source repository) and is clearly way faster than NFS. avati -- Shaw's Principle: Build a system that even a fool can use, and only a fool will want to use it. From avati at zresearch.com Fri Feb 9 10:03:54 2007 From: avati at zresearch.com (Anand Avati) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: References: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> <20070209143941.GA10394@zresearch.com> Message-ID: <20070209180354.GA399@zresearch.com> > > that's odd, and indicates that the nfs config you tested was hitting > disk limits. and unfortunately, that makes the comparison even less > comprehensible. looking at the config again, it appears that the node > might have just a single disk, which would make the results quite > expected. all tests were conducted on the same hardware. a point-to-point (single server, single client) write over NFS on Gig/E did not peak the link throughput. on the same hardware and network, glusterfs write peaks the link speed. > >On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and > >ib-verbs, > >from the source repository) and is clearly way faster than NFS. > > "clearly"s like that make me nervous. to an IB enthusiast, SDP may be > more aesthetically pleasing, but why do you think IPoIB should be > noticably > slower than SDP? in a general sense, filesystem throughput is related to link latency, since applications (unless doing AIO) issue the next read/write _after_ the current one completes. having writeback and readaheads help solve the problem to a certain extent, but, in general for filesystems lowlatency transports surely helps. > lower cpu overhead, probably, but many people have no > problem running IP at wirespeed on IB/10GE-speed wires... none of those problems, its about latency. SDP has a lot less latency than IPoIB. avati -- Shaw's Principle: Build a system that even a fool can use, and only a fool will want to use it. From kevin.ball at qlogic.com Fri Feb 9 13:08:11 2007 From: kevin.ball at qlogic.com (Kevin Ball) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: References: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> <20070209143941.GA10394@zresearch.com> Message-ID: <1171055290.8612.82.camel@ammonite> Hi Mark, > > > On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and ib-verbs, > > from the source repository) and is clearly way faster than NFS. > > "clearly"s like that make me nervous. to an IB enthusiast, SDP may be > more aesthetically pleasing, but why do you think IPoIB should be noticably > slower than SDP? lower cpu overhead, probably, but many people have no > problem running IP at wirespeed on IB/10GE-speed wires... As I understand it, one reason why SDP is faster than IPoIB is that the way IPoIB is currently spec'ed requires there be an extra copy relative to SDP. It is also specced with a smaller MTU, which makes a fair difference. I believe there is movement afoot to change the spec to allow for a larger MTU, but I'm not an IB expert and don't follow it religiously. 
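
For readers who want to put rough numbers on the IPoIB-versus-SDP latency question themselves, a small TCP ping-pong test is enough. The sketch below is a generic example (the port number, iteration count, and the idea of running it through OFED's libsdp LD_PRELOAD shim are assumptions, not details from this thread): run it once across the IPoIB interface and once with the SDP shim preloaded on both ends, and compare the reported round-trip times.

/* Minimal TCP ping-pong latency sketch.
 * Server:  ./pingpong
 * Client:  ./pingpong <server IPv4 address>
 * To try SDP, assuming OFED's libsdp preload shim is installed, run both
 * ends with something like LD_PRELOAD=libsdp.so in the environment. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define PORT  18515
#define ITERS 10000

static double now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(int argc, char **argv)
{
    char buf[1] = { 'x' };
    int one = 1, s, i;

    if (argc == 1) {                          /* server side               */
        struct sockaddr_in a = { 0 };
        int l = socket(AF_INET, SOCK_STREAM, 0);
        a.sin_family = AF_INET;
        a.sin_port = htons(PORT);
        a.sin_addr.s_addr = INADDR_ANY;
        setsockopt(l, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        bind(l, (struct sockaddr *)&a, sizeof(a));
        listen(l, 1);
        s = accept(l, NULL, NULL);
        setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        for (i = 0; i < ITERS; i++) {         /* echo each byte back       */
            if (recv(s, buf, 1, MSG_WAITALL) != 1) break;
            send(s, buf, 1, 0);
        }
    } else {                                  /* client side               */
        struct sockaddr_in a = { 0 };
        double t0, t1;
        s = socket(AF_INET, SOCK_STREAM, 0);
        a.sin_family = AF_INET;
        a.sin_port = htons(PORT);
        inet_pton(AF_INET, argv[1], &a.sin_addr);
        if (connect(s, (struct sockaddr *)&a, sizeof(a)) != 0) {
            perror("connect");
            return 1;
        }
        setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
        t0 = now_us();
        for (i = 0; i < ITERS; i++) {         /* one byte out, one back    */
            send(s, buf, 1, 0);
            recv(s, buf, 1, MSG_WAITALL);
        }
        t1 = now_us();
        printf("avg round trip: %.1f usec\n", (t1 - t0) / ITERS);
    }
    close(s);
    return 0;
}
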
-Kevin > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Fri Feb 9 15:21:15 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: <1171055290.8612.82.camel@ammonite> References: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> <20070209143941.GA10394@zresearch.com> <1171055290.8612.82.camel@ammonite> Message-ID: >>> On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and ib-verbs, >>> from the source repository) and is clearly way faster than NFS. >> >> "clearly"s like that make me nervous. to an IB enthusiast, SDP may be >> more aesthetically pleasing, but why do you think IPoIB should be noticably >> slower than SDP? lower cpu overhead, probably, but many people have no >> problem running IP at wirespeed on IB/10GE-speed wires... > > As I understand it, one reason why SDP is faster than IPoIB is that the > way IPoIB is currently spec'ed requires there be an extra copy relative > to SDP. that's what I meant by "cpu overhead". but the point is that current CPUs have 10-20 GB/s of memory bandwidth hanging around, so it's not necessarily much of a win to avoid a copy. even in olden days, it was common to show some workloads where hosts doing TCP checksumming actually _benefited_ performance by populating the cache. > It is also specced with a smaller MTU, which makes a fair > difference. I believe there is movement afoot to change the spec to > allow for a larger MTU, but I'm not an IB expert and don't follow it > religiously. MTU is another one of those things that got a rep for importance, but which is really only true in certain circumstances. bigger MTU reduces the per-packet overhead. by squinting at the table in question, it appears to show ~300 MB/s on a single node. with 8k packets, that's ~40K pps, vs ~5k pps for 64k MTU. seems like a big win, right? well, except why assume each packet requires an interrupt? reducing the overhead, whether through fewer copies or bigger MTUs is certainly a good thing. these days, neither is necessarily essential unless you're really, really pushing the limits. there are only a few people in the universe (such as Cern, or perhaps the big telescopes) who genuinely have those kinds of data rates. we're a pretty typical supercomputing center, I think, and see only quite short bursts into the GB/s range (aggregate, quadrics+lustre). I'm genuinely curious: do you (anyone) have applications which sustain many GB/s either IPC or IO? regards, mark hahn. From hahn at mcmaster.ca Sat Feb 10 08:50:17 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: <45CCDC88.8080102@dcc.ufmg.br> References: <45CCDC88.8080102@dcc.ufmg.br> Message-ID: > As glusterfs is a parallel filesystem, I think that a more valuable > experiment is comparing it against another parallel filesystem, like pvfs2 or > lustre, in a distributed environment. This could show the performance of > glusterfs in its intended setting. well then, why didn't the glusterfs developers do that? 
besides, I think I actually disagree: just as you want to show parallel scaling relative to the same code run serially, it's valuable to show a clusterfs versus a single-node running a fs not specifically designed for scaling. actually, that's another point: the comparison should have been normalized by the number of bricks involved, since linear scaling of aggregate bandwidth of separate nodes writing separate files to separate bricks is, well, merely expected. might as well compare versus 16 separate NFS servers which happen to share a namespace by being automounted into a tree ;) but really, I'm not trying to be down on glusterfs. it seems fine, though I'm not really clear on what its goals are (against any of the other cluster fs's). From csamuel at vpac.org Sun Feb 11 22:26:41 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement In-Reply-To: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> References: <10960.201.234.224.4.1169585317.squirrel@mail.gnu-india.org> Message-ID: <200702121726.44645.csamuel@vpac.org> On Wed, 24 Jan 2007, Anand Babu wrote: > The current release of GlusterFS is running stable and performs > exceedingly well against NFS ?please refer to benchmarks at > http://www.gluster.org/docs/index.php/GlusterFS_Benchmarks for > benchmark comparison. It would be interesting to see comparison benchmark results for Bonnie++ as well. -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070212/b20c147a/attachment.bin From 06002352 at brookes.ac.uk Mon Feb 12 17:11:14 2007 From: 06002352 at brookes.ac.uk (Mitchell Wisidagamage) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45CCDC88.8080102@dcc.ufmg.br> References: <45CCDC88.8080102@dcc.ufmg.br> Message-ID: <45D11032.6040006@brookes.ac.uk> I think this news is of some relevance to the group... http://news.bbc.co.uk/1/hi/technology/6354225.stm From csamuel at vpac.org Mon Feb 12 19:22:12 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D11032.6040006@brookes.ac.uk> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> Message-ID: <200702131422.12498.csamuel@vpac.org> On Tue, 13 Feb 2007, Mitchell Wisidagamage wrote: > I think this news is of some relevance to the group... > > http://news.bbc.co.uk/1/hi/technology/6354225.stm Single or double precision ? -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070213/c420231c/attachment.bin From landman at scalableinformatics.com Mon Feb 12 19:46:34 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <200702131422.12498.csamuel@vpac.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> Message-ID: <45D1349A.1020308@scalableinformatics.com> It looked like it did IEEE754 doubles. Any Intel types out there to confirm/deny? Chris Samuel wrote: > On Tue, 13 Feb 2007, Mitchell Wisidagamage wrote: > >> I think this news is of some relevance to the group... >> >> http://news.bbc.co.uk/1/hi/technology/6354225.stm > > Single or double precision ? > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From larry.stewart at sicortex.com Mon Feb 12 21:13:07 2007 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <200702131422.12498.csamuel@vpac.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> Message-ID: <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> On Feb 12, 2007, at 10:22 PM, Chris Samuel wrote: > On Tue, 13 Feb 2007, Mitchell Wisidagamage wrote: > >> I think this news is of some relevance to the group... >> >> http://news.bbc.co.uk/1/hi/technology/6354225.stm > > Single or double precision ? > > -- http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=197004697 eetimes has a better article, that says it is single precision. each core is cited as being 3 mm**2 in 65 nm. The mips-64 cores we're using in 90 nm are 6 mm**2, with a double precision FP unit. So by area, the Intel cores are similar in complexity Intel is stacking dram dice above the cpu as an L4 cache, but the article doesn't really explain how they plan to expand the off-chip bw other than by going to photonics eventually. -Larry From hahn at mcmaster.ca Mon Feb 12 21:32:35 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D1349A.1020308@scalableinformatics.com> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> Message-ID: > It looked like it did IEEE754 doubles. Any Intel types out there to > confirm/deny? singles: http://www.pcper.com/article.php?aid=363 IMO, the chip is mainly interesting to explore how much we can abandon the von Neumann architecture as a whole, rather than stupidly putting more and more of them onto a chip. after all, the nearest-neighbor latency (125 ps!) is comparable to cache or even register-file. 
(admittedly, in this chip, the links are only 32b wide, which means any
useful inter-PE message (say, at least a cacheline) would take more than
a couple cycles...)

what I don't really understand is why there aren't lots of groups doing
this kind of exploratory chip.  is it just that any interesting chip
tends to push design, circuit and fab boundaries all at the same time?

>>>> http://news.bbc.co.uk/1/hi/technology/6354225.stm

frankly, I'm a bit embarrassed by all these experts being quoted as saying
that multicore is the brave new world.  I saw one article that claimed
that no OS existed to utilize 80 threads, and that no programmers could
use them.  (counterexample: Altix running Linux and OpenMP code from
pretty mundane programmers...)

amdahl's law: not just a good idea...

From hahn at mcmaster.ca  Mon Feb 12 21:46:49 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:05:41 2009
Subject: [Beowulf] Teraflop chip hints at the future
In-Reply-To: <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com>
References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk>
	<200702131422.12498.csamuel@vpac.org>
	<8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com>
Message-ID:

> Intel is stacking dram dice above the cpu as an L4 cache, but the article

stacking seems like a major hack - I'd rather think about how to do
processor-in-memory (perhaps zram?).  also, current production dram is
around 1Gb/128MB, and the chip's already got 400 KB of memory onchip.
it's still really important to have a substantial fan-out from cpu to
memory for capacity.

> doesn't really
> explain how they plan to expand the off-chip bw other than by going to
> photonics
> eventually.

isn't photonics still at the hand-waving stage?  I was just noticing how
10G XFP's have not gotten much cheaper over the past couple years.
is there really a prospect for wide and fast photonic links, given that
copper links are at ~3 Gb pretty easily?  have people figured out how to
mass-produce photonic-chipped systems as efficient as copper PC boards
and bumped chips?

From greg.lindahl at qlogic.com  Mon Feb 12 23:06:12 2007
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Wed Nov 25 01:05:41 2009
Subject: [Beowulf] Teraflop chip hints at the future
In-Reply-To:
References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk>
	<200702131422.12498.csamuel@vpac.org>
	<8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com>
Message-ID: <20070213070612.GA5466@localhost.localdomain>

On Tue, Feb 13, 2007 at 12:46:49AM -0500, Mark Hahn wrote:

> isn't photonics still at the hand-waving stage? I was just noticing how 10G
> XFP's have not gotten much cheaper over the past couple years.

Yes, but there are several companies about to produce lower-cost
optical and active copper cables for IB. Since "4 wire" ethernet is
pretty similar to IB, presumably they'll see a price drop, too.

We were showing several of these technologies working at SC last
November.

Also note that 4 gig FC is the same data rate as DDR IB.

Now what the final costs will be, who knows?
-- greg From eugen at leitl.org Tue Feb 13 00:21:27 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <20070213070612.GA5466@localhost.localdomain> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> <20070213070612.GA5466@localhost.localdomain> Message-ID: <20070213082127.GW21677@leitl.org> On Mon, Feb 12, 2007 at 11:06:12PM -0800, Greg Lindahl wrote: > Yes, but there are several companies about to produce lower-cost > optical and active copper cables for IB. Since "4 wire" ethernet is > pretty similar to IB, presumably they'll see a price drop, too. Inter-die and inter-wafer level optical interconnects need not to be standartized, but just link up (over short distances, using fab-side connected high-precision tiny waveguide geometries) the on-silicon mesh fabric, running its custom protocol. What would be interesting in how this thing would deal with routing around defective dies wafer-scale, either by remapping, or realtime with the mesh signalling protocol. > We were showing several of these technologies working at SC last > November. > > Also note that 4 gig FC is the same data rate as DDR IB. > > Now what the final costs will be, who knows? The problem with this chip (I've been expecting something very like this in about 1996, and in fact designed a paper CPU which is very much like this, only much wider, stack-based and leaner-core) is that's pure vaporware. It would also need radically stripped-down kernels, which basically rules out Linux (but there are reasonably lean things like L4 & Co available, which could run Linux as a wrapper on some fatter nodes). -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From diep at xs4all.nl Tue Feb 13 02:14:55 2007 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk><200702131422.12498.csamuel@vpac.org><8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com><20070213070612.GA5466@localhost.localdomain> <20070213082127.GW21677@leitl.org> Message-ID: <002b01c74f57$d239a250$0300a8c0@gourmandises> Hi Eugen, Can you explain the OS issue to us? Thanks, Vincent ----- Original Message ----- From: "Eugen Leitl" To: Sent: Tuesday, February 13, 2007 9:21 AM Subject: Re: [Beowulf] Teraflop chip hints at the future > On Mon, Feb 12, 2007 at 11:06:12PM -0800, Greg Lindahl wrote: > >> Yes, but there are several companies about to produce lower-cost >> optical and active copper cables for IB. Since "4 wire" ethernet is >> pretty similar to IB, presumably they'll see a price drop, too. > > Inter-die and inter-wafer level optical interconnects need not > to be standartized, but just link up (over short distances, using > fab-side connected high-precision tiny waveguide geometries) the > on-silicon mesh fabric, running its custom protocol. What would be > interesting in how this thing would deal with routing around defective > dies > wafer-scale, either by remapping, or realtime with the mesh > signalling protocol. > >> We were showing several of these technologies working at SC last >> November. >> >> Also note that 4 gig FC is the same data rate as DDR IB. 
>> >> Now what the final costs will be, who knows? > > The problem with this chip (I've been expecting something very like > this in about 1996, and in fact designed a paper CPU which is very much > like this, only much wider, stack-based and leaner-core) is that's pure > vaporware. > It would also need radically stripped-down kernels, which basically rules > out Linux > (but there are reasonably lean things like L4 & Co available, which > could run Linux as a wrapper on some fatter nodes). > > -- > Eugen* Leitl leitl http://leitl.org > ______________________________________________________________ > ICBM: 48.07100, 11.36820 http://www.ativel.com > 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From diep at xs4all.nl Tue Feb 13 02:21:08 2007 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk><200702131422.12498.csamuel@vpac.org><45D1349A.1020308@scalableinformatics.com> Message-ID: <004c01c74f58$af48ab00$0300a8c0@gourmandises> Yeah looks all like not much of a double precision. Of course lucky my chess software is not using much floating point, but integers instead. With respect to integer multiplication, what does the chip support there? 32 x 32 == 64 bits (stored in 2 registers) 64 x 64 == 128 bits ? Thanks, Vincent ----- Original Message ----- From: "Mark Hahn" To: Sent: Tuesday, February 13, 2007 6:32 AM Subject: Re: [Beowulf] Teraflop chip hints at the future >> It looked like it did IEEE754 doubles. Any Intel types out there to >> confirm/deny? > > singles: > > http://www.pcper.com/article.php?aid=363 > > IMO, the chip is mainly interesting to explore how much we can abandon > the von Neumann architecture as a whole, rather than stupidly putting > more and more of them onto a chip. after all, the nearest-neighbor > latency (125 ps!) is comparable to cache or even register-file. > (admittedly, in this chip, the links are only 32b wide, which means any > useful inter-PE message (say, at least a cachineline) would take > more than a couple cycles... > > what I don't really understand is why there aren't lots of groups doing > this kind of exploratory chip. is it just that any interesting chip > tends to push design, circuit and fab boundaries all at the same time? > >>>> http://news.bbc.co.uk/1/hi/technology/6354225.stm > > frankly, I'm a bit embarassed by all these experts being quoted as saying > that multicore is the brave new world. I saw one article that claimed > that no OS existed to utilize 80 threads, and that no programmers could > use them. > (counterexample: Altix running Linux and OpenMP code from pretty mundane > programmers...) > > amdahl's law: not just a good idea... 
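
Regarding the integer-multiply widths Vincent asks about above: the articles do not say what the 80-core tiles provide beyond the FP MACs, but on commodity 64-bit hardware the two cases he lists are usually written as below. This is only an illustration of what today's compilers and ISAs give you (unsigned __int128 is a gcc extension on 64-bit targets), not a statement about the research chip.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 32 x 32 -> 64: portable C; a decent compiler emits one widening
       multiply (e.g. x86 MUL r32 leaves the result in EDX:EAX). */
    uint32_t a = 0xdeadbeefu, b = 0xfeedfaceu;
    uint64_t p64 = (uint64_t)a * b;

    /* 64 x 64 -> 128: not standard C, but gcc and most gcc-compatible
       compilers provide a 128-bit type on 64-bit targets; x86-64 MUL r64
       leaves the product in RDX:RAX. */
    uint64_t c = 0x123456789abcdef0ull, d = 0x0fedcba987654321ull;
    unsigned __int128 p128 = (unsigned __int128)c * d;

    printf("32x32 -> %016llx\n", (unsigned long long)p64);
    printf("64x64 -> high %016llx low %016llx\n",
           (unsigned long long)(p128 >> 64),
           (unsigned long long)p128);
    return 0;
}
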
> _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From eugen at leitl.org Tue Feb 13 02:49:34 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <002b01c74f57$d239a250$0300a8c0@gourmandises> References: <45CCDC88.8080102@dcc.ufmg.br> <20070213082127.GW21677@leitl.org> <002b01c74f57$d239a250$0300a8c0@gourmandises> Message-ID: <20070213104934.GY21677@leitl.org> On Tue, Feb 13, 2007 at 11:14:55AM +0100, Vincent Diepeveen wrote: > Can you explain the OS issue to us? DRAM die piggy-backing is rather expensive and has its own limit on the memory bandwidth issue, so long-term memory will have to be embedded within the CPU. Because of yield limits such embedded RAM will have only very small sizes, few MBytes at best. (This also opens the way to wafer-scale integration, which has also been overdue for a very long time). Current kernels would be hard-pressed to alone fit into such tight memory spaces. Fortunately, there is no point to include code for e.g. I/O, MMU (notice the Cell doesn't do MMU for SPEs, instead using cache transistors as SRAM), video etc. in an effectively embedded node, so kernels can be slimmed down to few 10 kBytes, thus limiting redundancy. Another critical point is to put message passing (ideally, a subset of MPI) directly into machine instructions, to limit latency. The on-die/on-wafer mesh fabric has to send message with almost the same penalty as the on-die access. OOP does implicit memory protection if there are enough cores, because the only way to impact another node's address space is by sending a message. There also needs to be a machinery which allows a large object to be recursively decomposed into composite objects, which will eventually fit into such a small node. This is not exactly Beowulf anymore, even if such hardware is commodity. Linux might have some life in it, though, if it goes the L4/L3 way, and only runs the whole Linux hog on the fat node (host system). Caveat: my crystal ball might or might not be defective. It's been showing me the same thing for the last decade. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From scheinin at crs4.it Tue Feb 13 02:57:04 2007 From: scheinin at crs4.it (Alan Louis Scheinine) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <20070213082127.GW21677@leitl.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> <20070213070612.GA5466@localhost.localdomain> <20070213082127.GW21677@leitl.org> Message-ID: <45D19980.80707@crs4.it> Eugen Leitl wrote: > The problem with this chip (I've been expecting something very like > this in about 1996, and in fact designed a paper CPU which is very much > like this, only much wider, stack-based and leaner-core) is that's pure vaporware. > It would also need radically stripped-down kernels, which basically rules out Linux > (but there are reasonably lean things like L4 & Co available, which > could run Linux as a wrapper on some fatter nodes). 
Cell (CBE) shows that a similar approach can run Linux, one conventional processor and lots of specialized cores on the same chip. best regards, Alan Scheinine -- Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna Center for Advanced Studies, Research, and Development in Sardinia Postal Address: | Physical Address for FedEx, UPS, DHL: --------------- | ------------------------------------- Alan Scheinine | Alan Scheinine c/o CRS4 | c/o CRS4 C.P. n. 25 | Loc. Pixina Manna Edificio 1 09010 Pula (Cagliari), Italy | 09010 Pula (Cagliari), Italy Email: scheinin@crs4.it Phone: 070 9250 238 [+39 070 9250 238] Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220] Operator at reception: 070 9250 1 [+39 070 9250 1] Mobile phone: 347 7990472 [+39 347 7990472] From eugen at leitl.org Tue Feb 13 03:07:25 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D19980.80707@crs4.it> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> <20070213070612.GA5466@localhost.localdomain> <20070213082127.GW21677@leitl.org> <45D19980.80707@crs4.it> Message-ID: <20070213110725.GZ21677@leitl.org> On Tue, Feb 13, 2007 at 11:57:04AM +0100, Alan Louis Scheinine wrote: > Cell (CBE) shows that a similar approach can run Linux, one > conventional processor and lots of specialized cores on the same chip. Yes, but it lacks the on-die switch to scalably mesh cores, so you only get a few cores for each fat (=many pins) connection to memory. This is no way to run a kNode in one physical box. Apropos of Cell, I presume most of you have seen http://moss.csc.ncsu.edu/~mueller/cluster/ps3/ Anyone knows how badly the GBit virtualization layer (hypervisor) on the PS3 hits latency on Cell Linux? -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From pj at sgi.com Mon Feb 12 21:11:41 2007 From: pj at sgi.com (Paul Jackson) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D1349A.1020308@scalableinformatics.com> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> Message-ID: <20070212211141.80667629.pj@sgi.com> The most useful article I've found on Intel's teraflop chip is on Anandtech: The Era of Tera: Intel Reveals more about 80-core CPU http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2925&p=1 That article says: Although the chip itself is capable of processing over one trillion floating point operations per second, don't be fooled by the numbers; these aren't 128-bit FP operations but rather single-precision FP operations. Each tile features two fully pipelined 32-bit floating point multiple-accumulator (FPMAC) units. There are no other execution units on each tile, so all arithmetic operations must be carried out through these FPMACs. -- I won't rest till it's the best ... 
Programmer, Linux Scalability Paul Jackson 1.925.600.0401 From harsha at zresearch.com Tue Feb 13 00:56:36 2007 From: harsha at zresearch.com (Harshavardhana) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Re: GlusterFS 1.2-BENKI (GNU Cluster File System) - Announcement (Mark Hahn) In-Reply-To: <200702102000.l1AK0B0o025171@bluewest.scyld.com> References: <200702102000.l1AK0B0o025171@bluewest.scyld.com> Message-ID: <50837.220.227.64.170.1171356996.squirrel@zresearch.com> Hi Mark, >>>> On IB - nfs works only with IPoIB, whereas glusterfs does SDP (and >>>> ib-verbs, >>>> from the source repository) and is clearly way faster than NFS. >>> >>> "clearly"s like that make me nervous. to an IB enthusiast, SDP may be >>> more aesthetically pleasing, but why do you think IPoIB should be >>> noticably >>> slower than SDP? lower cpu overhead, probably, but many people have no >>> problem running IP at wirespeed on IB/10GE-speed wires... >> >> As I understand it, one reason why SDP is faster than IPoIB is that the >> way IPoIB is currently spec'ed requires there be an extra copy relative >> to SDP. > > that's what I meant by "cpu overhead". but the point is that current > CPUs have 10-20 GB/s of memory bandwidth hanging around, so it's not > necessarily much of a win to avoid a copy. even in olden days, > it was common to show some workloads where hosts doing TCP checksumming > actually _benefited_ performance by populating the cache. > "CPU Overhead" it's a nice word to use in many of the cases. I am not able to understand what are you trying to prove with IPoIB v/s SDP. SDP is better as seen with latency issues and these will help for many of the Engg Applications in Aviation, Energy and Health Care Research departments. Imagine a 1 million Cell contact problems on LS-DYNA for STRESS analysis needs a Higher Network I/O and Disk Speed. As they require days to complete, with a small latency improvement even can bring up a larger gain when the Jobs run for days. Yes there are bottlenecks to the application too comes into picture as the LS-DYNA doesn't scale well after running 24CPUS. But in a very big environment with 500odd machines. With 1000 of users submitting their jobs helps a lot writing and communicating onto a single shared directory through the master server's. >> It is also specced with a smaller MTU, which makes a fair >> difference. I believe there is movement afoot to change the spec to >> allow for a larger MTU, but I'm not an IB expert and don't follow it >> religiously. > > MTU is another one of those things that got a rep for importance, > but which is really only true in certain circumstances. bigger MTU > reduces the per-packet overhead. by squinting at the table in question, > it appears to show ~300 MB/s on a single node. with 8k packets, that's > ~40K pps, vs ~5k pps for 64k MTU. seems like a big win, right? well, > except why assume each packet requires an interrupt? > > reducing the overhead, whether through fewer copies or bigger MTUs > is certainly a good thing. these days, neither is necessarily essential > unless you're really, really pushing the limits. there are only a few > people in the universe (such as Cern, or perhaps the big telescopes) > who genuinely have those kinds of data rates. we're a pretty typical > supercomputing center, I think, and see only quite short bursts into > the GB/s range (aggregate, quadrics+lustre). > > I'm genuinely curious: do you (anyone) have applications which sustain > many GB/s either IPC or IO? > > regards, mark hahn. 
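A quick back-of-envelope check of those packet-rate figures, for anyone following along -- the 300 MB/s is the single-node number from the table under discussion, the rest is just division:

/* Packets per second at a fixed throughput for two MTUs.
 * 300 MB/s is the single-node figure quoted above; MTUs are 8 KB and 64 KB. */
#include <stdio.h>

int main(void)
{
    double bytes_per_sec = 300e6;
    double mtu[2] = { 8192.0, 65536.0 };
    int i;

    for (i = 0; i < 2; i++)
        printf("MTU %6.0f B -> %.0f packets/s\n",
               mtu[i], bytes_per_sec / mtu[i]);
    /* prints roughly 37k and 4.6k packets/s, i.e. the ~40k vs ~5k above */
    return 0;
}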
> Regarding NFS which is well used by many of the companies around the world for their clustering. You name one i can show you their data centers running NFS SGI servers with Filers for each 100nodes eg most renowned names like Intel, GE Global Research, Texas Intruments, Analog Devices.. and many more. Bechmarking against the NFS was to give an idea to the Industry of the benefits of a parallel filesystem against their present running environment. Against LustreFS yes we are coming up with a benchmark which is followed by present NFS benchmark. GlusterFS is trying to prove Scaling with increase in performance and also has a privilege of being in Userspace, which gives us the handling software out of the Kernel Policies of which handling has been proved cumbersome in many cases. Present benchmark is for people to "Give a Sight" into glusterfs and working through it. Regards & Thanks. -- Harshavardhana "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." From becker at scyld.com Tue Feb 13 06:03:20 2007 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] BWBUG: Weather DELAY -- Today's BWBUG meeting rescheduled to Feb 27th at Georgetown university Message-ID: Because of the weather, today's BWBUG meeting (February 13 2007) has been rescheduled to Feb 27 2007. The location will remain the same, Georgetown University. The website has the updated information http://bwbug.org/ As usual, check bwbug.org for updated web conference and dial-in information on the day of the meeting. I'll see you in two weeks! -- Donald Becker becker@scyld.com Scyld Software Scyld Beowulf cluster systems 914 Bay Ridge Road, Suite 220 www.scyld.com Annapolis MD 21403 410-990-9993 ---------- Forwarded message ---------- Date: Mon, 12 Feb 2007 15:58:38 -0500 From: Michael Fitzmaurice To: bwbug@bwbug.org Subject: bwbug: Because of the weather the BWBUG meeting will be moved to Feb 27th at the same location Georgetown university Because of the weather the BWBUG meeting will be moved to Feb 27th at the same location at Georgetown University. Sorry for the inconvenience. Please go to the http://bwbug.org for more details Mike Fitzmaurice From rbw at ahpcrc.org Tue Feb 13 07:03:56 2007 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> Message-ID: <45D1D35C.8080502@ahpcrc.org> Mark Hahn wrote: >> It looked like it did IEEE754 doubles. Any Intel types out there to >> confirm/deny? > singles: > > http://www.pcper.com/article.php?aid=363 > > IMO, the chip is mainly interesting to explore how much we can abandon > the von Neumann architecture as a whole, rather than stupidly putting > more and more of them onto a chip. after all, the nearest-neighbor > latency (125 ps!) is comparable to cache or even register-file. Yes, but how much does it really abandon von Neumann. It is just a lot of little von Neumann machines unless the mesh is fully programmable and the DRAM stacks can source data for any operation on any cpu as the application's data flows through the application kernel(s) however it is laid out across the chip. And in that case it is a multi-core ASIC emulating an FPGA ... why not just use an FPGA ... ;-) ... 
and avoid wasting all those hard-wired functional units that won't be needed for this or that particular kernel. To abondon von Neuman you have to abandon the cyclic re-referencing of the same store and "store" results in-wire or along the path defined by a code customed data-flow processor. Them you eliminate as much of the memory reference latency as possible. The problem/question is how much of the given applications kernel can you swallow on a single chip before having to got back to some kind of general memory for data or instructions. I like the idea of an array of FPGA cores on a chip (super-FPGA model). Less wasted hardware. In some sense, these super, multi-mini-core designs are another ASIC hammer looking for a nail. Fixed instruction architectures ultimately waste hardware. Why not program the processor instead of instructions for a predefined one-size fits all ASIC? But I suppose the industry has to get there somehow ... and super-multi-mini core is one way. The RAW processor already mapped out the benefits of this approach, but I think they are just a mile post on the way to a super FPGA model. I think every one should be learning to program in Mitrion-C ... ;-). rbw -- Richard B. Walsh "The world is given to me only once, not one existing and one perceived. The subject and object are but one." Erwin Schroedinger Project Manager Network Computing Services, Inc. Army High Performance Computing Research Center (AHPCRC) rbw@ahpcrc.org | 612.337.3467 ----------------------------------------------------------------------- This message (including any attachments) may contain proprietary or privileged information, the use and disclosure of which is legally restricted. If you have received this message in error please notify the sender by reply message, do not otherwise distribute it, and delete this message, with all of its contents, from your files. ----------------------------------------------------------------------- From landman at scalableinformatics.com Tue Feb 13 07:36:17 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D1D35C.8080502@ahpcrc.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <45D1D35C.8080502@ahpcrc.org> Message-ID: <45D1DAF1.9060904@scalableinformatics.com> Richard Walsh wrote: > model. I think every one should be learning to program in Mitrion-C ... > ;-). Glad the smiley face is there. As soon as you introduce the abstraction layer of the virtual processor, you diminish performance, as you can't cram enough devices onto the gates. > rbw I am not anti-FPGA (contrary actually, all APUs are good APUs if they are providing significant acceleration). 
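Richard's "cyclic re-referencing" point has a plain software analogue: every intermediate result written back to a store only to be read straight back is a round trip the data-flow style tries to avoid. The toy C fragment below is only that analogy -- it is not Mitrion-C and has nothing to do with any particular FPGA toolchain:

/* Analogy only: the "two pass" version stores an intermediate array and
 * re-reads it (the cyclic store); the "fused" version hands each value
 * straight to the next operation and never touches memory in between. */
#include <stddef.h>

void two_pass(const double *a, double *tmp, double *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) tmp[i] = a[i] * a[i] + 1.0;   /* store ...  */
    for (i = 0; i < n; i++) c[i]   = 2.0 * tmp[i] - 3.0;  /* ... reload */
}

void fused(const double *a, double *c, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        double t = a[i] * a[i] + 1.0;   /* stays "in wire" (a register) */
        c[i] = 2.0 * t - 3.0;
    }
}

A compiler can sometimes do this fusion for you within a loop nest; the argument upthread is about getting the same effect across whole kernels in hardware.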
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From scheinin at crs4.it Tue Feb 13 08:27:45 2007 From: scheinin at crs4.it (Alan Louis Scheinine) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D1D35C.8080502@ahpcrc.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <45D1D35C.8080502@ahpcrc.org> Message-ID: <45D1E701.3030207@crs4.it> Richard Walsh wrote: > why not just use an FPGA ... ;-) I taught myself VHDL and programmed two algorithms in Mitrion-C, but I decided that less general hardware such as CBE (Cell) is better for HPC. I made an estimate of how much area on a FPGA is needed for a group of controllers (the smallest type, pico-Blaze) for having software control of the arithmetic and the area for communications and decided that CBE is much faster for the same funtionality. FPGA is just too general for HPC. It reminds me of the evolution of Thinking Machines in which they started at the bit-level then built machines with Weitek ALU's at the end of each row. I would like to add that the algorithms that I see are often not purely SIMD though they are parallel. CBE is partially SIMD in that each processor can work on 4 4-byte words with the same operation, but in addition there are 8 special purpose processors each with their own instruction stream. Examples of high speed-ups for SIMD problems is like picking the low-lying fruit. IMHO having many instruction streams allows speed-up of algorithms that involve many logical decisions -- what comes to mind is Monte Carlo simulation of the flow of genes in pedigrees for finding the genetic component of somatic attributes. More generally, pointer chasing. MIMD at a high level of core (instruction sequencer) integration would give a performance boost to an interesting sector of algorithms of the pointer chasing type. I have not written algorithms for the CBE, so my remarks on that aspect may be ignorant. best regards, Alan Scheinine -- Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna Center for Advanced Studies, Research, and Development in Sardinia Postal Address: | Physical Address for FedEx, UPS, DHL: --------------- | ------------------------------------- Alan Scheinine | Alan Scheinine c/o CRS4 | c/o CRS4 C.P. n. 25 | Loc. Pixina Manna Edificio 1 09010 Pula (Cagliari), Italy | 09010 Pula (Cagliari), Italy Email: scheinin@crs4.it Phone: 070 9250 238 [+39 070 9250 238] Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220] Operator at reception: 070 9250 1 [+39 070 9250 1] Mobile phone: 347 7990472 [+39 347 7990472] From rgb at phy.duke.edu Tue Feb 13 08:41:15 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D1D35C.8080502@ahpcrc.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <45D1D35C.8080502@ahpcrc.org> Message-ID: On Tue, 13 Feb 2007, Richard Walsh wrote: > To abondon von Neuman you have to abandon the cyclic re-referencing of > the same store and "store" results in-wire or along the path defined by a > code > customed data-flow processor. 
Them you eliminate as much of the memory Or perhaps you have to move to http://en.wikipedia.org/wiki/Quantum_computing This actually moves you into an entirely different class of computational complexity, a process that is so intrinsically parallel that it is actually quite difficult and immensely expensive to express it serially! Just for the fun of it, mind you. Although there are definitely plenty of people working on this very hard, and I'm guessing that we'll start seeing this within 1-2 decades if not sooner. Not every problem maps well into it, but the ones that do... rgb > reference latency as possible. The problem/question is how much of the > given applications kernel can you swallow on a single chip before having to > got back to some kind of general memory for data or instructions. I like the > idea > of an array of FPGA cores on a chip (super-FPGA model). Less wasted > hardware. In some sense, these super, multi-mini-core designs are another > ASIC hammer looking for a nail. Fixed instruction architectures ultimately > waste hardware. Why not program the processor instead of instructions > for a predefined one-size fits all ASIC? > > But I suppose the industry has to get there somehow ... and super-multi-mini > core is one way. The RAW processor already mapped out the benefits of > this approach, but I think they are just a mile post on the way to a super > FPGA > model. I think every one should be learning to program in Mitrion-C ... ;-). > > rbw > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From James.P.Lux at jpl.nasa.gov Tue Feb 13 09:17:50 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> Message-ID: <6.2.3.4.2.20070213090901.02e03c28@mail.jpl.nasa.gov> At 09:32 PM 2/12/2007, Mark Hahn wrote: >>It looked like it did IEEE754 doubles. Any Intel types out there >>to confirm/deny? >what I don't really understand is why there aren't lots of groups doing >this kind of exploratory chip. is it just that any interesting chip >tends to push design, circuit and fab boundaries all at the same time? > >>>>http://news.bbc.co.uk/1/hi/technology/6354225.stm > >frankly, I'm a bit embarassed by all these experts being quoted as saying >that multicore is the brave new world. Wouldn't Illiac IV be an example of multicore? (albeit SIMD, and I assume the terascale doesn't require all cores to do same ops in lockstep) Maybe CM would be a better early example? > I saw one article that claimed that no OS existed to utilize 80 > threads, and that no programmers could use them. Jeeze.. pop up task manager in my desktop machine running WinXP, and there's gotta be at least 100 threads.. granted, some are blocked for I/O or timers, or not doing a whole lot >(counterexample: Altix running Linux and OpenMP code from pretty mundane >programmers...) > >amdahl's law: not just a good idea... >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From James.P.Lux at jpl.nasa.gov Tue Feb 13 09:25:38 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> Message-ID: <6.2.3.4.2.20070213091802.02dff098@mail.jpl.nasa.gov> At 09:46 PM 2/12/2007, Mark Hahn wrote: >>Intel is stacking dram dice above the cpu as an L4 cache, but the article > >stacking seems like a major hack - I'd rather think about how to do >processor-in-memory (perhaps zram?). It's a technology thing.. you can't get DRAM densities with processes used for CPUs and the like. Different fabs, different processes, even though the feature sizes are similar. There's also some thermal issues. If you use a CPU process to build ram, it's not very dense (think cache on current chips... which I think tends to be static ram at 3 transistors per cell). I don't know that you can even build a big CPU on a DRAM process. DRAMs are pretty highly optimized (read, they've spent billions of dollars on tweaking the device models to within a gnats eyelash of the physics limits).. for instance, because with DRAM you only read or write one location at time, very few transistors change state on any given cycle, so the power dissipation is low. Compare with a CPU where you have thousands of transistors changing state on a cycle. I'm not a chip designer, so there's probably a lot of subtleties... >>doesn't really >>explain how they plan to expand the off-chip bw other than by going >>to photonics >>eventually. > >isn't photonics still at the hand-waving stage? I was just noticing how 10G >XFP's have not gotten much cheaper over the past couple years. is there >really a prospect for wide and fast photonic links, given that copper links >are at ~3 Gb pretty easily? have people figured out how to >mass-produce photonic-chipped systems as efficient as copper PC >boards and bumped chips? Not really... there IS some progress on VCSELs and detectors, but the conventional transmit data with electrons instead of photons folks are also making progress. 1 Tb/sec isn't unusual. As always, dealing with propagation uncertainties (electromagnetic, either way) is challenging. At 1 Tb/sec, a bit is only 30 microns long in free space. Go to the IEEE High Speed Digital Interconnect Workshop in Santa Fe this year... there's amazing stuff that people are doing. James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From James.P.Lux at jpl.nasa.gov Tue Feb 13 09:36:36 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D1D35C.8080502@ahpcrc.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <45D1D35C.8080502@ahpcrc.org> Message-ID: <6.2.3.4.2.20070213092750.02d9cd18@mail.jpl.nasa.gov> At 07:03 AM 2/13/2007, Richard Walsh wrote: >Mark Hahn wrote: >>>It looked like it did IEEE754 doubles. 
Any Intel types out there >>>to confirm/deny? >>singles: >> >>http://www.pcper.com/article.php?aid=363 >> >>IMO, the chip is mainly interesting to explore how much we can abandon >>the von Neumann architecture as a whole, rather than stupidly putting >>more and more of them onto a chip. after all, the nearest-neighbor >>latency (125 ps!) is comparable to cache or even register-file. >Yes, but how much does it really abandon von Neumann. It is just a lot >of little von Neumann machines unless the mesh is fully programmable >and the DRAM stacks can source data for any operation on any cpu as >the application's data flows through the application kernel(s) however it >is laid out across the chip. And in that case it is a multi-core >ASIC emulating >an FPGA ... why not just use an FPGA ... ;-) ... and avoid wasting all those >hard-wired functional units that won't be needed for this or that particular >kernel. In fact, modern high density FPGAs (viz Xilinx Virtex II 6000 series) have partitioned their innards into little cells, some with ALU and combinatorial logic and a little memory, some with lots of memory and not so much logic. And, you can program them in Verilog, which is a fairly high level language. There are huge libraries of useful functions out there that you can "call". It's still a bit (a lot?) clunky compared to zapping out C code on a general purpose machine, but it can be done. of an array of FPGA cores on a chip (super-FPGA model). Less wasted >hardware. In some sense, these super, multi-mini-core designs are another >ASIC hammer looking for a nail. Fixed instruction architectures ultimately >waste hardware. Why not program the processor instead of instructions >for a predefined one-size fits all ASIC? I think that as a general rule, the special purpose cores (ASICs) are going to be smaller, lower power, and faster (for a given technology) than the programmable cores (FPGAs). Back in the late 90s, I was doing tradeoffs between general purpose CPUs (PowerPCs), DSPs (ADSP21020), and FPGAs for some signal processing applications. At that time, the DSP could do the FFTs, etc, for the least joules and least time. Since then, however, the FPGAs have pulled ahead, at least for spaceflight applications. But that's not because of architectural superiority in a given process.. it's that the FPGAs are benefiting from improvements in process (higher density) and nobody is designing space qualified DSPs using those processes (so they are stuck with the old processes). Heck, the latest SPARC V8 core from ESA (LEON 3) is often implemented in an FPGA, although there are a couple of space qualified ASIC implementations (from Atmel and Aeroflex). In a high volume consumer application, where cost is everything, the ASIC is always going to win over the FPGA. For more specialized scientific computing, the trade is a bit more even... But even so, the beowulf concept of combining large numbers of commodity computers leverages the consumer volume for the specialized application, giving up some theoretical performance in exchange for dollars. James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From greg.lindahl at qlogic.com Tue Feb 13 23:10:42 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <20070212211141.80667629.pj@sgi.com> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <20070212211141.80667629.pj@sgi.com> Message-ID: <20070214071042.GA4621@localhost.localdomain> > That article says: > > Although the chip itself is capable of processing over one trillion > floating point operations per second, don't be fooled by the numbers; > these aren't 128-bit FP operations but rather single-precision FP > operations. Yes, but x86 doesn't have 128-bit FP operations -- it does multiple 32-bit or 64-bit ops at once. So at best anandtech is being confused, and at worst they're really confused. I've never seen anyone ever count a group of 32-bit or 64-bit ops as 128-bit ops. And we shouldn't start doing it, either. -- greg From hahn at mcmaster.ca Wed Feb 14 09:17:59 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <6.2.3.4.2.20070213091802.02dff098@mail.jpl.nasa.gov> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> <6.2.3.4.2.20070213091802.02dff098@mail.jpl.nasa.gov> Message-ID: >>> Intel is stacking dram dice above the cpu as an L4 cache, but the article >> >> stacking seems like a major hack - I'd rather think about how to do >> processor-in-memory (perhaps zram?). > > It's a technology thing.. you can't get DRAM densities with processes used > for CPUs and the like. Different fabs, different processes, even though the 1Gb (128GB) seems to be the current state-of-production for normal DRAM; Intel has 24 MB on some chips, though we mightn't call those production - the mass-market chips are at a "mere" 8MB onchip. so, waving hands wildly, there's about a 16x density advantage; this is a bit more than one might expect from transistor counts (~1 vs ~6, iirc), but as you say, dram is highly tweaked for density. > feature sizes are similar. There's also some thermal issues. If you use a > CPU process to build ram, it's not very dense (think cache on current actually, I was more thinking of putting more memory (not necessarily standard dram) onto a CPU-oriented process. > don't know that you can even build a big CPU on a DRAM process. DRAMs are > pretty highly optimized (read, they've spent billions of dollars on tweaking > the device models to within a gnats eyelash of the physics limits).. for that's not the point, of course - even a small CPU on each dram chip would add up to a profoundly powerful system. for instance, take a pretty mundane 2-socket, 16GB workstation today and notice it's got probably 128 separate dram chips. imagine if each of those had even a small onchip processor (say, 2-4Mt). the potential is there for something quite useful (I admit practical problems to getting dram vendors/industry to do such a thing...) 
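The chip count is easy to sanity-check if those are 1 Gbit parts (an assumption; 1 Gbit works out to 128 MByte per die):

/* Sanity check: assuming 1 Gbit (= 128 MByte) DRAM dice, how many are
 * needed to build a 16 GB workstation?  Pure arithmetic, no real part
 * numbers assumed. */
#include <stdio.h>

int main(void)
{
    double bytes_per_die = 1024.0 * 1024.0 * 1024.0 / 8.0;   /* 1 Gbit */
    double total_bytes   = 16.0 * 1024.0 * 1024.0 * 1024.0;  /* 16 GB  */

    printf("%.0f MB per die, %.0f dice for 16 GB\n",
           bytes_per_die / (1024.0 * 1024.0),
           total_bytes / bytes_per_die);
    /* -> 128 MB per die and 128 dice, matching the estimate above */
    return 0;
}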
> instance, because with DRAM you only read or write one location at time, very well, I have the impression that a lot of the power dissipated by modern chips is actually the external clock/PLL and drivers. then again, a dram chip only dissipates a fraction of a watt (I looked at a Micron 1Gb ddr2/667- it could possibly dissipate <.5 (all banks interleave), but normal back-to-back sequential activity would be only ~.3W. that's for ddr2 at 1.8V - ddr3 is 1.5 and I imagine the trend to lower voltages will continue. > few transistors change state on any given cycle, so the power dissipation is > low. Compare with a CPU where you have thousands of transistors changing > state on a cycle. that's still a good point. a single transaction on a current dram would only warm up one row of one bank. probably modelable by ignoring the dissipation of the array itself, and just counting the control/sense/io logic. > Go to the IEEE High Speed Digital Interconnect Workshop in Santa Fe this > year... there's amazing stuff that people are doing. alas, my day-job is sys admin/programmer/dogsbody, not designing new, cutting-edge compute architectures ;( regards, mark hahn. From James.P.Lux at jpl.nasa.gov Wed Feb 14 09:51:21 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> <6.2.3.4.2.20070213091802.02dff098@mail.jpl.nasa.gov> Message-ID: <6.2.3.4.2.20070214093547.02e76ba8@mail.jpl.nasa.gov> At 09:17 AM 2/14/2007, Mark Hahn wrote: >that's not the point, of course - even a small CPU on each dram chip >would add up to a profoundly powerful system. for instance, take a >pretty mundane >2-socket, 16GB workstation today and notice it's got probably 128 separate >dram chips. imagine if each of those had even a small onchip processor >(say, 2-4Mt). the potential is there for something quite useful (I >admit practical problems to getting dram vendors/industry to do such >a thing...) I'm not sure you could put any processor (except maybe something like a microcontroller) into a DRAM design and keep the densities up. There are all sorts of things that might bite you.. aside from thermal issues, I suspect that the number of mask layers, etc. is fairly small for DRAM. The actual materials on the chip (doping levels, etc.) may not allow for a reasonably performing processor with reasonable feature sizes and thermal properties. Getting the heat away from the junction is a big deal. I think DRAMs are built with a maximum of 4 layers of interconnect with vias, while processors have a lot more layers and a much more sophisticated interconnect structure. >>instance, because with DRAM you only read or write one location at time, very > >well, I have the impression that a lot of the power dissipated by modern >chips is actually the external clock/PLL and drivers. Each and every switch has some non-zero power associated with changing state. Sure, the core swings smaller voltages and energies, but a DRAM cell is a lot smaller than a flipflop or half-adder in the CPU, and only one is changing at a time, as opposed to thousands. > then again, a dram chip only dissipates a fraction of a watt (I > looked at a Micron 1Gb ddr2/667- it could possibly dissipate <.5 > (all banks interleave), but normal >back-to-back sequential activity would be only ~.3W. 
that's for ddr2 at >1.8V - ddr3 is 1.5 and I imagine the trend to lower voltages will continue. To a point.. at some point, the leakage current starts to dominate over the switching energy as you make the features smaller and smaller. around 1 Volt is "how low you can go" voltage is important for switching energy because it goes as V^2*C, while power for leakage is linear in V. A big advantage of integrating CPU and memory, though, is that you don't have to "go offchip" which saves a huge amount in drivers/receivers, etc. Of course, this is why everyone is looking to integrated photonics and/or real high speed serial interconnects. The I/O buffer might consume a hundred or thousand times more power than the onchip logic driving it. Trading some more logic inside to serialize and deserialize, and do adapative equalization, in exchange for fewer "wires out of the chip" is a good deal. Then, there's the speed of light problem. Put two chips 10cm apart on a board, and the round trip time (say for address to get there and data to get back) is going to be in the nanoseconds area, even if the chip itself were infinitely fast. James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From eugen at leitl.org Wed Feb 14 10:15:28 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <6.2.3.4.2.20070214093547.02e76ba8@mail.jpl.nasa.gov> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <8DD7A007-1CDB-494E-BDB3-F795E646A10B@sicortex.com> <6.2.3.4.2.20070213091802.02dff098@mail.jpl.nasa.gov> <6.2.3.4.2.20070214093547.02e76ba8@mail.jpl.nasa.gov> Message-ID: <20070214181528.GD21677@leitl.org> On Wed, Feb 14, 2007 at 09:51:21AM -0800, Jim Lux wrote: > I'm not sure you could put any processor (except maybe something like > a microcontroller) into a DRAM design and keep the densities > up. There are all sorts of things that might bite you.. aside from IBM has just announced at the ISSCC a 1-transistor eDRAM substitute for the 6T-SRAM cell used in caches. (Others have already demonstrated 1T-SRAM years ago, AMD has Z-RAM, Intel Floating Body Cells, T-RAM doesn't need a capacitor, etc. -- embedded RAM is reasonably common in network processors, IIRC). http://www.heise.de/newsticker/meldung/85295 It's 45 nm SOI (starting 2008), 1.5 ns access (SRAM does 0.8..1 ns), and is supposed to be far more dissipation-friendly. Theoretically this gives you 6 times the eDRAM of a CPU cache, which is at least 12 MBytes, and possibly up to 48 MBytes (Power6 dual-core has 8 MBytes on-die cache). > thermal issues, I suspect that the number of mask layers, etc. is > fairly small for DRAM. The actual materials on the chip (doping > levels, etc.) may not allow for a reasonably performing processor > with reasonable feature sizes and thermal properties. Getting the > heat away from the junction is a big deal. > > I think DRAMs are built with a maximum of 4 layers of interconnect > with vias, while processors have a lot more layers and a much more > sophisticated interconnect structure. Above processes are compatible with CPU processes, so there's some hope the piggybacking in Terascale doesn't have to be forever. > Each and every switch has some non-zero power associated with > changing state. 
Sure, the core swings smaller voltages and energies, > but a DRAM cell is a lot smaller than a flipflop or half-adder in the > CPU, and only one is changing at a time, as opposed to thousands. At the horizon, there's MRAM which can also do logic with a little extension to each cell (a kind of nonvolatile FPGA). It's not that hugely fast, but it's static, and very low power. > A big advantage of integrating CPU and memory, though, is that you > don't have to "go offchip" which saves a huge amount in > drivers/receivers, etc. Of course, this is why everyone is looking Yes, this is a major advantage. No pads, too, but a few serial high-speed links. > to integrated photonics and/or real high speed serial > interconnects. The I/O buffer might consume a hundred or thousand > times more power than the onchip logic driving it. Trading some more > logic inside to serialize and deserialize, and do adapative > equalization, in exchange for fewer "wires out of the chip" is a good deal. > > Then, there's the speed of light problem. Put two chips 10cm apart Increasing density to true 3d integration is a very good way to reduce the average distance. Stacking computation modules on a 3d lattice also minimizes dead space, of course with current cooling you won't get more than a few 10 MW out of a paper basket volume before the cluster goes China syndrome. > on a board, and the round trip time (say for address to get there and > data to get back) is going to be in the nanoseconds area, even if the > chip itself were infinitely fast. The mammal CNS has a 120 m/s signalling limit, yet it can process pretty complex stimuli in few 10 ms. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From mathog at caltech.edu Wed Feb 14 12:12:20 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] S2466 permanent poweroff, round 2 Message-ID: Robert G. brown wrote: > They were so damn touchy and difficult to get > running so that they actually were stable and so that the buttons worked > and so on that once we finally got there, I'd have taken a hammer to the > head of anybody that tried to change them. The S2466Ns are incredibly touchy, aren't they? When reimaging the other 19 nodes some of them had to be reset, and in a couple of cases, unplugged and plugged back in, before they all come up properly with "boel" loaded from the headnode. I never did get acpi working perfectly, just (barely) good enough. If the "button" module is loaded once, and never looked at sideways again, then after "poweroff" the front panel switch works to restart the system. Turn acpi on, or even just do: rmmod button; modprobe button and that front panel switch won't work after "poweroff". Never could get acpid working at the same time, so no way to trigger a shutdown from the power button. That's less of a problem though, since historically if "rsh nodename; poweroff" doesn't get through, it's about 50% odds that that node will also ignore its reset and power buttons. Anyway, the two main reasons for upgrading these nodes were: 1. Get athcool working. This knocks about 50W/CPU off the idle power consumption and drops the idle CPU temps from 39C to 29C. (A previous fling with athcool and the previous kernel did not work.) 2. 
Hopefully eliminate a bug in i2c that was causing sensors to stop working every once in a while, resulting in node shutdowns because the Over Temp scripts would suddenly be unable to obtain valid temperature or fan speed measurements. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From atp at piskorski.com Wed Feb 14 13:25:00 2007 From: atp at piskorski.com (Andrew Piskorski) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: References: Message-ID: <20070214212457.GA19610@tehun.pair.com> On Wed, Feb 14, 2007 at 12:17:59PM -0500, Mark Hahn wrote: > that's not the point, of course - even a small CPU on each dram chip would > add up to a profoundly powerful system. for instance, take a pretty mundane > 2-socket, 16GB workstation today and notice it's got probably 128 separate > dram chips. imagine if each of those had even a small onchip processor A few years ago, there was at least one academic research project designing such CPU-in-RAM chips. Basically, a RAM chip with a smallish CPU in the corner. I can't remember the name of the project though! I think they actually fabbed some of their chips, although I don't recall for sure. But, I do remember that they were strictly working on SINGLE chips, one small CPU in one big RAM chip. At least at the time, they were not even considering how to link up many such chips to form a larger, multi-CPU machine. -- Andrew Piskorski http://www.piskorski.com/ From brian.ropers.huilman at gmail.com Wed Feb 14 14:01:46 2007 From: brian.ropers.huilman at gmail.com (Brian D. Ropers-Huilman) Date: Wed Nov 25 01:05:41 2009 Subject: Fwd: [Beowulf] Teraflop chip hints at the future In-Reply-To: References: <20070214212457.GA19610@tehun.pair.com> Message-ID: Meant to send this to the list as well... ---------- Forwarded message ---------- From: Brian D. Ropers-Huilman Date: Feb 14, 2007 4:01 PM Subject: Re: [Beowulf] Teraflop chip hints at the future To: Andrew Piskorski Perhaps your thinking of Thomas Sterling's work on the MIND processor: "He is developing the MIND processor in memory architecture based on ParalleX, an advanced message-driven split-transaction computing model for scalable low-power fault-tolerant operation. In addition, he is developing an ultra lightweight supervisor runtime kernel in support of MIND and other fine grain architectures (like CELL) and the Agincourt parallel programming language for high efficiency through intrinsics in support of latency hiding and low overhead synchronization for both conventional and innovative parallel computer architectures." [ http://cacr.library.caltech.edu/104/01/IWIA_paper2005_Sterling.pdf ] [ http://www.cct.lsu.edu/~tron/ ] On 2/14/07, Andrew Piskorski wrote: > On Wed, Feb 14, 2007 at 12:17:59PM -0500, Mark Hahn wrote: > > > that's not the point, of course - even a small CPU on each dram chip would > > add up to a profoundly powerful system. for instance, take a pretty mundane > > 2-socket, 16GB workstation today and notice it's got probably 128 separate > > dram chips. imagine if each of those had even a small onchip processor > > A few years ago, there was at least one academic research project > designing such CPU-in-RAM chips. Basically, a RAM chip with a > smallish CPU in the corner. I can't remember the name of the project > though! > > I think they actually fabbed some of their chips, although I don't > recall for sure. 
But, I do remember that they were strictly working > on SINGLE chips, one small CPU in one big RAM chip. At least at the > time, they were not even considering how to link up many such chips to > form a larger, multi-CPU machine. > > -- > Andrew Piskorski > http://www.piskorski.com/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Brian D. Ropers-Huilman -- Brian D. Ropers-Huilman From eugen at leitl.org Fri Feb 16 07:11:40 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] failure trends in a large disk drive population Message-ID: <20070216151140.GO21677@leitl.org> http://labs.google.com/papers/disk_failures.pdf -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From hahn at mcmaster.ca Fri Feb 16 12:08:13 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] failure trends in a large disk drive population In-Reply-To: <20070216151140.GO21677@leitl.org> References: <20070216151140.GO21677@leitl.org> Message-ID: > http://labs.google.com/papers/disk_failures.pdf this is awesome! my new new-years resolution is to be more google-like, especially in gathering potentially large amounts of data for this kind of retrospective analysis. thanks for posting the ref. From justin at cs.duke.edu Fri Feb 16 12:17:25 2007 From: justin at cs.duke.edu (Justin Moore) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] failure trends in a large disk drive population In-Reply-To: <20070216151140.GO21677@leitl.org> References: <20070216151140.GO21677@leitl.org> Message-ID: > http://labs.google.com/papers/disk_failures.pdf Despite my Duke e-mail address, I've been at Google since July. While I'm not a co-author, I'm part of the group that did this study and can answer (some) questions people may have about the paper. -jdm Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu Web: http://www.cs.duke.edu/~justin/ From weikuan.yu at gmail.com Tue Feb 13 11:40:52 2007 From: weikuan.yu at gmail.com (Weikuan Yu) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] HotI 2007 Call for Papers Message-ID: <45D21444.3080608@gmail.com> Apologies for _multiple_ copies ========================================================== Hot Interconnects 15 IEEE Symposium on High-Performance Interconnects August 22-24, 2007 Stanford University Palo Alto, California, USA Hot Interconnects is the premier international forum for researchers and developers of state-of-the-art hardware and software architectures and implementations for interconnection networks of all scales, ranging from on-chip processor-memory interconnects to wide-area networks. This yearly conference is very well attended by leaders in industry and academia. The atmosphere provides for a wealth of opportunities to interact with individuals at the forefront of this field. Themes include cross-cutting issues spanning computer systems, networking technologies, and communication protocols. This conference is directed particularly at new and exciting technology and product innovations in these areas. Contributions should focus on real experimental systems, prototypes, or leading-edge products and their performance evaluation. 
In addition to those subscribing to the main theme of the conference, contributions are also solicited in the topics listed below. * Novel and innovative interconnect architectures * Multi-core processor interconnects * System-on-Chip Interconnects * Advanced chip-to-chip communication technologies * Optical interconnects * Protocol and interfaces for interprocessor communication * Survivability and fault-tolerance of interconnects * High-speed packet processing engines and network processors * System and storage area network architectures and protocols * High-performance host-network interface architectures * High-bandwidth and low-latency I/O * Tb/s switching and routing technologies * Innovative architectures for supporting collective communication * Novel communication architectures to support grid computing Submission Guideline o Submission deadline: March 31, 2007 o Notification of acceptance: May 15, 2007 o Papers need sufficient technical detail to judge quality and suitability for presentation. o Submit title, author, abstract, and full paper (six pages, double-column, IEEE format). o Papers should be submitted electronically at the specified link location found on http://www.hoti.org o For further information please see http://www.hoti.org/hoti15/cfp.html About the Conference - Conference held at the William Hewlett Teaching Center at Stanford University. - Papers selected will be published in proceedings by the IEEE Computer Society. - Presentations are 30-minute talks in a single-track format. - Online information at http://www.hoti.org GENERAL CO-CHAIRS * John W. Lockwood, Washington University in St. Louis * Fabrizio Petrini, Pacific Northwest National Laboratory TECHNICAL CO-CHAIRS * Ron Brightwell, Sandia National Laboratories * Dhabaleswar (DK) Panda, The Ohio State University LOCAL ARRANGEMENTS CHAIR * Songkrant Muneenaem, Washington University in St. Louis PANEL CHAIR * Daniel Pitt, Santa Clara University PUBLICITY CO-CHAIRS * Weikuan Yu, Oak Ridge National Laboratory PUBLICATION CHAIR * Luca Valcarenghi, Scuola Superiore Sant?Anna FINANCE CHAIR * Herzel Ashkenazi, Xilinx TUTORIAL CO-CHAIRS - TBA REGISTRATION CHAIR * Songkrant Muneenaem, Washington University in St. Louis Webmaster * Liz Rogers, LRD Group Steering Committee o Allen Baum, Intel o Lily Jow, Hewlett Packard o Mark Laubach, Broadband Physics o John Lockwood, Washington University in St. Louis o Daniel Pitt, Santa Clara University From list.sudhakar at gmail.com Wed Feb 14 21:51:28 2007 From: list.sudhakar at gmail.com (Sudhakar G) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] DLM internals Message-ID: Hi, Can any one let me know how DLM (Distributed Lock Manager) works. The internals of it. ie., whether the logic of granting of locks is centralised or distributed. If distributed how? Thanks Sudhakar -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070215/1864b1a1/attachment.html From libo at buaa.edu.cn Thu Feb 15 17:55:43 2007 From: libo at buaa.edu.cn (Li, Bo) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> Message-ID: <001801c7516d$95ab11d0$5a01a8c0@JSIIBM> Intel has been promote a conception Many Cores and Small Cores for Teraflop chip, which was reported recently. I have done some on Cell programming and optimization. 
Many-Core architectures will be a bit difficult for programmers, not for its algorithms but for its inter-connection. When 80 cores are hungry with data and codes, I am afraid of the performance. How to manage them with a better architecture will be a serious topic for us. Cell will reach Teraflop in 2010 and currently its single performance is about 256GFlops. I think its performance is more reasonable than Intel one at moment. Regards, Li, Bo From ntmoore at gmail.com Fri Feb 16 06:50:57 2007 From: ntmoore at gmail.com (Nathan Moore) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] (no subject) Message-ID: Hello all, I have a small beowulf cluster of Scientific Linux 4.4 machines with common NIS logins and NFS shared home directories. In the short term, I'd rather not buy a tape drive for backups. Instead, I've got a jury-rigged backup scheme. The node that serves the home directories via NFS runs a nightly tar job (through cron), root@server> tar cf home_backup.tar ./home root@server> mv home_backup.tar /data/backups/ where /data/backups is a folder that's shared (via NFS) across the cluster. The actual backup then occurs when the other machines in the cluster (via cron) copy home_backup.tar to a private (root-access- only) local directory. root@client> cp /mnt/server-data/backups/home_backup.tar /private_data/ where "/mnt/server-data/backups/" is where the server's "/data/ backups/" is mounted, and where /private_data/ is a folder on client's local disk. Here's the problem I'm seeing with this scheme. users on my cluster have quite a bit of stuff stored in their home directories, and home_backup.tar is large (~4GB). When I try the cp command on client, only 142MB of the 4.2GB is copied over (this is repeatable - not a random error, and always about 142MB). The cp command doesn't fail, rather, it quits quietly. Why would only some of the file be copied over? Is there a limit on the size of files which can be transferred via NFS? There's certainly sufficient space on disk for the backups (both client's and server's disks are 300GB SATA drives, formatted to ext3) I'm using the standard NFS that's available in SL43, config is basically default. regards, Nathan Moore - - - - - - - - - - - - - - - - - - - - - - - Nathan Moore Physics, Pasteur 152 Winona State University nmoore@winona.edu AIM:nmoorewsu - - - - - - - - - - - - - - - - - - - - - - - From mathog at caltech.edu Fri Feb 16 12:50:49 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population Message-ID: Eugen Leitl wrote: > http://labs.google.com/papers/disk_failures.pdf Interesting. However google apparently uses: serial and parallel ATA consumer-grade hard disk drives, ranging in speed from 5400 to 7200 rpm Not quite clear what they meant by "consumer-grade", but I'm assuming that it's the cheapest disk in that manufacturer's line. I don't typically buy those kinds of disks, as they have only a 1 year warranty but rather purchase those with 5 year warranties. Even for workstations. So I'm not too sure how useful their data is. I think everyone here would have agreed without the study that a disk reallocating blocks and throwing scan errors is on the way out. Quite surprising about the lack of a temperature correlation though. At the very least I would have expected increased temps to lead to faster loss of bearing lubricant. That tends to manifest as a disk that spun for 3 years not being able to restart after being off for a half an hour. 
Presumably you've all seen that. If they have great power and systems management at their data centers the systems may not have been down long enough for this to be observed. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From peter.st.john at gmail.com Fri Feb 16 13:22:59 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] (no subject) In-Reply-To: References: Message-ID: Nathan, You might experiment with the flags to cp; e.g. I might try cp -i -v The -i will prompt you when it wants to overwrite an existing file (maybe at 142MB in you are getting a permissions error) and -v is verbose (so maybe it will stop failing silently). You also might want to specify stderr at the command line; you may not be seeing the window that the error messages are in, depending on how your windows are set up. This happens in the NT environment all the time, GUI errors go to a DOS prompt that blinks and vanishes :-) I use UGU.com to find man pages for arbitrary flavors of unix. Peter On 2/16/07, Nathan Moore wrote: > > Hello all, > > I have a small beowulf cluster of Scientific Linux 4.4 machines with > common NIS logins and NFS shared home directories. In the short > term, I'd rather not buy a tape drive for backups. Instead, I've got > a jury-rigged backup scheme. The node that serves the home > directories via NFS runs a nightly tar job (through cron), > > root@server> tar cf home_backup.tar ./home > root@server> mv home_backup.tar /data/backups/ > > where /data/backups is a folder that's shared (via NFS) across the > cluster. The actual backup then occurs when the other machines in > the cluster (via cron) copy home_backup.tar to a private (root-access- > only) local directory. > > root@client> cp /mnt/server-data/backups/home_backup.tar > /private_data/ > > where "/mnt/server-data/backups/" is where the server's "/data/ > backups/" is mounted, and where /private_data/ is a folder on > client's local disk. > > Here's the problem I'm seeing with this scheme. users on my cluster > have quite a bit of stuff stored in their home directories, and > home_backup.tar is large (~4GB). When I try the cp command on > client, only 142MB of the 4.2GB is copied over (this is repeatable - > not a random error, and always about 142MB). The cp command doesn't > fail, rather, it quits quietly. Why would only some of the file be > copied over? Is there a limit on the size of files which can be > transferred via NFS? There's certainly sufficient space on disk for > the backups (both client's and server's disks are 300GB SATA drives, > formatted to ext3) > > I'm using the standard NFS that's available in SL43, config is > basically default. > regards, > > Nathan Moore > > > - - - - - - - - - - - - - - - - - - - - - - - > Nathan Moore > Physics, Pasteur 152 > Winona State University > nmoore@winona.edu > AIM:nmoorewsu > - - - - - - - - - - - - - - - - - - - - - - - > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070216/c4bf178c/attachment.html From landman at scalableinformatics.com Fri Feb 16 13:40:59 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <45D624EB.80602@scalableinformatics.com> Hi David David Mathog wrote: > Eugen Leitl wrote: > >> http://labs.google.com/papers/disk_failures.pdf > > Interesting. However google apparently uses: > > serial and parallel ATA consumer-grade hard disk drives, > ranging in speed from 5400 to 7200 rpm > > Not quite clear what they meant by "consumer-grade", but I'm assuming > that it's the cheapest disk in that manufacturer's line. I don't > typically buy those kinds of disks, as they have only a 1 year > warranty but rather purchase those with 5 year warranties. Even > for workstations. Seagates. > > So I'm not too sure how useful their data is. I think everyone here Quite useful IMO. I know it would be PC, but I (and many others) would like to see a clustering of the data, specifically to see if there are any hyperplanes that separate the disks in terms of vendors, models, interfaces, etc. CERN had a study up about this which I had read and linked to, but now it seems to be gone, and I did not download a copy for myself. > would have agreed without the study that a disk reallocating blocks and > throwing scan errors is on the way out. Quite surprising about the "Tic tic tic whirrrrrrr" scares the heck out of me now :( > lack of a temperature correlation though. At the very least I would > have expected increased temps to lead to faster loss of bearing > lubricant. That tends to manifest as a disk that spun for 3 years > not being able to restart after being off for a half an hour. > Presumably you've all seen that. If they have great power and systems > management at their data centers the systems may not have been > down long enough for this to be observed. With enough disks, their sampling should be reasonably good, albeit biased towards their preferred vendor(s) and model(s). Would like to see that data. CERN compared SCSI, IDE, SATA, and FC. They found (as I remember, quoting from a document I no longer can find online) that there really weren't any significant reliability differences between them. I would like to see this sort of analysis here, and see if the real data (not the estimated MTBFs) shows a signal. I am guessing that we could build a pragmatic and time dependent MTBF based upon the time rate of change of the AFR. I think the Google paper was basically saying that they wanted to do something like this using the SMART data, but found that it was insufficient by itself to render a meaningful predictable model. That is, in and of itself, quite interesting. If you could read back reasonable sets of parameters from a machine and estimate the likelihood of it going south, this would be quite nice (or annoying) for admins everywhere. Also good in terms of tightening down real support costs and the value of warranties, default and extended. 
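The "pragmatic and time dependent MTBF" is straightforward to compute once failure counts per time window are in hand; a rough sketch, with invented counts standing in for anyone's real fleet data:

/* Windowed AFR and the MTBF it implies.  The fleet size and quarterly
 * failure counts are invented placeholders, not measured data. */
#include <stdio.h>

#define HOURS_PER_YEAR 8760.0

int main(void)
{
    int    failures[4] = { 21, 25, 34, 40 };   /* failures per quarter */
    double drives = 10000.0;                   /* fleet size           */
    double years_per_quarter = 0.25;
    int q;

    for (q = 0; q < 4; q++) {
        double drive_years = drives * years_per_quarter;
        double afr  = failures[q] / drive_years;    /* annualized rate    */
        double mtbf = HOURS_PER_YEAR / afr;         /* crude implied MTBF */
        printf("Q%d: AFR %.2f%%  implied MTBF %.0f h\n",
               q + 1, 100.0 * afr, mtbf);
    }
    return 0;
}

Watching that per-window AFR drift upward as a population ages is exactly the kind of signal an admin (or a warranty bean-counter) would want.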
> > Regards, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From landman at scalableinformatics.com Fri Feb 16 14:01:21 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <45D624EB.80602@scalableinformatics.com> References: <45D624EB.80602@scalableinformatics.com> Message-ID: <45D629B1.6070209@scalableinformatics.com> Joe Landman wrote: > Quite useful IMO. I know it would be PC, but I (and many others) would s/PC/non-PC/ my fault -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From mathog at caltech.edu Fri Feb 16 14:05:40 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population Message-ID: Justin Moore wrote: > Subject: Re: [Beowulf] failure trends in a large disk drive population > To: Eugen Leitl > Cc: Beowulf@beowulf.org > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > > > http://labs.google.com/papers/disk_failures.pdf > > Despite my Duke e-mail address, I've been at Google since July. While > I'm not a co-author, I'm part of the group that did this study and can > answer (some) questions people may have about the paper. > Dangling meat in front of the bears, eh? Well... Is there any info for failure rates versus type of main bearing in the drive? Failure rate versus any other implementation technology? Failure rate vs. drive speed (RPM)? Or to put it another way, is there anything to indicate which component designs most often result in the eventual SMART events (reallocation, scan errors) and then, ultimately, drive failure? Failure rates versus rack position? I'd guess no effect here, since that would mostly affect temperature, and there was little temperature effect. Failure rates by data center? (Are some of your data centers harder on drives than others? If so, why?) Are there air pressure and humidity measurements from your data centers? Really low air pressure (as at observatory height) is a known killer of disks, it would be interesting if lesser changes in air pressure also had a measurable effect. Low humidity cranks up static problems, high humidity can result in condensation. Again, what happens with values in between? Are these effects quantifiable? 
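For anyone who wants to watch the same SMART counters these questions revolve around on their own drives, a minimal smartmontools one-liner; attribute names vary a little by vendor, /dev/sda is just a placeholder, and older builds need -d ata for SATA disks behind libata, as comes up later in the thread:

    smartctl -A /dev/sda | egrep -i \
        'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable|Seek_Error|Temperature'

Rising raw values on the reallocation and pending-sector lines are the "drive on its way out" signal the paper discusses; as the paper also shows, their absence unfortunately proves nothing.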
Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From rbw at ahpcrc.org Fri Feb 16 14:17:20 2007 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <6.2.3.4.2.20070213092750.02d9cd18@mail.jpl.nasa.gov> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <45D1D35C.8080502@ahpcrc.org> <6.2.3.4.2.20070213092750.02d9cd18@mail.jpl.nasa.gov> Message-ID: <45D62D70.2020807@ahpcrc.org> Jim Lux wrote: > At 07:03 AM 2/13/2007, Richard Walsh wrote: >> Yes, but how much does it really abandon von Neumann. It is just a lot >> of little von Neumann machines unless the mesh is fully programmable >> and the DRAM stacks can source data for any operation on any cpu as >> the application's data flows through the application kernel(s) >> however it >> is laid out across the chip. And in that case it is a multi-core >> ASIC emulating >> an FPGA ... why not just use an FPGA ... ;-) ... and avoid wasting >> all those >> hard-wired functional units that won't be needed for this or that >> particular >> kernel. > In fact, modern high density FPGAs (viz Xilinx Virtex II 6000 series) > have partitioned their innards into little cells, some with ALU and > combinatorial logic and a little memory, some with lots of memory and > not so much logic. Hey Jim, Yes, I do understand this although attention for double precision ops on FPGAs is focused on the Xilinx Virtex-5 at 65 nm. You can already get a PCIe card version I think. My comments about new 80-core/ASIC Intel chip were to suggest two things ... first was that having the ability to program your own (ala VHDL, Verilog, Mitrion-C, Handel-C, etc. ) core that is specific to your kernel is more circuit-efficient in theory, so if you are going to have multiple cores consider having them be programmable. Its like the plumber that brings only and all the tools he needs into to house to do the job at hand. The second point I was trying to make was that all cyclic re-referencing of the same store (local or remote) is a reflection of the von Neuman model (even to the stacked DRAM in the new Intel chip). When the processor cannot "swallow the kernel whole" it has to consume it in von Neuman-like bites which imply register, cache, and memory writes. Part of the programmable core process is in making the connections between upstream and downstream hardware in a data-flow fashion that replace some number of cyclic stores with in-line passes to the next collection of functional units required by the applications specific kernel. In this way, the "diameter" of the re-reference cycle is enlarged and the latency penalty is therefore reduced. So while the ASIC-cores in the new Intel chip are not programmable in the FPGA sense there is the hope/expectation that the interconnect on the chip will give the data flow benefits described. These are the features of the multi-core TRIPS and Raw processors that allow them to emulate ILP, TLP, and DLP oriented architectures and applications. The extent to which FPGAs are more flexible in this regard give them an advantage over less "wire-exposed" multi-core ASIC architectures. There are obvious draw backs to FPGAs ... 
they are not commodity enough, programmability is poor, foriegn, and the improvements (Mitrion-C) generally consume 2x the circuits and run at 1/2 the clock that the FPGA in use is capable of. Joe Landman pointed out the large chunk of the device that the interface architecture can consume, and for HPC size data sets you still need to stream data in and out to external memory (algorithms must be pipelined). Still it seems like over the long haul some of the FPGA advantages mentioned will creep into the HPC space -- either on the chip or via accelerators. Underwood at Sandia has nice a paper showing that peak flop performance on FPGAs exceed commodity CPUs in summer of 2004 (same time Intel dropped the race to the 4.0 GHz clock) ... although the data needs to be updated with the Virtex-5 and the new multi-core processors. Here are some papers that I think you can Google that I have found useful/interesting. 1. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. Taylor, et al. 2. Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture. 3. FPGAs vs CPUs: Trends in Peak Floating-Point Performance. Keith Underwood. 4. Architectures and APIs" Assessing Requirements for Delivering FPGA Performance to Applications. Underwood and Hemmert 5. A 64-bit Floating-point FPGA Matrix Multiplications. Yong Dou et al. 6. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs Ling Zhuo and Viktor Prasanna 7. Computing Lennard-Jones Potentials and Forces wth Reconfigurable Hardware > I think that as a general rule, the special purpose cores (ASICs) are > going to be smaller, lower power, and faster (for a given technology) > than the programmable cores (FPGAs). Back in the late 90s, I was > doing tradeoffs between general Here you are arguing for an ASIC for each typical HPC kernel ... ala the GRAPE processor. I will buy that ... but a commodity multi-core, CPU is not HPC-special-purpose or low power compared to an FPGA. > purpose CPUs (PowerPCs), DSPs (ADSP21020), and FPGAs for some signal > processing applications. At that time, the DSP could do the FFTs, > etc, for the least joules and least time. Since then, however, the > FPGAs have pulled ahead, at least for spaceflight applications. But > that's not because of architectural superiority in a given process.. > it's that the FPGAs are benefiting from improvements in process > (higher density) and nobody is designing space qualified DSPs using > those processes (so they are stuck with the old processes). Better process is good, but I think I hear you arguing for HPC-specific ASICs again like the GRAPE ... if they can be made cheaply, then you are right ... take the bit stream from the FPGA CFD code I have written and tuned, and produce 1000 ASICs for my special purpose CFD-only cluster. I can run it at higher clock rates, but I may need a new chip every time I change my code. > Heck, the latest SPARC V8 core from ESA (LEON 3) is often implemented > in an FPGA, although there are a couple of space qualified ASIC > implementations (from Atmel and Aeroflex). > > In a high volume consumer application, where cost is everything, the > ASIC is always going to win over the FPGA. For more specialized > scientific computing, the trade is a bit more even ... But even so, > the beowulf concept of combining large numbers of commodity computers > leverages the consumer volume for the specialized application, giving > up some theoretical performance in exchange for dollars. 
Right, otherwise we would all be using our own version of GRAPE, but we are all looking for "New, New Thing" ... a new price-performance regime to take us up to the next level. Is it going to be FPGAs, GPGPUs, commodity multi-core, PIM, or novel 80-processor Intel chips. I think we are in for a period of extend HPC market fragmentation, but in any case I think two features of FPGA processing, the programmable core and data flow programming model have intrinsic/theoretical appeal. These forces may be completely overwhelmed by other forces in the market place of course ... Regards, rbw -- Richard B. Walsh "The world is given to me only once, not one existing and one perceived. The subject and object are but one." Erwin Schroedinger Project Manager Network Computing Services, Inc. Army High Performance Computing Research Center (AHPCRC) rbw@ahpcrc.org | 612.337.3467 ----------------------------------------------------------------------- This message (including any attachments) may contain proprietary or privileged information, the use and disclosure of which is legally restricted. If you have received this message in error please notify the sender by reply message, do not otherwise distribute it, and delete this message, with all of its contents, from your files. ----------------------------------------------------------------------- From hahn at mcmaster.ca Fri Feb 16 14:17:57 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] (no subject) In-Reply-To: References: Message-ID: > not buy a tape drive for backups. Instead, I've got a jury-rigged backup tapes suck. I acknowlege that this is partly a matter of taste, experience and history, but they really do have some undesirable properties. > scheme. The node that serves the home directories via NFS runs a nightly tar > job (through cron), > root@server> tar cf home_backup.tar ./home > root@server> mv home_backup.tar /data/backups/ > > where /data/backups is a folder that's shared (via NFS) across the cluster. > The actual backup then occurs when the other machines in the cluster (via > cron) copy home_backup.tar to a private (root-access-only) local directory. > > root@client> cp /mnt/server-data/backups/home_backup.tar > /private_data/ > > where "/mnt/server-data/backups/" is where the server's "/data/backups/" is > mounted, and where /private_data/ is a folder on client's local disk. did you consider just doing something like: root@client> ssh -i backupkey tar cf - /home | \ gzip > /private_data/home_backup.`date +%a`.gz I find that /home contents tend to be compressible, and I particularly like fewer "moving parts". using single-use ssh keys is also a nice trick. > large (~4GB). When I try the cp command on client, only 142MB of the 4.2GB > is copied over (this is repeatable - not a random error, and always about > 142MB). might it actually be be sizeof(tar)-2^32? that is, someone's using a u32 for a file size or offset? this sort of thing was pretty common years ago. (isn't scientific linux extremely "stable" in the sense of "old versions"?) > only some of the file be copied over? Is there a limit on the size of files > which can be transferred via NFS? There's certainly sufficient space on disk it's certainly true that old enough NFS had 4GB problems, as well as similar vintage user-space tools. From rgb at phy.duke.edu Fri Feb 16 14:53:25 2007 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] failure trends in a large disk drive population In-Reply-To: References: <20070216151140.GO21677@leitl.org> Message-ID: On Fri, 16 Feb 2007, Mark Hahn wrote: >> http://labs.google.com/papers/disk_failures.pdf > > this is awesome! my new new-years resolution is to be more google-like, > especially in gathering potentially large amounts of data for this kind of > retrospective analysis. > > thanks for posting the ref. Yeah, I already reposted the link to our campus-wide sysadmin list. There go all sort of assumptions, guesses and deductions to be replaced by -- gasp -- data! rgb > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hahn at mcmaster.ca Fri Feb 16 15:01:43 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:41 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: > Is there any info for failure rates versus type of main bearing > in the drive? I thought everyone used something like the "thrust plate" bearing that seagate (maybe?) introduced ~10 years ago. > Failure rate vs. drive speed (RPM)? surely "consumer-grade" rules out 10 or 15k rpm disks; their collection of 5400 and 7200 disks is probably skewed, as well (since 5400's have been uncommon for a couple years.) > Or to put it another way, is there anything to indicate which > component designs most often result in the eventual SMART > events (reallocation, scan errors) and then, ultimately, drive > failure? reading the article, I did wish their analysis more resembled one done by clinical or behavioral types, who would have evaluated outcome attributed to all the factors combinatorially. > Failure rates versus rack position? I'd guess no effect here, > since that would mostly affect temperature, and there was > little temperature effect. funny, when I saw figure5, I thought the temperature effect was pretty dramatic. in fact, all the metrics paint a pretty clear picture of infant mortality, then reasonably fit drives suriving their expected operational life (3 years). in senescence, all forms of stress correlate with increased failure. I have to believe that the 4/5th year decreases in AFR are either due to survival effects or sampling bias. > changes in air pressure also had a measurable effect. Low > humidity cranks up static problems, high humidity can result does anyone have recent-decade data on the conventional wisdom about too-low humidity? I'm dubious that it matters in a normal machineroom where components tend to stay put. regards, mark hahn. 
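Back on the truncated-backup question a few messages up: the sizeof(tar) - 2^32 guess is easy to test directly on the client, since both the NFS-mounted original and the truncated copy are visible there. A small sketch using the paths from Nathan's mail (stat -c %s assumes GNU coreutils):

    stat -c %s /mnt/server-data/backups/home_backup.tar     # original, over NFS
    stat -c %s /private_data/home_backup.tar                # truncated local copy
    # if a 32-bit size/offset is being wrapped somewhere, the copy's size will
    # equal the original's size modulo 2^32:
    echo $(( $(stat -c %s /mnt/server-data/backups/home_backup.tar) % 4294967296 ))

If the numbers line up, the next things to check are whether the export is really mounted as NFSv3 (nfsstat -m shows the mount options) and whether the cp/tar on that box are old enough to lack large-file support.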
From landman at scalableinformatics.com Fri Feb 16 15:10:22 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Teraflop chip hints at the future In-Reply-To: <45D62D70.2020807@ahpcrc.org> References: <45CCDC88.8080102@dcc.ufmg.br> <45D11032.6040006@brookes.ac.uk> <200702131422.12498.csamuel@vpac.org> <45D1349A.1020308@scalableinformatics.com> <45D1D35C.8080502@ahpcrc.org> <6.2.3.4.2.20070213092750.02d9cd18@mail.jpl.nasa.gov> <45D62D70.2020807@ahpcrc.org> Message-ID: <45D639DE.1030105@scalableinformatics.com> Richard Walsh wrote: > Here you are arguing for an ASIC for each typical HPC kernel ... ala > the GRAPE processor. I will buy that ... but > a commodity multi-core, CPU is not HPC-special-purpose or low power > compared to an FPGA. FPGA power is good, several Watts in most cases. When you don't have to power extra cruft things are good. Latest quad core from AMD/Intel are in the 20W/core region (30 for the current Intel, 20 for the new gen). It would not surprise me to see this get to 10W/core and below. >> purpose CPUs (PowerPCs), DSPs (ADSP21020), and FPGAs for some signal >> processing applications. At that time, the DSP could do the FFTs, >> etc, for the least joules and least time. Since then, however, the >> FPGAs have pulled ahead, at least for spaceflight applications. But >> that's not because of architectural superiority in a given process.. >> it's that the FPGAs are benefiting from improvements in process >> (higher density) and nobody is designing space qualified DSPs using >> those processes (so they are stuck with the old processes). > Better process is good, but I think I hear you arguing for > HPC-specific ASICs again like the GRAPE ... if they > can be made cheaply, then you are right ... take the bit stream from > the FPGA CFD code I have written and tuned, and > produce 1000 ASICs for my special purpose CFD-only cluster. I can This sounds like D.E.Shaw's work (though I think they are doing it in FPGA) > run it at higher clock rates, but I may need a > new chip every time I change my code. You need a new bitfile everytime you change FPGAs or FPGA boards. This means that FPGA bitfiles are largely immobile. Of course the process to change the bitfile is a rebuild and ... >> Heck, the latest SPARC V8 core from ESA (LEON 3) is often implemented >> in an FPGA, although there are a couple of space qualified ASIC >> implementations (from Atmel and Aeroflex). >> >> In a high volume consumer application, where cost is everything, the >> ASIC is always going to win over the FPGA. For more specialized >> scientific computing, the trade is a bit more even ... But even so, >> the beowulf concept of combining large numbers of commodity computers >> leverages the consumer volume for the specialized application, giving >> up some theoretical performance in exchange for dollars. > Right, otherwise we would all be using our own version of GRAPE, Some things can be specialized and made fast. GPUs. > but we are all looking for "New, New Thing" > ... a new price-performance regime to take us up to the next level. > Is it going to be FPGAs, GPGPUs, commodity > multi-core, PIM, or novel 80-processor Intel chips. I think we are > in for a period of extend HPC market > fragmentation, but in any case I think two features of FPGA I am not convinced it is going to be fragmented for long. Take everything more expensive than $5000US and call it DOA unless it can easily drop right in and hit 10-100x node performance. 
Node pricing is dropping rapidly. A 5+ TF cluster quoted several months ago using previous generation technology came in around a few million $. One quoted recently came in well under $1M. > processing, the programmable core and data flow > programming model have intrinsic/theoretical appeal. These forces > may be completely overwhelmed by other > forces in the market place of course ... Unless GPUs just won't work, they may be a safe bet as one of the emerging winners. Cell should be in there as well. We demo'ed a little FPGA board (disclosure: we work with the company that builds it, and we do sell it) that attached to a USB2 port, that ran HMMer faster than an 8 core cluster. The cost and power difference is huge there, but hopefully we will be able to run p7Viterbi fast on GPUs. Then economies of scale may be able to drive some of this into motherboards, though most MB makers are reluctant to add anything that increases the cost of their product. Even if it is better and makes their product stand out. Graphics cards are in *everything* so you should pretty much expect them to be one of the winners, if they can get the codes to run on them. Cell-BE's are going into millions of PS3s, and I while it might be a stretch, it is possible that some places may deploy clusters of these (PSC deploying a PS3 cluster? :) ). What is pretty clear right now is that anyone with an excessively high price per unit or per SDK, is pretty much knocking themselves out of the market. Anyone who cannot build and create large volumes of these things is pretty much in trouble in this space. The other thing that is pretty clear is that as the multi-cores go even more multi, chips that hyperspecialize in one area may become marginalized. There is some data I am not sure if I can talk about, so I'll talk about the other data that I can. The Intel quad core units can do something like 35 GF/socket (rough calc, I am sure some Intel person can correct me, so please do). This is good, though it puts pressure on the hyperspecialized chips. Joe > > Regards, > > rbw > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From rgb at phy.duke.edu Fri Feb 16 15:13:45 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: On Fri, 16 Feb 2007, David Mathog wrote: > Justin Moore wrote: >> Subject: Re: [Beowulf] failure trends in a large disk drive population >> To: Eugen Leitl >> Cc: Beowulf@beowulf.org >> Message-ID: >> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed >> >> >>> http://labs.google.com/papers/disk_failures.pdf >> >> Despite my Duke e-mail address, I've been at Google since July. While >> I'm not a co-author, I'm part of the group that did this study and can >> answer (some) questions people may have about the paper. >> > > Dangling meat in front of the bears, eh? Well... Hey Justin. Are you going to stay in NC and move to the new facility as they build it? Let me add one general question to David's. How did they look for predictive models on the SMART data? It sounds like they did a fairly linear data decomposition, looking for first order correlations. Did they try to e.g. build a neural network on it, or use fully multivariate methods (ordinary stats can handle it up to 5-10 variables). 
This is really an extension of David's questions below. It would be very interesting to add variables to the problem (if possible) until the observed correlations resolve (in sufficiently high dimensionality) into something significantly predictive. That would be VERY useful. rgb > > Is there any info for failure rates versus type of main bearing > in the drive? > > Failure rate versus any other implementation technology? > > Failure rate vs. drive speed (RPM)? > > Or to put it another way, is there anything to indicate which > component designs most often result in the eventual SMART > events (reallocation, scan errors) and then, ultimately, drive > failure? > > Failure rates versus rack position? I'd guess no effect here, > since that would mostly affect temperature, and there was > little temperature effect. > > Failure rates by data center? (Are some of your data centers > harder on drives than others? If so, why?) Are there air > pressure and humidity measurements from your data centers? > Really low air pressure (as at observatory height) > is a known killer of disks, it would be interesting if lesser > changes in air pressure also had a measurable effect. Low > humidity cranks up static problems, high humidity can result > in condensation. Again, what happens with values in between? > Are these effects quantifiable? > > Regards, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From James.P.Lux at jpl.nasa.gov Fri Feb 16 14:15:49 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> At 12:50 PM 2/16/2007, David Mathog wrote: >Eugen Leitl wrote: > > > http://labs.google.com/papers/disk_failures.pdf > >Interesting. However google apparently uses: > > serial and parallel ATA consumer-grade hard disk drives, > ranging in speed from 5400 to 7200 rpm > >Not quite clear what they meant by "consumer-grade", but I'm assuming >that it's the cheapest disk in that manufacturer's line. I don't >typically buy those kinds of disks, as they have only a 1 year >warranty but rather purchase those with 5 year warranties. But this is potentially a very interesting trade-off, and one right in line with the Beowulf concept of leveraging cheap consumer gear... Say you need 100 widgets worth of horsepower. Are you better off buying 103 pro widgets at $500 and a 3% failure rate or 110 consumer widgets at $450 and a 10% failure rate.... $51.5K vs $49.5K... the cheap drives win.. And, in fact, if the drives fail randomly during the year (not a valid assumption in general, but easy to calculate on the back of an envelope), then you actually get more compute power with the cheap drives (105 average vs 101.5 average over the year) This also assumes that the failure rate is "small" and "independent" (that is, you don't wind up with a bad batch that all fail simultaneously from some systemic flaw.. 
the bane of a reliability calculation) One failing I see of many cluster applications is that they are quite brittle.. that is, they depend on a particular number of processors toiling on the task, and the complement of processors not changing during the "run". But this sort of thing makes a 100 node cluster no different than depending on the one 100xspeed supercomputer. I think it's pretty obvious that Google has figured out how to partition their workload in a "can use any number of processors" sort of way, in which case, they probably should be buying the cheap drives and just letting them fail (and stay failed.. it's probably cheaper to replace the whole node than to try and service one)... James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From justin at cs.duke.edu Fri Feb 16 16:28:39 2007 From: justin at cs.duke.edu (Justin Moore) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: >> Despite my Duke e-mail address, I've been at Google since July. While >> I'm not a co-author, I'm part of the group that did this study and can >> answer (some) questions people may have about the paper. >> > > Dangling meat in front of the bears, eh? Well... I can always hide behind my duck-blind-slash-moat-o'-NDA. :) > Is there any info for failure rates versus type of main bearing > in the drive? > > Failure rate versus any other implementation technology? We haven't done this analysis, but you might be interested in this paper from CMU: http://www.usenix.org/events/fast07/tech/schroeder.html They performed a similar study on drive reliability -- with the help of some people/groups here, I believe -- and found no significant differences in reliability between different disk technologies (SATA, SCSI, IDE, FC, etc). > Failure rate vs. drive speed (RPM)? Again, we may have the data but it hasn't been processed. > Or to put it another way, is there anything to indicate which > component designs most often result in the eventual SMART > events (reallocation, scan errors) and then, ultimately, drive > failure? One of the problems noted in the paper is that even if you assume that *any* SMART event is indicative in some way of an upcoming failure -- and are willing to deal with a metric boatload of false positives -- over one-third of failed drives had zero counts on all SMART parameters. And one of these parameters -- seek errors -- were observed on nearly three-quarters of the drives in our fleet, so you really would be dealing with boatloads of false positives. > Failure rates versus rack position? I'd guess no effect here, > since that would mostly affect temperature, and there was > little temperature effect. I imagine it wouldn't matter. Even if it did, I'm not sure we have this data in an easy-to-parse-and-include format. > Failure rates by data center? (Are some of your data centers > harder on drives than others? If so, why?) The CMU study is broken down by data center. There is certainly the case in their study that some data centers appear to be harder on drives than others, but there may be age and vintage issues coming into play in their study (an issue they acknowledge in the paper). 
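Jim's consumer-versus-pro trade-off above is easy to re-run with your own prices and failure rates. A quick sketch using exactly his numbers (103 pro units at $500 and 3% AFR against 110 consumer units at $450 and 10% AFR, failures spread evenly over the year):

    awk 'BEGIN {
        printf "pro:      cost=$%d  avg alive over year=%.1f\n", 103*500, 103 - 103*0.03/2
        printf "consumer: cost=$%d  avg alive over year=%.1f\n", 110*450, 110 - 110*0.10/2
    }'
    # -> pro $51500 / ~101.5 alive,  consumer $49500 / ~104.5 alive

which reproduces the $51.5K-vs-$49.5K and roughly 101.5-vs-105 figures, and makes it easy to see where the break-even failure rate sits for any given price gap.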
My intuition -- again, not having analyzed the data -- is that application characteristics and not data center characteristics are going to have a more pronounced effect. There is a section on how utilization effects AFR over time. > Are there air pressure and humidity measurements from your data > centers? Really low air pressure (as at observatory height) is a known > killer of disks, it would be interesting if lesser changes in air > pressure also had a measurable effect. Low humidity cranks up static > problems, high humidity can result in condensation. Once we start getting data from our Tibetan Monastery/West Asia data center I'll let you know. :) -jdm Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu Web: http://www.cs.duke.edu/~justin/ From bushnell at chem.ucsb.edu Fri Feb 16 17:34:49 2007 From: bushnell at chem.ucsb.edu (John Bushnell) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] (no subject) In-Reply-To: References: Message-ID: Hi, I used to do similar kinds of backups on our smallish clusters, but recently decided to do something slightly smarter, and have been using rsnapshot to do backups since. It uses rsync and hard links to make snapshots of /home (or any filesystem you want) without replicating every single byte each time, that is, it only saves changes to the file system. So after the first time you run it, only a relatively small amount of backup traffic is necessary to get a coherent snapshot of the whole thing. I set up a seperate cheap box with a couple large drives and three gig-ethernet cards, and each one plugs into a switch for one of our clusters. Now /home directories for all three clusters are all backed up nightly with very little network overhead and no intervention. It has been running without a hitch, and it is easy to add less frequent backups for /usr/local or the like that I'd hate to lose. Definitely worth the effort in my case! And it is trivial to export the snapshot directories (read-only of course) back to the clusters as needed for recovery purposes. - John On Fri, 16 Feb 2007, Nathan Moore wrote: > Hello all, > > I have a small beowulf cluster of Scientific Linux 4.4 machines with common > NIS logins and NFS shared home directories. In the short term, I'd rather > not buy a tape drive for backups. Instead, I've got a jury-rigged backup > scheme. The node that serves the home directories via NFS runs a nightly tar > job (through cron), > root@server> tar cf home_backup.tar ./home > root@server> mv home_backup.tar /data/backups/ > > where /data/backups is a folder that's shared (via NFS) across the cluster. > The actual backup then occurs when the other machines in the cluster (via > cron) copy home_backup.tar to a private (root-access-only) local directory. > > root@client> cp /mnt/server-data/backups/home_backup.tar > /private_data/ > > where "/mnt/server-data/backups/" is where the server's "/data/backups/" is > mounted, and where /private_data/ is a folder on client's local disk. > > Here's the problem I'm seeing with this scheme. users on my cluster have > quite a bit of stuff stored in their home directories, and home_backup.tar is > large (~4GB). When I try the cp command on client, only 142MB of the 4.2GB > is copied over (this is repeatable - not a random error, and always about > 142MB). The cp command doesn't fail, rather, it quits quietly. Why would > only some of the file be copied over? Is there a limit on the size of files > which can be transferred via NFS? 
There's certainly sufficient space on disk > for the backups (both client's and server's disks are 300GB SATA drives, > formatted to ext3) > > I'm using the standard NFS that's available in SL43, config is basically > default. > regards, > > Nathan Moore > > > - - - - - - - - - - - - - - - - - - - - - - - > Nathan Moore > Physics, Pasteur 152 > Winona State University > nmoore@winona.edu > AIM:nmoorewsu > - - - - - - - - - - - - - - - - - - - - - - - > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From joelja at bogus.com Fri Feb 16 18:19:19 2007 From: joelja at bogus.com (Joel Jaeggli) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <45D66627.2040106@bogus.com> Mark Hahn wrote: >> Failure rate vs. drive speed (RPM)? > > surely "consumer-grade" rules out 10 or 15k rpm disks; > their collection of 5400 and 7200 disks is probably skewed, > as well (since 5400's have been uncommon for a couple years.) Ictually I'd bet that's most of the 5400rpm disks would be maxtor maxline II nearline drives, netapp also used then in several filers. They were the first 300GB drive by a couple of months and came with a 5 year warranty... I have several dozen of them, and for the most part there still working though the warranties are all expiring at this point. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From eugen at leitl.org Sun Feb 18 09:01:04 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <45D66627.2040106@bogus.com> References: <45D66627.2040106@bogus.com> Message-ID: <20070218170104.GJ21677@leitl.org> On Fri, Feb 16, 2007 at 06:19:19PM -0800, Joel Jaeggli wrote: > Ictually I'd bet that's most of the 5400rpm disks would be maxtor > maxline II nearline drives, netapp also used then in several filers. > They were the first 300GB drive by a couple of months and came with a 5 > year warranty... I have several dozen of them, and for the most part > there still working though the warranties are all expiring at this point. I have two of these sitting here to be installed tomorrow for the couple that failed within a few months of each other, and had to be RMAed. They run pretty hot for 5400 rpm drives, maybe too many platters. The falure was predicted by an increasing SMART failure rate, until smartd sent error reports via email, indicating impending failure. The drives were in a 2x mini-ITX HA configuration in a Travla C147 case, which was poorly ventilated -- now the systems are to be recycled as a CARP cluster with the pfSense firewall, an embedded version which boots from CF flash -- that effectively solved the thermal problems. I wish Google's data did include WD Raptors and Caviar RE2 drives. I would really like to know whether these are worth the price premium over consumer SATA. Btw -- smartd doesn't seem to be able to handle SATA, at least, last time I tried. http://smartmontools.sourceforge.net/#testinghelp How do you folks gather data on them? Oh, and those of you who run GAMMA MPI on GBit Broadcoms, any lockups? 
SunFire X2100 seems to be supported (it has a Broadcom and an nForce NIC, the X2100 M2 seems to have two Broadcoms and two nVidia NICs) by GAMMA, so I'd like to try it, but rather not risk a lockup. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From hahn at mcmaster.ca Sun Feb 18 10:45:50 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <20070218170104.GJ21677@leitl.org> References: <45D66627.2040106@bogus.com> <20070218170104.GJ21677@leitl.org> Message-ID: > over consumer SATA. Btw -- smartd doesn't seem to be able to handle > SATA, at least, last time I tried. > > http://smartmontools.sourceforge.net/#testinghelp > > How do you folks gather data on them? I use smartctl - the smart support in libata entered the mainstream 2.6.15 kernel (2006-01-03!) From deadline at eadline.org Sun Feb 18 12:49:20 2007 From: deadline at eadline.org (Douglas Eadline) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> References: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> Message-ID: <39898.192.168.1.1.1171831760.squirrel@mail.eadline.org> snip > > One failing I see of many cluster applications is that they are quite > brittle.. that is, they depend on a particular number of processors > toiling on the task, and the complement of processors not changing > during the "run". But this sort of thing makes a 100 node cluster no > different than depending on the one 100xspeed supercomputer. I had written a few columns about the "static" nature of clusters (and how I would like to program). Thought you might find it interesting: http://www.clustermonkey.net//content/view/158/32/ It turned into a three part series. -- Doug From eugen at leitl.org Sun Feb 18 13:09:16 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: <45D66627.2040106@bogus.com> <20070218170104.GJ21677@leitl.org> Message-ID: <20070218210916.GB10115@leitl.org> On Sun, Feb 18, 2007 at 01:45:50PM -0500, Mark Hahn wrote: > I use smartctl - the smart support in libata > entered the mainstream 2.6.15 kernel (2006-01-03!) I've got nitrogen:~# uname -a Linux nitrogen 2.6.15-amd64-smp-vs #1 SMP Tue Apr 25 09:54:14 CEST 2006 x86_64 GNU/Linux but nitrogen:~# smartctl -a /dev/sda smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA HDT722525DLA380 Version: V44O Serial number: VDK41BT4D4TKTK Device type: disk Local Time is: Sun Feb 18 22:07:39 2007 CET Device does not support SMART Device does not support Error Counter logging [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging and nitrogen:~# smartctl -d sata -a /dev/sda smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ =======> INVALID ARGUMENT TO -d: sata =======> VALID ARGUMENTS ARE: ata, scsi, 3ware,N <======= Use smartctl -h to get a usage summary This is debian amd64, so probably the packages are way out of date. 
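For what it's worth, the usual workaround with 5.3x-era smartmontools and libata is to address the disk as plain ATA and make sure SMART is actually switched on (device name as above, everything else generic):

    smartctl -s on -d ata /dev/sda     # enable SMART on the drive
    smartctl -H    -d ata /dev/sda     # overall health self-assessment
    smartctl -A    -d ata /dev/sda     # vendor attribute table

and, for the mailed warnings mentioned earlier, a one-line /etc/smartd.conf entry along the lines of

    /dev/sda -d ata -a -m root@localhost    # monitor the usual set, mail root on trouble

Recent smartmontools releases detect SATA disks on libata on their own, so updating the package is the other half of the fix.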
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From hahn at mcmaster.ca Sun Feb 18 14:09:09 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <20070218210916.GB10115@leitl.org> References: <45D66627.2040106@bogus.com> <20070218170104.GJ21677@leitl.org> <20070218210916.GB10115@leitl.org> Message-ID: > Linux nitrogen 2.6.15-amd64-smp-vs #1 SMP Tue Apr 25 09:54:14 CEST 2006 x86_64 GNU/Linux > smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen the machine I checked has 5.33, and 5.36 is the date on the sources I grabbed in early dec. the machine in question is running HP XC 3.0, based on RHEL4's 2.6.9 - obviously with some backports. > Device: ATA HDT722525DLA380 Version: V44O > Serial number: VDK41BT4D4TKTK > Device type: disk > Local Time is: Sun Feb 18 22:07:39 2007 CET > Device does not support SMART well, afaikt, that's actually a pretty recent sata disk, and certainly does support SMART. might you need smartctl -e? some bioses offer an option to en/disable smart, but afaikt -e fixed that. > nitrogen:~# smartctl -d sata -a /dev/sda I needed -d ata on my system. (libata, which has always been intended to support both parallel and serial - and is now the preferred driver for some pata disks...) From csamuel at vpac.org Sun Feb 18 14:29:54 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> References: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> Message-ID: <200702190929.54437.csamuel@vpac.org> On Sat, 17 Feb 2007, Jim Lux wrote: > I think it's pretty obvious that Google has figured out how to > partition their workload in a "can use any number of processors" sort > of way, in which case, they probably should be buying the cheap > drives and just letting them fail (and stay failed.. it's probably > cheaper to replace the whole node than to try and service one)... IIRC they also have figured out a way to be fault tolerant by sending queries out to multiple systems for each part of the DB they are querying, so if one of those fails others will respond anyway. Apparently they use more reliable hardware for things like the advertising service.. -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070219/d05ed3c7/attachment.bin From jamesjamiejones at aol.com Sun Feb 18 13:49:47 2007 From: jamesjamiejones at aol.com (matt jones) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population (google fileing system) In-Reply-To: <200702190929.54437.csamuel@vpac.org> References: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> <200702190929.54437.csamuel@vpac.org> Message-ID: <45D8C9FB.50903@aol.com> i've read in the past somewhere that the Google File System is capable of having many copies of the data. often having 4 copies on different nodes. 
and as you say run the query to many of them. if one fails there are still 3, if another there are still 2. i've also read somewhere else that if one fails, it can automatically recreate the image from the remaining ones on a spare node. bringing it back to 4. this approach is rather ott, but it works and works well. i suspect this sort of thing could be done cheaper by just using 3 per copy and hoping that you never lose 2 or more nodes at once. essentially this is a huge distributed files system with integrated RAID software. Chris Samuel wrote: > IIRC they also have figured out a way to be fault tolerant by sending queries out to multiple systems for each part of the DB they are querying, so if one of those fails others will respond anyway. > > Apparently they use more reliable hardware for things like the advertising service -- matt. From diep at xs4all.nl Mon Feb 19 09:17:43 2007 From: diep at xs4all.nl (Vincent Diepeveen) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population References: <45D66627.2040106@bogus.com> Message-ID: <00ec01c75449$e11209b0$0300a8c0@gourmandises> Aren't those maxtors eating nearly 2x more power than drives from other manufacturers? Vincent ----- Original Message ----- From: "Joel Jaeggli" To: "Mark Hahn" Cc: ; "David Mathog" Sent: Saturday, February 17, 2007 3:19 AM Subject: Re: [Beowulf] Re: failure trends in a large disk drive population > Mark Hahn wrote: > >>> Failure rate vs. drive speed (RPM)? >> >> surely "consumer-grade" rules out 10 or 15k rpm disks; >> their collection of 5400 and 7200 disks is probably skewed, >> as well (since 5400's have been uncommon for a couple years.) > > Ictually I'd bet that's most of the 5400rpm disks would be maxtor > maxline II nearline drives, netapp also used then in several filers. > They were the first 300GB drive by a couple of months and came with a 5 > year warranty... I have several dozen of them, and for the most part > there still working though the warranties are all expiring at this point. > >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From hahn at mcmaster.ca Mon Feb 19 10:13:47 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <00ec01c75449$e11209b0$0300a8c0@gourmandises> References: <45D66627.2040106@bogus.com> <00ec01c75449$e11209b0$0300a8c0@gourmandises> Message-ID: > Aren't those maxtors eating nearly 2x more power than drives from other > manufacturers? I doubt it. disks all dissipate around 10W (+-50%) when active. that stretches a bit lower for laptop disks, and a bit higher for FC, high-rpm and/or high-platter-count disks. unfortunately, seagate has gutted maxtor's website, so it seems almost impossible to get a simple spec sheet for maxline drives. I did find a japanese sheet that simply lists 9.97W (probably idle) which is entirely typical for such a disk. but I think the point is that putting any disk in a poorly ventilated enclosure is asking for trouble. 
it's not really clear what google's paper implies about this, since they basically say that new, too-cold disks are at risk, and old too-hot ones. From momentics at gmail.com Mon Feb 19 01:00:26 2007 From: momentics at gmail.com (momentics) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population (google fileing system) In-Reply-To: <45D8C9FB.50903@aol.com> References: <6.2.3.4.2.20070216140656.02d61730@mail.jpl.nasa.gov> <200702190929.54437.csamuel@vpac.org> <45D8C9FB.50903@aol.com> Message-ID: <81d7574a0702190100i7e52632bh7eb71c0bb05f96fe@mail.gmail.com> On 2/19/07, matt jones wrote: > if one fails there > are still 3, if another there are still 2. i've also read somewhere else > that if one fails, it can automatically recreate the image from the > remaining ones on a spare node. [...] >this approach is rather ott, but it works and works well. not sure of Google gents; but we're using reliability model to calculate number of nodes and their physical locations (continuous scheduling) - to meet the expected reliability coefficient specified by the system operator/deployer/configurator (for EE, SW and HW parts). HDD is unreliable system part, with the nearly known reliability (expected -actually), moreover, as we know, most of HDDs have SMART metrics - the good way to correct live coefficients within used math model. The outcome here is to use adaptive techs. So Googles are using the same way probably - a good company anyhow... ta-da! :) Scal@Grid ? http://sgrid.sourceforge.net/ // (the perfect doc - the amazing work) From ballen at gravity.phys.uwm.edu Mon Feb 19 22:15:59 2007 From: ballen at gravity.phys.uwm.edu (Bruce Allen) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: <45D66627.2040106@bogus.com> <20070218170104.GJ21677@leitl.org> <20070218210916.GB10115@leitl.org> Message-ID: Sorry guys, I have been distracted the last few days -- if this is a smartmontools question please repeat it -- I can probably answer easily. Cheers, Bruce On Sun, 18 Feb 2007, Mark Hahn wrote: >> Linux nitrogen 2.6.15-amd64-smp-vs #1 SMP Tue Apr 25 09:54:14 CEST 2006 >> x86_64 GNU/Linux >> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen > > the machine I checked has 5.33, and 5.36 is the date on the sources I grabbed > in early dec. the machine in question is running HP XC 3.0, based on RHEL4's > 2.6.9 - obviously with some backports. > >> Device: ATA HDT722525DLA380 Version: V44O >> Serial number: VDK41BT4D4TKTK >> Device type: disk >> Local Time is: Sun Feb 18 22:07:39 2007 CET >> Device does not support SMART > > well, afaikt, that's actually a pretty recent sata disk, and certainly > does support SMART. might you need smartctl -e? some bioses offer an > option to en/disable smart, but afaikt -e fixed that. > >> nitrogen:~# smartctl -d sata -a /dev/sda > > I needed -d ata on my system. (libata, which has always been intended > to support both parallel and serial - and is now the preferred driver for > some pata disks...) 
> _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From richard at hinditron.com Mon Feb 19 22:30:56 2007 From: richard at hinditron.com (Richard) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] A Sample MPI job Message-ID: <45DA95A0.5020504@hinditron.com> Dear All, This is my first post to this mailing list. I don't know if I can ask this question here.But I will try. I have a big system ( proprietory system not beowulf ) to maintain and I don't know anything about programming. I want to test the performance of this system. Can anyone send me a sample MPI/MPICH job that I can run to test the performance of the system. Thanks, Richard From becker at scyld.com Tue Feb 20 00:16:36 2007 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] BayBUG meeting today, February 20 2007 in Sunnyvale CA Message-ID: Bay Area Beowulf User Group (BayBUG) Bi-monthly in 2007, our first 2007 meeting will be February 20, 2007 2:30 - 5:00 p.m. AMD headquarters Common Building, Room C-6/7/8 991 Stewart Drive Sunnyvale CA Join us for food and drinks and to learn from and network with other Linux HPC professionals. Speakers: * David B. Jackson, Chief Technical Officer, Cluster Resources, Inc. * John Gustafson, Chief Technical Officer, HPC, ClearSpeed Technology Save the date for the next BayBUG: April 24, 2007 Presentation 1 Title: Batch vs. Interactive Scheduling in Clustered Computing Abstract: Mr. Jackson will give an overview of batch v. interactive scheduling in clustered computing. He will discuss how tools can help empower organizations to fully understand, control and optimize their compute resources, focusing on the pros and cons of each approach. In particular, he will address issues of availability and reliability, as well as its ability to scale to larger systems, more complex problems and multiple clusters as well as user friendly design. He will then explore the current landscape of available tools, noting which considerations users must take into account when choosing resource managers and workload schedulers. Because of his deep experience with the Moab Cluster Suite family, he will help users better understand this popular technology. Then he will make a closer examination of how popular tools such as Moab can be optimized for particular cluster architectures, including Beowulf-class clusters. Speaker Bio: David B. Jackson, CTO of Cluster Resources, Inc. David has more than fifteen years of experience in the high performance computing (HPC) industry. He designed and developed the pervasive Maui Scheduler and other open-source resource management software, and has since been the lead architect for cluster, grid and hosting center management suites (Moab Cluster Suite, Moab Grid Suite and Moab Hosting Suite). He has worked for numerous high performance computing centers providing resource management and scheduling services including Lawrence Livermore National Laboratory, San Diego Supercomputer Center, NCSA, PNNL, MHPCC, and the Center for High Performance Computing. David has also worked as a consultant at IBM's AIX System Center. A founding member of the Global Grid Forum scheduling working group and a key member of the Department of Energy's Scalable System Software Initiative, He has a M.S. 
in Computer Science) and B.S.'s in Electrical and Computer Engineering and in Computer Science from Brigham Young University. Presentation 2 Title: Requirements for Successful Use of Accelerators Abstract: Accelerator boards offer the possibility of increasing performance on highly-specific tasks, while economizing on electric power and space requirements that frequently limit the scale of Linux clusters. We present the full set of issues that must be considered for successful use of such accelerators, including software, precision, compatibility, latency, bandwidth, and memory size. Surprisingly, the applications for which accelerator boards such as those made by ClearSpeed work well also tend to be somewhat insensitive to bandwidth to the host node and highly insensitive to the latency. Speaker Bio: John Gustafson, Chief Technical Officer, HPC, ClearSpeed John joined ClearSpeed in 2005 after leading high-performance computing efforts at Sun Microsystems. He has 32 years experience using and designing compute-intensive systems, including the first matrix algebra accelerator and the first commercial massively-parallel cluster while at Floating Point Systems. His pioneering work on a 1024-processor nCUBE at Sandia National Laboratories created a watershed in parallel computing, for which he received the inaugural Gordon Bell Award. He also has received three R&D 100 Awards for innovative performance models, including the model commonly known as Gustafson's Law or Scaled Speedup. John received his B.S. degree from Caltech and his M.S. and Ph.D. degrees from Iowa State University, all in Applied Mathematics. -- Donald Becker becker@scyld.com Scyld Software Scyld Beowulf cluster systems 914 Bay Ridge Road, Suite 220 www.scyld.com Annapolis MD 21403 410-990-9993 From rbw at ahpcrc.org Tue Feb 20 07:39:36 2007 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <45DB1638.7030803@ahpcrc.org> David Mathog wrote: > throwing scan errors is on the way out. Quite surprising about the > lack of a temperature correlation though. At the very least I would > have expected increased temps to lead to faster loss of bearing > lubricant. That tends to manifest as a disk that spun for 3 years > Not sure the vapor pressure of the perfluoroethers that they use as lubricants varies that much over the operating temperature regime of a disk drive. If one can assume it is insignificant, then age alone would be the major contributing factor here (that is "hot" drives would not lubricant-age any faster that merely wam drives). rbw > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Richard B. Walsh "The world is given to me only once, not one existing and one perceived. The subject and object are but one." Erwin Schroedinger Project Manager Network Computing Services, Inc. Army High Performance Computing Research Center (AHPCRC) rbw@ahpcrc.org | 612.337.3467 ----------------------------------------------------------------------- This message (including any attachments) may contain proprietary or privileged information, the use and disclosure of which is legally restricted. 
If you have received this message in error please notify the sender by reply message, do not otherwise distribute it, and delete this message, with all of its contents, from your files. ----------------------------------------------------------------------- From hahn at mcmaster.ca Tue Feb 20 08:38:30 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <45DB1638.7030803@ahpcrc.org> References: <45DB1638.7030803@ahpcrc.org> Message-ID: > Not sure the vapor pressure of the perfluoroethers that they use as > lubricants > varies that much over the operating temperature regime of a disk drive. on the other hand, do these lubricants tend to get sticky or something at lowish temperatures? the google results showed significantly greater failures in young drives at low temperatures. (as well as extreme-temp drives when at end-of-life.) From hahn at mcmaster.ca Tue Feb 20 12:25:21 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? Message-ID: Hi Beowulfers, have you had any experience using 10gbaseT yet? I happened on a media story about how quite a few vendors have recently intro'd support for it. the only nics mentioned were the heavyweight encumbered-with-TOE kind - on them at least, the article mentioned 20-25W dissipation. (which doesn't seem like a big deal - about a disk and a half, fraction of a cpu, or a tenth of a GPU :) anyway, I'm interested to hear if anyone's played with 10gbaseT (or 10G clusters in general)... thanks, mark hahn. From patrick at myri.com Tue Feb 20 13:31:00 2007 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: References: Message-ID: <45DB6894.1070607@myri.com> Hi Mark, Mark Hahn wrote: > Hi Beowulfers, > have you had any experience using 10gbaseT yet? No switch available with 10G-BaseT ports yet (there are not many 10GigE switches to begin with). There may be new products by the end of 2007 with such ports, but I am not optimistic about general availability. Today, it's CX-4 water hoses for clusters and LR fibers for general networking. > I happened on a media story about how quite a few vendors have recently > intro'd support for it. the only nics mentioned were the heavyweight > encumbered-with-TOE kind - on them at least, > the article mentioned 20-25W dissipation. (which doesn't seem > like a big deal - about a disk and a half, fraction of a cpu, > or a tenth of a GPU :) 20-25W is a lot. It's too much to land on a motherboard (that's one of the argument against TOE) and it's way too much for dense switches. It's going to get better eventually, but it's going to take time. I would expect cheaper (quad) fiber solutions sooner than pervasive 10G-BaseT. Patrick -- Patrick Geoffray Myricom, Inc. http://www.myri.com From hahn at mcmaster.ca Tue Feb 20 14:02:07 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] A Sample MPI job In-Reply-To: <45DA95A0.5020504@hinditron.com> References: <45DA95A0.5020504@hinditron.com> Message-ID: > I have a big system ( proprietory system not beowulf ) to maintain and I > don't know anything about programming. beowulf doesn't mean "weekend hack of noname parts" - proprietary-ness is orthogonal to the beowulf design principle... > I want to test the performance of this system. 
Can anyone send me a sample > MPI/MPICH job that I can run to test the performance of the system. there are lots of freely available benchmark codes available. one classic is linpack, which forms the basis for the top500 list. it's valuable because it does some realistic(ish) computation, checks its results, and can be tuned to bang hard on cpu/memory. I'd suggest just downloading the HPCC (HPC challenge) benchmarks; it includes a raw network latency/bandwidth test as well. From hahn at mcmaster.ca Wed Feb 21 06:48:19 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: References: Message-ID: > I have been using the MYRICOM 10Gb card in my NFS server (head node) for > the Beowulf cluster. And it works well. I have a inexpensive 3Com switch > (3870) with 48 1Gb ports that has a 10Gb port in it and I connect the > NFS server to that port. The switch does have small fans in it. that sounds like a smart, strategic use. cx4, I guess. is the head node configured with a pretty hefty raid (not that saturating a single GB link is that hard...) thanks, mark hahn. From rbw at ahpcrc.org Wed Feb 21 07:06:17 2007 From: rbw at ahpcrc.org (Richard Walsh) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: <45DB1638.7030803@ahpcrc.org> Message-ID: <45DC5FE9.3020809@ahpcrc.org> Mark Hahn wrote: >> Not sure the vapor pressure of the perfluoroethers that they use as >> lubricants >> varies that much over the operating temperature regime of a disk >> drive. > on the other hand, do these lubricants tend to get sticky or something > at lowish temperatures? the google results showed significantly greater > failures in young drives at low temperatures. (as well as extreme-temp > drives when at end-of-life.) > Hey Mark, Good question. The properties of perfluoropolyethers (Krytox, Fomblin, Demnum) must be well-known, but the Googling I did yielded only a single reference on point which I did not want to pay for. The ambient temperature range in the study is pretty small which would limit viscosity variation, but when the head arrives at a long unread location on a cooler disk maybe there are some shear effects. Any disk drive vendors read this list and care to comment? rbw -- Richard B. Walsh "The world is given to me only once, not one existing and one perceived. The subject and object are but one." Erwin Schroedinger Project Manager Network Computing Services, Inc. Army High Performance Computing Research Center (AHPCRC) rbw@ahpcrc.org | 612.337.3467 ----------------------------------------------------------------------- This message (including any attachments) may contain proprietary or privileged information, the use and disclosure of which is legally restricted. If you have received this message in error please notify the sender by reply message, do not otherwise distribute it, and delete this message, with all of its contents, from your files. ----------------------------------------------------------------------- From kyron at neuralbs.com Wed Feb 21 10:55:49 2007 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <200702211355.49281.kyron@neuralbs.com> [snip] > > > > Dangling meat in front of the bears, eh? Well... > > Hey Justin. Are you going to stay in NC and move to the new facility as > they build it? 
> > Let me add one general question to David's. > > How did they look for predictive models on the SMART data? It sounds > like they did a fairly linear data decomposition, looking for first > order correlations. Did they try to e.g. build a neural network on it, > or use fully multivariate methods (ordinary stats can handle it up to > 5-10 variables). > > This is really an extension of David's questions below. It would be > very interesting to add variables to the problem (if possible) until the > observed correlations resolve (in sufficiently high dimensionality) into > something significantly predictive. That would be VERY useful. > > rgb RGB, good idea, apply clustering/GA/MOGA analisys techniques to all of this data. Now the question is, will we ever get access to this data? ;) poke--> Justin From TPierce at rohmhaas.com Wed Feb 21 06:06:16 2007 From: TPierce at rohmhaas.com (Thomas H Dr Pierce) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: Message-ID: Hello, I have been using the MYRICOM 10Gb card in my NFS server (head node) for the Beowulf cluster. And it works well. I have a inexpensive 3Com switch (3870) with 48 1Gb ports that has a 10Gb port in it and I connect the NFS server to that port. The switch does have small fans in it. I had to compile the driver and patch the Linux kernel (RedHat Enterprise 4U4). There are tuning parameters, some of which I have tried and some which I have not (Don't break it if it works well... ) It'd been running well for 4 months now. My internal cluster benchmarks ( parallel Quantum mechanics programs ) improved by about 20% with disk backups improving by 60%. Pretty much everything that uses MPI runs faster since the NFS server network usage is a smaller percentage of the network wall clock time. I think the 10Gb link to the NFS server is a effective upgrade component of a beowulf cluster if one is using 1Gb ethernet, MPI and NFS. ------ Sincerely, Tom Pierce Mark Hahn Sent by: beowulf-bounces@beowulf.org 02/20/2007 03:25 PM To Beowulf Mailing List cc Subject [Beowulf] anyone using 10gbaseT? Hi Beowulfers, have you had any experience using 10gbaseT yet? I happened on a media story about how quite a few vendors have recently intro'd support for it. the only nics mentioned were the heavyweight encumbered-with-TOE kind - on them at least, the article mentioned 20-25W dissipation. (which doesn't seem like a big deal - about a disk and a half, fraction of a cpu, or a tenth of a GPU :) anyway, I'm interested to hear if anyone's played with 10gbaseT (or 10G clusters in general)... thanks, mark hahn. _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070221/ae2157aa/attachment.html From TPierce at rohmhaas.com Wed Feb 21 07:45:03 2007 From: TPierce at rohmhaas.com (Thomas H Dr Pierce) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: Message-ID: Dear Mark and the List, The head node is about a terabyte of raid10, with home directories and application directories NFS mounted to the cluster. I am still tuning NFS, 16 daemons now) and, of course, the head node had 1Gb link to my intranet for remote cluster access. The 10Gb link to the switch uses cx4 cable. 
It did not cost too much and I only needed two meters of it. 10 Gb is very nice and makes me lust for inexpensive low latency 10Gb switches... but I'll wait for the marketplace to develop. Engineering calculations (Fluent, Abacus) can fill the 10 Gb link for 5 to 15 minutes. But that is better than the 20-40 minutes they used to use. I suspect they are checkpointing and restarting their MPI iterations. ------ Sincerely, Tom Pierce Mark Hahn 02/21/2007 09:48 AM To Thomas H Dr Pierce cc Beowulf Mailing List Subject Re: [Beowulf] anyone using 10gbaseT? > I have been using the MYRICOM 10Gb card in my NFS server (head node) for > the Beowulf cluster. And it works well. I have a inexpensive 3Com switch > (3870) with 48 1Gb ports that has a 10Gb port in it and I connect the > NFS server to that port. The switch does have small fans in it. that sounds like a smart, strategic use. cx4, I guess. is the head node configured with a pretty hefty raid (not that saturating a single GB link is that hard...) thanks, mark hahn. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070221/7ec332e5/attachment.html From csamuel at vpac.org Wed Feb 21 15:35:47 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: References: Message-ID: <200702221035.47641.csamuel@vpac.org> On Thu, 22 Feb 2007, Thomas H Dr Pierce wrote: > I have been using the MYRICOM 10Gb card in my NFS server (head node) ?for > the Beowulf cluster. And it works well. ?I have a inexpensive 3Com switch > (3870) with 48 1Gb ports ? that has a 10Gb port in it and I connect the NFS > server to that port. The switch does have small fans in it. Very interesting, would you mind saying (roughly) how much the card and the switch cost ? cheers! Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070222/5a194d3e/attachment.bin From justin at cs.duke.edu Wed Feb 21 15:50:41 2007 From: justin at cs.duke.edu (Justin Moore) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <200702211355.49281.kyron@neuralbs.com> References: <200702211355.49281.kyron@neuralbs.com> Message-ID: >> How did they look for predictive models on the SMART data? It sounds >> like they did a fairly linear data decomposition, looking for first >> order correlations. Did they try to e.g. build a neural network on it, >> or use fully multivariate methods (ordinary stats can handle it up to >> 5-10 variables). >> >> This is really an extension of David's questions below. It would be >> very interesting to add variables to the problem (if possible) until the >> observed correlations resolve (in sufficiently high dimensionality) into >> something significantly predictive. That would be VERY useful. >> > > RGB, good idea, apply clustering/GA/MOGA analisys techniques to all of > this data. Now the question is, will we ever get access to this data? 
> ;) As mentioned in an earlier e-mail (I think) there were 4 SMART variables whose values were strongly correlated with failure, and another 4-6 that were weakly correlated with failure. However, of all the disks that failed, less than half (around 45%) had ANY of the "strong" signals and another 25% had some of the "weak" signals. This means that over a third of disks that failed gave no appreciable warning. Therefore even combining the variables would give no better than a 70% chance of predicting failure. To make things worse, many of the "weak" signals were found on a significant number of disks. For example, among the disks that failed, many had a large number of seek error; however, over 70% of disks in the fleet -- failed and working -- had a large number of seek errors. About all I can say beyond what's in the paper is that we're aware of the shortcomings of the existing work and possible paths forward. In response, we are Hello, this is the Google NDA bot. In our massive trawling of the Internet and other data sources, I have detected a possible violation of the Google NDA. This has been corrected. We now return you to your regularly scheduled e-mail. [ Continue ] [ I'm Feeling Confidential ] So that's our master plan. Just don't tell anyone. :) -jdm P.S. Unfortunately, I doubt that we'll be willing or able to release the raw data behind the disk drive study. Department of Computer Science, Duke University, Durham, NC 27708-0129 Email: justin@cs.duke.edu Web: http://www.cs.duke.edu/~justin/ From ctierney at hypermall.net Wed Feb 21 16:23:53 2007 From: ctierney at hypermall.net (Craig Tierney) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: <200702221035.47641.csamuel@vpac.org> References: <200702221035.47641.csamuel@vpac.org> Message-ID: <45DCE299.3090809@hypermall.net> Chris Samuel wrote: > On Thu, 22 Feb 2007, Thomas H Dr Pierce wrote: > >> I have been using the MYRICOM 10Gb card in my NFS server (head node) for >> the Beowulf cluster. And it works well. I have a inexpensive 3Com switch >> (3870) with 48 1Gb ports that has a 10Gb port in it and I connect the NFS >> server to that port. The switch does have small fans in it. > > Very interesting, would you mind saying (roughly) how much the card and the > switch cost ? > The list price of a Myrinet-10G card with a CX4 port is $695 (From website). The cable will run you about $100 (depending on source and length). I found a price for the 48-port 3Com 3870 with 10GBASE-X module and CX4 transceiver for about $4800 (www.costcentral.com). The switch says it is layer-3 capable, but I don't see if that is a part of the 10BASE-X module or a separate cost. I didn't think it was that cheap. I would prefer Layer 3 if this was going into a rack of a multi-rack system, but the price is right. Craig From csamuel at vpac.org Wed Feb 21 16:45:53 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question Message-ID: <200702221145.54360.csamuel@vpac.org> Hi folks, We've got an IBM Power5 cluster running SLES9 and using the GM drivers. We occasionally get users who manage to use up all the DMA memory that is addressable by the Myrinet card through the Power5 hypervisor. 
Through various firmware and driver tweaks (thanks to both IBM and Myrinet) we've gotten that limit up to almost 1GB and then we use an undocumented environment variable (GMPI_MAX_LOCKED_MBYTE) to say only use 248MB of that per process (as we've got 4 cores in each box), which we enforce through Torque. The problems went away. Or at least it did until just now. :-( The characterstic error we get is: [13]: alloc_failed, not enough memory (Fatal Error) Context: <(gmpi_init) gmpi_dma_alloc: dma_recv buffers> Now Myrinet can handle running out of DMA memory once a process is running, but when it starts it must be able to allocate a (fairly trivial) amount of DMA memory otherwise you get that fatal error. Looking at the node I can confirm that there are only 3 user processes running, so what I am after is a way of determining how much of that DMA memory a process has allocated. I looked at /proc/${PID}/maps and saw this: 40028000-40029000 r--s 00002000 00:0c \ 8483 /dev/gm0 which to me looks like a memory mapping, but to my eyes that looks like just 1,000 bytes.. Does anyone have any ideas at all ? Oh - switching to the Myrinet MX drivers (which doesn't have this problem) is not an option, we have an awful lot of users, mostly (non-computer) scientists, who have their own codes and trying to persuade them to recompile would be very hard - which would be necessary as we've not been able to convince MPICH-GM to build shared libraries on Linux on Power with the IBM compilers. :-( cheers, Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070222/5bb05d3a/attachment.bin From kyron at neuralbs.com Wed Feb 21 16:51:03 2007 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: <200702211355.49281.kyron@neuralbs.com> Message-ID: <200702211951.04156.kyron@neuralbs.com> Justin, Yes, I came across your previous post further down the intertwined thread. One other thing that could have been interesting to see then would be to have monitored _all_ of the system's "health" monitors such as voltage, powersupply fan speed. There may be some other correlations to be made from fluctuating/dying powersupplies... a shot in the dark but all is linked ;) As for the [censored] GOOGLE_NDA_BOT.... LOL! :) Thanks, that felt good. Eric Le mercredi 21 f?vrier 2007 18:50, Justin Moore a ?crit?: > > >> How did they look for predictive models on the SMART data? It sounds > >> like they did a fairly linear data decomposition, looking for first > >> order correlations. Did they try to e.g. build a neural network on it, > >> or use fully multivariate methods (ordinary stats can handle it up to > >> 5-10 variables). > >> > >> This is really an extension of David's questions below. It would be > >> very interesting to add variables to the problem (if possible) until the > >> observed correlations resolve (in sufficiently high dimensionality) into > >> something significantly predictive. That would be VERY useful. > >> > > > > RGB, good idea, apply clustering/GA/MOGA analisys techniques to all of > > this data. 
Now the question is, will we ever get access to this data? > > ;) > > As mentioned in an earlier e-mail (I think) there were 4 SMART variables > whose values were strongly correlated with failure, and another 4-6 that > were weakly correlated with failure. However, of all the disks that > failed, less than half (around 45%) had ANY of the "strong" signals and > another 25% had some of the "weak" signals. This means that over a > third of disks that failed gave no appreciable warning. Therefore even > combining the variables would give no better than a 70% chance of > predicting failure. > > To make things worse, many of the "weak" signals were found on a > significant number of disks. For example, among the disks that failed, > many had a large number of seek error; however, over 70% of disks in the > fleet -- failed and working -- had a large number of seek errors. > > About all I can say beyond what's in the paper is that we're aware of > the shortcomings of the existing work and possible paths forward. In > response, we are > > Hello, this is the Google NDA bot. In our massive trawling of the > Internet and other data sources, I have detected a possible violation of > the Google NDA. This has been corrected. We now return you to your > regularly scheduled e-mail. > [ Continue ] [ I'm Feeling Confidential ] > > > So that's our master plan. Just don't tell anyone. :) > -jdm > > P.S. Unfortunately, I doubt that we'll be willing or able to release the > raw data behind the disk drive study. > > Department of Computer Science, Duke University, Durham, NC 27708-0129 > Email: justin@cs.duke.edu > Web: http://www.cs.duke.edu/~justin/ > From hahn at mcmaster.ca Wed Feb 21 18:44:26 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: <200702211355.49281.kyron@neuralbs.com> Message-ID: > weakly correlated with failure. However, of all the disks that failed, less > than half (around 45%) had ANY of the "strong" signals and another 25% had > some of the "weak" signals. This means that over a third of disks that > failed gave no appreciable warning. Therefore even combining the variables > would give no better than a 70% chance of predicting failure. well, a factorial analysis might still show useful interactions. > number of disks. For example, among the disks that failed, many had a large > number of seek error; however, over 70% of disks in the fleet -- failed and > working -- had a large number of seek errors. was there any trend across time in the seek errors? > So that's our master plan. Just don't tell anyone. :) hah. well, if it were me, the M.P. would involve some sort of proactive treatment: say, a full-disk read once a day. smart self-tests _ought_ to be more valuable than that, but otoh, the vendor probably munge the measurements pretty badly. regards, mark hahn. From atchley at myri.com Wed Feb 21 18:47:45 2007 From: atchley at myri.com (Scott Atchley) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question In-Reply-To: <200702221145.54360.csamuel@vpac.org> References: <200702221145.54360.csamuel@vpac.org> Message-ID: <86905403-9BAE-47A4-B1AF-C5B310570EFE@myri.com> On Feb 21, 2007, at 7:45 PM, Chris Samuel wrote: > Hi folks, > > We've got an IBM Power5 cluster running SLES9 and using the GM > drivers. > > We occasionally get users who manage to use up all the DMA memory > that is > addressable by the Myrinet card through the Power5 hypervisor. 
> > Through various firmware and driver tweaks (thanks to both IBM and > Myrinet) > we've gotten that limit up to almost 1GB and then we use an > undocumented > environment variable (GMPI_MAX_LOCKED_MBYTE) to say only use 248MB > of that > per process (as we've got 4 cores in each box), which we enforce > through > Torque. > > The problems went away. Or at least it did until just now. :-( > > The characterstic error we get is: > > [13]: alloc_failed, not enough memory (Fatal Error) > Context: <(gmpi_init) gmpi_dma_alloc: dma_recv buffers> > > Now Myrinet can handle running out of DMA memory once a process is > running, > but when it starts it must be able to allocate a (fairly trivial) > amount of > DMA memory otherwise you get that fatal error. > > Looking at the node I can confirm that there are only 3 user processes > running, so what I am after is a way of determining how much of > that DMA > memory a process has allocated. > > I looked at /proc/${PID}/maps and saw this: > > 40028000-40029000 r--s 00002000 00:0c \ > 8483 /dev/gm0 > > which to me looks like a memory mapping, but to my eyes that looks > like just > 1,000 bytes.. > > Does anyone have any ideas at all ? Isn't this in hex? If so, it would be 4096 bytes. I do not use GM much and I do not know what this is. I just loaded GM on one node and with no GM processes running except the mapper, I have a similar entry (at a different address, but also 0x1000). I would guess this is to allow GM and the mapper to communicate. I will check internally. > Oh - switching to the Myrinet MX drivers (which doesn't have this > problem) is > not an option, we have an awful lot of users, mostly (non-computer) > scientists, who have their own codes and trying to persuade them to > recompile > would be very hard - which would be necessary as we've not been > able to > convince MPICH-GM to build shared libraries on Linux on Power with > the IBM > compilers. :-( > > cheers, > Chris I am sorry you have not had success with MPICH-GM to compile dynamic libs. Have you sent email to Myricom help? Regards, Scott From csamuel at vpac.org Wed Feb 21 19:12:23 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question In-Reply-To: <200702221145.54360.csamuel@vpac.org> References: <200702221145.54360.csamuel@vpac.org> Message-ID: <200702221412.26920.csamuel@vpac.org> On Thu, 22 Feb 2007, Chris Samuel wrote: > Through various firmware and driver tweaks (thanks to both IBM and Myrinet) > we've gotten that limit up to almost 1GB and then we use an undocumented > environment variable (GMPI_MAX_LOCKED_MBYTE) to say only use 248MB of that > per process (as we've got 4 cores in each box), which we enforce through > Torque. On further probing it appears that those particular processes have somehow lost that environment variable on one node. :-( So whilst it would be nice to be able to know how much DMA memory they are using we at least now know why the problem has suddenly reappeared. cheers! Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070222/f40d0893/attachment.bin From csamuel at vpac.org Wed Feb 21 19:16:50 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question In-Reply-To: <86905403-9BAE-47A4-B1AF-C5B310570EFE@myri.com> References: <200702221145.54360.csamuel@vpac.org> <86905403-9BAE-47A4-B1AF-C5B310570EFE@myri.com> Message-ID: <200702221416.50865.csamuel@vpac.org> On Thu, 22 Feb 2007, Scott Atchley wrote: Hello Scott! > Isn't this in hex? If so, it would be 4096 bytes. I do not use GM ? > much and I do not know what this is. I just loaded GM on one node and ? > with no GM processes running except the mapper, I have a similar ? > entry (at a different address, but also 0x1000). I would guess this ? > is to allow GM and the mapper to communicate. I will check internally. Mea culpa, that is hex. So yes, it probably is just that. > I am sorry you have not had success with MPICH-GM to compile dynamic ? > libs. It was just an issue with libtool not knowing how to handle that compiler/OS combination really, I've got a feeling that newer releases handle it (or at least we may have fixed it internally) but that was fairly recent and so we have to live with the reality that there's a bunch of codes now that are statically linked. At least now we have an idea about why it is going wrong (see previous posting about the missing environment variable. > Have you sent email to Myricom help? We have in the past, but GM is now just in bug fix mode rather than new features so the extra code to track memory allocations doesn't really figure in their plans (and I can't say I blame them!). cheers, Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070222/740043af/attachment.bin From patrick at myri.com Wed Feb 21 20:06:50 2007 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question In-Reply-To: <200702221145.54360.csamuel@vpac.org> References: <200702221145.54360.csamuel@vpac.org> Message-ID: <45DD16DA.5030302@myri.com> Hi Chris, Chris Samuel wrote: > We occasionally get users who manage to use up all the DMA memory that is > addressable by the Myrinet card through the Power5 hypervisor. The IOMMU limit set by the hypervisor varies depending on the machine, the hypervisor version and the phase of the moon. Sometimes, it's a limit per PCI slot (ie per device), sometimes it is a limit for the whole machine (can be virtual machine, that's one of the reason behind the hypervisor) and it's shared by all the devices. Sometimes, it's reasonable large (1 or 2 GB), sometimes it is ridiculously small (256 MB). The hypervisor does not make a lot of sense in a HPC environment, but it would be non-trivial work to remove it on PPC. 
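As an aside on the /proc/<pid>/maps entry quoted earlier in this thread: the address ranges are hexadecimal, so 40028000-40029000 is 0x1000 = 4096 bytes, not 1,000. A rough sketch along the following lines would total up the /dev/gm0 mappings per process. Note that it only counts mappings of the device file, which (as Scott suggests) may be no more than control pages -- GM's registered DMA memory is pinned user memory and need not appear in maps at all -- so treat this as an illustration of the hex arithmetic rather than the accounting Chris is after; it is not a GM tool.

#!/usr/bin/env python
# Total the bytes of /dev/gm0 mapped by each process, from /proc/<pid>/maps.
# Address ranges are hex: 40028000-40029000 is 0x1000 = 4096 bytes.
# Caveat: GM's registered (DMA-able) memory is pinned user memory, not a
# mapping of /dev/gm0, so it may not show up here at all.
import os

totals = {}
for pid in os.listdir('/proc'):
    if not pid.isdigit():
        continue
    try:
        lines = open('/proc/%s/maps' % pid).readlines()
    except (IOError, OSError):
        continue                       # process exited, or not readable
    n = 0
    for line in lines:
        if '/dev/gm0' not in line:
            continue
        start, end = line.split()[0].split('-')
        n += int(end, 16) - int(start, 16)
    if n:
        totals[pid] = n

for pid in sorted(totals, key=totals.get, reverse=True):
    print("pid %6s maps %10d bytes (%.1f MB) of /dev/gm0"
          % (pid, totals[pid], totals[pid] / 1048576.0))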
> Through various firmware and driver tweaks (thanks to both IBM and Myrinet) > we've gotten that limit up to almost 1GB and then we use an undocumented > environment variable (GMPI_MAX_LOCKED_MBYTE) to say only use 248MB of that > per process (as we've got 4 cores in each box), which we enforce through > Torque. > > The problems went away. Or at least it did until just now. :-( > > The characterstic error we get is: > > [13]: alloc_failed, not enough memory (Fatal Error) > Context: <(gmpi_init) gmpi_dma_alloc: dma_recv buffers> > > Now Myrinet can handle running out of DMA memory once a process is running, > but when it starts it must be able to allocate a (fairly trivial) amount of > DMA memory otherwise you get that fatal error. GM does pipeline large messages with chunks of 1 MB, so you can progress as long as you can register 1 MB at a time (you can think of pathological deadlocking situations, but it's not the common case). However, GM registers some buffers for Eager messages at init time. From memory, it's in the order of 32 MB per process (constant, does not depend on the size of the job). If you can't register that, there is nothing you can do so aborting is a good idea. If you limit registration per process, then I can think of one situation that will hit the IOMMU limit: if a process dies of abnormal death (segfault, killed, whatever), the GM port will be "shutting down" while the outstanding messages are dropped. During this time, the memory is still registered. If you start another process at that time, you will effectively have more than 4 processes with registered memory, and it may exceed the limit. A quick workaround would be to modify the MPICH-GM init code to only try to open the first 4 GM ports. That will in effect guarantee that only 4 processes can register memory at one time (latest release of GM provides 13 ports). I see from your next post that it's not what happened. It could have :-) > Looking at the node I can confirm that there are only 3 user processes > running, so what I am after is a way of determining how much of that DMA > memory a process has allocated. There is no handy way, but it would not be hard to add this info to the output of gm_board_info. There is not many releases of GM these days. Nevertheless, I will add it to the queue, it's simple enough to not be considered a new feature. > Oh - switching to the Myrinet MX drivers (which doesn't have this problem) is > not an option, we have an awful lot of users, mostly (non-computer) Actually, MX would not behave well in your environment: MX does not pipeline large messages, it register the whole message at once (MX registration is much faster, and pipelining prevents overlap of communication with computation). With a 250 MB of DMA-able memory per process, that would be the maximum message size you can send or receive. We have plan to do something about that, but it's not at the top of the queue. The right thing would be to get rid of the hypervisor (by the way, the hypervisor makes the memory registration overhead much more expensive), but it probably will never happen. > scientists, who have their own codes and trying to persuade them to recompile > would be very hard - which would be necessary as we've not been able to > convince MPICH-GM to build shared libraries on Linux on Power with the IBM > compilers. :-( Time for dreaming about an MPI ABI :-) Patrick -- Patrick Geoffray Myricom, Inc. 
http://www.myri.com From csamuel at vpac.org Wed Feb 21 22:16:27 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question In-Reply-To: <45DD16DA.5030302@myri.com> References: <200702221145.54360.csamuel@vpac.org> <45DD16DA.5030302@myri.com> Message-ID: <200702221716.28216.csamuel@vpac.org> On Thu, 22 Feb 2007, Patrick Geoffray wrote: > Hi Chris, G'day Patrick! > Chris Samuel wrote: > > We occasionally get users who manage to use up all the DMA memory that is > > addressable by the Myrinet card through the Power5 hypervisor. > > The IOMMU limit set by the hypervisor varies depending on the machine, > the hypervisor version and the phase of the moon. Indeed! > Sometimes, it's a limit per PCI slot (ie per device), sometimes it is a > limit for the whole machine (can be virtual machine, that's one of the > reason behind the hypervisor) and it's shared by all the devices. Fortunately on our systems we run just a single LPAR per physical compute node, so then it is reduced to the firmware revision, whether or not the "superslot" option is enabled and whether or not your Myrinet card is in that super slot. Of course ours weren't, so we had to move them all! > Sometimes, it's reasonable large (1 or 2 GB), sometimes it is ridiculously > small (256 MB). We started off at 256MB, the superslot option knocked that up to about 1GB but the driver reserves a percentage so that was down to around 700-800MB. Tweaking that %'age in the driver improves it up to around 900MB for MPI. > The hypervisor does not make a lot of sense in a HPC environment, but it > would be non-trivial work to remove it on PPC. IBM say that Power5 cannot run without a hypervisor. I happen to know, however, that the first O/S that was brought up on Power5 was Linux and that was because they could get it to run on the bare metal, whereas AIX wouldn't work until they got the hypervisor running. The folks at IBM's LTC in Canberra have argued on our side, but didn't win. Power4 can be run without a hypervisor though. [...] > I see from your next post that it's not what happened. It could have :-) :-) Useful details though, thanks! > > Looking at the node I can confirm that there are only 3 user processes > > running, so what I am after is a way of determining how much of that DMA > > memory a process has allocated. > > There is no handy way, but it would not be hard to add this info to the > output of gm_board_info. There is not many releases of GM these days. > Nevertheless, I will add it to the queue, it's simple enough to not be > considered a new feature. Oh, OK, we had been told that it wasn't appropriate to go into GM. Thanks! > > Oh - switching to the Myrinet MX drivers (which doesn't have this > > problem) is not an option, we have an awful lot of users, mostly > > (non-computer) > > Actually, MX would not behave well in your environment: MX does not > pipeline large messages, it register the whole message at once (MX > registration is much faster, and pipelining prevents overlap of > communication with computation). With a 250 MB of DMA-able memory per > process, that would be the maximum message size you can send or receive. Very useful to know, thanks! > We have plan to do something about that, but it's not at the top of the > queue. The right thing would be to get rid of the hypervisor (by the > way, the hypervisor makes the memory registration overhead much more > expensive), but it probably will never happen. No, IBM won't do that. 
Power4 is the most recent platform that will (apparently) run without the Hypervisor. :-( > > scientists, who have their own codes and trying to persuade them to > > recompile would be very hard - which would be necessary as we've not been > > able to convince MPICH-GM to build shared libraries on Linux on Power > > with the IBM compilers. :-( > > Time for dreaming about an MPI ABI :-) Just don't get too attached to it. cheers, Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070222/34d09430/attachment.bin From greg.lindahl at qlogic.com Wed Feb 21 22:35:13 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] DMA Memory Mapping Question In-Reply-To: <45DD16DA.5030302@myri.com> References: <200702221145.54360.csamuel@vpac.org> <45DD16DA.5030302@myri.com> Message-ID: <20070222063513.GH4677@localhost.localdomain> On Wed, Feb 21, 2007 at 11:06:50PM -0500, Patrick Geoffray wrote: > Time for dreaming about an MPI ABI :-) Ssssssh! It isn't April Fools yet! -- greg From csamuel at vpac.org Wed Feb 21 22:55:31 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: <45DCE299.3090809@hypermall.net> References: <200702221035.47641.csamuel@vpac.org> <45DCE299.3090809@hypermall.net> Message-ID: <200702221755.32234.csamuel@vpac.org> On Thu, 22 Feb 2007, Craig Tierney wrote: > I didn't think it was that cheap. ?I would prefer Layer 3 if > this was going into a rack of a multi-rack system, but the > price is right. Thanks Craig! -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20070222/8359ae0a/attachment.bin From ballen at gravity.phys.uwm.edu Thu Feb 22 00:06:57 2007 From: ballen at gravity.phys.uwm.edu (Bruce Allen) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: References: Message-ID: Hi Thomas, If I understand correctly, your NFS server is connected via a 10Gb/s link to the switch, and you've spent some effort to tune it. What sort of aggregate NFS performance are you seeing? Is it at the level of 300 or 400 MG/sec? Could you please provide a few details about the hardware in your NFS box? Cheers, Bruce From landman at scalableinformatics.com Thu Feb 22 05:04:43 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? In-Reply-To: References: Message-ID: <45DD94EB.6050203@scalableinformatics.com> FWIW: Bruce Allen wrote: > Hi Thomas, > > If I understand correctly, your NFS server is connected via a 10Gb/s > link to the switch, and you've spent some effort to tune it. What sort > of aggregate NFS performance are you seeing? Is it at the level of 300 > or 400 MG/sec? 
Could you please provide a few details about the > hardware in your NFS box? Our 15 and 16 disk JackRabbit units have no trouble with this, using channel bonded gigabit (no 10GbE). We measured with 4 simultaneous IOzones being run over NFS and an inexpensive gigabit switch. No jumbo frames, the fingers never left the hands, yadda yadda yadda. Atop reported the network IO rate, dstat didn't curiously enough ... the individual adapters were correct, but it got the channel bond throughput calculation wrong (multiplied the sum of the individual adapters by 4). vmstat also returned the disk IO rate. We were seeing a sustained 420 MB/s in this configuration. About 500 MB/s to disk (journaling overhead). Used RAID6. FWIW we also uncovered some nasty channel bond crashes under intense loads. We were able to get it stable and repeatable, at the cost of some of the more interesting channel bond modes. These crashes didn't manifest themselves until we started pushing the channel bond hard. We needed 4 clients banging on the bonded server. 3 would not crash it. These crashes manifested themselves throughout all kernels we tried (2.6.9 through 2.6.18). Running 4 separate (non-channel bonded) ports and doing the same tests did not show this crash. The driver was fine (e1000, don't even think of trying this with a Broadcom chip/driver) and well behaved. > > Cheers, > Bruce > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From mathog at caltech.edu Thu Feb 22 08:22:34 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population Message-ID: Justin Moore wrote: > As mentioned in an earlier e-mail (I think) there were 4 SMART variables > whose values were strongly correlated with failure, and another 4-6 that > were weakly correlated with failure. However, of all the disks that > failed, less than half (around 45%) had ANY of the "strong" signals and > another 25% had some of the "weak" signals. This means that over a > third of disks that failed gave no appreciable warning. Therefore even > combining the variables would give no better than a 70% chance of > predicting failure. Now we need to know exactly how you defined "failed". Presumably AFTER you have determined that a disk has failed various SMART parameters have very high values. As you say, before there are SMART indicators but no clear trend. What separates one set of SMART values (indicator) from the other (failed)? Is it possible that more frequent monitoring of SMART variables could catch the early failure (chest pains, so to speak) before the total failure (fatal heart failure)? This might give a few more seconds or minutes warning before disk failure, possibly enough time for a node to indicate it is about to fail and shutdown, especially if it can do so without writing much to the disk. Admittedly, this would not be nearly as useful as knowing that a disk will fail in a week! Disks that just stop spinning or won't spin back up (motor/spindle failure) are another problem that presumably cannot be detected by SMART. 
However this mode of failure is usually only seen in DOA disks and old, old disks. What fraction of the failed disks were this type of failure? Were there postmortem analyses of the power supplies in the failed systems? It wouldn't surprise me if low or noisy power lines led to an increased rate of disk failure. SMART wouldn't give this information (at least, not on any of the disks I have), but lm_sensors would. Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From James.P.Lux at jpl.nasa.gov Thu Feb 22 10:40:58 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <6.2.3.4.2.20070222103418.02ddef08@mail.jpl.nasa.gov> At 08:22 AM 2/22/2007, David Mathog wrote: >Justin Moore wrote: > > As mentioned in an earlier e-mail (I think) there were 4 SMART variables > > whose values were strongly correlated with failure, and another 4-6 that > > were weakly correlated with failure. However, of all the disks that > > failed, less than half (around 45%) had ANY of the "strong" signals and > > another 25% had some of the "weak" signals. This means that over a > > third of disks that failed gave no appreciable warning. Therefore even > > combining the variables would give no better than a 70% chance of > > predicting failure. > >Now we need to know exactly how you defined "failed". The paper defined failed as "requiring the computer to be pulled" whether or not the disk was actually dead. Were there postmortem analyses of the power supplies in the failed >systems? It wouldn't surprise me if low or noisy power lines led >to an increased rate of disk failure. SMART wouldn't give this >information (at least, not on any of the disks I have), but >lm_sensors would. I would make the case that it's not worth it to even glance at the outside of the case of a dead unit, much less do failure analysis on the power supply. FA is expensive, new computers are not. Pitch the dead (or "not quite dead yet, but suspect") computer, slap in a new one and go on. There is some non-zero value in understanding the failure mechanics, but probably only if the failure rate is high enough to make a difference. That is, if you had a 50% failure rate, it would be worth understanding. If you have a 3% failure rate, it might be better to just replace and move on. There is also some value in predicting failures, IF there's an economic benefit from knowing early. Maybe you can replace computers in batches less expensively than waiting for them to fail or maybe your in a situation where a failure is expensive (highly tuned brittle software with no checkpoints that has to run 1000 processors in lockstep for days on end). I can see Google being in the former case but probably not in the latter. Predictive statistics might also be useful if there is some "common factor" that kills many disks at once (Gosh, when Bob is the duty SA after midnight and it's the full moon, the airfilters clog with some strange fur and the drives overheat, but only in machine rooms with a window to the outside..) James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From robin at workstationsuk.co.uk Thu Feb 22 00:10:49 2007 From: robin at workstationsuk.co.uk (Robin Harker) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: <200702211355.49281.kyron@neuralbs.com> Message-ID: <3056.86.136.173.124.1172131849.squirrel@webmail.hostme.co.uk> So if we now know, (and we have seen similarly spirious behaviour with SATA Raid arrays), isn't the real solution to lose the node discs? Regards Robin > >>> How did they look for predictive models on the SMART data? It sounds >>> like they did a fairly linear data decomposition, looking for first >>> order correlations. Did they try to e.g. build a neural network on it, >>> or use fully multivariate methods (ordinary stats can handle it up to >>> 5-10 variables). >>> >>> This is really an extension of David's questions below. It would be >>> very interesting to add variables to the problem (if possible) until >>> the >>> observed correlations resolve (in sufficiently high dimensionality) >>> into >>> something significantly predictive. That would be VERY useful. >>> >> >> RGB, good idea, apply clustering/GA/MOGA analisys techniques to all of >> this data. Now the question is, will we ever get access to this data? >> ;) > > As mentioned in an earlier e-mail (I think) there were 4 SMART variables > whose values were strongly correlated with failure, and another 4-6 that > were weakly correlated with failure. However, of all the disks that > failed, less than half (around 45%) had ANY of the "strong" signals and > another 25% had some of the "weak" signals. This means that over a > third of disks that failed gave no appreciable warning. Therefore even > combining the variables would give no better than a 70% chance of > predicting failure. > > To make things worse, many of the "weak" signals were found on a > significant number of disks. For example, among the disks that failed, > many had a large number of seek error; however, over 70% of disks in the > fleet -- failed and working -- had a large number of seek errors. > > About all I can say beyond what's in the paper is that we're aware of > the shortcomings of the existing work and possible paths forward. In > response, we are > > Hello, this is the Google NDA bot. In our massive trawling of the > Internet and other data sources, I have detected a possible violation of > the Google NDA. This has been corrected. We now return you to your > regularly scheduled e-mail. > [ Continue ] [ I'm Feeling Confidential ] > > > So that's our master plan. Just don't tell anyone. :) > -jdm > > P.S. Unfortunately, I doubt that we'll be willing or able to release the > raw data behind the disk drive study. > > Department of Computer Science, Duke University, Durham, NC 27708-0129 > Email: justin@cs.duke.edu > Web: http://www.cs.duke.edu/~justin/ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > Robin Harker Workstations UK Ltd DDI: 01494 787710 Tel: 01494 724498 From kus at free.net Thu Feb 22 11:07:12 2007 From: kus at free.net (Mikhail Kuzminsky) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] anyone using 10gbaseT? 
In-Reply-To: Message-ID: In message from Thomas H Dr Pierce (Wed, 21 Feb 2007 09:06:16 -0500): >Hello, >I have been using the MYRICOM 10Gb card in my NFS server (head node) > for the Beowulf cluster. And it works well. I have a inexpensive > 3Com switch (3870) with 48 1Gb ports that has a 10Gb port in it and > I connect the NFS server to that port. The switch does have small > fans in it. > ... > My internal cluster >benchmarks ( >parallel Quantum mechanics programs ) improved by about 20% with disk >backups improving by 60%. Could you pls clarify - what are quantum-mechanical programs you used (and may be typical calculation methods used) ? Are they bound by disk I/O ? Yours Mikhail Kuzminsky Zelinsky Institute of Organic Chemistry Moscow >Sincerely, > Tom Pierce From mathog at caltech.edu Thu Feb 22 12:30:21 2007 From: mathog at caltech.edu (David Mathog) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population Message-ID: Jim Lux wrote: > >Now we need to know exactly how you defined "failed". > > The paper defined failed as "requiring the computer to be pulled" > whether or not the disk was actually dead. That was sort of my point, if you're looking for indicators that lead to "failed disk" there should be a precise definition of what "failed disk" is. How am I to know what criteria Google uses for classifying a machine as nonfunctioning? If the system is pulled because the CPU blew up it's one thing, but if they pulled it for any disk related reason, we need to know how bad "bad" was. > I would make the case that it's not worth it to even glance at the > outside of the case of a dead unit, much less do failure analysis on > the power supply. FA is expensive, new computers are not. Pitch the > dead (or "not quite dead yet, but suspect") computer, slap in a new > one and go on. Well, they cared enough to do the study! I think the heart of the problem is that disk failures are a bit like airplane crashes: everything looks great until something snaps and then the plane goes down shortly thereafter. Similarly, there's just not that much time between the cause of the failure manifesting itself and the final disk failure. Once the disk heads start bouncing off the disk, or some piece of dirt or metal shaving gets between the disks and the heads, its all over pretty quickly. Until that point there may be a few weak indications that something is wrong, but they may or may not have a relation to the final failure event. For instance, a tiny bit of junk stuck to the surface may cause a few blocks to remap and never do anything else. It might or might not mean that a huge chunk of the same stuff is about to wreak havoc. (It's absence is clearly preferred though, since any remapped blocks can result in data loss.) Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From James.P.Lux at jpl.nasa.gov Thu Feb 22 17:12:10 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: References: Message-ID: <6.2.3.4.2.20070222170835.02e8abc0@mail.jpl.nasa.gov> At 12:30 PM 2/22/2007, David Mathog wrote: >Jim Lux wrote: > > > >Now we need to know exactly how you defined "failed". > > > > The paper defined failed as "requiring the computer to be pulled" > > whether or not the disk was actually dead. 
> >That was sort of my point, if you're looking for indicators that >lead to "failed disk" there should be a precise definition of >what "failed disk" is. How am I to know what criteria Google uses >for classifying a machine as nonfunctioning? If the system is >pulled because the CPU blew up it's one thing, but if they pulled it >for any disk related reason, we need to know how bad "bad" was. True.. there's a paragraph or so of how they determined "failed" (e.g. they didn't include drives removed from service because of scheduled replacement). > > I would make the case that it's not worth it to even glance at the > > outside of the case of a dead unit, much less do failure analysis on > > the power supply. FA is expensive, new computers are not. Pitch the > > dead (or "not quite dead yet, but suspect") computer, slap in a new > > one and go on. > >Well, they cared enough to do the study! Or, more realistically, that the small dollars spent on the study to identify a possible connection was tiny enough that it's probably down in the overall budgetary noise floor. >I think the heart of the problem is that disk failures are a bit like >airplane crashes: everything looks great until something snaps and then >the plane goes down shortly thereafter. I think one of the values of the study was that it actually did demonstrate just that.. you really can't do a very good job predicting failures in advance, so you'd better have a system in place to deal with the inevitable failures while they're in service. And, of course, they have some "real numbers" on failure rates, which is useful in and of itself, regardless of whether the failures could be predicted. James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From landman at scalableinformatics.com Thu Feb 22 17:52:51 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Wed Nov 25 01:05:42 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <6.2.3.4.2.20070222170835.02e8abc0@mail.jpl.nasa.gov> References: <6.2.3.4.2.20070222170835.02e8abc0@mail.jpl.nasa.gov> Message-ID: <45DE48F3.9070808@scalableinformatics.com> Jim Lux wrote: >> > I would make the case that it's not worth it to even glance at the >> > outside of the case of a dead unit, much less do failure analysis on >> > the power supply. FA is expensive, new computers are not. Pitch the >> > dead (or "not quite dead yet, but suspect") computer, slap in a new >> > one and go on. >> >> Well, they cared enough to do the study! > > Or, more realistically, that the small dollars spent on the study to > identify a possible connection was tiny enough that it's probably down > in the overall budgetary noise floor. One may (correctly) argue that disks are (individually) disposable. The data may not be, and the important thing is how to protect that data. 
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

From csamuel at vpac.org  Thu Feb 22 21:31:44 2007
From: csamuel at vpac.org (Chris Samuel)
Date: Wed Nov 25 01:05:42 2009
Subject: [Beowulf] Re: failure trends in a large disk drive population
In-Reply-To: <3056.86.136.173.124.1172131849.squirrel@webmail.hostme.co.uk>
References: <3056.86.136.173.124.1172131849.squirrel@webmail.hostme.co.uk>
Message-ID: <200702231631.44996.csamuel@vpac.org>

On Thu, 22 Feb 2007, Robin Harker wrote:

> So if we now know (and we have seen similarly spurious behaviour with
> SATA RAID arrays), isn't the real solution to lose the node discs?

It depends on the code you're running: if it hammers local scratch, then
you either have to have the node discs or you have to invest in the
infrastructure to provide that through a distributed HPC filesystem
instead.

cheers!
Chris
--
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://www.scyld.com/pipermail/beowulf/attachments/20070223/b6b4e1e7/attachment.bin

From ferreiradesousa at gmail.com  Fri Feb 23 14:31:00 2007
From: ferreiradesousa at gmail.com (Paulo Ferreira de Sousa)
Date: Wed Nov 25 01:05:42 2009
Subject: [Beowulf] Aztec woes
Message-ID: <7c8f78e50702231431t23fcac18wde233c02811c795e@mail.gmail.com>

Dear all,

I am trying to run Aztec using mpif90 on a Scyld Beowulf and am running
into some difficulties. I was previously running the application on a
"home-made" cluster, and the makefiles I used there are pretty much
useless now.

Any help, such as makefile examples or ways to link the Aztec library,
would be greatly appreciated. I would be happy to answer any further
questions you might have, and I apologize in advance for my level of
Linux illiteracy.

Kind regards,

Paulo Ferreira de Sousa

From hahn at mcmaster.ca  Mon Feb 26 07:39:34 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:05:42 2009
Subject: [Beowulf] Re: HPL input file
In-Reply-To:
References:
Message-ID:

> I am trying to find out the speed of my cluster using HPL but I am not
> able to understand what values to set in HPL.dat to find out the peak
> performance (e.g. the values of N, NB, PxQ, etc). Kindly help me in
> this regard.

following is the HPL.dat I'm currently using as a load-generator for my
cluster's 8GB dual-socket-single-core nodes. it's not for generating HPL
scores, but rather just to stress the system. comments:

- you choose the problem size to match your memory; too low a value will
  result in not enough work per cpu and lower efficiency. on my system,
  I found no significant advantage to using more than 1GB/proc, but that
  should depend on the CPU and interconnect speed. (faster cpus will
  need more work to amortize communication; faster communication will
  lower the amount of work to amortize.)

- I didn't find any strong dependence on NB.

- P*Q=ncpus; for a switched interconnect, conventional wisdom is that
  you want PxQ to be close to square. on my machine (full-bisection
  quadrics with dual-processor nodes) I think I've measured it being
  slightly faster when run in a 1:2 shape (Q ~= 2P).

- I haven't found any strong performance dependency on any of the other
  parameters, but other clusters may be different if they have slower or
  non-flat networks, more procs/node, etc.
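as a rule of thumb for picking N: the HPL matrix is N x N doubles, so it
occupies roughly 8*N^2 bytes across all processes, and you want that to
be a large fraction of total memory. a quick sketch (the node count,
memory per node, and the 80% fraction below are assumptions to adjust,
not anything magic):

  # back-of-the-envelope N for HPL: use ~80% of total memory for the matrix
  awk 'BEGIN { nodes=1; gb_per_node=8; frac=0.80;
               bytes = nodes * gb_per_node * 2^30 * frac;
               printf "try N around %d\n", int(sqrt(bytes/8)) }'

for one 8GB node that suggests N around 29000; the 31700 in the file
below fills most of the 8GB, which is fine for a stress test but leaves
little headroom for the OS.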
regards,
mark hahn.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out         output file name (if any)
6               device out (6=stdout,7=stderr,file)
5               # of problems sizes (N)
1000 31700 31700 31700 31700
1               # of NBs
200             NBs
0               PMAP process mapping (0=Row-,1=Column-major)
1               # of process grids (P x Q)
1               Ps
2               Qs
16.0            threshold
1               # of panel fact
1               PFACTs (0=left, 1=Crout, 2=Right)
1               # of recursive stopping criterium
4               NBMINs (>= 1)
1               # of panels in recursion
2               NDIVs
1               # of recursive panel fact.
1               RFACTs (0=left, 1=Crout, 2=Right)
1               # of broadcast
1               BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1               # of lookahead depth
1               DEPTHs (>=0)
2               SWAP (0=bin-exch,1=long,2=mix)
64              swapping threshold
0               L1 in (0=transposed,1=no-transposed) form
0               U in (0=transposed,1=no-transposed) form
1               Equilibration (0=no,1=yes)
8               memory alignment in double (> 0)

From becker at scyld.com  Mon Feb 26 12:25:56 2007
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:05:43 2009
Subject: [Beowulf] Rescheduled BWBUG meeting tomorrow, Feb 27 2007
Message-ID:

--- Special Notes:
- This is the weather-rescheduled presentation from February 13
- The meeting is at a Georgetown University building off Wisconsin Ave.
- See http://www.bwbug.org/ for full information and any corrections

Baltimore Washington Beowulf User Group Meeting

Date: 27 Feb 2007 at 2:30 pm - 5:00 pm
Location: Georgetown University at Whitehaven Street
          3300 Whitehaven Street, Washington DC 20007
Speaker: Donald Becker, CTO of Scyld Software / Penguin Computing
Host: Michael Fitzmaurice

Here is the announcement from Mike:

February 27th 2:30 to 5:00 PM

Don Becker will speak at Georgetown University located at . (This is NOT
the main Georgetown U campus; it is an off-campus building one block
from Wisconsin Avenue.)

Don was the co-creator, along with Thomas Sterling, of the Beowulf
Project at NASA Goddard. The Beowulf Project is an example of one of the
most successful U.S. Government technology transfer projects. IDC
estimates that in 2007 the HPC market will exceed 13 billion dollars,
the majority of which will be cluster technology built on the original
Beowulf concept.

--
Donald Becker                          becker@scyld.com
Scyld Software                         Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220          www.scyld.com
Annapolis MD 21403                     410-990-9993

From mkamranmustafa at gmail.com  Sun Feb 25 23:00:27 2007
From: mkamranmustafa at gmail.com (=?UTF-8?B?4omIz4Fz0YfComjHv+KJiA==?=)
Date: Wed Nov 25 01:05:43 2009
Subject: [Beowulf] HPL input file
Message-ID:

Dear all,

I am trying to find out the speed of my cluster using HPL, but I am not
able to understand what values to set in HPL.dat to find the peak
performance (e.g. the values of N, NB, PxQ, etc). Kindly help me in this
regard.

Regards,
Kamran
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070226/eff35806/attachment.html

From jbernstein at penguincomputing.com  Mon Feb 26 13:23:26 2007
From: jbernstein at penguincomputing.com (Joshua Bernstein)
Date: Wed Nov 25 01:05:43 2009
Subject: [Beowulf] Aztec woes
Message-ID: <45E34FCE.7020801@penguincomputing.com>

Paulo,

If you could provide some output of the errors you are seeing, I might
be of more help.
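In the meantime, the link line for an MPI Fortran code against Aztec is
usually something along these lines -- a sketch only, since the install
prefix, the include location, and the BLAS/LAPACK libraries below are
assumptions about your particular build, not known facts about it:

  # hypothetical paths -- point AZTEC_HOME at wherever Aztec actually lives
  AZTEC_HOME=/usr/local/aztec
  mpif90 -O2 -o my_solver my_solver.f90 \
      -I$AZTEC_HOME/include -L$AZTEC_HOME/lib -laztec -llapack -lblas

The important part on a Scyld system is simply that the mpif90 wrapper
from the installed MPI does the compiling and linking, with Aztec and
its BLAS/LAPACK dependencies added via the usual -L/-l flags.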
If you are seeing these errors after typing "make" then just provide say the last 20 or so lines of the output. -Joshua Bernstein Software Engineer Penguin Computing > Dear all, > > I am trying to run Aztec using mpif90 on a Scyld Beowulf and running > into some difficulties. I was previously running the application in a > "home-made" cluster and the makefiles that I've previously used are > pretty much useless. > > Any help, like makefile examples or ways to link the Aztec library, > would be greatly appreciated. I would be happy to answer any more > questions you might have and apologize in advance for my level of > Linux illiteracy. > > Kind regards, > > Paulo Ferreira de Sousa From Michael.Fitzmaurice at gtsi.com Tue Feb 27 08:20:04 2007 From: Michael.Fitzmaurice at gtsi.com (Michael Fitzmaurice) Date: Wed Nov 25 01:05:43 2009 Subject: [Beowulf] bwbug: reminder: Donald Becker the co founder of the Beowulf Project will speak TODAY at Georgetown University at 3:00 PM Message-ID: Skipped content of type multipart/alternative-------------- next part -------------- _______________________________________________ bwbug mailing list bwbug@bwbug.org http://www.pbm.com/mailman/listinfo/bwbug From dougg at torque.net Wed Feb 28 08:06:39 2007 From: dougg at torque.net (Douglas Gilbert) Date: Wed Nov 25 01:05:43 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population Message-ID: <45E5A88F.5020404@torque.net> Eugen Leitl wrote: > over consumer SATA. Btw -- smartd doesn't seem to be able to handle > SATA, at least, last time I tried. > > http://smartmontools.sourceforge.net/#testinghelp > > How do you folks gather data on them? Eugen, That FAQ entry is about 2 years out of date. smartmontools support for SATA disks behind a SCSI to ATA Translation (SAT) layer is now much better. Please try the recently released version 5.37 of smartmontools. I have updated that entry in the FAQ with more up-to-date information and the new entry should become visible soon. Doug Gilbert From wavelet at iutlecreusot.u-bourgogne.fr Wed Feb 28 12:00:50 2007 From: wavelet at iutlecreusot.u-bourgogne.fr (Wavelet colloque) Date: Wed Nov 25 01:05:43 2009 Subject: [Beowulf] Call for papers : Wavelet Applications in Industrial Processing V Message-ID: *** Call for Papers and Announcement *** Wavelet Applications in Industrial Processing V (SA109) Part of SPIE?s International Symposium on Optics East 2007 9-12 September 2007 ? Seaport World Trade Center ? Boston, MA, USA --- Abstract Due Date Deadline prolongation: 4 March 2007 --- --- Manuscript Due Date: 13 August 2007 --- Web site http://spie.org/Conferences/Calls/07/oe/submitAbstract/index.cfm? fuseaction=SA109 ABSTRACT TEXT Approximately 500 words. Conference Chairs: Fr?d?ric Truchetet, Univ. de Bourgogne (France); Olivier Laligant, Univ. de Bourgogne (France) Program Committee: Patrice Abry, ?cole Normale Sup?rieure de Lyon (France); Radu V. Balan, Siemens Corporate Research; Atilla M. Baskurt, Univ. Claude Bernard Lyon 1 (France); Amel Benazza-Benyahia, Ecole Sup?rieure des Communications de Tunis (Tunisia); Albert Bijaoui, Observatoire de la C?te d'Azur (France); Seiji Hata, Kagawa Univ. (Japan); Henk J. A. M. Heijmans, Ctr. for Mathematics and Computer Science (Netherlands); William S. Hortos, Associates in Communication Engineering Research and Technology; Jacques Lewalle, Syracuse Univ.; Wilfried R. Philips, Univ. Gent (Belgium); Alexandra Pizurica, Univ. Gent (Belgium); Guoping Qiu, The Univ. 
of Nottingham (United Kingdom); Hamed Sari-Sarraf, Texas Tech Univ.; Peter Schelkens, Vrije Univ. Brussel (Belgium); Paul Scheunders, Univ. Antwerpen (Belgium); Kenneth W. Tobin, Jr., Oak Ridge National Lab.; G?nther K. G. Wernicke, Humboldt-Univ. zu Berlin (Germany); Gerald Zauner, Fachhochschule Wels (Austria) The wavelet transform, multiresolution analysis, and other space- frequency or space-scale approaches are now considered standard tools by researchers in image and signal processing. Promising practical results in machine vision and sensors for industrial applications and non destructive testing have been obtained, and a lot of ideas can be applied to industrial imaging projects. This conference is intended to bring together practitioners, researchers, and technologists in machine vision, sensors, non destructive testing, signal and image processing to share recent developments in wavelet and multiresolution approaches. Papers emphasizing fundamental methods that are widely applicable to industrial inspection and other industrial applications are especially welcome. Papers are solicited but not limited to the following areas: o New trends in wavelet and multiresolution approach, frame and overcomplete representations, Gabor transform, space-scale and space- frequency analysis, multiwavelets, directional wavelets, lifting scheme for: - sensors - signal and image denoising, enhancement, segmentation, image deblurring - texture analysis - pattern recognition - shape recognition - 3D surface analysis, characterization, compression - acoustical signal processing - stochastic signal analysis - seismic data analysis - real-time implementation - image compression - hardware, wavelet chips. o Applications: - machine vision - aspect inspection - character recognition - speech enhancement - robot vision - image databases - image indexing or retrieval - data hiding - image watermarking - non destructive evaluation - metrology - real-time inspection. o Applications in microelectronics manufacturing, web and paper products, glass, plastic, steel, inspection, power production, chemical process, food and agriculture, pharmaceuticals, petroleum industry. All submissions will be peer reviewed. Please note that abstracts must be at least 500 words in length in order to receive full consideration. ------------------------------------------------------------------------ --------- ! Abstract Due Date Deadline prolongation: 4 March 2007 ! ! Manuscript Due Date: 13 August 2007 ! ------------------------------------------------------------------------ --------- ------------- Submission of Abstracts for Optics East 2007 Symposium ------------ Abstract Due Date Deadline prolongation: 4 March 2007 - Manuscript Due Date: 13 August 2007 Abstracts, if accepted, will be distributed at the meeting. * IMPORTANT! - Submissions imply the intent of at least one author to register, attend the symposium, present the paper (either orally or in poster format), and submit a full-length manuscript for publication in the conference Proceedings. - By submitting your abstract, you warrant that all clearances and permissions have been obtained, and authorize SPIE to circulate your abstract to conference committee members for review and selection purposes and if it is accepted, to publish your abstract in conference announcements and publicity. 
- All authors (including invited or solicited speakers), program committee members, and session chairs are responsible for registering and paying the reduced author, session chair, program committee registration fee. (Current SPIE Members receive a discount on the registration fee.) * Instructions for Submitting Abstracts via Web - You are STRONGLY ENCOURAGED to submit abstracts using the ?submit an abstract? link at: http://spie.org/events/oe - Submitting directly on the Web ensures that your abstract will be immediately accessible by the conference chair for review through MySPIE, SPIE?s author/chair web site. - Please note! When submitting your abstract you must provide contact information for all authors, summarize your paper, and identify the contact author who will receive correspondence about the submission and who must submit the manuscript and all revisions. Please have this information available before you begin the submission process. - First-time users of MySPIE can create a new account by clicking on the create new account link. You can simplify account creation by using your SPIE ID# which is found on SPIE membership cards or the label of any SPIE mailing. - If you do not have web access, you may E-MAIL each abstract separately to: abstracts@spie.org in ASCII text (not encoded) format. There will be a time delay for abstracts submitted via e-mail as they will not be immediately processed for chair review. IMPORTANT! To ensure proper processing of your abstract, the SUBJECT line must include only: SUBJECT: SA109, TRUCHETET, LALIGANT - Your abstract submission must include all of the following: 1. PAPER TITLE 2. AUTHORS (principal author first) For each author: o First (given) Name (initials not acceptable) o Last (family) Name o Affiliation o Mailing Address o Telephone Number o Fax Number o Email Address 3. PRESENTATION PREFERENCE "Oral Presentation" or "Poster Presentation." 4. PRINCIPAL AUTHOR?S BIOGRAPHY Approximately 50 words. 5. ABSTRACT TEXT Approximately 500 words. Accepted abstracts for this conference will be included in the abstract CD-ROM which will be available at the meeting. Please submit only 500-word abstracts that are suitable for publication. 6. KEYWORDS Maximum of five keywords. If you do not have web access, you may E-MAIL each abstract separately to: abstracts@spie.org in ASCII text (not encoded) format. There will be a time delay for abstracts submitted via e- mail as they will not be immediately processed for chair review. * Conditions of Acceptance - Authors are expected to secure funding for registration fees, travel, and accommodations, independent of SPIE, through their sponsoring organizations before submitting abstracts. - Only original material should be submitted. - Commercial papers, papers with no new research/development content, and papers where supporting data or a technical description cannot be given for proprietary reasons will not be accepted for presentation in this symposium. - Abstracts should contain enough detail to clearly convey the approach and the results of the research. - Government and company clearance to present and publish should be final at the time of submittal. If you are a DoD contractor, allow at least 60 days for clearance. Authors are required to warrant to SPIE in advance of publication of the Proceedings that all necessary permissions and clearances have been obtained, and that submitting authors are authorized to transfer copyright of the paper to SPIE. 
* Review, Notification, Program Placement - To ensure a high-quality conference, all abstracts and Proceedings manuscripts will be reviewed by the Conference Chair/Editor for technical merit and suitability of content. Conference Chair/Editors may require manuscript revision before approving publication, and reserve the right to reject for presentation or publication any paper that does not meet content or presentation expectations. SPIE?s decision on whether to accept a presentation or publish a manuscript is final. - Applicants will be notified of abstract acceptance and sent manuscript instructions by e-mail no later than 7 May 2007. Notification of acceptance will be placed on SPIE Web the week of 4 June 2007 at http://spie.org/events/oe - Final placement in an oral or poster session is subject to the Chairs' discretion. Instructions for oral and poster presentations will be sent to you by e-mail. All oral and poster presentations require presentation at the meeting and submission of a manuscript to be included in the Proceedings of SPIE. * Proceedings of SPIE - These conferences will result in full-manuscript Chairs/Editor- reviewed volumes published in the Proceedings of SPIE and in the SPIE Digital Library. - Correctly formatted, ready-to-print manuscripts submitted in English are required for all accepted oral and poster presentations. Electronic submissions are recommended, and result in higher quality reproduction. Submission must be provided in PostScript created with a printer driver compatible with SPIE?s online Electronic Manuscript Submission system. Instructions are included in the author kit and from the ?Author Info? link at the conference website. - Authors are required to transfer copyright of the manuscript to SPIE or to provide a suitable publication license. - Papers published are indexed in leading scientific databases including INSPEC, Ei Compendex, Chemical Abstracts, International Aerospace Abstracts, Index to Scientific and Technical Proceedings and NASA Astrophysical Data System, and are searchable in the SPIE Digital Library. Full manuscripts are available to Digital Library subscribers. - Late manuscripts may not be published in the conference Proceedings and SPIE Digital Library, whether the conference volume will be published before or after the meeting. The objective of this policy is to better serve the conference participants as well as the technical community at large, by enabling timely publication of the Proceedings. - Papers not presented at the meeting will not be published in the conference Proceedings, except in the case of exceptional circumstances at the discretion of SPIE and the Conference Chairs/Editors. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070228/df803a32/attachment.html From csamuel at vpac.org Wed Feb 28 20:38:14 2007 From: csamuel at vpac.org (Chris Samuel) Date: Wed Nov 25 01:05:43 2009 Subject: [Beowulf] Re: failure trends in a large disk drive population In-Reply-To: <45E5A88F.5020404@torque.net> References: <45E5A88F.5020404@torque.net> Message-ID: <200703011538.19482.csamuel@vpac.org> On Thu, 1 Mar 2007, Douglas Gilbert wrote: > That FAQ entry is about 2 years out of date. smartmontools > support for SATA disks behind a SCSI to ATA Translation (SAT) > layer is now much better. Please try the recently released > version 5.37 of smartmontools. 
For instance, you should be able to do:

  # smartctl -d ata -a /dev/sda

for a SATA drive discovered as /dev/sda. This works happily on my home
box (except for some unknown attributes - see [1]).

The manual page says:

  -d TYPE, --device=TYPE
        Specifies the type of the device. The valid arguments to this
        option are ata, scsi, marvell, cciss,N and 3ware,N. If this
        option is not used then smartctl will attempt to guess the
        device type from the device name.
  [...]

If you forget to specify it then it considers /dev/sda a SCSI device and
tries (unsuccessfully) to talk to it as such.

cheers!
Chris

[1] - Some of the SMART attributes are vendor specific and undocumented:
http://www.csamuel.org/2007/02/22/seagate-st3300622as-unknown-smart-attribute-190/

--
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://www.scyld.com/pipermail/beowulf/attachments/20070301/d13970a9/attachment.bin
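To keep an eye on the node-local drives across a whole cluster, the same
command is easy to wrap in a loop. A minimal sketch, assuming
passwordless root ssh, nodes named node01..node16, and the disk at
/dev/sda (all three are assumptions to adjust for your site; on a Scyld
machine you would use bpsh with node numbers instead of ssh):

  #!/bin/sh
  # print a few interesting SMART attributes from each node's first disk
  # node names, node count and device path are assumptions -- edit to suit
  for n in $(seq -w 1 16); do
      echo "=== node$n ==="
      ssh node$n "smartctl -d ata -A /dev/sda" \
          | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Temperature_Celsius'
  done

The -A flag prints just the vendor attribute table, which keeps the
output short enough to scan by eye or to feed into whatever monitoring
you already run.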